Managed Services: Analysis and Response Systems

Jade Networks helps companies with the design, implementation, deployment, management and outsourcing of enterprise networks. We also help in areas including disaster planning and recovery, policy planning and analysis, network and vulnerability assessment, network automation, and others. Our managed services focus on both the operational and security aspects of enterprise networks. We break these services apart into Security, Monitoring, and Analysis/Response Systems; however, all three are tightly related and are treated separately for descriptive purposes only. This document describes Analysis and Response Systems.

Collecting operational data is great, but the real power of these systems is their ability to analyze collected data to detect, alert on, and resolve anomalies. The ability to analyze and react to problems is found in many products, including security, monitoring, and SIEM products, applications, and custom-built tools. For problems that can be anticipated and whose solution is well understood, it is usually possible to codify the resolution in a program or script which can be invoked upon detection. This type of response automation creates much more reliable systems that can react in real time to problems without direct operator involvement. This reduces costs and results in much higher application availability.

Collecting The Data

Security and operational information can come from any device or sensor on the network. Sources range from traditional sensors used for direct hardware monitoring (temperature, fan rotation speeds and the like) to network and application sensors embedded into devices (servers, routers, switches, security devices, etc.) and software. Some of these sources can directly log information using standard network protocols such as syslog. Others can be polled for their status using protocols like SNMP. Some applications, for example the Apache Web Server, do not directly use reporting protocols but write their information to local logfiles. Other applications or services may do none of the above and need to be directly queried to determine their status.

In the Managed Services: Monitoring page we cover Network Management systems, provide an introductory overview of SIEM and discuss the similarities and differences between the two monitoring solutions. We go into greater depth on SIEM systems below. Network Management systems typically collect their own data through observation or polling of devices and sensors. SIEM systems rely upon computer-generated data which they process to understand the world around them. Both of these monitoring solutions need to filter and then normalize the information, that is, convert it into a standard internal format for storage and later use. Unfortunately there are no completely standard formats for logging data or logfiles. The syslog protocol (in both versions) standardizes the log message envelope but not the message part of the logging data. Data received from the Linux network stack looks very different from the data received from a Cisco router or ASA, which in turn is very different from any application layer logging. Applications which only log to their own files are apt to each use a different format. Filtering and normalizing the data to a standard internal format can represent a significant upfront effort (time and money), especially for non-standard (or uncommon) devices or applications. This is an area which is commonly underestimated by organizations.
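As a rough illustration of what normalization involves, the Python sketch below converts one BSD-style syslog line and one Apache access log line into the same internal record. The record fields (source, time, host, app, severity, message), the function names, and the mapping from HTTP status codes to severities are assumptions made for this example; they are not the format used by any particular SIEM product.

```python
import re
from datetime import datetime

# BSD-style syslog line: "<PRI>Mmm dd hh:mm:ss host tag[pid]: message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:\[\s]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

# Apache Common Log Format: client ident user [time] "request" status size
APACHE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)$'
)


def normalize_syslog(line, year):
    """Return a normalized record, or None if the line does not match.

    The BSD syslog timestamp carries no year, so the caller supplies one.
    """
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return {
        "source": "syslog",
        "time": ts.isoformat(),
        "host": m["host"],
        "app": m["tag"],
        "severity": int(m["pri"]) % 8,  # PRI = facility * 8 + severity
        "message": m["msg"],
    }


def normalize_apache(line, host):
    """Return a normalized record, or None if the line does not match.

    The access log does not name the server, so the caller supplies the host.
    """
    m = APACHE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    status = int(m["status"])
    # Assumed mapping onto syslog severities: 3 = error, 4 = warning, 6 = info
    severity = 3 if status >= 500 else 4 if status >= 400 else 6
    return {
        "source": "apache",
        "time": ts.isoformat(),
        "host": host,
        "app": "apache",
        "severity": severity,
        "message": f"{m['client']} {m['request']} -> {status}",
    }
```

Even in this small case each source needs its own parser and its own decisions (which year, which host, which severity), which is why the effort grows quickly with the number of device and application types.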

Centralized Log Management

The amount of logfile data generated by modern systems can be massive. Having all this data spread across the network is not only undesirable but defeats the purpose of systems designed to monitor, analyze, and correlate all enterprise network activity. While some threats can be detected and handled at the server or device level, threats which span multiple devices and/or time periods require a more sophisticated solution. The first step is to collect all the logged data together. Depending on the network size and analysis requirements, the data may be held on a single system or replicated across several. If the analysis systems are off-site, then filtering and reducing the data to be exchanged is also a factor. In some systems certain types of analysis are better performed on-site, whereas others (usually longer term analysis and correlation) can be done off-site. For example, hardware monitoring of things like air flow and temperatures has little bearing on security but a lot to do with the operational health of the network. Historic analysis of this data is not apt to produce significant new insights, so it is a good candidate to be stored separately on-site and not integrated with the hybrid SIEM.

We use the Zabbix Network Management platform internally to collect a combination of hardware, application, and operating system metrics. Triggers are set for conditions which need resolution, and whenever possible automated scripts are built to correct problems upon detection. Logfile data, on the other hand, is collected on each server using the syslog-ng logging application. The syslog-ng client monitors individual logs such as the Apache logs and system logging, including the Linux journal and kernel logs. This information is collected and sent using the syslog protocol to a central logging server. Each server also collects syslog messages sent locally by applications and other event sources on the machine and forwards them. Our central logging machine then filters, stores, and forwards combined logging streams to other SIEM servers on the network for their use. As an example, every server operating system is configured to look for and report network stack threats. These are logged first through the kernel logging, then picked up by syslog-ng, and then sent to the central logging server, which records the event and sends the information on to the downstream SIEM servers. Network stack threats from all Jade servers are collected together so we have a clear picture of whether a threat is an isolated incident on a single machine or a coordinated attack against several assets at the same time.
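The distinction between an isolated incident and a coordinated attack can be sketched in a few lines of Python once the events sit in one place. This is a minimal sketch and not our production logic: the event fields (time in seconds, host, src_ip), the function name, the five minute window and the two host threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def classify_sources(events, window=300, min_hosts=2):
    """Label each attacking source as "isolated" or "coordinated".

    A source is "coordinated" when at least `min_hosts` different servers
    reported it within any span of `window` seconds.
    """
    by_source = defaultdict(list)
    for event in events:
        by_source[event["src_ip"]].append((event["time"], event["host"]))

    labels = {}
    for src, seen in by_source.items():
        seen.sort()  # order reports by time
        label = "isolated"
        start = 0
        for end in range(len(seen)):
            # Shrink the window from the left until it spans <= `window` seconds
            while seen[end][0] - seen[start][0] > window:
                start += 1
            hosts = {host for _, host in seen[start:end + 1]}
            if len(hosts) >= min_hosts:
                label = "coordinated"
                break
        labels[src] = label
    return labels
```

A single server can only ever see its own reports, so every source would look isolated to it; the classification only becomes possible on the central logging server or a downstream SIEM.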

Security Information and Event Management (SIEM)

For an overview of SIEM systems please see Managed Services: Monitoring. The syslog-ng application and service described above, while a very sophisticated and capable system logging solution, also functions as a basic SIEM solution. It performs logfile aggregation, filtering and propagation to downstream SIEM solutions. It reads logging data from multiple sources and combines them as configured. It can also perform basic log analysis (pattern matching) and data reduction, and it has limited alerting and response capability. The syslog-ng utilities are an excellent resource to use as a collector or preprocessor of logging data; however, they do not have in-depth data analysis or correlation capabilities.

The real power of SIEM solutions lies in their ability to combine logging data with other data sources such as database activity monitoring, netflow, file integrity monitoring and threat intelligence feeds, and then to establish context, analyze, correlate, prioritize and present the findings in reports and visually. Forensics tools further augment these capabilities. Advanced SIEM solutions provide the tools needed to bring together data collected across the enterprise (servers, devices, firewalls, IPS/IDS, UTM, and others). Not only can these solutions detect threats in a single stream (authentication attacks against a single SMTP or IMAP server, for example), but they can also see threats across the entire enterprise. Real-time analysis is applied across the enterprise, resulting in more informed decisions and the ability to detect and react to a far greater number of attacks. Non-real-time analysis can be done off-line, potentially by a different SIEM server, to analyze a much bigger dataset spanning more resources and/or longer time windows. This provides additional insight into threats against the enterprise, as not all threats are self-contained or occur within a small window of time.

As mentioned above, the initial configuration of data collection, filtering and normalization can be complex and time consuming. SIEM systems also need continual tuning, requiring considerable time from skilled professionals. These are not "set and forget" systems, as enterprise networks tend to be dynamic in nature and the threat landscape is continually changing. The necessary effort levels and resource requirements are often underestimated by organizations.

Other common uses for SIEM solutions are compliance reporting, historical analysis and incident investigation. When combined with packet capture (in tandem with an IDS/IPS solution), very sophisticated and thorough forensic analysis is possible. These systems, however, tend to be larger and more expensive, as full network packet capture and retention require a much larger amount of storage (and management).

Response Systems

As with the Network Management response solutions described in our Monitoring page, it is critical that any SIEM solution be able to react to detected threats in real time. Most SIEM solutions are able to perform predefined actions to mitigate a threat. Not all, however, have the ability to invoke customer-supplied programs or scripts. In our opinion this ability is essential to the solutions we provide, as it forms the basis for building an extensible automated response system.
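The hook between a trigger and a customer-supplied script can be very small. The Python sketch below shows the general idea only; the trigger name, the event fields and the RESPONSES table are hypothetical, and the registered "script" is a stand-in one-liner so that the example is self-contained. A real SIEM product defines its own mechanism for registering and invoking such scripts.

```python
import subprocess
import sys

# Hypothetical registry: trigger name -> command to run. The command here is a
# stand-in that just prints what it would do; a real entry would name a script.
RESPONSES = {
    "auth_attack": [sys.executable, "-c",
                    "import sys; print('block', sys.argv[2])"],
}


def run_response(command, event, timeout=30):
    """Run the command with the event's host and source IP as arguments.

    Returns (succeeded, output). A non-zero exit status counts as failure.
    """
    args = list(command) + [event["host"], event["src_ip"]]
    completed = subprocess.run(args, capture_output=True, text=True,
                               timeout=timeout)
    return completed.returncode == 0, completed.stdout.strip()


def handle(event):
    """Look up and run the response registered for the event's trigger."""
    command = RESPONSES.get(event["trigger"])
    if command is None:
        return False, "no response registered"
    return run_response(command, event)
```

Passing the arguments as a list (rather than building a shell command string) keeps data taken from log messages from being interpreted by a shell. The sketch does not catch a missing script or a timeout; a production hook would need to handle both and report the failure as an alert.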

Alerts

Just as with Network Management systems, SIEM triggers can invoke alerts or notifications upon detection. These can have multiple levels (informational, warning, critical, disaster, etc.) and can invoke different notification methods (email, instant messaging, web page update, etc.). The first inclination of a new security designer is to build in as many controls and alerts as possible in an attempt to understand the network to the fullest. This may prove interesting for a day or two but gets tiring quickly. The purpose of a properly configured SIEM solution is to reduce the administrator workload while providing greater insight and automation against known threats. Having to dig through several thousand email or IM alerts gets old VERY quickly and detracts from the real job of the administrator.

In our network we send most alerts by email. There are legitimate reasons to configure informational alerts into a system, despite their potential to generate a huge number of messages. We use a highly configurable distribution list (mailing list) manager in our email environment. This tool allows us to set up different list addresses and configure the delivery methods on a per-list or per-user basis. For high volume lists such as informational alerts we build lists configured for digest mode delivery, where many messages are sent as a single digest message to the administrators. Instead of receiving 2,000 individual informational alerts, the administrator may receive just a few digests over the course of a day. These can be archived in case they are needed for later problem investigation. We configure different alert levels with different mailing list parameters depending on severity and type. Critical or higher level alerts are sent immediately to the appropriate administrator. We are currently evaluating ways to extend critical alerts to IM notifications, resulting in immediate notification on mobile devices. This tiering of mailing list capabilities means the critical alerts are not lost in the sea of informational messaging. Even so, we strongly suggest being prudent and only sending alerts that are really useful, to keep the message volume as low as possible.
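The tiering idea can be modeled in a few lines of Python. In our environment the digest behavior comes from the mailing list manager, not from code like this; the sketch below only illustrates the routing rule, and the class name, list names and level ordering are assumptions made for the example.

```python
from collections import defaultdict

# Alert levels in increasing order of severity (taken from the text above)
LEVELS = ["informational", "warning", "critical", "disaster"]


class AlertRouter:
    """Send severe alerts at once; queue the rest for digest delivery."""

    def __init__(self, immediate_from="critical"):
        self.threshold = LEVELS.index(immediate_from)
        self.pending = defaultdict(list)  # level -> queued messages
        self.sent = []                    # (list name, subject, body)

    def alert(self, level, message):
        if LEVELS.index(level) >= self.threshold:
            # Critical or higher: deliver immediately, one message per alert
            self.sent.append((f"{level}-alerts", message, message))
        else:
            self.pending[level].append(message)

    def flush_digests(self):
        """Deliver one digest per level; call this on a schedule."""
        for level, messages in self.pending.items():
            if messages:
                subject = f"{len(messages)} {level} alerts"
                self.sent.append((f"{level}-digest", subject,
                                  "\n".join(messages)))
        self.pending.clear()
```

With this rule, three informational alerts and one critical alert produce one immediate message and, at the next flush, a single digest, so the critical alert is never buried.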

Common Defense Mechanisms

Many operational problems and common security threats are well known. Examples include hung servers or applications, application authentication attacks, and other similar situations which are frequently encountered. In situations where the mitigation of these problems can be codified with a very high degree of certainty (near or at 100%), automated solutions can be put in place. Common responses to threats include blocking the attacking IP address/network or user. Depending on the network capabilities, these can be permanent or time-limited blocks. The Jade Security Framework we are developing internally gives the SIEM server network-wide control over block lists of varying types and durations. This allows us to build automated systems which can detect network attacks in progress and disable access at the exterior firewall without operator intervention.
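A block list with permanent and time-limited entries can be sketched as follows. This is not the Jade Security Framework, only an in-memory Python illustration of the concept; the class and method names are assumptions, and a real deployment would push the entries to the firewall instead of merely answering lookups.

```python
import ipaddress
import time


class BlockList:
    """Hold blocked addresses or networks, each with an optional expiry."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be tested
        self._entries = {}    # network -> expiry timestamp, None = permanent

    def block(self, target, duration=None):
        """Block an address or CIDR network, permanently if duration is None."""
        network = ipaddress.ip_network(target, strict=False)
        expiry = None if duration is None else self._clock() + duration
        self._entries[network] = expiry

    def is_blocked(self, address):
        """Return True if the address falls inside any unexpired entry."""
        now = self._clock()
        # Drop time-limited entries whose expiry has passed
        self._entries = {net: exp for net, exp in self._entries.items()
                         if exp is None or exp > now}
        ip = ipaddress.ip_address(address)
        return any(ip in network for network in self._entries)
```

For example, `block("203.0.113.7", duration=600)` blocks one address for ten minutes, while `block("198.51.100.0/24")` blocks a whole network until it is removed by hand.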

The ability to write custom programs or scripts to mitigate problems is, in our opinion, one of the most important capabilities to look for in any SIEM or Network Management solution. Being able to react to and correct problems quickly and automatically ensures much higher service availability to customers and lets overworked administrators sleep better through the night. Instead of administrators being on call to deal with every problem, most problems can be handled automatically, with alerts sent to the administrator describing what happened and the action taken.

What’s Next?

The descriptions provided here are meant to be an introduction to analysis and response systems and not an exhaustive review of the discipline. Each solution has its own way of doing things and its own capabilities. We hope that the introductory information presented here is useful as a basic foundation for building solutions that cater to the unique needs of each customer. This introduction, like the rest of the security industry, is subject to change as threats and solutions continue to evolve. For more information on how we can work together, or on any of the material presented here, please contact us.