The ABCs of IRP
Cybersecurity threats have become too numerous and too sophisticated for security teams to handle on the fly. Faced with so many possible scenarios, anticipation is of prime importance.
To increase the readiness of companies, public institutions developed frameworks, such as the NIST incident response plan (National Institute of Standards and Technology).
Whether you’re a small company or a large one, you may store critical data that is attractive to attackers, be part of a broader value chain, or simply rely on an information system to run.
It’s not a matter of “if” you’ll experience a cybersecurity incident, but more of a “when.”
Thus, every company has to anticipate incidents by developing and implementing an Incident Response Plan (IRP): a projection of your company into the potential scenarios targeting it. It gives IT and cybersecurity teams instructions on responding to a severe security incident, such as a data breach or leak, a ransomware attack, or a loss of sensitive information.
Not having a detailed security incident response plan in the first place can lead to additional damage in several ways. When you’re hit with a breach, for instance, security and management teams will have difficulty understanding and responding to the incident. Doing things on the fly is more complex than acting according to a plan. Plus, they’ll be prone to making expensive mistakes.
“Nobody panics when things go according to plan!”
This is where IRPs and orchestration and automation capabilities can be combined. The prominent ability of a SOAR (Security Orchestration, Automation, and Response) platform is to let you design playbooks that run from detection to remediation. By pulling security events from alert centers, enriching them, and taking remediation actions, you’re ultimately planning your IRP in the face of a given incident.
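To make the idea of a playbook more concrete, here is a minimal sketch in Python. It is not how any particular SOAR works: the `Alert` class, the `enrich` and `remediate` steps, and the hard-coded list of known-bad addresses are all assumptions made up for illustration. It only shows the shape of the chain: pull an event, enrich it, then decide on a remediation action.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Alert:
    source: str                      # where the event was pulled from
    indicator: str                   # e.g. a suspicious IP address
    context: dict = field(default_factory=dict)
    actions: list = field(default_factory=list)


Step = Callable[[Alert], Alert]


def enrich(alert: Alert) -> Alert:
    # Hypothetical enrichment: a real playbook would query threat intelligence here.
    known_bad = {"203.0.113.7"}
    alert.context["malicious"] = alert.indicator in known_bad
    return alert


def remediate(alert: Alert) -> Alert:
    # Only record a remediation action when enrichment flagged the indicator.
    if alert.context.get("malicious"):
        alert.actions.append(f"block {alert.indicator}")
    return alert


def run_playbook(alert: Alert, steps: list[Step]) -> Alert:
    # Run each step in order, passing the alert along the chain.
    for step in steps:
        alert = step(alert)
    return alert


result = run_playbook(Alert("siem", "203.0.113.7"), [enrich, remediate])
print(result.actions)  # ['block 203.0.113.7']
```

Keeping each step as a plain function means the same steps can be reordered or reused across playbooks.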
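This way of anticipating the incident is similar to the NIST incident response plan, which consists of four phases: preparation; detection and analysis; containment, eradication, and recovery; and post-incident activities.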
Below, we’ll see how a SOAR adds to your NIST incident response plan by reviewing each phase as defined by NIST.
Where the SOAR brings value to the NIST incident response plan
1. Preparation
First, your plan needs to detail who’s on the incident response team, their contact information, their role, and when you need to contact or inform team members.
Each person listed as an actor in the IRP needs to understand their place on the team and what they have to do should an incident occur. NIST adds that they must also recall the details within the IRP to respond quickly. Nobody’s going to learn an IRP by heart. This is where automating your plan of action will bring some value. You know the processes. Your role should be limited to oversight.
It also adds to the reactivity of the overall team should a security incident occur. Even if your team is well-prepared, your IRP is in shape, and everyone reads it before bedtime, you’re acting at human speed. Attackers have the first-mover advantage, and they harness automation. You must do the same to operate as much as possible at machine speed.
Anticipating an attack also helps ensure you leave no holes behind or, worse, create new ones that let attackers keep exploiting and infiltrating your systems.
In short, developing this plan directly into your SOAR is a new way to approach IRP and cybersecurity. It gives you unprecedented chances to act at the maximum speed possible.
This is where your different teams work hand in hand on the same platform: SOC (Security Operations Center) teams and CSIRTs (Computer Security Incident Response Teams) gather and collaborate around a common platform, in a fusion center.
Besides that, you’re also looking to limit the number of incidents that can occur by implementing controls based on your risk assessment results. As you know, there’s no such thing as 100% protection: residual risk inevitably persists even with all the controls in place.
This is where we begin our journey on the IRP.
2. Detection and analysis
This phase groups six different concepts: Attack vectors, Signs of incident, Analysis, Documentation, Prioritization, and Notification.
You can determine common attack vectors by reading reports of incidents and breaches: external devices, attrition, web-based application attacks, email, deception, and loss or theft. They serve as starting points for defining the steps you’ll take should such an event occur.
Signs of an incident: precursors and indicators
Once the main attack vectors are determined, you can move on to the signs that let you detect incidents arriving via these vectors. They can be detected by various means: network-based and host-based IDPSs (intrusion detection and prevention systems), antivirus software, and log analyzers, but also manual means.
These signs are called precursors or indicators. A precursor is a sign that an incident may occur in the near future (use of a vulnerability scanner against your systems, the release of a new exploit, threats of attack by particular groups). Conversely, an indicator is a sign that an incident may have already occurred or may be occurring now (alerts from a network intrusion detection sensor or antivirus software, unexpected file or account creation, login attempts, unusual inbound or outbound traffic, unusual activity on a device, etc.).
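The distinction between the two can be written down as a small lookup table. This is only a sketch: the event names are hypothetical labels invented for the example, not identifiers from NIST or from any product.

```python
from enum import Enum


class SignType(Enum):
    PRECURSOR = "precursor"   # an incident may occur in the near future
    INDICATOR = "indicator"   # an incident may have occurred or be occurring


# Hypothetical event names mapped to the two categories described above.
SIGN_TYPES = {
    "vulnerability_scan_detected": SignType.PRECURSOR,
    "new_exploit_published": SignType.PRECURSOR,
    "threat_announced_by_group": SignType.PRECURSOR,
    "antivirus_alert": SignType.INDICATOR,
    "unexpected_account_creation": SignType.INDICATOR,
    "unusual_outbound_traffic": SignType.INDICATOR,
}


def classify(event_name):
    # Returns None for events the table does not know about.
    return SIGN_TYPES.get(event_name)


print(classify("new_exploit_published"))  # SignType.PRECURSOR
```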
The ideal is to detect every upcoming attack before it even happens. You and I know that this isn’t possible. You need to prepare for the incident you will face one day or another. Thus, planning your response step by step is the best thing to do.
Sources of precursors and indicators
You can identify precursors and indicators using many different sources. The most common sources are computer security software generating alerts, logs, information publicly available, and, more directly, people.
Alerts and logs come from IDPSs, SIEM, EDR (antivirus, activity, file creation), network and host sensors, file integrity software, and vulnerability scanners.
People are your usual insiders or external attackers.
The analysis is a crucial part of the NIST incident response plan. You’re going to analyze and validate a given incident. The result determines your incident response, so this step conditions all the following ones. It establishes the incident’s scope (which networks, systems, or applications are affected), its origin (who or what originated the incident), and how it is occurring.
One of the main issues concerning incident detection and analysis is the accuracy of precursors and indicators. Companies can generate thousands or even millions of those indicators daily, most of which occur for reasons other than security incidents. Even an accurate indicator may not point to a security incident: it may simply result from human error and be benign.
Each indicator should ideally be analyzed and assessed to determine its legitimacy. Considering what’s written above, finding an actual security incident in this haystack can be daunting.
Companies thus have to build dedicated and experienced teams to analyze these indicators, determine what is happening, and decide whether an alert reveals a security incident. It means recruiting skilled workers to triage and confirm incidents: highly repetitive and tedious tasks that, moreover, leave no room for oversights or mistakes.
Companies are looking for other ways to tackle this challenge, like automation. Since the processes for determining whether an incident is legitimate are known, they can be automated. The only challenge is the level of skill required to design workflows. Again, this is where a platform like Mindflow, a no-code SOAR, can bring benefits to a CSIRT.
CSIRT members design workflows that quickly analyze and validate each incident by following the already-known, predefined process and documenting each step, so that incident analysis can be handled at scale.
Below are examples of recommendations from the NIST incident response plan framework, published on the NIST website, for making this analysis easier:
- Profile Networks and Systems: Measure the characteristics of expected activity so that changes to it can be more easily identified.
- Understand Normal Behaviors: Conducting periodic log reviews keeps the knowledge fresh, so the analyst is able to notice trends and changes over time.
- Create a Log Retention Policy: A policy specifying how long log data should be maintained may be extremely helpful in analysis because older log entries may show reconnaissance activity or previous instances of similar attacks. Another reason for retaining logs is that incidents may not be discovered until days, weeks, or even months later.
- Perform Event Correlation: Evidence of an incident may be captured in several logs containing different data types. Correlating events among multiple indicator sources can be invaluable in validating whether a particular incident occurred.
- Use Internet Search Engines for Research: Search engines can help analysts find information on unusual activity.
- Run Packet Sniffers to Collect Additional Data: If an incident occurs over a network, the fastest way to collect the necessary data may be to have a packet sniffer capture network traffic.
- Filter the Data: Filter out categories of indicators that tend to be insignificant, or show only the types of indicators with the highest significance.
To ensure transparency and facilitate the understanding of what happened, every step taken by the incident response team must be documented and timestamped. The NIST incident response plan lists the information to record, including:
- Status of the incident, that is to say: new, in progress, investigation, resolved, etc.
- Indicators gathered
- Incidents related to it
- Analysts taking part in the incident’s handling
It means that you have to enforce the auditability of every action taken by CSIRT members. As they hop between different tools to analyze the incident, auditability can be hard to maintain.
On a SOAR, since you pull in all the tools needed to handle the incident, actions can be tracked from one platform. It eases the broad auditability of your processes: the execution of the workflow, the opening and closing of the incident ticket, and a summary of all the actions taken and tools used to handle it.
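As an illustration, an incident record covering the fields listed above, with a timestamped trail of actions, could look like the following sketch. The class and method names are assumptions chosen for the example and do not come from NIST or from a specific SOAR.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Statuses taken from the list above.
STATUSES = ("new", "in progress", "investigation", "resolved")


@dataclass
class IncidentRecord:
    incident_id: str
    status: str = "new"
    indicators: list = field(default_factory=list)
    related_incidents: list = field(default_factory=list)
    analysts: set = field(default_factory=set)
    log: list = field(default_factory=list)  # (timestamp, analyst, action)

    def record(self, analyst, action):
        # Every action is timestamped in UTC and attributed to an analyst.
        self.analysts.add(analyst)
        self.log.append((datetime.now(timezone.utc), analyst, action))

    def set_status(self, analyst, status):
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        self.record(analyst, f"status {self.status} -> {status}")
        self.status = status


incident = IncidentRecord("INC-001")
incident.record("alice", "isolated host srv-01")
incident.set_status("alice", "in progress")
for timestamp, analyst, action in incident.log:
    print(timestamp.isoformat(), analyst, action)
```

Because status changes go through `record`, they end up in the same trail as every other action.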
Prioritizing the handling of incidents is also a critical point in your incident response process. According to the NIST incident response plan, to minimize potential harmful consequences, incidents should be prioritized based on the following factors:
- Functional impact: security incidents that target IT systems usually affect their users and thus the business
- Information impact: security incidents may affect your company’s information in the three ways described by the CIA triad: confidentiality, integrity, and availability
- Recoverability: depending on its size and the type of resources it affects, an incident can take more or less time to recover from
Now that you have these factors, you can combine the functional and information impacts to determine the business impact of the incident. To illustrate this, take a distributed denial-of-service attack against your public web server: it reduces the functionality available to users attempting to access your server. Conversely, unauthorized root-level access to your public web server can lead to a data breach, which can have a long-lasting impact on your reputation.
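One possible way to turn these three factors into a priority is sketched below. The four-level scale, the choice of taking the higher of the two impacts as the business impact, and the thresholds for P1 to P3 are all assumptions made for the example. NIST does not prescribe this formula, so adapt it to your own policy.

```python
# Hypothetical four-level scale shared by all three factors.
LEVELS = {"none": 0, "low": 1, "medium": 2, "high": 3}


def business_impact(functional, information):
    # Assumption: the business impact is driven by the worse of the two impacts.
    return max(LEVELS[functional], LEVELS[information])


def priority(functional, information, recovery_effort):
    # Assumption: a harder recovery raises the priority of the incident.
    score = business_impact(functional, information) + LEVELS[recovery_effort]
    if score >= 5:
        return "P1"
    if score >= 3:
        return "P2"
    return "P3"


# DDoS on the public web server: high functional impact, no information impact.
print(priority("high", "none", "low"))   # P2
# Root-level access to the same server: high information impact, hard recovery.
print(priority("low", "high", "high"))   # P1
```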
How does this prioritization work with the SOAR? Here, the analysis step conditions the prioritization, thanks to the ability to gather information from different sources:
- internal, such as information gathered from the infected devices
- external, with the help of threat intelligence databases
You can predetermine the workflow’s future path because the preceding steps help assess the incident’s functional and information impacts.
So, based on the results of the triage and enrichment processes, you can automatically assign the incident to a specific analyst according to your assignment policy.
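Such a policy can be expressed as an ordered list of rules where the first match wins. The sketch below assumes incidents are dictionaries with hypothetical `priority` and `type` keys, and the queue names are placeholders.

```python
# Ordered rules: the first condition that matches decides the assignee.
ASSIGNMENT_POLICY = [
    (lambda inc: inc["priority"] == "P1", "senior-on-call"),
    (lambda inc: inc["type"] == "phishing", "email-team"),
    (lambda inc: inc["type"] == "malware", "endpoint-team"),
]
DEFAULT_ASSIGNEE = "triage-queue"


def assign(incident):
    for condition, assignee in ASSIGNMENT_POLICY:
        if condition(incident):
            return assignee
    return DEFAULT_ASSIGNEE


print(assign({"priority": "P1", "type": "phishing"}))  # senior-on-call
print(assign({"priority": "P3", "type": "phishing"}))  # email-team
```

Putting the rule on priority first means a critical phishing case goes to the senior analyst, not to the email team.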
Once you have analyzed and prioritized an incident, you may have to notify specific individuals in your enterprise or public authorities. Sometimes notification is even mandatory, and failing to do so exposes you to fines.
The NIST incident response plan lists some of the people and authorities to contact in such cases:
- Head of information security
- Local information security officer
- Other incident response teams within the organization
- External incident response teams (if appropriate)
- System owner
- Human resources (for cases involving employees, such as harassment through email)
- Public affairs (for incidents that may generate publicity)
- Legal department (for incidents with potential legal ramifications)
- US-CERT (required for Federal agencies and systems operated on behalf of the Federal government)
- Law enforcement (if appropriate)
This is why most incident response plans list specific people to notify. In a next-gen SOAR, the Workflow panel is your IRP.
You should be able to specify all the necessary steps for a given case in this single pane of truth, gathering everyone around it and establishing enterprise-wide collaboration.
As for mandatory communications, you can predetermine them in your workflow through a step that notifies the analyst in charge or the legal team so they can prepare a communication related to the incident.
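A notification step of this kind boils down to mapping incident attributes to the parties listed above. In the sketch below, the attribute names are hypothetical, and notifying the head of information security and the system owner on every incident is an assumption, not a NIST requirement.

```python
def parties_to_notify(incident):
    # Assumption: these two parties are informed of every confirmed incident.
    parties = ["head of information security", "system owner"]
    if incident.get("involves_employee"):
        parties.append("human resources")
    if incident.get("may_generate_publicity"):
        parties.append("public affairs")
    if incident.get("legal_ramifications"):
        parties.append("legal department")
    if incident.get("federal_system"):
        parties.append("US-CERT")
    return parties


breach = {"may_generate_publicity": True, "legal_ramifications": True}
print(parties_to_notify(breach))
```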
3. Containment, eradication, and recovery
Containment must be put in place before a given incident overwhelms your resources or does more damage to your systems. Because eradication usually takes far longer than an attacker or an automated attack chain needs to go through each step towards delivering the payload, containment gives your CSIRT time to make the right decisions and start deploying remediation processes.
To ease this decision-making moment (shut down a system, disconnect it from a network, disable certain functions), IRPs need to include strategies and procedures aimed at this very containment. What’s more, by anticipating the incident, you can tailor containment procedures and thus limit the potential negative effects.
Ultimately, predetermined containment strategies tailored to your systems and the type of incident improve decision-making and speed it up.
As for the type of incident and the containment strategy involved, the NIST incident response plan provides a list of criteria to consider:
- Potential damage to and theft of resources
- Need for evidence preservation
- Service availability (network connectivity, services provided to external parties)
- Time and resources needed to implement the strategy
- Effectiveness of the strategy (e.g., partial containment, full containment)
- Duration of the solution (an emergency workaround to be removed in four hours, a temporary workaround to be removed in two weeks, a permanent solution)
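Some of these criteria can be encoded so that a workflow picks a predetermined strategy by itself. The sketch below is illustrative only: the catalogue entries, field names, and durations are assumptions, and a real catalogue depends on your systems and your risk assessment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainmentStrategy:
    action: str
    preserves_evidence: bool
    keeps_service_available: bool
    full_containment: bool
    max_duration_hours: Optional[int]  # None means a permanent solution


# Illustrative catalogue, listed in order of preference per incident type.
STRATEGIES = {
    "malware": [
        ContainmentStrategy("isolate host on a quarantine VLAN", True, False, True, 4),
        ContainmentStrategy("shut down host", False, False, True, 4),
    ],
    "ddos": [
        ContainmentStrategy("rate-limit offending traffic", True, True, False, 336),
    ],
}


def choose_strategy(incident_type, need_evidence, need_availability=False):
    # Return the first strategy that satisfies the stated constraints, if any.
    for strategy in STRATEGIES.get(incident_type, []):
        if need_evidence and not strategy.preserves_evidence:
            continue
        if need_availability and not strategy.keeps_service_available:
            continue
        return strategy
    return None


print(choose_strategy("malware", need_evidence=True).action)
```

When no strategy satisfies the constraints, the function returns `None`, which is the point where a human decision is needed.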
In every incident, it’s crucial to document the evidence that you and your team gather. Predetermining how this evidence is gathered, in accordance with applicable laws and regulations, is life-saving in some cases.
Collecting evidence from your computing resources can present some challenges, but it is worthwhile because it provides material for both internal and external use.
As said above, evidence collection, like documentation, can be predefined, performed throughout the response plan, and added bit by bit to the incident ticket.
Identifying the attacker is an important step. However, asking someone to identify an attacker while they’re putting out fires in their systems is like saying “try not to break anything!” to firefighters putting out a fire in your building! So, although this information can be of prime importance, incident response teams generally prefer to stay focused on handling the incident; their primary goal is to minimize the business impact.
However, a few steps can help gather intelligence to identify the attacker. The NIST incident response plan lists some of them:
- Validate the attacking host’s IP address.
- Research the attacking host through dedicated search engines, using the apparent source IP address.
- Query incident databases with the pieces of evidence (IOCs) collected during the attack.
- Monitor known attackers’ communication channels, usually on the dark web, where they could brag or offer data stolen from your company for sale.
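A first sanity check on an apparent source IP address can be done with Python’s standard `ipaddress` module before spending time on research. This is only a sketch of that preliminary step: it checks that the value is a well-formed address and whether it is publicly routable, and says nothing about who actually controls the host. The function name and return labels are assumptions made for the example.

```python
import ipaddress


def describe_source_ip(raw):
    # Reject values that are not well-formed IPv4 or IPv6 addresses.
    try:
        ip = ipaddress.ip_address(raw.strip())
    except ValueError:
        return "invalid"
    if ip.is_loopback:
        return "loopback"
    # is_private covers RFC 1918 ranges and other non-global special ranges.
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "public"
    return "reserved"


for candidate in ["10.0.0.5", "8.8.8.8", "127.0.0.1", "not-an-ip"]:
    print(candidate, "->", describe_source_ip(candidate))
```

Only addresses reported as public are worth looking up in external search engines or incident databases; a private address points to a host inside a network, possibly your own.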
Eradication and recovery
Entering the eradication phase means that you’ve analyzed the incident and determined strategies to contain it and the hosts it affected across your company. Moving to the eradication phase, you will identify the affected hosts and start remediation actions to allow the return to normal operations.
That is to say, your team will need to eliminate the attack’s origin on the affected devices: delete malware, disable compromised accounts, or close active vulnerabilities.
Usually, your team needs to go through every point of attention (your EDR, your WAF, or your Active Directory) to implement these measures. On a SOAR, eradication comes right after the analyst in charge decides to go ahead with the remediation phase. Once the incident has been cornered, you can design your workflow and the decision tree paving the way to the eradication phase: deleting the quarantined malware, fetching and deleting similar emails, finding and deleting similar IOCs on your network, etc. Like the first steps, this one is also automated from a single place.
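The pattern described here, an analyst decision followed by a chain of automated actions, can be sketched as follows. The three action functions are placeholders that only return a description of what a real integration with an EDR, a mail gateway, or a directory would do. Their names and the incident fields are assumptions.

```python
def delete_quarantined_malware(incident):
    # Placeholder for an EDR call.
    return f"deleted quarantined malware on {incident['host']}"


def purge_similar_emails(incident):
    # Placeholder for a mail gateway call.
    return f"purged emails matching '{incident['subject']}'"


def disable_account(incident):
    # Placeholder for a directory call.
    return f"disabled account {incident['account']}"


ACTIONS = [delete_quarantined_malware, purge_similar_emails, disable_account]


def run_eradication(incident, approved, actions):
    # Nothing runs until the analyst in charge has approved the remediation.
    if not approved:
        return []
    return [action(incident) for action in actions]


incident = {"host": "pc-42", "subject": "Invoice overdue", "account": "j.doe"}
for line in run_eradication(incident, approved=True, actions=ACTIONS):
    print(line)
```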
Once you have successfully performed the eradication phase, you can begin the recovery. This phase is first about restoring systems from clean backups, rebuilding systems from scratch, restoring compromised files, running final scans, changing passwords, updating firewall exception rules, and so on. Again, you can design every step from your SOAR: selecting the clean backups or file versions to restore, for instance.
Then, you can start making changes and updates to your security architecture, addressing the vulnerability that enabled the security incident, and raising awareness on the human side if the incident originated from a human error.
4. Post-incident activities
Once the incident has been stopped and the security architecture is back on track, it’s time to debrief: what happened, and how can you identify similar incidents in the future and stop them earlier? Ask yourself whether any of the actions in your IRP could have been done better or more efficiently.
Finally, the documentation and evidence gathered throughout the incident lifecycle come into play. Ideally, they should let you objectively assess each incident via a root cause analysis.
A SOAR brings a lot of value here since it gives you the time to do it. Running an IRP the classic way, i.e., manually, doesn’t: you’re only going from incident to incident. By automating processes and keeping a role of oversight and decision-making over the results, you free up time for assessing and, ultimately, improving your workflows.
Below are some ways, listed by the NIST incident response plan, to objectively assess each of the incidents you had to deal with:
- Identifying which precursors and indicators of the incident were recorded
- Determining how effectively the incident was logged and identified
- Comparing IOCs to known frameworks to determine where the attacker was on the kill-chain and assess potential loopholes in your defenses
- Determining if the incident caused damage before it was detected
- Performing a root cause analysis in the strict sense: determining whether the actual cause of the incident was identified, identifying the vector of attack, the vulnerabilities exploited, and the characteristics of the targeted or victimized systems, networks, and applications, and taking action
- Determining if the incident is a recurrence of a previous incident
- Determining if there’s a difference between your initial impact assessment and the final impact
- Identifying which measures, if any, could have prevented the incident.
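Several of these questions can be answered directly from the timestamps and assessments recorded during the incident. The sketch below assumes a hypothetical incident dictionary and a simple four-level impact scale. It covers only the measurable items in the list, not the root cause analysis itself.

```python
from datetime import datetime

# Hypothetical impact scale used for the initial and final assessments.
LEVELS = {"none": 0, "low": 1, "medium": 2, "high": 3}


def review_incident(incident, previous_root_causes):
    damage_at = incident["damage_started_at"]
    return {
        # How long the incident went unnoticed.
        "time_to_detect": incident["detected_at"] - incident["started_at"],
        # Did the incident cause damage before it was detected?
        "damage_before_detection": (
            damage_at is not None and damage_at < incident["detected_at"]
        ),
        # Was the final impact worse than the initial assessment?
        "impact_underestimated": (
            LEVELS[incident["final_impact"]] > LEVELS[incident["initial_impact"]]
        ),
        # Is this a recurrence of a previous incident?
        "recurrence": incident["root_cause"] in previous_root_causes,
    }


incident = {
    "started_at": datetime(2024, 3, 1, 8, 0),
    "damage_started_at": datetime(2024, 3, 1, 9, 0),
    "detected_at": datetime(2024, 3, 1, 11, 30),
    "initial_impact": "low",
    "final_impact": "high",
    "root_cause": "unpatched VPN appliance",
}
report = review_incident(incident, {"phishing", "unpatched VPN appliance"})
print(report)
```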