
Cybersecurity


Debating the future of SOC with Google and Microsoft


Sep 24, 2021

Paul-Arthur Jonville

Last week, Anton Chuvakin, a Security Strategist at Google, and Carson Zimmerman, a Security Engineer at Microsoft, debated the future of the SOC. Because great content and vision are universal, we summarized the discussion.


The debate was divided into five thematic areas, and speakers made many insightful comments on each. 

Participants:

  1. Carson Zimmerman (CZ): Microsoft Security Engineer. Author of Ten Strategies of a World-Class Cybersecurity Operations Center

  2. Anton Chuvakin (AC): Google Security Strategist; formerly at Gartner. Author of Security Warrior: Know Your Enemy

Host: Nimmy Reichenberg (NR): CMO at Siemplify

Main Takeaways

SIEM's evolution as a function

SIEM (Security Information and Event Management) won't die as a function. Telemetry is needed. What matters is integrating every technology analysts use into a coherent architecture. Although suppressing the SIEM isn't conceivable, the number of panes of glass needs to shrink.

Disruption in SOC

People must stop thinking about SOCs (Security Operations Center) as tiers or teams inside a specific room. Collaboration across the whole SOC is needed. The feedback loop, cohesion, and coherence between agents must be strengthened. Walls need to be deconstructed.

The new role for humans

New tools, such as ML/AI, are attractive. Unfortunately, they don't offer the same level of transparency and trust as older tools such as Snort. Analysts should be able to learn these techniques to reach the same level of confidence.

Regarding automation in general, the aim is to increase the amount of data each human can handle: more terabytes per analyst.
In some cases, automation is a must: worms, for example. People need a faster-than-human reaction.

There are two buckets: AI/ML and human. People have to determine which problem falls into which bucket; that is the right investment and the key success factor in making ML/AI work. SOAR, ML, SIEM, and big data platform vendors often miss that.

What is Tier 1 today?

Tiers are notions that are being challenged. Some organizations rotate agents between missions. Others have implemented an SRE logic in their SOCs: detection engineers develop solutions to solve detection challenges at scale and also manage the alerts their solutions create.

What about fully automating tier 1?

Tier 1 tasks are tedious but valuable for training junior agents. The aim is ultimately to make every analyst's life more manageable and maintain ways to attract, recruit, and train juniors. Companies need to grow their workforce and show career progression. Otherwise, they'll have tier 2 and 3 analysts and no tier 1 juniors.

EDR is taking over NDR

The pendulum is swinging in favor of EDR. Alas, both technologies could be pushed into niches in the long term because of cloud particularities. At some point, there will no longer be the network choke points ideal for NDR to monitor traffic, nor, soon, a machine to put an agent on. Will there be a cloud-native feature that enables analysts to imitate netflow telemetry?

Now that you have all the keys, let's dive into the debate!


How is SOC evolving?

Some companies are taking a SOC-less approach, but is it well thought out? There may be a misunderstanding of what a SOC is. The SOC as defined in the past may well be dead: the big dark room closed off from the rest of the company is no longer operative. But that is not to say that the SOC itself is dead.

SOCs as the center of expertise: A SOC will always be the center of knowledge. The constituency can differ from one company to another—back-office or remote, for example—but the need for a team remains.

Physical SOCs under pressure

The pandemic also affected SOCs. People attached to the SOC as a physical space endured stress during the pandemic, while those who considered the SOC a team rather than a room were more agile in adapting. Yes, it's about adaptation.

The main point is collaboration and cohesion across the team. It's where physical SOCs endured stress when forced to work remotely.

There are also concerns about the SOC's current structure. Can we still structure them according to a tiered scale?

What is the impact of cloud migration on security tools?

Are SIEMs challenged as a function or a technology?

The SIEM has existed for 25 years and is at the heart of the SOC. But today, do SOCs still need one? Can they substitute it? What should the SIEM be used for?

A SIEM is a set of technologies and capabilities to collect security-relevant telemetry; persist it to support detection and correlation on that telemetry; provide a rich analytic framework, including both off-the-shelf detections and the ability to modify them or create new ones; and supply everything necessary to support the alerts that come out: filtering, enrichment, down-selecting, and de-duplication. Finally, it supports the analyst's ability to take over the alert and escalate it.
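To make that function concrete, here is a toy sketch of the collect-persist-detect loop in Python. Everything in it (the event schema, the brute-force rule, the threshold) is a hypothetical illustration, not any vendor's actual implementation:

```python
from collections import defaultdict

# Toy model of the SIEM function described above: collect telemetry,
# persist it, run a correlation rule over it, and emit alerts.

event_store = []  # "persistence" layer: in a real SIEM, a database or data lake

def collect(event: dict) -> None:
    """Collect and persist one piece of security-relevant telemetry."""
    event_store.append(event)

def detect_bruteforce(threshold: int = 3) -> list:
    """Correlation rule: alert on sources with repeated failed logins."""
    failures = defaultdict(int)
    for ev in event_store:
        if ev.get("type") == "auth" and ev.get("result") == "fail":
            failures[ev["src"]] += 1
    return [
        {"rule": "bruteforce", "src": src, "count": n}
        for src, n in failures.items() if n >= threshold
    ]

for src, result in [("10.0.0.5", "fail"), ("10.0.0.5", "fail"),
                    ("10.0.0.5", "fail"), ("10.0.0.9", "ok")]:
    collect({"type": "auth", "src": src, "result": result})

alerts = detect_bruteforce()
print(alerts)  # one alert for the source with three failures
```

A real deployment replaces each piece (the store, the rule, the alert format) with a product-grade component; the shape of the pipeline is the point.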

Are SIEMs going to be replaced?

The question is not the SIEM but what technology they will use to accomplish that outcome, which is described as the SIEM's function. Twenty years ago, for many SOCs, the only option was to choose a SIEM product, implement it, and leverage it. Today, however, there are more options.

A SIEM is a set of technologies gathered to analyze data; this function cannot die.
But, of course, every particular technology can die. Today, XDR is presented as the most famous potential SIEM killer. Then again, the chances of SIEM eating XDR are just as high.


SIEMs as a way to build a coherent architecture

One of the original values of a SIEM was to reduce the panes of glass for the analyst, if well integrated with the other tools native to the SOC (although the single pane of glass, as promised by vendors, pretty much never existed!). Multiple combinations exist: SIEM+ML, SIEM+EDR, and so on. The point is that a coherent architecture is needed. Could analysts compose an architecture with tools that do not come from the SIEM vendor? It's possible; SOARs could be part of that.


Are on-premise SIEMs going to die?

One of the significant trends in the SIEM market today is cloud migration. Most new SIEM implementations are on the cloud. Is on-premise deployment dead or dying? Is there still a practical use for it?

What do people mean when they talk about cloud SIEM? It's a sliding scale. Long before "cloud" became the name for someone else's computer, SOCs were already using techniques to integrate data from other machines; today, we call that the cloud. The question is how these techniques are blended.

In hybrid schemes, you don't have control over all the tools, so this is not truly a continuous scale. Many people still want appliances and physical servers, and want to keep their data on servers of their own.

In the long term, on-premise may die, although there will be discontinuities. For example, regarding sovereignty, European customers sometimes don't want to use a cloud run by a company not based in Europe. Moreover, analysts sometimes need to build a data set that can't be sent to the cloud.

On-premise SIEM may die (or be relegated to a niche technology) when the cloud delivers data analytics on a whole other scale: analytics advantages, shared data analysis advantages, and threat intelligence advantages. On-premise SIEM will be too far behind; it would be like choosing between a spaceship, a bike, or a horse.


Between NDR and EDR: Who wins?

The current balance favors EDR. Priorities have changed. In 2013-14, people fought not to deploy agents. It was like a tsunami hitting their systems: crashes, blue screens, kernel panics.

Even though people hated the endpoint approach in its first years, it ultimately won. The rebalancing toward EDR and the endpoint approach is pushing the network into an auxiliary role. Though it did not kill NDR, the pendulum is still swinging.

Ultimately, between EDR and NDR, log analytics would win. Why do we still run network sensors? Do you have significant chokepoints where network sensors make sense? As we move to cloud services, all the analysts have are logs and no network to put sensors on. Hosts are not managed by the constituencies that the SOC serves.

Ultimately, EDR and NDR should both fear the migration to the cloud

It won't be a surprise if, in the medium term, EDR and NDR are both pushed into a niche. With SaaS growing, the traditional server and VM base will decline, and EDR soon won't have a machine to put an agent on.

In the end, both of these technologies should fear log observability, micro-services, and containers.

The question is, how many servers are companies running on-premise or IaaS? Companies must consider a strategy for turning off an EDR or circumventing an NDR. What is the composite sensing scenario, and what happens when a high-tier adversary bypasses it? What is the fallback?

One trend that has brought EDR to its current status is the cloud. There are no longer those network choke points, ideal for NDR to monitor traffic, that former networks had. How will companies monitor their networks in a cloud environment where you can't place inline sensors?

Which of these has a higher chance of survival on the cloud? For now, they are copying their on-premise mentality into the cloud. As we move into the cloud, in-app telemetry will surpass both technologies.

Cloud providers make it relatively easy to enable some network sensing capability at the network layer. The point is: is there a cloud-native feature that enables analysts to imitate netflow telemetry? And how easy is the integration with cloud-based appliances?
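AWS VPC Flow Logs are one example of such a cloud-native, netflow-like telemetry source. A minimal parser for a record in AWS's documented default (version 2) format could look like the sketch below; the field order follows the AWS documentation, but custom formats differ, so verify against your own configuration:

```python
# Minimal parser for an AWS VPC Flow Log record in the default format.
# Field order follows AWS's documented default (version 2) layout.

FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def parse_flow_record(line: str) -> dict:
    """Split one space-separated flow log line into named fields."""
    record = dict(zip(FIELDS, line.split()))
    # Numeric fields may be "-" when no data was recorded for the window.
    for key in ("srcport", "dstport", "protocol", "packets", "bytes"):
        if record.get(key, "-") != "-":
            record[key] = int(record[key])
    return record

sample = ("2 123456789010 eni-abc123 10.0.0.5 10.0.0.9 "
          "49152 443 6 10 840 1600000000 1600000060 ACCEPT OK")
flow = parse_flow_record(sample)
print(flow["srcaddr"], "->", flow["dstaddr"], flow["bytes"], "bytes")
```

Azure (NSG flow logs) and Google Cloud (VPC Flow Logs) offer comparable features, which is part of the answer to the question above.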

Cost-wise, on-premise SOCs could buy and refresh their own network sensors. In the cloud, are there easy mechanisms to assign the cost of a virtual network appliance to the SOC? What will that cost? If analysts have to sniff high-volume traffic in the cloud, it will drive high costs. There is a problem of economies of scale here.


SOC sustainability and automation

SOC structure needs to change...

If people think of SOC and Security operations as functions, how should we structure them? Traditionally, tier 1 would mean alert triage. Today, that's not as absolute. Alert triage and the initial investigation are not tier 1 anymore.

Key Questions:

  1. What is today's proportion of those "tier 1" roles within the whole SOC?

  2. Can we still call them tier 1, or must we change that terminology?

  3. How do we support young engineers in this evolving structure?

Evolving SOC Structures:

  • Some organizations are rotating people in these functions.

  • Others have implemented an SRE logic among SOCs:

    • Detection engineers develop solutions to solve detection challenges at scale.

    • They also manage the alerts their solutions create.

Concerns and Considerations:

  • How do we support young engineers without jeopardizing more advanced functions?

  • Historically, tier 1 was the entry point. But with no permanent tier 1, how do companies accomplish that support?

The Push for Automation:

  • Analysts could create playbooks for initial triage done by machines.

  • Conceptually, this works and would allow every SOC member to improve their work through:

    • Better alert de-duplication

    • Enrichment

    • Funneling

    • Filtering

    • Machine Learning

  • This approach would decrease the number of people doing alert triage.

The concern about automating tier 1 functions highlights the need for a balanced approach that improves efficiency while providing growth opportunities for new team members.
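The de-duplication, enrichment, filtering, and funneling steps above could be sketched as a single playbook function. All names here (the alert fields, the benign-rule allowlist, the asset-criticality table) are hypothetical:

```python
# Sketch of an automated tier-1 triage playbook: de-duplicate, enrich,
# filter, and funnel alerts so only the remainder reaches a human analyst.

KNOWN_BENIGN = {"scheduled-scan"}          # rules filtered out entirely
ASSET_CRITICALITY = {"db-01": "high"}      # toy enrichment source

def triage(alerts: list) -> list:
    seen = set()
    queue = []
    for alert in alerts:
        key = (alert["rule"], alert["host"])
        if key in seen:                    # de-duplication
            continue
        seen.add(key)
        if alert["rule"] in KNOWN_BENIGN:  # filtering
            continue
        # enrichment: attach asset context the analyst would otherwise look up
        alert["criticality"] = ASSET_CRITICALITY.get(alert["host"], "low")
        queue.append(alert)
    # funneling: highest-criticality alerts reach the human first
    return sorted(queue, key=lambda a: a["criticality"] != "high")

raw = [
    {"rule": "bruteforce", "host": "db-01"},
    {"rule": "bruteforce", "host": "db-01"},     # duplicate, dropped
    {"rule": "scheduled-scan", "host": "ws-07"}, # benign, filtered
    {"rule": "malware", "host": "ws-12"},
]
triaged = triage(raw)
for alert in triaged:
    print(alert["rule"], alert["host"], alert["criticality"])
```

In a SOAR product these steps become playbook nodes rather than one function, but the division of labor (machine does the repetitive funneling, human does the judgment call) is the same.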


...to attract new talent and young engineers

The cybersecurity industry faces several vital challenges regarding talent acquisition and development. Companies struggle to answer critical questions: Where do they source senior talent? How can they ensure career progression for their existing staff? What's the pathway for advancement to tier 2 roles? Where do junior people enter the industry, and how can organizations cultivate talent if they cannot grow it internally?

To address these challenges, companies need to change their approach fundamentally. The traditional model of funneling talent through established pathways is no longer sufficient. Organizations must adopt more inclusive language and practices, ensuring everyone in the SOC feels valued and sees growth potential.

The COVID-19 pandemic has further complicated this landscape. While it opened up opportunities for companies to hire talent from anywhere, it also revealed a shifting attitude among workers. People are increasingly resistant to tedious, repetitive jobs. This challenges tier 1 analyst roles, often involving monotonous routines. Companies that don't address this risk higher turnover rates in these critical positions.

To combat these issues, organizations must focus on building careers that appeal to young professionals. This involves creating pathways to hire junior staff, providing them with meaningful experiences, sparking their interest in the field, and encouraging their growth and development. The goal should be to create an environment where young talent sees a future for themselves in cybersecurity.

Ultimately, these challenges circle back to the fundamental issue of team cohesion. Companies need to consider how they structure their SOCs carefully. The aim should be to build coherent teams with clear lines of sight and effective feedback loops. It's crucial to avoid creating silos or walls between different components of the SOC. Instead, organizations should strive for an integrated approach that allows for seamless collaboration and knowledge sharing across all levels of the security team.


Can automation entirely replace human roles in SOC?

In the future, will cybersecurity be about our robots fighting their robots? What is the role of humans today?

There's an analogy here. Regarding IDS, why have so many people used Snort for so long? If analysts could run it correctly, they could get the alert, the packet, and the signature. This gave the analyst a great deal of transparency and trust.

When we talk about ML/AI, we need to ask ourselves what we are doing to support the same level of transparency and trust in the telemetry that results from it. One solution is to learn the techniques as an analyst.

Are humans dead? The question is more about the proportion between humans and terabytes. More terabytes and fewer humans.

Today there is more data, there are more threats, and environments are more complicated. The things that need securing grow far faster than the human workforce. Although full automation is impossible in the short term, because humans have cognitive abilities that are difficult for a machine to imitate, the machine-to-human ratio needs to grow. That doesn't mean humans must be replaced, but that one human should be able to manage more terabytes.

But, there are some scenarios where there is a need to respond faster than human time. Some threats can only be dealt with by machines. Worms, for example. The reaction needs to be immediate. Humans can't go hunt after worms.

What will be the role of humans in tomorrow's SOC?

Moreover, there are problems humans are good at solving themselves and others best solved with the help of machines. Determining which problem falls into which bucket is the right investment and the real success factor in making ML/AI work. This is also why so many SIEM vendors lost contracts and bids when they were the incumbents, and it is what SOAR, ML, SIEM, and big data platform vendors need to bear in mind when they promise to let customers achieve the success shown in demos at sales time.
