Cybersecurity: prioritize remediation of the vulnerable surface

(To Andrea Piras and Marco Rottigni)
28/07/21

Prioritization is the art of answering the questions "where do I start?" and "with what do I continue?"

In this article we compare theory and practice to address, in the most effective way, a security problem that afflicts companies of every size and maturity level: remediation of the vulnerable surface.

This problem usually emerges as the maturity of vulnerability life-cycle management in the IT landscape evolves. When a company decides to structure a program to estimate, analyze and understand its vulnerable surface, the starting point is to use scanning systems (commercial or otherwise) to probe its internal and external, infrastructure and application assets, in order to understand their degree of resilience with respect to known vulnerabilities.

This process usually starts with scans of variable periodicity, which generate reports on the vulnerabilities identified. Far from being a point of arrival, this beginning normally highlights a huge number of issues, whose volume far exceeds the remediation capacity of the company.

Hence the need to manage the life cycle of this vulnerable surface, examining various elements in order to understand how best to use the available mitigation capabilities while minimizing residual risk, up to and including expanded contexts such as compromisability information or, more generally, cyber threat intelligence. This is what defines the evolution from Vulnerability Assessment to Vulnerability Management and, finally, to Threat & Vulnerability Management.

Let us then look, in theory, at some methods for determining the correct priority of action.

Before going into the theory, a premise is in order: calculating priority is part of a larger plan, that of risk assessment. In this regard, please refer to the IEC 31010:2019 standard [1] and the NIST documents SP 800-37 [2] and SP 800-30 [3], in particular the elements that make up the assessment: risk identification, risk analysis and risk evaluation.

A key point in having an effective remediation plan is proper prioritization. We therefore introduce the risk analysis approach and the method used. NIST divides the possible approaches into threat-oriented, impact-oriented and vulnerability-oriented.

For the sake of brevity, the threat-oriented model is taken as the risk model, in which prioritizing the risk means evaluating the threats and - after calculating a risk matrix - obtaining a numerical value: the higher the value, the higher the risk.

As for the risk assessment methods, they are mainly divided into qualitative, quantitative and semi-quantitative methods.

The literature teaches us that in the qualitative method, in order to prioritize risk, a score must be calculated for each threat, the so-called risk score, given by Probability x Impact according to the formula R = P x I.

Probability indicates the possibility that an unexpected event capable of causing damage (in this case the threat) will occur; impact indicates the damage potentially caused by the threat.

Using a 5x5 risk matrix, each threat is assigned a probability and an impact, each expressed on one of five qualitative levels.
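As an illustration, here is a minimal sketch (in Python) of how such qualitative ratings can be turned into a ranked list of threats via R = P x I; the level names and the sample threats are assumptions for the example, not data from any real assessment:

# Minimal sketch: qualitative 5x5 risk matrix, R = P x I.
# Level names and threats are illustrative assumptions.
LEVELS = {"very low": 1, "low": 2, "medium": 3, "high": 4, "very high": 5}

threats = [
    {"name": "Phishing campaign",  "probability": "high",     "impact": "medium"},
    {"name": "DDoS on e-commerce", "probability": "medium",   "impact": "very high"},
    {"name": "Lost backup tape",   "probability": "very low", "impact": "high"},
]

def risk_score(threat):
    # R = P x I, using the numeric value of each qualitative level.
    return LEVELS[threat["probability"]] * LEVELS[threat["impact"]]

# Rank the threats: the higher the score, the higher the priority.
for t in sorted(threats, key=risk_score, reverse=True):
    print(f'{t["name"]:<22} R = {risk_score(t):>2}')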

The advantage of this approach is that it ranks the risks, allowing focus on the highest-priority ones.
On the other hand, qualitative analysis is subjective: it is based on the perception stakeholders have of the threats analyzed, combined with their experience and skills.

The quantitative method takes a more structured approach.
When time and resources can be dedicated to it, statistical data become available (e.g. how many times an event has occurred in a year), along with an accurate impact analysis (e.g. if service "X" is down, the company loses 1,000 euros per hour).
It therefore becomes possible to "translate" risk into meaningful numbers to support strategic decisions.

The quantitative method uses the parameters EF (Exposure Factor), SLE (Single Loss Expectancy), ARO (Annualized Rate of Occurrence) and ALE (Annualized Loss Expectancy).
The expected loss for the single asset is equal to:

SLE = asset value x EF

SLE measures how much a threat affects a given asset, such as a server, framework, or even a service consisting of software and hardware.

ARO, on the other hand, measures the frequency of this threat over a period of one year. The expectation of economic loss for a given asset over the year is given by:

ALE = SLE x ARO

Taking a numerical example, suppose a company has an e-commerce service that invoices 1 million euros per year; the asset under consideration is therefore the sales service, consisting of the hardware, software and people that make it work.
Let's assume that a DDoS attack, which blocks sales and the activity of the operating staff, has an exposure factor of 5%.
Finally, suppose that this type of attack has been recorded 6 times in the last 3 years, i.e. ARO = 6/3 = 2.

SLE = € 1.000.000 x 0.05 = € 50.000

ALE = € 50.000 x 2 = € 100.000

Following the risk model described, the company loses on average € 100.000 per year.

To simplify the example, a remediation plan could be to install a next-generation firewall with IDS/IPS to block these DDoS attacks, at a cost of € 50.000 plus € 5.000/year of maintenance.
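A minimal sketch of the same calculation in Python, using the figures assumed above (asset value, EF, ARO and countermeasure costs are the example's assumptions):

# Quantitative method: SLE = asset value x EF, ALE = SLE x ARO.
asset_value = 1_000_000   # yearly revenue of the e-commerce service (EUR)
exposure_factor = 0.05    # EF: fraction of the asset value lost per DDoS event
occurrences = 6           # events observed in the observation window
years = 3                 # length of the observation window (years)

sle = asset_value * exposure_factor   # 50_000 EUR per event
aro = occurrences / years             # 2 events per year
ale = sle * aro                       # 100_000 EUR per year

# Illustrative cost/benefit check against the proposed countermeasure.
countermeasure_cost_year1 = 50_000 + 5_000   # purchase + first year of maintenance
print(f"SLE = {sle:,.0f} EUR, ALE = {ale:,.0f} EUR/year")
print(f"Countermeasure cost (year 1) = {countermeasure_cost_year1:,.0f} EUR")
print("Worth it in year 1?", countermeasure_cost_year1 < ale)

Under these assumptions the countermeasure costs less than the expected yearly loss it prevents, which is exactly the kind of argument the quantitative method makes possible.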

In the semi-quantitative method, assessments are first made qualitatively, so as to prioritize risk immediately and activate an action plan; the data are then re-elaborated and the qualitative terms converted into numbers to provide a more accurate economic estimate.

After this first theoretical part, when we look at the practice of managing a vulnerable surface in complex IT environments the level of difficulty increases: to prioritize remedial actions it is necessary to combine some of the methods illustrated, so as to achieve a perception of risk and threat that is more descriptive than analytical, mapped onto one's own digital surface.

As we have seen, risk is complex to calculate, especially cyber risk.
Very different from other types of risk (financial, entrepreneurial), it is multi-faceted and built on both hard evidence and intelligence-driven perceptions. It becomes important not to rely only on a numerical representation of severity or criticality, but to favor a descriptive perception of risk.

Basically, I have to be able to describe what worries me and count on a system that converts these perceptions into prioritization factors.
Examples of such descriptions (a minimal scoring sketch combining these factors follows the list):

  • Perimeter, i.e. the ability to match the system on which the vulnerability is discovered to the perimeter to which it belongs, to increase (or decrease) the objective criticality of the resource.
  • Age of vulnerabilities, considering both when they appeared in public and when they were detected in your digital environment.
  • Business impact: for example, consider all the vulnerabilities that, if exploited, can facilitate a DDoS attack, an infection that spreads via worms, or even a ransomware outbreak.
  • Probability that the threat turns into an attack: for example, if a vulnerability is already weaponized with an exploit, perhaps one already included in an exploit kit that a motivated attacker can easily find (or even rent), the probability that it will be exploited in an attack is certainly higher than for a little-known vulnerability or one for which the compromise technique has yet to be developed.
  • One's own attack surface. Sun Tzu said "know yourself and know your enemy and you will survive a hundred battles". Knowing your digital landscape helps to prioritize remediation based on the network context in which you have to operate: for example, focusing first on vulnerabilities present on running kernels or services, or on systems exposed to the internet.
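The sketch below shows, in Python, one way such descriptive factors could be converted into a prioritization score; the weights, field names and sample findings are illustrative assumptions, not taken from any specific product or methodology:

# Minimal sketch: convert descriptive risk factors into a prioritization score.
# Weights, factor names and findings are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Finding:
    cve_id: str
    cvss: float              # base severity (0-10)
    internet_exposed: bool   # perimeter / attack-surface context
    exploit_available: bool  # weaponized, e.g. part of an exploit kit
    running_service: bool    # present on a running kernel or service
    age_days: int            # time since detection in our environment

def priority(f: Finding) -> float:
    score = f.cvss
    if f.internet_exposed:
        score *= 1.5   # reachable from outside the perimeter
    if f.exploit_available:
        score *= 1.4   # the threat is already "armed"
    if f.running_service:
        score *= 1.2   # actually exploitable at runtime
    score += min(f.age_days / 30, 3)   # small penalty for lingering findings
    return round(score, 1)

findings = [
    Finding("CVE-2021-0001", 9.8, True,  True,  True,  75),
    Finding("CVE-2021-0002", 7.5, False, False, True,  10),
    Finding("CVE-2021-0003", 5.3, True,  False, False, 200),
]

for f in sorted(findings, key=priority, reverse=True):
    print(f.cve_id, priority(f))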

An example of how a technology platform supports this perceptual description of risk, based on the Qualys solution, is shown in the figure below.

Once the perception of the threat has been described, it is important to have further context regarding patches: both those available and those already installed on the systems, so as to carry out an obsolescence check on the latter.

It is also important to analyze the vulnerabilities deriving from misconfigurations, especially in cloud environments where responsibility is shared: if I instantiate storage on AWS or Azure and forget to restrict the list of IPs that can access it, I risk very serious data leakage; if I forget to enable multi-factor authentication on an instance, the consequences could be even worse.
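As an illustration only, here is a minimal sketch of this kind of misconfiguration check; the configuration fields, rules and resource names are hypothetical and not tied to any provider's API:

# Minimal sketch: flag risky settings in a (hypothetical) cloud resource inventory.
# Field names and rules are illustrative assumptions, not a provider API.
resources = [
    {"name": "backup-bucket", "type": "storage",  "allowed_ips": ["0.0.0.0/0"],   "mfa_enabled": None},
    {"name": "admin-console", "type": "instance", "allowed_ips": ["10.0.0.0/8"],  "mfa_enabled": False},
    {"name": "orders-db",     "type": "storage",  "allowed_ips": ["10.1.2.0/24"], "mfa_enabled": None},
]

def audit(resource):
    findings = []
    if "0.0.0.0/0" in resource["allowed_ips"]:
        findings.append("open to the whole internet (possible data leakage)")
    if resource["type"] == "instance" and resource["mfa_enabled"] is False:
        findings.append("multi-factor authentication disabled")
    return findings

for r in resources:
    for issue in audit(r):
        print(f'{r["name"]}: {issue}')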

Often the remediation activity (patching, configuration changes, implementation of compensating controls) is performed by teams other than those responsible for detecting and classifying vulnerabilities; an interdepartmental glue is therefore needed that facilitates integration and automation, so as to avoid conflicts and operational inefficiencies. In practice this means using application programming interfaces (APIs) to turn the information held by each platform or application into encrypted, secure information flows, which become enabling factors for interdepartmental operational processes.
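A minimal sketch of such API glue, assuming a hypothetical ticketing endpoint, token and payload schema (all placeholders, not any real product's API), that turns a prioritized finding into a remediation ticket for the operations team:

# Minimal sketch: hand a prioritized finding to the remediation team via a REST API.
# The endpoint, token and payload schema are hypothetical placeholders.
import requests

TICKETING_URL = "https://ticketing.example.com/api/v1/tickets"  # placeholder
API_TOKEN = "***"  # placeholder; in practice, read from a secret store

def open_remediation_ticket(finding: dict) -> str:
    payload = {
        "title": f'Remediate {finding["cve_id"]} on {finding["host"]}',
        "priority": finding["priority"],
        "team": "it-operations",
        "details": finding,
    }
    resp = requests.post(
        TICKETING_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["id"]  # assumes the service returns the ticket id

# Example: the output of the prioritization step feeds the ticketing flow.
ticket_id = open_remediation_ticket(
    {"cve_id": "CVE-2021-0001", "host": "web-01", "priority": "high"}
)
print("Opened ticket", ticket_id)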

The last theme is tracking the status quo, otherwise known as observability.
This translates into aggregating raw data into information that is easier to understand; the representation can be dynamic (such as dashboards) or static (such as PDF reports or other formats), and helps to keep track of progress, detected anomalies and process inefficiencies over time.

For example, aggregate the vulnerabilities found in the last 30 days, from 30 to 60 days, and from 60 to 90 days.
Then, for each bucket, map whether a remediating patch exists, highlighting the availability of exploits for those vulnerabilities.
Finally, make this information dynamic and constantly updated, so as to provide each stakeholder with a picture of the status quo and of the efficiency of the remediation process.
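A minimal sketch of that aggregation, assuming an illustrative list of findings with detection dates and patch/exploit flags:

# Minimal sketch: bucket findings by age and summarize patch/exploit availability.
# The findings list is an illustrative assumption.
from datetime import date

today = date(2021, 7, 28)
findings = [
    {"cve_id": "CVE-2021-0001", "detected": date(2021, 7, 10), "patch": True,  "exploit": True},
    {"cve_id": "CVE-2021-0002", "detected": date(2021, 6, 5),  "patch": True,  "exploit": False},
    {"cve_id": "CVE-2021-0003", "detected": date(2021, 5, 2),  "patch": False, "exploit": True},
]

buckets = {"0-30 days": [], "30-60 days": [], "60-90 days": [], "90+ days": []}
for f in findings:
    age = (today - f["detected"]).days
    if age <= 30:
        buckets["0-30 days"].append(f)
    elif age <= 60:
        buckets["30-60 days"].append(f)
    elif age <= 90:
        buckets["60-90 days"].append(f)
    else:
        buckets["90+ days"].append(f)

for label, items in buckets.items():
    patched = sum(f["patch"] for f in items)
    exploited = sum(f["exploit"] for f in items)
    print(f"{label}: {len(items)} findings, {patched} with patch, {exploited} with known exploit")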

Remediating a vulnerable surface, even a very large and articulated one, is certainly not easy; however, organizing its life cycle so that it is monitored and prioritized in a holistic and effective way makes it possible to confine residual risk to an acceptable level, avoiding a dangerous transformation into an attack surface.

References:

[1] IEC 31010:2019, Risk management - Risk assessment techniques - https://www.iso.org/standard/72140.html

[2] NIST SP 800-37 Rev. 2, Risk Management Framework for Information Systems and Organizations: A System Life Cycle Approach for Security and Privacy - https://csrc.nist.gov/publications/detail/sp/800-37/rev-2/final

[3] NIST SP 800-30 Rev. 1, Guide for Conducting Risk Assessments - https://csrc.nist.gov/publications/detail/sp/800-30/rev-1/final