Knowledge Base
2 RISK MANAGEMENT BASICS
2.1 WHAT IS RISK MANAGEMENT?
2.1.1 DEFINITION OF RISK
According to the international risk management standard AS/NZS ISO 31000:2009, Risk Management – Principles and Guidelines (“ISO 31000”), risk is defined as “effect of uncertainty on objectives”. This definition has two important implications:
§ Risk is a neutral term (neither positive nor negative). It simply describes the potential for deviation from an expected outcome. Risks can therefore be subdivided into Threats & Opportunities to indicate whether their influence on an objective is positive or negative.
§ As risk is the effect of uncertainty on objectives, risks cannot be expressed without first defining the objectives to be achieved. An objective can be financial, schedule-related, or health-and-safety-related, to name a few. Therefore, the first aspect of the Risk Management Process is “Establishing the Context” of the environment in which the risks are to apply.
A risk definition should identify both cause and effect, and at a minimum should be rated for probability and assessed against all applicable impacts. As described in the section on Risk Attributes, it is essential that risks are expressed to avoid ambiguity and misinterpretation.
2.1.2 DEFINITION OF PROJECT RISK MANAGEMENT
ISO 31000 defines Risk Management as “Coordinated activities to direct and control an organisation with regard to risk”. The PMI Practice Standard for Project Risk Management (PRM) and the UK Association for Project Management (APM) both define the project form of risk management in terms of conducting the processes described in Section 2.2. The PMI Practice Standard goes on to explain the purpose of PRM as being to increase the probability and impact of positive risks and to decrease those of negative risks.
2.1.3 PRINCIPLES OF RISK MANAGEMENT
ISO 31000 incorporates eleven Principles which it asserts are required to achieve effective Risk Management. All are important, but it is worth stating the first three to indicate the importance of Risk Management:
a) Risk Management creates and protects value
b) Risk Management is an integral part of all organisational processes
c) Risk Management is part of decision making
2.1.4 FRAMEWORK OF RISK MANAGEMENT
ISO 31000 defines a framework for Risk Management to ensure that Principle b) above will be achieved by enabling Organisations to include the Risk Management framework in the organisation’s overall management framework. In the case of a Project, the Risk Management framework can be incorporated in the Project Execution Plan and supporting Procedures.
The elements of the Risk Management Framework are elaborated in ISO 31000 under the following five headings:
1. Mandate and Commitment
2. Design of framework for managing risk
3. Implementing risk management
4. Monitoring and review of the framework
5. Continual improvement of the framework
2.2 RISK MANAGEMENT PROCESS
2.2.1 PROCESS DIAGRAM
The risk management process as defined in ISO 31000 is presented in the diagram below, and each of the process steps is described in the sections that follow:
2.2.2 ESTABLISHING THE CONTEXT
Establishing the context of a project is an important first step to any risk analysis. Without establishing the context in which the risks are to be framed, it is impossible to determine the significance of any given uncertain event. Establishing the Context consists of 5 main components:
Identifying Key Project Information
It is important to gather and specify key project information such as project name and description, key dates and budgets, as this provides crucial insight into the context of the risks that may follow. It is impossible to complete the sections that follow without at least a basic understanding of this information.
If available, the ‘Project Execution Plan’ (PEP, sometimes called the Project Management Plan) is usually a good source of this data.
It is also important to note the key stakeholders involved in the project, as this can also influence other aspects of the context settings, especially the Project Significance.
Rating the Project’s Significance
The significance of the project to the stakeholder conducting the risk management process is dependent on both the magnitude of the investment (in terms of time and money) and the expected returns of the project, relative to the monetary capital of the stakeholder or the significance of the project to the strategic goals of the stakeholder. What one company considers to be a small investment could be ‘make or break’ for another. Project significance can also extend beyond cost and schedule to reputation, environmental, or other types of significance.
Project significance guides the level of effort to be invested in risk management by the stakeholder, as risk management, like quality management, can devour as much of the stakeholder’s resources and capital as the stakeholder is prepared to invest. The higher the significance of the project, the more the stakeholder can justify investing in risk management.
Establishing Goals
The key reason for defining the project goals is that risks only apply to a project if they threaten or enhance the project goals. If the goals have not been defined, there is doubt as to whether a given risk is relevant.
Is the aim to establish the project for:
• Maximum operating efficiency?
• Minimum cost?
• Minimum time?
• Minimum environmental impact?
A goal can be anything provided it applies to the project.
Identifying the Approach to Risk Management
It is also important to identify and agree on the approach to be taken to risk management. This includes the frequency with which the Risk Identification, Risk Analysis, Risk Evaluation & Risk Treatment cycle is to be performed or reviewed. Whether a qualitative and/or quantitative analysis approach is to be used at each major phase boundary of the project is also important, as this establishes a plan and affects the budget for execution of risk management services.
Evaluating Risk Management Performance to Date
For mature project organisations, a further consideration is to evaluate the risk management performance to date. This helps to identify the strengths and weaknesses of what has been done so far, and whether it has helped to identify and treat risks that might otherwise have affected the project’s objectives. If risk management performance to date has been poor, alternative risk management approaches and strategies should be examined to improve it.
2.2.3 RISK IDENTIFICATION
Risk identification is the first of three steps in the ISO 31000 Risk Management Process that comes under the heading of Risk Assessment.
What is risk identification?
Risk identification refers to the “process of finding, recognizing, and describing risks” (ISO 31000). For safety risks, this may refer to formal processes like hazard reports or scheduled inspections. However, for schedule and cost risks and uncertainties, processes such as workshops, interviews, or historical data sets are more commonly used. The relative benefits of these different approaches are discussed in the section that follows.
Risk Identification Methods
There are many ways to gather risk data, and some are more suited to some situations than others. Ultimately, there is no ‘right answer’ as to which is best, but risk managers / analysts should be aware of the alternatives available and choose the best combination for the project and the risk management context identified.
§ Historical Data – Where available, historical data is almost always the best resource to use as the input to an analysis, as it bypasses the potential influence of individual risk attitudes. If performing a quantitative analysis in a sophisticated analytical tool, actual historical data can be incorporated into models (along with trend information for future projections) using custom probability density distributions. This provides the most unbiased basis for identification of uncertainty trends, but is complicated and time-consuming to prepare. An example of where use of historical data may be appropriate is in the modelling of a project where the price of fuel will be a determinant of project economic success (see the sketch following this list).
§ Interviews – Conducting interviews to gather risk information involves identifying key personnel within a project team and spending time with them individually to assess their attitudes towards different sources of uncertainty in the project. After all participants have been interviewed, the results for each source of uncertainty are collated and averaged to arrive at a final position for inclusion in the risk database or model. Interviews conducted so that the results are anonymous are especially effective in reducing the effect of senior management pressure to ‘toe the company line’, which can sometimes cause people to give overly optimistic responses to risk questions. However, interview processes usually take an extended time, and interviewees may bring different frames of reference and biases to their opinions on sources of project uncertainty, necessitating further time to reconcile conflicting views.
§ Workshops – Workshops are useful in that they provide a quick and straightforward means of arriving at consensus views on sources of uncertainty within a project. They have the added benefit of ensuring a common frame of reference for all persons involved when expressing attitudes towards the uncertainties discussed. However, workshops require careful and experienced facilitation to ensure that some voices and opinions do not become dominant and others are forced to “fall into line” or are not heard. Further, organizing all the necessary (usually quite senior) stakeholders to be available at the same time can prove difficult.
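As a minimal illustration of the historical-data approach above, the following Python sketch builds an empirical distribution from past observations and resamples it inside a simulation in place of a subjective range. The fuel-price figures and the flat 3%/yr escalation standing in for a trend adjustment are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented historical observations, e.g. average annual fuel prices in $/L.
fuel_price_history = np.array([0.92, 1.05, 1.18, 0.88, 1.30, 1.22, 0.95, 1.10])

# Bootstrap resampling reproduces the observed distribution without assuming a
# shape; the escalation factor (an assumption) projects the history forward.
samples = rng.choice(fuel_price_history, size=10_000, replace=True) * 1.03

print(f"P10 {np.percentile(samples, 10):.2f} $/L, P90 {np.percentile(samples, 90):.2f} $/L")
```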
For information on what data should be gathered as part of best practice processes for risk identification, please refer to the section on Risk Properties.
Interview / Workshop Techniques
The following are some general techniques for the identification of risks. Each has their own benefits and limitations:
§ Brainstorming – A simple technique whereby stakeholders are asked to identify risks to the project based on their own perceptions or experience. This type of exercise is unstructured and particularly useful for identifying risks that might fall outside of traditional risk breakdown structures. If combined with use of “Post-its” on a board, groups can build up clusters of similar risks to indicate perceived importance and risks on which to focus (see also Risk Breakdown Structures and Checklists below).
§ SWOT Analysis – SWOT, as it applies to risk analysis is a structured risk identification process whereby stakeholders are asked to identify the strengths & weaknesses (internal factors) and opportunities & threats (external factors) of a particular project or process.
SWOT analysis tends to follow the same general process as brainstorming, but overlays the structure of the four-quadrant approach to assist in identification. Unlike other techniques, SWOT analysis is particularly useful for identifying positive risks (opportunities).
See SWOT Analysis diagram below (source: https://en.wikipedia.org/wiki/File:SWOT_en.svg)
§ Assumptions Analysis – The principle behind assumptions analysis is that every assumption actually represents a potential risk to the project. By challenging every assumption we’ve made and asking “what if it’s wrong?”, we can identify new sources of risk.
§ Risk Breakdown Structures and Checklists – An RBS is another example of a structured approach to brainstorming that can help in the risk identification process. By first identifying classes of uncertainty on a project (eg. procurement uncertainty, weather uncertainty, etc.) we are prompted to consider the different potential sources of risk. Organisations typically develop an RBS for recurring types of projects, against which risks are identified to ensure that the risk identification process always covers all the known potential sources of risk.
§ Specific Opportunity Identification Sessions – Experience has shown, notwithstanding the benefit of SWOT analysis, that opportunities are difficult for groups to identify when also identifying threats. The tendency is for threat identification to dominate. It can be especially beneficial, rather like value engineering, to run special Opportunity identification sessions, whether guided by an RBS or alternative systematic approach, to identify Opportunities and how to enhance them through treatments.
2.3 RISK ANALYSIS
As noted in ISO 31000, Risk Analysis involves development of an understanding of the risks. Through risk analysis causes and effects of risks are identified, along with the likelihood of their occurrence. It also provides input into determining whether treatments are required.
Risk analysis is the process of characterising the risks that have been identified using the processes outlined in Risk Identification. It is typically broken into two distinct stages or aspects: Qualitative Risk Analysis and Quantitative Risk Analysis. These stages are sequential because quantitative risk analysis cannot be undertaken without qualitative risk analysis preceding it.
These two aspects of risk analysis are considered in turn.
2.3.1 QUALITATIVE RISK ANALYSIS
What is Qualitative Risk Analysis?
Most of what ISO 31000 expresses about risk analysis is Qualitative Risk Analysis (QlRA).
QlRA involves considering the causes and consequences of risks and their likelihood of occurrence. The scale of each of the applicable types of consequences is considered, as is the level of likelihood. QlRA considers risks individually rather than the overall effect of the identified risks on the project.
ISO 31000 is careful to use qualitative terms for levels of risk likelihood and consequence and emphasises that the extent of uncertainty applicable to the determination of likelihood and consequence levels should be documented. It distinguishes between qualitative likelihood (expressed in levels) and quantitative probability (expressed in fractions of 1 or percentages). Likewise, it distinguishes between qualitative consequences (expressed in levels) and quantitative impacts (expressed in quantifiable units such as time or cost).
The UK Association for Project Management (APM), in its publication Project Risk Analysis and Management (PRAM) Guide (2nd Edition 2004), defines qualitative (risk) assessment as “an assessment of risk relating to the qualities and subjective elements of the risk – those that cannot be quantified accurately. Qualitative techniques include the definition of risk, the recording of risk details and relationships and the categorisation and prioritisation of risks relative to each other” (the last aspect referring to the risk evaluation step of risk assessment).
PMI does not distinguish between qualitative and quantitative terminology in its Practice Standard for Risk Management. It equates likelihood with probability and consequence with impact and defines the performance of Qualitative Risk Analysis as “The process of prioritising risks for further analysis or action by assessing and combining their probability of occurrence and impact.”
Our recommendation is to accept that QlRA is the subjective assignment of levels of likelihood and consequence (of various categories as applicable) to risk events and, where possible, to express the lower and upper thresholds of those levels semi-quantitatively, as probability percentages and impact values (cost and time relative to the total project budget and duration), during the initial step of the risk management process: Establishing the Context.
In qualitative risk analysis, a Probability / Impact (PI) Matrix is usually used to represent the severity of a risk, using the assumption that risk severity or magnitude is the combination of likelihood and consequence. In semi-quantitative terms, Risk Exposure = Probability x Impact.
Risks are assessed for probability along the vertical axis, and impact is assessed along the horizontal axis. However, the impact units and thresholds are different for different category consequences. This is what enables risks of differing consequence categories to be combined in the one PI matrix and ranked in the one qualitative risk analysis register.
An example Probability and Exposure Level Guide to the PI Matrix that follows is shown below, for both Threats and Opportunities. The Exposure Level colour matches the level number in the scheme illustrated.
This approach is shown in the following PI Matrix for Financial Impacts, for a project with a value of $1.5 billion (100% impact). Each cell in the matrix is numbered according to the level of risk exposure. Organisations typically have different processes for handling risks according to the exposure level. Level 4 threats may be required to be referred to the Chief Executive or the Board Risk Committee, while level 3 threats may have to be dealt with by the Project Manager, and level 2 threats and below may be managed by the project Risk Manager. Other numbering systems may be used, such as where each matrix cell has a unique number but the numbers within an exposure level are greater than the numbers in the level below and lower than the lowest number in the level above.
Semi-quantitative risk analysis extends this concept to apply numerical thresholds to the matrix cell edges. In the example above, a minor impact financial risk has been defined as being one with a value greater than $3.75 million but less than $37.5 million. These numbers define the vertical edges of the impact levels moving from left to right along the horizontal impact axis.
A similar process defines the horizontal edges of the five levels of the vertical probability axis. For the Probability thresholds defined in the Probability and Exposure Level Guide above the PI matrix, the four boundary thresholds between Rare, Unlikely, Possible, Likely and Almost Certain are 2%, 10%, 50% and 80% respectively.
The semi-quantitative matrix allows for finer delineation between risk exposures, as risks can be placed either low or high within each square. A qualitative matrix would place them at the mid-point of each cell.
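To make the mechanics concrete, here is a minimal Python sketch of a semi-quantitative lookup using the probability thresholds quoted above (2%, 10%, 50%, 80%) and the financial impact edges from the $1.5 billion example. The two upper impact edges and the mapping of level pairs to exposure levels 1-4 are illustrative assumptions, since real matrices are defined cell by cell.

```python
import bisect

# Probability level edges: Rare | Unlikely | Possible | Likely | Almost Certain
PROB_THRESHOLDS = [0.02, 0.10, 0.50, 0.80]

# Financial impact level edges in $M; the first two come from the example above,
# the last two are assumed for illustration.
IMPACT_THRESHOLDS_M = [3.75, 37.5, 375.0, 750.0]

def level(value, thresholds):
    """Return a 1-based level for a value against ascending threshold edges."""
    return bisect.bisect_right(thresholds, value) + 1

def exposure_level(probability, impact_m):
    """Combine probability and impact levels into an exposure level 1-4.

    The diagonal banding below is a placeholder; organisations define the
    exposure level of each matrix cell individually.
    """
    score = level(probability, PROB_THRESHOLDS) + level(impact_m, IMPACT_THRESHOLDS_M)
    return 4 if score >= 9 else 3 if score >= 7 else 2 if score >= 5 else 1

# A 'Possible' (30%) risk with a $50M ('Moderate') impact -> exposure level 2.
print(exposure_level(0.30, 50.0))
```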
When to use Qualitative Risk Analysis
Qualitative Risk Analysis is the entry step for risk analysis. It must be performed before quantitative risk analysis can be used. In addition, it is the only means by which risks of all impact categories can be integrated into one register. So risks describing Environmental, Health and Safety, Operational, Business and Reputational impacts can all be included in a single Project Risk Register even though they do not have a commonly quantifiable metric for impact.
Benefits and Limitations of Qualitative Risk Analysis
As noted above, Qualitative Risk Analysis enables the comparative rating of environmental, reputational, health and safety, and other qualitative impacts that cannot readily be reduced to a single unifying metric such as a financial or durational impact. Taking safety risk as an example, a risk could be rated for impact on a scale ranging from “First Aid Injury” through to “Multiple Fatalities”.
Furthermore, where risks cover difficult or intractable problems for which no obvious treatments are apparent, qualitative risk analysis offers the best means of continued management and development of resolution.
However, qualitative approaches to risk analysis are unable to provide an overall measure of how risky a project is. For this Quantitative Risk Analysis is required. In addition, qualitative risk analyses start to show their limitations when a greater level of definition is required to inform decision making. Qualitative systems become cumbersome to work with when increasing the number of likelihood and consequence levels, and may still fall short of truly identifying the relative exposures of different risks in the register.
Additionally, qualitative systems are hampered by linguistic barriers associated with the individual’s interpretation of the qualitative terms. This is because the meaning inferred through usage of terms changes both between individuals and cultures. Some methodologies take a semi-quantitative approach to defining qualitative risk metrics to deal with this difficulty. This is achieved by defining quantitative thresholds associated with each qualitative label, as described above and distributing these at the start of the risk identification / rating process.
2.3.2 QUANTITATIVE RISK ANALYSIS
What is Quantitative Risk Analysis?
Quantitative Risk Analysis (QnRA) of a project models project uncertainty and risk events to produce numerical outputs expressing the riskiness of the project overall at differing probability levels. Quantitative risk analysis can reveal more about the potential impact of risks on a project than traditional qualitative risk registers, which typically only rank risks via a ‘heat map’. One of the most used QnRA techniques is Monte Carlo Simulation (MCS) modelling, which can produce rankings of contributors to uncertainty (see Section 3, SRA Using a CPM-based MCS Tool).
When to use Quantitative Risk Analysis
Quantitative risk analysis should be used in any situation where a project is either too complex to assess by traditional qualitative means, or where the project is of significant importance to the organization undertaking it. However, QnRA can be used on any project involving quantifiable measures, and often provides a great deal more value to a project through examination and reporting of risk and uncertainty drivers and behaviours.
Benefits and Limitations of Quantitative Risk Analysis
One of the primary distinctions between qualitative and quantitative assessments is that quantitative assessments can allow for evaluation of range uncertainties as well as distinct risk events. This means that a quantitative model can deal with estimating inaccuracies as well as events that may or may not occur. This results in a more comprehensive model of uncertainty on a project.
Unlike qualitative techniques, quantitative analysis also allows you to model the consequence of risks as they apply to a model of a project or scenario. This is important in schedule risk analysis as a schedule risk may have a high impact rating, but it may have no impact on the overall model completion if it applies to an area of the schedule with high float. Similarly, risks in quantitative models can be assessed concurrently. This means that if two risks fire together, it is possible to model that one risk supersedes the other and the impact of the second is nullified.
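As a minimal sketch of the distinction just described, the following Python model combines an estimating-uncertainty range with a discrete risk event that may or may not occur in a single cost simulation; all figures are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50_000

# Range uncertainty: the base estimate is inaccurate within a three-point range ($M).
base_cost = rng.triangular(9.0, 10.0, 12.5, size=N)

# Risk event: a 15%-probability event adds an uncertain extra cost when it occurs.
event_occurs = rng.random(N) < 0.15
event_cost = np.where(event_occurs, rng.triangular(0.5, 1.0, 3.0, size=N), 0.0)

total = base_cost + event_cost
print(f"Mean ${total.mean():.2f}M, P80 ${np.percentile(total, 80):.2f}M")
```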
Quantitative risk assessment is sometimes criticised for its complexity. Quantitative risk assessments can be rendered of little use or, worse, downright misleading when facilitated by persons with little or no experience in performing them. Quantitative risk models are complex to get right and should be handled by specialists who work with them day in and day out and have thus developed the necessary depth of knowledge to produce realistic and reliable results.
2.3.3 COLLECTING RISK ANALYSIS INPUTS
Stakeholder Involvement – Getting a Balance of Opinions
Wherever it is necessary to base risk analysis inputs on stakeholder opinions, it is crucial that a broad cross-section of project representatives be involved in order to obtain a balanced perspective of project uncertainty. Ideally, stakeholder involvement in determining project uncertainties should involve:
§ Balanced representation of all parties where potential for conflict of interest exists - In some circumstances, multiple parties may be involved with conflicting vested interests in seeing a more optimistic or pessimistic result from a risk analysis. In such circumstances, it is important that representation is given to all parties involved so as to minimise bias.
§ Representatives of all facets of project delivery – In most complex projects, responsibility for different facets of project delivery is delegated to a range of personnel according to their individual specialisations. In such circumstances, it is important that one or more representatives from each specialisation be included in the risk ranging and identification processes in order to gather the most accurate and informed cross-section of information possible.
§ Both junior and senior project personnel – Junior personnel add value as they’re usually more actively involved in performing the work and may have a more ‘hands on’ feel for current areas of potential uncertainty. Senior personnel bring a wealth of experience and are very useful for identifying areas of concern from past projects. Additionally, senior personnel are usually privy to additional information that may not be available to the more junior representatives that might be pertinent to the eventual risk outcome.
Impartiality & Answerability
Ultimately, in order to obtain a truly accurate outcome from a risk analysis, the process should be as impartial as possible. This is difficult to achieve for most persons working directly within a project team because they’ll usually be answerable to a senior member of the project team with a vested interest in obtaining a result that fits with expectations in order to get the project “over the line”.
Specialist independent risk consultants are not bound by the same constraints and can often bring additional skills to the table that will likely result in a quicker and more accurate analysis than those conducted “in-house”. However, even when using independent consultants, it is desirable that they be engaged by and answerable to the organisation rather than the project team in order to remain truly independent.
2.3.4 RISK EVALUATION
According to ISO 31000, the process of risk evaluation involves assessing each risk against the objectives of the project and external criteria to see whether the risk and/or its magnitude (exposure) are acceptable or tolerable to the project. As stated earlier, a risk is only a risk insofar as it directly impacts on objectives. External criteria usually refer to compliance with safety, environmental or other statutory requirements or legislation. Risk evaluation assists in determining whether risk treatments are required in addition to existing controls, to bring the risk within an acceptable exposure for the project organisation.
Risk Appetite
Except where determined by safety, legislative or financial insurance requirements, there is typically no right answer as to what constitutes an acceptable risk. Evaluation of risks and the decision as to what constitutes an acceptable response plan ultimately depends on an organization’s ‘appetite’ or ‘tolerance’ for risk, the nature of the risk, and the organisation’s ability to influence factors contributing to or stemming from the risk. Also relevant is the risk management context of the project (how important the project is to the organisation and the resources available to manage the risk).
In the specialised field of technical and safety critical risk management, there are criteria for deciding acceptable levels of risk and by implication, treatments, expressed in the terms “As Low As Reasonably Practicable” (ALARP) and “So Far As Is Reasonably Practicable” (SFAIRP). These terms define the limits to which organisations with a Duty of Care (eg, Transport Authorities) are required to go to protect human life etc. The terms are used to distinguish the required efforts from whatever is possible, which may be grossly disproportionate to the increased level of protection resulting. This kind of Risk Management is outside the scope of this Knowledge Base. Further information on these areas may be obtained by inputting ALARP or SFAIRP into a search engine.
The Decision Authority Matrix
In many organizations, a decision authority matrix may be in place to formalise the sign-off procedures for the evaluation and subsequent management of risks. A decision authority matrix itemises the thresholds of risk exposure (probability x impact) at which risks must be reported and/or their treatment strategy approved.
2.3.5 RISK TREATMENT
After evaluating risks, the next step in the process is risk treatment. Risk treatment refers to risk action plans relating to the general strategies of elimination, allocation of ownership & modification of exposure. The first step in risk treatment is to assess what responses are most appropriate to deal with the risk.
Each of these strategies is discussed briefly below (more details are provided in Section 2.4.2 Risk Treatments / Mitigations), using the following Strategy diagram for Opportunities and Threats.
Risk Elimination (Avoid or Exploit)
Risk elimination refers to the removal of uncertainty from a risk. The probability of occurrence is converted to either 0% (for threats) or 100% (for opportunities).
Re-allocation of Risk Ownership (Transfer or Share)
Allocation of risk ownership refers to the process of enacting contracting strategies or similar to modify exposure to a risk. This approach accepts that we will not be modifying the actual characteristics of the risk (probability or impact), but that it is possible to modify our exposure to it by sharing exposure with a third party. It is worth noting that it is rare to be able to transfer a threat entirely to another party, whether contractually or by insurance. It is usually more realistic to define the process as sharing.
Modification of Risk Exposure (Mitigate or Enhance)
For some risks, we may be able to modify the potential impact or probability of the risk occurring to mitigate or enhance its consequence. In the case of threats, an effective risk treatment would reduce the probability of the risk occurring or its impact should it occur. Conversely, in the case of an opportunity, effective risk treatments would increase the probability that the organisation could capitalise on the opportunity and/or its beneficial impacts should it occur.
Pre & Post-treatment Risk Assessments
To assess the efficacy of risk treatments, it is important to compare the risk exposure rating both pre and post treatment. This is often referred to as pre- and post-mitigation, even though several other treatment types are possible, as noted above.
The post-treatment risk rating is referred to as the ‘residual risk’. Maintaining an understanding of the pre-treatment risk assessment rating is important as it helps in understanding what the exposure would be if risk treatment plans are not implemented or fail to control the risk adequately.
Execution of Treatment Plans
This is a key part of Risk Management that is often under-emphasised. Effective implementation of risk treatments is crucial and may involve creation of mini-projects. Treatments may be proactive, requiring deterministic expenditure of effort and money, or may be contingent, involving detailed planning for actions to follow immediately once occurrence of the risk is detected. Without implementation of effective treatments, Risk Management may achieve little.
2.3.6 COMMUNICATION & CONSULTATION
Communication and Consultation is an integral part of the risk management process. It informs every stage of the risk management process and should involve both internal and external stakeholders. Because it is about future uncertain events, Risk is based on opinion which in turn is based on perception. Perception can be informed by values, needs, assumptions, concepts and concerns. All of the aforementioned factors will likely vary from stakeholder to stakeholder, so getting a balance of stakeholder perspectives is essential.
As identified in ISO 31000, a consultative approach to risk management may:
Help establish the context appropriately;
Ensure that the interests of stakeholders are understood and considered;
Help ensure that risks are adequately identified;
Bring different areas of expertise together for analysing risks;
Ensure that different views are appropriately considered when defining risk criteria and in evaluating risks;
Secure endorsement and support for a treatment plan; and
Enhance appropriate change management during the risk management process.
2.3.7 MONITORING & REVIEW
Effective risk management is not a ‘tick-the-box’ exercise. It is not something that can be done up front then parked in a corner somewhere. To truly add value to a project, risk management must be regularly monitored and reviewed to ensure that the risk monitoring and assessment are up to date and risk treatments are being implemented as agreed in a timely way.
Essentially, the risk management system must be proactive rather than reactive in dealing with risk. It is only by regularly re-examining the known and potential sources of risk and their potential consequences that informed decisions can be made to help reduce exposure to threats and capitalise on opportunities.
ISO 31000 identifies the benefits of effective monitoring and review processes as:
Ensuring that controls (and treatments) are effective and efficient in both design and operation;
Obtaining further information to improve risk assessment;
Analysing and learning lessons from events (including near-misses), changes, trends, successes and failures;
Detecting changes in the external and internal context, including changes to risk criteria and the risk itself which can require revision of risk treatments and priorities; and
Identifying emerging risks.
2.4 RISK ATTRIBUTES
Risks have many different attributes and components, all of which need to be considered and addressed if we are to characterise a risk appropriately.
2.4.1 RISK PROPERTIES
Risk Type
People often use the term ‘risk’ as a purely negative word, but as explained earlier, it’s really defined as “effect of uncertainty on objectives”. This means that it encapsulates both positive and negative potential effects. As such, two different terms are used to qualify the term risk to differentiate between positive and negative uncertainties:
Opportunities are those risks with potential to impact positively on a project’s objectives.
Threats are those risks with potential to impact negatively on a project’s objectives.
For effective risk management, it is important that similar effort be devoted to the identification and management of opportunities and threats to maximise value to the project.
Nomenclature & Meta-language
After risk type, the definition of a risk is critically important; it sets the context for the rest of the attributes that follow. When people are browsing through a risk register, they will typically be looking at the risk name and its positioning within the register. Therefore, it’s important that the risk name concisely defines the full nature of the risk including its cause and effects.
One way of ensuring that this happens is to use risk meta-language: a structured risk naming technique that clearly separates the cause-risk-effect aspects of a potential threat or opportunity to succinctly express the full nature of the risk. Risk meta-language usually follows a structure similar to “Due to <cause>, there is a risk that <risk event> may happen, resulting in <effects>.”
If a risk can’t be expressed in this format, it’s likely that it lacks definition or perhaps isn’t even really a risk!
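For example, a hypothetical risk expressed in this meta-language: “Due to incomplete geotechnical survey data, there is a risk that unsuitable ground conditions may be discovered during excavation, resulting in foundation redesign and delay to the construction schedule.”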
Managing Risks Qualitatively & Semi-Quantitatively: Bow Tie Diagrams
At this point, the different emphases of Qualitative and Quantitative Risk Analysis become relevant.
For Qualitative Risk Analysis, risks are often grouped into Areas of Risk, whereby a common Risk Event may be triggered by a range of causes and may cause another range of consequences. Treatments and controls are then listed, divided into Proactive or Causal, affecting the causes on the one hand and Reactive, Consequential or Adaptive, affecting or dealing with the Consequences on the other hand.
For Qualitative Risk Analysis it makes sense to construct so-called “Bow Tie Diagrams” with the various causes directed at the single Risk Event and the various Consequences flowing from the Risk Event. The Bow Tie diagram can show the Causal Controls in between the Causes and the Risk Event and the Consequential Controls between the Risk Event and the various Consequences.
It is possible to develop Semi-Quantitative Risk Analysis Bow Tie Diagrams, combining features of Fault Trees and Event Trees.
The following Bow Tie Diagram was included in a paper “Combining EA techniques with Bow-Tie Diagrams to enhance European Port Security” by Nikolaos Papas of BMT Hi-Q Sigma Ltd., Basingstoke, UK (Nikolaos.papas@hiqsigma.com).
2.5 MANAGING RISKS QUANTITATIVELY
For Quantitative Risk Analysis using the Monte Carlo Method, each cause of the risk event is treated as a separate risk definition, with the consequences for that causal risk included in the risk definition, whether singular or multiple. This enables applicable treatments for that causal risk event to be associated with it and for Pre- and Post-treatment Risk Assessments to be determined for the applicable risks. It also enables analysis and comparison of combinations of proposed treatments for a given risk to select the best cost/benefit combination of treatments.
Risk Registers for Quantitative Risk Analysis are likely to include higher numbers of separately defined and more precise risks than Risk Registers for Qualitative Risk Analysis. But the Qualitative Risk Register based on Bow Tie Risks is more suitable for managing groups of related risks and treatments.
2.5.1 RISK QUANTIFICATION
Various qualitative risk parameters have corresponding quantitative risk equivalents. There are also additional parameters unique to Quantitative Risk Analysis (QnRA).
Likelihood / Probability
Each threat and opportunity within a register must be assessed for Likelihood. A risk’s likelihood is an expression of the chance that the risk will occur. Likelihood can be expressed qualitatively using terms for levels such as “Rare”, “Unlikely”, “Possible”, “Likely”, and “Almost Certain”, or quantitatively using a percentage scale from >0% to <100%, when it is referred to as Probability.
A risk is only a risk insofar as it has potential to produce an impact on a project’s objectives. A risk with a probability of 0% could be considered not to be a risk at all, as it will never occur. A risk with a probability of 100% that is a Threat, with a negative impact on one or more Project Objectives such that the Objective(s) cannot be achieved, is defined in projects as an Issue and should appear in an Issues Register instead of the Risk Register.
Consequence / Impact
ISO 31000 defines Consequence as the outcome of an event affecting objectives. In the context of project risk management, the event may be taken as a risk event. The consequences may be certain or uncertain and may be expressible qualitatively or quantitatively. When expressed quantitatively, it is normally defined as an Impact.
Risk consequence is an expression of the effect of the risk should it occur. A risk can have multiple categories of consequence, such as Safety, Cost, Environmental, Schedule & Reputational.
Consequences can be expressed qualitatively using terms for levels such as “Insignificant”, “Minor”, “Moderate”, “Major”, & “Catastrophic”. Alternately, where appropriate, consequences can be expressed quantitatively using increments of an appropriate unit, in which case they are referred to as impacts.
Each quantifiable impact type may be assigned an impact distribution to define the range of uncertainty in understanding of the risk’s effect. This is useful because often risks may not be definable with precisely quantifiable outcomes. Take for example the risk of a flood. Historical weather data may show that a flood occurs, say, every 5 years in a particular region, so we may be able to define the probability in any one year as 20%. What is not definable is the extent of the flooding should it occur. It could range from heavy rain causing minor localised flooding, through to moderate or even major flooding throughout a region. In these situations an impact distribution range may be required to define the time delay for project activities affected by the flood and/or the costs of recovery from damage caused by the flood.
An impact distribution range can be characterised by a minimum, most likely, and maximum impact value for each quantifiable impact type. Such ranges are known as impact probability distributions.
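As an illustration, if the flood delay above were ranged at a minimum of 2, a most likely of 10 and a maximum of 45 days and modelled with a triangular distribution, the mean modelled delay would be (2 + 10 + 45) / 3 = 19 days, noticeably above the most likely value because the range is skewed to the high side.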
Risk Magnitude / Exposure
A risk’s magnitude or exposure refers to the combined effect of its probability and impact assessments. A low probability risk with a low impact assessment may be considered to affect the project objectives negligibly or to have a low risk exposure, whereas a high probability risk with a high impact assessment would be considered to affect project objectives with high or even extreme risk exposure.
A risk may be able to be expressed with a range of probabilities and impacts representing, in some cases, a continuum of risk exposures from Low Impact/High Probability through Moderate Impact/Moderate Probability to High Impact/Low Probability. An example may be the risk of an adverse weather event. These may range from regular occurrences of high rainfall in a 24-hour period causing cessation of work on parts of the affected project, through unusually high rainfall causing minor flooding, through to a major cyclonic event directly striking the project site and causing substantial damage taking weeks to reinstate. As the impact severity increases, the frequency or probability in any given period decreases.
In some Risk Registers one, two or all three of these descriptions may be included as separately identifiable and treatable risks, each with a separate probability and impact range (as described under Consequence / Impact) and possible set of treatments.
For quantitative risk analysis, there are typically three different types of exposure rating associated with any given risk:
Pre-treatment exposure refers to the combined effect of the original probability and impact assessment. This is the magnitude of the risk if nothing is done about it.
Post-treatment exposure refers to the combined effect of the probability and impact assessments of the risk after applying all accepted or implemented treatments. This is the magnitude of the risk if all accepted treatments are successfully implemented.
Target exposure refers to the expected probability and impact of the risk after implementation of all accepted treatments. There should be some auditable basis for expecting that the Target Exposure is achievable if all accepted treatments are implemented, preferably based on realistic assessments of the effects on probability and impact of the risk by each accepted treatment. If this cannot be demonstrated, the validity of the Target Exposure may be open to question.
It is not uncommon in qualitative risk analysis for Target Exposure ratings to have no audit trail to prove their validity.
Risk Status (Inactive, Active, Past)
To manage the risks in the register effectively, a helpful feature is that of the risk status, to focus attention where it is most needed. Typically, risks are assigned one of three statuses:
Inactive risks are those that have been identified but not accepted as actively applicable in the risk register. Inactive risks may be awaiting further information to better define them before being accepted or may have been rejected, either as being invalid or as of negligible or too low magnitude to warrant being made active. Inactive risks need to be regularly reviewed and converted to ‘active’ status if justifiable or discarded from further consideration and transferred to a discarded risk register.
Active risks are those that are currently recognised as open threats to or opportunities for the project. Active risks need to be regularly monitored and fully assessed by the project team to ensure that they are treated as necessary. Active risks need to be regularly reviewed, the status of agreed treatments reported against planned implementation commitments and dates and action taken to ensure revised dates are agreed where planned dates have not been met.
Past risks are those that have been assessed to no longer pose a threat or opportunity to the project’s objectives. This usually occurs through change of project phase or through the passage of time. Risks should only be assessed as ‘past’ when they genuinely can no longer have an impact on a project’s objectives. Where risks are no longer applicable, they should be converted to ‘past’ status as appropriate and recommendations made regarding any time or cost contingency allocated against them.
Risk Owner
Each risk should be assigned a risk owner. The risk owner is the person accountable for all necessary steps required to manage the risk including its treatments. That person may be responsible for the day to day monitoring and management of the risk or may delegate that responsibility to someone else, in which case that responsible person reports regularly to the Risk Owner. The risk owner should be someone with a full technical understanding of the risk and its implications. Additionally, the risk should only be assigned to an owner who has the authority to ensure that all necessary steps required to manage the risk are enacted. Assigning risk responsibility without authority ultimately results in the inability to manage the risk effectively.
2.5.2 RISK TREATMENTS / MITIGATIONS
Treatment Strategy
For a brief introduction to this section refer to section 2.3.5 Risk Treatment.
A risk’s treatment strategy refers to the overall combination of treatments that best capitalises on the opportunity or minimises the threat. Each risk should be assigned a combination of treatments that best suits both the risk itself and the organisation’s ability to influence the factors contributing to, and the outcomes associated with, the risk. Broadly speaking, there are eight different types of treatments for dealing with risks: four for threats, and four for opportunities.
Threats:
Avoid – Avoidance refers to the general strategy of eliminating the uncertainty associated with a threat by preventing it from occurring. Naturally, only some threats can be avoided, and this usually involves a change in strategy or similar to eliminate the possibility of the risk occurring.
Transfer / Share – Through insurances, effective contracting strategy, or other similar means, it may be possible to transfer some or (rarely) all of the risk associated with a particular threat to a third party. This is known as threat transfer. Threat transfer is one of the most commonly practiced forms of risk treatment, but it usually indirectly or directly involves the payment of fees to the party assuming responsibility for the risk in order to compensate them for the additional threat exposure. For Project Owners, it is rare to be able to transfer a risk entirely; it is more realistic to consider this to be risk sharing.
Reduce – In some instances, it is possible to reduce the overall threat exposure by either reducing (but not eliminating) the probability of occurrence and/or impact level. Reduction strategies are only effective in instances where an organization has the ability to directly affect the factors contributing to or outcomes associated with the risk. This form of treatment is usually known as risk mitigation and is what many think of when considering risk treatments.
Accept – Organizations may choose to simply accept that a threat may or may not occur on a project and choose to do nothing about it. This is known as accepting a threat. There are many reasons for accepting a threat, but some common explanations are:
The threat is of little consequence to the project’s objectives;
No other mitigation strategy is possible; or
The cost of mitigating the threat does not provide a good pay-off in risk exposure reduction.
An extreme example may be to accept the threat of the project site being struck by a meteorite.
Opportunities:
Exploit – The polar opposite of avoidance strategies, opportunity exploitation refers to the process of ensuring that an opportunity eventuates (converting its probability to 100% certain).
Share – An organization may choose to share an opportunity, creating mutual benefit between itself and another stakeholder involved in the project, to increase the positive risk exposure.
Enhance – As opposed to reduction strategies for threats, enhancement strategies for opportunities seek to maximise the probability of an opportunity occurring or to maximise the positive impact on the project’s objectives should it occur.
Ignore – If an opportunity is of little consequence, beyond influence, or too difficult / costly to treat in any other way to increase its likelihood or consequence, an organization may choose to simply ignore it. In doing so, they accept that the opportunity may or may not arise and their potential benefit from it will be limited to the inherent characteristics of the opportunity as originally identified.
Nomenclature & Description
Having decided on a general strategy for risk treatment, it is important to clearly identify how this is to be achieved. The first step in doing this is to clearly name and describe all of the various strategies that could be employed to modify the risk exposure. Similar to the strategies described for naming risks, we can also use a meta-language approach for naming treatments. An example of this might be: “By <performing action>, <describe outcome>, resulting in <effect on probability and/or impact of risk>”.
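For example, a hypothetical treatment expressed in this form: “By pre-purchasing long-lead electrical equipment, delivery is secured ahead of the installation window, resulting in a reduced probability of the late-delivery threat.”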
Responsibility
Similar to the assignment of an owner to each accepted risk, each accepted risk treatment must be assigned an Owner. This treatment Owner is accountable to the Risk Owner for the management and successful implementation of the treatment, must know how each step of the agreed treatment is to be implemented and have the authority to make decisions about their implementation. The Treatment Owner may delegate responsibility for day-to-day actions in implementing the treatment but remains accountable for the successful implementation of the treatment.
There may be multiple treatments implemented for individual risks and a different Treatment Owner for each treatment. The Treatment Owner and the Risk Owner may also be the same person.
The implementation of the risk treatment is an essential step in the risk management process, as without assignment of treatment accountability and subsequent implementation, treatment identification is a theoretical exercise only.
Treatment Status
Treatment status refers to a progressive system of classifications for managing risk treatments through their lifecycle. There are five different statuses for risk treatments:
Potential treatments are those that have been identified but not approved for further action.
Accepted treatments are those that have been approved for further action including analysis of their effects on the risk, but whose plan for implementation has not been started.
Started treatments are those that have begun to be implemented, but are not yet completed.
Applied treatments are those that have been fully implemented.
Rejected treatments have been considered but concluded to be not worth implementing, usually because the value of the benefit derived (in reduced schedule and/or cost impact) does not exceed the cost.
Determining whether treatments should be accepted or rejected is a primary purpose of QnRA.
It is important to ensure that risk treatment statuses are accurately maintained and reported to ensure that all necessary actions are taken for the treatments to be effective in a timely way.
Until agreed treatments are fully implemented, risk management is not effectively applied to that risk.
3 SRA USING A CPM-BASED MCS TOOL
3.1 BASIC CONCEPTS
3.1.1 WHAT IS SCHEDULE RISK ANALYSIS?
Schedule risk analysis (SRA) is a quantitative risk analysis technique used to forecast project schedule outcomes, assess what drives them and estimate schedule contingency. All schedules are plans for future events and thus inherently involve uncertainty. Using the Monte Carlo Method, SRA replaces single values with probability distributions for task durations, quantifying plan uncertainty to assess whether the current allocations of time to complete a program of work are sufficient. In addition, risks describing events that may or may not happen are inserted in the SRA model to determine the probabilistic effect of significant risks on the project model.
Schedule contingency is then measured in terms of percentile (or ‘P’) values that indicate the percentage confidence that works will be completed on or before a given date.
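For example (hypothetical dates), a P80 completion date of 14 June means that 80% of simulated iterations finished on or before 14 June; the P50 date is the median outcome.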
The latest CPM-based MCS tools are able to quantify and rank the primary schedule risk drivers by systematically excluding each schedule risk and its associated activities from the MCS model and, by difference, quantifying the effect this risk has upon the probabilistic duration / completion date. This technique, identified here as Quantitative Exclusion Analysis (QEA), provides a reliable way of determining the main drivers of schedule uncertainty.
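The following Python sketch illustrates the exclusion principle on a deliberately tiny model: a single activity chain with two hypothetical risk events, where rerunning the simulation with each risk excluded and differencing the P80 durations yields the ranking described above.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10_000

# Hypothetical risk events: name -> (probability of occurrence, delay in days).
RISKS = {
    "Late permit approval":  (0.30, 20.0),
    "Geotechnical surprise": (0.10, 45.0),
}

def simulate(excluded=(), n=N):
    """Sample total duration: a ranged base chain plus any included risk events."""
    total = rng.triangular(90.0, 100.0, 130.0, size=n)  # base duration uncertainty
    for name, (p, delay) in RISKS.items():
        if name not in excluded:
            total = total + np.where(rng.random(n) < p, delay, 0.0)
    return total

p80_all = np.percentile(simulate(), 80)
for name in RISKS:
    p80_without = np.percentile(simulate(excluded=(name,)), 80)
    print(f"{name}: contributes ~{p80_all - p80_without:.1f} days to the P80 duration")
```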
Correlation is also measured between activities in the schedule model and the project durations of interest to identify the main drivers of schedule risk, expressed as “sensitivities”.
The schedule risk analysis process has secondary benefits including ‘stress testing’ of the plan, and exposure of those areas of the program that may be sensitive to change.
3.1.2 WHY PERFORM SCHEDULE RISK ANALYSIS?
Planning is all about predicting the future, so nothing is absolutely certain, yet ordinary (“deterministic”) scheduling software makes no allowance for this, allowing only a single ‘best guess’ duration for any activity. Such planning tools are described as deterministic because they determine single values for schedule dates.
Further, scheduling tools like Primavera P6 have no facility for modelling risk events - events that may or may not occur but that would affect the timing of the project schedule if they did. The planner has to choose whether to include a risk event or exclude it from the schedule.
Normal or ‘deterministic’ planning software is useful for calculating a single end date for a project but cannot determine how likely that date is to be achieved. Such planning software is often misused to make a schedule fit a preconceived date. Unless the plan is assessed by people knowledgeable about the time required to carry out similar projects (i.e., able to benchmark the schedule), or familiar with the detailed execution strategy and durations of key tasks in the project, the flawed basis of the schedule may not be apparent until the investment decision has been made and execution is well under way.
This is the case where the project plan has been constructed as a ‘backward-pass’ schedule, made to try to justify a desired end date by squeezing activity durations and logic. Such an approach, ignoring the need for realistic contingency allowances and excluding the possibility of risk events occurring, is not only wishful thinking but almost certainly doomed to fail.
In the past and even presently, project proponents have relied on the addition of ‘rule-of-thumb’ contingency (eg +/- 15%) to set more realistic targets, assuming that the project’s uncertainties will “average out” to the selected contingency percentage. However, this method is crude at best, and provides no auditable justification for the completion date incorporating such contingency. The allowance may be overly generous or completely inadequate.
SRA is capable of addressing these issues. By assigning duration risk factors to groups of activities, or by ranging groups or individual elements within a schedule according to their duration risk or uncertainty, risk analysts are able to model the uncertainties due to underlying risk. Additionally, incorporating risk events enables modelling of those potential events that may or may not occur but would change the project timing if they did.
The SRA schedule model, incorporating realistic risk factors / duration uncertainties and significant risk events, is simulated many hundreds or thousands of times. In each iteration, duration values are randomly chosen from within each defined activity duration range, and each risk event is included according to its percentage probability of occurrence, ensuring that over the many iterations the frequencies of the sampled values match the probability distributions defined for each task and risk event task. The entire critical path calculation outcome is recorded for each iteration (forward-pass early dates and backward-pass late dates, together with Free and Total Float, for each task). Statistical analysis of these results then provides an auditable and mathematically justifiable basis for allocating schedule contingency. All inputs to the schedule model are available for examination and may be adjusted if examination of the results indicates that changes should be made.
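As a minimal sketch of this iteration loop, the Python below simulates a toy network of two parallel paths merging into a finish milestone, with triangular duration ranges and one 25%-probability risk event; every number is hypothetical, and real tools perform a full CPM recalculation of all dates and floats on each iteration rather than the single merge shown here.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 20_000

# Sample each path's duration (days) from a three-point range for every iteration.
path_a = rng.triangular(55, 60, 75, size=N)   # deterministic duration: 60 days
path_b = rng.triangular(50, 58, 72, size=N)   # deterministic duration: 58 days

# A 25%-probability risk event adds an uncertain delay to path B when it fires.
risk_hit = rng.random(N) < 0.25
path_b = path_b + np.where(risk_hit, rng.triangular(5, 10, 20, size=N), 0.0)

# The finish milestone completes when the later of the converging paths completes.
finish = np.maximum(path_a, path_b)

deterministic = 60.0  # the single-value CPM result for the same network
print(f"Chance of meeting the deterministic date: {np.mean(finish <= deterministic):.0%}")
for p in (50, 80, 90):
    val = np.percentile(finish, p)
    print(f"P{p}: {val:.1f} days (contingency {val - deterministic:+.1f} days)")
```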
As indicated in the introduction, there are other benefits of conducting SRA, such as the ability to identify and quantify the major drivers of uncertainty within the project. Often the original critical path is not found to be the main driving path through the project, because higher uncertainty exists along a deterministically non-critical pathway. Identifying this early can help the project management team introduce cost-effective measures that may save days, weeks or even months of project duration!
3.1.3 WHEN TO PERFORM SCHEDULE RISK ANALYSIS
SRA has been proven to be useful at all phases of a project’s lifecycle, as discussed below.
Schedule Risk Analysis Pre-Execution
SRA during tendering or prior to Financial Investment Decision potentially gives the greatest benefit, as it enables the best understanding, and therefore opportunities for control, of schedule uncertainties before commitment to project execution. Maximising control of the project schedule usually minimises the potential for cost and schedule over-runs in the project. SRA can therefore be a major factor in effectively assessing and managing the potential for profit or loss on a project.
Performing SRA at these early stages also maximises ability to plan for things that may or may not occur (risk events) as well as the potential for alternate execution strategies (probabilistic branching). This is a clear advantage over normal ‘deterministic’ plans that are limited to only known scope and one execution pathway. Modelling these uncertainties enables assessment of the relative benefits of different strategies, including any additional uncertainties that one strategy may introduce compared to others.
SRA ‘stress tests’ the project schedule, helping ensure that its construction is robust before using it as a major decision making or control tool on the project. Additionally, the analysis identifies and ranks the probabilistic critical paths through the project, highlighting those areas in which effort could best be expended to ensure timely project delivery.
Outputs of the analysis enable setting of realistic targets for completion of the project and its intermediate milestones, allocating contingency levels as appropriate at the project and organisational levels.
Schedule Risk Analysis in Execution
SRA in execution serves a different, but still important role. Unlike pre-execution, the schedule targets are already set, but performance must still be tracked against these objectives.
Conducting SRA during execution helps identify and highlight emergent trends and risks on a project. Through actual occurrence of risk events or even because of risk treatments implemented as a result of early analyses, activities identified as critical in the early phases of the project may have shifted. Some critical pathways may have disappeared and others may have emerged. The project team may be too focused on managing previously identified critical paths to notice the emergence of other project logic that may overtake the original pathways as primary threats to project schedule objectives. Early identification is crucial to maximising ability to deal with project uncertainties to capitalise on opportunities and respond to threats.
Continued schedule risk analysis through execution serves to provide realistic assessments of schedule performance against objectives. All too often in projects, the ‘magic dates’ syndrome plays out, where the early activities slip but the end milestone dates are required to remain fixed. SRA shines a spotlight on this through the ‘Merge Bias’ effect: as converging parallel paths of activities become increasingly overlapped, the chance of achieving the merge milestone on time becomes ever smaller, because the probability of the milestone (the logic node for the parallel paths) being completed on time is the product of the probabilities of each of the paths being completed on time.
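The arithmetic behind the Merge Bias effect can be shown in a few lines; the three 80% path probabilities below are assumed purely for illustration.

```python
# Three parallel paths, each independently 80% likely to finish by the milestone date.
# The milestone is only achieved on time when all three paths are on time.
paths_on_time = [0.80, 0.80, 0.80]

prob_milestone_on_time = 1.0
for p in paths_on_time:
    prob_milestone_on_time *= p

print(f"{prob_milestone_on_time:.1%}")  # 51.2% - far less than any single path
```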
Another useful function of SRA during execution is to re-assess where in the logic network schedule contingency is required, when more may be required and when contingency may be released due to expiration of exposure to uncertainty and applied risk events.
Timing versus Ability to Intervene
There is a direct relationship between the timing of the identification of project uncertainties and the ability to act on them to maximise the positive effect on risk outcomes. As the diagram below demonstrates, the earlier project uncertainties are identified, the more time remains to plan and manage ways of reducing threats and increasing opportunities, effectively reducing risk exposure. Conversely, the later such measures are implemented, the more expensive they become, as shown in the following indicative graph.
Ability to Change Risk Exposure vs. Cost of Changing Risk Exposure over project time
3.1.4 COMMON SOURCES OF SCHEDULE UNCERTAINTY
There are many factors that contribute to schedule uncertainty in a project environment. What these are will depend on a myriad of influences including the type of project, the project setting, and those who are involved in its execution. However, these factors can ultimately be narrowed down to a few types or classes of schedule risk, as described below:
Duration Uncertainty / Risk Factors: This is a broad classification that refers to any one or more of a multitude of risk factors which can influence the working time taken to complete a package of work. Examples of duration risk factors include:
Quantity uncertainty: Uncertainty over how many units of work are required to be completed.
Rate / productivity uncertainty: Uncertainty over how many hours are required per unit of quantity.
Staffing / Resource uncertainty: Uncertainty over how many workers will be available to complete the specified works.
Risk Events: These are events that may or may not occur, but if they do, will impact on one or more aspects of project timing. Examples of different types of risk events are:
Engineering / Design Risks: eg. “There is a risk that re-design may be required due to changes in client specifications, resulting in delays to issuance of purchase orders.”
Procurement / Fabrication / Supply Risks: eg. “There is a risk that a transport barge may be lost at sea due to extreme conditions, resulting in loss of modules.”
Construction Risks: eg. “There is a risk of industrial relations disputes at site due to pay conditions, resulting in downtime while negotiations are completed.”
Commissioning Risks: eg. “There is a risk of commissioning delay due to damaged and/or faulty equipment, resulting in downtime while the problem is resolved.”
Business Risks: eg. “There is a risk that permit conditions may be changed due to a change of government, resulting in a reduced requirement for environmental rectification and associated schedule savings.”
Logic Uncertainty: In basic planning, logical links between activities are deterministic: they are either always there, or not there at all. However, in probabilistic scheduling, we can model links that may or may not exist between tasks, or even choose between alternate pathways for the schedule to follow.
Probabilistic Links: These are links that may or may not exist between two tasks. The link is assigned a probability of existence and switches on and off accordingly during simulation, similar to a risk event. Probabilistic links could be used to model, for example, the situation in which there’s only a 70% chance that a permit may be required to commence a particular construction activity.
Probabilistic Branching: This can be used to model situations in which there are two or more potential and mutually exclusive solutions or pathways for completing an objective. An example of this might be building a bridge versus building a tunnel to achieve grade separation from a busy road. Each pathway is assigned a probability (the probabilities must sum to 100%), and one pathway is selected in each iteration of the simulation according to those probabilities.
Calendar Uncertainty: Both duration uncertainties and calendar uncertainties affect the overall duration of a task. However, unlike duration uncertainties which influence the working time taken to perform a package of work, calendar uncertainties determine the times in which this work can be performed. There are two main types of calendar uncertainties:
Weather Uncertainty: Weather uncertainty refers to both the variations in normal working downtime associated with inclement weather, as well as downtime associated with one-off weather events such as cyclones (elsewhere typhoons or hurricanes).
Working Roster Uncertainty: At the early stages of a project, there may still be uncertainty surrounding the roster arrangements for workers. Will the office staff work a 7½ hour day or an 8-hour day? What shift rotation or time on / time off patterns will the construction staff work? These types of uncertainties can have significant impact on the project completion date.
3.2 SCHEDULE RISK ANALYSIS PROCESS
3.2.1 SCHEDULE RISK ANALYSIS PROCESS OVERVIEW
The following diagram represents a process sequence or workflow for conducting effective Schedule Risk Analysis (SRA) using a CPM-based MCS tool such as:
Oracle's duration ranging application Primavera Risk Analysis (PRA), or
Safran Software's risk factor based application Safran Risk (SR).
The sections that follow discuss in detail each of the steps involved in this process.
3.2.2 SCHEDULE PREPARATION
Preparation of the schedule for use in SRA is of the utmost importance as it forms the foundation of all that follows. As with any process, if the foundation isn’t right, a quality product is unlikely. There are many considerations when preparing a schedule for use in SRA, typically based on good planning principles. If the schedule has been well constructed to begin with, very little modification is likely to be required to use the plan as a risk model. Major considerations are listed below:
Technical requirements for schedules for use in risk analysis:
Basic Design
The schedule model must represent all known scope that may influence key dates.
The schedule model must be representative of the current execution plan.
The schedule should be appropriately structured to meet the client’s reporting requirements for both schedule and cost outcomes from the analysis. Such requirements should be discussed early in the process to avert the requirement for re-design.
The schedule should have clearly identified milestones for any RA reporting targets.
The schedule model should be representative of the size and complexity of the work and especially not be overly-summarised.
The schedule model must have key stakeholder ‘buy-in’. The stakeholder requesting the SRA should ‘own’ the schedule to be used in the analysis and accept its validity.
Logic
The schedule should be as fully linked as possible. Typically, an indication that the schedule is adequately connected is a relationship-to-activity ratio of around 2 to 1. A ratio below 2 to 1 may be acceptable, but the lower the ratio, the more attention is required to validate the dependencies.
Each activity should have at least one start predecessor and at least one finish successor, and each of these should preferably be driving. Where this logic does not apply, it is possible for the lengthening of an activity to shorten the project duration (driving FF predecessor, driving SS successor)!
The critical and near critical paths through the plan should be logical and make sense to the stakeholders.
The schedule should make minimal use of FS +lag relationships as these typically represent undeclared tasks that should be subject to uncertainty.
The validity of any SS or FF long lag relationships should be assessed against the level of detail in the schedule. Detailed schedules should use such relationships sparingly (prefer use of FS –lag relationships instead), whereas such relationships are a logical necessity of more summarised schedules. Planning “common sense” should be used.
Constraints
The schedule should use logic in place of constraints where possible. For example, it is preferable to include predecessor logic rather than an early constraint, which may prevent “schedule optimism” from being revealed. Start or Finish No Earlier Than constraints prevent tasks starting any earlier than their constrained dates.
The schedule must not use ‘hard’ constraints that could produce negative float in a plan or mandatory constraints that prevent successors from moving.
Avoid Expected Finish Constraints and minimise the use of As Late As Possible (Zero Free Float) Constraints.
Where available, an ‘Always Critical’ constraint should be placed on an intermediate key milestone prior to its analysis so that criticality ratings of its logical predecessors are accentuated in cruciality tornado diagrams and not due to other overlapping pathways. Such constraints should subsequently be removed to analyse other key intermediate milestones and to analyse the true critical path(s) through the entire plan.
Schedule Size
There is no fixed guidance for this, but guidance based on experience suggests the following:
Preferred limit is up to about 2,000 normal (non-zero duration) activities that are included in critical path calculations. PRA is able to analyse this size of schedule acceptably quickly (hardware speed advances tend to increase this limit). SR is capable of much faster analysis and can usually handle 10,000 activities in the same time that PRA can handle 2,000.
Above this, filtering tasks by Total Float can be used to select those tasks to which risk factors or duration ranging may be applied. Keep the TF limit as high as possible while achieving acceptable analysis speeds. Although there is no fixed limit on the size of schedule to be analysed, increasingly large schedules become correspondingly slower to analyse. Furthermore, larger schedules not using Risk Factors require more complex correlation models to counter the Central Limit Theorem (which asserts that larger models with smaller average durations will produce narrower distributions of results around the mean). Related RFs (such as families of different discipline productivities) should also be correlated.
Use of small summarised schedules is to be avoided as they are likely to produce unrealistically optimistic Monte Carlo analyses, due to the elimination of logic nodes (eg, intermediate milestones) that bring together multiple strands of schedule logic. The Merge Bias Effect causes the probability of such logic nodes finishing by their planned date to be the product of the probability of each logic strand being completed by the planned date. This acts as a barrier to earlier completion of a schedule. Summarising schedules tends to reduce the number of such logic strands and nodes and therefore falsely reduce the real barriers to earlier completion.
So the SRA schedule size used to model the project should be as large as is required to represent the project scope and complexity adequately within the practical limits of analysis.
Resources
Plan resources have the potential to slow SRA analysis times significantly and should be removed if not required. There are also differences in the way that PRA and SR (including embedded Safran Project (SP)) and other planning applications (such as Primavera P6) calculate resource-driven task durations, which can cause unexpected differences between the two versions of a schedule.
Use of larger numbers of resources may greatly increase analysis times as well as narrow the resultant distributions. This is a known problem in Integrated Cost & Schedule Risk Analysis (IRA - see later Knowledge Base discussion of this when added) and may also affect SRA.
Planning Units
Unless dealing with a high-level plan with long activity durations and long project duration, it’s almost always preferable to work with a planning unit of ‘Hours’ over a planning unit of ‘Days’. When planning units are set to days, small percentage variations on shorter tasks are not expressed, as the task duration is always rounded to the nearest integer. A 4.5% variation of a 10 day task would still be expressed as 10 days, whereas the same task in a schedule planned in hours would be expressed as 10.45 days equivalent in hours.
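A small sketch of this rounding effect, assuming an 8-hour working day (PRA’s actual minimum unit is ¼ hour, as noted below):

```python
HOURS_PER_DAY = 8   # assumed working day

duration_days = 10
variation = 1.045   # a 4.5% overrun

in_days = round(duration_days * variation)                   # -> 10: variation lost to rounding
in_hours = round(duration_days * HOURS_PER_DAY * variation)  # -> 84 hours, i.e. 10.5 days

print(in_days, in_hours / HOURS_PER_DAY)  # 10 10.5
```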
Unlike Primavera P6, PRA is not capable of working in minutes, and instead has a minimum planning unit of ¼ hour blocks. This may result in some minor discrepancies in activity durations in plans exchanged between the two applications. It should be noted however that increasingly smaller planning unit durations result in increased scheduling and analysis times in PRA.
SR can work in minutes, but doing so can slow down Quantitative Exclusion Analysis dramatically, particularly if probabilistic weather calendar risks are present. Unless there are compelling reasons to work in minutes, the strong advice is to avoid it.
Planning units must always be set when first importing the schedule into PRA or SR and careful schedule checks made to ensure discrepancies are minimised at that time.
Calendars
In multi-phase projects involving, say, design, procurement, construction and commissioning, calendar changes are generally unavoidable when modelling accuracy is important. But changes in working periods per day and/or per week may result in “gaps” in criticality when trying to determine the driving critical paths through a model. Mixed working periods occur when the calendars attached to a predecessor and a successor task are not the same. For example, if the calendar of a predecessor task were set to 24 hours a day, and the calendar of a successor task only allowed for 9am to 5pm, any time the predecessor task finished between the hours of 6pm and 8am, float would be created between the two tasks, resulting in loss of criticality. In general, when a predecessor has more available working periods in a week than its successor, the total float of the predecessor will be greater than that of the successor.
In the special case of the successor having zero float, the critical path must be defined by ‘longest path’ rather than zero float because the predecessor task will have positive float.
Constructing versus adapting a schedule for risk analysis
The question of whether to adapt the existing schedule or construct a new schedule specifically designed for use for SRA is crucial and one that must always be answered at the beginning of any analysis. Here are our thoughts on this important question.
It is not uncommon to discover, once the process has commenced, that the schedule is of poor quality, and some time and effort should always be allocated to examining and circumventing any issues that may arise from the schedule’s construction.
The size of the schedule is a key consideration when preparing a schedule for SRA. If there are too few activities (excessive summarisation), key dependencies and visibility of the true drivers of the project can be missed and Merge Bias Effect restraint on early completion unrealistically excluded. If there are too many activities, the model can become unwieldy and significantly less useful in revealing the main risk drivers.
RIMPL prefer to use the project schedule, with its built-in logic representing real dependencies between elements of the project, rather than create a summary schedule, which carries the risk that key logic may be omitted.
Ultimately, however, it is unavoidable that some schedules are of such poor technical construction (refer to Technical Requirements for Schedules for use in risk analysis), or are of such a size (or both!) that they cannot be used or adapted. In such circumstances, a summary schedule must be constructed. Other circumstances that could require a summary schedule include combining different schedules, or a program of projects.
3.2.3 GATHERING SCHEDULE UNCERTAINTY INFORMATION
For information on available methods of gathering schedule uncertainty data, please refer to 2.2.3 Risk Identification.
3.2.4 ASSIGNING SCHEDULE UNCERTAINTY RANGES
Once the schedule is technically robust and suitable for risk analysis, it’s time to input the range / risk factors information gathered from the project team and its stakeholders.
Duration uncertainty typically refers to a 3-point estimate of how long a task may take:
Optimistic Estimate: If everything went as well as could be expected, how long might the activity take?
Most Likely Estimate: Under normal conditions, how long might the activity take?
Pessimistic Estimate: If everything went poorly, how long might the activity take?
It is important to disregard any consideration of specific risk events (opportunities or threats) when considering task durations. For example, it is not appropriate to say that an optimistic duration for a task may be 0, as that is equivalent to saying that it might not have to be done at all. If this is the case, the task should be converted into a risk task and assigned a probability of existence. Similarly, it is not appropriate to provide an overly pessimistic estimate on the basis that something might change that fundamentally alters the nature of the task to be performed. This again is a risk event. All task duration uncertainties must be assessed on the basis of the known scope and the assumed conditions that underpinned the development of the schedule to begin with.
Distribution Types
When it comes to assigning duration uncertainty to a schedule, the distribution type selected can have a substantial impact on how the uncertainty is expressed in the model. The distribution type ultimately defines the way in which the Monte Carlo engine samples data around the limits specified. The types can be divided into Bounded and Unbounded distributions.
Bounded Distributions
These are shapes that have definite cutoff values at their extremities. This makes them easier to visualise but not necessarily the most realistic representations of real-life situations.
There are a few commonly used types of bounded distributions in PRA, SR and other MCM tools, including:
Triangle
The triangle distribution is perhaps the most commonly used type as it is the default shape and is simple to comprehend. Like most distribution types, it can be positively or negatively skewed, with the probability of sampling values at the extremities decreasing linearly from the highest probability (Most Likely) point. It is suitable for use where there is a definite Most Likely value and the distribution is not highly skewed.
Trigen
The trigen distribution is like triangular, but with the corners cut off. The optimistic and pessimistic limits are set at specified probability boundaries rather than zero probability. When a trigen limit is set, there is an x% chance that the distribution limit can be exceeded. Using trigen distributions, PRA automatically calculates the absolute boundaries (zero probability) for the distribution. The value for x is also controlled by the user and can be different for optimistic and pessimistic limits.
Good for use where inputs provided tend to be too narrow.
BetaPert
The BetaPert distribution is best described as a Normal-like distribution that allows for positive or negative ‘skewness’ (bias). Unlike the Normal distribution, which is symmetrical either side of the most likely value, the BetaPert distribution allows the most likely value to sit closer to either the optimistic or pessimistic value while preserving a smooth, ‘Normal’-like shape. The BetaPert shape has the least proportion of probability distributed to the skewed (further) extremity (a ‘thin tail’). It may be best suited to activities for which reliable performance data is available that is centrally clustered but highly skewed.
Unbounded Distributions
There are a couple of distributions that fall into this category which are relevant to CPM-based QRA.
Normal Distribution
The Normal distribution is symmetrical either side of the most likely value and also unbounded on both the low and high side. Although most probability distributions of duration and cost data are pessimistically skewed, the normal distribution is the most well-known and understood distribution. It is used to describe such symmetrical distributions for which empirical information is available to define the mean and standard deviation. These parameters are needed to define the normal distribution and are otherwise difficult for Subject Matter Experts to provide.
The unbounded low side can lead to invalid negative low values.
LogNormal Distribution
The LogNormal distribution of a variable (duration or cost, say) is the probability distribution for which the natural logarithm of the variable is normally distributed, symmetrical either side of the mean of the logs of the variable, with a standard deviation of the logs of the variable. The probability distribution of the variable itself is consequently skewed pessimistically. The distribution is bounded at 0 on the low side and unbounded on the high side, and it describes many skewed distributions of durations and costs in projects well, both as inputs and as the project cost and schedule outputs themselves.
It is difficult for SMEs to provide the required Standard Deviation to characterise the distribution. But it works well where there is empirical evidence to define the parameters.
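The following sketch shows how the distribution shapes discussed above can be sampled in Python (numpy). The 3-point values (8 / 10 / 15) and the Normal / LogNormal parameters are illustrative; the BetaPert sample uses the standard PERT construction via a scaled Beta distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000
low, mode, high = 8.0, 10.0, 15.0

# Bounded: triangular
tri = rng.triangular(low, mode, high, N)

# Bounded: BetaPert via a scaled Beta distribution (standard PERT shape parameters)
alpha = 1 + 4 * (mode - low) / (high - low)
beta = 1 + 4 * (high - mode) / (high - low)
pert = low + (high - low) * rng.beta(alpha, beta, N)

# Unbounded both sides: Normal (mean and standard deviation from empirical data)
norm = rng.normal(10.0, 1.5, N)

# Bounded at 0, unbounded high side: LogNormal (parameters are mean/sd of the log)
logn = rng.lognormal(np.log(10.0), 0.25, N)

for name, s in [("triangular", tri), ("BetaPert", pert), ("Normal", norm), ("LogNormal", logn)]:
    print(f"{name:10s} mean={s.mean():5.2f}  P90={np.percentile(s, 90):5.2f}")
```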
3.2.5 DURATION RISK FACTORS
Risk Factors are underlying causes of risk and uncertainty in projects that, if identified, quantified, and applied to project activities, can significantly increase ability to understand what drives project schedule risk.
Define Causes of Uncertainty
Duration risk factors apply a methodology that defines and assigns uncertainty to plan tasks through identification of common contributors (or ‘factors’) that affect probabilistic duration outcomes. Unlike normal three-point estimates of uncertainty, risk factors are better described as causal contributors to project uncertainty that affect groups of activities (or resources) within a model equally during a simulation. For example, whereas a collection of construction tasks may all have separate duration uncertainties as they are independent tasks, there are also likely to be common factors such as site productivity rates that influence their durations to a similar extent.
Define Correlation
The risk factors methodology has another significant benefit over traditional 3-point estimates in that it can take the guesswork out of defining an effective correlation model. One of the key features of the Monte Carlo Method is that it inherently assumes all elements in a model are independent. To override this invalid assumption for related tasks in a project schedule, a risk modeller is normally required to make educated guesses regarding the degree of relatedness between groups of tasks, then enter these values as correlation percentages against each of the applicable model elements in the group. In contrast, the risk factors methodology removes this guesswork from the analysis as it effectively defines the degree of relatedness between elements by the action of multiple risk factors on overlapping groups of activities.
The validity of this is dependent on all significant risk factors being identified and applied.
How do Risk Factors Work?
The following steps briefly outline how a risk factors methodology can be applied to a schedule risk model:
Stakeholders identify common sources of uncertainty within a model that have the potential to influence task duration outcomes. These are the risk factors. Their impacts may range from less than 100% of the deterministic value to more than 100%.
Characteristics of each risk factor are defined including:
Description,
Impact range distribution (3 point probability distribution and shape),
Probability of occurrence (may be 100% or less), and
Correlation between related risk factors.
Risk factors are then mapped to tasks and/or resource assignments.
When a risk analysis is run, the Risk Factors logic intercepts each iteration event and modifies each task duration and/or resource assignment value according to the net effect of the individual or multiple risk factors that have been applied.
The modified task / resource assignment information then provides the inputs for the scheduling engine before the results are calculated and committed as the final iteration data.
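A minimal sketch of this mechanism is shown below. The factor names, multiplier ranges and task mappings are invented for illustration; the point is that one multiplier is sampled per factor per iteration and applied to every mapped task, so mapped tasks move together (implicit correlation).

```python
import numpy as np

rng = np.random.default_rng(7)

base_durations = {"civils": 40.0, "steelwork": 25.0, "piping": 30.0}

# factor -> ((optimistic, most likely, pessimistic) multiplier, mapped tasks)
risk_factors = {
    "site_productivity": ((0.9, 1.0, 1.3), ["civils", "steelwork", "piping"]),
    "weather_allowance": ((0.95, 1.0, 1.15), ["civils", "steelwork"]),
}

def iterate_once():
    durations = dict(base_durations)
    for (lo, ml, hi), tasks in risk_factors.values():
        multiplier = rng.triangular(lo, ml, hi)  # one sample per factor per iteration
        for task in tasks:
            durations[task] *= multiplier        # applied equally to all mapped tasks
    return durations

print(iterate_once())
```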
3.2.6 MAPPING RISK EVENTS TO THE SCHEDULE
Mapping risk events to schedule tasks is a challenging aspect of the schedule risk analysis process, as it requires a detailed understanding of the nature of the risks and schedule tasks involved in the process. The integrity of any schedule risk model is dependent on the validity of the risk/task mappings, such that their impact is neither overstated nor understated. The following discussion focuses on PRA rather than SR, which works somewhat differently, although the mapping principles are exactly the same.
Things to Consider when Mapping Schedule Risks
When a schedule risk is mapped into a PRA schedule model, it is then referred to as a schedule risk-task. This is a task that has a probability of occurrence less than 100%, a deterministic duration of 0 days, but a probabilistic duration distribution greater than 0 days. By definition, a risk is not a risk if it has no impact on the objectives of the project. Therefore, probabilistic impact ranges including minimums of 0 days are not recommended.
The placement of risk-tasks within the probabilistic model is a significant determinant of their impact on the probabilistic schedule outcomes. Factors which influence the effect of a risk on the schedule model include:
The probability of the risk’s occurrence in any given iteration. If a risk is specified as 50% probable and the project model were to be simulated 1000 times, the risk will exist (occur with an impact from its distribution) in approximately 500 iterations. The more iterations in which a schedule risk-task is triggered, the more effect it will have on successor probabilistic completion dates.
The risk’s probabilistic duration distribution profile. Similarly to normal tasks with duration uncertainty, risk-tasks are usually assigned a duration distribution profile. This is the duration impact that the risk-task will have on its parent task’s successors should it occur. Risk-tasks with larger duration distributions produce larger changes in probabilistic schedule outcomes of successor tasks than those with smaller duration distributions.
The criticality of the task to which the risk is mapped. Risks must be applied in context. A risk with a high probability and high impact may have less probabilistic impact on project objectives than a low probability low impact risk if the task to which the former applies is rarely on the critical path while the latter affects a task frequently on the critical path.
The logic applied to predecessors and successors of the risk-task. The schedule logic into which the risk is mapped (to the parent task) is also important as it determines how the project-level risk behaves when applied at the activity level. If we assume that a threat risk-task is mapped to the end of its parent task, the successor logic of the parent task should be replicated on the risk-task. However, if the parent task has no driving (or near-driving) finish-successor logic, or only successor logic stemming from its start, the risk-task cannot actually impact on any successor task. As stated earlier (3.2.2 Schedule Preparation), each task in an SRA model should have at least one start-predecessor and at least one finish successor.
Series or Parallel Schedule Risk Event Assignments
As discussed earlier, a detailed understanding of the schedule and the nature of the risks in a model is important when performing risk-mappings. A risk event applied incorrectly within a schedule can have its influence understated or exaggerated depending on its context. How the risk event is defined, to a large extent, determines the way it should be mapped. Then the schedule logic determines the overall probabilistic impact on the project.
Consider the following Foundation Excavation Schedule and associated risk events:
Foundation Excavation Schedule - in Series
Series or Parallel Schedule Risk Events Mapping
Case A
“There is a risk that rock may be encountered during excavation for foundations on the project site, leading to project delays.” Preliminary geotechnical investigations of the site suggest a 30% probability. The presence of rock is estimated to increase the excavation time by an impact distribution range of 15% / 25% / 50% of planned duration.
Mapping this project level Risk A into the above Foundation Excavation Schedule requires that it be mapped to each of the three tasks, each with a 30% probability, but the impacts occur independently (that is, there is no existence correlation between the three risk-tasks). Each excavation task has a 30% probability of occurrence, with the impact distribution proportional to the duration of each task.
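A sketch of Case A follows, with assumed excavation durations for the three series tasks; the 30% probability and 15% / 25% / 50% impact distribution are as stated above, occurring independently for each task.

```python
import numpy as np

rng = np.random.default_rng(3)
excavation_days = {"S1": 12.0, "S2": 8.0, "S3": 10.0}  # assumed planned durations

def sample_case_a():
    total = 0.0
    for dur in excavation_days.values():                     # tasks are in series
        total += dur
        if rng.random() < 0.30:                              # independent 30% occurrence per task
            total += dur * rng.triangular(0.15, 0.25, 0.50)  # impact proportional to duration
    return total

results = np.array([sample_case_a() for _ in range(5000)])
print(f"P50={np.percentile(results, 50):.1f}d  P90={np.percentile(results, 90):.1f}d")
```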
Case B
“There is a risk of encountering rock during excavation for foundations in each of the three areas comprising the project site, leading to project delays.” Geotechnical investigations of each area of the site have produced the following probabilities of rock in each area:
Area S1: 50%; Area S2: 25%; Area S3: 5%
As for Case A, the presence of rock increases the estimated excavation time by an impact distribution range of 15% / 25% / 50% of planned duration.
In this case there are three separate risks, applicable independently to the three different areas with their different probabilities. However, each has the same impact delay range distribution of 15% / 25% / 50% of the parent task duration.
Case C
The excavation logic is changed to accelerate the work by doing all three area excavations in parallel, as shown in the diagram below:
Foundation Execution Schedule - in Parallel
Case C - Risk Events Mapped in Parallel
Where two or more activities affected by a risk are arranged in parallel, the impact may be applied 100% to each pathway, because the delay may occur whichever path is followed in the project.
By paralleling the parent tasks, the overall effect of the risks in Cases A and B applied to Case C will be lessened, with the largest combined task and risk uncertainty driving the model uncertainty.
If, instead of expressing the schedule impact as a percentage of each duration, the project level risk were expressed in terms of an overall impact range, such as “…causing a delay of 5d / 8d / 15d”, it would be necessary to apportion the impact range between the three excavation tasks.
These examples illustrate the importance of the wording of the risk and the logic in determining how a project risk is mapped at the activity level.
3.2.7 WEATHER MODELLING IN SCHEDULE RISK ANALYSIS
Almost every project that involves outdoor work will be subject to some kind of weather conditions that may dictate working and non-working periods for all or part of the workforce. In normal deterministic plans, this is usually accounted for by making an allowance for downtime in the relevant plan calendars. However, in reality, weather is often more complex and uncertain than this, and requires special probabilistic techniques to be able to model its potential effects appropriately.
The key principle to understand in modelling weather effects is that the weather is independent of the schedule activities, occurring as an uncertain but seasonally fixed backdrop, over which the schedule activities move. If the activities are delayed and change their seasonal timing, the SRA model must reflect the change in delay impact risk. The following discussion is based on the use of PRA, but the principles apply equally to SR, where probabilistic weather calendars are defined then applied as a special type of calendar risk factor.
Weather modelling can be broadly divided into three main categories:
Weather uncertainty refers to the variations in normal weather conditions. This is the variability / fluctuation of normal weather patterns within specified time periods. For example, in the month of May in a specific region, there may be an average 10 hours of downtime due to inclement weather. However, historical data may show that this could be as little as 5 hours, or as much as 30 hours. An impact probability distribution would be required to express that uncertainty.
Weather events behave somewhat like risk events and are assigned a probability of existence. Weather events typically refer to distinct events such as floods, fires and cyclones or hurricanes. These are events that may or may not occur, but if they do, may have a similar impact on productive downtime across large portions of a project plan. Similarly to risk events, weather events can be assigned optimistic, most-likely and pessimistic duration ranges, and can be applied selectively to tasks within a schedule risk model.
Weather windows refer to periods within a schedule model in which certain operations can or cannot be undertaken. Unlike weather events, weather windows have 100% probability of existence. Their uncertainty stems from when they will start and how long the period will last. Classic examples of weather windows are the opportunity to truck materials over non-permanent ice-roads, the ability to use rivers for barging goods or the period for which roads are impassable during the wet season in tropical areas.
The incorporation of weather modelling in schedule risk analyses adds significant benefits over the options for allowing for weather in normal deterministic schedules because of the non-linear range of outcomes possible as the schedule shifts seasonally. The principles of weather modelling can also be used to model other types of downtime-causing uncertainties which have seasonal patterns.
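As a sketch of weather uncertainty, the May example above can be modelled with a pessimistically skewed triangular distribution of monthly downtime hours added to the working time of tasks active in that month; the task work content is an assumption.

```python
import numpy as np

rng = np.random.default_rng(11)
N = 5000

# May downtime: most likely 10 hours, historically as little as 5 or as much as 30
may_downtime_hours = rng.triangular(5, 10, 30, N)

task_work_hours = 160.0  # assumed work content of a task active through May
task_elapsed_hours = task_work_hours + may_downtime_hours

print(f"mean downtime={may_downtime_hours.mean():.1f}h  "
      f"P90 elapsed={np.percentile(task_elapsed_hours, 90):.0f}h")
```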
3.2.8 PROBABILISTIC LINKS & BRANCHING
One of the advantages of schedule risk analysis is that it is capable of modelling uncertainty not only in terms of duration and risk events, but also logic. Logic uncertainty is an important aspect of schedule risk, but something that is often overlooked. It refers to the ability to set the probability that one or more pathways through a project plan will be followed. Probabilistic logic in PRA can be divided into two types: probabilistic links & probabilistic branching. In SR, probabilistic branching is also available, but the equivalent of probabilistic links is the corollary: the probability of an activity not existing, called "non-existence risks". The PRA types are discussed in the sections that follow.
Probabilistic Links
Deterministic plans force assumptions to be made regarding the workflow through a particular process, even if two tasks may or may not be linked. An example of this could be the link between an environmental approvals process and a construction task. In the early stages of the project, we might suspect that the approval is required before starting construction, but we can’t be entirely sure until we’ve learned more. This is where a probabilistic link could be useful. If the percentage probability that regulatory approval will be required to start construction can be estimated, then during modelling the link from the approval task to the start of construction will exist in only that percentage of the simulation iterations.
Probabilistic Branching
While probabilistic links are useful when it is uncertain whether two tasks are related, this does not help when more complex modelling is required of alternate pathways / execution strategies to the same objective.
Probabilistic branching is applicable where there are two or more mutually exclusive ways to accomplish some part of a project and the choice has not yet been made on which to use.
For example, probabilistic branching might be used to model the difference between modular and stick building in a site construction process. Or different contracting strategies may be expressed in probabilistic branching where it is unclear which strategy may be used.
Each branch is assigned a probability of existence, and the sum of the probabilities of all branches must total 100%. When a schedule risk analysis is run, one pathway (or branch) is selected in each iteration according to its probability of existence, including all successors in the branch; the other branches do not exist in that iteration.
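The sketch below illustrates branch selection during simulation for the bridge-versus-tunnel example, with assumed probabilities and triangular duration ranges; exactly one branch exists in each iteration.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 5000

branches = {
    "bridge": (0.60, (30, 36, 50)),  # probability, (opt, ml, pess) duration in weeks
    "tunnel": (0.40, (40, 48, 70)),
}

names = list(branches)
probs = [branches[n][0] for n in names]      # must sum to 1.0
choice = rng.choice(names, size=N, p=probs)  # one branch selected per iteration

duration = np.empty(N)
for name in names:
    lo, ml, hi = branches[name][1]
    mask = choice == name
    duration[mask] = rng.triangular(lo, ml, hi, mask.sum())

print(f"P50={np.percentile(duration, 50):.0f}w  P80={np.percentile(duration, 80):.0f}w")
```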
3.2.9 DURATION CORRELATION
Correlation is an important component of any schedule risk analysis model. It is the means of advising the Monte Carlo software of the degree to which tasks are to be treated as related. As noted earlier, an important inherent assumption of Monte Carlo analytical software is that every element in the model is completely independent of every other element. Thus, when a high value is selected from one range, there is nothing to stop a low value from being selected from another. This is not a problem for unrelated activities, but where works are related through a common denominator such as the same contractor, or the same physical locality, it is natural that poor performance at one task is likely to be at least partially reflected in another.
For this reason, it is necessary to incorporate correlation into any probabilistic model. This is increasingly important as the numbers of tasks in the model increase to many hundreds or even thousands of activities. In such situations, a statistical phenomenon known as the ‘Central Limit Theorem’ becomes particularly evident. In simple terms, this is the tendency for the results of increasingly large analyses to approximate a normal distribution and to be more and more tightly grouped around the mean value (lower variance/standard deviation, higher “kurtosis” or ‘peakedness’ of the distribution). This is because of the apparently ‘random’ selection of high values from some distributions and the counter-selection of low values from others, combining together across many iterations to result in very little overall variance either side of the mean.
To a great extent this problem is solved by use of the Risk Factors method in SR. Ranges are not applied directly to activities or groups of activities, but through underlying Risk Factors (RFs). Through the use of multiple, often overlapping RFs, correlation is automatically applied to the SRA model without correlation models being required. However, if related RFs are applied, any relatedness must be taken into consideration by applying correlation, whether weak, moderate or strong, between the RF impact distributions.
In PRA, by correlating related activities, we ensure that commonly themed or related packets of work are constrained to greater or lesser extents (depending on the percentage correlation) to be sampled from the same end of the spectrum of possible outcomes. This results in a greater net variance either side of the mean. A correlation percentage of 0% means no association between activities. Conversely, a correlation percentage of +100% forces complete proportionality in the sampling of distributions between activities in a linear fashion. In rare situations negative correlation may apply, where a higher duration being sampled from one activity trends towards a lower duration being sampled in another.
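One common way such correlation can be induced in Monte Carlo sampling (a generic Gaussian-copula sketch, not necessarily the internal algorithm of PRA or SR) is to generate correlated uniform samples and feed them through each task’s own distribution; the 0.7 correlation level and duration ranges below are assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
N = 5000
rho = 0.7  # assumed correlation between the two tasks

cov = [[1.0, rho], [rho, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=N)  # correlated standard normals
u = stats.norm.cdf(z)                                 # correlated uniforms in (0, 1)

# Inverse-CDF sampling of each task's triangular distribution (low / mode / high)
task1 = stats.triang.ppf(u[:, 0], c=(10 - 8) / (15 - 8), loc=8, scale=15 - 8)
task2 = stats.triang.ppf(u[:, 1], c=(20 - 18) / (30 - 18), loc=18, scale=30 - 18)

print(f"achieved rank correlation: {stats.spearmanr(task1, task2)[0]:.2f}")
```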
The challenges with correlation are:
To determine the groupings in the model to which correlation should be applied, and
To identify the levels of correlation applicable in the absence of data.
Correlation and the Merge Bias Effect (MBE) are factors to be balanced in development of realistic SRA models. The MBE causes larger and more complex schedule models to be more realistic and preferable to smaller more summarised models. But the larger and more complex the SRA model becomes, the more important realistic correlation becomes and the more challenging it is to achieve it.
3.2.10 SCHEDULE MONTE CARLO SIMULATION
Monte Carlo Simulation (MCS) is a mathematical technique that uses repeated random sampling within specified distributions to calculate the probability of defined outcomes. The principle of the method is that by simulating a process many times using ranged parameters before doing something in actuality, a mathematically based prediction of how the real process may eventuate can be calculated. The method was invented in the Second World War to simulate nuclear events during the Manhattan Project to develop the atomic bomb and has been adapted to an increasingly widespread range of applications since.
As applied to schedule risk analysis, MCS involves the random sampling of uncertainties within a project schedule. As identified earlier, there are four main types of uncertainty in schedule risk analysis:
Duration Uncertainty
Risk Events
Logic Uncertainty
Calendar Uncertainty
For each of these elements, and against each item in the model, uncertainties are randomly sampled for duration and / or probability. Normal forward and backward pass critical path method calculations are then made, and the resultant early and late start and finish dates for each of the tasks / milestones in the schedule are recorded, as are the task durations. After this process has been repeated many hundreds or thousands of times, the results are collected, ready for interpretation. These are discussed below in 3.2.11 Interpretation of Schedule Analysis Results.
How many simulations / iterations are required?
It is a general statistical principle that the more times data is sampled, the more meaningful the results are expected to be. For example, if we were to ask a random person on the street if they liked ice cream, it wouldn’t be appropriate to infer from their answer that the rest of the population felt the same way. However, the more people we asked (especially if they were from a wide range of demographics), the more statistically significant our observations should become.
The same principle applies to SRA. Simulating a project only a handful of times will likely produce wildly varying results with little statistical significance. However, as the number of simulations performed is increased, the precision of analysis results will also increase.
This is especially true of schedules that contain many low probability, high impact risks. It is statistically unlikely that more than one of these risks will occur at once unless many simulations are run, but where this does happen, it is likely to have a substantial effect on the outcome of that iteration and influence the overall results more significantly.
Sensitivity analyses are sometimes made to apply the full impact of more than one high impact low probability risk simultaneously to a project model to assess the resilience of the project.
It is often felt that the major threats to large scale complex projects come from such low probability high impact risks.
There is no clear-cut answer as to how many simulations may be required to obtain statistically significant results from an analysis. But the generalisation can be made that the central properties of a distribution (mean, standard deviation, P50) do not change much between a few hundred iterations and several thousand. What do change are the properties at the “tails” of the distribution and, particularly for large projects, the conservative end of the analysis (P80/P90) is of great interest for sizing schedule (and cost) contingency. The inherent uncertainty in Monte Carlo analyses is such that if low probability risks are being analysed, higher numbers of iterations are desirable, say around 5,000, to reduce the percentage uncertainty of the modelling (often masked by using a fixed ‘seed’ (see below) to start the iterations so that consistent results are obtained). The inherent uncertainty of MCM modelling decreases as the number of iterations increases (in inverse proportion to the square root of the iteration count) and may be less than 0.5% at around 5,000 iterations.
The problem with this is the practical limit of time of analysis: large and complex projects have large SRA models which take many minutes and sometimes hours to analyse, particularly where probabilistic weather modelling is included. So compromises may have to be made.
Some Monte Carlo schedule simulation tools have features that allow simulation to continue until the mean results move by less than a specified threshold. The point at which further iterations no longer make a significant difference to the results is often referred to as ‘convergence’. By analysing until convergence is maintained over a number of iterations, we can be relatively confident that continued analysis will add little to the validity of the results we observe, provided we are not overly concerned about the “tails”.
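A sketch of one possible convergence test follows, using a single assumed triangular distribution as a stand-in for a full schedule model; the batch size, threshold and stability requirement are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(17)
BATCH, THRESHOLD, STABLE_BATCHES = 250, 0.05, 4  # days of allowed mean movement

samples = np.array([])
stable, prev_mean = 0, None
while stable < STABLE_BATCHES:
    samples = np.append(samples, rng.triangular(90, 110, 160, BATCH))
    mean = samples.mean()
    # count consecutive batches in which the running mean barely moves
    stable = stable + 1 if prev_mean is not None and abs(mean - prev_mean) < THRESHOLD else 0
    prev_mean = mean

print(f"converged after {samples.size} iterations, mean={samples.mean():.1f}d")
```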
A useful compromise may be to use fewer iterations until the model is finalised and then do the final results using the recommended full number of iterations.
What is an analysis ‘seed’ & why do we use it?
Running a different number of simulations will produce different results. Less obvious is why running two different analyses on the same plan, using the same number of iterations in each, might produce two different answers. This occurs because Monte Carlo in its purest form is completely random in its sampling. Therefore, if two identical analyses are run, it’s unlikely that exactly the same result will be produced. Although perfectly statistically valid, there are two problems with this approach:
Firstly, it can be confusing to users attempting to interpret the results; and
Secondly, the computational processes required to produce and interpret useful results from such truly random sampling are quite resource intensive, and require significant processing time.
However, Monte Carlo schedule risk analysis tools may allow the user to control the use of something referred to as a ‘seed’. The seed is the starting point for the pseudo-random number sequence that drives the sampling process; fixing it increases analysis performance and ensures that the same plan modelled twice in the same way will always produce the same results. All subsequent values are generated pseudo-randomly, so although the simulation now follows the same sampling pattern, it is still, in effect, a random process. But the apparent randomness of the results is significantly reduced.
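The reproducibility that a fixed seed provides can be demonstrated with numpy’s seeded generator standing in for a risk tool’s sampling engine:

```python
import numpy as np

def run(seed, n=1000):
    rng = np.random.default_rng(seed)
    return np.percentile(rng.triangular(90, 110, 160, n), 80)

print(run(42) == run(42))  # True: same seed, identical results
print(run(42), run(43))    # slightly different P80s from different seeds
```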
Tests run by RIMPL have shown that the variation in analysis results caused by changing the seed depends on the number of iterations:
For 1,000 iterations of a schedule model with, say, 1,000 activities, the percentage variation between analyses through seed changing may be 1-1.5%.
If the number of iterations is increased to 5,000, the percentage variation between analyses through seed changing may drop to 0.5%.
It is important to note that any change in a model effectively changes the way that the seed interacts with the model, and therefore is tantamount to changing the seed itself. It is only by ensuring that a sufficient number of iterations have been performed in each analysis that the apparent ‘noise’ in the analysis results associated with this seed change will be minimised. This is important if the effects of risks are being measured by difference. If the probabilistic effect of a risk is small, its effect may be exceeded by the randomness introduced by the effective seed change effect of removing the small risk. This can explain anomalous results for low probabilistic impact risks from SRA modelling.
3.2.11 INTERPRETATION OF SCHEDULE ANALYSIS RESULTS
As stated earlier, schedule risk analysis results are derived from date and duration information collected across many hundreds or thousands of simulations of a risk-loaded schedule. When interpreting this data however, it is important that it can be conveyed in a simple and meaningful form. The commonly accepted means of presenting MCS results uses a histogram overlaid with a cumulative curve to display percentile data. An example plan finish date histogram is shown below:
Histogram & Cumulative Curve Data
Histogram Data
In a histogram, data is grouped into collectors called ‘bins’ across a particular dimension (e.g., date, duration). The frequency with which the data set falls into these collectors is represented as height on the vertical axis (shown on the left axis in the example above as ‘Hits’ [iteration results]). By structuring the data in this way, a visual representation of the clustering of results along the measured dimension (finish dates in this case) is displayed.
In the example above, the finish date of the entire plan has been used as the bin parameter along the horizontal axis. The dimension is always the same as the metric to be reported.
In simple terms, the above date histogram shows the results for all iterations (‘Hits’) of a Monte Carlo SRA for the Finish Date of the Entire Plan. The results are plotted from the earliest date of an iteration (29Jan14) to the latest (15Apr14). The height of each bar represents the number of hits that fell within the date range represented by the width of the bar. The highest bar records that 49 hits occurred on 16Feb14.
Bin sizes are a flexible variable, and in the case of schedule risk analysis, information by the hour, day, week, or month etc may be reported. It is important that the bins are sized large enough to allow for adequate visual representation of trends, but not so large that they hide important information about the model. As an example, if finish date information is collated by month, then a plan might show a marked drop off in frequency in December which could cause some confusion. However, collating the same information by week might reveal that the drop-off is actually caused by no hits in the last week of the month where the holiday calendar downtime occurs.
Cumulative Curve Data
The cumulative curve adds up the number of hits in each bar progressively so that it represents the number of iterations finishing up to a particular date. In effect, for any date on the horizontal axis, the corresponding point on the curve, read against the vertical axis, represents the percentage of iterations up to that date, or the probability of the Entire Plan finishing on or before that date. So the highest bar also corresponds to the percentage of iterations up to 16Feb14, which the vertical axis intercept tells us is the 50% point.
Schedule Risk Analysis results usually refer to ‘P-values’. These are the percentile values. For example, in the output diagram shown above, the P90 finish date value of 13-Mar-2014 indicates 90% confidence that the completion date will be on or before 13-Mar-2014.
The earliest and latest dates are usually of less interest than the P10, P50 and P90 (or P20, Pmean and P80). The intercepts used by organisations vary according to their risk policies, “risk tolerance” or “risk appetite”.
Also showing on the cumulative curve is the Planned or Deterministic Date for the Entire Plan Finish Date. In this case, it is on or just after the earliest date on the chart and has a probability of less than 1%. This makes it clear that the planned date for the project is highly unlikely to be achieved.
In summary, the date histogram reveals a lot more about the feasibility of the project than the deterministic plan can. It identifies the range of possible date outcomes, how likely the planned date is to be achieved (very unlikely), what is an aggressive date (say 11Feb14, a 30% probable date or P30), what is a likely date (say 16Feb14, the P50) and what is a conservative date (say 13Mar14, the P90).
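To show how such percentile figures are derived from the raw iteration data, here is a short sketch; finish dates are held as day offsets from an assumed data date, and the lognormal spread is an assumption standing in for real analysis output.

```python
import numpy as np

rng = np.random.default_rng(19)
finish_offsets = rng.lognormal(np.log(45), 0.2, 5000)  # days after the data date

hits, bin_edges = np.histogram(finish_offsets, bins=30)  # the histogram bars
cumulative = np.cumsum(hits) / hits.sum()                # the cumulative curve (0-1)

for p in (10, 30, 50, 90):
    print(f"P{p}: day {np.percentile(finish_offsets, p):.0f}")
```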
Positively Skewed & Negatively Skewed Distributions
Skewness & Kurtosis
Skewness
One of the reasons that it is important to be able to clearly visually represent the distribution of data in the histogram is that it gives us an indication of Skewness.
Skewness refers to the asymmetry of data around a central point, and indicates a bias in the trending of results. Data can be positively skewed, negatively skewed, or without skew. A positive skew indicates a longer tail after the central measure, whereas a negative skew indicates a longer tail before the central point.
In SRA, Skewness is important as it indicates the bias toward optimism or pessimism relative to the mean. If data is positively skewed, the P90 dates will be further from the mean than the P10 dates, and the skew or bias is pessimistic. It also means that Pmean (the mean of the distribution) will be greater than P50 (the median of the distribution).
Conversely, if data is negatively skewed, the P90 dates will be closer to the mean than the P10 dates and the skew is optimistic. Pmean will be less than P50.
Understanding the potential for movement of schedule dates towards either extreme of the analysis is important in understanding overall risk exposure.
Kurtosis
In addition to the Skewness of a distribution, it is important to understand the overall Kurtosis or ‘peakedness’ of results. The kurtosis describes the narrowness of the distribution or the extent to which results are tightly clustered around the mean.
In SRA, it is important to understand these parameters as they can help establish or challenge the credibility of the model. Very ‘peaky’ results clustered tightly around the mean are likely to be caused by narrow ranging of activities on and around the critical path, by deficiencies in the correlation model, or both.
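Note that kurtosis, as usually computed, is a scale-free measure, so it captures the shape of the clustering rather than the absolute width of the distribution. The following Python sketch (fabricated samples) computes excess kurtosis under the Fisher convention, where a normal distribution scores 0, a flat distribution scores below 0, and a sharply peaked, heavy-tailed one scores above 0:

    import numpy as np
    from scipy.stats import kurtosis

    rng = np.random.default_rng(7)
    n = 100_000

    # Fisher definition: a normal distribution has excess kurtosis of 0
    for name, sample in (("uniform (flat)", rng.uniform(90, 110, n)),
                         ("normal", rng.normal(100, 5, n)),
                         ("laplace (peaky)", rng.laplace(100, 5, n))):
        print(f"{name:16s} excess kurtosis = {kurtosis(sample, fisher=True):+.2f}")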
3.2.12 EXAMINATION OF SCHEDULE DRIVERS
Apart from the ability to assess schedule contingency, one of the key benefits of SRA is that it enables the drivers of uncertainty within a model to be assessed and ranked. This is important and valuable because it identifies the risks and tasks to be focused on to reduce the riskiness of the schedule most effectively.
There are two main techniques used to identify and rank the uncertainty drivers:
1. Sensitivities (derived by correlation output analysis); and
2. Quantitative Exclusion Analysis (using MCS of the SRA model with and without each uncertainty driver).
It is in this area that SR has a significant advantage over PRA, as will become apparent in the explanations that follow.
Dealing with Sensitivities first:
What are sensitivities?
Sensitivities are measures of correlation or dependence between two random variables or sets of data. Correlation measures the tendency for the behaviour of one variable (the independent variable, IV) to act as a predictor of the behaviour of another (the dependent variable, DV) in terms of some quantifiable measure. Sensitivities range from +1 (perfect direct or positive dependency), through 0 (no relationship), to -1 (perfect inverse or negative dependency). Positive values represent a positive relationship between the IV and the DV, and negative values a negative relationship. Sensitivities approaching 0 indicate progressively weaker associations, with a sensitivity of 0 indicating no statistically measurable relationship between the behaviours of the two variables. It is important to be aware that correlation does not measure causality: a high sensitivity may suggest causality, but it does not guarantee it, as discussed further below.
What is the use of Sensitivities?
Sensitivities are used in quantitative risk analysis to assess the effect of one element in a model on another element, or on a key measure within the model as a whole. Measuring the degree of relatedness between variables helps to identify and rank key elements within a model that may be positively or negatively affecting measured outcomes so that scarce project management resources may be focused on improving project outcomes most efficiently.
Duration sensitivity
SRA driver assessment measures the relatedness between the changes in each task’s duration from iteration to iteration and the change in overall project duration. This is known as Duration Sensitivity. Duration sensitivity is not limited to measuring impact at the project level; it can also be measured against any task, summary, or milestone within the plan. Similarly, duration sensitivity can be measured from a summary task rather than an individual task, enabling, for example, the duration sensitivity of all mechanical construction tasks against the overall finish date of the project to be measured.
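Tools differ in exactly how they compute duration sensitivity; the following Python sketch uses Spearman rank correlation on a fabricated three-task model (two tasks in series, one in parallel) purely to illustrate the principle:

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    n = 10_000

    # Fabricated model: tasks A and B run in series; C runs in parallel to both
    a = rng.triangular(8, 10, 15, n)
    b = rng.triangular(18, 20, 30, n)
    c = rng.triangular(5, 25, 45, n)      # wide-ranging parallel task
    finish = np.maximum(a + b, c)         # project finish per iteration

    for name, dur in (("A", a), ("B", b), ("C", c)):
        rho, _ = spearmanr(dur, finish)
        print(f"Duration sensitivity of {name}: {rho:+.2f}")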
The problem with duration sensitivity (Criticality & Cruciality)
Unfortunately, duration sensitivity has one fundamental flaw: it doesn’t measure whether the independent variable is actually driving the dependent variable, only the correlation between its duration and the start or finish date of the observed target. In schedule risk analysis, therefore, a hammock task spanning from the start of the project to the end would measure a perfect (100%) duration sensitivity against the finish date of the project, because whenever the finish date extends, so too does the hammock task, creating a perfect correlation between the two observed elements. In reality, the hammock isn’t a determinant of the completion of the project at all, but a product of it. Similarly, a predecessor with very high float can have high duration sensitivity yet never drive the project end date, because it is never on the critical path.
To assess whether a task is actually a determinant of the date of the dependent variable, a metric to measure the potential for the task to drive the date must be added.
That metric is called Criticality, which measures the percentage of iterations within a simulation in which a task was on the critical path. The higher the Criticality, the more frequently the task was involved in the calculation of the completion date for the selected target. Multiplying Criticality and Duration Sensitivity together gives a criticality-moderated duration sensitivity metric known as Cruciality.
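Continuing the fabricated three-task model above, the following sketch illustrates the arithmetic of criticality and cruciality (an illustration of the concept, not any tool’s exact algorithm):

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    n = 10_000
    a = rng.triangular(8, 10, 15, n)
    b = rng.triangular(18, 20, 30, n)
    c = rng.triangular(5, 25, 45, n)
    path_ab, path_c = a + b, c
    finish = np.maximum(path_ab, path_c)

    # Criticality: fraction of iterations in which the task's path drove the finish
    crit = {"A": (path_ab >= path_c).mean(),
            "B": (path_ab >= path_c).mean(),
            "C": (path_c > path_ab).mean()}

    for name, dur in (("A", a), ("B", b), ("C", c)):
        sens, _ = spearmanr(dur, finish)
        print(f"{name}: criticality = {crit[name]:.0%}, "
              f"cruciality = {crit[name] * sens:+.2f}")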
While Cruciality is a powerful indicator of the relative importance of tasks as drivers of a target milestone or task within a schedule model, it only works in this way if the critical path(s) that directly drive the target milestone can be isolated. The presence of alternate critical paths unrelated to the completion of the dependent variable may confuse the criticality results and the consequent cruciality metric, even if only dependent predecessors of the target milestone have been selected, because other critical paths can cross over the paths to the target milestone on their way to non-target tasks.
Correlation and causation
When dealing with issues of correlation, it is easy to infer causation where none exists. This is especially true when dealing with large data sets from quantitative models such as a schedule, where complex interactions commonly exist between the elements in the model. The issue is whether the changes in the IV are actually causing the perceived changes in the DV, or whether the two are related through a third variable. In statistics, this third variable is known as an ‘undeclared independent variable’ (more commonly, a confounding or ‘lurking’ variable), and it can significantly alter the calculated values of any type of sensitivity.
In SRA, perhaps the most frequent example of an undeclared independent variable is the presence of input duration correlation between tasks. These correlations form an integral part of the model in that they counter the narrowing effect predicted by the Central Limit Theorem and prevent unrealistically narrow or ‘peaky’ distributions. However, when looking at sensitivity calculations, the correlation between the sampling of one task’s duration and another’s represents an undeclared and uncontrolled variable, which may modify, and can invalidate, the sensitivity result.
As mentioned earlier, sensitivity calculations do not measure the strength of the relationship between two variables, but the similarity in rates of change between them. Thus, if a very small duration distribution and a very large duration distribution are 100% correlated via the duration correlation model, this acts as an undeclared independent variable, and the sensitivity for the smaller duration distribution will be calculated as equal to that of the larger distribution when measured for influence on total start or finish date variability.
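This effect is easy to reproduce. In the sketch below (fabricated numbers), a task with a tiny duration range is 100% correlated with a task with a large range; the small task’s measured sensitivity matches the large task’s despite contributing almost no variability of its own:

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(3)
    n = 10_000

    # One underlying random draw drives both tasks (100% input correlation)
    u = rng.random(n)

    big = 100 + 100 * u     # large duration range: 100-200 days
    small = 10 + 1 * u      # small duration range: 10-11 days
    total = big + small

    print(f"big  : {spearmanr(big, total)[0]:+.2f}")    # ~ +1.00
    print(f"small: {spearmanr(small, total)[0]:+.2f}")  # also ~ +1.00, despite tiny range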
Therefore, while sensitivity calculations are useful measures for gaining some insight into the drivers of measured outcomes in the model, they are inherently flawed in that we can never fully control for undeclared independent variables.
In summary, sensitivity rankings cannot be adjusted for the effects of applied correlation groupings. Another more reliable means of ranking is needed. This brings us to the most effective way of quantifying and ranking drivers of risk.
Quantitative Exclusion Analysis (QEA)
This is a technique which removes each source of uncertainty from the Monte Carlo model in turn, re-runs the simulation, and reports the differences at chosen P-levels. This can be done for individual risk events or activities, for groups of activities or Risk Factors, or for entire classes of uncertainty. The advantage of this approach is that it is unaffected by correlation and produces reliable, quantified differences. The differences must be reported with the P-values at which they were measured, because they usually change with the level of probability.
This can be done manually using PRA, but is automated in SR, which is an enormous benefit.
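The following Python sketch illustrates the QEA principle on a fabricated one-risk model, using common random numbers so that the with/without runs differ only in the risk being tested; real tools automate this across every risk and activity:

    import numpy as np

    N = 20_000

    def simulate(include_risk: bool, seed: int = 5) -> np.ndarray:
        rng = np.random.default_rng(seed)       # common random numbers
        base = rng.triangular(90, 100, 130, N)  # base schedule uncertainty (days)
        if include_risk:
            # fabricated 30%-probable risk event adding 10-40 days when it fires
            fires = rng.random(N) < 0.30
            base = base + fires * rng.triangular(10, 20, 40, N)
        return base

    with_risk, without_risk = simulate(True), simulate(False)

    for p in (50, 80, 90):
        delta = np.percentile(with_risk, p) - np.percentile(without_risk, p)
        print(f"P{p} contribution of the risk event: {delta:+.1f} days")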
Confusingly, Safran call QEA "Sensitivity Analysis", but it has nothing to do with correlation. It is an automated process that quantifies the duration differences for a given P value on the entire project, or any key milestone or section of a project. The following QEA Tornado diagram quantifies the duration risk drivers of a Tollway Construction project finish date milestone at Pmean probability. It ranks the largest contributors to completion uncertainty, enabling the project team to focus their efforts on mitigating schedule risk where their efforts can be expected to be most fruitful.
Note that SR enables the user to produce the same tornado diagram for any pre-defined focus percentile values. This enables analysis at whatever P-levels are required by the project team to support contingency or other analysis and reporting.
Note that the bottom Risk Factor for Commissioning duration uncertainty is negative because the RF Impact Distribution has been assigned a negative skew, as explained in the screen dump below the QEA Tornado Diagram.
Quantitative Exclusion Analysis (QEA) of a Safran Risk Tollway Construction Project at Pmean
(Note the negative 7d Commissioning Risk Factor due to negatively skewed duration distribution)
Commissioning Risk Factor - assigned negatively skewed Trigen Impact Distribution,
showing mappings and notes explaining reasons for choice of shape of distribution.
4 ICSRA USING A CPM-BASED MCS TOOL
4.1 BASIC CONCEPTS
4.1.1 WHAT IS INTEGRATED COST & SCHEDULE RISK ANALYSIS?
Integrated Cost & Schedule Risk Analysis (IRA) is a quantitative Monte Carlo analysis technique for assessing probabilistic cost and schedule outcomes. Unlike traditional separate cost and schedule risk analysis techniques, in IRA, the cost risk analysis is directly incorporated with the schedule risk analysis, such that both elements are analysed concurrently in a single model. Costs are divided into time independent (eg. materials) and time dependent (eg. labour), then allocated to the schedule tasks to which they relate. During analysis, as task remaining durations change, so too do the calculated values of the time dependent cost components.
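The core mechanic can be sketched in a few lines of Python (a fabricated single-task model; all rates and ranges are assumptions):

    import numpy as np

    rng = np.random.default_rng(11)
    n = 10_000

    # Sampled task duration (days) and its cost build-up per iteration
    duration = rng.triangular(80, 100, 150, n)
    time_independent = 2_000_000                             # e.g. materials, fixed
    daily_rate = rng.triangular(40_000, 50_000, 70_000, n)   # e.g. labour, per day
    cost = time_independent + daily_rate * duration          # cost moves with duration

    print(f"Schedule P50/P90 (days): {np.percentile(duration, [50, 90]).round(0)}")
    print(f"Cost     P50/P90 ($)   : {np.percentile(cost, [50, 90]).round(-5)}")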
Using IRA, we can gain many valuable insights into a project and its probabilistic behaviours. These insights include, but are not limited to:
How likely is it that the deterministic schedule and estimate will be achieved?
What are appropriate levels of cost and time contingency in order to be x% confident of achieving project objectives within budgeted allowances?
Which risks / uncertainties are the biggest determinants of project cost and schedule outcomes?
What is the likely time and/or cost consequence of failing to adequately treat a known risk?
Which execution option will likely result in the optimum balance between performance on cost and schedule objectives?
4.1.2 THE EVOLUTION OF THE IRA METHODOLOGY
Estimators and cost controllers have traditionally come from the ranks of quantity surveyors and cost engineers, while project planners and schedulers have tended to emerge from the ranks of design and construction engineers. The first group is concerned with counting and quantities while the second group is more concerned with how things are put together.
This has carried through to the application of Monte Carlo simulation: Cost Risk Analysis (CRA) tends to be performed by practitioners with estimating or cost control backgrounds, using spreadsheet based tools like @Risk and Crystal Ball. Conversely, project planners who have transitioned into schedule risk analysis (SRA) tend to use Monte Carlo simulation tools built on project planning software such as Primavera Risk Analysis (PRA, formerly Pertmaster) or, more recently, Safran Risk (SR, built on planning software application Safran Project, SP).
Before the widespread adoption of the IRA approach, where separate cost and schedule risk analyses had been conducted, the outputs of an SRA would be fed into a CRA as a schedule risk allowance, using an assumed cost ‘burn rate’ over the contingency period, as shown in the graphic below.
Traditional way of combining Schedule & Cost Risk Analyses
Limitations of Traditional Approach to combining Schedule and Cost Risk Analyses
There are three clear issues with this approach which we will identify as the ‘when’, ‘where’, and ‘why’ of how traditional approaches to combining cost and schedule risk fail to accurately characterise the true project uncertainty. To demonstrate this point, consider the following scenario:
“A schedule risk analysis reveals that an additional 30 days of contingency over and above the planned duration are required to be 90% confident of achieving project completion on time. It also reveals that the bulk of the duration uncertainty for the project is distributed across its construction phase. The results of the schedule risk analysis are then input to the cost risk analysis at an assumed cash burn rate of $1 million a day, based on the conservative assumption of peak construction manning levels.”
When:
The first issue with combining separate cost and schedule risk analyses lies in the assumed cash burn rate per day. In the example above, because the bulk of the duration uncertainty was identified as coming from the construction phase, the assumed cash burn rate was calculated from peak construction manning levels. This assumption is overly pessimistic, as only a proportion of the construction period will actually run at peak manning. What if the delay occurred before all contractors had been mobilised to site? The capital cost impact of the delay would be significantly reduced. Similarly, critical path delays affecting pre-execution engineering or approvals would have drastically different cost impact profiles. The calculated cost of delay clearly depends on when the delay occurs.
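The point is easily quantified. In the fabricated Python sketch below (all rates and phase weightings are assumptions), pricing every 30-day delay at the peak construction burn rate overstates the expected cost of delay substantially:

    import numpy as np

    rng = np.random.default_rng(13)
    n = 10_000

    # Where the 30-day delay lands, with assumed daily burn rates ($/day)
    phase_rates = {"approvals": 100_000, "engineering": 300_000, "construction": 1_000_000}
    phase = rng.choice(list(phase_rates), size=n, p=[0.2, 0.3, 0.5])
    delay_cost = np.vectorize(phase_rates.get)(phase) * 30

    print(f"Mean delay cost:       ${delay_cost.mean():,.0f}")
    print(f"Peak-rate assumption:  ${1_000_000 * 30:,.0f}  (overstated)")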
Where:
The second issue relates to where in the program a delay occurs. A schedule will typically consist of multiple parallel paths of tasks that ultimately converge on one completion milestone (either directly or through other connected tasks). Some of these paths will be dominant in determining the completion date of the project, occurring frequently on the critical path, whereas others will not. It is entirely possible for a chain of tasks to be significantly delayed yet never impact the overall project critical path. Even though such tasks do not affect the end date of the project, prolongation costs associated with the delays will still be incurred, due to longer use of hired equipment, labour, and so on. Calculating the cost of a schedule allowance based only on delay to project completion fails to account for these non-critical delay cost uncertainties.
Why:
The third issue with the traditional approach concerns why a particular answer was produced: what was driving it, and the assumptions and methodology by which it was derived. Separate cost and schedule risk analyses will almost always rest on different input assumptions. A cost risk analysis that draws on the result of a schedule risk analysis must take account of the assumptions underlying the schedule answers if it is to portray the forecast cost of delay over-runs accurately. Further, because the schedule is analysed separately from cost, visibility of individual schedule elements as drivers of delay cost is lost. We can attribute a certain amount of cost contingency to schedule, but why it is required, what drives it, and how it was derived are very difficult to express through such a methodology.