
What is ITIL availability management?

by The Office Bantomime Team

As ITIL puts it: the purpose of the availability management practice is to ensure that services deliver agreed levels of availability to meet the needs of customers and users.

Definition: Availability is the ability of an IT service or other configuration item to perform its agreed function when required.

Availability management is increasingly crucial because customers depend on their services not being disrupted. When a disruption does occur, downtime should be kept to a minimum.

As a function, availability management needs to negotiate and agree achievable availability targets, and to ensure that applications and infrastructure are designed to deliver those levels of availability. It also needs to ensure that services and components collect the data required to measure availability, so that availability can be monitored, reported on, analysed and improved through planned work.

What are MTBF and MTRS in ITIL?

To sum up availability management in the simplest terms, the availability of a customer's service depends on how often the service fails and how quickly it is restored after a failure. In the world of ITIL, these are expressed as:

(MTBF) Mean Time Between Failures

MTBF is a measurement of how often the service fails.

As an example, a service with an MTBF of one month fails, on average, 12 times each year.
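As a rough illustration of that arithmetic, the sketch below converts an MTBF into an expected number of failures per year. The function name and the 365-day year are assumptions made for this example, not ITIL definitions.

```python
def failures_per_year(mtbf_days: float, days_per_year: float = 365.0) -> float:
    """Expected number of failures per year for a given MTBF (in days)."""
    if mtbf_days <= 0:
        raise ValueError("mtbf_days must be positive")
    return days_per_year / mtbf_days


# An MTBF of one month, taken here as one twelfth of a year.
print(round(failures_per_year(365.0 / 12), 1))  # 12.0
# An MTBF of four weeks.
print(round(failures_per_year(28), 1))  # 13.0
```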

(MTRS) Mean Time To Restore Service

MTRS is a measurement of how quickly the service is restored after a failure.

As an example, a service with an MTRS of 2 hours will, on average, be fully restored 2 hours after a failure. This does not mean that every failure is resolved within that time frame. MTRS is an average taken over all the incidents that disrupted service availability during the measurement period.
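One way to make these two averages concrete is to derive them from an incident log. The sketch below is a minimal example under stated assumptions: the function name `mean_times` and the record layout (pairs of failure and restoration timestamps) are hypothetical, MTBF is taken as total uptime divided by the number of failures, and MTRS as total downtime divided by the number of failures. Organisations may define the measurement period and both averages differently.

```python
from datetime import datetime, timedelta


def mean_times(incidents, period_start, period_end):
    """Return (mtbf, mtrs) as timedeltas.

    incidents: list of (failed_at, restored_at) datetime pairs,
    all assumed to fall inside the period and not to overlap.
    """
    if not incidents:
        raise ValueError("at least one incident is needed to compute averages")
    downtime = sum(
        (restored - failed for failed, restored in incidents), timedelta()
    )
    uptime = (period_end - period_start) - downtime
    count = len(incidents)
    return uptime / count, downtime / count


# Hypothetical 30-day period with a 1-hour and a 3-hour outage.
incidents = [
    (datetime(2024, 6, 5, 9, 0), datetime(2024, 6, 5, 10, 0)),
    (datetime(2024, 6, 20, 14, 0), datetime(2024, 6, 20, 17, 0)),
]
mtbf, mtrs = mean_times(incidents, datetime(2024, 6, 1), datetime(2024, 7, 1))
print(mtbf)  # 14 days, 22:00:00
print(mtrs)  # 2:00:00
```

The 2-hour MTRS here is the average of a 1-hour and a 3-hour restoration, which is why an MTRS figure says nothing certain about how long any single incident will take to restore.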

Invest in better technology and processes

Organisations should aim to optimise MTRS by design, so that impacted services can be fully restored quickly and efficiently without compromising the customer's operation. This means investing in better solutions so that systems recover automatically and quickly, with very little service impact.

Metrics and reporting

Some customers rely on a service being permanently available, and even a brief disruption can be disastrous. It is therefore important for the business to understand how availability is defined for each particular service. Agreeing this definition with the customer and the user makes it easier to put metrics, reporting and dashboards in place. Many organisations calculate a percentage availability based on MTBF and MTRS, but these percentage figures rarely match customers' experience and expectations and are not appropriate for most services.
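For reference, the percentage calculation commonly used is availability = MTBF / (MTBF + MTRS). The sketch below is illustrative only: the function name is an assumption, and MTBF is taken here as the average uptime between failures. It also shows one reason such a figure can diverge from customer experience, since very different failure patterns produce the same percentage.

```python
def availability_percent(mtbf_hours: float, mtrs_hours: float) -> float:
    """Percentage availability from MTBF and MTRS, both in hours."""
    if mtbf_hours <= 0 or mtrs_hours < 0:
        raise ValueError("mtbf_hours must be positive and mtrs_hours non-negative")
    return 100.0 * mtbf_hours / (mtbf_hours + mtrs_hours)


# One 2-hour outage roughly every 30 days...
rare_long = availability_percent(mtbf_hours=718.0, mtrs_hours=2.0)
# ...versus a 12-minute outage roughly every 3 days.
frequent_short = availability_percent(mtbf_hours=71.8, mtrs_hours=0.2)
print(f"{rare_long:.2f}% vs {frequent_short:.2f}%")  # 99.72% vs 99.72%
```

Both hypothetical services report the same figure, yet users are likely to experience them very differently.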

Different application failures

It is also a good idea to consider which vital business functions are impacted when different applications fail, and to review and trend slow performance, especially where performance is so poor that the service is effectively unusable when the customer needs it. Planned maintenance carried out by the service provider can also be taken into account when deciding how to calculate the minutes of downtime that impact users.
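As a sketch of how planned maintenance might be taken into account, the example below counts downtime minutes while ignoring any part of an outage that falls inside an agreed maintenance window. The function name and data layout are hypothetical, outages are assumed not to overlap each other, and whether maintenance should be excluded at all is a matter for the agreement with the customer.

```python
from datetime import datetime, timedelta


def unplanned_downtime_minutes(outages, maintenance_windows):
    """Minutes of outage time that fall outside planned maintenance windows.

    Both arguments are lists of (start, end) datetime pairs.
    """
    total = timedelta()
    for out_start, out_end in outages:
        remaining = [(out_start, out_end)]
        for m_start, m_end in maintenance_windows:
            next_remaining = []
            for start, end in remaining:
                if m_end <= start or m_start >= end:
                    # No overlap with this window: keep the piece unchanged.
                    next_remaining.append((start, end))
                    continue
                if start < m_start:
                    # Part of the outage before the window still counts.
                    next_remaining.append((start, m_start))
                if m_end < end:
                    # Part of the outage after the window still counts.
                    next_remaining.append((m_end, end))
            remaining = next_remaining
        total += sum((end - start for start, end in remaining), timedelta())
    return total.total_seconds() / 60


day = datetime(2024, 6, 5)
outages = [
    (day.replace(hour=1), day.replace(hour=3)),
    (day.replace(hour=10), day.replace(hour=10, minute=30)),
]
maintenance = [(day.replace(hour=2), day.replace(hour=4))]
print(unplanned_downtime_minutes(outages, maintenance))  # 90.0
```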

Targets and agreements

It's vital that user satisfaction is maintained. Keeping system availability within the agreed targets provides confidence and reassurance in the overall service. Most organisations don't have a dedicated availability management function.
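To make an agreed target tangible, it can help to translate the percentage into the downtime it permits over a reporting period. This is a minimal sketch, assuming a percentage target and a 30-day period. Both function names and the example figures are illustrative and not taken from any real agreement.

```python
def allowed_downtime_minutes(target_percent: float, period_days: int = 30) -> float:
    """Downtime (in minutes) permitted by an availability target over a period."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - target_percent) / 100.0


def within_agreement(
    measured_downtime_minutes: float, target_percent: float, period_days: int = 30
) -> bool:
    """True if the measured downtime does not exceed what the target permits."""
    allowed = allowed_downtime_minutes(target_percent, period_days)
    return measured_downtime_minutes <= allowed


print(round(allowed_downtime_minutes(99.9), 1))  # 43.2
print(within_agreement(90.0, 99.9))  # False
```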

Risk management

The activities required to maintain availability tend to be distributed around the business. Some businesses account for them within risk management, while others include them in capacity and performance management. Some businesses have technical engineers or site reliability engineers (SREs) who are responsible for managing and improving the availability of specific services. Processes will also need to be introduced to ensure that failover and recovery mechanisms are tested regularly.

Adopting Availability Management

It's important that the business is on board with availability management. It may already have a process in place for calculating and reporting availability metrics.

To introduce availability management as a function successfully, the business will need to take its culture, experience and knowledge into consideration.

Important factors to consider - Availability Management

  • Plan: Availability management must be considered in service portfolio decisions and agreements, especially when setting goals and direction for services and practices.
  • Improve: When planning and making improvements, availability management must ensure that services are not degraded.
  • Engage: Availability requirements for new and changed services must be understood and captured.
  • Design and transition: New and changed services must be designed to meet availability targets, and availability controls must be tested during transition.
  • Obtain/build: Availability, in terms of the customer's expectations, is a consideration when building components or obtaining them from third parties.
  • Deliver and support: This activity includes measuring availability and reacting to events that might affect the ability to meet availability targets.