Clearly, if a choice existed, it would be more
advantageous to have one 60-minute disruption
of services rather than six 10-minute disruptions.
Using availability strictly as a key metric does not
differentiate between these two scenarios. A second
criterion that is often included, though less often by IT
personnel, is reliability. Availability and reliability
are often used interchangeably without a clear
understanding of either term. In the data
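center industry, they are expressed as:

$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$

$$\text{Reliability} = e^{-t/\text{MTBF}}$$

Where: MTBF is defined as the Mean Time Between Failures
MTTR is defined as the Mean Time to Repair
t is the time interval under evaluation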
It should be noted that in reliability theory and
reliability engineering, there are variations in how
availability is expressed and calculated, depending
on how downtime resulting from maintenance, supply
delays or administrative delays is accounted for. How
these delays are accounted for is beyond the scope
of this discussion. Therefore, the availability expression
shown will be used, since it is the most common
expression employed.
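As a rough numerical illustration of the point above, the following Python sketch assumes the forms commonly used for these quantities, availability = MTBF / (MTBF + MTTR) and reliability = e^(-t/MTBF), and treats MTBF as the mean uptime between failures. The function names and the two one-year scenarios (one 60-minute outage versus six 10-minute outages) are hypothetical, chosen only to show that equal availability can hide very different reliability.

```python
import math

HOURS_PER_YEAR = 8760.0  # evaluation interval t, in hours


def availability(mtbf_hours, mttr_hours):
    """Assumed common form: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(t_hours, mtbf_hours):
    """Assumed exponential form: probability of no failure during t."""
    return math.exp(-t_hours / mtbf_hours)


def mtbf_and_mttr(failures, minutes_per_failure, period_hours=HOURS_PER_YEAR):
    """Hypothetical helper: derive MTBF and MTTR from an outage history.

    MTBF is taken here as total uptime divided by the number of failures.
    """
    mttr = minutes_per_failure / 60.0
    uptime = period_hours - failures * mttr
    return uptime / failures, mttr


# Scenario A: one 60-minute outage in a year.
mtbf_one, mttr_one = mtbf_and_mttr(failures=1, minutes_per_failure=60)
# Scenario B: six 10-minute outages in a year (same total downtime).
mtbf_six, mttr_six = mtbf_and_mttr(failures=6, minutes_per_failure=10)

a_one = availability(mtbf_one, mttr_one)
a_six = availability(mtbf_six, mttr_six)
r_one = reliability(HOURS_PER_YEAR, mtbf_one)
r_six = reliability(HOURS_PER_YEAR, mtbf_six)

print(f"availability: A={a_one:.6f}  B={a_six:.6f}")
print(f"reliability over one year: A={r_one:.4f}  B={r_six:.4f}")
```

Under these assumptions both scenarios report the same availability (about 99.989%), while the probability of running a full year without a failure drops from roughly 37% to well under 1%.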
May/June 2019 | 31
While BICSI 009 has not yet been approved
as an American National Standard, given the mix
of participants that contributed and the adherence
to the current BICSI Standards Program, BICSI 009 has
met those specific requirements of an ANSI industry
standard. This is an important distinction when
compared with proprietary guidelines that overtly,
or sometimes covertly, promote a single vendor’s
product or service.
REDUCE UNPLANNED OUTAGES:
AVAILABILITY OR RELIABILITY?
The primary driver of all data center operation policies
and procedures is to reduce unplanned disruptions
to the data center IT services. In a perfect scenario,
there would be no unplanned disruptions. Depending
on the level of redundancy of the critical infrastructure
(e.g., IT equipment [ITE] such as compute, storage and
network hardware; power; cooling; space) supporting the data
center IT services, the failure of a single component
or system does not automatically equate to a disruption
of the IT services.
The data center industry has typically focused
on availability as a key metric when measuring past
success, defining future objectives, or evaluating
possible solutions or processes to be implemented.
Focusing solely on availability may (and often does)
lead to an unexpected increase in the number of
unplanned disruptions. The cost of unplanned data
center disruptions does not scale linearly with the
duration of a single disruption or the combined duration
of several disruptions. Figure 1 shows that each disruption
to service has an initial fixed cost that does not
depend on the duration of the disruption. Each
disruption also has an associated cost-per-minute
factor. These costs
include tangible financial costs, such as lost revenue
or labor and equipment costs to restore services;
they also include intangible costs, such as damaged
reputation or lost customers. The actual fixed and
variable costs can vary significantly depending on the
critical infrastructure that experienced an unplanned
outage and the magnitude of data center IT services
that were impacted.
[Figure: Cost versus Duration; labels: Initial Fixed Losses, $/Min]
FIGURE 1: Unplanned disruptions: duration versus cost.
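The cost structure described above (a fixed cost per disruption plus a per-minute cost) can be sketched in a few lines of Python. The dollar figures and function names below are hypothetical placeholders, not values from BICSI 009 or from any measured outage; they serve only to show why several short disruptions can cost more than one long disruption with the same total downtime.

```python
def disruption_cost(duration_min, fixed_cost, cost_per_min):
    """Cost of one disruption: initial fixed loss plus duration-based loss."""
    return fixed_cost + cost_per_min * duration_min


def total_cost(durations_min, fixed_cost, cost_per_min):
    """Sum the cost of each disruption; the fixed loss is paid every time."""
    return sum(disruption_cost(d, fixed_cost, cost_per_min) for d in durations_min)


# Hypothetical figures, for illustration only.
FIXED_COST = 50_000.0    # initial fixed loss per disruption, in dollars
COST_PER_MIN = 1_000.0   # variable loss, in dollars per minute

# Both scenarios total 60 minutes of downtime.
one_long = total_cost([60], FIXED_COST, COST_PER_MIN)
six_short = total_cost([10] * 6, FIXED_COST, COST_PER_MIN)

print(f"one 60-minute disruption:  ${one_long:,.0f}")
print(f"six 10-minute disruptions: ${six_short:,.0f}")
```

With these placeholder values the variable cost is identical in both scenarios ($60,000), and the entire difference comes from paying the fixed loss six times instead of once.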