
May/June 2019

While BICSI 009 has not yet been approved as an American National Standard, given the mix of participants that contributed and the adherence to the current BICSI Standards Program, BICSI 009 has met the specific requirements of an ANSI industry standard. This is an important distinction when compared with proprietary guidelines that overtly, or sometimes covertly, promote a single vendor's product or service.

REDUCE UNPLANNED OUTAGES: AVAILABILITY OR RELIABILITY?

The primary driver of all data center operation policies and procedures is to reduce unplanned disruptions in the data center IT services. In a perfect scenario, there would be no unplanned disruptions. Depending on the level of redundancy of the critical infrastructure (e.g., IT equipment [ITE] compute, storage and network hardware, power, cooling, space) supporting the data center IT services, the failure of a single component or system does not automatically equate to a disruption of the IT services.

The data center industry has typically focused on availability as a key metric when measuring past success, defining future objectives, or evaluating possible solutions or processes to be implemented. Focusing solely on availability may (and often does) lead to an unexpected increase in the quantity of unplanned disruptions.

The cost of unplanned data center disruptions does not scale linearly with the total duration of the unplanned disruption or combined unplanned disruptions. Figure 1 shows that each disruption to service has an initial fixed cost that does not depend on the duration of the disruption. Each disruption also has a cost-per-minute factor. These costs include tangible financial costs, such as lost revenue or labor and equipment costs to restore services; they also include intangible costs, such as damaged reputation or lost customers. The actual fixed and variable costs can vary significantly depending on the critical infrastructure that experienced an unplanned outage and the magnitude of data center IT services that were impacted.

FIGURE 1: Unplanned disruptions - duration versus cost. (Axes: cost versus duration; plot labels: Initial Fixed Losses, $/Min.)

Clearly, if a choice existed, it would be more advantageous to have one 60-minute disruption of services rather than ten 6-minute disruptions, because the total downtime is the same but the initial fixed cost is incurred once instead of ten times. Strictly using availability as a key metric does not differentiate between these two scenarios.

A second criterion that is often included, but less so by IT personnel, is reliability. Availability and reliability are often used interchangeably without a clear understanding of either term. In the data center industry, they are expressed as:

$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$

$$\text{Reliability} = e^{-t/\text{MTBF}}$$

Where:
MTBF is defined as the Mean Time Between Failure
MTTR is defined as the Mean Time to Repair
t is the time interval under evaluation

It should be noted that in reliability theory and reliability engineering, there are variations in how availability is expressed and calculated, depending on how downtime resulting from maintenance, supply delays or administrative delays is accounted for. How these delays are accounted for is beyond the scope of this discussion. Therefore, the availability expression shown will be used, since it is the most common expression employed.
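The reasoning above can be made concrete with a short sketch. It assumes the cost of each disruption is a fixed amount plus a per-minute rate, and it uses the commonly cited forms Availability = MTBF / (MTBF + MTTR) and Reliability = e^(-t / MTBF). The dollar figures, the one-year evaluation period, the two outage scenarios and all function names are illustrative assumptions, not values from BICSI 009.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # assumed evaluation period


def total_cost(durations_min, fixed_cost, cost_per_minute):
    """Sum of (fixed cost + per-minute cost) over every disruption."""
    return sum(fixed_cost + cost_per_minute * d for d in durations_min)


def measured_availability(durations_min, period_min=MINUTES_PER_YEAR):
    """Fraction of the period during which services were up."""
    return 1 - sum(durations_min) / period_min


def availability(mtbf, mttr):
    """Steady-state availability from MTBF and MTTR (same time units)."""
    return mtbf / (mtbf + mttr)


def reliability(t, mtbf):
    """Probability of no failure over interval t, assuming a constant failure rate."""
    return math.exp(-t / mtbf)


# Hypothetical scenarios with the same total downtime (60 minutes).
one_long = [60]
ten_short = [6] * 10

# Hypothetical cost parameters: $10,000 fixed and $500 per minute.
FIXED, RATE = 10_000, 500

cost_long = total_cost(one_long, FIXED, RATE)    # 40000
cost_short = total_cost(ten_short, FIXED, RATE)  # 130000

# Availability is identical, so it cannot tell the scenarios apart.
same_availability = math.isclose(
    measured_availability(one_long), measured_availability(ten_short)
)

# With one year of operation, ten failures imply a much shorter MTBF
# and therefore a much lower reliability over any given interval.
mtbf_long = MINUTES_PER_YEAR / len(one_long)
mtbf_short = MINUTES_PER_YEAR / len(ten_short)
r_long = reliability(30 * 24 * 60, mtbf_long)    # 30-day interval
r_short = reliability(30 * 24 * 60, mtbf_short)
```

Under these assumed figures the two scenarios have equal availability, yet the ten short disruptions cost more than three times as much and yield a lower 30-day reliability, which is why availability alone can be a misleading target.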