The perils of ignoring data centre load testing


Dave Wolfenden, director of Mafi Mushkila, a leading vendor-agnostic provider of data centre services, examines why few IT professionals look closely at the testing process, even though testing a data centre before formal handover to the client is a normal part of construction and can ultimately affect the reliability of the facility.

Testing a data centre before formal handover to the client is a normal part of data centre construction, but few IT professionals look at the testing that is carried out, even though it can affect the facility for years to come. The usual reason is that, until a full set of servers and allied IT systems is in place, it is perceived as impossible to test a data centre under full load conditions.

This assumption is actually incorrect, and it seems to be perpetuated by a number of misunderstandings about the complex technologies involved in a typical data centre.

In many cases, airflow within the data centre is modelled during the design phase using Computational Fluid Dynamics (CFD) software. In addition to the testing set out by the commissioning team, the CFD model should be proven against measurements taken in the room before the IT infrastructure is installed.
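Proving the CFD model essentially means comparing its predictions with temperatures measured under test load. The Python sketch below is illustrative only: the sensor locations, temperatures and the 2 K acceptance tolerance are hypothetical assumptions, not figures from this article, and real acceptance criteria would be set by the design and commissioning teams.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    """One measurement point, e.g. a rack inlet sensor (locations are hypothetical)."""
    location: str
    predicted_c: float  # temperature predicted by the CFD model, deg C
    measured_c: float   # temperature measured under test load, deg C


def cfd_deviations(readings, tolerance_c=2.0):
    """Return {location: measured - predicted} for every point whose deviation
    exceeds tolerance_c. The 2 K default is an illustrative assumption."""
    failures = {}
    for r in readings:
        deviation = r.measured_c - r.predicted_c
        if abs(deviation) > tolerance_c:
            failures[r.location] = round(deviation, 2)
    return failures


# Illustrative readings only, not measured data.
readings = [
    SensorReading("Row A, rack 3 inlet", predicted_c=22.0, measured_c=22.8),
    SensorReading("Row A, rack 7 inlet", predicted_c=23.5, measured_c=27.1),
    SensorReading("Row B, rack 2 inlet", predicted_c=21.0, measured_c=20.4),
]
print(cfd_deviations(readings))  # {'Row A, rack 7 inlet': 3.6}
```

A point that runs well hotter than predicted, like the hypothetical rack 7 above, could indicate that either the model or the room as built needs revisiting before live equipment arrives.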

The reality is that testing a good data centre needs to be carefully planned and executed to ensure continuous operation for the design life of the facility.

The data centre facility should be tested at a range of load levels, working up to 100 per cent load. Because the majority of the energy consumed by IT infrastructure is rejected as heat, the simplest way to replicate the IT infrastructure is to use fan heaters.
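Working up to full load in steps can be planned with a simple calculation. This minimal Python sketch assumes a hypothetical 500 kW design IT load, 3 kW load units and 25 per cent steps, none of which come from the article, and treats electrical input as equal to heat output, which is a close approximation for resistive heaters.

```python
import math


def load_steps(design_load_kw, unit_kw, step_percents=(25, 50, 75, 100)):
    """Return (percent, target kW, whole load units) for each test step.

    Assumes heat output equals electrical input, which holds closely for
    resistive heaters. Step percentages and unit size are illustrative.
    """
    schedule = []
    for pct in step_percents:
        target_kw = design_load_kw * pct / 100
        units = math.ceil(target_kw / unit_kw)  # round up to whole units
        schedule.append((pct, target_kw, units))
    return schedule


# Hypothetical 500 kW design IT load, using 3 kW heaters.
for pct, kw, units in load_steps(design_load_kw=500, unit_kw=3.0):
    print(f"{pct:>3}% load: {kw:.0f} kW -> {units} x 3 kW units")
```

Rounding up means each step slightly overshoots its target with fixed-output heaters; variable-output units could be trimmed to the exact figure.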

In the past, these ranged from 2 or 3kW domestic fan heaters to large floor-standing space heaters. In most cases, the safety thermal cut-out had to be removed to cope with the elevated temperatures within modern data centres.

The heaters are often connected to temporary power supplies. Loads of this kind do not reproduce the airflow and temperature range of IT equipment, and they do not test the power supply end to end.

The CFD model at 100 per cent load is likely to assume that the data centre is fully occupied with floor-standing and rack-mounted IT infrastructure. In reality, only some of the racks may be installed during testing. To ensure the testing process is valid, temporary measures need to be in place so that the layout and load distribution reflect the CFD model.

These measures could include installing temporary IT racks and blanking, constructing temporary walls or aisle containment, and deploying heaters and server emulators that reflect the load distribution across the data centre. If the customer's IT racks have been installed, the heat load should be connected using the power strips within the racks. This may be the only time that the power strips are fully loaded, and therefore completely tested.
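Reflecting the CFD load distribution comes down to scaling each rack's modelled load to the current test step and splitting it across emulator units. The Python sketch below is a hypothetical planning aid: the rack names and design loads are invented, the 3.75 kW unit ceiling is borrowed from the emulator block size quoted later in the article, and it assumes each emulator can be set to any value up to that ceiling.

```python
import math


def allocate_emulators(rack_design_kw, test_fraction, unit_max_kw):
    """Scale each rack's CFD design load to the current test step and split it
    across whole emulator units of at most unit_max_kw each.

    Returns {rack: (units, setpoint kW per unit)}. Assumes emulators can be set
    to any value up to unit_max_kw, so the total per rack matches the target.
    """
    plan = {}
    for rack, design_kw in rack_design_kw.items():
        target_kw = design_kw * test_fraction
        units = math.ceil(target_kw / unit_max_kw)
        setpoint_kw = target_kw / units if units else 0.0
        plan[rack] = (units, round(setpoint_kw, 2))
    return plan


# Hypothetical per-rack design loads (kW) taken from a CFD model layout.
cfd_layout = {"A01": 4.0, "A02": 8.0, "B01": 12.0, "B02": 0.0}
print(allocate_emulators(cfd_layout, test_fraction=1.0, unit_max_kw=3.75))
```

A rack position with no modelled load (B02 here) gets no emulators, and would need blanking so that it does not short-circuit the airflow the model assumes.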

Whilst layout and load distribution can be addressed through sensible planning, effective heat control is a science in its own right, as dissipating heat from whatever source within the data centre is a critical process.

If heat dissipation is carried out poorly or relies on unreliable technology, a runaway heat problem can quickly turn into an IT disaster, shortening system and server lifespan at best and causing equipment failures at worst.

Given companies' increasing reliance on data centres to serve the IT needs of their business, an equipment failure can cause a range of problems: from a temporary outage of telephony and computer services for staff and allied personnel, all the way to the failure of an organisation's e-commerce website, causing customer confusion, loss of brand loyalty and an ongoing loss of revenue.

The ROI and cost conundrum

In an ideal world, a business could throw enough money at a data centre project to ensure 100 per cent uptime and happy customers and staff. In the real world, however, even in a mission-critical application, there are clear ROI (Return on Investment) issues that must be addressed when planning, testing and maintaining an effective facility.

For most of our clients, this translates into effective testing of a data centre at every possible stage of its planning and development, from computer modelling of the installation right through to test heat and power loading before the relevant IT systems and servers are installed.

Our observations suggest that the financial imperative of data centre testing means that, perhaps inevitably, some suppliers will attempt to 'cut corners' by using less-than-appropriate testing methodologies.

This is particularly true of Heat Load Testing, in which the prodigious heat output of the servers and allied IT systems in a data centre is emulated using specialist equipment such as a heat load bank or server emulator.

Although this testing equipment is cost-effective, the units have to be installed on a wide scale in a 'shell' data centre (i.e. one that is complete apart from the IT systems and servers) for several weeks. This has led some companies to test the heat and electrical load only superficially, rather than apply the in-depth testing required to ensure a centre will operate efficiently throughout its service lifetime.

So why do we need server emulators to complete the heat load testing process?

The reason is that a new IT equipment room, data centre or modular data centre is designed and expected to run continuously for its design lifetime, which can amount to many years, even in today's rapidly evolving IT arena.

To achieve this level of reliability, the infrastructure must be thoroughly tested before it goes into operation, both physically, using test equipment, and with appropriate CFD software that models the airflow within the facility and provides a graphical analysis of how hot and cool air flow.

Using actual servers to complete the tests is not possible for a variety of reasons, including the cost of filling the data centre with servers, the potential for damage to IT equipment and the time it would take to reset servers after each test.

Because testing also calls for a fixed, predictable load, a server emulator is used instead: it provides a variable electrical load and produces a corresponding heat load. These loads allow the electrical and cooling systems to be tested in a controlled environment.

The test units in detail

On the electrical side, heat load banks and allied systems can make life simpler for data centre developers and facilities managers. They also help with power governance, as they can prove the efficacy of static transfer switches under partial and full load conditions.

As part of this element of the testing process, good test equipment allows thermal inspection of all joints and connections under full load before the building becomes operational, reducing the fire risk. A useful side effect is that the electrical assessment confirms that power monitoring and billing equipment is operating correctly, and minimises risks and issues that might otherwise not be found for several years.
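Because a load bank draws a known, fixed load, each power meter's reading can be checked against that setpoint. This Python sketch assumes hypothetical circuit names, readings and a 2 per cent tolerance; the article does not specify any acceptance threshold.

```python
def check_meters(applied_kw, metered_kw, tolerance_pct=2.0):
    """Compare each circuit's meter reading with the known load-bank setpoint.

    applied_kw / metered_kw map hypothetical circuit names to kW. Returns the
    circuits whose percentage error exceeds tolerance_pct, or that have no
    reading at all. The 2 per cent default is an illustrative assumption.
    """
    problems = {}
    for circuit, applied in applied_kw.items():
        if applied <= 0:
            continue  # percentage error is undefined with no applied load
        metered = metered_kw.get(circuit)
        if metered is None:
            problems[circuit] = "no reading"
            continue
        error_pct = (metered - applied) / applied * 100
        if abs(error_pct) > tolerance_pct:
            problems[circuit] = round(error_pct, 1)
    return problems


applied = {"PDU-1A": 30.0, "PDU-1B": 30.0, "PDU-2A": 45.0}
metered = {"PDU-1A": 30.3, "PDU-1B": 28.5}
print(check_meters(applied, metered))
```

Here PDU-1B under-reads by 5 per cent and PDU-2A reports nothing, the kind of fault that could otherwise surface only when billing is disputed.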

Allied to the electrical check process is the testing of ancillary systems such as electro-mechanical and mechanical units, pumps, cooling and chiller systems, as well as room air conditioning units (RACU) where appropriate.

These test processes are also useful for load testing intermediate heat exchangers, which are usually installed to limit the volume of water that could leak within the suite, with capacities ranging from 100,000 litres down to 250 litres.

Other processes can include proving the fail-safe systems on high-density racks, such as confirming that doors will open in the event of an in-rack cooling component or system failure.

On the chilled water side, the testing process normally requires load testing to prove that the chilled water ring holds a sufficient volume of cold water to carry the load while the chillers restart after a generator kicks in, removing the need to put the chillers on UPS for resilience.
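The volume the ring needs follows from a basic energy balance: stored water can absorb heat for roughly t = V·ρ·c_p·ΔT / P before it warms by ΔT. This Python sketch uses textbook properties of water; the ring volume, allowable temperature rise, load and restart times are hypothetical. It also assumes the whole volume is usable and well mixed, that pumps and air handlers stay powered, and it ignores pipework thermal mass, so a load test remains the real proof.

```python
WATER_DENSITY_KG_M3 = 1000.0  # approximate density of chilled water
WATER_CP_J_PER_KG_K = 4186.0  # approximate specific heat capacity of water


def ride_through_seconds(volume_m3, allowed_rise_k, heat_load_kw):
    """Seconds the stored chilled water can absorb heat_load_kw while warming
    by allowed_rise_k: t = V * rho * cp * dT / P.

    Assumes the whole volume is usable and well mixed, that circulation pumps
    and air handlers stay powered, and ignores the thermal mass of pipework.
    """
    stored_energy_j = volume_m3 * WATER_DENSITY_KG_M3 * WATER_CP_J_PER_KG_K * allowed_rise_k
    return stored_energy_j / (heat_load_kw * 1000.0)


# Hypothetical figures: 20 m3 ring, 5 K allowable rise, 1,000 kW IT load.
available_s = ride_through_seconds(volume_m3=20, allowed_rise_k=5, heat_load_kw=1000)
required_s = 15 + 300  # hypothetical generator start plus chiller restart time
print(f"ride-through {available_s:.0f} s, required {required_s} s, "
      f"margin {available_s - required_s:.0f} s")
```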

Understanding the commercial risk

All of these methods are, we believe, fundamental to data centre testing: comprehensively checking the electrical and chilling/cooling systems is far preferable, on several fronts, to destroying a bank of servers.

As an example, a rack of heaters can cost just a few thousand pounds, whereas a rack of servers can cost well into six figures.

Our observations suggest that including effective testing as an integral part of commercial risk evaluation and mitigation supports timely sign-off of data centres and allied buildings, and their acceptance into service.

Arguably more importantly, documenting a safe and reliable testing phase of a data centre deployment can serve as proof to insurers that the systems are fit for purpose under full load, as well as providing a high level of assurance that the components and systems are set up and configured correctly.

A short history of data centre testing

Although Mafi Mushkila started life in the 1990s building early data centres for clients such as major insurance companies, over the last six years the business has moved into data centre load testing for financial organisations, including major household-name banks.

In many ways, the need for effective testing has evolved alongside the astonishing progress in computer processing power over the last decade, with consequent increases in hard drive capacity, and alongside the rapid parallel evolution of cloud computing resources.

It is important to understand here that, despite its name, a cloud computing resource still involves the use of a data centre located somewhere in the world.

The growth in data centre usage is such that energy experts report that data centres account for 2% of the world's energy usage, a figure that is expected to rise progressively in the coming years.

Our approach to data centre testing has been to develop a range of rack-mounted server emulators to meet differing requirements, from basic load, through units that closely match the delta-T of servers, to high-density applications. Combinations of units can emulate a consumption curve of up to 6.5 megawatts, using modular blocks of up to 3.75kW available ex-stock.
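Matching a server's delta-T matters because, for a given heat load, the temperature rise across the unit fixes how much air it must move: V = P / (ρ·c_p·ΔT). This Python sketch takes the 3.75kW block size and 6.5 megawatt ceiling from the text; the air properties are textbook approximations and the 12 K delta-T is a hypothetical server figure, not a specification of any emulator.

```python
import math

AIR_DENSITY_KG_M3 = 1.2     # approximate, room-temperature air at sea level
AIR_CP_J_PER_KG_K = 1005.0  # approximate specific heat capacity of air


def blocks_needed(target_kw, block_kw=3.75):
    """Whole emulator blocks needed to reach target_kw (3.75 kW per the article)."""
    return math.ceil(target_kw / block_kw)


def airflow_m3_per_s(heat_kw, delta_t_k):
    """Volumetric airflow needed to carry heat_kw away at a given delta-T:
    V = P / (rho * cp * dT)."""
    return heat_kw * 1000.0 / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * delta_t_k)


print(blocks_needed(6500))  # 1734 (blocks needed for 6.5 MW)
# Hypothetical 12 K server delta-T across one fully loaded 3.75 kW block:
print(f"{airflow_m3_per_s(3.75, 12):.2f} m^3/s per block")
```

Halving the delta-T doubles the required airflow, which is why a plain heater with the wrong delta-T can make the cooling system look better or worse than it will be with real servers.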

The use of a rack mounted format means that test units - whether rented, leased or purchased - can be slotted into and out of data centre racks as and when required, with blanking plates installed where appropriate.

It is also worth noting that effective testing procedures like these allow the IT staff responsible for preparing a data centre for final handover to complete end-to-end testing of the facility's electrical elements, once again helping to ensure the centre's efficient longevity.
