
Datacenter Optimization

The aim of optimization is to:

Important note: increasing infrastructure reliability should be the main objective. Once the infrastructure is fully stable and able to withstand both external and internal datacenter failures, time will be available to optimize, experiment with new optimizations, etc.

General formulae

PUE

Power usage effectiveness (PUE) is an indicator of a datacenter's energy performance. PUE should take into account all energy losses. However, many variations of the formula are seen in datacenters. In theory, PUE should be:

PUE=(Total facility energy used)/(Useful energy used)

Note that this formula can lead to a PUE below 1 if heat is re-used.

This also leaves room for interpretation (commercial guys really like this ;-)). Most of the time, this results in:

PUE=(Total facility energy used)/(IT energy used at PSU)

However, this is not really accurate. For example, the transformers often feed other components than the cluster (lights, offices, etc.). Also, if the measurement is taken at the PSU of an air-cooled server, the fans inside the server, which are part of the cooling, are counted as IT energy. Finally, this PUE does not include IT PSU efficiency, as the IT energy used is measured at the IT plug, before the IT PSU. Measuring the energy used by the whole IT system after the PSU would be difficult.
In general, the more probes on electrical, cooling and IT equipment, the more accurate the PUE. Assumptions are needed for the remaining variables.

Typical PUE values are 1.4-1.6 for air-cooled datacenters, and 1.1-1.2 for water-cooled datacenters.
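
To make the difference between these PUE variants concrete, here is a minimal Python sketch. The function name, the parameter names and the meter readings are all hypothetical. Subtracting unrelated loads (such as offices fed by the same transformer) and counting re-used heat as useful energy are assumptions about how the formulae above could be applied, not a standard definition.

```python
def pue(total_facility_kwh, it_kwh, unrelated_kwh=0.0, reused_heat_kwh=0.0):
    """Return a PUE estimate for one measurement period.

    total_facility_kwh: energy measured at the facility input (e.g. transformers)
    it_kwh:             energy measured at the IT plugs, before the IT PSU
    unrelated_kwh:      assumption: loads on the same meter that are not part
                        of the datacenter (e.g. offices)
    reused_heat_kwh:    assumption: re-used heat counted as useful energy
    """
    useful_kwh = it_kwh + reused_heat_kwh
    if useful_kwh <= 0:
        raise ValueError("useful energy must be positive")
    return (total_facility_kwh - unrelated_kwh) / useful_kwh


# Hypothetical readings over the same period
print(pue(1500.0, 1000.0))                # common variant: 1.5
print(pue(1500.0, 1000.0, 100.0))         # offices excluded: 1.4
print(pue(1500.0, 1000.0, 100.0, 500.0))  # with heat re-use: below 1
```

The same readings give three different values, which is why a PUE figure should always be quoted together with what was measured and where.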

The best ways to really reduce PUE are:

However, PUE does not reflect computing performance, i.e. flops/watt, which is also an important performance indicator.

Flops performance

With new green computers, it is important to take into account flops per watt, considering the energy measured before the IT PSU (Local):

Flops/W_L=(flops delivered)/(IT energy used)

Still, this indicator does not take into account network performance or real application performance per delivered watt.
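
As a worked example of the Flops/W_L formula, the following Python sketch computes the indicator from a total operation count and an energy reading taken before the IT PSU. The function name and the figures are hypothetical. Note that operations per joule is numerically the same as flops (operations per second) per watt.

```python
JOULES_PER_KWH = 3.6e6


def flops_per_watt_local(total_operations, it_energy_kwh):
    """Return flops per watt from an operation count and IT energy.

    total_operations: floating point operations performed during the run
    it_energy_kwh:    energy measured at the IT plugs during the same run
    """
    if it_energy_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_operations / (it_energy_kwh * JOULES_PER_KWH)


# Hypothetical benchmark run: 3.6e18 operations using 200 kWh
result = flops_per_watt_local(3.6e18, 200.0)
print(result / 1e9, "GFlops/W")  # 5.0 GFlops/W
```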

Cooling optimization

First and most important: think globally. If optimizing somewhere generates a major loss somewhere else, then it’s not globally efficient.

Second important point: information is key. To optimize a datacenter, you need probes everywhere, and you need to monitor these probes before a modification, right after it, and then over the long term. If you don't have money, use Arduinos combined with cheap sensors like the DHT11.

To optimize cooling, the main objective is to maximize the temperature delta, taking into account:

A good strategy would be:

  1. Raise the IT room temperature to the maximum supported by the IT equipment (keep in mind that this reduces the time available in case of emergency).
  2. Check IT power consumption during this temperature rise, and ensure fans do not draw too much energy as a consequence (there can be a step effect: fans do not regulate linearly but in steps).
  3. Also check the consumption of the air handling unit blowers.
  4. Raise the cold water temperature, taking the psychrometric chart into account to prevent any water condensation (increasing the water temperature should reduce condensation).
  5. Before the previous step, check the maximum temperature allowed by equipment such as chillers (some cannot operate above a specific temperature).
  6. Check whether overall power consumption has been reduced.
  7. In the end, pumps and air handling unit fans should not have to work much harder, because the Delta T was more or less conserved; only the temperatures were increased.
  8. With higher temperatures, chiller efficiency should have increased a lot, because the Delta T with the outside air is larger. It also allows more free cooling if available on the chiller.
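
The condensation check in the cold water step can be approximated numerically instead of reading the psychrometric chart. The sketch below uses the Magnus approximation for the dew point, which is a technique brought in here for illustration and is not named in the original text. The function names, the 2 °C safety margin and the sample values are hypothetical.

```python
import math

# Magnus coefficients, a common choice for temperatures above 0 °C
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # °C


def dew_point_c(air_temp_c, relative_humidity_pct):
    """Approximate dew point (°C) of room air with the Magnus formula."""
    if not 0 < relative_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = (math.log(relative_humidity_pct / 100.0)
             + MAGNUS_A * air_temp_c / (MAGNUS_B + air_temp_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def condensation_risk(water_temp_c, air_temp_c, relative_humidity_pct,
                      margin_c=2.0):
    """True if the water is colder than the dew point plus a margin.

    margin_c is an assumed safety margin, not a value from the text.
    """
    return water_temp_c < dew_point_c(air_temp_c, relative_humidity_pct) + margin_c


# Hypothetical room at 25 °C and 50 % relative humidity (dew point near 13.9 °C)
print(condensation_risk(12.0, 25.0, 50.0))  # True: pipes may sweat
print(condensation_risk(18.0, 25.0, 50.0))  # False: water is warm enough
```

Surfaces colder than the dew point collect condensation, which is why raising the cold water temperature reduces the risk.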

Of course, to increase Delta T in the IT room, use containment. There is no need to buy very expensive containment; just ensure it is fire resistant and does not interfere with the fire detection/extinguishing strategy. This will massively increase the Delta T across the air handling unit.

The same strategy applies to water-cooled IT equipment (meaning the CPU is directly cooled by water). Try to maximize Delta T. However, seriously consider the time needed to shut everything down in case of emergency. The main water loop of water-cooled systems often has a small volume, and its temperature increases very quickly in case of cooling failure.
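
To get an order of magnitude for how quickly a small water loop heats up after a cooling failure, the following Python sketch applies the energy balance t = m * c * Delta T / P. It is a rough estimate under stated assumptions: all IT heat goes into the loop water, the thermal mass of pipes and equipment is ignored, and water density is taken as 1 kg/L. The function name and the sample figures are hypothetical.

```python
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg.K)
WATER_DENSITY = 1.0           # kg/L, approximation


def seconds_to_limit(loop_volume_l, heat_load_kw, start_temp_c, max_temp_c):
    """Estimate the time before the loop reaches max_temp_c with no cooling."""
    if heat_load_kw <= 0:
        raise ValueError("heat load must be positive")
    mass_kg = loop_volume_l * WATER_DENSITY
    delta_t = max_temp_c - start_temp_c
    return mass_kg * WATER_SPECIFIC_HEAT * delta_t / (heat_load_kw * 1000.0)


# Hypothetical loop: 500 L of water, 200 kW of IT load, 35 °C raised to a 45 °C limit
print(round(seconds_to_limit(500.0, 200.0, 35.0, 45.0), 1), "seconds")  # 104.7 seconds
```

With these figures there are under two minutes to react. Raising the loop temperature reduces the remaining margin and therefore this time proportionally.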

Power optimization

There are not many ways to optimize power consumption. It is only a matter of good calibration:

In general, optimize the equipment's operating range: overloaded equipment is at risk, while underloaded equipment has low efficiency.
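
A simple way to spot equipment outside its useful operating range is to compare the measured load with the rated capacity. The Python sketch below does this with two thresholds. The 30 % and 80 % limits, the function name and the sample equipment are hypothetical; the real limits should come from the manufacturer's efficiency curves.

```python
def load_status(load_w, rated_w, low=0.30, high=0.80):
    """Classify equipment load as 'underloaded', 'ok' or 'overloaded'.

    low and high are assumed fractions of rated capacity, for illustration only.
    """
    if rated_w <= 0:
        raise ValueError("rated capacity must be positive")
    ratio = load_w / rated_w
    if ratio < low:
        return "underloaded"
    if ratio > high:
        return "overloaded"
    return "ok"


# Hypothetical readings: (name, measured load in W, rated capacity in W)
readings = [("ups-1", 18000, 100000), ("ups-2", 65000, 100000), ("pdu-7", 9500, 10000)]
for name, load, rated in readings:
    print(name, load_status(load, rated))
```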
