Saturday, January 24, 2009

Making 10GE Green – 10GBASE-T and Wake on LAN

Written by Bill Woodruff

Power savings sometimes come from unexpected places. As with most ICs, 10GBASE-T silicon entered the market at relatively high power, but with each process shrink its power is falling geometrically. So how can a data center become greener by taking advantage of today’s sub-6W 10GBASE-T PHYs?

Leverage from Virtualization
Virtualization makes it possible to choose dynamically which physical server each workload is assigned to. As the workload within a data center ebbs and flows, this ability to perform dynamic consolidation can deliver a great benefit.

The Green Grid consortium has identified putting servers into a sleep state as one of the Five Ways to Reduce Data Center Server Power Consumption. To see why, examine how the power consumed by a server varies with its workload. A good example was discussed in the WinHEC 2008 session titled “Windows Server Power Management Overview” (Figure 1). Power does decrease with decreasing workload. However, the server’s power at idle dropped only to 65% of its power at full workload. Clearly there is great leverage in transitioning an idle server into a sleep state: doing so removes nearly all of that remaining 65% of full-load power.

[Figure 1: 10geslide1.jpg]


Sleep States and Wake on LAN
Early efforts at power management recognized the importance of retaining the ability to perform certain tasks off-hours. Wake on LAN (WoL) provided the ability for network-directed activities, such as backup and maintenance, to be performed on PCs that have been put into a standby or hibernate state at night.

Industry-standard interfaces that enable OS-directed power management are defined by the Advanced Configuration and Power Interface (ACPI). ACPI defines active states and sleep states; for example, Standby (S3) retains memory power, while Hibernate (S4) copies RAM to disk. As an interface standard, ACPI does not define implementation.

When entering a sleep state, the system begins a power-down process, turning off power to the processor and other elements of the system -- but not to the network interface when WoL is enabled. The PCI-SIG has defined a separate auxiliary supply, Vaux, which provides 375mA at 3.3V, or about 1.24W. Given the low power available, one step in the process of invoking a sleep state is to reduce the link speed to the lowest speed possible. A 10GBASE-T PHY will break its 10GE link and re-establish the link at 100BASE-TX. A network adapter will also endeavor to partition its functionality so that what stays powered remains under this Vaux limit.

While in sleep state, the network adapter will monitor the link for a “magic packet”. When the magic packet is received, the process for exiting the sleep state will begin. Part of that process will be to return full power to the network adapter, and to re-establish the link at 10GBASE-T.
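For readers who have not looked inside one, the conventional magic packet is simple: six bytes of 0xFF followed by the target's 48-bit MAC address repeated sixteen times, appearing anywhere in the frame payload. The sketch below builds that pattern and checks a received frame for it. The function names are illustrative only, and it assumes the conventional pattern rather than anything specific to a particular PHY.

```python
SYNC = b"\xff" * 6   # six bytes of 0xFF open the pattern
REPEATS = 16         # the target MAC address follows, sixteen times over


def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or '-' separated) into six raw bytes."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    return raw


def build_magic_payload(mac: str) -> bytes:
    """Return the 102-byte magic pattern for the given MAC address."""
    return SYNC + parse_mac(mac) * REPEATS


def contains_magic(frame: bytes, mac: str) -> bool:
    """Return True if the magic pattern for `mac` appears anywhere in `frame`."""
    return build_magic_payload(mac) in frame
```

Because the match is a fixed byte pattern at a low line rate, the monitoring logic needs very little beyond a comparator and a counter, which is why the detection itself is the easy part of the problem.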

10GE Never Sleeps?
Is it true that 10GE never sleeps? This may no longer be important. Today’s servers typically have three separate network connections: one for the control plane, one for Ethernet traffic, and a third for storage. LAN on Motherboard (LOM) 1000BASE-T ports typically drive the control plane, while NICs support 10GE and HBAs support storage. Since WoL is a control plane function, the 10GE links are not required during sleep states and are powered off.

The converged network changes things. Moving control and storage connections onto the Ethernet port reduces cabling and complexity greatly. However, the constraints of today’s technology make it impractical for 10GE to support sleep states and WoL.

It’s not that 10GE can’t monitor and flag a magic packet. That would be the simple part of the equation. The challenge is the Vaux power limit, and working within the constraints set by ACPI and PCISIG. Today’s 10GE NICs, HBAs and CNAs push up against the 25W upper limit for a PCIe card. When the server shifts to a sleep state and gates off all supplies, Vaux will provide only a bit over 1W. Monitoring for the magic packet at that power is not within the capabilities of today’s 10GE technology.

Partitioning Enables WoL
So how can we get 10GE to work on a 1 watt budget? Optical alternatives with SFP+ offer lower power than 10GBASE-T; however, this path offers little hope for WoL.

The typical data path in an SFP+ based system will include the optical module itself (about a watt), an EDC chip (another watt) and the adapter silicon configured to only monitor for the magic packet. The power of the optical module and the EDC chip will be lower in the GE mode, as will the adapter silicon. But even though the adapter silicon only needs the most rudimentary MAC logic to monitor for the magic packet, these are complex devices where power can be dominated by leakage. Count on the MAC itself to require well over 1 watt, even when shifted to a lower speed. Thus optical SFP+ based systems are ill-suited to operating from Vaux.
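A quick back-of-the-envelope check makes the mismatch concrete. The sketch below computes the Vaux budget from the 375mA and 3.3V figures given earlier and compares it against the round numbers quoted for the SFP+ data path. The component values are the article's full-speed approximations (the MAC's "well over 1 watt" is entered as a 1.0W lower bound), so treat the total as illustrative rather than measured.

```python
from typing import Mapping

VAUX_VOLTS = 3.3    # auxiliary supply voltage quoted in the text
VAUX_AMPS = 0.375   # auxiliary supply current quoted in the text (375 mA)

# Approximate draws quoted in the text. The MAC figure is only a lower
# bound, so the real total would be higher than the sum used here.
SFP_PLUS_PATH_W = {
    "optical_module": 1.0,
    "edc_chip": 1.0,
    "mac_lower_bound": 1.0,
}


def vaux_budget_w(volts: float = VAUX_VOLTS, amps: float = VAUX_AMPS) -> float:
    """Power available from the auxiliary supply (P = V * I)."""
    return volts * amps


def fits_on_vaux(draws_w: Mapping[str, float]) -> bool:
    """True if the summed draw of the listed components fits within Vaux."""
    return sum(draws_w.values()) <= vaux_budget_w()


if __name__ == "__main__":
    print(f"Vaux budget: {vaux_budget_w():.2f} W")
    print(f"SFP+ path:   {sum(SFP_PLUS_PATH_W.values()):.2f} W")
    print("fits on Vaux" if fits_on_vaux(SFP_PLUS_PATH_W) else "does not fit on Vaux")
```

Even with generous reductions at lower speed, three devices that each sit near a watt have little chance of sharing a supply that delivers about 1.24W in total.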

10GBASE-T can, in fact, get around this limitation, even when 10GBASE-T systems use the same stable of MAC silicon. The solution is to place some simple MAC monitoring functionality into the PHY itself.

The market has recently seen the debut of triple speed 10GBASE-T PHYs designed to enable the converged network to support WoL, through the following steps:

• Initiate the process to enter the sleep state. Break the 10GBASE-T link and re-establish the link at 100BASE-TX.
• Instruct the transceiver to enter WoL mode. In this mode, all unnecessary elements will be gated off, including SGMII. No external signals will be driven except for the GPIO which provides the interrupt upon receipt of the magic packet.
• Enable the soft switch on the NIC, which gates the supplies for the PHY. The system’s 3.3V Vaux supply will now power the PHY as the other PCIe bus power supplies are removed.
• Monitor the traffic on the line. As an extension of its ability to monitor 10GBASE-T traffic for basic statistics and robustness, the transceiver also watches for a magic packet at 100BASE-TX.
• Upon detection of a magic packet, flag an interrupt on a GPIO pin. A controller on the NIC will need to initiate the process of pulling the server out of its sleep state. Note that the packet pattern that triggers the interrupt can be set by the user.
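The sequence above can be pictured as a small state model. The sketch below is a toy simulation only: the class and method names are hypothetical, and it does not correspond to the register interface of any real PHY or NIC. It simply tracks link speed, WoL mode, supply source and the GPIO interrupt as each step is applied.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WolPhyModel:
    """Toy model of the WoL sequence; not a driver for any real device."""
    link_speed: str = "10GBASE-T"
    wol_mode: bool = False
    supply: str = "main"
    interrupt: bool = False
    log: List[str] = field(default_factory=list)

    def enter_sleep(self) -> None:
        # Step 1: break the 10GBASE-T link and re-establish at 100BASE-TX.
        self.link_speed = "100BASE-TX"
        self.log.append("link re-established at 100BASE-TX")
        # Step 2: enter WoL mode, gating off everything not needed.
        self.wol_mode = True
        self.log.append("WoL mode enabled")
        # Step 3: the soft switch moves the PHY onto the Vaux supply.
        self.supply = "vaux"
        self.log.append("PHY powered from Vaux")

    def receive(self, is_magic_packet: bool) -> None:
        # Steps 4 and 5: monitor traffic; raise the GPIO interrupt on a match.
        if self.wol_mode and is_magic_packet:
            self.interrupt = True
            self.log.append("magic packet detected, GPIO interrupt raised")

    def wake(self) -> None:
        # Exit path: restore full power and return the link to 10GBASE-T.
        self.supply = "main"
        self.wol_mode = False
        self.interrupt = False
        self.link_speed = "10GBASE-T"
        self.log.append("link re-established at 10GBASE-T")
```

A typical run would call `enter_sleep()`, feed ordinary traffic through `receive(False)` with no effect, and then see `receive(True)` raise the interrupt that prompts the NIC controller to call `wake()`.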

How Green is it?

Let’s look at the impact in a data center by examining one scenario.

• Active power per server at 400W (based on configuration and number of processors, power can range from below 200W to over 600W)
• Power at Idle at 50% of active power, 200W (actual depends on configuration)
• Power at sleep of 2W (Vaux power ‘rounded up’ to 2W), for a savings of 198W when an idle server enters the sleep state

Given that each server entering a sleep state represents a 198W savings, the “green” factor becomes the ratio of servers in a sleep state to the total number of servers. A data center with 1000 servers that employs dynamic consolidation to reduce the average number of active servers by 20% will save about 40kW (200 servers × 198W) by employing 10GBASE-T rather than SFP+.
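The arithmetic behind that figure is easy to rerun with other assumptions. Here is a minimal sketch using the scenario's numbers as defaults; the function name and parameters are illustrative, not part of any tool.

```python
def sleep_savings_watts(servers: int,
                        fraction_asleep: float,
                        idle_w: float = 200.0,
                        sleep_w: float = 2.0) -> float:
    """Power saved when a fraction of idle servers is put to sleep.

    Defaults follow the scenario in the text: 200 W at idle, 2 W asleep.
    """
    if not 0.0 <= fraction_asleep <= 1.0:
        raise ValueError("fraction_asleep must be between 0 and 1")
    return servers * fraction_asleep * (idle_w - sleep_w)


if __name__ == "__main__":
    saved = sleep_savings_watts(servers=1000, fraction_asleep=0.20)
    print(f"{saved / 1000:.1f} kW saved")
```

With 1000 servers and 20% asleep, this gives 39.6 kW, which is the "about 40kW" quoted above. Changing `idle_w` shows how sensitive the result is to server configuration.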

But what about the fact that 10GBASE-T has higher power than SFP+? A comparison can be made to find the crossover where the power savings from putting servers into a sleep state exceed the higher power (for the time being) of 10GBASE-T over SFP+. Assume a 10GBASE-T PHY at 6W and an EDC chip at 2W. Note that these power values apply at each end of the link, giving a direct-attach copper link an 8W power advantage over the 10GBASE-T link. This 8W power advantage for SFP+ quickly pales compared to the 198W power advantage for a server entering a sleep state (Figure 2). Once dynamic allocation provides for a consolidation of more than 4% (8W / 198W), the “greening” of the data center will accelerate and quickly become material.
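The crossover can be worked out directly from those figures. The sketch below assumes one 10GE link per server (an assumption made here for simplicity) and uses the text's 6W PHY, 2W EDC chip, 200W idle and 2W sleep values as defaults. All names are illustrative.

```python
def link_penalty_watts(phy_w: float = 6.0, edc_w: float = 2.0, ends: int = 2) -> float:
    """Extra power of a 10GBASE-T link over SFP+ direct attach, both ends."""
    return ends * (phy_w - edc_w)


def crossover_fraction(idle_w: float = 200.0, sleep_w: float = 2.0) -> float:
    """Fraction of servers that must sleep to offset the link penalty."""
    return link_penalty_watts() / (idle_w - sleep_w)


def net_savings_watts(servers: int, fraction_asleep: float,
                      idle_w: float = 200.0, sleep_w: float = 2.0) -> float:
    """Sleep savings minus the link penalty, assuming one link per server."""
    per_server = fraction_asleep * (idle_w - sleep_w) - link_penalty_watts()
    return servers * per_server


if __name__ == "__main__":
    print(f"link penalty: {link_penalty_watts():.0f} W")
    print(f"crossover:    {crossover_fraction():.1%}")
```

The penalty comes to 8W and the crossover to roughly 4%, matching the text. Below that consolidation level `net_savings_watts` goes negative, and above it every additional sleeping server adds to the savings.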

[Figure 2: 10geslide2.jpg]


10GBASE-T and Green, only getting better

In the example just cited, the 10GBASE-T PHY extends conventional power management technologies to 10GE. Enabling dynamic consolidation with WoL is an important part of implementing an energy-efficient strategy in the data center. But the importance of adopting RJ45 and copper cabling increases as the technology matures.

Moore’s Law applies to 10GBASE-T. Today’s 10GBASE-T transceivers will be improved by successive generations of process shrinks and innovations, with corresponding decreases in power and increases in density. The analysis above and its savings will be eclipsed as 10GBASE-T drops in power to below what SFP+ optical, or even direct-attach copper, can achieve.

Benefits that come from scaling will be augmented by important standards advances such as Energy Efficient Ethernet, or IEEE 802.3az. This standard strives to reduce power during periods of reduced demand, lowering both link power and system power.

10GBASE-T builds on four generations of twisted-pair copper technology, from 10 megabit to 10 gigabit. In each generation, copper interconnect has dominated. The challenges may change, but many factors favor 10GBASE-T, and they now include being green.
