HP MPX200 Storage Router

The HP MPX200, based on the QLogic iSR6200 series of intelligent storage routers, combines high-performance multi-protocol storage connectivity with online and offline data replication and migration capabilities. The 1U unit supports redundant hot-plug power and cooling modules as well as redundant interface blades. Two interface blades are available: one with two 8Gb FC ports and four 1GbE ports, and one with two 8Gb FC, two 1GbE and two 10GbE ports.

The unit supports FC, FCoE, FCIP and iSCSI, and is compatible with Brocade, Cisco and QLogic FC switches and HBAs as well as storage arrays from Dell, EMC, HP, Hitachi, IBM, NEC and Sun. Finally, the MPX200 supports all major operating systems including Windows (and Hyper-V), Red Hat and SUSE Linux, VMware and HP-UX.

The MPX200 can support a variety of use-case scenarios:

  • The unit can act as a storage bridge. For example, it is possible to connect a variety of FC-based storage arrays to the MPX200 and then have storage clients (e.g. servers) connect to that storage via iSCSI and/or FCoE through the MPX200 and B- or C-Series Converged Network Switches.

  • The unit can also be used to extend the reach of an organization’s SAN Fabric through an IP network. It is possible, for example, to connect two separate MPX200 units, each connected to a separate SAN infrastructure, together through an IP network using FCIP. Data can then be accessed, replicated or migrated between arrays in each SAN environment through the FCIP link(s).

 

  • The unit can be used to replicate data. In this use-case, LUNs from two separate arrays are mapped to an MPX200. A server is then configured to access storage on one array through the MPX200. The MPX200 can then be used to mirror all data from that one array to the second array for HA, DR or backup purposes.

  • The unit can also be used to migrate data between SAN arrays. When used to migrate data, LUNs from both source and target SAN arrays are mapped to the MPX200. The unit then replicates data from the source array to the target array. Data migration can be performed offline, where the affected server is taken out of production, or online, where the affected server is left in an operational state. When online migrations are performed, it is first necessary to insert the MPX200 into the data stream between the affected server and the source array. This allows the affected server to continue to read and write data during the migration process.

Aside from the hardware acquisition costs, usage licensing is also required. Usage licensing is priced either per TB of data migrated through an MPX200 or per array connected to an MPX200.

Veritas Agent for Hitachi/HP XP 3 Data Center Replication

The Veritas Agent for Hitachi/HP XP 3 Data Center Replication (aka 3DCHTC), bundled with the 4Q2011 Agent Pack, provides support for data failover and recovery in environments that use Hitachi TrueCopy (HTC) or HP Continuous Access (CA) replication between three separate Hitachi or HP XP-class SAN frames. In a typical 3 data center (3DC) implementation, data is replicated synchronously between adjacent data centers within close geographical proximity and asynchronously to a geographically remote third data center. This mode of operation provides concurrent support for site-to-site (S2S) HA requirements as well as for zone-to-zone (Z2Z) DR requirements.

 

Configuration Overview:

The simplified sketch below shows the typical intended setup for a 3DCHTC environment:
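
     DC 1 (Sync Target)           DC 2 (Primary)              DC 3 (Async Target)
     +----------------+   Sync    +----------------+   Async  +----------------+
     | cluster node   |<==(S2S)==>| cluster node   |==(Z2Z)==>| DR cluster     |
     | + XP LUNs      |           | + XP LUNs      |          | + XP LUNs      |
     +----------------+           +----------------+          +----------------+
      \________ production metro (S2S) cluster ________/              |
                            |                                         |
                            +------------------ GCO -----------------+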

Data Centers 1 and 2, within close geographic proximity, host a production metro (S2S) cluster. Each cluster node has Fibre Channel access to one or more LUNs, which are paired with the corresponding LUNs on its partner node using Sync replication.

Data Center 3, located in a remote geographical location, hosts a local DR cluster with shared Fibre Channel access to one or more LUNs. These LUNs are paired to the LUNs zoned to one of the production cluster nodes using Async (Journal) replication.

The Production and DR clusters are linked together using Veritas’ Global Cluster Option (GCO) technology. Each cluster is configured with a local failover group using LUNs configured for 3DC replication. Fencing within each cluster is managed by Veritas’ cluster engine (HAD). This ensures that a local resource group can only be online on one node at a time. The local failover groups for each cluster are then linked using GCO, which ensures fencing between the two clusters.

The node which is configured for both Sync and Async replication (in DC 2 above) is said to be the “Primary” node. The node which is configured for Sync replication only (in DC 1 above) is said to be the “Sync Target” node. The node(s) which are configured for Async replication only (in DC 3 above) are considered to be “Async Target” nodes.

 

HORCM/XP RAID Manager Configuration:

The 3DCHTC agent uses the HORCM/XP RAID Manager to monitor and control SAN replication. In order to support 3DC replication, therefore, HORCM must be installed on all nodes and configured in an asymmetric fashion. At a minimum, two separate device groups are required, one for the synchronous S2S link and one for the asynchronous Z2Z link.

On the Primary node, the HORCM.conf file is configured for both the S2S and Z2Z links, with replicated LUNs defined in both device groups. Both device groups share a common HORCM instance and can share a common CMD device.

On the Sync Target node, the HORCM.conf file is only configured for the S2S device group. On the Async Target node(s), the HORCM.conf files are only configured for the Z2Z device group.

Below are illustrative sketches of HORCM50.conf files in a 3DC configuration. These are minimal examples rather than production files: the IP addresses, service names, serial numbers, CU:LDEV values and device group names (SYNC_DG for the S2S link, ASYNC_DG for the Z2Z link) are hypothetical. The CU:LDEV value 72:C5 corresponds to the LDEV 29381 referenced later, and the MU# column is what separates the Sync pair (MU 0) from the Async journal pair (shown here as h1):

Primary Node:
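
    HORCM_MON
    #ip_address   service   poll(10ms)   timeout(10ms)
    10.0.2.10     horcm50   1000         3000

    HORCM_CMD
    #dev_name
    \\.\CMD-10002

    HORCM_LDEV
    #dev_group   dev_name   Serial#   CU:LDEV(LDEV#)   MU#
    SYNC_DG      dev01      10002     72:C5            0
    ASYNC_DG     dev01      10002     72:C5            h1

    HORCM_INST
    #dev_group   ip_address   service
    SYNC_DG      10.0.1.10    horcm50
    ASYNC_DG     10.0.3.10    horcm50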

Sync Target Node:
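
    HORCM_MON
    #ip_address   service   poll(10ms)   timeout(10ms)
    10.0.1.10     horcm50   1000         3000

    HORCM_CMD
    #dev_name
    \\.\CMD-10001

    HORCM_LDEV
    #dev_group   dev_name   Serial#   CU:LDEV(LDEV#)   MU#
    SYNC_DG      dev01      10001     65:41            0

    HORCM_INST
    #dev_group   ip_address   service
    SYNC_DG      10.0.2.10    horcm50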

Async Target Node:
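
    HORCM_MON
    #ip_address   service   poll(10ms)   timeout(10ms)
    10.0.3.10     horcm50   1000         3000

    HORCM_CMD
    #dev_name
    \\.\CMD-10003

    HORCM_LDEV
    #dev_group   dev_name   Serial#   CU:LDEV(LDEV#)   MU#
    ASYNC_DG     dev01      10003     33:0A            h1

    HORCM_INST
    #dev_group   ip_address   service
    ASYNC_DG     10.0.2.10    horcm50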

 

Replication Patterns:

3DC replication is supported in one of two different patterns. In a 1:1:1 pattern, replicated volumes are mounted on the Sync Target node for read/write operation. Data is replicated from the Sync Target node to the Primary node using the S2S link; the Primary node, in turn, replicates the data to the Async Target node(s) using the Z2Z link.

In a 1:2 pattern, replicated volumes are mounted on the Primary node for read/write operation. Data is replicated from the Primary node to the Sync Target node using the S2S link and from the Primary node to the Async Target node(s) using the Z2Z link.

Under no circumstances can 3DC replication originate from the Async Target node. In order to mount replicated drives on the Async Target node for read/write operation (which should only occur to support a DR scenario), replication on either the S2S or the Z2Z link must first be suspended.

LUN Identities:

Under a 1:1:1 pattern, the LUNs on the Sync Target node are in P-Vol status for the S2S link. The LUNs on the Primary node are in dual status (one for each device group): specifically, S-Vol status for the S2S link and P-Vol status for the Z2Z link. The LUNs on the Async Target node(s) are in S-Vol status for the Z2Z link.

Under a 1:2 pattern, the LUNs on the Sync Target and Async Target nodes are in S-Vol status for the S2S and Z2Z links respectively. The LUNs on the Primary node are in P-Vol status for both the S2S and Z2Z links.

Below are illustrative pairdisplay outputs for the same hypothetical configuration in a 1:1:1 pattern, abbreviated to the columns that matter here (the omitted port, serial and fence columns are marked with “...”). Note that LDEV 29381 on the Primary node takes on differing identities depending on which device group you are looking at.

Primary Node:
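
    pairdisplay -g SYNC_DG
    Group      PairVol(L/R)  ...  LDEV#.P/S    Status  ...
    SYNC_DG    dev01(L)      ...  29381.S-VOL  PAIR    ...
    SYNC_DG    dev01(R)      ...  25921.P-VOL  PAIR    ...

    pairdisplay -g ASYNC_DG
    Group      PairVol(L/R)  ...  LDEV#.P/S    Status  ...
    ASYNC_DG   dev01(L)      ...  29381.P-VOL  PAIR    ...
    ASYNC_DG   dev01(R)      ...  13066.S-VOL  PAIR    ...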

Sync Target Node:
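
    pairdisplay -g SYNC_DG
    Group      PairVol(L/R)  ...  LDEV#.P/S    Status  ...
    SYNC_DG    dev01(L)      ...  25921.P-VOL  PAIR    ...
    SYNC_DG    dev01(R)      ...  29381.S-VOL  PAIR    ...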

Async Target Node:
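
    pairdisplay -g ASYNC_DG
    Group      PairVol(L/R)  ...  LDEV#.P/S    Status  ...
    ASYNC_DG   dev01(L)      ...  13066.S-VOL  PAIR    ...
    ASYNC_DG   dev01(R)      ...  29381.P-VOL  PAIR    ...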

 

Agent Configuration:

The following are type-specific attributes for the 3DCHTC agent:

Base Directory: Specifies the home directory for the RAID Manager software.

Sync Device Group Name: Name of the Sync Device Group managed by the agent.

Async Device Group Name: Name of the Async Device Group managed by the agent.

Instance: HORCM Instance number for the Sync and Async device groups.

Default Mode: The original replication role of the “attached” SAN frame. This can be PRIMARY, SYNC_TARGET or ASYNC_TARGET.

User: The user ID under which the HORCM Manager is to be started if it is not running.

Domain: The domain for the user ID specified above.

Password: The password for the user ID specified above.

Split Takeover: A flag that determines whether the agent permits failover to an S-Vol device when replication is disconnected or suspended (e.g. the local P-Vol device is in PSUE state or the target S-Vol device is in SSUS state).

Link Monitor: A flag that determines whether the agent will periodically attempt to resynchronize S-Vol devices if replication has been disconnected or suspended.

If the flag is set to 0, then no resync is attempted. If the flag is set to 1, then the agent will make periodic attempts to resync Sync/S2S replication. If the flag is set to 2, then the agent generates an SNMP trap whenever the statuses of attached P-Vol or S-Vol devices change.
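
Assembled into a VCS main.cf resource definition, the configuration might look something like the sketch below. The attribute names here are approximations inferred from the descriptions above (only SplitTakeover and LinkMonitor appear verbatim elsewhere in this text), so check the agent documentation for the exact names:

    3DCHTC htc3dc_res (
        BaseDir = "C:\\HORCM\\etc"    // RAID Manager home directory
        SyncGroupName = SYNC_DG       // S2S device group
        AsyncGroupName = ASYNC_DG     // Z2Z device group
        Instance = 50                 // HORCM instance number
        DefaultMode = PRIMARY         // original role of the attached frame
        User = horcmadmin             // account used to start HORCM if needed
        Domain = EXAMPLE
        Password = xxxxxx             // stored in encrypted form in practice
        SplitTakeover = 1             // permit failover to suspended S-Vols
        LinkMonitor = 1               // periodically retry pairresync
        )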

Expected Behavior:

Whenever the production cluster experiences an application or host failure on an active node, the associated resource group should fail over to the surviving node either automatically or manually depending on the AutoFailOver attribute for the faulted group. When this happens the 3DCHTC agent will issue a horctakeover command to swap LUN identities between the Primary and Sync Target nodes and enable storage on the surviving node to be mounted for read/write operation. The direction of Sync replication between the Primary and Sync Target nodes will be swapped as a result of this action. Replication between the Primary and Async Target nodes should be maintained.

Whenever the production cluster experiences a storage failure on an active node, the associated resource group should fail over to the surviving node either automatically or manually depending on the AutoFailOver attribute for the faulted group. When this happens the 3DCHTC agent will issue a horctakeover command to enable storage on the surviving node to be mounted for read/write operation. Because of the SAN failure, however, LUN identities between the Primary and Sync Target nodes will not be swapped. Instead, S-Vol status on the surviving node will be maintained with replication in SSWS state. If the surviving node is the Primary node, then replication to the Async Target should be maintained. Otherwise replication between the Primary and Async Target nodes will report as failed as well.

Whenever there is a SAN replication link failure either between the Primary and Sync Target nodes or between the Primary and Async Target nodes, VCS will take no action (i.e. no failover will occur). The 3DCHTC agent response is dictated by the LinkMonitor attribute. If LinkMonitor is set to 0, then no action is taken. If LinkMonitor is set to 1, then the agent will periodically attempt to resync any suspended Sync/S2S S-Vol devices using the pairresync command.

Additionally, in the case of SAN replication link failure between the Primary and Sync Target nodes, any subsequent failover between the two nodes will depend on the value of the SplitTakeover attribute. If SplitTakeover is set to 0, then the 3DCHTC agent will fault on failover, preventing S-Vol devices from being mounted for read/write operation. If SplitTakeover is set to 1, then the 3DCHTC agent will issue a horctakeover command to enable read/write operation on S-Vol devices. After SAN replication is restored, manual resynchronization will be required.

Whenever the production cluster experiences a catastrophic failure (e.g. either the Primary and Sync Target nodes experience a failure or the associated SAN devices on both nodes experience a failure), the associated resource group should attempt to fail over to an Async Target node in the DR cluster either automatically or manually depending on the ClusterFailOverPolicy global service group option.

In either case the 3DCHTC agent response on an Async Target node is dictated by the SplitTakeover attribute. If SplitTakeover is set to 0 on an Async Target node, the 3DCHTC agent will take no action and the resource will fault. If SplitTakeover is set to 1, the agent will issue a horctakeover to enable read/write operation for Async S-Vol devices (remember that Async Target storage devices cannot assume P-Vol status with Sync replication in place). Device status should report as SSWS.

In order to recover from a catastrophic failure (i.e. move operations from the DR cluster back onto the production cluster), the 3DCHTC agent will attempt to perform the following actions, which map onto the RAID Manager commands sketched after this list:

  1. Split Sync replication between the Primary and Sync Target nodes.
  2. Resynchronize all data from the active Async Target node back to the Primary node.
  3. Enable read/write operation on the Primary node.
  4. Restart replication from the Primary node to the Sync Target node.
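
One plausible mapping of these steps onto raw RAID Manager commands is sketched below (device group names as in the earlier HORCM sketches). The agent drives this sequence itself, so treat this purely as an illustration of the mechanics, not as a documented runbook:

    pairsplit -g SYNC_DG             (1) split Sync replication between Primary and Sync Target
    pairresync -g ASYNC_DG -swaps    (2) swap the Z2Z pair and copy data back toward the Primary
    horctakeover -g ASYNC_DG         (3) return P-Vol identity, and read/write access, to the Primary
    pairresync -g SYNC_DG            (4) restart Sync replication out to the Sync Target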

Global Clusters:

As a final note, the 3DCHTC agent was written to work within a GCO framework, that is to say two separate Veritas clusters linked by a wide-area connector (WAC) communications process. While this is not a hard requirement (it is possible to configure the 3DCHTC agent on two non-related clusters and manage failover in a purely manual process), GCO does provide fencing control, preventing a global failover group from running on both a Production and a DR cluster concurrently.

Global clusters, however, bring with them their own configuration requirements. GCO was written to link local clusters (clusters that sit on a single network segment), not replicated S2S clusters. A description of these limitations is beyond the scope of this document (the topic deserves a blog entry all to itself), but it should be emphasized that GCO is not trivial to implement and support when S2S clusters are involved.

High Availability (Part 3)

In the last installment I made the assertion that pursuing availability for a given application/IT service only makes sense so long as the marginal cost of doing so does not exceed the cost of downtime for that application. To illustrate this point I set up a scenario in which an application was hosted on a single server, could not be easily recovered in the event of a system failure, but could be instantly “failed over” to a second, third and/or fourth server should such an outage occur. We all know, of course, that such an application does not often exist in the real world.

Fundamentally speaking, high availability is driven by minimizing the loss that occurs with a service outage (colloquially known as downtime). Planned downtime is the loss that results from a known, scheduled event such as system maintenance. Because the event is known in advance, any associated downtime can usually be minimized with proper planning and mitigation. Unplanned downtime, on the other hand, is the loss that results from unanticipated events such as a system failure. In the case of unplanned downtime, the key to meeting the SLO for the associated service lies in the ability to rapidly recover operational status after an outage occurs. Rapid recovery is central to the idea of high availability.

Not surprisingly, then, recovery brings with it its own set of objectives. Recovery time objective (RTO) is the maximum time allowed to recover operational status after a service outage occurs. RTO has a direct relationship with the SLO of an IT service. If the SLO for an application is four nines, for example, then the RTO for that application cannot exceed 52.6 minutes (and then pray that you do not experience a second outage for that year). While the concept is simple, execution is anything but. Remember that the RTO clock starts with a service outage. If it takes 20 minutes for your monitoring and reporting systems to alert you to an outage in the example above, then you only have 32.6 minutes to analyze the situation, formulate a plan and then execute a recovery in order to meet the RTO.
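
As a quick sanity check, assuming a 365-day year:

    (1 - 0.9999) x 365 days x 24 hrs x 60 min = 52.56 minutes of allowable downtime per year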

Recovery point objective (RPO) refers to the maximum amount of data, usually stated as a function of time, that is allowed to be at risk of loss as a result of a service outage. Generally speaking, a recovery point is the difference between the last data protection event and the event that causes the service outage. By way of example, if you backed up a database and then lose the DB server three hours later, you will have lost three hours of data once the restore is complete. Moving forward, if you wanted to maintain an RPO of four hours for that data, then you would need to ensure that backups (or other data protection measures) are executed at least every four hours.

Note that the relationship between RTO and RPO is usually cumulative. Keeping with the example above, let’s say that you backed up your database at midnight and that the DB system crashed at 3:00am. The recovery point for the event is three hours. The RTO clock, however, starts at 3:00am. If recovery takes an hour to complete, then the total impact or loss due to the outage is now four hours.

Note also that a given application or IT service may not have an RPO. Systems that do not “own” data (think middleware or user-access applications) do not run the risk of losing data and therefore do not need a defined RPO.

SIOS SteelEye DataKeeper for Windows

SteelEye DataKeeper 7.2.1 (SEDK) for Windows is a real-time, host-based data replication engine that is compatible with Windows 2003, Windows 2008 and Windows 2008 R2 and which can integrate with either MS Cluster Server or Windows Server Failover Clustering. SEDK gives an OS instance the ability to mirror one or more volumes on a source server to different volumes on a target server across any network. Once a pair of volumes has been mirrored (the act of doing so is destructive to the target volumes), SEDK intercepts all writes to a given source volume and replicates the data to the paired target volume.

How it works:

SEDK uses an intent log (aka a bitmap file) to track changes made to a source volume. This log, which is stored by default in a subdirectory of the SEDK install directory, provides SEDK with a persistent record of write requests that have not been committed to both the source and target servers. This, in turn, allows SEDK to survive a service interruption without having to perform a full resync after recovery.

After the intent log has been updated, replication can occur in either a synchronous or asynchronous manner.  The difference between the two has to do with the order in which writes are made to the source and target volumes. With synchronous replication, write requests are transmitted to the target server and executed against the target volume before being executed against the source volume. With asynchronous replication, write requests are executed against the source volume first and then are transmitted to the target server for execution. Write operations are always executed in the same order to both source and target volumes to ensure consistency between the two.

In the event of a service interruption, all changes to the source volume will be tracked in the bitmap but no updates will be sent to the target server. Status for mirrored volumes, under such a circumstance, will change from “Mirroring” to “Resync pending”. Once the service interruption has been resolved, SEDK will begin to read sequentially through the bitmap file to determine which blocks on the source volume have changed while in “Resync pending” state. Those changes will then be pushed to the target server for write execution. This process is referred to as a “Partial Resync”.
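
Conceptually, the mechanism resembles the sketch below. This is a generic illustration of bitmap-driven partial resync, not SEDK's actual driver logic; the 64KB tracking granularity is an arbitrary assumption and source/target stand in for opened raw volumes:

    # Generic sketch of bitmap-driven partial resync. While the target is
    # unreachable, writes are only recorded in the bitmap; on recovery, only
    # the blocks marked dirty are read back and pushed to the target.
    BLOCK_SIZE = 64 * 1024  # hypothetical tracking granularity

    def record_write(bitmap, offset, length):
        """Mark every block touched by a write as dirty."""
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            bitmap.add(block)

    def partial_resync(bitmap, source, target):
        """Push only the dirty blocks, in order, then clear the bitmap."""
        for block in sorted(bitmap):
            source.seek(block * BLOCK_SIZE)
            target.seek(block * BLOCK_SIZE)
            target.write(source.read(BLOCK_SIZE))
        bitmap.clear()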

Network considerations:

In order for SEDK to be successful, it is critical that sufficient bandwidth be made available to keep paired volumes in “mirroring” state. Before setting up SEDK in a production environment, SIOS recommends that perfmon be run for a period of time in order to understand the volume of write activity for the volumes to be replicated. Specifically “Disk Write Bytes/Sec” should be examined for each candidate volume. The SEDK user manual provides recommended maximum rates of change for several common network bandwidths.
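
For example, to sample the write rate on a candidate volume (E: here, a hypothetical drive letter) every 15 seconds for an hour and log the results to a CSV file, the standard Windows typeperf utility can be used:

    typeperf "\LogicalDisk(E:)\Disk Write Bytes/sec" -si 15 -sc 240 -o e_writes.csv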

The good news is that SEDK also includes the ability to compress data, using Zlib compression libraries, prior to transmission across the network. Doing so, however, does increase CPU overhead.

If the source and target servers sit on either side of a physical firewall, or have MS Windows Firewall installed on them, it will be necessary to configure those firewalls to allow SEDK to communicate. The user manual clearly states which ports are required for SEDK.

Finally, SEDK will attempt to use all the network bandwidth that it has available. This may present a problem if the network connection is shared with other applications or if the source and target servers are sitting on either side of a WAN circuit with limited bandwidth. This problem can be further aggravated during an initial or full synchronization. To limit this risk, SEDK includes the option to limit or throttle the network bandwidth it uses.

Disk considerations:

SEDK will only support disks whose sector size is 512 bytes.

Resizing dynamic disks is not supported while those disks are part of an SEDK mirror. Before resizing a dynamic disk, any mirrors must be deleted. After both the source and destination disks have been resized, the mirrors can be re-created.

If the bitmap is to be relocated, the new target directory must be created prior to reconfiguring SEDK with the new location. In no cases can the bitmap be placed on a dynamic disk. It must be placed on a basic disk.

SEDK application components:

At the heart of SEDK is ExtMirr.sys, a kernel-mode driver which is responsible for all replication activity between mirror endpoints. ExtMirr.sys communicates with local volumes, NTFS.Sys, the network stack and the DataKeeper Service.

The DataKeeper service, ExtMirrSvc.exe, acts as a relay between the DataKeeper GUI and command-line interfaces and ExtMirr.sys. All commands that manipulate the configuration of a mirror have to flow through this service. It is not necessary, however, for this service to be running in order to maintain mirrored volumes.

The DataKeeper GUI, Datakeeper.mmc, is an MMC 3.0-based interface that allows an admin to configure, control and obtain the status of mirrored volumes. This GUI can be run from any system so long as it can connect to the DataKeeper service on a system that needs to be managed.

SEDK also provides a command-line interpreter, EMCMD.exe, that can be used to configure and control mirrored volumes from a command-line interface. This is handy if an admin needs to be able to include such actions in a script.
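
For example, querying and pausing a mirror from a script might look something like the lines below (the system name node1 and the volume letter are hypothetical, and the exact command set and syntax should be confirmed against the user manual):

    EMCMD node1 GETMIRRORVOLINFO E     report the mirror state of volume E:
    EMCMD node1 PAUSEMIRROR E          pause replication for volume E:
    EMCMD node1 CONTINUEMIRROR E       resume replication, triggering a partial resync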

Installation:

SEDK leverages Flexera InstallShield to provide a rapid and smooth installation process. After the setup utility has been launched, it only takes four clicks, not counting licensing which will be covered below, and a reboot to complete the process. The only two decisions that need to be made are which components to install (Server, User Interface or both) and where to install them.

Licensing:

SEDK uses a run-time license, meaning that while it is possible to install the software without a license, it is not possible to start and run SEDK without one.

SEDK requires a unique license key for each server. This can be a little tricky to set up. After installation has been completed, the SEDK License Key Manager will launch. The License Key Manager will interrogate the system on which it is run to produce a Host ID. Both the Host ID and an authorization code (Entitlement ID), which is delivered with the software, are required to properly license SEDK.

To obtain a license key, an admin will need to log onto a SIOS web site and enter both the Host ID and Entitlement ID. The automated system will then email a license key file which is then copied to the correct server and “installed” via the License Key Manager. This process will need to be repeated for each server in the enterprise.

As a final note, the Host ID is dependent on, among other things, the primary NIC in the system. If that NIC is replaced, it will be necessary to re-license the server with the new Host ID.

Configuration:

The ease and flexibility with which mirrored volumes can be created and managed is the real strength of SEDK. Using the GUI manager, an admin is able to quickly create one or more “jobs”, each with one or more “mirrors”, through a three-step wizard. Once a job is created, mirrors can easily be added, modified or deleted.

Mirrors can either be configured in a one-to-one relationship with a single source and single target volume or in a one-to-two relationship with a single source and two target volumes on either one or two target servers. It is also possible to set up a many-to-one configuration through several one-to-one mirrors against a single target server. In all cases, a volume can only act as a target for a single source volume.

Each mirrored pair maintains its own job state and can be configured with a unique combination of compression and network throttle settings. This allows an admin to very granularly monitor and control the performance of the environment.

SEDK Cluster Edition:

One of the other strengths of SEDK is its ability to integrate with either Windows Server 2008 Failover Clustering (WSFC) or Windows 2003 Microsoft Cluster Server (MSCS). With a cluster running, the DataKeeper GUI will provide an admin with the option to register a mirror as a cluster storage resource, a DataKeeper volume, as a part of creating a mirror. It is also possible to register an existing mirror as a cluster storage resource using the EMCMD command-line interface.

Once an SEDK mirror is registered as a clustered DataKeeper volume, it can be included as a resource in a cluster service. Moving such a cluster service from one node to another will change the direction of copy for any DataKeeper volumes included in the cluster service.

Failure of an active cluster node with a mirrored volume will result in one of two outcomes. If the clustered DataKeeper volumes are synchronized, which is to say the state is “Mirroring”, then the cluster will be able to successfully bring the volumes online on the surviving cluster node. The target volume will become the source volume for SEDK and the status will report as “Resync pending”. Once the failed system comes back online, SEDK will go through a partial resync.

If clustered DataKeeper volumes are out of sync, they will usually report as either “Paused” or “Resyncing”. Under such conditions, a cluster may not be able to bring DataKeeper volumes online on a surviving cluster node after the primary node fails. This behavior is desirable as it preserves the integrity of the data between the two nodes. SEDK will go through a partial resync once the failed node comes back online, but a failed cluster service will continue to stay offline until an admin clears the fault and manually brings the service back online.

Performance overhead:

Running SEDK can generate potentially significant overhead, depending on how it is configured. In order to understand the magnitude of this overhead I ran a series of tests in a controlled environment. Two virtual machines were configured with 2 vCPUs and 4GB RAM each on a single ESX 4 server in a lab. No other VMs were running on the server. Five volumes were mirrored between the two VMs using SEDK 7.2.1. A dedicated 1Gb NIC was used to host the replication traffic. No other traffic was pushed through the NIC. An internal diagnostic tool was then used to generate drive I/O on one or more of the drives while perfmon captured performance data on 15-second intervals.

I ran two series of tests. In the first series, I increased I/O to a single drive in order to test SEDK’s ability to handle “scale up” load. In the second series, I spread the I/O out across an increasing number of drives (1 per work thread) in order to test SEDK’s ability to handle “scale out” load.

Each test included nine runs. The first run was included to baseline performance and did not include mirroring or, obviously, compression. The second run was done on mirrored volumes with no compression configured. Runs three through nine were performed on mirrored volumes with increasing levels of compression. Network throttling was never enabled.

Each run included five stages, with each stage adding an additional work thread. In all cases, a thread of work was able to generate something in the neighborhood of 28,000 kb/sec of disk I/O, with the ratio of read-to-write activity at about 2-to-1 on the baseline runs. The data from each stage is the average of about 10 minutes of activity.

In my tests, simply mirroring drives increased CPU overhead by up to 30-40% while dropping write activity by up to 12%. Adding compression on top of mirroring increased CPU overhead by 120-300% or more while dropping write activity by at least 40%.

Additionally, in my tests, enabling compression at its lowest level had a significant impact. SEDK was able to achieve a compression ratio of between 3- and 4-to-1 (i.e. for every 3 to 4 bytes written to a mirrored drive, SEDK sent 1 byte of traffic over the network).

Interestingly, while higher levels of compression had a measurable impact on CPU and write activity, the compression ratio never showed a material change. My testing showed no benefits to running on anything higher than the lowest level of compression. This, however, may simply have been a result of the type of data written to the drives.

Opportunities for improvement:

No software is without its bugs and SEDK is no exception to this rule. There are a number of issues, and associated workarounds, listed in the user manual. There are more, I am sure, listed on the vendor website. The following are issues that I experienced during my evaluation.

There is a known issue with 64-bit perfmon counters that I tripped over almost immediately. After installing SEDK, I was unable to add counters to a perfmon log on any of my Windows 2008r2 test systems. To get around this, I renamed ExtMirrPerf.dll, the SEDK 64-bit perfmon counter dll. I have been told that this issue will be corrected in a forthcoming version.

I noticed that the DataKeeper GUI manager will, under some circumstances, incorrectly show job status as red or failed. For example, when first launching the manager, all jobs will show red for 15 to 20 seconds and then transition to green. Additionally, when the direction of mirroring is being changed as a result of a cluster failover, any associated jobs will transition from green to red and then back to green in the GUI manager. My hunch is that this happens when the GUI manager is unable to determine the status of a job. It would be nice, however, for the manager to show “status pending” or “unknown” rather than “failed” under those circumstances.

During load testing, I also noticed that the DataKeeper GUI manager would occasionally lose its connection to the local host. Interestingly, when this happened, no errors were reported, nor was there any interruption to existing mirrors. I was, however, unable to manage (i.e. change the settings of) any jobs running on the host. In order to get around this I had to stop and restart the DataKeeper service.

SpaceSniffer

SpaceSniffer is a lightweight and portable Windows program that allows users to quickly understand storage utilization. The program uses a Treemap visualization to show not only the directory structure of a storage device, but also the relative space used by the subdirectories and files within that structure. SpaceSniffer allows users to easily navigate up and down the directory structure and includes a deceptively powerful filter. Best of all, SpaceSniffer does not require “installation” in the traditional sense (you just drop the app in a directory and launch it) and it is published as freeware. Within 60 seconds of launching it, I had identified and deleted over 6GB of old stuff off my productivity system. How cool is that?

Why HP is getting out of the PC business.

On Thursday, August 18th, HP announced Q3FY11 results, made a number of major announcements and provided revised guidance on FY2011 total revenue and earnings per share. The next day HP stock gapped down by 20%, shedding over $9 billion in market capitalization.

Among the major announcements, HP’s CEO, Leo Apotheker, informed the market that HP’s board of directors “has authorized the exploration of strategic alternatives for the company’s Personal Systems Group. HP will consider a broad range of options that may include, among others, a full or partial separation of PSG from HP through a spin-off or other transaction.”

Before I begin to look into why HP would make such a move, let me clearly state up front that while I both work for HP and own stock in the company, I am in no way authorized to speak on its behalf. In my current role as an IT solutions architect, I have no access to insider information that would explain such a move. All the information I rely on for this analysis is publicly available on the Internet.

For 3QFY11, HP booked somewhere between $31 and $32 billion in revenue (the numbers vary slightly depending on what is included and excluded). Of that total, $9.6 billion (about 30%) was attributed to PSG, the group responsible for commercial and consumer desktop and notebook PCs. Operational profit for the PSG group was $567 million.

On a year-over-year basis, PSG was actually a star among its peers. PSG was one of only two groups able to show an increase in operational profit compared to 3QFY10 results, and it managed to pull that off despite a 3% year-over-year decline in revenue. One could reasonably assume that this is a testament to the operational discipline Mark Hurd worked to instill in the corporation.

Why then would HP’s board of directors decide to divest itself of the group? PSG booked over $40 billion in revenue, not an insignificant sum of money, in FY10 and is on pace to do the same this year. Additionally, HP is the world’s leading vendor of PCs with something in the neighborhood of 18% of the market, a position it has enjoyed since 2007.

The answer to this, I feel, can be found in the numbers. To start with, the PC business operates on razor thin margins. According to Canalys, HP shipped 14,687,210 PCs in 1Q11. According to HP, the PSG group is averaging $591 million in operational profit per quarter this year (HP’s fiscal quarters do not align to calendar quarters). Dividing one number by the other quickly shows that HP is only earning about $40 for each unit shipped.

Indeed, while PSG was responsible for about 30% of the revenue for the third quarter, just squeaking by the HP Services group for 1st place, it only accounted for 16% of the operating profit, falling behind ESSN for 4th place. When looking at operating profit as a percent of revenue, PSG sits at 5.9%, behind all the other groups HP reports against.

This is not a new development. Looking back over the past thirteen quarters, one sees that quarterly revenues for PSG have bounced around between $8 billion and $10 billion with operational profits falling between $375 million and $670 million. Despite PSG’s success in prior years, there has been no recent indication of long-term growth in either metric. Yes, HP has shown that it can compete successfully in the PC marketplace. It has, however, failed to show that it can continue to organically grow the business.

Nor does growth through acquisition seem likely. Besides HP, the leading PC makers include Acer, Dell, Toshiba and Lenovo. After the rancorous acquisition of Compaq in 2002, it is hard to imagine who else in this group HP might acquire, and one would seriously question why HP would invest money into a market segment that yields a 5% margin on operations.

Finally there is the issue of looking ahead. The latest IDC forecasts for desktop and portable PC growth are a mixed bag. Growth for desktop PCs is projected to be negative in some markets and anemic overall. IDC forecasts for portable PCs, however, are somewhat better.

But there is a rub. The IDC forecasts for “portable PCs” do not include tablets. Remember when I said that HP is the market leader in PCs? That too did not include tablets. According to a recent analysis by DisplaySearch, which looks only at notebook and tablet PCs, Apple has already surpassed HP in “mobile PCs”. According to that analysis, Apple shipped 13.5 million mobile PCs in 2Q11 to achieve a 21% market share, compared to 9.7 million mobile PCs shipped by HP, which holds a 15% market share.

For Apple, this represents 136% year-over-year growth. Nearly 80% of the units Apple shipped were iPads. Sales of non-Apple tablets were up 25% year-over-year for the same period. Many analysts now fear that sales of tablets are beginning to cannibalize sales of notebook PCs. And that brings us to one of the other major announcements from HP. After launching its webOS-powered TouchPad tablet less than two months ago, HP announced that it was discontinuing operations for webOS devices.

Having abandoned, at least for now, its attempt to compete in tablet PCs, HP’s prospects in the PC business are gloomy at best. HP’s board of directors must have felt that it is better to spin off or sell the business now rather than watch its market share continue to erode over the coming quarters. Still, one has to wonder at their approach. Telling the world that you want out of the PC business just might turn out to be a self-fulfilling prophecy.

Cloud Computing

I was visiting with my mother the other day when she casually asked, “What is all the hullabaloo with this Cloud Computing I have been reading about?” Interestingly enough, later that same day a neighbor asked me almost the same thing over dinner.

Indeed Cloud Computing has become quite the buzzword in IT. Google reports over 114 million hits on the phrase, compared to just 42 million for “Virtualization”. Every week I stumble over references to articles, talks, white papers and seminars from both people and institutions that claim to be experts on the topic.

It helps, then, to have both a base understanding of what constitutes Cloud Computing and the ability to explain it, even at a high level, to non-IT people, if for no other reason than it might get me another free dinner with the neighbors.

The National Institute of Standards and Technology (NIST), in Special Publication 800-145, defines Cloud Computing thusly:

“Cloud Computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”

Unfortunately, this is not something that I am likely to memorize soon (let’s be honest, I have trouble remembering my cell phone number) and, even if I did, it is not something that would satisfy the curiosity of either my mother or my neighbors. Because of this I have settled on explaining Cloud Computing by describing a set of essential characteristics and comparing those to something that is familiar to most people: their electrical service.

Here, more or less, is how I describe them:

1. Ubiquitous Access

Electricity is available almost everywhere in the US thanks to a unified power grid that is built on a common set of standards. This allows consumers to power a wide variety of devices, such as hair dryers, large-screen TVs and blenders, through a common interface, generally a NEMA 5-15 15-Amp, 125-Volt grounded outlet here in the US.

Likewise, Cloud Computing is characterized by pervasive network access, generally speaking this is via the Internet, using standard communication protocols that promote the use of heterogeneous client platforms, such as laptops, tablets, PDAs and smart phones.

2. On-demand Service

One of the things we take for granted about electricity is that it is pretty much available on demand. As consumers, for example, we are not required to schedule with our utility company when we will run a load of laundry through the washer and dryer. Nor are we forced to wait for electricity to become available to do so. We expect electricity to be available when we want it.

Likewise, Cloud Computing is characterized by the ability to rapidly and unilaterally provision computing resources (servers, storage, applications and what not) as needed. This can be done either automatically or via a self-service process initiated by the user, but in either case it requires no human interaction with a given service’s provider.

3. Elastic Capacity

One of the other things we take for granted about electricity is that we don’t have to tell anyone in advance how much we need. If, for example, we decide one month to do laundry for everyone in the neighborhood, we fully expect the utility company to provide the additional electricity required to do so with no advance notice. If we decide the next month to let the laundry pile up, we expect the utility company to handle our decline in demand with equally little forewarning.

Likewise, Cloud Computing is characterized by the ability to rapidly provision additional computing resources as the demand for those resources increases and to release those resources just as quickly when demand subsides. Ideally, all of this is done in an automated fashion, requiring no human intervention.

4. Resource Pooling

Think about this for a moment: do you know where your electricity comes from? Generally speaking, power generation companies own and operate power plants, which can be fueled by coal, natural gas, wind, water or solar energy and which are scattered across the country. Generation companies sell electricity at wholesale prices to various retail electric providers who, in turn, package electricity with transmission and delivery service for sale to end users like you and me. By the time we get it, there is almost no way to determine where the electricity originated.

In much the same fashion, with Cloud Computing, resources are pooled by providers or hosting companies to serve multiple customers, with different physical and virtual resources dynamically assigned and reassigned according to demand. End users generally have no control over or even knowledge of where these resources are located. Indeed, these resources may be scattered across different data centers located throughout the world, which might even be owned and managed by different companies.

5. Metered Use

Ever wonder how much electricity you are using? The answer is pretty easy to get. Just wander out into your backyard and look at the meter bolted onto the back of your house. This information is both easy to obtain and pretty handy because, at the end of the month, your utility company will charge you for the electricity you used.

In a similar fashion, Cloud resources are subject to appropriate usage metering (storage used, CPU time, page hits, number of user accounts and the like). This information can be monitored and reported on by both the service provider, who is concerned about optimizing resource utilization across the board, and the consumer, who may be more focused on overall usage patterns and/or billing.

When looked at from this perspective, there is not much of a mystery to the idea of Cloud Computing. Implementation, of course, is another matter entirely.

Now … whose turn is it for dinner?

Marginal Cost Analysis for Data Center Availability

For this installment, I wanted to go back and look at the marginal cost of data center availability. In a previous blog post, I had argued that chasing uptime becomes an increasingly expensive affair, which only makes sense so long as the marginal cost of uptime does not exceed the marginal revenue derived from the increase in availability.

To illustrate this point, I used a hypothetical one-server application with cost and revenue numbers that were completely made up. The cool thing about using hypothetical examples with made-up data is that you would have to be a complete idiot for them to not support your argument. But what about real-world data? How would my argument hold up given the real-world analysis that I have reviewed in my last couple of posts?

For our analysis, let us assume that we have a requirement to provide an environment that would support 5 megawatts (MW) of computing equipment within 20,000 square feet of data center space. What would it cost to build a Tier-1, Tier-2, Tier-3 or Tier-4 data center to support this environment?

For each of the four proposed data centers, the total cost of white space will be $6 million (20,000 square feet x $300 per square foot). The total cost for MEP, however, would vary by tier level.

For a Tier-1 data center, with no redundancy, we would only need to support a total capacity of 5,000 kW for power and cooling. At a cost of $11,500 per kW, this means that the total cost for MEP in a Tier-1 data center would come out to $57.5 million.

For a Tier-2 data center, with N + 1 redundancy, we would need to provide some additional capacity for power and cooling. For this example, let us assume that 20% is sufficient. This brings the total requirement for power and cooling to 6,000 kW. At a cost of $12,500 per kW, the total cost for MEP in a Tier-2 facility would be $75 million.

In my last blog, I had suggested that most Tier-3 data centers are probably built around 2N redundancy for power. For this analysis then, let us say that the total requirement for power and cooling comes to 10,000 kW for a Tier-3 data center. At a cost of $23,000 per kW, the total cost of MEP in such a facility would come to a hefty $230 million.

Finally, for a Tier-4 data center, with 2(N + 1) redundancy, we would be looking at a requirement for 12,000 kW of capacity. At a cost of $25,000 per kW, the total cost of MEP in a Tier-4 data center would come to a staggering $300 million.

In order to provide a rational analysis, however, we cannot just look at total capital costs. We must look at both costs and availability over a common unit of time. It has been my experience that companies amortize fixed assets like buildings on a 30-year basis and MEP systems on a 15-year basis. For our analysis then, we will divide white space and MEP costs by 30 and 15 respectively to look at costs on an annual basis.

When we put all of this information into chart form, we end up with something like this:

                            Tier-1        Tier-2        Tier-3         Tier-4
Cost of White Space         $6,000,000    $6,000,000    $6,000,000     $6,000,000
Cost of White Space per Yr  $200,000      $200,000      $200,000       $200,000
Cost of MEP                 $57,500,000   $75,000,000   $230,000,000   $300,000,000
Cost of MEP per Yr          $3,833,333    $5,000,000    $15,333,333    $20,000,000
Total Costs per Yr          $4,033,333    $5,200,000    $15,533,333    $20,200,000
Marginal Cost per Yr        $0            $1,166,667    $10,333,333    $4,666,667

Okay, so that gives us the cost information we are looking for. What about the benefits, the availability that all this money is buying us? Those numbers are actually pretty easy to get. Remember that in a previous post I had provided some real-world availability numbers from the Uptime Institute for each of the four categories of data centers in our analysis. Using those numbers, we can quickly calculate total and marginal uptime, in hours, on an annual basis.

                              Tier-1     Tier-2     Tier-3     Tier-4
Availability                  99.67%     99.75%     99.98%     99.99%
Uptime per Yr (Hrs)           8,731.09   8,738.10   8,758.25   8,759.12
Marginal Uptime per Yr (Hrs)  0.00       7.01       20.15      0.88

With both the marginal costs per year and the marginal availability per year in hours, it is simply a matter of dividing one by the other to come up with the marginal cost per hour of availability. As a reminder, we are not talking about operational costs here. Our sole focus is on capital costs: specifically, what it costs to build an increasingly available environment to support our computing equipment.

                        Tier-1    Tier-2      Tier-3       Tier-4
Marginal Cost per Yr    $0        $1,166,667  $10,333,333  $4,666,667
Marginal Uptime per Yr  0.00 Hrs  7.01 Hrs    20.15 Hrs    0.88 Hrs
Marginal Cost per Hr    $0        $166,476    $512,871     $5,327,245

So what does all this tell us? Well, basically it says that the decision to build a Tier-2 data center over a Tier-1 data center will provide an additional 7 hours of availability per year at an amortized cost of $166,476 per hour of availability. Likewise the decision to go with a Tier-3 data center over a Tier-2 data center will provide an additional 20 hours of availability at a cost of $512,871 per hour. Finally, the decision to build out a fully redundant Tier-4 data center over a Tier-3 data center will provide an additional 53 minutes of availability at a cost of $5,327,245 per hour.
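
Since the whole argument rests on this arithmetic, here is a short Python sketch that reproduces the marginal cost figures in the tables above (same 30- and 15-year amortization periods, 8,760-hour year):

    # Reproduce the marginal cost-per-hour figures in the tables above.
    HOURS_PER_YEAR = 365 * 24  # 8,760

    tiers = {  # tier: (total MEP cost, availability)
        "Tier-1": (57_500_000, 0.9967),
        "Tier-2": (75_000_000, 0.9975),
        "Tier-3": (230_000_000, 0.9998),
        "Tier-4": (300_000_000, 0.9999),
    }

    white_space_per_yr = 6_000_000 / 30  # white space amortized over 30 years
    prev_cost = prev_uptime = None
    for tier, (mep, availability) in tiers.items():
        cost_per_yr = white_space_per_yr + mep / 15  # MEP amortized over 15 years
        uptime_hrs = availability * HOURS_PER_YEAR
        if prev_cost is not None:
            marginal_cost = cost_per_yr - prev_cost
            marginal_hrs = uptime_hrs - prev_uptime
            print(f"{tier}: ${marginal_cost:,.0f} over {marginal_hrs:.2f} hrs"
                  f" = ${marginal_cost / marginal_hrs:,.0f} per hour")
        prev_cost, prev_uptime = cost_per_yr, uptime_hrs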

Even with real data, availability, it seems, becomes increasingly expensive.

An Observation About Data Center Uptime and Costs

In my last blog entry I regurgitated, in summary form, the analysis of an Uptime Institute white paper on the cost drivers for modern utility data centers. I understand that this may not have been a very exciting thing to read. However, with data like this, we can start to do some interesting real-world cost analysis and who among us doesn’t drool over an opportunity like that?

As we start to do this, let me take a moment to make an observation about the information I presented in my last two blog entries. It is important to remember that the Uptime Institute recognizes only four distinct classifications for data centers. It does not award extra credit for data centers that go above and beyond the minimum requirements for a given tier. As such, there is no such thing, for example, as a Tier-2.5 or Tier-3 “Plus” data center.

I think this is important to remember as we look at the deltas in real-world cost per kW and availability across the board, particularly between Tier-2 and Tier-3 data centers.

                 Tier-1     Tier-2     Tier-3    Tier-4
Cost per kW      $11,500    $12,500    $23,000   $25,000
Downtime per Yr  28.91 Hrs  21.90 Hrs  1.75 Hrs  0.88 Hrs

Looking at this information, I am left wondering why there is such a large gap in both cost per kW and availability between a Tier-2 and a Tier-3 data center. A Tier-2 data center, remember, is defined by N+1 redundancy and a single distribution path for power and cooling. Likewise, the definition of a Tier-3 data center is N+1 redundancy, but with an added requirement for a redundant, non-active distribution path for power and cooling.

My experience with Tier-3 data centers, however, is that most of them exceed the minimum requirements for Tier-3 certification. More specifically, in my experience, Tier-3 data centers provide 2N power and cooling capacity over multiple active distribution paths up to a certain point. The Tier-3 data centers that I am familiar with usually fail to obtain Tier-4 certification because of their reliance on something like a single chilled-water loop or a non-redundant power bus linking utility and generator power to the switching gear. My speculation is that this trend in real-world data centers accounts for the large differences we see in the models.

Because of this, I will also assume in my analysis that the total power capacity for a Tier-3 data center is actually 2N and not N+1 as defined by the Uptime Institute. For example, if the requirement is to provide enough capacity to power and cool 5 megawatts (MW) of data processing gear, I will assume that a Tier-1 data center will have a total capacity of 5 MW. For a Tier-3 data center, however, I will assume that the total capacity will be 10 MW.

Data Center Costs

In a previous blog entry I talked about the various data center tiers as defined by the Uptime Institute. That naturally raises the question, “well … how much do these things cost?” Not surprisingly, the Uptime Institute has provided some data on that as well.

In a white paper entitled “Cost Models: Dollars per kW plus Dollars per Square Foot of Computer Floor” the authors, Turner and Brill, present a summary of their analysis of 20 major data center projects starting in 2007. From that analysis, they conclude that the primary construction cost drivers for a data center include: 

  • Power and cooling capacity of the facility as measured by total kW
  • The functional tier of the data center as defined by the Uptime Institute
  • Size of the white space as measured per square foot
  • Size of “empty space” (i.e. space included in the original facility but reserved for future expansion) as measured per square foot

The white space and empty space components provide for raw tenant space (both the raised floor and total incidental office space within the facility) with no power or cooling capacity and are independent of the functional tier level. What converts the raw tenant space into a functional data center are the mechanical, electrical and plumbing (MEP) systems that support the data processing equipment. The power and cooling components are sized in kW by tier level and provide for the acquisition and installation of the MEP infrastructure as well as the gross space required to house it.

From all this, the authors propose the following cost model: 

  • Tier-1 data centers cost $11,500 per kW of redundant capacity.
  • Tier-2 data centers cost $12,500 per kW of redundant capacity.
  • Tier-3 data centers cost $23,000 per kW of redundant capacity.
  • Tier-4 data centers cost $25,000 per kW of redundant capacity.
  • White space cost $300 per square foot.
  • Empty space cost $190 per square foot. 

The authors conclude by cautioning the reader that these estimates are based on 2007 data, are subject to certain design and size considerations, do not include the cost of land acquisition and have a fudge factor of +/- 30%.