SteelEye DataKeeper 7.2.1 (SEDK) for Windows is a real-time application-layer data replication engine that is compatible with Windows 2003, Windows 2008 and Windows 2008r2 and which can integrate with either MS Cluster Server or Windows Server Failover Cluster. SEDK provides an OS instance the ability to mirror one or more volumes on a source server to different volumes on a target server across any network. Once a pair of volumes have been mirrored (the act of doing so is destructive to the target volumes), SEDK intercepts all writes to a given source volume and replicates the data to the paired target volume.
How it works:
SEDK uses an intent log (aka a bitmap file) to track changes made to a source volume. This log, which is stored by default in a subdirectory off of the install directory for SEDK, provides SEDK with a persistent record of write requests that have not been committed to the source and target servers. This, in turn, allows SEDK to survive service interruption without having to perform a full resync after recovery.
After the intent log has been updated, replication can occur in either a synchronous or asynchronous manner. The difference between the two has to do with the order in which writes are made to the source and target volumes. With synchronous replication, write requests are transmitted to the target server and executed against the target volume before being executed against the source volume. With asynchronous replication, write requests are executed against the source volume first and then are transmitted to the target server for execution. Write operations are always executed in the same order to both source and target volumes to ensure consistency between the two.
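The ordering difference between the two modes can be illustrated with a toy model. This is only a sketch of the write order described above, not SEDK's driver logic; the function names and the in-memory lists standing in for volumes are hypothetical.

```python
from collections import deque


def replicate_sync(writes, source, target, trace):
    """Synchronous mode: each write hits the target before the source."""
    for seq, data in enumerate(writes):
        target.append((seq, data))
        trace.append(("target", seq))
        source.append((seq, data))
        trace.append(("source", seq))


def replicate_async(writes, source, target, trace):
    """Asynchronous mode: writes hit the source first and are queued,
    then drained to the target in the same order."""
    queue = deque()
    for seq, data in enumerate(writes):
        source.append((seq, data))
        trace.append(("source", seq))
        queue.append((seq, data))
    while queue:
        seq, data = queue.popleft()
        target.append((seq, data))
        trace.append(("target", seq))


writes = [b"alpha", b"beta", b"gamma"]

sync_src, sync_tgt, sync_trace = [], [], []
replicate_sync(writes, sync_src, sync_tgt, sync_trace)

async_src, async_tgt, async_trace = [], [], []
replicate_async(writes, async_src, async_tgt, async_trace)

# In both modes the two volumes end up with the same writes in the same order.
print(sync_src == sync_tgt, async_src == async_tgt)
```

The traces differ only in which side sees each write first; the final contents and their order are identical in both modes.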
In the event of a service interruption, all changes to the source volume will be tracked in the bitmap but no updates will be sent to the target server. Status for mirrored volumes, under such a circumstance, will change from “Mirroring” to “Resync pending”. Once the service interruption has been resolved, SEDK will begin to read sequentially through the bitmap file to determine which blocks on the source volume have changed while in “Resync pending” state. Those changes will then be pushed to the target server for write execution. This process is referred to as a “Partial Resync”.
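A minimal sketch of how a bitmap can drive a partial resync is shown below. It assumes one bit per fixed-size block and uses Python lists as stand-ins for volumes; the class name, the block granularity and the on-disk layout are all hypothetical and are not taken from SEDK.

```python
class IntentBitmap:
    """One bit per block: set means 'changed on source, not yet on target'."""

    def __init__(self, volume_blocks):
        self.volume_blocks = volume_blocks
        self.bits = bytearray((volume_blocks + 7) // 8)

    def mark_dirty(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def clear(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def dirty_blocks(self):
        """Walk the bitmap sequentially and yield every dirty block number."""
        for block in range(self.volume_blocks):
            if self.bits[block // 8] & (1 << (block % 8)):
                yield block


def partial_resync(bitmap, source, target):
    """Copy only the dirty blocks to the target; return how many were copied."""
    copied = 0
    for block in list(bitmap.dirty_blocks()):
        target[block] = source[block]
        bitmap.clear(block)
        copied += 1
    return copied


# Both volumes start in sync, then the link drops ("Resync pending").
source = [b"old"] * 16
target = list(source)
bitmap = IntentBitmap(len(source))

for block in (3, 9):          # writes that land while the target is unreachable
    source[block] = b"new"
    bitmap.mark_dirty(block)

copied = partial_resync(bitmap, source, target)
print(copied, source == target)
```

Only the two changed blocks cross the wire, which is why a partial resync is so much cheaper than a full one.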
In order for SEDK to be successful, it is critical that sufficient bandwidth be made available to keep paired volumes in the “Mirroring” state. Before setting up SEDK in a production environment, SIOS recommends that perfmon be run for a period of time in order to understand the volume of write activity for the volumes to be replicated. Specifically, “Disk Write Bytes/Sec” should be examined for each candidate volume. The SEDK user manual provides recommended maximum rates of change for several common network bandwidths.
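As a rough way to read those perfmon numbers, the arithmetic below converts a write rate in bytes per second into megabits per second and compares it with a link's capacity. The sample values and the 80% headroom factor are assumptions made for illustration; the figures in the SEDK user manual should take precedence.

```python
def required_mbps(write_bytes_per_sec):
    """Convert a 'Disk Write Bytes/Sec' reading to megabits per second."""
    return write_bytes_per_sec * 8 / 1_000_000


def link_keeps_up(samples, link_mbps, headroom=0.8):
    """True if the peak sampled write rate fits within headroom * link capacity.

    headroom is an assumed safety margin, not a vendor recommendation.
    """
    peak = max(samples)
    return required_mbps(peak) <= link_mbps * headroom


# Hypothetical perfmon samples for one candidate volume, in bytes/sec.
samples = [400_000, 1_250_000, 900_000]

print(required_mbps(max(samples)))             # 10.0
print(link_keeps_up(samples, link_mbps=45))    # 45 Mbps link
print(link_keeps_up(samples, link_mbps=1.544)) # T1-class link
```

Summing the result across every volume to be mirrored gives a first estimate of the total replication bandwidth needed.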
The good news is that SEDK also includes the ability to compress data, using Zlib compression libraries, prior to transmission across the network. Doing so, however, does increase CPU overhead.
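The trade-off between compression ratio and CPU time can be explored with zlib directly. The sketch below uses Python's standard zlib module on a synthetic, highly repetitive payload; how SEDK itself calls zlib is not documented here, and real disk writes may compress very differently.

```python
import time
import zlib


def measure(payload, level):
    """Return (compression ratio, seconds spent) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    return len(payload) / len(compressed), elapsed


# Synthetic payload (an assumption): repetitive log-like text.
payload = b"2011-03-01 12:00:00 INFO write block 0001 status=ok\n" * 20000

for level in (1, 6, 9):
    ratio, seconds = measure(payload, level)
    print(f"level {level}: ratio {ratio:.1f}:1 in {seconds * 1000:.1f} ms")
```

Running this against a sample of the data that will actually be replicated gives a better feel for whether a higher level buys anything.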
If the source and target servers sit on either side of a physical firewall, or have MS Windows Firewall installed on them, it will be necessary to configure those firewalls to allow SEDK to communicate. The user manual clearly states which ports are required for SEDK.
Finally, SEDK will attempt to use all the network bandwidth that it has available. This may present a problem if the network connection is shared with other applications or if the source and target servers are sitting on either side of a WAN circuit with limited bandwidth. This problem can be further aggravated during an initial or full synchronization. To limit this risk, SEDK includes the option to limit or throttle the network bandwidth it uses.
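One common way to implement this kind of throttle is a token bucket. The sketch below illustrates the general technique only; it is not a description of how SEDK limits bandwidth, and the class and parameter names are hypothetical.

```python
class TokenBucket:
    """Rate limiter: tokens are bytes, refilled at rate_bytes_per_sec."""

    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def delay_for(self, nbytes, now):
        """Return how many seconds the sender should wait before sending
        nbytes at time `now` (seconds on a caller-supplied clock)."""
        # Refill for the time elapsed since the last call, up to the burst cap.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Spend the tokens; a negative balance is a debt repaid by waiting.
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate


bucket = TokenBucket(rate_bytes_per_sec=1000, burst_bytes=1000)
print(bucket.delay_for(1000, now=0.0))  # burst allowance covers this send
print(bucket.delay_for(500, now=0.0))   # bucket empty: must wait
print(bucket.delay_for(500, now=1.0))   # one second of refill pays the debt
```

Passing the clock in explicitly keeps the example deterministic; a real sender would use a monotonic clock and sleep for the returned delay.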
SEDK will only support disks whose sector size is 512 bytes.
Resizing dynamic disks is not supported while those disks are part of an SEDK mirror. Before resizing a dynamic disk, any mirrors must be deleted. After both the source and destination disks have been resized, the mirrors can be re-created.
If the bitmap is to be relocated, the new target directory must be created prior to reconfiguring SEDK with the new location. In no cases can the bitmap be placed on a dynamic disk. It must be placed on a basic disk.
SEDK application components:
At the heart of SEDK is ExtMirr.sys, a kernel-mode driver which is responsible for all replication activity between mirror endpoints. ExtMirr.sys communicates with local volumes, NTFS.Sys, the network stack and the DataKeeper Service.
The DataKeeper service, ExtMirrSvc.exe, acts as a relay between the DataKeeper GUI and command-line interfaces and ExtMirr.sys. All commands that manipulate the configuration of a mirror have to flow through this service. It is not necessary, however, for this service to be running in order to maintain mirrored volumes.
The DataKeeper GUI, Datakeeper.mmc, is an MMC 3-based interface that allows an admin to configure, control and obtain the status of mirrored volumes. This GUI can be run from any system so long as it can connect to the DataKeeper service on a system that needs to be managed.
SEDK also provides a command-line interpreter, EMCMD.exe, that can be used to configure and control mirrored volumes from a command-line interface. This is handy if an admin needs to be able to include such actions in a script.
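A script might wrap EMCMD along the following lines. This is a hedged sketch: the argument order (system, command, volume letter) and the GETMIRRORVOLINFO subcommand in the usage comment are assumptions that should be checked against the SEDK user manual, and the helper names are hypothetical. The output is returned unparsed because its format is not described here.

```python
import subprocess


def build_emcmd_args(system, command, volume, exe="EMCMD.exe"):
    """Build the argument list for one EMCMD call.

    Assumed layout: EMCMD <system> <COMMAND> <volume letter>.
    """
    return [exe, system, command, volume]


def run_emcmd(system, command, volume, runner=subprocess.run):
    """Run EMCMD and return its raw standard output as text.

    `runner` is injectable so the wrapper can be exercised without SEDK installed.
    """
    result = runner(
        build_emcmd_args(system, command, volume),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Example call on a server with SEDK installed (not executed here):
#   print(run_emcmd(".", "GETMIRRORVOLINFO", "E"))
```

Keeping the command construction separate from its execution makes it easy to log or dry-run the exact command line before letting a script change a mirror.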
SEDK leverages the Flexera InstallShield to provide for a rapid and smooth installation process. After the setup utility has been launched, it only takes four clicks, not counting licensing which will be covered below, and a reboot to complete the process. The only two decisions that need to be made are which components to install (Server, User Interface or both) and where to install them.
SEDK uses a run-time license, meaning that while it is possible to install the software without a license, it is not possible to start and run SEDK without one.
SEDK requires a unique license key for each server. This can be a little tricky to set up. After installation has been completed, the SEDK License Key Manager will launch. The License Key Manager will interrogate the system on which it is run to produce a Host ID. Both the Host ID and an authorization code, or Entitlement ID, which is delivered with the software, are required to properly license SEDK.
To obtain a license key, an admin will need to log onto a SIOS web site and enter both the Host ID and Entitlement ID. The automated system will then email a license key file which is then copied to the correct server and “installed” via the License Key Manager. This process will need to be repeated for each server in the enterprise.
As a final note, the Host ID is dependent on, among other things, the primary NIC in the system. If that NIC is replaced, it will be necessary to re-license the server with the new Host ID.
The ease and flexibility with which mirrored volumes are created and managed is the real strength of SEDK. Using the GUI Manager, an admin is able to quickly create one or more “jobs”, each with one or more “mirrors”, through a three-step wizard. Once a job is created, mirrors can easily be added, modified or deleted.
Mirrors can either be configured in a one-to-one relationship with a single source and single target volume or in a one-to-two relationship with a single source and two target volumes on either one or two target servers. It is also possible to set up a many-to-one configuration through several one-to-one mirrors against a single target server. In all cases, a volume can only act as a target for a single source volume.
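These topology rules can be expressed as a small validation routine. The sketch below is illustrative only: volumes are modelled as hypothetical (server, drive letter) tuples, and the checks cover just the two constraints stated above, namely at most two targets per source and at most one source per target volume.

```python
def validate_mirrors(mirrors):
    """Check a list of (source, target) pairs against the topology rules.

    Each source and target is a (server, volume) tuple.
    Returns a list of human-readable error strings (empty if valid).
    """
    errors = []
    source_of = {}   # target volume -> the source it already belongs to
    fanout = {}      # source volume -> set of its target volumes

    for source, target in mirrors:
        if target in source_of and source_of[target] != source:
            errors.append(f"{target} is already a target for {source_of[target]}")
        source_of.setdefault(target, source)
        fanout.setdefault(source, set()).add(target)

    for source, targets in fanout.items():
        if len(targets) > 2:
            errors.append(f"{source} has {len(targets)} targets; at most 2 allowed")
    return errors


# One-to-two: a single source mirrored to two target servers.
one_to_two = [(("A", "E"), ("B", "E")), (("A", "E"), ("C", "E"))]
# Many-to-one: two sources mirrored to different volumes on one target server.
many_to_one = [(("A", "E"), ("C", "E")), (("B", "E"), ("C", "F"))]
# Invalid: two sources competing for the same target volume.
conflict = [(("A", "E"), ("C", "E")), (("B", "E"), ("C", "E"))]

print(validate_mirrors(one_to_two))
print(validate_mirrors(many_to_one))
print(validate_mirrors(conflict))
```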
Each mirrored pair maintains its own job state and can be configured with a unique combination of compression and network throttle settings. This allows an admin to very granularly monitor and control the performance of the environment.
SEDK Cluster Edition:
One of the other strengths of SEDK is its ability to integrate with either Windows Server 2008 Failover Clustering (WSFC) or Windows 2003 Microsoft Cluster Server (MSCS). With a cluster running, the DataKeeper GUI will give an admin the option, while creating a mirror, to register it as a cluster storage resource (a DataKeeper volume). It is also possible to register an existing mirror as a cluster storage resource using the EMCMD command-line interface.
Once an SEDK mirror is registered as a clustered DataKeeper volume, it can be included as a resource in a cluster service. Moving such a cluster service from one node to another will change the direction of copy for any DataKeeper volumes included in the cluster service.
Failure of an active cluster node with a mirrored volume will result in one of two outcomes. If the clustered DataKeeper volumes are synchronized, which is to say the state is “Mirroring”, then the cluster will be able to successfully bring the volumes online on the surviving cluster node. The target volume will become the source volume for SEDK and the status will report as “Resync pending”. Once the failed system comes back online, SEDK will go through a partial resync.
If clustered DataKeeper volumes are out of sync, they will usually report as either “Paused” or “Resyncing”. Under such conditions, a cluster may not be able to bring DataKeeper volumes online on a surviving cluster node after the primary node fails. This action is desirable as it preserves the integrity of the data between the two nodes. SEDK will go through a partial resync once the failed node comes back online but a failed cluster service will continue to stay offline until an admin clears the fault and manually brings the service back online.
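The two failover outcomes can be summarized as a simple decision on mirror state. The sketch below is a simplification: it treats any state other than “Mirroring” as blocking, whereas the behavior described above is that the cluster "may not" be able to bring such volumes online. The enum and function names are hypothetical.

```python
from enum import Enum


class MirrorState(Enum):
    MIRRORING = "Mirroring"
    PAUSED = "Paused"
    RESYNCING = "Resyncing"
    RESYNC_PENDING = "Resync pending"


def can_fail_over(states):
    """True only if every DataKeeper volume in the cluster service was
    fully synchronized when the active node failed."""
    return all(state is MirrorState.MIRRORING for state in states)


def state_after_failover(state):
    """State reported on the surviving node once the old target has
    taken over as source and the failed node is still down."""
    if state is MirrorState.MIRRORING:
        return MirrorState.RESYNC_PENDING
    return state  # out-of-sync volumes stay as they were, service offline


print(can_fail_over([MirrorState.MIRRORING, MirrorState.MIRRORING]))
print(can_fail_over([MirrorState.MIRRORING, MirrorState.PAUSED]))
print(state_after_failover(MirrorState.MIRRORING).value)
```

A single out-of-sync volume is enough to keep the whole cluster service offline in this model, which matches the data-integrity argument made above.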
Running SEDK can generate potentially significant overhead, depending on how it is configured. In order to understand the magnitude of this overhead I ran a series of tests in a controlled environment. Two virtual machines were configured with 2 VCPUs and 4GB RAM each on a single ESX 4 server in a lab. No other VMs were running on the server. Five volumes were mirrored between the two VMs using SEDK 7.2.1. A dedicated 1Gb NIC was used to host the replication traffic. No other traffic was pushed through the NIC. An internal diagnostic tool was then used to generate drive I/O on one or more of the drives while perfmon captured performance data at 15-second intervals.
I ran two series of tests. In the first series, I increased I/O to a single drive in order to test SEDK’s ability to handle “scale up” load. In the second series, I spread the I/O out across an increasing number of drives (1 per work thread) in order to test SEDK’s ability to handle “scale out” load.
Each test included nine runs. The first run was included to baseline performance and did not include mirroring or, obviously, compression. The second run was done on mirrored volumes with no compression configured. Runs three through nine were performed on mirrored volumes with increasing levels of compression. Network throttling was never enabled.
Each run included five stages, with each stage adding an additional work thread. In all cases, a thread of work was able to generate something in the neighborhood of 28,000kb/sec of disk I/O with the ratio of read-to-write activity at about 2-to-1 on the baseline runs. The data from each stage is the average of about 10 minutes of activity.
In my tests, simply mirroring drives increased CPU overhead by up to 30-40% while dropping write activity by up to 12%. Adding compression on top of mirroring increased CPU overhead by 120-300% or more while dropping write activity by at least 40%.
Additionally, in my tests, enabling compression at its lowest level had a significant impact. SEDK was able to achieve a compression ratio of between 3 and 4-to-1 (i.e. for every 3 to 4 bytes written to a mirrored drive, SEDK sent 1 byte of traffic over the network).
Interestingly, while higher levels of compression had a measurable impact on CPU and write activity, the compression ratio never showed a material change. My testing showed no benefits to running on anything higher than the lowest level of compression. This, however, may simply have been a result of the type of data written to the drives.
Opportunities for improvement:
No software is without its bugs and SEDK is no exception to this rule. There are a number of issues, and associated workarounds, listed in the user manual. There are more, I am sure, listed on the vendor website. The following are issues that I experienced during my evaluation.
There is a known issue with 64-bit perfmon counters that I tripped over almost immediately. After installing SEDK, I was unable to add counters to a perfmon log on any of my Windows 2008r2 test systems. To get around this, I renamed ExtMirrPerf.dll, the SEDK 64-bit perfmon counter dll. I have been told that this issue will be corrected in a forthcoming version.
I noticed that the DataKeeper GUI manager will, under some circumstances, incorrectly show job status as red or failed. For example, when first launching the manager, all jobs will show red for 15 to 20 seconds and then transition to green. Additionally, when the direction of mirroring is being changed as a result of a cluster failover, any associated jobs will transition from green to red and then back to green in the GUI manager. My hunch is that this happens when the GUI manager is unable to determine the status of a job. It would be nice, however, for the manager to show “status pending” or “unknown” rather than “failed” under these circumstances.
During load testing, I also noticed that the DataKeeper GUI manager would occasionally lose connection to the local host. Interestingly, when this happened, no errors were reported, nor was there any interruption to existing mirrors. I was, however, unable to manage (e.g. change the settings of) any jobs running on the host. In order to get around this I had to stop and restart the DataKeeper service.