SAN Data Replication

One of the interesting aspects of storage management today is replication. Vast amounts of storage can be replicated from disk to disk within a large array using software from the array vendors or from storage software vendors. It is also possible to replicate data from one array to another, within the same datacenter or across geographic regions, for use in disaster recovery scenarios.

Shadow Image / Business Copy

Hitachi Data Systems offers a product, Shadow Image, that allows users of HDS 9900 arrays to make copies of data within the array for backups or for test and development. Hewlett-Packard resells this software under the name Business Copy for use on HP's XP arrays, which are rebranded 9900s.

Shadow Image can be controlled from a graphical web interface, either HDS Storage Navigator or HP's CommandView. The graphical interface is useful for tasks that do not require automation. For more complex or automated tasks, there is a programmable interface that can be installed on a SAN-attached host. This programmable interface communicates with the array via command devices, which are simply special LUNs created in the array and assigned to the host for this purpose. The disks that data is replicated from are called PVOLs (primary volumes), and the disks receiving the data are called SVOLs (secondary volumes). Control of the replication process can be distributed among several hosts via UDP services that are part of the programmable interface.

When the pvol/svol pairs are first created, the entire contents of each pvol are copied to its svol. During this process the pairs are in "COPY" status; once the initial copy is complete, they are in "PAIR" status. To use the svols, the pairs must be split, which suspends the copy process. While the pairs are split, the array keeps track of changes made to both the pvol and the svol so that the pairs can be resynchronized quickly later.
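The pair lifecycle, and the change tracking that makes resynchronization quick, can be modeled in a few lines of Python. This is a toy sketch, not how the array implements it. Volumes are assumed to be divided into numbered tracks, and `VolumePair` and its methods are hypothetical names. Only `COPY` and `PAIR` come from the text; `SPLIT` stands in for whatever status name the array actually reports for a split pair.

```python
from dataclasses import dataclass, field


@dataclass
class VolumePair:
    """Toy model of one pvol/svol pair; each volume is a run of numbered tracks."""

    tracks: int
    status: str = "COPY"  # initial full copy in progress
    pvol_dirty: set[int] = field(default_factory=set)  # changed on pvol while split
    svol_dirty: set[int] = field(default_factory=set)  # changed on svol while split

    def initial_copy_done(self) -> None:
        if self.status != "COPY":
            raise ValueError("initial copy only completes from COPY")
        self.status = "PAIR"

    def split(self) -> None:
        if self.status != "PAIR":
            raise ValueError("only a pair in PAIR status can be split")
        self.status = "SPLIT"  # hypothetical name for the split (suspended) state

    def write(self, side: str, track: int) -> None:
        if side not in ("pvol", "svol"):
            raise ValueError(side)
        if not 0 <= track < self.tracks:
            raise IndexError(track)
        if self.status == "SPLIT":
            # The two volumes are independent now; remember what changed on each.
            (self.pvol_dirty if side == "pvol" else self.svol_dirty).add(track)
        elif side == "svol":
            raise ValueError("svol cannot be used until the pair is split")
        # pvol writes in COPY/PAIR status are mirrored by the array, nothing to track.

    def resync(self) -> list[int]:
        """Re-establish the pair and return the tracks that must be copied.

        A track changed on either side is out of step, so the union of both
        change sets is copied. The same set applies to a forward
        (pvol -> svol) or a reverse (svol -> pvol) resync; only the copy
        direction differs. This model jumps straight back to PAIR, while a
        real array reports a copy state while the tracks are transferred.
        """
        if self.status != "SPLIT":
            raise ValueError("only a split pair can be resynchronized")
        to_copy = sorted(self.pvol_dirty | self.svol_dirty)
        self.pvol_dirty.clear()
        self.svol_dirty.clear()
        self.status = "PAIR"
        return to_copy
```

Only the tracks touched while the pair was split are recopied. This is why a resync after, say, a nightly backup is much cheaper than the initial full copy.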

A typical use of this software is backing up large filesystems or databases. For example, say you have a 6 TB database on a large, busy database server. Backing up this volume of data across the network to a backup server can flood the network and place a significant load on a database server that may already be heavily loaded. Shadow Image offers a solution. First, assign the svols to the backup server. Then put the database in backup mode and split the pairs. At that point, the database can come out of backup mode and the svols can be mounted on the backup server. Backing up the svols gives you a full backup of the database without moving the data across the network from the database server or adding load to the database server itself.
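The ordering in this procedure matters. The split must happen while the database is in backup mode, and backup mode should be ended even if the split fails. The sketch below captures only that ordering. The five callables are hypothetical hooks that a site would implement with its own array commands and database scripts. The sketch also assumes that the svols are already in PAIR status and assigned to the backup server.

```python
from typing import Callable


def split_mirror_backup(
    begin_backup_mode: Callable[[], None],
    split_pairs: Callable[[], None],
    end_backup_mode: Callable[[], None],
    mount_svols: Callable[[], None],
    back_up_svols: Callable[[], None],
) -> None:
    """Run the split-mirror backup sequence using site-supplied (hypothetical) hooks."""
    begin_backup_mode()
    try:
        # The split freezes the svols, so it must happen while the
        # database is in backup mode.
        split_pairs()
    finally:
        # Always take the database out of backup mode, even if the split
        # failed, so production is not left running in backup mode.
        end_backup_mode()
    # These steps run only if the split succeeded, and touch only the backup server.
    mount_svols()
    back_up_svols()
```

The window during which the database stays in backup mode covers only the split itself, not the hours the tape backup may take.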

Another interesting use for the software is testing upgrades against a copy of live data and application binaries. For example, suppose you need to test applying a patch to your Oracle installation or performing an upgrade. Rather than risk upgrading the live system, you could split the pairs, mount the svols on a test server and test the upgrade there. If the upgrade succeeds, you could reverse sync the pvol/svol pairs during a short maintenance window, and the live system would then be upgraded. There are many other ways to take advantage of these capabilities.

The real power of this software is that the replication occurs within the array rather than on the host. As a result, the replication does not consume the host's disk I/O bandwidth or CPU resources.

True Copy / Continuous Access

True Copy is very similar to Shadow Image, except that it replicates data from one array to another. Essentially, you connect one or two fibre ports on one array to the other array. The arrays may be connected directly, with the ports configured as two-node arbitrated loops, or connected through a Fibre Channel switch. If they are connected via a switch and zoning is enabled on it, the ports must be zoned together. Data always flows from the Master array to the Remote array. The port on the Remote array must be configured as an RCU Target port, and the port on the Master array must be configured as an Initiator port.

An interesting possibility with True Copy is long-distance replication over a wide area network. This requires that the local and remote array ports be connected through a protocol converter at each site. The protocol converter converts FCP to IP for transmission over the WAN. An example of a protocol converter is the Ultranet Edge Storage Router from McData. Here is a diagram of an example True Copy or Continuous Access configuration.

True Copy Setup Overview

  • Set remote ports to RCU Target.
  • Set local ports to Initiator.
  • Create RCU relationships for the remote CUs using SSIDs. You need an RCU relationship for each CU that will map to a remote CU. Use xpinfo to get the SSID numbers, or use the output from the XPDT tool provided by the vendor. A third alternative is to read the SSID from the SVP.
  • Establish pairs by selecting a local CU:LDEV and pairing it with a remote LDEV. To do this, specify the remote port number, host group number and LUN number where the remote LDEV is currently assigned. Note: the LUN number is in hexadecimal.
  • Async operation is preferred and requires that consistency groups be created in advance. Use the Asynchronous Operation tab in the GUI to create the groups. A minimum of one Consistency Group is required.
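Before establishing pairs, it can help to sanity-check the collected settings against the steps above. This is a minimal sketch. It assumes that the port roles, RCU relationships, pair requests and consistency groups have already been gathered into plain Python data. The class, function and field names are hypothetical, and the `CL1-A`-style port labels in the example are illustrative only. Real SSID values would come from xpinfo, XPDT or the SVP.

```python
from dataclasses import dataclass


@dataclass
class PairRequest:
    local_cu_ldev: str  # "CU:LDEV" in hex, e.g. "01:2A"
    remote_port: str    # remote port where the LDEV is assigned (label is illustrative)
    host_group: int
    lun: int            # stored as an int; entered in hex when pairing

    @property
    def lun_hex(self) -> str:
        # The LUN number is given in hexadecimal when establishing the pair.
        return format(self.lun, "X")


def check_setup(
    local_ports: dict[str, str],
    remote_ports: dict[str, str],
    rcu_by_local_cu: dict[str, str],  # local CU -> remote SSID
    pairs: list[PairRequest],
    consistency_groups: set[int],
    asynchronous: bool = True,
) -> list[str]:
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = []
    for port, role in remote_ports.items():
        if role != "RCU Target":
            problems.append(f"remote port {port} is {role}, expected RCU Target")
    for port, role in local_ports.items():
        if role != "Initiator":
            problems.append(f"local port {port} is {role}, expected Initiator")
    for pair in pairs:
        cu = pair.local_cu_ldev.split(":")[0]
        if cu not in rcu_by_local_cu:
            problems.append(f"no RCU relationship for local CU {cu} ({pair.local_cu_ldev})")
    if asynchronous and not consistency_groups:
        problems.append("asynchronous operation needs at least one consistency group")
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets you fix every misconfiguration in a single pass through the GUI.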

When replicating databases, all LUNs that make up the database should be in the same consistency group so that the database is consistent when the pairs are split.
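A quick check that a database's LUNs have not been scattered across consistency groups is easy to script. The sketch below assumes that the LUN-to-group assignments have been exported into a dict. The function name and the LUN labels used in the example are made up.

```python
from collections import defaultdict
from typing import Optional


def consistency_problem(
    database_luns: list[str], group_of: dict[str, int]
) -> Optional[dict[Optional[int], list[str]]]:
    """Return None if every LUN is in one and the same consistency group.

    Otherwise return the LUNs grouped by consistency group, with None
    collecting LUNs that are not in any group.
    """
    if not database_luns:
        raise ValueError("no LUNs given")
    by_group: dict[Optional[int], list[str]] = defaultdict(list)
    for lun in database_luns:
        by_group[group_of.get(lun)].append(lun)
    if len(by_group) == 1 and None not in by_group:
        return None
    return dict(by_group)
```

For example, `consistency_problem(["db-data", "db-arch"], {"db-data": 1, "db-arch": 2})` reports that the two LUNs sit in groups 1 and 2, so a split would not capture them at the same instant.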

Issues That May Arise

  • There are some compatibility issues with regard to CU numbers when replicating from the XP12k to the XP1024. The 12k supports more than 32 CUs, but the 1024 is limited to 32, so CUs numbered beyond the 1024's range cannot be replicated to it. In addition, the 12k supports host group LUN numbers from 0-3FF, while the 1024 supports only 0-FF, so LUN assignments higher than FF will cause problems.
  • The CNT Edge 1000 router will not work with direct connections; it requires a fabric connection. The problem with this is that if you connect the routers to your local and remote fabrics, the fabric switches may attempt to merge the fabrics across the WAN link. Our workaround was to build separate fabrics on both sides using low-end switches.
  • When a Brocade switch is connected to the CNT router, the Brocade will ask for a WAN license, which is required to extend fabrics over a WAN. The workaround is to set the Brocade port to ISL R_RDY mode using the portcfgislmode command at the Brocade command line.
  • After the portcfgislmode command is issued the remote fabric will merge with the local fabric. After the merge the local array CA port must be zoned with the remote array CA port.
  • The Cisco switch ports that the CNT routers are connected to will most likely need to be forced to 100 Mb/s full duplex. If they autonegotiate to half duplex, they will likely run at about 1/100th of the available capacity.
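The XP12k to XP1024 limits in the first item above lend themselves to a pre-flight check. In this sketch, the CU:LDEV strings are hexadecimal. The assumption that the XP1024's 32 CUs are numbered 0x00-0x1F is not from the text, so adjust `XP1024_CU_COUNT` if your array numbers them differently. The function name is hypothetical.

```python
XP1024_CU_COUNT = 32   # assumed to be CUs 0x00-0x1F
XP1024_MAX_LUN = 0xFF  # the XP12k allows host group LUNs up to 0x3FF


def xp1024_problems(pairs: list[tuple[str, int]]) -> list[str]:
    """Check (CU:LDEV, LUN) entries destined for an XP1024.

    CU:LDEV is written in hex, e.g. "1A:05"; LUN is an int.
    """
    problems = []
    for cu_ldev, lun in pairs:
        cu = int(cu_ldev.split(":")[0], 16)
        if cu >= XP1024_CU_COUNT:
            problems.append(
                f"{cu_ldev}: CU {cu} (decimal) is outside the XP1024's {XP1024_CU_COUNT} CUs"
            )
        if lun > XP1024_MAX_LUN:
            problems.append(f"{cu_ldev}: LUN {lun:X} is above FF")
    return problems
```

Running this against the planned pair list before creating RCU relationships catches both limits before any pair fails to establish.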
