Friday, February 27, 2009

Linux SAN disk migration plan using the RAID mirroring method

I. General Purpose: SAN Data migration between Linux hosts

1. Operational risks: during the migration, the current production server will run with a single point of failure on its SAN disks, since the mirror is broken for the duration.
2. Other risks involve failures in coordination among UNIX system administrators, DBAs, fibre cable technicians and storage operators.
3. Rollback: if this document is followed step by step, we should be able to roll back at any point of the operation.
4. The scope of this exercise is limited to using the Linux native raidtools as the data migration method. Other possible methods are not explored; they include EMC SRDF/BCV/Snapshot, Veritas Volume Manager mirroring, Veritas data movers, Linux LVM, and copying data over the network or via optical media.
II. Target Server Preparation
*Note: outputs in this document are for illustration only; they are not actual screen shots.
1. Identify, build and test the target server according to current company standards.

2. Make sure the raidtools package is installed:
targethost# rpm -qa | grep raid

If the rpm is not installed, download it from the vendor site and install it.
Red Hat example: http://people.redhat.com/mingo/raidtools/

Please follow the vendor instructions to build and install the kernel. In the end, make sure the following options are set in your kernel configuration:

CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID1=y
CONFIG_BLK_DEV_LVM=y


You'll also need initial ramdisk (initrd) support, and it's recommended to compile in the loop block device:

CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y
CONFIG_BLK_DEV_LOOP=y

Additionally, you should build in support for the filesystems you intend to use.

CONFIG_EXT3_FS=y
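
If you are running a distribution-supplied kernel rather than building your own, you can usually confirm these options without a rebuild; a quick check, assuming the distribution installs its kernel config under /boot (the usual Red Hat convention). Options built as modules (=m) are also fine as long as the modules are loaded from the initrd.

targethost# grep -E 'CONFIG_MD=|CONFIG_BLK_DEV_MD|CONFIG_MD_RAID1|CONFIG_BLK_DEV_LVM' /boot/config-$(uname -r)
targethost# grep -E 'CONFIG_BLK_DEV_RAM=|CONFIG_BLK_DEV_INITRD|CONFIG_BLK_DEV_LOOP|CONFIG_EXT3_FS' /boot/config-$(uname -r)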

3. Host Bus Adapter (HBA) Configuration
Verify /etc/modules.conf contains the qla2300 alias:

alias scsi_hostadapter2 qla2300

Verify /etc/modules.conf contains the qla2300 options (this assumes the QLogic driver; if not, check for the corresponding entries for your HBA):

options qla2300 ql2xfailover=0 MaxRetriesPerPath=0 MaxRetriesPerIo=0 ql2xmaxqdepth=16

Load the HBA driver if necessary:

targethost# modprobe -a qla2300

Collect the HBA WWNs on the target server and save them for the SAN disk provisioning section below.
targethost# cat /proc/scsi/qla2300/[1234] | grep port
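
The [1234] glob assumes the instance numbers are already known; a small loop that walks whatever qla2300 instances exist works too (a convenience sketch, not part of the original procedure):

targethost# for f in /proc/scsi/qla2300/[0-9]*; do echo "== $f =="; grep -i port "$f"; done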




III. SAN Disk Provisioning

1. Using the EMC SAN control center, provision the new disks. During the migration, the target Linux host will have visibility to both the new and the old CLARiiON disks.
2. The SAN disk type on the source and target storage devices should be identical, so that the disk sizes match for mirroring.

IV. Source Server Operations

1. On the source host, verify SAN disks are mirrored:
sourcehost# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hda1[0] hdc1[1]
125376 blocks [2/2] [UU]

2. Let's say /dev/hdc is the disk we want to use as the swing disk. Mark it as faulty so the md driver stops using it:
sourcehost# raidsetfaulty /dev/md1 /dev/hdc1

3. Verify /dev/hdc is no longer engaged:
sourcehost# cat /proc/mdstat
md1 : active raid1 hda1[0] hdc1[1](F)
125376 blocks [2/1] [U_]

4. Move the fibre cable from sourcehost's second HBA port and attach it to the targethost's second HBA port.

V. Target Server Operations


1. Discover the new disk by doing a system reboot.
If rebooting is not possible, you may alternatively rescan the SCSI bus as follows.

For QLogic (do one instance at a time):
targethost# echo "scsi-qlascan" > /proc/scsi/qla2xxx/<host#>   (the directory may be qla2300, matching the driver name)
targethost# cat /proc/scsi/scsi   (the new devices appear at the end)

For Emulex:
targethost# echo 1 > /sys/class/scsi_host/host<#>/issue_lip
targethost# rescan-scsi-bus.sh -l -w -c

Generic sysfs rescan (do one path at a time):
targethost# echo "- - -" > /sys/class/scsi_host/host<#>/scan


Check for new sd devices:
targethost# fdisk -l | grep Disk
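
Where a host has several HBA instances, the sysfs rescan can be looped rather than typed per host; this sketch still issues the scans one at a time, in order (host names under /sys are illustrative):

targethost# for h in /sys/class/scsi_host/host*; do echo "- - -" > "$h/scan"; done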

2. After the new provisioning, reload the qla2300 driver:

targethost# rmmod qla2300
targethost# modprobe -a qla2300


3. Verify storage and paths with the EMC inq utility:
targethost# inq.linux
Example output (not real output):
------------------------------------------------------------------
DEVICE :VEND :PROD :REV :SER NUM :CAP(kb)
------------------------------------------------------------------


/dev/sdc :EMC :CLARiiON_New :5670 :74302000 :17677440
/dev/sdd :EMC :CLARiiON_New :5670 :74302000 :17677440
/dev/sde :EMC :CLARiiON_Old :5670 :74302000 :17677440
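
inq.linux is EMC's inq utility. If it is not available on the host, /proc/scsi/scsi gives a rougher vendor/model view of the attached SCSI devices (an alternative check, not part of the original procedure):

targethost# cat /proc/scsi/scsi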




4. Partition the New Disks

In this illustration /dev/sde is the swing disk carrying the data from sourcehost, and /dev/sdc and /dev/sdd are the newly provisioned disks (substitute your actual device names). Use sfdisk to copy the partition table of the swing disk onto the new, unused disks:

targethost# sfdisk -d /dev/sde | sfdisk /dev/sdc
targethost# sfdisk -d /dev/sde | sfdisk /dev/sdd
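
Before mirroring, confirm the partitions came out the same size; sfdisk reports the size of each partition in 1K blocks (device names as above):

targethost# sfdisk -s /dev/sde1
targethost# sfdisk -s /dev/sdc1
targethost# sfdisk -s /dev/sdd1

All three should report the same value.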

5. Update /etc/raidtab

The next step is to set up /etc/raidtab. This file serves as the configuration file for the mkraid command. We declare the swing disk from sourcehost, /dev/sde1, as the active member, since it carries the data, and we declare the first new disk, /dev/sdc1, as a "failed-disk" for now. This keeps the md driver from resyncing onto the new disk before we are ready.

raiddev /dev/md1
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
persistent-superblock 1
# the swing disk from sourcehost; it carries the data
device /dev/sde1
raid-disk 0
# the first new disk; mark as failed for now
device /dev/sdc1
failed-disk 1



6. Converting the Partitions into RAID Devices

Because /dev/sde1 still carries a RAID superblock and a filesystem from sourcehost, mkraid will warn and require its force option before proceeding; that is expected here. Since the second disk is declared as a failed-disk, no resync is started, and because the filesystem was originally created on sourcehost's /dev/md1 it already fits inside the RAID data area, so the data on the swing disk is preserved. Do not create a new filesystem on /dev/md1; the data from sourcehost is already on it.

targethost# mkraid /dev/md1
handling MD device /dev/md1
analyzing super-block
disk 0: /dev/sde1, 9644992kB, raid superblock at 9644928kB
disk 1: /dev/sdc1, failed
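
Before moving on, confirm that the degraded array is running and that the filesystem from sourcehost is visible on it; a quick sketch using tune2fs from e2fsprogs:

targethost# cat /proc/mdstat
targethost# tune2fs -l /dev/md1 | grep -iE 'volume name|state|block count'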


7. Update fstab

Once the array is running, edit /etc/fstab on targethost and add an entry to mount the RAID device where the application expects its data. On the example machine we would add the following:

/dev/md1 /NewSanDisk ext3 defaults 1 1
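
Assuming the mount point does not yet exist, the new entry can be validated without a reboot and then unmounted again (the path /NewSanDisk is the illustrative one used above):

targethost# mkdir -p /NewSanDisk
targethost# mount /NewSanDisk
targethost# df -h /NewSanDisk
targethost# umount /NewSanDisk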





8. Reboot and Start Mirroring

targethost# reboot


Alternatively, you may start raid without rebooting:

targethost# raidstop /dev/md1; raidstart /dev/md1


Once the server is online, attach the first new SAN disk to the mirror so that the data on the swing disk is copied onto it:

targethost# raidhotadd /dev/md1 /dev/sdc1


Verify the mirroring is in progress:

targethost# cat /proc/mdstat

md1 : active raid1 sdc1[2] sde1[0]
9644928 blocks [2/1] [U_]
[>....................] recovery = 0.1% (11696/9644928) \
finish=13.7min speed=11696K/sec
unused devices: <none>
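
The rebuild can take a while on large LUNs; a small loop that prints progress once a minute until the recovery is finished (a convenience sketch only):

targethost# while grep -q recovery /proc/mdstat; do cat /proc/mdstat; sleep 60; done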



Verify the mirroring has been completed:
targethost# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 sde1[0] sdc1[1]
9644928 blocks [2/2] [UU]


You may perform additional data integrity testing by mounting the filesystem, starting database servers, etc.
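
As one example of such a check, a read-only fsck of the md device is cheap insurance; run it only while the filesystem is unmounted (a sketch, assuming the ext3 filesystem used above):

targethost# umount /NewSanDisk
targethost# fsck.ext3 -n /dev/md1
targethost# mount /NewSanDisk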


9. Remove the Source Disk and Add the Second New Disk

Once all tests are completed successfully, you may now remove the swing disk from sourcehost from the RAID. Mark it faulty first, then remove it:

targethost# raidsetfaulty /dev/md1 /dev/sde1
targethost# raidhotremove /dev/md1 /dev/sde1


Verify that sde1 is no longer engaged:

targethost# cat /proc/mdstat


Then add the second new disk from the target storage to the mirror:

targethost# raidhotadd /dev/md1 /dev/sdd1

The verification process is the same as in the immediately preceding step. Finally, update /etc/raidtab so that /dev/sdc1 and /dev/sdd1 are listed as the two raid-disks, keeping the configuration consistent with the running array for future restarts.
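
For reference, a sketch of what the final /etc/raidtab would look like with the illustrative device names used above:

raiddev /dev/md1
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
persistent-superblock 1
device /dev/sdc1
raid-disk 0
device /dev/sdd1
raid-disk 1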



VI. Cleaning Up

1. Make the targethost take on the personality of the sourcehost by changing its hostname, IP address, etc. (see the sketch below).
2. Shut down and decommission the old sourcehost.
3. In the EMC SAN control center, decommission the unused hyper devices.
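
A rough sketch of where these settings live on a Red Hat style host; the file names assume RHEL conventions, and the hostname and eth0 interface are only examples:

targethost# vi /etc/sysconfig/network                        (set HOSTNAME=)
targethost# vi /etc/sysconfig/network-scripts/ifcfg-eth0     (set IPADDR= and NETMASK=)
targethost# hostname sourcehost
targethost# service network restart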