Remember that this only makes sense after you've done it once... :)
Test PC:
PII-350, 384MB RAM, an Adaptec 2940U2 SCSI controller, and 3 18GB Seagate drives.
Test Mac:
Blue & White G3, 256MB RAM, Adaptec 2940 SCSI controller, 3 8GB SCSI drives.
These instructions have been tested on various flavors of OpenLinux, Red Hat, Mandrake, and now Yellow Dog Linux for PowerPC.
IMPORTANT NOTE: According to the HOWTOs, if you're going to use IDE drives for this, you should only have one drive per channel. Slave drives will kill the performance of the RAID, so factor the purchase of a couple of IDE controllers into your budget. I have personally tested the Promise UDMA-100 cards in a RAID configuration, and they work very well.
RAID-5 requires at least 3 hard drives of the same size, so you should install those and make sure they work before starting this process.
General Partitioning notes:
Since RAID-5 isn't supported by most installers, you must first install Linux to one of the drives; later on we'll convert that drive to become part of the RAID. If you have at least 128MB RAM, skip the swap partitions. We'll create a swapfile on the RAID later so that the box won't crash if a drive dies. Don't split the mount points up among partitions as you normally would. To make our job easier later, create a 50MB partition at the front of each of the first 2 drives and leave those partitions empty for now. Put '/' on the first drive's large Linux partition, and leave the large Linux partitions on the other 2 drives empty.
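If it's been a while since you've driven fdisk by hand, here's a minimal sketch of carving up the first drive (the exact prompts vary between fdisk versions, so treat this as a guide rather than gospel):
fdisk /dev/sda
n (new partition)
p (primary)
1 (partition number)
<enter> (accept the default starting cylinder)
+50M (size of the small partition)
n
p
2
<enter>
<enter> (use the rest of the disk)
w (write the table and exit)
Repeat on /dev/sdb; /dev/sdc just gets one big Linux partition.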
Mac partitioning notes:
You may see lots of strange Apple partitions on your disk. As long as you're not dual-booting with MacOS go ahead and delete them. It won't hurt anything, and you can always put them back later with Apple's disk utilities.
IMPORTANT: Don't delete partition 1! The first partition of a Mac disk is the partition table, so that would cause all kinds of havoc.
In addition to the Linux partitions, allocate a 10MB Apple Bootstrap partition at the beginning of the first two disks. This is where your bootloader (Yaboot) will go.
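I did the Mac partitioning with pdisk; here's a rough sketch, assuming your pdisk behaves like mine did (hit '?' at its prompt for help if it doesn't):
pdisk /dev/sda
p (print the current partition map)
C (create a partition; pdisk prompts for first block, length, name, and type)
(for the bootstrap partition, give it a length of 10M and the type Apple_Bootstrap)
w (write the map)
q (quit)
Create the Linux partitions the same way, using the type Apple_UNIX_SVR2.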
My PC partition structure looks like:
/dev/sda1 - 50MB, Linux, empty
/dev/sda2 - 17GB, Linux, /
/dev/sdb1 - 50MB, Linux, empty
/dev/sdb2 - 17GB, Linux, empty
/dev/sdc1 - 17GB, Linux, empty
My Mac partition structure looks like:
/dev/sda1 - Apple partition map
/dev/sda2 - 10MB, Apple Bootstrap
/dev/sda3 - 50MB, Linux, empty
/dev/sda4 - 8GB, Linux, /
/dev/sdb1 - Apple partition map
/dev/sdb2 - 10MB, Apple Bootstrap
/dev/sdb3 - 50MB, Linux, empty
/dev/sdb4 - 8GB, Linux, empty
/dev/sdc1 - Apple partition map
/dev/sdc2 - 8GB, Linux, empty
Mac Kernel Notes:
You'll need a recent PPC kernel for this to work on a Mac. These are available at www.ppckernel.org. I used 2.4.20-ben10. You'll also need a new version of Yaboot, available at penguinppc.org. I used 1.3.10. If you're accustomed to building kernels on Intel you generally use 'make bzImage' as your final step. Unfortunately compressed kernels aren't supported on PPC, so you'll have to use 'make vmlinux' instead.
Once the recompile is complete, move the kernel into place and edit grub/lilo/yaboot accordingly. Then reboot and check that all your hardware is seen.
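For the record, the whole 2.4-era build dance looks roughly like this (a sketch from memory; the version string matches my setup, so adjust to your tree). Make sure the RAID-1 and RAID-5 drivers are compiled into the kernel rather than as modules, since the kernel has to assemble the array before it can load anything from it:
cd /usr/src/linux
make menuconfig (enable Multiple devices driver support, plus RAID-1 and RAID-5, as built-ins)
make dep (2.4 kernels still need this step)
make vmlinux (this is where Intel folks would run 'make bzImage')
make modules
make modules_install
cp vmlinux /boot/vmlinux-2.4.20-ben10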
Now we'll create the /etc/raidtab file that will configure your RAID devices. On the PC this should contain the following:
raiddev /dev/md0
raid-level 5
nr-raid-disks 3
nr-spare-disks 0
persistent-superblock 1
parity-algorithm left-symmetric
chunk-size 32
device /dev/sdb2
raid-disk 1
device /dev/sdc1
raid-disk 2
device /dev/sda2
failed-disk 0
raiddev /dev/md1
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
persistent-superblock 1
chunk-size 32
device /dev/sda1
raid-disk 0
device /dev/sdb1
raid-disk 1
On the Mac, /etc/raidtab should contain:
raiddev /dev/md0
raid-level 5
nr-raid-disks 3
nr-spare-disks 0
persistent-superblock 1
parity-algorithm left-symmetric
chunk-size 32
device /dev/sdb4
raid-disk 1
device /dev/sdc2
raid-disk 2
device /dev/sda4
failed-disk 0
raiddev /dev/md1
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
persistent-superblock 1
chunk-size 32
device /dev/sda3
raid-disk 0
device /dev/sdb3
raid-disk 1
- raiddev /dev/md0 - specifies that we're creating RAID device /dev/md0
- raid-level 5 - specifies that this is a RAID-5 array
- nr-raid-disks - specifies the number of *active* disks in the array
- nr-spare-disks - specifies the number of spare disks in the array (spare disks are used automagically if an active disk fails)
- persistent-superblock 1 - puts a block on each RAID device that contains info about its position in the array (among other things). Ever wonder what happens if you physically re-arrange drives in the array by accident, or switch a cable to the wrong drive? Without this, the array wouldn't know and would go to pieces. On the PC this also allows booting from the array, which we'll get to later.
- parity-algorithm left-symmetric - specifies the algorithm used to spread the parity info among the disks. Don't ask me exactly how it works; it's simply what the docs recommend for best performance.
- chunk-size 32 - specifies the chunk size, in KB, that the array writes in. This has an effect on performance, but since I don't understand all that too well I just use what the docs recommended.
- device /dev/sdxx - specifies the device name of each partition to be included in the array.
- raid-disk x - specifies a unique number assigned to each device in the array.
- failed-disk x - specifies a device that is in a failed state. In this case, we specify our current non-RAID boot device so that the RAID doesn't try to incorporate it into the array yet. That Would Be Bad(tm).
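A quick sanity check before running mkraid: RAID-5 gives you (N-1) drives' worth of usable space, because one drive's worth is consumed by parity. Here that works out to 2 x 17GB = 34GB usable on the PC and 2 x 8GB = 16GB on the Mac, so don't be surprised when df reports less than the raw total.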
Now let's create our arrays. This part is easy:
mkraid /dev/md0
mkraid /dev/md1
Run 'cat /proc/mdstat' to check the status of your RAID devices. (md, by the way, stands for 'multiple devices'. It's the kernel's shorthand for RAID devices.)
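Healthy output looks something like this (device names match my PC layout, block counts are illustrative). Note that md0 reports itself degraded, [3/2] and [_UU], because of our failed-disk entry; that's expected for now:
Personalities : [raid1] [raid5]
md0 : active raid5 sdc1[2] sdb2[1] 35566336 blocks level 5, 32k chunk, algorithm 2 [3/2] [_UU]
md1 : active raid1 sdb1[1] sda1[0] 51136 blocks [2/2] [UU]
unused devices: <none>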
NOTE: The RAID autodetection steps are PC-only. Mac users should skip this section and resume reading at the 'make filesystems' step.
Now that we know our arrays are working, let's stop them and set up auto-detection. Auto-detection makes use of the 'persistent superblock' that we enabled in /etc/raidtab. mkraid installed that superblock on each RAID device, and once we've set the partition type correctly, the kernel will see all our RAID devices at boot.
raidstop /dev/md0
raidstop /dev/md1
fdisk /dev/sda
p
t
1
fd
w
This lists the partition table, selects a partition to work on, and then sets the partition type to 'fd' (Linux raid autodetect). It then writes the new partition table to disk. Do this to *each partition* to be used in the array. Then reboot and watch the kernel auto-detect your arrays.
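If you'd rather script it than repeat the fdisk dance for every partition, older sfdisk versions can change the type directly. This assumes your sfdisk supports --change-id, so check its man page first:
sfdisk --change-id /dev/sda 1 fd
sfdisk --change-id /dev/sda 2 fd
sfdisk --change-id /dev/sdb 1 fd
sfdisk --change-id /dev/sdb 2 fd
sfdisk --change-id /dev/sdc 1 fd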
Now we'll make filesystems on our arrays. We'll make '/boot' ext2 and '/' ReiserFS. You can also use other filesystems; for the Mac I tested with ext3.
mke2fs /dev/md1
mkreiserfs /dev/md0
mkdir /raid
Now we'll copy our stuff over to the new '/boot' partition:
mount -t ext2 /dev/md1 /raid
cp -a /boot/* /raid
umount /dev/md1
Then copy everything else over to the new '/':
mount -t reiserfs /dev/md0 /raid
for i in `find / -maxdepth 1 -type d | egrep -v 'boot|proc|raid|^/$'`
do
cp -a $i /raid
done
mkdir /raid/proc /raid/boot
The egrep keeps /boot, /proc, and /raid themselves from being copied; the mkdir recreates those mount points on the new root.
Now edit /raid/etc/fstab, modifying the mount point for '/' and adding one for '/boot'. Something like:
/dev/md0 / reiserfs defaults 1 1
/dev/md1 /boot ext2 defaults 1 1
For the PC, create a LILO configuration with a fallback entry so you can test things safely:
umount /raid
vi /etc/lilo.conf
boot=/dev/sda1
install=/boot/boot.b
lba32
prompt
delay=50
timeout=50
default=linux-raid
image=/boot/vmlinuz-2.4.2-raid
label=linux-raid
root=/dev/md0
read-only
image=/boot/vmlinuz-2.4.2-raid
label=fallback
root=/dev/sda2
read-only
Run /sbin/lilo to set up LILO on your first partition. Note the 'fallback' entry. If something goes wrong you can still boot back to your non-RAID configuration by typing 'fallback' at the LILO prompt.
Now copy your lilo.conf to /etc/lilo.sda and /etc/lilo.sdb. We need one for each mirror of the RAID-1 partition, because we're going to install LILO on each drive so that if the primary disk fails, we can still boot. Essentially, we're making LILO redundant. Change /etc/lilo.sda so that the line reads 'boot=/dev/sda', change /etc/lilo.sdb so that the line reads 'boot=/dev/sdb', and then install LILO onto the MBR of each drive:
/sbin/lilo -C /etc/lilo.sda
/sbin/lilo -C /etc/lilo.sdb
On the Mac, Yaboot takes the place of LILO; an example yaboot.conf appears below. Note the 'device=' line. That will be different depending on your machine. Run:
ofpath /dev/sda
to get the Open Firmware path for your first SCSI drive, and put that in your 'device=' line. Also important is the 'partition=' line. This should be the number of the partition that contains your kernel. In this case, the array /dev/md1 contains our kernel and it's on partition 3.
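On my B&W G3 that looked like the following (illustrative output; yours will differ, but it should match the device= line in the example config below):
ofpath /dev/sda
/pci@80000000/pci-bridge@d/ADPT,2940U2B@4/@0: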
Now make a copy of the config for each drive:
cp /etc/yaboot.conf /etc/yaboot.sda.conf
cp /etc/yaboot.conf /etc/yaboot.sdb.conf
Change the 'boot=' line in the second file to /dev/sdb2 and the 'device=' line to the result of 'ofpath /dev/sdb'. Then run:
ybin -C /etc/yaboot.sdb.conf
ybin -C /etc/yaboot.sda.conf
to install Yaboot on both Bootstrap partitions. Example yaboot.conf:
# ybin options
boot=/dev/sda2
magicboot=/usr/local/lib/yaboot/ofboot
delay=10
defaultos=linux
enablecdboot
# yaboot options
init-message="\nWelcome to Yellow Dog Linux\n\n"
timeout=50
default=linux
# yaboot images
image=/vmlinux-2.4.20-ben10
label=linux
root=/dev/md0
partition=3
# pass the array members on the kernel command line, since partition-type autodetection isn't available here
append="md=0,/dev/sda4,/dev/sdb4,/dev/sdc2 md=1,/dev/sda3,/dev/sdb3"
device=/pci@80000000/pci-bridge@d/ADPT,2940U2B@4/@0:
image=/boot/vmlinux-2.4.20-ben10
label=fallback
root=/dev/sda4
partition=4
Reboot and try it out.
Mac Note: The Blue & White G3 I used seems to have a pretty dumb Open Firmware. If you unplug the primary drive to test the array, be aware that the firmware takes a very long time to figure it out. In my case, it made me type 'mac-boot' before it would even fail over. Not very smart. I've been told that the G4s are better, but I haven't verified that.
If all goes well, you've just booted from the array. Now it's time to add that old partition into your RAID-5 array and enable redundancy. First, edit /etc/raidtab and change the label 'failed-disk' to 'raid-disk'. This tells the RAID the partition is OK for use now. Then add it to the array by running:
raidhotadd /dev/md0 /dev/sda2
(that's /dev/sda4 in our Mac configuration)
Run 'watch cat /proc/mdstat' to see it build the redundancy. You should see a line that says something about 'recovery' and an estimated time for completion. Once it finishes you are running a fully redundant system. You should be able to survive a hard drive failure without data loss.
Now it's time to set up our swapfile. It will exist inside the array so that a dead drive won't crash the machine. Generally you should set up a swapfile that is 2 times the size of your RAM, though for machines with lots of memory this may not be practical. First, figure out how many blocks you'll be using: take the RAM count in MB, multiply by 1024 (to convert to KB), and then double it. In my case I have 256MB, so 256*1024*2 = 524288.
Then:
cd /
dd if=/dev/zero of=swapfile bs=1024 count=524288
This will give 512MB of swapspace in /. Now run:
mkswap /swapfile
swapon /swapfile
to create and activate the swapspace. Next we'll add our new swap space to /etc/fstab so that it will be used automatically. Add a line to /etc/fstab that looks like this:
/swapfile swap swap defaults 0 0
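To confirm the kernel is actually using it (both now and after a reboot), check the swap table; you should see /swapfile listed along with its size:
swapon -s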
And we're done.
Written by Aaron Grewell on 11-April-2003.