Software RAID
Array Information
mdadm --detail /dev/md0
Array Creation
Examples :
mdadm --create /dev/md2 --metadata=0.90 --level=raid1 --raid-devices=2 /dev/sda2 /dev/sdb2
mdadm --create /dev/md0 --metadata=1.1 --level=raid5 --chunk=256 --raid-devices=6 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 missing
Example /etc/mdadm.conf content (use mdadm --detail --scan to get the details) :
ARRAY /dev/md0 level=raid5 num-devices=6 UUID=d81538a6:6abe192b:a59b3942:49fa9370
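As a minimal sketch of keeping that file in sync, the helper below reads `mdadm --detail --scan` output on standard input and prints only the ARRAY lines whose UUID does not yet appear in the configuration file. The function name `missing_arrays` is hypothetical, and the sketch assumes each ARRAY line carries a `UUID=` token spelled the same way in both places. It writes nothing itself.

```shell
#!/bin/sh
# Hypothetical helper: print the ARRAY lines read on stdin whose UUID is
# not yet present in the given config file. Nothing is modified.
missing_arrays() {
    conf=$1
    while read -r line; do
        case $line in
            ARRAY\ *) ;;          # only ARRAY lines are of interest
            *) continue ;;
        esac
        # Pull out the UUID=... token, wherever it sits on the line.
        uuid=$(printf '%s\n' "$line" | tr ' ' '\n' | grep '^UUID=' | head -n 1)
        [ -n "$uuid" ] || continue
        # Fixed-string match; an unreadable config counts as "not listed".
        grep -qF -- "$uuid" "$conf" 2>/dev/null || printf '%s\n' "$line"
    done
}

# Typical use, as root:
#   mdadm --detail --scan | missing_arrays /etc/mdadm.conf
```

Reviewing the printed lines before appending them to /etc/mdadm.conf avoids duplicate ARRAY entries.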
Fixing Mismatch Warnings
On RHEL5 there is a weekly cron job which starts a check of all software RAID arrays and produces output if anything is wrong. When that happens, root receives an email like this one :
From: root (Cron Daemon)
To: root
Subject: Cron <root@h01> run-parts /etc/cron.weekly

/etc/cron.weekly/99-raid-check:
WARNING: mismatch_cnt is not 0 on /dev/md1
This means that some blocks are inconsistent between RAID members. For a RAID-1 mirror, this typically means that the two disks hold different data at the same location, which is not normal.
Diagnosing
For some reason, the mdadm tool doesn't include support for managing these mismatches. It all needs to be done using the /sys/block/md*/md/ pseudo-files.
To see the mismatch count for all of the software RAID arrays (note that the value is only updated when a check is run) :
# cat /sys/block/md*/md/mismatch_cnt
0
256
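The plain cat output does not say which value belongs to which array. A small sketch that labels each count is shown here. The function name `show_mismatches` is hypothetical, and the optional argument (defaulting to /sys/block) only exists so the function can be pointed at another directory.

```shell
#!/bin/sh
# Hypothetical helper: print "<array> <mismatch_cnt>" for every md array
# found under the given block directory (default: /sys/block).
show_mismatches() {
    root=${1:-/sys/block}
    for f in "$root"/md*/md/mismatch_cnt; do
        [ -r "$f" ] || continue   # glob matched nothing, or file unreadable
        array=${f#"$root"/}       # strip the leading directory
        array=${array%%/*}        # keep only the "mdN" component
        printf '%s %s\n' "$array" "$(cat "$f")"
    done
}
```

Calling `show_mismatches` with no argument reads the live values from /sys/block.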
To force a repair of the software RAID array where the value is non-zero (assuming it's md1) :
# echo repair > /sys/block/md1/md/sync_action
You can then follow the repair progress with something like this :
# watch cat /proc/mdstat
Which will result in something like this :
Every 2.0s: cat /proc/mdstat

Personalities : [raid1] [raid0] [raid6] [raid5] [raid4]
md1 : active raid1 sdb2[1] sda2[0]
      726802368 blocks [2/2] [UU]
      [>....................]  resync =  0.5% (4312832/726802368) finish=323.4min speed=37231K/sec

md0 : active raid1 sdb1[1] sda1[0]
      5242816 blocks [2/2] [UU]

unused devices: <none>
If you see that the repair is causing excessive I/O, you can cap its speed. Look at the speed= value above or read /sys/block/md1/md/sync_speed to get the current rate (in K/sec), then write a lower limit to sync_speed_max :
# cat /sys/block/md1/md/sync_speed
31202
# echo 25000 > /sys/block/md1/md/sync_speed_max
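A sketch of wrapping that write in a small function with input checking follows. The name `set_sync_cap` is hypothetical. It accepts either a number or the word `system`, which the kernel md documentation describes as restoring the system-wide limit; whether an older kernel accepts that keyword is an assumption worth verifying on the target machine.

```shell
#!/bin/sh
# Hypothetical helper: cap (or restore) the sync speed of one array.
#   set_sync_cap <array> <limit in K/sec | system> [block directory]
set_sync_cap() {
    array=$1
    value=$2
    root=${3:-/sys/block}
    case $value in
        system) ;;                       # back to the system-wide limit
        ''|*[!0-9]*)                     # empty or not a plain number
            echo "set_sync_cap: limit must be a number or 'system'" >&2
            return 1 ;;
    esac
    printf '%s\n' "$value" > "$root/$array/md/sync_speed_max"
}

# Typical use, as root:
#   set_sync_cap md1 25000     # throttle the running repair
#   set_sync_cap md1 system    # remove the per-array limit afterwards
```

Restoring the limit once the repair is over keeps later resyncs from being throttled by a value that was only meant to be temporary.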
At the very start of the repair, the mismatch count should go back to zero, but it may increase again as mismatches are found and corrected. This is why, once the repair has finished, you should run a full check, after which the value should be back to zero :
# echo check > /sys/block/md1/md/sync_action
# watch cat /proc/mdstat
# cat /sys/block/md*/md/mismatch_cnt
0
0
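When the check is run from a script rather than watched by hand, the script needs to wait for it to end before reading the count. A minimal sketch, assuming the array's sync_action file reads `idle` once no check or repair is running, is shown here. The function name `wait_for_idle` and its polling interval argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: block until the given sync_action file reads "idle".
#   wait_for_idle <path to sync_action> [poll interval in seconds]
wait_for_idle() {
    action_file=$1
    interval=${2:-10}
    if [ ! -r "$action_file" ]; then
        echo "wait_for_idle: cannot read $action_file" >&2
        return 1
    fi
    while [ "$(cat "$action_file")" != idle ]; do
        sleep "$interval"
    done
}

# Typical use, as root:
#   echo check > /sys/block/md1/md/sync_action
#   wait_for_idle /sys/block/md1/md/sync_action
#   cat /sys/block/md1/md/mismatch_cnt
```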
Replacing a Disk
Example commands for the sdb2 member of md1.
Manually remove a disk which starts reporting errors :
mdadm --manage --fail /dev/md1 /dev/sdb2
mdadm --manage --remove /dev/md1 /dev/sdb2
Also see badblocks for testing the removed disk for errors.
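To see which members the kernel has already marked as failed, /proc/mdstat can be parsed: failed members carry an (F) suffix after their slot number. The sketch below, with the hypothetical function name `failed_members`, reads mdstat-style text on standard input and prints one "array device" pair per failed member.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat-style text on stdin and print
# "<array> <device>" for each member flagged (F).
failed_members() {
    awk '$2 == ":" && $1 ~ /^md/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                dev = $i
                sub(/\[.*/, "", dev)   # drop the "[1](F)" suffix
                print $1, dev
            }
    }'
}

# Typical use:
#   failed_members < /proc/mdstat
```

A member listed here has already been failed by the kernel, so only the --remove step is still needed for it.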
Once a dead disk has been replaced, manual operations are needed to bring the RAID arrays back to full redundancy. The partitions must first be created on the new disk, then added back to their arrays :
mdadm --manage --add /dev/md1 /dev/sdb2