Linux software RAID (mdadm) workflow and tips

Tip: Keep a hard drive asset register (inventory)

I strongly recommend maintaining an asset register for all of your hardware. Asset information for hard drives is especially useful when drives begin to fail physically and need to be replaced: being able to look up which bay or slot the failing drive occupies will save you time and hassle. I maintain the following information about each hard drive:

a) The purchase date and purchase price
b) The hard drive’s installation date
c) The hard drive’s serial and model numbers
d) The hard drive’s storage capacity (e.g. 2TB)
e) The hard drive’s physical location (i.e. the computer and the drive bay where it’s currently installed)
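
As an illustration, a register like this can be kept as a plain CSV file and queried from the shell. The sketch below is only one possible layout: the function name locate_drive, the column order, and the file itself are assumptions made for this example and are not part of any existing tool. It also assumes that no field contains a comma.

```shell
# Hypothetical register layout (one drive per line, no header row):
#   serial,model,capacity,purchase_date,purchase_price,install_date,location
#
# Usage: locate_drive SERIAL REGISTER_FILE
# Prints the location recorded for SERIAL, or nothing if it is not listed.
locate_drive() {
    awk -F, -v serial="$1" '$1 == serial { print $7 }' "$2"
}
```

Given a serial number read from a drive, locate_drive prints the computer and bay recorded for it, which is the lookup you need when a drive has to be pulled.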

Identifying failed disks using software

/proc/mdstat is my first port of call when troubleshooting RAID issues. It provides information on active arrays, including which arrays are degraded and the component devices that make up each array.
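
A quick way to spot a degraded array in /proc/mdstat is to look for an underscore in the member status field (for example [U_] instead of [UU]). The following is a minimal sketch of that check. The name degraded_arrays is made up for this example, and the function assumes the usual layout in which the status field appears on a line following the array’s header line.

```shell
# Usage: degraded_arrays [FILE]      (FILE defaults to /proc/mdstat)
# Prints the name of each array whose status field contains an underscore,
# i.e. at least one member device is missing.
degraded_arrays() {
    awk '
        # Header line such as "md0 : active raid1 sdb1[1] sda1[0]"
        /^md[^ ]* :/ { name = $1 }
        # Status field such as "[U_]" belonging to the most recent header
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}
```

Run without arguments it reads the live /proc/mdstat; passing a file name makes it easy to try against a saved copy.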

After being notified of a degraded RAID array, it’s useful to find out which disk is failing. For this I use hdparm.

If the failing disk still responds to commands, hdparm can be used (as root) to identify it, as follows:

hdparm -i /dev/sdX

If the disk has failed completely, it can still be identified by running hdparm on the working disks and using a process of elimination.
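
The process of elimination can be sketched as a comparison of two lists: the serial numbers you expect to be installed (for example, taken from an asset register) and the serial numbers that the working disks report. The function name missing_serials and the one-serial-per-line file format are assumptions made for this example.

```shell
# Usage: missing_serials EXPECTED_FILE DETECTED_FILE
# Both files hold one serial number per line.
# Prints every serial in EXPECTED_FILE that is absent from DETECTED_FILE;
# those are the candidates for the dead disk.
missing_serials() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -v -x -F -f "$2" "$1"
}
```

Whatever serial is printed belongs to a disk that the OS could not query, which narrows the search to that drive.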

The output of hdparm includes the hard drive’s model number and serial number, which, combined with a hard drive inventory (see above), make it easy to find and replace the failing hardware.
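
To pull just those two fields out of the hdparm output, something like the following could be used. It is a sketch that assumes the identification line has the form "Model=..., FwRev=..., SerialNo=..."; the function name disk_identity is hypothetical.

```shell
# Usage: hdparm -i /dev/sdX | disk_identity
# Reads `hdparm -i` output on stdin and prints "serial,model".
# Assumes a line of the form:  Model=..., FwRev=..., SerialNo=...
disk_identity() {
    sed -n 's/^ *Model=\(.*\), FwRev=.*, SerialNo=\(.*[^ ]\) *$/\2,\1/p'
}
```

Printing the serial first makes the result convenient to match against an inventory keyed by serial number.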

Do not activate (non-root) RAID partition(s) on boot

There are a number of scenarios in which you may not want your non-root RAID partitions to be activated on boot. For example:

a) You recently added, removed or moved disks in your system, and you would like to ensure that the OS detects all of them before attempting to activate the RAID array.

b) A disk is physically failing and causes the OS to hang while booting. You may wish to physically disconnect and reconnect disks to help isolate the failing drive. In this case, you do not want the OS to activate RAID arrays and put them into a degraded state.

Note: The instructions below only apply to Debian-based distributions. The Red Hat family of distributions has its own methods.

1. Run: dpkg-reconfigure mdadm
2. At the prompt for the MD arrays needed for the root filesystem, enter your root array (in my case this is md/0)
3. At the prompt "Do you want to start MD arrays (in /etc/mdadm/mdadm.conf) automatically", answer: No

Update (August 2016): In Debian Jessie (version 8) I have found that the above instructions do not work. Instead, modify /etc/mdadm/mdadm.conf, adding a line such as:

ARRAY <ignore> UUID=6a50d74b:e8d7b30d:180c7f66:020a86e8

You can find a RAID array’s UUID by running:

mdadm --detail /dev/mdX
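
To avoid copying the UUID by hand, the ignore line can be generated from the mdadm output. This is a minimal sketch that assumes the output contains a line of the form "UUID : <value>"; the function name ignore_line is made up for this example.

```shell
# Usage: mdadm --detail /dev/mdX | ignore_line
# Reads `mdadm --detail` output on stdin and prints a matching
# "ARRAY <ignore> UUID=..." line suitable for /etc/mdadm/mdadm.conf.
ignore_line() {
    awk '$1 == "UUID" && $2 == ":" { print "ARRAY <ignore> UUID=" $3; exit }'
}
```

Review the printed line before appending it to /etc/mdadm/mdadm.conf. On Debian it may also be necessary to rebuild the initramfs (update-initramfs -u) so that the copy of mdadm.conf used during early boot picks up the change.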