What is RAID?
Using Multiple Hard Drives for Performance and Reliability
RAID is a solution that was originally developed for the network server market as a means of creating large amounts of storage at a lower cost. Essentially, it takes multiple lower cost hard drives and puts them together through a controller to present a single larger capacity drive. This is what RAID stands for: redundant array of inexpensive drives or disks. To achieve this, specialized software and controllers were needed to manage the data being split between the various drives. Eventually the processing power of the standard computer system allowed these features to filter their way into the personal computer market.
Now RAID storage can be used for three distinct purposes: capacity, security and performance. Capacity is the simple one and is involved in almost every type of RAID setup. For instance, two hard drives can be linked together so that they appear to the operating system as a single virtual drive with twice the capacity. Performance is another key reason for using a RAID setup on a personal computer. In the same example of two drives being used as a single drive, the controller can split a chunk of data into two parts and put each part on a separate drive. Because both drives work at the same time, this can come close to doubling the speed of reading or writing data on the storage system. Finally, RAID can be used for data security. This is done by using some of the space on the drives to clone the data. Once again with two drives, the controller can write the same data to both of them. Thus, if one drive fails, the other still has the data.
Depending upon the goals of the storage array that you want to put together for your computer system, you will use one of the various levels of RAID, each of which balances these three purposes differently. For those using hard drives in their computer, performance is probably going to be more of an issue than capacity. On the other hand, those using solid state drives will probably want a way to take the smaller drives and link them together to create a single larger drive. So let's take a look at the various levels of RAID that can be used with a personal computer.
RAID 0
This is the lowest level of RAID and does not actually offer any form of redundancy, which is why it is referred to as level 0. Essentially, RAID 0 takes two or more drives and puts them together to fashion a larger capacity drive. This is achieved through a process called striping. Data blocks are broken up into chunks that are then written in order across the drives. This offers increased performance because the controller can write to all of the drives simultaneously, effectively multiplying the speed of the drives. Below is an example of how this might work across three disks:
| Drive 1 | Drive 2 | Drive 3 |
In order for RAID 0 to work effectively at boosting the performance of the system, you need to try to have matched drives. Each drive should have exactly the same storage capacity and performance traits. If they do not, then the capacity will be limited to a multiple of the smallest drive (the number of drives times the smallest capacity) and the performance to that of the slowest drive, as the array must wait for all of the stripes to be written before moving to the next set. It is possible to use mismatched drives, but in that case a JBOD setup might be more effective.
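The striping order and the capacity rule described above can be sketched in a few lines of Python. This is only an illustration: the function names `stripe_layout` and `raid0_capacity` are made up for this example, and a real controller works on fixed-size chunks of raw disk blocks rather than numbered list items.

```python
def stripe_layout(num_chunks, num_drives):
    """Return one list per drive holding the chunk numbers stored on it.

    Chunks are numbered from 1 and dealt out to the drives in order,
    wrapping back to the first drive after the last one.
    """
    drives = [[] for _ in range(num_drives)]
    for chunk in range(1, num_chunks + 1):
        drives[(chunk - 1) % num_drives].append(chunk)
    return drives


def raid0_capacity(drive_sizes):
    """Usable capacity: number of drives times the smallest drive."""
    return len(drive_sizes) * min(drive_sizes)


# Six chunks striped across three drives.
for number, chunks in enumerate(stripe_layout(6, 3), start=1):
    print(f"Drive {number}: chunks {chunks}")

# Mismatched drives (sizes in GB): the 500 GB drive limits the array.
print(raid0_capacity([1000, 1000, 500]), "GB usable")
```

With three drives, chunks 1, 2 and 3 form the first stripe and chunks 4, 5 and 6 form the second, so each drive only has to handle a third of the data.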
JBOD stands for just a bunch of drives. It is effectively a collection of independent drives that are presented to the operating system as a single storage drive. This is typically achieved by having the data span across the drives, which is why it is often referred to as SPAN or BIG. Effectively, the operating system sees them all as a single disk, but blocks are written to the first disk until it fills up, then to the second, then the third, and so on. This is useful for adding extra capacity to an existing computer system and for combining drives of various sizes, but it will not increase the performance of the drive array.
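To make the spanning behavior concrete, here is a minimal sketch of how a logical block number could be mapped onto drives of different sizes. The function name `locate_block` and the drive sizes are hypothetical, chosen only for this example.

```python
def locate_block(logical_block, drive_sizes):
    """Map a logical block number to (drive index, block offset on that drive).

    Drives are filled one after another, so the first drive holds the
    lowest block numbers and later drives pick up where it ends.
    """
    if logical_block < 0:
        raise ValueError("block number cannot be negative")
    remaining = logical_block
    for index, size in enumerate(drive_sizes):
        if remaining < size:
            return index, remaining
        remaining -= size
    raise ValueError("block is beyond the end of the span")


# Three mismatched drives holding 100, 250 and 50 blocks.
sizes = [100, 250, 50]
print(locate_block(99, sizes))   # last block of the first drive
print(locate_block(100, sizes))  # first block of the second drive
print(sum(sizes), "blocks in total")
```

Unlike striping, neighboring blocks usually land on the same drive, which is why a span adds capacity without adding speed.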
The biggest problem with RAID 0 and JBOD setups is data security. Since you have multiple drives, the chances of losing data increase because you have more points of failure. If any drive in a RAID 0 array fails, all of the data becomes inaccessible. In a JBOD, a drive failure will result in the loss of any data that happened to be on that drive. As a result, anyone who wants to use these methods of storage should have some other means of backing up their data.
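A rough calculation shows how quickly the risk grows with more drives. The sketch below assumes that drives fail independently of one another and uses a made-up 5% chance of failure per drive over some period. Both are assumptions for illustration, not measured figures, and `array_loss_probability` is a hypothetical helper name.

```python
def array_loss_probability(drive_failure_chance, num_drives):
    """Chance that at least one of the drives fails.

    For RAID 0 a single failure makes the whole array unreadable, so
    this is also the chance of losing everything on the array.
    Assumes the drives fail independently of each other.
    """
    chance_all_survive = (1 - drive_failure_chance) ** num_drives
    return 1 - chance_all_survive


# Illustrative 5% chance of failure per drive.
for drives in (1, 2, 3, 4):
    risk = array_loss_probability(0.05, drives)
    print(f"{drives} drive(s): {risk:.2%} chance of losing the array")
```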
RAID 1
This is the first true level of RAID, as it provides full redundancy for the data that is stored on the array. This is done through a process called mirroring. Effectively, all data that is written to the system is copied to each drive in a level 1 array. This form of RAID is typically done with just a pair of drives, as adding more drives will not add any capacity, just more redundancy. As an example, here is a chart that shows how data would be written to two drives:
| Drive 1 | Drive 2 |
To get the most effective use from a RAID 1 setup, the system should once again use matched drives that share the same capacity and performance ratings. If mismatched drives are used, then the array capacity will be equal to that of the smallest drive in the array. For instance, if a one and a half terabyte drive and a one terabyte drive were used in a RAID 1 array, the capacity of this array on the system would be just a single terabyte.
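Mirroring and its capacity rule are simple enough to sketch directly. In this illustration each drive is modeled as a plain Python list, and the names `mirror_write` and `raid1_capacity` are invented for the example.

```python
def mirror_write(drives, block):
    """Write the same block to every drive in the mirror."""
    for drive in drives:
        drive.append(block)


def raid1_capacity(drive_sizes):
    """Usable capacity of a mirror is that of its smallest drive."""
    return min(drive_sizes)


# Two drives, each modeled as a list of blocks.
drives = [[], []]
for block in ["A", "B", "C"]:
    mirror_write(drives, block)

print(drives[0])  # ['A', 'B', 'C']
print(drives[1])  # ['A', 'B', 'C']

# The example from the text: a 1.5 TB drive paired with a 1 TB drive.
print(raid1_capacity([1.5, 1.0]), "TB usable")
```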
This level of RAID is highly effective for data security because the two drives are effectively identical. If one of the two drives fails, the other still holds a complete copy of the data. The problem with this type of setup is generally determining which of the drives has failed, because the storage often becomes inaccessible when one of the two fails and won't be properly restored until a new drive is inserted in place of the failed one and a recovery process is run. There is also no write performance gain from mirroring, because every write must go to both drives. In fact, there will be a slight performance loss from the overhead of the RAID controller.
RAID 1+0 or 10
This is a somewhat complicated combination of RAID levels 0 and 1. The controller needs a minimum of four drives in order to function in this mode, because it makes two pairs of drives. The first pair is a mirrored array that clones the data between its two drives. The second pair is also mirrored, but is set up as the stripe of the first. This provides both data redundancy and performance gains. Below is an example of how data would be written across four drives using this type of setup:
| Drive 1 | Drive 2 | Drive 3 | Drive 4 |
To be honest, this is not a desirable mode of RAID to be running on a computer system. While it does provide some performance boost, the gain really isn't that good because of the huge amount of overhead on the system. In addition, it is a huge waste of space, as the drive array will have at most half the capacity of all the drives combined. If mismatched drives are used, performance will be limited to that of the slowest drive, and the capacity of a four-drive array will be just double that of the smallest drive.
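The combination of mirroring and striping can be sketched as follows. The example assumes that drives 1 and 2 form the first mirrored pair and drives 3 and 4 the second. Which physical drives a real controller pairs up is its own decision, and the names `raid10_layout` and `raid10_capacity` are hypothetical.

```python
def raid10_layout(num_chunks, num_drives=4):
    """Return one list per drive holding the chunk numbers stored on it.

    Neighboring drives form mirrored pairs; chunks are striped across
    the pairs, and both drives in a pair receive the same chunk.
    """
    if num_drives < 4 or num_drives % 2 != 0:
        raise ValueError("this sketch needs an even number of drives, at least four")
    num_pairs = num_drives // 2
    drives = [[] for _ in range(num_drives)]
    for chunk in range(1, num_chunks + 1):
        pair = (chunk - 1) % num_pairs
        drives[2 * pair].append(chunk)      # first drive of the pair
        drives[2 * pair + 1].append(chunk)  # its mirror
    return drives


def raid10_capacity(drive_sizes):
    """Usable capacity: number of mirrored pairs times the smallest drive."""
    return (len(drive_sizes) // 2) * min(drive_sizes)


for number, chunks in enumerate(raid10_layout(4), start=1):
    print(f"Drive {number}: chunks {chunks}")

# Four mismatched drives (sizes in TB): only double the smallest is usable.
print(raid10_capacity([2, 2, 2, 1]), "TB usable")
```

Four chunks end up as chunks 1 and 3 on both drives of the first pair and chunks 2 and 4 on both drives of the second, so every chunk exists twice.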
RAID 5
This is the highest level of RAID that can be found in consumer computer systems and is a much more effective method for increasing capacity and redundancy. It achieves this through a process of data striping with parity. A minimum of three drives is necessary, as the data is split into stripes across the drives and one block in each stripe is set aside for parity. To explain this better, let's first take a look at how the data might be written across three drives:
| Drive 1 | Drive 2 | Drive 3 |
In essence, the drive controller takes a chunk of data to be written across all the drives in the array. The first bit of data is placed on the first drive and the second is placed on the second. The third drive gets the parity bit, which is essentially a comparison of the binary data on the first and second. In binary math, you have just 0 and 1. A boolean operation called exclusive OR (XOR) is used to compare the bits. If the two add up to an even number (0+0 or 1+1), then the parity bit will be zero. If the two add up to an odd number (1+0 or 0+1), then the parity bit will be one. The reason for this is that if one of the drives fails, the controller can figure out what the missing data is. For instance, if drive one fails, leaving just drives two and three, and drive two has a data bit of one while drive three has a parity bit of one, then the missing data bit on drive one must be zero.
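The even/odd comparison described above is the exclusive OR (XOR) operation, written `^` in Python. The sketch below first repeats the single-bit example and then applies the same idea to whole blocks of bytes. The helper name `xor_blocks` and the sample data are invented for illustration, and real controllers do this work in hardware or in driver code.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Single-bit example: drive one holds 0 and drive two holds 1.
drive_one_bit, drive_two_bit = 0, 1
parity_bit = drive_one_bit ^ drive_two_bit   # 1, because 0+1 is odd

# Drive one fails: rebuild its bit from drive two and the parity.
rebuilt_bit = drive_two_bit ^ parity_bit     # 0, the lost value

# The same idea applied to whole blocks of data.
drive_one = b"RAID"
drive_two = b"five"
parity = xor_blocks(drive_one, drive_two)

# Losing either data drive is recoverable from the other plus the parity.
assert xor_blocks(drive_two, parity) == drive_one
assert xor_blocks(drive_one, parity) == drive_two
print("rebuilt drive one:", xor_blocks(drive_two, parity))
```

XOR is its own inverse, which is why the same operation that creates the parity can also rebuild a missing block.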
This provides effective data redundancy that allows all data to be restored in the event of a drive failure. Now for most consumer setups, a failure will still result in the system not being usable, because the array isn't in a functional state. In order to get the system functional again, it is necessary to replace the failed drive with a new one. A data reconstruction process must then be run at the controller level, which applies the same boolean function in reverse to recreate the data of the missing drive. This can take some time, especially with larger capacity drives, but the data is at least recoverable.
Now the capacity of a RAID 5 array depends upon the number of drives in the array and their capacity. Once again, the array is restricted by the smallest capacity drive in the array, so it is best to use matched drives. The effective storage space is equal to the number of drives minus one, times the lowest capacity. So in math terms it is: $(n-1) \times \text{Capacity}_{\min}$. So, if you have three 2GB drives in a RAID 5 array, the total capacity would be 4GB. Another RAID 5 array that used four 2GB drives would have 6GB of capacity.
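The capacity rule is easy to check with a short function. `raid5_capacity` is a hypothetical helper written for this example, with sizes given in GB to match the figures above.

```python
def raid5_capacity(drive_sizes):
    """Usable capacity: (number of drives - 1) times the smallest drive."""
    if len(drive_sizes) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (len(drive_sizes) - 1) * min(drive_sizes)


print(raid5_capacity([2, 2, 2]))     # 4
print(raid5_capacity([2, 2, 2, 2]))  # 6
print(raid5_capacity([2, 2, 1]))     # 2, limited by the 1GB drive
```

The space of one drive always goes to parity, so the share of capacity given up shrinks as more drives are added.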
Now performance for RAID 5 is a bit more complicated than for some of the other forms of RAID because of the boolean process that must be done to create the parity bit when data is written to the drives. This means that write performance will be lower than that of a RAID 0 array with the same number of drives. Read performance, on the other hand, does not suffer as much, because no parity calculation is needed when the data is read straight from the drives.
The Big Issue With All RAID Setups
I've discussed the various pros and cons of each of the levels of RAID that can be used on personal computers, but there is another issue that many people don't realize when it comes to creating RAID drive setups. Before a RAID setup can be used, it must first be constructed, either by the software of the hardware controller or within the software of the operating system. This essentially initializes the special formatting required to properly track how the data will be written to and read from the drives.
This probably doesn't sound like a problem, but it is if you ever need to change how your RAID array is configured. For instance, say you are running low on space and want to add an extra drive to either a RAID 0 or RAID 5 array. In most cases, you won't be able to without first reconfiguring the RAID array, which will also remove any data that was stored on those drives. This means that you have to fully back up your data, add the new drive, reconfigure the drive array, format that drive array and then restore your original data back to it. That can be an extremely painful process. As a result, make sure you really have the array set up the way you want it the first time you do it.