ostep-homework/file-raid
Remzi Arpaci-Dusseau c81bb394c3 fixed typo thanks to Giovanni Di Santi 2020-09-06 09:27:07 -05:00

Overview

This section introduces raid.py, a simple RAID simulator you can use to shore up your knowledge of how RAID systems work. It has a number of options, as we see below:

prompt> ./raid.py -h
Usage: raid.py [options]

Options:
  -h, --help            show this help message and exit
  -s SEED, --seed=SEED  the random seed
  -D NUMDISKS, --numDisks=NUMDISKS
                        number of disks in RAID
  -C CHUNKSIZE, --chunkSize=CHUNKSIZE
                        chunk size of the RAID
  -n NUMREQUESTS, --numRequests=NUMREQUESTS
                        number of requests to simulate
  -S SIZE, --reqSize=SIZE
                        size of requests
  -W WORKLOAD, --workload=WORKLOAD
                        either "rand" or "seq" workloads
  -w WRITEFRAC, --writeFrac=WRITEFRAC
                        write fraction (100->all writes, 0->all reads)
  -R RANGE, --randRange=RANGE
                        range of requests (when using "rand" workload)
  -L LEVEL, --level=LEVEL
                        RAID level (0, 1, 4, 5)
  -5 RAID5TYPE, --raid5=RAID5TYPE
                        RAID-5 left-symmetric "LS" or left-asym "LA"
  -r, --reverse         instead of showing logical ops, show physical
  -t, --timing          use timing mode, instead of mapping mode
  -c, --compute         compute answers for me

In its basic mode, you can use it to understand how the different RAID levels map logical blocks to underlying disks and offsets. For example, let's say we wish to see how a simple striping RAID (RAID-0) with four disks (the number used here, since no -D flag is given) does this mapping.

prompt> ./raid.py -n 5 -L 0 -R 20 
...
LOGICAL READ from addr:16 size:4096
  Physical reads/writes?

LOGICAL READ from addr:8 size:4096
  Physical reads/writes?

LOGICAL READ from addr:10 size:4096
  Physical reads/writes?

LOGICAL READ from addr:15 size:4096
  Physical reads/writes?

LOGICAL READ from addr:9 size:4096
  Physical reads/writes?

In this example, we simulate five requests (-n 5), specifying RAID level zero (-L 0), and restrict the range of random requests to just the first twenty blocks of the RAID (-R 20). The result is a series of random reads to the first twenty blocks of the RAID; the simulator then asks you to guess which underlying disks/offsets were accessed to service the request, for each logical read.

In this case, calculating the answers is easy: in RAID-0 with a chunk size of one block, recall that the underlying disk and offset that service a request are calculated via modulo arithmetic and integer division:

disk   = address % number_of_disks
offset = address / number_of_disks   (integer division; the remainder is discarded)

Thus, the first request, to address 16, should be serviced by disk 0 (16 % 4 = 0), at offset 4 (16 / 4 = 4). And so forth. You can, as usual, see the answers (once you've computed them!) by using the handy "-c" flag to compute the results.

prompt> ./raid.py -R 20 -n 5 -L 0 -c
...
LOGICAL READ from addr:16 size:4096
  read  [disk 0, offset 4]   

LOGICAL READ from addr:8 size:4096
  read  [disk 0, offset 2]   

LOGICAL READ from addr:10 size:4096
  read  [disk 2, offset 2]   

LOGICAL READ from addr:15 size:4096
  read  [disk 3, offset 3]   

LOGICAL READ from addr:9 size:4096
  read  [disk 1, offset 2]   
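The RAID-0 mapping used in this example can be sketched in a few lines of Python. This is only an illustration, not the simulator's own code; the function name raid0_map and its num_disks parameter are made up here, and a chunk size of one block is assumed.

```python
def raid0_map(addr, num_disks=4):
    """Map a logical block address to (disk, offset), one block per chunk."""
    disk = addr % num_disks      # blocks are dealt round-robin across the disks
    offset = addr // num_disks   # integer division gives the row on that disk
    return disk, offset


# The five logical reads from the example run above.
for addr in [16, 8, 10, 15, 9]:
    disk, offset = raid0_map(addr)
    print(f"LOGICAL READ from addr:{addr} -> read [disk {disk}, offset {offset}]")
```

Running it should give the same disk/offset pairs as the -c output above.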

Because we like to have fun, you can also do this problem in reverse, with the "-r" flag. Running the simulator this way shows you the low-level disk reads and writes, and asks you to reverse engineer which logical request must have been given to the RAID:

prompt> ./raid.py -R 20 -n 5 -L 0 -r
...
LOGICAL OPERATION is ?
  read  [disk 0, offset 4]   

LOGICAL OPERATION is ?
  read  [disk 0, offset 2]   

LOGICAL OPERATION is ?
  read  [disk 2, offset 2]   

LOGICAL OPERATION is ?
  read  [disk 3, offset 3]   

LOGICAL OPERATION is ?
  read  [disk 1, offset 2]   

You can again use -c to show the answers. To get more variety, a different random seed (-s) can be given.
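If you want to check your reverse-engineered answers by hand, the RAID-0 mapping is easy to invert. The sketch below assumes four disks and one-block chunks; raid0_unmap is a hypothetical helper name, not something found in raid.py.

```python
def raid0_unmap(disk, offset, num_disks=4):
    """Recover the logical block address from a physical (disk, offset) pair."""
    # Each offset (row) holds num_disks logical blocks, one per disk.
    return offset * num_disks + disk


# The physical reads shown in the -r example above.
for disk, offset in [(0, 4), (0, 2), (2, 2), (3, 3), (1, 2)]:
    print(f"read [disk {disk}, offset {offset}] <- logical addr {raid0_unmap(disk, offset)}")
```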

Even further variety is available by examining different RAID levels. The simulator supports RAID-0 (block striping), RAID-1 (mirroring), RAID-4 (block striping plus a single parity disk), and RAID-5 (block striping with rotating parity).

In this next example, we show how to run the simulator in mirrored mode. We show the answers to save space:

prompt> ./raid.py -R 20 -n 5 -L 1 -c
...
LOGICAL READ from addr:16 size:4096
  read  [disk 0, offset 8]   
 
LOGICAL READ from addr:8 size:4096
  read  [disk 0, offset 4]   

LOGICAL READ from addr:10 size:4096
  read  [disk 1, offset 5]   

LOGICAL READ from addr:15 size:4096
  read  [disk 3, offset 7]   

LOGICAL READ from addr:9 size:4096
  read  [disk 2, offset 4]   

You might notice a few things about this example. First, the mirrored RAID-1 assumes a striped layout (which some might call RAID-10, or stripe of mirrors), where logical block 0 is mapped to the 0th block of disks 0 and 1, logical block 1 is mapped to the 0th block of disks 2 and 3, and so forth (in this four-disk example). Second, when reading a single block from a mirrored RAID system, the RAID has a choice of which of two blocks to read. In this simulator, we use a relatively silly way to choose: as the output above shows, when the offset within the mirrored pair is even, the RAID reads from the even-numbered disk in the pair; the odd disk is used for odd offsets (so logical block 10, at offset 5, is read from disk 1). This is done to make the results of each run easy for you to guess (instead of, for example, a random choice).
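Here is a small sketch of the mirrored read mapping. It is inferred from the sample output above rather than taken from raid.py, so treat the details as assumptions: disks are grouped into pairs (0,1), (2,3), ..., blocks are striped across the pairs, and the read goes to the even disk of the pair when the offset is even and to the odd disk otherwise. The names raid1_map and raid1_read are invented for this example.

```python
def raid1_map(addr, num_disks=4):
    """Return (even_disk, odd_disk, offset) for a striped-mirror layout."""
    pairs = num_disks // 2       # number of mirrored pairs
    pair = addr % pairs          # which pair holds this logical block
    offset = addr // pairs       # block offset within each disk of the pair
    return 2 * pair, 2 * pair + 1, offset


def raid1_read(addr, num_disks=4):
    """Pick the copy to read: even offset -> even disk, odd offset -> odd disk."""
    even_disk, odd_disk, offset = raid1_map(addr, num_disks)
    disk = even_disk if offset % 2 == 0 else odd_disk
    return disk, offset


for addr in [16, 8, 10, 15, 9]:
    disk, offset = raid1_read(addr)
    print(f"LOGICAL READ from addr:{addr} -> read [disk {disk}, offset {offset}]")
```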

We can also explore how writes behave (instead of just reads) with the -w flag, which specifies the "write fraction" of a workload, i.e., the percentage of requests that are writes (0 means all reads, 100 means all writes). By default, it is set to zero, and thus the examples so far were 100% reads. Let's see what happens to our mirrored RAID when some writes are introduced:

prompt> ./raid.py -R 20 -n 5 -L 1 -w 100 -c
... 
LOGICAL WRITE to  addr:16 size:4096
  write [disk 0, offset 8]     write [disk 1, offset 8]   

LOGICAL WRITE to  addr:8 size:4096
  write [disk 0, offset 4]     write [disk 1, offset 4]   

LOGICAL WRITE to  addr:10 size:4096
  write [disk 0, offset 5]     write [disk 1, offset 5]   

LOGICAL WRITE to  addr:15 size:4096
  write [disk 2, offset 7]     write [disk 3, offset 7]   

LOGICAL WRITE to  addr:9 size:4096
  write [disk 2, offset 4]     write [disk 3, offset 4]   

With writes, instead of generating just a single low-level disk operation, the RAID must of course update both disks, and hence two writes are issued. Even more interesting things happen with RAID-4 and RAID-5, as you might guess; we'll leave the exploration of such things to you in the questions below.
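The mirrored write case can be sketched the same way. As before, this is a guess at the layout based on the output above (pairs of disks, blocks striped across the pairs), and raid1_write is a hypothetical name rather than a function from the simulator.

```python
def raid1_write(addr, num_disks=4):
    """Return the two physical writes needed for one logical block write."""
    pairs = num_disks // 2
    first = 2 * (addr % pairs)   # even-numbered disk of the mirrored pair
    offset = addr // pairs       # same offset on both copies
    return [(first, offset), (first + 1, offset)]


for addr in [16, 8, 10, 15, 9]:
    ops = "  ".join(f"write [disk {d}, offset {o}]" for d, o in raid1_write(addr))
    print(f"LOGICAL WRITE to addr:{addr} -> {ops}")
```

Each logical write fans out into exactly two physical writes, one per copy, which is why mirrored write throughput is roughly half that of striping.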

The remaining options appear in the help output (-h) shown at the start of this overview.

The -C flag allows you to set the chunk size of the RAID, instead of using the default size of one 4-KB block per chunk. The size of each request can be similarly adjusted with the -S flag. The default workload accesses random blocks; use -W seq to explore the behavior of sequential accesses. With RAID-5, two different layout schemes are available, left-symmetric and left-asymmetric; use -5 LS or -5 LA to try those out with RAID-5 (-L 5).
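To see what a larger chunk size does to the RAID-0 mapping, here is one way to generalize the disk/offset arithmetic. It is a sketch under stated assumptions: the chunk size is given in blocks (the chunk_blocks parameter is invented here, and may not match the units -C expects), and raid0_map_chunked is not a function from raid.py.

```python
def raid0_map_chunked(addr, num_disks=4, chunk_blocks=1):
    """Map a logical block to (disk, offset) when each chunk spans several blocks."""
    chunk = addr // chunk_blocks     # which chunk the block falls in
    within = addr % chunk_blocks     # position of the block inside that chunk
    disk = chunk % num_disks         # chunks are dealt round-robin across disks
    offset = (chunk // num_disks) * chunk_blocks + within
    return disk, offset


# With two-block chunks, consecutive block pairs stay on the same disk.
for addr in range(10):
    print(addr, raid0_map_chunked(addr, num_disks=4, chunk_blocks=2))
```

With chunk_blocks=1 this reduces to the simple modulo and division rule used earlier.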

Finally, in timing mode (-t), the simulator uses an incredibly simple disk model to estimate how long a set of requests takes, instead of just focusing on mappings. In this mode, a random request takes 10 milliseconds, whereas a sequential request takes 0.1 milliseconds. The disk is assumed to have a tiny number of blocks per track (100), and a similarly small number of tracks (100). You can thus use the simulator to estimate RAID performance under some different workloads.
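For back-of-the-envelope estimates, the two costs quoted above can be turned into a tiny model. This is an assumption-laden sketch, not the simulator's disk model: it treats a request as sequential only when it lands on the block right after the previous one on the same disk, charges every other request the random cost, and assumes the disks work in parallel so the RAID finishes when its slowest disk does. All names here are hypothetical.

```python
RANDOM_MS = 10.0       # cost of a random request, from the text above
SEQUENTIAL_MS = 0.1    # cost of a sequential request, from the text above


def disk_time_ms(offsets):
    """Estimate the time for one disk to service requests at these offsets, in order."""
    total = 0.0
    prev = None
    for off in offsets:
        if prev is not None and off == prev + 1:
            total += SEQUENTIAL_MS   # next block on the disk: no repositioning
        else:
            total += RANDOM_MS       # anything else pays the full random cost
        prev = off
    return total


def raid_time_ms(per_disk_offsets):
    """Disks run in parallel, so the slowest disk determines the total time."""
    return max((disk_time_ms(o) for o in per_disk_offsets), default=0.0)


print(disk_time_ms([0, 1, 2, 3]))        # one random access, then three sequential
print(raid_time_ms([[0, 1], [9, 3]]))    # second disk does two random accesses
```

Comparing such estimates against what -t reports is a good way to check your intuition about each RAID level.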