initial LFS commit; fix small random thing in fsck

Remzi Arpaci-Dusseau 2020-05-30 14:42:42 -05:00
parent dafe539f51
commit 7943ea3cd1
3 changed files with 1150 additions and 7 deletions

@@ -12,7 +12,14 @@ def dprint(str):
     if DEBUG:
         print(str)
 
+# to make Python2 and Python3 act the same -- how dumb
+def random_seed(seed):
+    try:
+        random.seed(seed, version=1)
+    except:
+        random.seed(seed)
+    return
+
 # to make Python2 and Python3 act the same -- how dumb
 def random_randint(low, hi):
     return int(low + random.random() * (hi - low + 1))
@@ -516,7 +523,7 @@ class fs:
     # data bitmap  1001001000000000
     # data  [(.,0) (..,0) (v,2) (d,2) (e,2) (n,2) (s,5)] [] [] [(.,5) (..,0) (w,3) (k,1)] [] [] [t] [] [] [] [] [] [] [] [] []
     def corrupt(self, whichCorrupt):
-        random.seed(self.seedCorrupt)
+        random_seed(self.seedCorrupt)
         num = random_randint(0, 11)
         # print('RANDINT', num)
         if whichCorrupt != -1:
@@ -697,11 +704,8 @@ print('ARG whichCorrupt',options.whichCorrupt)
 print('ARG dontCorrupt', options.dontCorrupt)
 print('')
 
 # to make Python2 and Python3 act the same -- how dumb
-if (sys.version_info > (3, 0)):
-    random.seed(options.seed, version=1)
-else:
-    random.seed(options.seed)
+random_seed(options.seed)
 
 printState = False
 printOps = False

file-lfs/README.md Normal file (+218)

@@ -0,0 +1,218 @@
# Overview
This homework involves a simulator of the log-structured file system, LFS.
The simulator simplifies the book chapter's LFS a bit, but hopefully leaves
enough in place in order to illustrate some of the important properties of
such a file system.
To get started, run the following:
```sh
prompt> ./lfs.py -n 1 -o
```
What you will see is as follows:
```sh
INITIAL file system contents:
[ 0 ] live checkpoint: 3 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
[ 1 ] live [.,0] [..,0] -- -- -- -- -- --
[ 2 ] live type:dir size:1 refs:2 ptrs: 1 -- -- -- -- -- -- --
[ 3 ] live chunk(imap): 2 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
create file /ku3
FINAL file system contents:
[ 0 ] ? checkpoint: 7 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
[ 1 ] ? [.,0] [..,0] -- -- -- -- -- --
[ 2 ] ? type:dir size:1 refs:2 ptrs: 1 -- -- -- -- -- -- --
[ 3 ] ? chunk(imap): 2 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
[ 4 ] ? [.,0] [..,0] [ku3,1] -- -- -- -- --
[ 5 ] ? type:dir size:1 refs:2 ptrs: 4 -- -- -- -- -- -- --
[ 6 ] ? type:reg size:0 refs:1 ptrs: -- -- -- -- -- -- -- --
[ 7 ] ? chunk(imap): 5 6 -- -- -- -- -- -- -- -- -- -- -- -- -- --
```
The output shows the initial file system state of an empty LFS, with a few
different blocks initialized. The first block (block 0) is the "checkpoint
region" of this LFS. For simplicity, this LFS only has one checkpoint region,
and it is always located at block address=0, and is always just the size
of a single block.
The contents of the checkpoint region are just disk addresses: locations
of chunks of the inode map. In this case, the checkpoint region has the
following contents:
```sh
checkpoint: 3 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
```
Let's call the leftmost entry (marked with a 3 here) the 0th entry,
the next one the 1st, and the last one (because there are 16) the
15th entry. Thus, we can think of them as:
```sh
checkpoint: 3 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
entry: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
```
This means that the first chunk of the inode map resides at disk address=3,
and that the rest of the inode map pieces have yet to be allocated (and
hence are marked "--").
Let's now look at that chunk of the inode map ("imap" from now on). The
imap is just an array that tells you, for each inode number, its current
location on the disk. In the initial state shown above, we see this:
```sh
chunk(imap): 2 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
```
These chunks also have (by default) 16 entries, and again we can think of
them as such:
```sh
chunk(imap): 2 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
entry: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
```
Because each chunk of the imap has 16 entries, and because the checkpoint
region (CR) has 16 entries, we now know that the entire LFS has 16x16=256
inode numbers available for files. A small file system(!) but good enough
for our purposes.
We also now know that each chunk of the imap is responsible for a contiguous
group of inodes, and we know which ones depending on which entry in the CR
points to this chunk. Specifically, entry 0 of the CR points to a chunk of
the imap that has information about inode numbers 0...15; entry 1 of the CR
points to an imap chunk for inode numbers 16...31.
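To make this mapping concrete, here is a small sketch of the arithmetic. The two constants mirror names used inside lfs.py; `locate_imap_slot` itself is just an illustrative helper, not a function in the simulator:
```python
NUM_IMAP_PTRS_IN_CR = 16        # entries in the checkpoint region
NUM_INODES_PER_IMAP_CHUNK = 16  # inodes covered by each imap chunk

def locate_imap_slot(inode_number):
    """Return (CR entry index, offset within that imap chunk)."""
    cr_entry = inode_number // NUM_INODES_PER_IMAP_CHUNK
    chunk_offset = inode_number % NUM_INODES_PER_IMAP_CHUNK
    return cr_entry, chunk_offset

# 16 CR entries x 16 inodes per chunk = 256 inode numbers in total
print(locate_imap_slot(0))    # (0, 0): first chunk, first slot (the root)
print(locate_imap_slot(17))   # (1, 1): second chunk, second slot
```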
In this specific example, we know the 0th entry of the CR points to block=3,
and in there, the 0th entry has a '2' in it. In our simulator, the root inode
is inode number=0, and thus this is the inode of the root directory of the
file system. From the imap, we now know the location of inode number=0's
address is block=2. So let's look at block 2! We see:
```sh
type:dir size:1 refs:2 ptrs: 1 -- -- -- -- -- -- --
```
This file metadata is a simplified inode, with file type (a directory), size
(1 block), reference count (how many directories it refers to, if this is a
directory), and some number of pointers to data blocks (in this case, one,
which points to block address=1).
This finally leads us to the last bit of initial state, which is the contents
of the directory. This directory only has one block in it (at address 1),
which has contents:
```sh
[.,0] [..,0] -- -- -- -- -- --
```
Herein lies an empty directory, with [name,inode number] pairs for itself
(".") and its parent (".."). In this special case (the root), the parent is
just itself, and both are inode number=0. Whew! We have now (hopefully)
understood the entire contents of the initial state of the file system.
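The whole chain of lookups (CR entry, then imap chunk, then inode, then data block) can be traced against a toy model of the four initial blocks. The dictionary encoding below is a simplified stand-in for the simulator's internal block format, for illustration only:
```python
# Simplified stand-ins for the four initial blocks shown above.
disk = [
    {'type': 'checkpoint', 'entries': [3] + [-1] * 15},         # block 0: CR
    {'type': 'dir', 'entries': [('.', 0), ('..', 0)]},          # block 1: root dir data
    {'type': 'inode', 'itype': 'dir', 'size': 1, 'ptrs': [1]},  # block 2: root inode
    {'type': 'imap', 'entries': [2] + [-1] * 15},               # block 3: imap chunk
]

def inode_address(disk, inum, per_chunk=16):
    cr = disk[0]['entries']
    chunk = disk[cr[inum // per_chunk]]        # CR entry -> imap chunk block
    return chunk['entries'][inum % per_chunk]  # chunk entry -> inode address

root_addr = inode_address(disk, 0)   # root is inode number 0
root = disk[root_addr]
data = disk[root['ptrs'][0]]
print(root_addr, data['entries'])    # 2 [('.', 0), ('..', 0)]
```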
What happens next in the default mode of the simulation is that one or more
operations are run against the file system, thus changing its state. In this
case, we know what the command was because we passed the simulator the
"-o" flag (which shows each operation as it is run). That operation is:
```sh
create file /ku3
```
This means a file "ku3" was created in the root directory "/". To accomplish
this creation, a number of structures must be updated, which means that the
log was written to. You can see that four writes occur beyond the previous end
of the log (address=3), at blocks 4...7:
```sh
[.,0] [..,0] [ku3,1] -- -- -- -- --
type:dir size:1 refs:2 ptrs: 4 -- -- -- -- -- -- --
type:reg size:0 refs:1 ptrs: -- -- -- -- -- -- -- --
chunk(imap): 5 6 -- -- -- -- -- -- -- -- -- -- -- -- -- --
```
These updates reflect how this version of LFS writes to the disk to create a
file:
- A directory block update to include "ku3" and its inode number (1) in the root directory
- An updated root inode which now refers to block 4 where the latest contents of this directory are found
- A new inode for the newly created file (note the type)
- A new version of the first imap chunk which now tells us where both inode 0 and inode 1 are located
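The update sequence above can be sketched by treating the log as a plain Python list; the block contents here are stand-in strings, not the simulator's real structures:
```python
log = ['CR', 'root dirblock', 'root inode', 'imap chunk']  # blocks 0..3

def append(log, block):
    log.append(block)
    return len(log) - 1  # new blocks always go at the end of the log

# create /ku3: four appends past the old end of the log (address=3)...
dir_addr     = append(log, 'dirblock [. .. ku3]')    # block 4
rootino_addr = append(log, 'root inode -> ptr 4')    # block 5
fileino_addr = append(log, 'ku3 inode (empty)')      # block 6
imap_addr    = append(log, 'imap chunk: 0->5 1->6')  # block 7
# ...and one overwrite: the CR now points at the new imap chunk
log[0] = 'CR -> 7'
```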
However, this does not (quite) reflect all that must change. Because the inode
map itself has changed, the checkpoint region must also reflect where the
latest chunk of the first piece of the inode map resides. Thus, the CR is also
updated:
```sh
checkpoint: 7 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
```
You might also have noticed one more thing in the output. In the initial file
system contents, there is a marker between the disk address and the contents
that says "live" for each entry, and in the final output there is a "?"
instead. This "?" is there so you can determine, for yourself, whether each
block is live or not. Start with the checkpoint region and see if you can
determine which group of blocks can be reached (and hence are live); all the
rest are thus dead and could be used again.
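Determining liveness is just a reachability walk starting from the CR. A minimal sketch, assuming a simplified stand-in encoding (imap/CR blocks carry 'entries', inode blocks carry 'ptrs', data blocks carry nothing we need here):
```python
# The eight final-state blocks from the example run, simplified.
disk = [
    {'entries': [7] + [-1] * 15},    # 0: CR -> imap chunk at block 7
    {},                              # 1: old root dir data
    {'ptrs': [1]},                   # 2: old root inode
    {'entries': [2] + [-1] * 15},    # 3: old imap chunk
    {},                              # 4: new root dir data
    {'ptrs': [4]},                   # 5: new root inode
    {'ptrs': []},                    # 6: ku3 inode (no data yet)
    {'entries': [5, 6] + [-1] * 14}, # 7: new imap chunk
]

def mark_live(disk):
    live = {0}                       # the CR itself is always reachable
    for chunk_addr in disk[0]['entries']:
        if chunk_addr == -1:
            continue
        live.add(chunk_addr)         # latest imap chunks are live
        for inode_addr in disk[chunk_addr]['entries']:
            if inode_addr == -1:
                continue
            live.add(inode_addr)     # latest version of each inode is live
            live.update(p for p in disk[inode_addr]['ptrs'] if p != -1)
    return live                      # everything else on the log is garbage

print(sorted(mark_live(disk)))       # [0, 4, 5, 6, 7] -- blocks 1, 2, 3 are dead
```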
To see if you are right, run again with `-c`:
```sh
prompt> ./lfs.py -n 1 -o -c
...
```
As you can now see, every time a structure is updated, garbage is left behind,
one of the main issues a non-update-in-place file system like LFS must deal
with. Fortunately for you, we won't worry too much about garbage collection
in this simplified version of LFS.
You now have all the information you need to understand this version of LFS.
The other options, which let you play with various aspects of the simulator, include:
```sh
prompt> ./lfs.py -h
Usage: lfs.py [options]
Options:
-h, --help show this help message and exit
-s SEED, --seed=SEED the random seed
-N, --no_force Do not force checkpoint writes after updates
-D, --use_disk_cr use disk (maybe old) version of checkpoint region
-c, --compute compute answers for me
-o, --show_operations
print out operations as they occur
-i, --show_intermediate
print out state changes as they occur
-e, --show_return_codes
show error/return codes
-n NUM_COMMANDS, --num_commands=NUM_COMMANDS
generate N random commands
-p PERCENTAGES, --percentages=PERCENTAGES
percent chance of:
createfile,writefile,createdir,rmfile,linkfile,sync
(example is c30,w30,d10,r20,l10,s0)
-a INODE_POLICY, --allocation_policy=INODE_POLICY
inode allocation policy: "r" for "random" or "s" for
"sequential"
-L COMMAND_LIST, --command_list=COMMAND_LIST
command list in format:
"cmd1,arg1,...,argN:cmd2,arg1,...,argN:... where cmds
are:c:createfile, d:createdir, r:delete, w:write,
l:link, s:syncformat: c,filepath d,dirpath r,filepath
w,filepath,offset,numblks l,srcpath,dstpath s
```

file-lfs/lfs.py Executable file (+921)

@@ -0,0 +1,921 @@
#! /usr/bin/env python
#
# lfs.py
#
# A simple simulator to emulate the behavior of LFS.
#
# Make lots of simplifying assumptions including things like:
# - all entities take up exactly one block
# - no segments or buffering of writes in memory
# - many other things
#
from __future__ import print_function
import math
import sys
from optparse import OptionParser
import random
import copy
# to make Python2 and Python3 act the same -- how dumb
def random_seed(seed):
    try:
        random.seed(seed, version=1)
    except:
        random.seed(seed)
    return
# fixed addr
ADDR_CHECKPOINT_BLOCK = 0
# These are not (yet) things you can change
NUM_IMAP_PTRS_IN_CR = 16
NUM_INODES_PER_IMAP_CHUNK = 16
NUM_INODE_PTRS = 8
NUM_INODES = NUM_IMAP_PTRS_IN_CR * NUM_INODES_PER_IMAP_CHUNK
# block types
BLOCK_TYPE_CHECKPOINT = 'type_cp'
BLOCK_TYPE_DATA_DIRECTORY = 'type_data_dir'
BLOCK_TYPE_DATA_BLOCK = 'type_data'
BLOCK_TYPE_INODE = 'type_inode'
BLOCK_TYPE_IMAP = 'type_imap'
# inode types
INODE_DIRECTORY = 'dir'
INODE_REGULAR = 'reg'
# root inode is "well known" per Unix conventions
ROOT_INODE = 0
# policies
ALLOCATE_SEQUENTIAL = 1
ALLOCATE_RANDOM = 2
#
# Heart of simulation is found here
#
class LFS:
    def __init__(self, use_disk_cr=False, no_force_checkpoints=False,
                 inode_policy=ALLOCATE_SEQUENTIAL, solve=False):
        # whether to read checkpoint region and imap pieces from disk (if True)
        # or instead just to use "in-memory" inode map instead
        self.use_disk_cr = use_disk_cr
        # to force an update of the checkpoint region after each write
        self.no_force_checkpoints = no_force_checkpoints
        # inode allocation policy
        assert(inode_policy == ALLOCATE_SEQUENTIAL or inode_policy == ALLOCATE_RANDOM)
        self.inode_policy = inode_policy
        # whether to show "answers" to things or not
        self.solve = solve
        # dump assistance
        self.dump_last = 1
        # ALL blocks are in the "disk"
        self.disk = []
        # checkpoint region (first block)
        self.cr = [3,-1,-1,-1,
                   -1,-1,-1,-1,
                   -1,-1,-1,-1,
                   -1,-1,-1,-1]
        assert(len(self.cr) == NUM_IMAP_PTRS_IN_CR)
        # create first checkpoint region
        self.log({'block_type':BLOCK_TYPE_CHECKPOINT, 'entries': self.cr})
        assert(len(self.disk) == 1)
        # init root dir data
        self.log(self.make_new_dirblock(ROOT_INODE, ROOT_INODE))
        assert(len(self.disk) == 2)
        # root inode
        root_inode = self.make_inode(itype=INODE_DIRECTORY, size=1, refs=2)
        root_inode['pointers'][0] = 1
        root_inode_address = self.log(root_inode)
        assert(len(self.disk) == 3)
        # init in memory imap
        self.inode_map = {}
        for i in range(NUM_INODES):
            self.inode_map[i] = -1
        self.inode_map[ROOT_INODE] = root_inode_address
        # imap piece
        self.log(self.make_imap_chunk(ROOT_INODE))
        assert(len(self.disk) == 4)
        # error code: tracking
        self.error_clear()
        return
    def make_data_block(self, data):
        return {'block_type':BLOCK_TYPE_DATA_BLOCK, 'contents':data}
    def make_inode(self, itype, size, refs):
        return {'block_type':BLOCK_TYPE_INODE, 'type':itype, 'size':size, 'refs':refs,
                'pointers':[-1,-1,-1,-1,-1,-1,-1,-1]}
    def make_new_dirblock(self, parent_inum, current_inum):
        dirblock = self.make_empty_dirblock()
        dirblock['entries'][0] = ('.', current_inum)
        dirblock['entries'][1] = ('..', parent_inum)
        return dirblock
    def make_empty_dirblock(self):
        return {'block_type':BLOCK_TYPE_DATA_DIRECTORY,
                'entries': [('-',-1), ('-',-1), ('-',-1), ('-',-1),
                            ('-',-1), ('-',-1), ('-',-1), ('-',-1)]}
    def make_imap_chunk(self, cnum):
        imap_chunk = {}
        imap_chunk['block_type'] = BLOCK_TYPE_IMAP
        imap_chunk['entries'] = list()
        start = cnum * NUM_INODES_PER_IMAP_CHUNK
        for i in range(start, start + NUM_INODES_PER_IMAP_CHUNK):
            imap_chunk['entries'].append(self.inode_map[i])
        return imap_chunk
    def make_random_blocks(self, num):
        contents = []
        for i in range(num):
            L = chr(ord('a') + int(random.random() * 26))
            contents.append(str(16 * ('%s%d' % (L, i))))
        return contents
    def inum_to_chunk(self, inum):
        return int(inum / NUM_INODES_PER_IMAP_CHUNK)
    def determine_liveness(self):
        # first, assume all are dead
        self.live = {}
        for i in range(len(self.disk)):
            self.live[i] = False
        # checkpoint region
        self.live[0] = True
        # now mark latest pieces of imap as live
        for ptr in self.cr:
            if ptr == -1:
                continue
            self.live[ptr] = True
        # go through imap, find live inodes and their addresses
        # latest inodes are all live, by def
        inodes = []
        for i in range(len(self.inode_map)):
            if self.inode_map[i] == -1:
                continue
            self.live[self.inode_map[i]] = True
            inodes.append(i)
        # go through live inodes and find blocks each points to
        for i in inodes:
            inode = self.disk[self.inode_map[i]]
            for ptr in inode['pointers']:
                self.live[ptr] = True
        return
    def error_log(self, s):
        self.error_list.append(s)
        return
    def error_clear(self):
        self.error_list = []
        return
    def error_dump(self):
        for i in self.error_list:
            print(' %s' % i)
        return
    def dump_partial(self, show_liveness, show_checkpoint):
        if show_checkpoint or not self.no_force_checkpoints:
            self.__dump(0, 1, show_liveness)
        if not self.no_force_checkpoints:
            print('...')
        self.__dump(self.dump_last, len(self.disk), show_liveness)
        self.dump_last = len(self.disk)
        return
    def dump(self, show_liveness):
        self.__dump(0, len(self.disk), show_liveness)
        return
    def __dump(self, start, end, show_liveness):
        self.determine_liveness()
        for i in range(start, end):
            # print ADDRESS on disk
            b = self.disk[i]
            block_type = b['block_type']
            print('[ %3d ]' % i, end='')
            # print LIVENESS
            if show_liveness or self.solve:
                if self.live[i]:
                    print(' live', end=' ')
                else:
                    print('     ', end=' ')
            else:
                print('  ?  ', end=' ')
            if block_type == BLOCK_TYPE_CHECKPOINT:
                print('checkpoint:', end=' ')
                for e in b['entries']:
                    if e != -1:
                        print(e, end=' ')
                    else:
                        print('--', end=' ')
                print('')
            elif block_type == BLOCK_TYPE_DATA_DIRECTORY:
                for e in b['entries']:
                    if e[1] != -1:
                        print('[%s,%s]' % (str(e[0]), str(e[1])), end=' ')
                    else:
                        print('--', end=' ')
                print('')
            elif block_type == BLOCK_TYPE_DATA_BLOCK:
                print(b['contents'])
            elif block_type == BLOCK_TYPE_INODE:
                print('type:'+b['type'], 'size:'+str(b['size']), 'refs:'+str(b['refs']), 'ptrs:', end=' ')
                for p in b['pointers']:
                    if p != -1:
                        print('%s' % p, end=' ')
                    else:
                        print('--', end=' ')
                print('')
            elif block_type == BLOCK_TYPE_IMAP:
                print('chunk(imap):', end=' ')
                for e in b['entries']:
                    if e != -1:
                        print(e, end=' ')
                    else:
                        print('--', end=' ')
                print('')
            else:
                print('error: unknown block_type', block_type)
                exit(1)
        return
    def log(self, block):
        new_address = len(self.disk)
        self.disk.append(copy.deepcopy(block))
        return new_address
    def allocate_inode(self):
        if self.inode_policy == ALLOCATE_SEQUENTIAL:
            for i in range(len(self.inode_map)):
                if self.inode_map[i] == -1:
                    # ugh: temporary holder until real on-disk location filled in
                    self.inode_map[i] = 1
                    return i
        elif self.inode_policy == ALLOCATE_RANDOM:
            # inefficiently ensure that space exists
            # better done with counter of alloc/free but this is ok for now
            space_exists = False
            imap_len = len(self.inode_map)
            for i in range(imap_len):
                if self.inode_map[i] == -1:
                    space_exists = True
                    break
            if not space_exists:
                return -1
            while True:
                index = int(random.random() * imap_len)
                if self.inode_map[index] == -1:
                    self.inode_map[index] = 1
                    return index
        # no free inode found
        return -1
    def free_inode(self, inum):
        assert(self.inode_map[inum] != -1)
        self.inode_map[inum] = -1
        return
    def remap(self, inode_number, inode_address):
        self.inode_map[inode_number] = inode_address
        return
    def dump_inode_map(self):
        for i in range(len(self.inode_map)):
            if self.inode_map[i] != -1:
                print(' ', i, '->', self.inode_map[i])
        print('')
        return
    def cr_sync(self):
        # only place in code where an OVERWRITE occurs
        self.disk[ADDR_CHECKPOINT_BLOCK] = copy.deepcopy({'block_type':BLOCK_TYPE_CHECKPOINT, 'entries': self.cr})
        return 0
    def get_inode_from_inumber(self, inode_number):
        imap_entry_index = int(inode_number / NUM_INODES_PER_IMAP_CHUNK)
        imap_entry_offset = inode_number % NUM_INODES_PER_IMAP_CHUNK
        if self.use_disk_cr:
            # this is the disk path
            checkpoint_block = self.disk[ADDR_CHECKPOINT_BLOCK]
            assert(checkpoint_block['block_type'] == BLOCK_TYPE_CHECKPOINT)
            imap_block_address = checkpoint_block['entries'][imap_entry_index]
            imap_block = self.disk[imap_block_address]
            assert(imap_block['block_type'] == BLOCK_TYPE_IMAP)
            inode_address = imap_block['entries'][imap_entry_offset]
        else:
            # this is the just-use-the-mem-inode_map path
            inode_address = self.inode_map[inode_number]
        assert(inode_address != -1)
        inode = self.disk[inode_address]
        assert(inode['block_type'] == BLOCK_TYPE_INODE)
        return inode
    def __lookup(self, parent_inode_number, name):
        parent_inode = self.get_inode_from_inumber(parent_inode_number)
        assert(parent_inode['type'] == INODE_DIRECTORY)
        for address in parent_inode['pointers']:
            if address == -1:
                continue
            directory_block = self.disk[address]
            assert(directory_block['block_type'] == BLOCK_TYPE_DATA_DIRECTORY)
            for entry_name, entry_inode_number in directory_block['entries']:
                if entry_name == name:
                    return (entry_inode_number, parent_inode)
        return (-1, parent_inode)
    def __walk_path(self, path):
        split_path = path.split('/')
        if split_path[0] != '':
            self.error_log('path malformed: must start with /')
            return -1, '', -1, ''
        inode_number = -1
        parent_inode_number = ROOT_INODE  # root inode number is well known
        for i in range(1, len(split_path) - 1):
            inode_number, inode = self.__lookup(parent_inode_number, split_path[i])
            if inode_number == -1:
                self.error_log('directory %s not found' % split_path[i])
                return -1, '', -1, ''
            if inode['type'] != INODE_DIRECTORY:
                self.error_log('invalid element of path [%s] (not a dir)' % split_path[i])
                return -1, '', -1, ''
            parent_inode_number = inode_number
        file_name = split_path[len(split_path) - 1]
        inode_number, parent_inode = self.__lookup(parent_inode_number, file_name)
        return inode_number, file_name, parent_inode_number, parent_inode
    def update_imap(self, inum_list):
        chunk_list = list()
        for inum in inum_list:
            cnum = self.inum_to_chunk(inum)
            if cnum not in chunk_list:
                chunk_list.append(cnum)
                self.log(self.make_imap_chunk(cnum))
                self.cr[cnum] = len(self.disk) - 1
        return
    def __read_dirblock(self, inode, index):
        return self.disk[inode['pointers'][index]]
    # return (inode_index, dirblock_index)
    def __find_matching_dir_slot(self, name, inode):
        for inode_index in range(inode['size']):
            directory_block = self.__read_dirblock(inode, inode_index)
            assert(directory_block['block_type'] == BLOCK_TYPE_DATA_DIRECTORY)
            for slot_index in range(len(directory_block['entries'])):
                entry_name, entry_inode_number = directory_block['entries'][slot_index]
                if entry_name == name:
                    return inode_index, slot_index
        return -1, -1
    def __add_dir_entry(self, parent_inode, file_name, inode_number):
        # this will be the directory block to contain the new name->inum mapping
        inode_index, dirblock_index = self.__find_matching_dir_slot('-', parent_inode)
        if inode_index != -1:
            # there is room in existing block: make copy, update it, and log it
            index_to_update = inode_index
            parent_size = parent_inode['size']
            new_directory_block = copy.deepcopy(self.__read_dirblock(parent_inode, inode_index))
            new_directory_block['entries'][dirblock_index] = (file_name, inode_number)
        else:
            # no room in existing directory block: allocate new one IF there is room in inode to point to it
            if parent_inode['size'] != NUM_INODE_PTRS:
                index_to_update = parent_inode['size']
                parent_size = index_to_update + 1
                new_directory_block = self.make_empty_dirblock()
                new_directory_block['entries'][0] = (file_name, inode_number)
            else:
                return -1, -1, {}
        return index_to_update, parent_size, new_directory_block
    # create (file OR dir)
    def __file_create(self, path, is_file):
        inode_number, file_name, parent_inode_number, parent_inode = self.__walk_path(path)
        if inode_number != -1:
            # self.error_log('create failed: file %s already exists' % path)
            self.error_log('create failed: file already exists')
            return -1
        if parent_inode_number == -1:
            self.error_log('create failed: walkpath returned error [%s]' % path)
            return -1
        # finally, allocate inode number for new file/dir
        new_inode_number = self.allocate_inode()
        if new_inode_number == -1:
            self.error_log('create failed: no more inodes available')
            return -1
        # this will be the directory block to contain the new name->inum mapping
        index_to_update, parent_size, new_directory_block = self.__add_dir_entry(parent_inode, file_name, new_inode_number)
        if index_to_update == -1:
            self.error_log('error: directory is full (path %s)' % path)
            self.free_inode(new_inode_number)
            return -1
        # log directory data block (either new version of old OR new one entirely)
        new_directory_block_address = self.log(new_directory_block)
        # now have to make new version of directory inode
        # update size (if needed), inc refs if this is a dir, point to new dir block addr
        new_parent_inode = copy.deepcopy(parent_inode)
        new_parent_inode['size'] = parent_size
        if not is_file:
            new_parent_inode['refs'] += 1
        new_parent_inode['pointers'][index_to_update] = new_directory_block_address
        # if directory, must create empty dir block
        if not is_file:
            self.log(self.make_new_dirblock(parent_inode_number, new_inode_number))
            new_dirblock_address = len(self.disk) - 1
        # and the new inode itself
        if is_file:
            # create empty file by default
            new_inode = self.make_inode(itype=INODE_REGULAR, size=0, refs=1)
        else:
            # create directory inode and point it to the one dirblock it owns
            new_inode = self.make_inode(itype=INODE_DIRECTORY, size=1, refs=2)
            new_inode['pointers'][0] = new_dirblock_address
        #
        # ADD updated parent inode, file/dir inode TO LOG
        #
        new_parent_inode_address = self.log(new_parent_inode)
        new_inode_address = self.log(new_inode)
        # and new imap entries for both parent and new inode
        self.remap(parent_inode_number, new_parent_inode_address)
        self.remap(new_inode_number, new_inode_address)
        # finally, create new chunk of imap
        self.update_imap([parent_inode_number, new_inode_number])
        # SYNC checkpoint region
        if not self.no_force_checkpoints:
            self.cr_sync()
        return 0
    # file_create()
    def file_create(self, path):
        self.error_clear()
        return self.__file_create(path, True)
    # dir_create()
    def dir_create(self, path):
        self.error_clear()
        return self.__file_create(path, False)
    # link()
    def file_link(self, srcpath, dstpath):
        self.error_clear()
        src_inode_number, src_file_name, src_parent_inode_number, src_parent_inode = self.__walk_path(srcpath)
        if src_inode_number == -1:
            self.error_log('link failed, src [%s] not found' % srcpath)
            return -1
        src_inode = self.get_inode_from_inumber(src_inode_number)
        if src_inode['type'] != INODE_REGULAR:
            self.error_log('link failed: cannot link to non-regular file [%s]' % srcpath)
            return -1
        dst_inode_number, dst_file_name, dst_parent_inode_number, dst_parent_inode = self.__walk_path(dstpath)
        if dst_inode_number != -1:
            self.error_log('link failed, dst [%s] exists' % dstpath)
            return -1
        # this will be the directory block to contain the new name->inum mapping
        dst_index_to_update, dst_parent_size, new_directory_block = self.__add_dir_entry(dst_parent_inode, dst_file_name, src_inode_number)
        if dst_index_to_update == -1:
            self.error_log('error: directory is full [path %s]' % dstpath)
            return -1
        # log directory data block (either new version of old OR new one entirely)
        new_directory_block_address = self.log(new_directory_block)
        # now have to make new version of directory inode
        # update size (if needed), inc refs if this is a dir, point to new dir block addr
        new_dst_parent_inode = copy.deepcopy(dst_parent_inode)
        new_dst_parent_inode['size'] = dst_parent_size
        new_dst_parent_inode['pointers'][dst_index_to_update] = new_directory_block_address
        # ADD updated parent inode TO LOG
        new_dst_parent_inode_address = self.log(new_dst_parent_inode)
        # inode must change too: to reflect NEW refs count
        new_src_inode = copy.deepcopy(src_inode)
        new_src_inode['refs'] += 1
        new_src_inode_address = self.log(new_src_inode)
        # and new imap entries for both parent and new inode
        self.remap(dst_parent_inode_number, new_dst_parent_inode_address)
        self.remap(src_inode_number, new_src_inode_address)
        # finally, create new chunk of imap
        self.update_imap([dst_parent_inode_number])
        # SYNC checkpoint region
        if not self.no_force_checkpoints:
            self.cr_sync()
        return 0
    def file_write(self, path, offset, num_blks):
        self.error_clear()
        # just make up contents of data blocks - up to the max spec'd by write
        # note: may not write all of these, because of running out of room in inode...
        contents = self.make_random_blocks(num_blks)
        inode_number, file_name, parent_inode_number, parent_inode = self.__walk_path(path)
        if inode_number == -1:
            self.error_log('write failed: file not found [path %s]' % path)
            return -1
        inode = self.get_inode_from_inumber(inode_number)
        if inode['type'] != INODE_REGULAR:
            self.error_log('write failed: cannot write to non-regular file %s' % path)
            return -1
        if offset < 0 or offset >= NUM_INODE_PTRS:
            self.error_log('write failed: bad offset %d' % offset)
            return -1
        # create potential write list -- up to max file size
        current_log_ptr = len(self.disk)
        current_offset = offset
        potential_writes = []
        while current_offset < NUM_INODE_PTRS and current_offset < offset + len(contents):
            potential_writes.append((current_offset, current_log_ptr))
            current_offset += 1
            current_log_ptr += 1
        # write data block(s)
        for i in range(len(potential_writes)):
            self.log(self.make_data_block(contents[i]))
        # write new version of inode, with updated size
        new_inode = copy.deepcopy(inode)
        new_inode['size'] = current_offset
        for new_offset, new_addr in potential_writes:
            new_inode['pointers'][new_offset] = new_addr
        new_inode_address = self.log(new_inode)
        # write new chunk of imap
        self.remap(inode_number, new_inode_address)
        self.log(self.make_imap_chunk(self.inum_to_chunk(inode_number)))
        self.cr[self.inum_to_chunk(inode_number)] = len(self.disk) - 1
        # write checkpoint region
        if not self.no_force_checkpoints:
            self.cr_sync()
        # return size of write (total # written, not desired, may be less than asked for)
        return current_offset - offset
    def file_delete(self, path):
        self.error_clear()
        inode_number, file_name, parent_inode_number, parent_inode = self.__walk_path(path)
        if inode_number == -1:
            self.error_log('delete failed: file not found [%s]' % path)
            return -1
        inode = self.get_inode_from_inumber(inode_number)
        if inode['type'] != INODE_REGULAR:
            self.error_log('delete failed: cannot delete non-regular file [%s]' % path)
            return -1
        # have to check: is the file actually down to its last ref?
        if inode['refs'] == 1:
            self.free_inode(inode_number)
        # now, find entry in DIRECTORY DATA BLOCK and zero it
        inode_index, dirblock_index = self.__find_matching_dir_slot(file_name, parent_inode)
        assert(inode_index != -1)
        new_directory_block = copy.deepcopy(self.__read_dirblock(parent_inode, inode_index))
        new_directory_block['entries'][dirblock_index] = ('-', -1)
        # this leads to DIRECTORY DATA, DIR INODE, (and hence IMAP_CHUNK, CR_SYNC) writes
        dir_addr = self.log(new_directory_block)
        new_parent_inode = copy.deepcopy(parent_inode)
        new_parent_inode['pointers'][inode_index] = dir_addr
        new_parent_inode_addr = self.log(new_parent_inode)
        self.remap(parent_inode_number, new_parent_inode_addr)
        # if this ISNT the last link, decrease ref count and output new version
        if inode['refs'] > 1:
            new_inode = copy.deepcopy(inode)
            new_inode['refs'] -= 1
            new_inode_addr = self.log(new_inode)
            self.remap(inode_number, new_inode_addr)
        # create new chunk of imap
        self.update_imap([inode_number, parent_inode_number])
        # and sync if need be
        if not self.no_force_checkpoints:
            self.cr_sync()
        return 0
    def sync(self):
        self.error_clear()
        return self.cr_sync()
#
# HELPERs for main
#
def pick_random(a_list):
    if len(a_list) == 0:
        return ''
    index = int(random.random() * len(a_list))
    return a_list[index]
def make_random_file_name(parent_dir):
    L1 = chr(ord('a') + int(random.random() * 26))
    L2 = chr(ord('a') + int(random.random() * 26))
    N1 = str(int(random.random() * 10))
    if parent_dir == '/':
        return '/' + L1 + L2 + N1
    return parent_dir + '/' + L1 + L2 + N1
#
# must be in format: cXX,wXX,etc
# where first letter is command and XX is percent (from 0-100)
#
def process_percentages(percentages):
    tmp = percentages.split(',')
    csum = 0
    for p in tmp:
        cmd = p[0]
        value = int(p[1:])
        if value < 0:
            print('percentages must be positive or zero')
            exit(1)
        csum += int(value)
    if csum != 100:
        print('percentages do not add to 100')
        exit(1)
    p_array = {}
    cmd_list = ['c', 'w', 'd', 'r', 'l', 's']
    for c in cmd_list:
        p_array[c] = (0, 0)
    csum = 0
    for p in tmp:
        cmd = p[0]
        if cmd not in cmd_list:
            print('bad command', cmd)
            exit(1)
        value = int(p[1:])
        p_array[cmd] = (csum, csum + value)
        csum += value
    for i in p_array:
        p_array[i] = (p_array[i][0] / 100.0, p_array[i][1] / 100.0)
    return p_array
def make_command_list(num_commands, percent):
    command_list = ''
    existing_files = []
    existing_dirs = ['/']
    while num_commands > 0:
        chances = random.random()
        command = ''
        if chances >= percents['c'][0] and chances < percents['c'][1]:
            pdir = pick_random(existing_dirs)
            if pdir == '':
                continue
            nfile = make_random_file_name(pdir)
            command = 'c,%s' % nfile
            existing_files.append(nfile)
        elif chances >= percents['w'][0] and chances < percents['w'][1]:
            pfile = pick_random(existing_files)
            if pfile == '':
                continue
            woff = int(random.random() * 8)
            wlen = int(random.random() * 8)
            command = 'w,%s,%d,%d' % (pfile, woff, wlen)
        elif chances >= percents['d'][0] and chances < percents['d'][1]:
            pdir = pick_random(existing_dirs)
            if pdir == '':
                continue
            ndir = make_random_file_name(pdir)
            command = 'd,%s' % ndir
            existing_dirs.append(ndir)
        elif chances >= percents['r'][0] and chances < percents['r'][1]:
            if len(existing_files) == 0:
                continue
            index = int(random.random() * len(existing_files))
            command = 'r,%s' % existing_files[index]
            del existing_files[index]
        elif chances >= percents['l'][0] and chances < percents['l'][1]:
            if len(existing_files) == 0:
                continue
            index = int(random.random() * len(existing_files))
            pdir = pick_random(existing_dirs)
            if pdir == '':
                continue
            nfile = make_random_file_name(pdir)
            command = 'l,%s,%s' % (existing_files[index], nfile)
            existing_files.append(nfile)
        elif chances >= percents['s'][0] and chances < percents['s'][1]:
            command = 's'
        else:
            print('abort: internal error with percent operations')
            exit(1)
        if command_list == '':
            command_list = command
        else:
            command_list += ':' + command
        num_commands -= 1
    return command_list
#
# MAIN program
#
parser = OptionParser()
parser.add_option('-s', '--seed', default=0, help='the random seed', action='store', type='int', dest='seed')
parser.add_option('-N', '--no_force', help='Do not force checkpoint writes after updates', default=False, action='store_true', dest='no_force_checkpoints')
parser.add_option('-F', '--no_final', help='Do not show the final state of the file system', default=False, action='store_true', dest='no_final')
parser.add_option('-D', '--use_disk_cr', help='use disk (maybe old) version of checkpoint region', default=False, action='store_true', dest='use_disk_cr')
parser.add_option('-c', '--compute', help='compute answers for me', action='store_true', default=False, dest='solve')
parser.add_option('-o', '--show_operations', help='print out operations as they occur', action='store_true', default=False, dest='show_operations')
parser.add_option('-i', '--show_intermediate', help='print out state changes as they occur', action='store_true', default=False, dest='show_intermediate')
parser.add_option('-e', '--show_return_codes', help='show error/return codes', action='store_true', default=False, dest='show_return_codes')
parser.add_option('-v', '--show_live_paths', help='show live paths', action='store_true', default=False, dest='show_live_paths')
parser.add_option('-n', '--num_commands', help='generate N random commands', action='store', type='int', default=3, dest='num_commands')
parser.add_option('-p', '--percentages', help='percent chance of: createfile,writefile,createdir,rmfile,linkfile,sync (example is c30,w30,d10,r20,l10,s0)', action='store', default='c30,w30,d10,r20,l10,s0', dest='percentages')
parser.add_option('-a', '--allocation_policy', help='inode allocation policy: "r" for "random" or "s" for "sequential"', action='store', default='s', dest='inode_policy')
parser.add_option('-L', '--command_list', default = '', action='store', type='str', dest='command_list', help='command list in format: "cmd1,arg1,...,argN:cmd2,arg1,...,argN:... where cmds are: c:createfile, d:createdir, r:delete, w:write, l:link, s:sync format: c,filepath d,dirpath r,filepath w,filepath,offset,numblks l,srcpath,dstpath s')
(options, args) = parser.parse_args()
random.seed(options.seed)
command_list = options.command_list
num_commands = int(options.num_commands)
percents = process_percentages(options.percentages)
if options.inode_policy == 's':
inode_policy = ALLOCATE_SEQUENTIAL
elif options.inode_policy == 'r':
inode_policy = ALLOCATE_RANDOM
else:
print('bad policy', options.inode_policy)
exit(1)
# where most of the work is done
L = LFS(use_disk_cr=options.use_disk_cr,
no_force_checkpoints=options.no_force_checkpoints,
inode_policy=inode_policy,
solve=options.solve)
# what to show
print_operation = options.show_operations
print_intermediate = options.show_intermediate
# generate some random commands
if command_list == '':
    if num_commands < 0:
        print('num_commands must be non-negative', num_commands)
exit(1)
command_list = make_command_list(num_commands, percents)
print('')
print('INITIAL file system contents:')
L.dump(True)
L.dump_last = 4 # ugly ... but needed to make intermediate dumps correct
print('')
#
# this variant allows control over each command
#
files_that_exist = []
dirs_that_exist = []
if command_list != '':
commands = command_list.split(':')
    for command_str in commands:
        command_and_args = command_str.split(',')
if command_and_args[0] == 'c':
assert(len(command_and_args) == 2)
if print_operation:
print('create file', command_and_args[1], end=' ')
rc = L.file_create(command_and_args[1])
if rc == 0:
files_that_exist.append(command_and_args[1])
elif command_and_args[0] == 'd':
assert(len(command_and_args) == 2)
if print_operation:
print('create dir ', command_and_args[1], end=' ')
rc = L.dir_create(command_and_args[1])
if rc == 0:
dirs_that_exist.append(command_and_args[1])
elif command_and_args[0] == 'r':
assert(len(command_and_args) == 2)
if print_operation:
print('delete file', command_and_args[1], end=' ')
rc = L.file_delete(command_and_args[1])
if rc == 0:
if command_and_args[1] in files_that_exist:
files_that_exist.remove(command_and_args[1])
else:
print('warning: cannot find file', command_and_args[1])
elif command_and_args[0] == 'l':
assert(len(command_and_args) == 3)
if print_operation:
print('link file ', command_and_args[1], command_and_args[2], end=' ')
rc = L.file_link(command_and_args[1], command_and_args[2])
if rc == 0:
files_that_exist.append(command_and_args[2])
elif command_and_args[0] == 'w':
assert(len(command_and_args) == 4)
if print_operation:
print('write file %s offset=%d size=%d' % (command_and_args[1], int(command_and_args[2]), int(command_and_args[3])), end=' ')
rc = L.file_write(command_and_args[1], int(command_and_args[2]), int(command_and_args[3]))
elif command_and_args[0] == 's':
if print_operation:
print('sync', end=' ')
rc = L.sync()
        else:
            print('command not understood so skipping [%s]' % command_and_args[0])
            continue
if not print_operation:
print('command?', end=' ')
if print_intermediate:
print('')
print('')
if command_and_args[0] == 's':
L.dump_partial(False, True)
else:
L.dump_partial(False, False)
print('')
if options.show_return_codes:
print('->', rc)
L.error_dump()
else:
print('')
#if not print_intermediate:
# print('\nChanges to log, checkpoint region?')
# print('')
if not options.no_final:
print('')
print('FINAL file system contents:')
L.dump(False)
print('')
if options.show_live_paths:
print('Live directories: ', dirs_that_exist)
print('Live files: ', files_that_exist)
print('')
else:
print('')