Sunday, March 9, 2014

Docs: JournalFS and TokenFS classes.

I released a chunk of Arduino code this week, and one of the classes is a "journalling" (or "log-structured") filesystem for extremely small Flash/EEPROM storage, using absolutely minimal resources, in about three hundred lines of code.

Compiled size is about 660 bytes for an instance of TokenFS. The virtual memory classes add about 930 bytes more, if you're starting from naught, but I think that includes EEPROM.h and some other system dependencies too.

When I say 'extremely small', I mean the 1024 bytes of EEPROM in the Atmega32U4 inside the Arduino. One kilobyte of storage, total. That's smaller than the minimum block size of many other filesystems.

The primary job of such a filesystem is to reduce degradation of the storage, by smoothing out the write access pattern so damage is spread evenly over the whole chip, and not concentrated in a few 'hot blocks'. Flash storage is generally guaranteed 10,000 'write cycles' before media errors start to occur. That's per byte location, so the perfect ideal is to have every byte on the storage written 9,999 times before you have to tick over the 10K odometer on any of them.

It's like having a blackboard, but where the eraser is made from sandpaper. It works - but not forever.

One nice thing: the Atmel AVR's EEPROM (and program flash on some variants) is 'hardened' to endure 100K write cycles, which partly makes up for the tiny size.

JournalFS

  • Levelled write distribution over entire storage.
  • Resistant to corruption, power fails, and zero space issues.
  • Guaranteed success if volume free space equals or exceeds block size.
  • Minimal RAM use - head and tail pointer state only. (a dozen bytes)
  • Performs automatic incremental and 'emergency' defragmenting.
  • Basic journal - only understands blocks
  • 256 bytes per block maximum (currently) 
  • 5 bytes per block overhead, 100% storage utilization. 
  • 64K storage size limit.
  • Storage is scanned/checked on mount, twice.
  • Tested on AVR Arduinos, but should work on SAM machines as well.
  • Virtual Memory classes mean filesystems can be stored in RAM, EEPROM, Program Flash (read only currently), or user-defined storage like SD cards.


Built on top of the base class is what most people think of as a 'filesystem', which is an indexed catalog of things in the storage. In this case, a single byte 'token' key is used instead of a filename string.

TokenFS

  • Adds 'byte token' index on top of JournalFS. 
  • Index size passed as instance parameter - maximum 256 'entries'.
  • Moderate RAM use: the index requires one word (two bytes) per entry. (max 512 bytes)
  • Writing data block for a key replaces existing data.
  • Writing empty blocks for a key is equivalent to 'file deletion'
  • Guaranteed success if replacement block is same size or smaller than previous version.
  • No practical difference between 'missing entries' and 'zero length files' - all keys return a result.
  • Test cases wrote 200,000,000 random 'file updates' to a 1K RAMDrive without unexpected data loss - exceeding Flash hardware reliability by 1,000 times.
What's the difference? JournalFS takes care of the utterly 'low' level details, but tries not to worry too much about what it's storing. By itself it would be perfect for diary-like 'logfiles' where you just append another record every once in a while, and throw away the oldest to make room.

TokenFS assumes the first byte stored in each block is a 'token key', and maintains an in-RAM index of all the keys it finds. The index is a catalog of pointers to the storage location where the 'latest' version of that key is kept. This is kind of like the logfile, but instead of throwing away the global oldest, it's smarter about throwing away obsolete blocks for individual keys.

TokenFS is the exemplar for how to extend JournalFS by overriding block_state(), which is how JournalFS notifies descendants about blocks found, and asks if they are still valid. 

I need to make it very clear that this "Filesystem" will not scale. It is not intended for large storage devices. The primary issue is the need to scan the filesystem on mount to 'catch up' with the journal - which means reading the headers of all the blocks in storage. When you have a ridiculously small amount of storage, that's fine. If you have megabytes, that time becomes significant.

But if you need to store a bunch of 'configuration options' in an Arduino that are independent of the program flash, this is the way.

Usage

Loading the library and defining some storage is pretty easy. This is how the RAMDrive is initialized in the filesystem test.

  #include <unorthodox.h>      // include the library, of course
  
  byte storage[1024];          // create a 1024 byte buffer
  MemoryPage store(storage);   // wrap it in a virtual page
  TokenFS fs(&store,1024,192); // 192-entry tokenfs 

In most cases you'll be using EEPROM, which is even simpler.

  #include <EEPROM.h>          // do this first
  #include <unorthodox.h>      // to switch on more stuff in here
  
  EEPROMPage eeprom(0);                   // wrap EEPROM in a virtual page
  TokenFS fs(&eeprom, 1024, entries);     // start the token index

If you wanted to use only a portion (say, leaving the first 128 bytes free) you can do so:

  EEPROMPage eeprom(128);                 // start at byte 128
  TokenFS fs(&eeprom, 1024-128, entries); // use until the end

Writing TokenFS entries is currently a little clumsy, for various reasons. To avoid re-packing the block later, we prepend the token byte ourselves. (It has been a challenge making it 'seem right' but also have good performance.)


  byte block[size+1];
  // manually prepend the token byte
  block[0] = token;
  // fill the rest of the block with data:
  // ....
  // wrap the array in virtual memory
  MemoryPage page(block);
  // write it to the filesystem
  if( fs.token_write(token, &page, size+1) ) {
     // it fit. yay!
  } else {
     // it didn't. boo!
     Serial.print("\n Write Failed ");
  }
  // page is automatically freed when we leave scope

Reading entries back from TokenFS is the most straightforward, token_page() will return a virtual memory page containing the payload content, (or null) and token_size() will tell you how much of it you should read. The token byte is not included.

  int size = fs.token_size(token);
  if(size) {
    Page * page = fs.token_page(token);
    for(int i=0; i<size; i++) {
      byte b = page->read_byte(i);
      // process each byte...
    }
    // free page object
    delete page;
  }

Generally, it's a good idea to call token_size() first, because if  zero, there's no point calling token_page().  Note the Page object is allocated by the call, so delete it when you are done to avoid memory leaks. (Treat it similarly to a 'file handle'.)

If you really need speed, you can avoid instancing new virtual memory Page objects by interrogating the token index directly to get the storage offsets.

There is generally no need to 'format' a volume before use - 0x00 or 0xFF filled blocks (the most common occurrence for EEPROM) are read as 'corrupt' blocks, which indicates an empty journal. If no existing journal can be found, a new one is started from scratch. The likelihood of random data being recognized as valid blocks is very small.

The exception would be if there had previously been a filesystem installed that was similar in structure, but actually different. In that case, clearing out the EEPROM may be necessary. (Or at least the first dozen bytes)

Applications

To understand why you want all this, imagine you have ten 'options' that you keep in the EEPROM for motor speeds and current limits and other important things. (If you keep it in program flash, it gets blown away with every software update. It's not good for a bugfix to reset all the controller defaults, sometimes.)

If you put them in ten well-known locations in the EEPROM (which is the simplest method - you just document the locations and manage them with newsgroups, not code) then your microcontroller will become corrupt when the first of those options is overwritten 10,000 times. We assume it's a hard wall, to be conservative.

But if we're using a diary-like journal, we need to fill the whole storage with updates before we loop and start to overwrite the old locations again. If the same few records are being changed over and over, this is 'journalled' across the whole storage, and you will get ten or a hundred times as many updates before the storage begins to corrupt.

10,000 updates sounds like a lot, but that's once an hour for 400 days. Lots of systems recalibrate once an hour. Or restart twenty times a day. Journalling can extend 'expected lifetime' of an embedded system from a year, to decades. That is not a trivial outcome.

The trick with these kinds of filesystems is to never store the index inside the volume. Because any central directory catalog becomes the 'hot-spot' that logically must be updated more often than the files it contains. That's why TokenFS must rebuild its RAM index each time the volume is mounted. This is the trade-off... without a "pre-compiled" index available, it takes much longer to start up. It has to read the whole diary to the end.

Limitations

  • JournalFS block size is currently encoded in one byte, so up to 256 bytes per block.
  • TokenFS uses the first byte as a key token, so maximum payload is 255 bytes.
  • 16-bit words are used for storage pointers, so 64K is the maximum volume size.
  • As mentioned, the need to do a full 'journal scan' may exceed acceptable time limits on large volumes.
  • Defragmenting 'thrash' increases as the free space approaches zero. In the worst case of a completely full volume, it must defragment the entire storage (burning up one entire pass) in order to update a single entry. Do not run the filesystem at 100% capacity. Try to keep 10% free, enough for a couple of blocks at least, so the defragmenter can find space without too much trouble.
  • Writing will invalidate any Page objects already obtained from token_page(), since defrag will move storage blocks around. Don't cache pages, look them up when you need them.  

Non-Limitations

  • Blocks can be larger than the amount of available RAM.

No comments:

Post a Comment