TS-75xx best practices SD cards

From Technologic Systems Manuals (sandbox)

Disk corruption is a common issue in embedded development and considerations must be made for a robust system. When used correctly, the Sandisk SD cards we include should provide a total write lifetime of around 8TB.

Counterfit cards, or bad media

It's been quoted from a Sandisk engineer that a third of the Sandisk branded flash cards on the planet are fake. We recommend Sandisk SD cards as that is what we use for our testing, but make sure you use a reputable source for acquiring any flash media.

We recommend avoiding ATP flash media as well as "Industrial SD cards". We have experienced corruption with Industrial Cards, though they seem to work if multiwrite is disabled but this makes the card extremely slow to write.

Interrupting a Write

Most issues are caused by interrupting a write to the storage media by disconnecting power. In a normal Linux environment for a server or desktop the start up and shutdown sequences should be very predictable, but on an embedded system a safe shutdown cannot always be guaranteed.

The most common issue is when powering off SD cards in the middle of a write. SD cards usually use MLC NAND flash for storage coupled with a manufacturer and model specific firmware. The NAND flash has a limitation where in order to perform a single byte write it must first read about 128KB to 256KB (or more) containing that byte into memory and erase that sector on the NAND chip. This is the erase block size and can vary based on the card. It takes the block in memory, changes the single byte in that copy, erases the intended location on the flash and then commits it back to the disk.

Most SD card firmwares also contain a wear leveling mechanism where they maintain a logical to physical mapping. This means that writing a contiguous file may actually end up in different areas in the NAND chip. If you interrupt a write cycle where it has erased a block, but not yet committed changes to the disk it will have lost data seemingly randomly across the card. There are several strategies you can adopt to avoid or limit your chance of corruption.

The most safe method is possible if you do not need to perform any writes that need to persist across reboots. If you are designing a data logger this is certainly not a good option, but if you're only responding to outside I/O this is your best choice. Once you get your application developed and ready for deployment you should try running from the initrd. This is already a read only filesystem which will never write to the disk. Powering off in the middle of a read is still safe. You can still write data to /tmp/ which will go to memory, but it will be lost on a poweroff. If you require the full Debian filesystem you could use the linuxrc-sdroot-readonly startup script. This will mount the Debian filesystem on the SD card as read only, but will commit any attempted writes to a ramdisk using unionFS. The downside to this configuration while booting to Debian is that you have to manage all writes to the system or risk filling up your memory and causing a crash.

The next best method is to use a battery backup. Most UPS backup solutions, or one you build yourself should contain a method to see the battery level. Once you reach lower levels you can simply run a "shutdown -h now". When power is available the board will boot back up, but you may want to check in the initrd while it is read only if the battery is back to an acceptable level before making the SD card read/write.

The last option is to limit your writes to reduce the chance of corruption. Make all of your writes to a ramdisk like /tmp/ and copy them to the physical media periodically or when you know power will be safe. This is the option many consumer electronics choose. You can make writes more predictable by mounting with the options "sync" which will stop linux from buffering writes in RAM, and "noatime" which will prevent linux from writing access times when reading a file.

If your application is losing power from users disconnecting it or powering it off you may want to consider using an LED to indicate when it is not safe to turn off. Most people have seen cameras or video games that say "Do not remove power while X is displaying" which is usually a graphic of a disk or an LED like on floppy drives to indicate that there is a write that should not be interrupted. You can implement this very simply with a system call:

     system("ts7500ctl --redledon");
     int ret = write(...);
     system("ts7500ctl --redledoff");

The usleep allows the user time to react to a new write. Most people should be able to react to an LED in about 215ms, but to be safe I would use a higher number.

You may also want to consider using the XNAND which from our testing has proven to be much more reliable with sudden power loss.

Other Resources