Compressed File Systems on Linux

2010 March 23 at 08:53 » Tagged as: fuse

Perhaps the title should have been 'the lack of a suitable compressed file system on Linux'. A compressed file system in this case refers to a setup where files are saved on disk in a predefined compressed format (such as gzip or bzip2). When you read from those files, the file system decompresses them automatically. Similarly, when you create a new file or modify an old one, it is compressed automatically before being saved. Such a file system is sure to be very slow for random access, but for sequential access it shouldn't matter so much. It might even be faster than an uncompressed file system, because less data has to move to and from the disk, and hard drives continue to be the real bottleneck in most computers today.

Linux gives you two options for creating file systems: at the kernel level or in user space. e2compr is a kernel patch that adds compression to an ext2 file system, while the user space tools are based on FUSE. Kernel space drivers can get messy if the chosen file system, like e2compr, is always playing catch-up with the kernel version. The only compressed file systems supported in the mainline kernel are jffs2 and squashfs. The former is for flash memory devices and the latter is read only. Though JFS is available for Linux, its compression mode is not. While both KDE's Dolphin and GNOME's Nautilus can mount a gzipped tar archive or a zip file as a node in the file system in read/write mode, they are not suitable for web apps.

So FUSE seems the way to go. There are eight file systems listed on the FUSE compressed file system page. Four are read only, two are abandoned, two more are heading down that route. One is in its early development stage. The remainder was FuseCompress. This one is really easy to install: just do 'yum install fusecompress'. Setup is even easier:

fusecompress /mnt/compressed /mnt/uncompressed
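
To make the idea of transparent compression concrete, here is a minimal Python sketch of what such a file system has to do on every write and read. It is not how FuseCompress is implemented; the helper names `write_compressed`, `read_compressed` and `read_at` are made up for illustration, and gzip is just one of the formats mentioned above. The `read_at` helper also shows why random access hurts: a gzip stream has no index, so seeking to an offset means decompressing everything before it.

```python
import gzip
import os
import tempfile


def write_compressed(path, data):
    """Compress data with gzip and store it at path (hypothetical helper)."""
    with gzip.open(path, "wb") as f:
        f.write(data)


def read_compressed(path):
    """Read the file at path and return the decompressed bytes."""
    with gzip.open(path, "rb") as f:
        return f.read()


def read_at(path, offset, length):
    """Random read: seek() on a gzip stream decompresses from the start
    of the file up to offset before any bytes can be returned."""
    with gzip.open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    # Highly repetitive sample data, so it compresses very well.
    data = b"compressed file systems on linux\n" * 10000
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.gz")
        write_compressed(path, data)
        print("raw bytes:    ", len(data))
        print("stored bytes: ", os.path.getsize(path))
        print("round trip ok:", read_compressed(path) == data)
        print("random read:  ", read_at(path, 33, 10))
```

A real file system does this behind the ordinary open/read/write calls, so applications never see the compressed form.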

Copying over a set of 30,000 files (a mix of text and binary data) totalling 85 MB took 36 seconds. Repeating the same operation took 20 seconds because of file system caching. When the files were copied without compression, the operation completed in 1 second. So much for my assumption that a compressed file system would be faster.
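
If you want to repeat this kind of measurement, a rough sketch in Python could look like the following. The `timed_copy` and `throughput_mb_per_s` helpers are hypothetical names, not the tooling used for the numbers above, and the figures fed into the loop are simply the ones quoted in this post. They work out to about 2.4 MB/s for the first compressed copy, 4.25 MB/s for the repeated one, and 85 MB/s without compression.

```python
import shutil
import time


def timed_copy(src, dst):
    """Copy the directory tree src to dst and return the elapsed seconds.

    shutil.copytree expects dst not to exist yet.
    """
    start = time.perf_counter()
    shutil.copytree(src, dst)
    return time.perf_counter() - start


def throughput_mb_per_s(size_mb, seconds):
    """Average throughput for copying size_mb megabytes in seconds."""
    return size_mb / seconds


if __name__ == "__main__":
    # Timings quoted in the post for the 85 MB set of 30,000 files.
    runs = [
        ("compressed, first run", 36),
        ("compressed, repeated", 20),
        ("uncompressed", 1),
    ]
    for label, seconds in runs:
        print(f"{label}: {throughput_mb_per_s(85, seconds):.2f} MB/s")
```

To time a real copy you would call something like `timed_copy("/path/to/files", "/mnt/uncompressed/files")`, assuming `/mnt/uncompressed` is the directory through which the compressed store is read and written, as the argument order of the command above suggests. Keep in mind that the second run will be flattered by file system caching, just as it was here.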