Compressed File Systems on Linux

Perhaps the title should have been ‘The lack of a suitable compressed file system on Linux’. A compressed file system in this case refers to a setup where files are saved on disk in a predefined compressed format (such as gzip or bzip2). When you read from those files, the file system decompresses them automatically. Similarly, when you create a new file or modify an old one, it is compressed automatically before being saved. Such a file system is sure to be very slow for random access, but for sequential access it shouldn’t matter as much. It might even be faster than an uncompressed file system, because hard drives continue to be the real bottleneck in most computers today.
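To make the idea concrete, here is a minimal sketch of the same compress-on-write, decompress-on-read behaviour done by hand with Python's standard gzip module. It is only an illustration of the concept, not how any of the file systems discussed here are implemented; the helper names (write_compressed, read_compressed, demo) and the sample data are made up for this example.

```python
import gzip
import os
import tempfile


def write_compressed(path, data: bytes) -> None:
    """Compress on write: what lands on disk is gzip data, not the original bytes."""
    with gzip.open(path, "wb") as f:
        f.write(data)


def read_compressed(path) -> bytes:
    """Decompress on read: the caller gets the original bytes back."""
    with gzip.open(path, "rb") as f:
        return f.read()


def demo():
    # Hypothetical sample data: highly repetitive text, so it compresses well.
    data = b"the quick brown fox jumps over the lazy dog\n" * 1000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt.gz")
        write_compressed(path, data)
        on_disk = os.path.getsize(path)   # size of the compressed file
        restored = read_compressed(path)  # full sequential read
    return len(data), on_disk, restored == data


if __name__ == "__main__":
    original, stored, ok = demo()
    print(f"original: {original} bytes, on disk: {stored} bytes, round trip ok: {ok}")
```

A compressed file system does this behind the scenes for every file, which is why sequential reads are cheap but seeking into the middle of a file is not: a gzip stream generally has to be decompressed from the start to reach a given offset.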

Linux gives you two options for creating file systems: at the kernel level or in user space. e2compr is a kernel patch that adds compression to an ext2 file system, while the user space tools are based on FUSE. Kernel space drivers can be messy if the chosen file system, like e2compr, is always playing catch-up with the kernel version. The only compressed file systems supported in the mainline kernel are jffs2 and squashfs. The former is for flash memory and the latter is read only. Though JFS is available for Linux, its compression mode is not. While both KDE’s Dolphin and Gnome’s Nautilus support mounting a gzipped tar archive or a zip file as a node in the file system in read/write mode, they are not suitable for web apps. So FUSE seems the way to go.

There are eight file systems listed on the FUSE compressed file system page. Four are read only, two are abandoned, and two more are heading down that route. One is in its early development stage. The one that remained was FuseCompress.

This one is really easy to install: just run ‘yum install fusecompress’. Setup is even easier:

fusecompress  /mnt/compressed   /mnt/uncompressed

Copying over a set of 30,000 files (a mix of text and binary data) totalling 85MB took 36 seconds. Repeating the same operation took 20 seconds, thanks to file system caching. When the files were copied without compression, the operation completed in 1 second. So much for my assumption that a compressed file system would be faster.
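For a sense of scale, the timings above can be turned into rough throughput figures. The sketch below uses only the numbers quoted in this post; the run labels and function names are made up for illustration, and the average file size assumes 1 MB = 1,000,000 bytes.

```python
# Figures quoted in the post.
TOTAL_MB = 85
FILE_COUNT = 30_000

# Wall-clock seconds for each copy (labels are illustrative only).
RUNS = {
    "compressed, first copy": 36,
    "compressed, repeat copy": 20,
    "uncompressed": 1,
}


def rates(seconds):
    """Return (MB per second, files per second) for one copy run."""
    return TOTAL_MB / seconds, FILE_COUNT / seconds


def summarize():
    """Map each run label to its (MB/s, files/s) pair."""
    return {label: rates(seconds) for label, seconds in RUNS.items()}


if __name__ == "__main__":
    # Assumption: 1 MB = 1,000,000 bytes.
    avg_bytes = TOTAL_MB * 1_000_000 / FILE_COUNT
    print(f"average file size: about {avg_bytes:.0f} bytes")
    for label, (mb_s, files_s) in summarize().items():
        print(f"{label}: {mb_s:.2f} MB/s, {files_s:.0f} files/s")
```

The average file works out to under 3 KB, so it is quite possible that per-file overhead, rather than raw compression speed, accounts for much of the gap. That is a guess; the test above did not measure the two separately. The 1 second figure is also coarse enough that the real uncompressed rate could differ noticeably from 85 MB/s.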

Mar 23rd, 2010 | Posted in Linux
  1. iggykoopa
    Apr 3rd, 2010 at 19:52 | #1

    Have you looked into using btrfs with compression turned on? It worked pretty well for me when I was testing it out.

  2. Thomas Martin Klein
    Jul 12th, 2011 at 01:32 | #2

    On-the-fly compression is not something one does for speed. I do not know of any compression that can outrun a good disk. And if we say that it runs at the same speed as disk access and compresses to a 50% ratio, then you get the same speed. If you are looking for faster, then the compression algorithm has to be faster, or better. This is far from impossible in certain scenarios. But on average, as more and more already-compressed file types become available, I doubt it is possible.
    On a good machine you can get 5-8 MB/s of gzip throughput with full CPU usage. It is not even close to disk speeds.

    • Feb 24th, 2012 at 05:46 | #3

      @Thomas: learn about lzop, pigz, and pbzip2. Then think about people like me who have 48 core machines. CPU is not an issue. Disk I/O is.

  3. Tim Omaha
    Mar 7th, 2012 at 23:22 | #4

    On-the-fly compression/decompression is great for backing up your system, and other data or archives, to external USB 2.0 hard drives. If I’m using a backup program like “Back in Time” on my Linux laptop, which makes differential backups and provides a timeline on my external hard drive, file system compression is ideal. It is also good for storing text and documents, and all those other wonderful things that need to be archived or backed up and can really benefit from compression.

  4. Chris Cottrell
    Aug 29th, 2012 at 20:11 | #5

    I would be more interested in the test results of disk compression to mitigate network latency for remote mounts such that fewer data packets are crossing the wire in either direction in order to read/write the same data.

  5. Anthony
    Feb 5th, 2013 at 23:55 | #6

    I currently have my central syslog system using ZFS, which achieves roughly 20x compression. Sun was always difficult to deal with, but Oracle takes it to a whole new level, and their ongoing support of general-purpose servers is very much in question, so we find ourselves migrating to RHEL. The btrfs there is ancient and takes research to use.
