Monday, October 14, 2013

Setting up Flashcache the hard way and some talk about initramfs

http://blog.viraptor.info/post/45310603661/setting-up-flashcache-the-hard-way-and-some-talk-about

If you follow the latest versions of… everything and have tried to install flashcache, you probably noticed that none of the current guides describe the installation correctly - or they're mostly correct, but with some bits missing. So here's an attempt at a refreshed guide. I'm using kernel version 3.7.10 and mkinitcpio version 0.13.0 (this actually matters: the interface for adding hooks and modules has changed).
Some of the guide is likely to be Arch-specific. I don’t know how much, so please watch out if you’re using another system. I’m going to explain why things are done the way they are, so you can replicate them under other circumstances.

Why flashcache?

First, what do I want to achieve? I'm setting up a system which has a large spinning disk (300GB) and a rather small SSD (16GB). Why such a weird combination? Lenovo allowed me to add a free 16GB SSD drive to the laptop configuration - couldn't say no ;) The small disk is not useful as a filesystem on its own, but if disk reads and writes were cached on it before being written back to the platters, it should give my system a huge performance gain without a huge price tag. Flashcache can achieve exactly that. It was written by people working for Facebook to speed up their databases, but it works just as well for many other usage scenarios.
Why not other modules like bcache or something else dm-based? Because flashcache does not require kernel modifications. It's just a module and a set of utilities. You get a new kernel and they "just work" again - no source patching required. I'm excited about the efforts to make bcache part of the kernel and about the new dm cache target coming in 3.9, but for now flashcache is the easiest option available.
I’m going to set up two SSD partitions because I want to cache two real partitions. There has to be a persistent 1:1 mapping between the cache and real storage for flashcache to work. One of the partitions is home (/home), the other is the root (/).
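To make the later examples easier to follow, this is the layout I'm assuming throughout (the device names are illustrative - the root side comes from my machine, the home side is a guess, so adjust both to your own setup):
/dev/sda2 (HDD, /)      cached by /dev/sdb1 (SSD)
/dev/sda3 (HDD, /home)  cached by /dev/sdb2 (SSD)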

Preparation

Take backups, make sure you have a bootable installer of your system, and make sure you really want to try this. Any mistake can cost you the entire contents of your hard drive or break your grub configuration, in which case you'll need an alternative way of accessing your system. Also, some of your "data has been written" guarantees are going to disappear. You've been warned.

Building the modules and tools

First we need the source. Make sure git is installed and clone the flashcache repository: https://github.com/facebook/flashcache
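For example:
git clone https://github.com/facebook/flashcache.git
cd flashcache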
Then build it, specifying the path where the kernel source is located - in case you’re in the middle of a version upgrade, this is the version you’re compiling for, not the one you’re using now:
make KERNEL_TREE=/usr/src/linux-3.7.10-1-ARCH KERNEL_SOURCE_VERSION=3.7.10-1-ARCH
sudo make KERNEL_TREE=/usr/src/linux-3.7.10-1-ARCH KERNEL_SOURCE_VERSION=3.7.10-1-ARCH install
There should be no surprises so far. The above should install a couple of things - the module and four utilities:
/usr/lib/modules/<version>/extra/flashcache/flashcache.ko
/sbin/flashcache_load
/sbin/flashcache_create
/sbin/flashcache_destroy
/sbin/flashcache_setioctl
The module is the most interesting bit at the moment, but to load the cache properly at boot time, we’ll need to put those binaries on the ramdisk.
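Before going further, you can sanity-check that the module landed where it should - the path below assumes the 3.7.10-1-ARCH kernel used throughout this guide, so adjust it for your version:
modinfo /usr/lib/modules/3.7.10-1-ARCH/extra/flashcache/flashcache.ko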

Configuring ramdisk

An Arch system creates the ramdisk using mkinitcpio (a successor to mkinitramfs, which in turn succeeded mkinitrd) - you can read more about it on the Ubuntu wiki, for example. The way this works is via hooks configured in /etc/mkinitcpio.conf. When a new image is generated, all hooks from that file are run in the defined order to build up the contents of what ends up in /boot/initramfs-linux.img (unless you changed the default).
The runtime scripts live in /usr/lib/initcpio/hooks, while the ramdisk-building elements live in /usr/lib/initcpio/install. Now the interesting part starts: first let's place all the needed bits into the ramdisk by creating the install hook /usr/lib/initcpio/install/flashcache:
# vim: set ft=sh:

build ()
{
    add_module "dm-mod"
    add_module "flashcache"

    add_dir "/dev/mapper"
    add_binary "/usr/sbin/dmsetup"
    add_binary "/sbin/flashcache_create"
    add_binary "/sbin/flashcache_load"
    add_binary "/sbin/flashcache_destroy"
    add_file "/lib/udev/rules.d/10-dm.rules"
    add_file "/lib/udev/rules.d/13-dm-disk.rules"
    add_file "/lib/udev/rules.d/95-dm-notify.rules"
    add_file "/lib/udev/rules.d/11-dm-lvm.rules"

    add_runscript
}

help ()
{
cat<<HELPEOF
  This hook loads the necessary modules for a flash drive as a cache device for your root device.
HELPEOF
}
This will add the required modules (dm-mod and flashcache), make sure the mapper directory is ready, install the tools, and add some useful udev disk-discovery rules. The same rules are included in the lvm2 hook (I assume you're using it anyway), so there is an overlap, but this will not cause any conflicts.
The last line of the build function makes sure that the script with the runtime hooks will be included too. That's the file which needs to ensure everything is loaded at boot time. It should contain a function run_hook, which runs after the modules are loaded but before the filesystems are mounted - a perfect time for additional device setup. It looks like this and goes into /usr/lib/initcpio/hooks/flashcache:
#!/usr/bin/ash

run_hook ()
{
    # Make sure the device-mapper control node exists before we use it;
    # the major:minor numbers come from sysfs.
    if [ ! -e "/dev/mapper/control" ]; then
        /bin/mknod "/dev/mapper/control" c $(cat /sys/class/misc/device-mapper/dev | sed 's|:| |')
    fi

    # Stay silent if the kernel was booted with 'quiet'.
    [ "${quiet}" = "y" ] && LVMQUIET=">/dev/null"

    msg "Activating cache volumes..."
    # flashcache_volumes is a comma-separated list passed on the kernel
    # command line; split it on commas and load each cache volume.
    oIFS="${IFS}"
    IFS=","
    for disk in ${flashcache_volumes} ; do
        eval /usr/sbin/flashcache_load "${disk}" $LVMQUIET
    done
    IFS="${oIFS}"
}

# vim:set ft=sh:
Why the crazy splitting, and where does flashcache_volumes come from? It's done so that the values are not hardcoded and adding a volume doesn't require rebuilding the initramfs. Every variable set as a kernel boot parameter is visible in the hook script, so adding flashcache_volumes=/dev/sdb1,/dev/sdb2 will activate both of those volumes. I just add that to the GRUB_CMDLINE_LINUX_DEFAULT variable in /etc/default/grub.
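For example (keep whatever other options you already have on that line):
GRUB_CMDLINE_LINUX_DEFAULT="quiet flashcache_volumes=/dev/sdb1,/dev/sdb2"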
In my case sdb1 and sdb2 are the partitions on the SSD drive - you may need to change those to match your environment.
Additionally, if you're attempting to have your root filesystem handled by flashcache, you'll need two more parameters. One is of course root=/dev/mapper/cached_system and the second is lvmwait=/dev/mapper/cached_system, to make sure the device is available before the system tries to mount it.
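The new hook also needs to be listed in the HOOKS array in /etc/mkinitcpio.conf, after lvm2 and before filesystems, matching the build order shown below:
HOOKS="base udev autodetect modconf block lvm2 flashcache filesystems keyboard fsck"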
At this point regenerating the initramfs (sudo mkinitcpio -p linux) should work and print out a line about the flashcache hook being included. For example:
==> Building image from preset: 'default'
  -> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux.img
==> Starting build: 3.7.10-1-ARCH
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [autodetect]
  -> Running build hook: [modconf]
  -> Running build hook: [block]
  -> Running build hook: [lvm2]
  -> Running build hook: [flashcache]
  -> Running build hook: [filesystems]
  -> Running build hook: [keyboard]
  -> Running build hook: [fsck]
==> Generating module dependencies
==> Creating gzip initcpio image: /boot/initramfs-linux.img
==> Image generation successful

Finale - fs preparation and reboot

To actually create the initial caching filesystem you'll have to prepare the SSD drive. Assuming it's already split into partitions - each one buffering data for a corresponding real partition - you have to run the flashcache_create tool. The details of how to run it and the available modes are described in the flashcache-sa-guide.txt file in the repository, but the simplest example (in my case, creating the root partition cache) is:
flashcache_create -p back cached_system /dev/sdb1 /dev/sda2
which creates a device-mapper device called cached_system, with the fast cache on /dev/sdb1 and the backing storage on /dev/sda2.
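The home partition gets the same treatment - a sketch, assuming /dev/sdb2 is the second SSD partition and /dev/sda3 holds /home (adjust the names to your layout):
flashcache_create -p back cached_home /dev/sdb2 /dev/sda3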
Now adjust your /etc/fstab to point at the caching devices where necessary, regenerate your grub configuration so it includes the new parameters, and reboot. If things went well you'll be running from the cache instead of directly from the spinning disk.
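For illustration, the fstab entries could look like this (assuming ext4 filesystems, the cached_system device from above and a similarly created cached_home for the home partition):
/dev/mapper/cached_system  /      ext4  defaults  0  1
/dev/mapper/cached_home    /home  ext4  defaults  0  2
and on Arch the grub configuration is regenerated with:
sudo grub-mkconfig -o /boot/grub/grub.cfg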

Was it worth the work?

Learning about initramfs and configuring it by hand - of course it was. It was lots of fun, and I only ended up with a ramdisk that failed to boot the system three times in the process…
Configuring flashcache - OH YES! It's a night and day difference. You can check the stats of your cache device by running dmsetup status devicename. In my case, after a couple of days of browsing, watching movies, and hacking on python and haskell code, I get 92% cache hits on read and 58% on write on the root filesystem. On home it's 97% and 91% respectively. Each partition is 50GB of HDD with an 8GB SSD cache. Since the cache persists across reboots, startup times have also dropped from ~5 minutes to around a minute in total.
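For example, with the device created earlier:
sudo dmsetup status cached_system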
I've worked on SSD-only machines before and honestly can't tell the difference between them and one with flashcache during standard usage. The only time you're likely to notice a delay is when loading a new, uncached program and the disk has to spin up for reading.
Good luck with your setup.
