Tips & Tricks

SSH Backend

By combining S3QL’s local backend with sshfs, it is possible to store an S3QL file system on arbitrary SSH servers: first mount the remote target directory into the local file system,

sshfs user@my.server.com:/mnt/s3ql /mnt/sshfs

and then give the mountpoint to S3QL as a local destination:

mount.s3ql local:///mnt/sshfs/myfsdata /mnt/s3ql
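
Note that the sshfs mount has to be established before the S3QL file system can be mounted, and it must stay in place until after the S3QL file system has been unmounted again. A minimal sketch of the full cycle, reusing the host name and paths from the example above, might look like this:

#!/bin/bash

# Mount the remote directory with sshfs first...
sshfs user@my.server.com:/mnt/s3ql /mnt/sshfs

# ...then mount the S3QL file system stored inside it.
mount.s3ql local:///mnt/sshfs/myfsdata /mnt/s3ql

# (work with /mnt/s3ql here)

# Unmount in reverse order: the S3QL file system first, then sshfs.
umount.s3ql /mnt/s3ql
fusermount -u /mnt/sshfs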

Permanently mounted backup file system

If you use S3QL as a backup file system, it can be useful to mount the file system permanently (rather than mounting it just for a backup and unmounting it afterwards). Especially if your file system becomes large, this saves you the long mount and unmount times when you only want to restore a single file.

If you decide to do so, you should make sure to

  • Use s3qllock to ensure that backups are immutable after they have been made.

  • Call s3qlctrl backup-metadata right after every backup to make sure that the newest metadata is stored safely (a minimal wrapper combining both steps is sketched below).
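
For illustration, such a wrapper might look as follows. The mountpoint, the source directory and the per-date backup directories are just assumptions for this sketch; adapt them to your setup:

#!/bin/bash

MOUNTPOINT=/mnt/s3ql                       # assumed S3QL mountpoint (file system already mounted)
BACKUPDIR="$MOUNTPOINT/$(date +%Y-%m-%d)"  # assumed layout: one directory per backup run

mkdir -p "$BACKUPDIR"
rsync -aH /home/ "$BACKUPDIR/"             # assumed data source: /home

# Make the finished backup immutable...
s3qllock "$BACKUPDIR"

# ...and store the newest metadata safely right away.
s3qlctrl backup-metadata "$MOUNTPOINT"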

Improving copy performance

Note

The following applies only when copying data from an S3QL file system, not when copying data to an S3QL file system.

If you want to copy a lot of small files from an S3QL file system (e.g. for a system restore), you will probably notice that the performance is rather bad.

The reason for this is intrinsic to the way S3QL works. Whenever you read a file, S3QL first has to retrieve it over the network from the backend. This takes a minimum amount of time (the network latency), no matter how big or small the file is. So when you copy lots of small files, 99% of the time is actually spent waiting for network data.
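
To put some purely illustrative numbers on this: with a round-trip latency of 50 ms, reading 10,000 small files one after another spends at least 10,000 × 0.05 s ≈ 8 minutes just waiting for the network, no matter how little data is actually transferred.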

Theoretically, this problem is easy to solve: you just have to copy several files at the same time. In practice, however, almost all Unix utilities (cp, rsync, tar and friends) insist on copying data one file at a time. This makes a lot of sense when copying data on the local hard disk, but in the case of S3QL it is really unfortunate.

The best workaround that has been found so far is to copy files by starting several rsync processes at once and using exclusion rules to make sure that they work on different sets of files.

For example, the following script will start 3 rsync instances. The first instance handles all filenames starting with a-f, the second those starting with g-l, and the third covers the rest. The + */ rule ensures that every instance looks into all directories.

#!/bin/bash

RSYNC_ARGS="-aHv /mnt/s3ql/ /home/restore/"

# Every instance descends into all directories ("+ */") but copies
# only those files whose names fall into its own range.
rsync -f "+ */" -f "-! [a-f]*" $RSYNC_ARGS &
rsync -f "+ */" -f "-! [g-l]*" $RSYNC_ARGS &
rsync -f "+ */" -f "- [a-l]*" $RSYNC_ARGS &

wait

The optimum number of parallel processes depends on your network connection and the size of the files that you want to transfer. However, starting about 10 processes seems to be a good compromise that increases performance dramatically in almost all situations.

S3QL comes with a script named pcp.py in the contrib directory that can be used to transfer files in parallel without having to write an explicit script first. See the description of pcp.py for details.
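
For instance, a restore corresponding to the script above might be started with something like the following; the option name is an assumption for this sketch, so check pcp.py --help for the actual interface:

contrib/pcp.py --processes 10 /mnt/s3ql/ /home/restore/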