Duplicity is a backup tool built on the rsync algorithm (through librsync, the library behind rdiff), so that only changes are copied to the backup location. It can compress and encrypt the data, and it can also save to Amazon’s S3 service. More details can be found on the Duplicity project site.
Installation on OpenBSD 4.4
The 4.4 version was the most difficult to get working, since most of the issues came from the libraries shipped with OpenBSD. Even the Duplicity port installed from packages didn’t work correctly.
First we need to add a few packages. Use pkg_add with whichever mirror you prefer to obtain the following. Some of them depend on other packages, so the final install list will contain more than these:
- python-2.5.2p4
- py-boto-1.3
- gpgme-1.1.5
- librsync-0.9.7
- ncftp-3.2.1
When the main Python package is installed, it will ask you to create a few symbolic links, so create those.
ln -sf /usr/local/bin/python2.5 /usr/local/bin/python
ln -sf /usr/local/bin/pydoc2.5 /usr/local/bin/pydoc
Version 4.4 needs a separate Python XML package to work properly. If it’s not installed, you’ll get a series of errors when trying to send data to S3; I believe the error occurs when boto tries to parse the XML response. The failure looks something like this:
Traceback (most recent call last):
File "/usr/local/bin/duplicity", line 482, in <module>
with_tempdir(main)
File "/usr/local/bin/duplicity", line 477, in with_tempdir
fn()
File "/usr/local/bin/duplicity", line 468, in main
full_backup(col_stats)
File "/usr/local/bin/duplicity", line 174, in full_backup
col_stats.set_values(sig_chain_warning = None).cleanup_signatures()
File "/usr/obj/ports/duplicity-0.4.12/fake-amd64/usr/local/lib/python2.5/site-packages/duplicity/collections.py", line 476, in set_values
File "/usr/obj/ports/duplicity-0.4.12/fake-amd64/usr/local/lib/python2.5/site-packages/duplicity/backends.py", line 802, in list
File "/usr/local/lib/python2.5/site-packages/boto/s3/bucketlistresultset.py", line 31, in bucket_lister
delimiter=delimiter)
File "/usr/local/lib/python2.5/site-packages/boto/s3/bucket.py", line 205, in get_all_keys
xml.sax.parseString(body, h)
File "/usr/local/lib/python2.5/xml/sax/__init__.py", line 43, in parseString
parser = make_parser()
File "/usr/local/lib/python2.5/xml/sax/__init__.py", line 93, in make_parser
raise SAXReaderNotAvailable("No parsers found", None)
xml.sax._exceptions.SAXReaderNotAvailable: No parsers found
To avoid that, a separate Python XML package needs to be downloaded and installed:
cd /usr/src
wget http://downloads.sourceforge.net/project/pyxml/pyxml/0.8.4/PyXML-0.8.4.tar.gz
tar zxvf PyXML-0.8.4.tar.gz
cd PyXML-0.8.4
python setup.py install
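Once PyXML is installed, one quick way to confirm the fix is to ask Python for a SAX parser directly, which is the same call that fails at the bottom of the traceback above. This is only a sketch: the function name has_sax_parser is made up for illustration, and it assumes the interpreter is reachable as python unless another path is passed in.

```sh
# Hypothetical helper: succeeds only if the given Python interpreter
# can create a SAX parser (the call that raised SAXReaderNotAvailable).
has_sax_parser() {
    "${1:-python}" -c 'import xml.sax; xml.sax.make_parser()' >/dev/null 2>&1
}

if has_sax_parser /usr/local/bin/python; then
    echo "SAX parser available"
else
    echo "No SAX parser found; boto will not be able to parse S3 responses"
fi
```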
Now we can install Duplicity.
cd /usr/src
wget http://code.launchpad.net/duplicity/0.6-series/0.6.06/+download/duplicity-0.6.06.tar.gz
tar zxvf duplicity-0.6.06.tar.gz
cd duplicity-0.6.06
python setup.py --librsync-dir=/usr/local build
python setup.py install --prefix=/usr/local
If you run the Duplicity jobs as root from cron, something about OpenBSD (I’m sure it’s a security restriction) causes them to fail. I got the output below in my log only when the job ran from cron:
Traceback (most recent call last):
File "/usr/local/bin/duplicity", line 583, in <module>
with_tempdir(main)
File "/usr/local/bin/duplicity", line 577, in with_tempdir
fn()
File "/usr/local/bin/duplicity", line 558, in main
full_backup(col_stats)
File "/usr/local/bin/duplicity", line 234, in full_backup
bytes_written = write_multivol("full", tarblock_iter, globals.backend)
File "/usr/local/bin/duplicity", line 148, in write_multivol
globals.gpg_profile, globals.volsize)
File "/usr/local/lib/python2.5/site-packages/duplicity/gpg.py", line 240, in GPGWriteFile
bytes_to_go = data_size - get_current_size()
File "/usr/local/lib/python2.5/site-packages/duplicity/gpg.py", line 232, in get_current_size
return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory:'/tmp/duplicity-gM4CN9-tempdir/mktemp-iZknw0-2'
Odd that it can’t read the temporary folder that it created. Changing the folder location did not help either. The solution is to create a separate user just for backups. This can be an issue if some files that need backing up cannot be read by all users, but in my case it worked for the specific files that needed to be saved.
useradd -m -d /home/dpbackup -c 'Duplicity' dpbackup
usermod -G nogroup dpbackup
mkdir /home/dpbackup/log
Make sure to add the new user to the deny list in SSH with DenyUsers dpbackup in the file /etc/ssh/sshd_config; there isn’t any reason for it to log in.
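A small check like the following can confirm that the DenyUsers entry is really in place. It is a sketch only: the function name ssh_denies_user is hypothetical, and it looks for a plain user name on a DenyUsers line without handling patterns such as user@host. Keep in mind that sshd has to reload its configuration before a change takes effect.

```sh
# Hypothetical helper: returns 0 if the sshd config file ($1) has an
# active (uncommented) DenyUsers line that names the user ($2).
# sshd_config keywords are case-insensitive, hence tolower().
ssh_denies_user() {
    awk -v user="$2" '
        tolower($1) == "denyusers" {
            for (i = 2; i <= NF; i++) if ($i == user) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1" 2>/dev/null
}

if ssh_denies_user /etc/ssh/sshd_config dpbackup; then
    echo "dpbackup is denied SSH logins"
else
    echo "dpbackup is NOT listed under DenyUsers"
fi
```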
Now su to this new user. A GPG key needs to be created so that the compressed backups can be encrypted and signed. This way no one else who may have access to our S3 account (such as Amazon employees) can read the data.
su dpbackup
$ cd
$ gpg --list-keys
gpg: directory `/root/.gnupg' created
gpg: new configuration file `/root/.gnupg/gpg.conf' created
gpg: WARNING: options in `/root/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/root/.gnupg/pubring.gpg' created
gpg: /root/.gnupg/trustdb.gpg: trustdb created
$ gpg --gen-key
There will be a series of questions; most of the defaults are fine.
- Choose option 1 for DSA and Elgamal (the default)
- Choose the default key size of 2048
- Leave the default that the key will not expire, option 0
- Enter a User ID, Email address, and comment for the key.
- Type O for OK to accept.
- Enter a long passphrase for the key and allow it to be generated. I usually do at least 20 characters since the password will just sit in a script anyway.
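The backup script refers to the new key by its short ID (the GPG_PUB_KEY variable). One way to look it up is to parse the machine-readable listing from gpg --list-keys --with-colons, where the fifth field of each pub record is the 16-digit key ID and the last 8 digits are the short ID. The helper name short_key_ids is invented for this sketch.

```sh
# Hypothetical helper: reads `gpg --list-keys --with-colons` output on
# stdin and prints the short (last 8 hex digits) ID of every public key.
short_key_ids() {
    awk -F: '$1 == "pub" { print substr($5, 9) }'
}

# Typical use, run as the backup user:
#   gpg --list-keys --with-colons | short_key_ids
```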
Copy the keys to some other safe place so that they can’t be lost. Without the key the backups are worthless, so keeping a copy in a second location is a good idea.
$ tar cf gpg_keys.tar .gnupg/
$ chmod 600 gpg_keys.tar
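Before copying the archive elsewhere, it may be worth checking that it actually contains both keyrings. The sketch below assumes the classic GnuPG 1.x layout, in which the keys live in pubring.gpg and secring.gpg; the function name archive_has_keyrings is hypothetical.

```sh
# Hypothetical helper: succeeds if the tar archive ($1) contains both
# the public and the secret keyring (GnuPG 1.x file names assumed).
archive_has_keyrings() {
    listing=$(tar tf "$1" 2>/dev/null) || return 1
    echo "$listing" | grep -q 'pubring\.gpg$' &&
        echo "$listing" | grep -q 'secring\.gpg$'
}

# Typical use, from the backup user's home directory:
#   archive_has_keyrings gpg_keys.tar && echo "key archive looks complete"
```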
See sample scripts below for backup jobs.
Installation on OpenBSD 4.4
The 4.4 version was the most difficult to get working, since most of the issues came from the libraries shipped with OpenBSD. Even the Duplicity port installed from packages didn’t work correctly.
First we need to add a few packages. Use pkg_add with whichever mirror you prefer to obtain the following. Some of them depend on other packages, so the final install list will contain more than these:
- python-2.5.4p1
- py-xml-0.8.4p8
- py-boto-1.7a
- gpgme-1.1.5p0
- librsync-0.9.7p0
- ncftp-3.2.2
When the main Python package is installed, it will ask you to create a few symbolic links, so create those.
ln -sf /usr/local/bin/python2.5 /usr/local/bin/python
ln -sf /usr/local/bin/python2.5-config /usr/local/bin/python-config
ln -sf /usr/local/bin/pydoc2.5 /usr/local/bin/pydoc
Now we can install Duplicity.
cd /usr/src
wget http://code.launchpad.net/duplicity/0.6-series/0.6.06/+download/duplicity-0.6.06.tar.gz
tar zxvf duplicity-0.6.06.tar.gz
cd duplicity-0.6.06
python setup.py --librsync-dir=/usr/local build
python setup.py install --prefix=/usr/local
If you run the Duplicity jobs as root from cron, something about OpenBSD (I’m sure it’s a security restriction) causes them to fail. I got the output below in my log only when the job ran from cron:
Traceback (most recent call last):
File "/usr/local/bin/duplicity", line 583, in <module>
with_tempdir(main)
File "/usr/local/bin/duplicity", line 577, in with_tempdir
fn()
File "/usr/local/bin/duplicity", line 558, in main
full_backup(col_stats)
File "/usr/local/bin/duplicity", line 234, in full_backup
bytes_written = write_multivol("full", tarblock_iter, globals.backend)
File "/usr/local/bin/duplicity", line 148, in write_multivol
globals.gpg_profile, globals.volsize)
File "/usr/local/lib/python2.5/site-packages/duplicity/gpg.py", line 240, in GPGWriteFile
bytes_to_go = data_size - get_current_size()
File "/usr/local/lib/python2.5/site-packages/duplicity/gpg.py", line 232, in get_current_size
return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory:'/tmp/duplicity-gM4CN9-tempdir/mktemp-iZknw0-2'
Odd that it can’t read the temporary folder that it created. Changing the folder location did not help either. The solution is to create a separate user just for backups. This can be an issue if some files that need backing up cannot be read by all users, but in my case it worked for the specific files that needed to be saved.
useradd -m -d /home/dpbackup -c 'Duplicity' dpbackup
usermod -G nogroup dpbackup
mkdir /home/dpbackup/log
Make sure to add the new user to the deny list in SSH with DenyUsers dpbackup in the file /etc/ssh/sshd_config; there isn’t any reason for it to log in.
Now su to this new user. A GPG key needs to be created so that the compressed backups can be encrypted and signed. This way no one else who may have access to our S3 account (such as Amazon employees) can read the data.
su dpbackup
$ cd
$ gpg --list-keys
gpg: directory `/root/.gnupg' created
gpg: new configuration file `/root/.gnupg/gpg.conf' created
gpg: WARNING: options in `/root/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/root/.gnupg/pubring.gpg' created
gpg: /root/.gnupg/trustdb.gpg: trustdb created
$ gpg --gen-key
There will be a series of questions; most of the defaults are fine.
- Choose option 1 for DSA and Elgamal (the default)
- Choose the default key size of 2048
- Leave the default that the key will not expire, option 0
- Enter a User ID, Email address, and comment for the key.
- Type O for OK to accept.
- Enter a long passphrase for the key and allow it to be generated. I usually do at least 20 characters since the password will just sit in a script anyway.
Copy the keys to some other safe place so that they can’t be lost. Without the key the backups are worthless, so keeping a copy in a second location is a good idea.
$ tar cf gpg_keys.tar .gnupg/
$ chmod 600 gpg_keys.tar
See sample scripts below for backup jobs.
Installation on Debian Lenny 5.0
The Debian install is a little bit simpler and can run the backup job as root inside cron. Get some install packages first:
apt-get install python python-dev librsync-dev python-boto
Install Duplicity:
cd /usr/src
wget http://code.launchpad.net/duplicity/0.6-series/0.6.06/+download/duplicity-0.6.06.tar.gz
tar zxvf duplicity-0.6.06.tar.gz
cd duplicity-0.6.06
python setup.py build
python setup.py install
Creating a separate user is optional on Debian, but it is good security practice not to run the backups as root.
useradd -m -d /home/dpbackup -c 'Duplicity' dpbackup
mkdir /home/dpbackup/log
Make sure to add the new user to the deny list in SSH with DenyUsers dpbackup in the file /etc/ssh/sshd_config; there isn’t any reason for it to log in.
Now su to this new user. A GPG key needs to be created so that the compressed backups can be encrypted and signed. This way no one else who may have access to our S3 account (such as Amazon employees) can read the data.
su dpbackup
$ cd
$ gpg --list-keys
gpg: directory `/root/.gnupg' created
gpg: new configuration file `/root/.gnupg/gpg.conf' created
gpg: WARNING: options in `/root/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/root/.gnupg/pubring.gpg' created
gpg: /root/.gnupg/trustdb.gpg: trustdb created
$ gpg --gen-key
There will be a series of questions; most of the defaults are fine.
- Choose option 1 for DSA and Elgamal (the default)
- Choose the default key size of 2048
- Leave the default that the key will not expire, option 0
- Enter a User ID, Email address, and comment for the key.
- Type O for OK to accept.
- Enter a long passphrase for the key and allow it to be generated. I usually do at least 20 characters since the password will just sit in a script anyway.
Copy the keys to some other safe place so that they can’t be lost. Without the key the backups are worthless, so keeping a copy in a second location is a good idea.
$ tar cf gpg_keys.tar .gnupg/
$ chmod 600 gpg_keys.tar
See sample scripts below for backup jobs.
Sample Backup Scripts
The first portion of the script defines the variables we’ll need to use. The AWS keys are issued to you when you sign up for S3. PASSPHRASE is the GPG passphrase set on the key generated with gpg --gen-key, and GPG_PUB_KEY is the ID of that key. The S3 bucket name has to be unique across all of S3, so I use the host name of the server. The others are pretty obvious but will be explained later.
#!/bin/sh
# Variables
export AWS_ACCESS_KEY_ID=ABABAB3333338888WWWW
export AWS_SECRET_ACCESS_KEY=BBBBBBBBBBTTTTTTTTTT8888888888VVVVVVVVVV
export PASSPHRASE=somelongpassphrase
DBHOST='dbserver1'
TIMESTAMP=`date +%m%d%Y%H%M`
FILE_PREFIX_DB='mydb_'
FILE_PREFIX_SVN_REPO='repo_'
GPG_PUB_KEY='AAEE66BB'
BACKUP_LOG_FILE='/home/dpbackup/log/s3_backup.log'
FULL_IF_OLDER_THAN='7D'
KEEP_MAX_SETS='2'
S3_BUCKET='serverhostname'
CURRENT_HOST='server-hostname'
TO_EMAIL='sysadmin@example.com'
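Because everything later in the script depends on these variables, a guard right after the definitions can stop the job early when one of them is empty. This is a minimal sketch; require_vars is a hypothetical helper, not part of the original script.

```sh
# Hypothetical helper: reports every listed variable that is unset or
# empty and returns non-zero if at least one was missing.
require_vars() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return $missing
}

# Typical use, directly after the variable block:
#   require_vars AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE \
#       GPG_PUB_KEY S3_BUCKET || exit 1
```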
Next come some sample dump commands for MySQL and Subversion, if needed.
/usr/local/bin/mysqldump -h $DBHOST -u mysql_admin -pmypass mydb > /home/dpbackup/mysql/$FILE_PREFIX_DB$TIMESTAMP.sql
/usr/local/bin/svnadmin dump /home/svn/repo > /home/dpbackup/svn/$FILE_PREFIX_SVN_REPO$TIMESTAMP.svnbk
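With a plain redirect, a dump command that fails halfway still leaves a file behind, and that file would then be uploaded as if it were a good backup. One possible guard, sketched here with a hypothetical dump_to wrapper, writes to a temporary name and only moves the file into place when the command succeeds.

```sh
# Hypothetical wrapper: run a dump command and keep its output only if
# the command exits successfully.
#   $1 = final file name, remaining arguments = command to run
dump_to() {
    target=$1
    shift
    tmp="$target.partial"
    if "$@" > "$tmp"; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
        echo "dump failed: $*" >&2
        return 1
    fi
}

# Typical use with the Subversion dump from the script:
#   dump_to "/home/dpbackup/svn/$FILE_PREFIX_SVN_REPO$TIMESTAMP.svnbk" \
#       /usr/local/bin/svnadmin dump /home/svn/repo
```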
Raising the open file limit is only necessary on OpenBSD, where the low default is a security feature. We raise it from the default of 128 now and put it back later.
# Increase open file limit
ulimit -n 1024
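A slightly more defensive variant, sketched below, remembers the limit that was in effect instead of assuming 128, and only touches the soft limit so that it can be restored afterwards without root. The function name raise_nofile and the variable ORIG_NOFILE are made up for this example.

```sh
# Hypothetical helper: raise the soft open-file limit to at least $1,
# leaving the hard limit alone.
raise_nofile() {
    want=$1
    current=$(ulimit -S -n)
    # "unlimited" already satisfies any request
    [ "$current" = "unlimited" ] && return 0
    if [ "$current" -lt "$want" ]; then
        ulimit -S -n "$want"
    fi
}

# Remember the current soft limit, then raise it for the backup
ORIG_NOFILE=$(ulimit -S -n)
raise_nofile 1024 || echo "could not raise open file limit" >&2

# ... the backup commands would run here ...

# Put the soft limit back to what it was before
[ "$ORIG_NOFILE" = "unlimited" ] || ulimit -S -n "$ORIG_NOFILE"
```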
Most of the options in the backup command are described in the Duplicity man page, and there are many more to choose from. Basically this job does a full backup every 7 days (from the $FULL_IF_OLDER_THAN variable) and an incremental one otherwise, and it encrypts with the highest bzip2 compression level before sending the data to S3. It writes a fresh backup log to the defined file, which we’ll email out later.
# Backup to S3
/usr/local/bin/duplicity --s3-use-new-style --tempdir /home/dpbackup --full-if-older-than $FULL_IF_OLDER_THAN --encrypt-key "$GPG_PUB_KEY" --sign-key "$GPG_PUB_KEY" --gpg-options='--compress-algo=bzip2 --bzip2-compress-level=9' --include /etc/apache2 --include /home/dpbackup/svn --include /home/dpbackup/mysql --exclude '**' / s3+http://$S3_BUCKET > $BACKUP_LOG_FILE
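A backup is only useful if it can be restored, so here is a rough sketch of the reverse direction using Duplicity's --file-to-restore option. The restore_path function and the DUPLICITY variable are hypothetical names; the restore needs the same AWS_* and PASSPHRASE variables as the backup, and the path is given relative to the backed-up root, without a leading slash.

```sh
# Path to the duplicity binary; overridable so the command can be previewed.
DUPLICITY=${DUPLICITY:-/usr/local/bin/duplicity}

# Hypothetical helper: restore one path from the backup.
#   $1 = path inside the backup, relative to the backed-up root
#   $2 = local file or directory name to restore into
#   $3 = S3 bucket name
restore_path() {
    "$DUPLICITY" --s3-use-new-style --file-to-restore "$1" \
        "s3+http://$3" "$2"
}

# Typical use:
#   restore_path etc/apache2 /home/dpbackup/restore/apache2 "$S3_BUCKET"
```

Setting DUPLICITY=echo before calling the function prints the command line instead of running it, which is a cheap way to check the arguments.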
The next line just adds a separator to the log file; really it’s only for email formatting.
# Separate the log file a bit
printf '\n\n==== REMOVE OLD BACKUP SETS ====\n\n' >> $BACKUP_LOG_FILE
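This command checks how many full backup sets are already on S3 and removes any beyond the number defined in KEEP_MAX_SETS. Depending on the Duplicity version, the --force option may be needed for the old sets to actually be deleted rather than just listed, so check the log the first few times.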
# Clean out backup sets older than variable sets
/usr/local/bin/duplicity remove-all-but-n-full $KEEP_MAX_SETS s3+http://$S3_BUCKET >> $BACKUP_LOG_FILE
Again, for formatting purposes.
# Separate the log file a bit
printf '\n\n==== CURRENT FILES IN BACKUP SET ====\n\n' >> $BACKUP_LOG_FILE
This command lists the current files in our backup set so they can be reviewed in the email, making sure everything is working as it should.
# List all files in backup set for verification
/usr/local/bin/duplicity list-current-files s3+http://$S3_BUCKET >> $BACKUP_LOG_FILE
Now we can mail out the log file. The -s flag is for the subject line, and the TO_EMAIL is defined in our variables. We’re just writing the log file as the body of the email.
# Mail out log to sysadmins for verification
mail -s "$CURRENT_HOST Backup Log for $TIMESTAMP" $TO_EMAIL < $BACKUP_LOG_FILE
Since we exported the keys and passphrase, we want to make sure we don’t leave them around any longer than we have to, so set them to empty values.
# Clear secret variables
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=
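An alternative, shown here only as a sketch, is to unset the variables from an exit trap, so they are also cleared when an earlier command makes the script stop before it reaches the end. The function name clear_secrets is hypothetical.

```sh
# Hypothetical cleanup routine: remove the secrets from the environment.
clear_secrets() {
    unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE
}

# Run it whenever the script exits, including early exits on error.
trap clear_secrets EXIT
```

The trap line would go near the top of the script, right after the variables are exported.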
Finally, a little cleanup of the dump files so we don’t waste space.
# Remove old and temporary files
rm /home/dpbackup/mysql/*
rm /home/dpbackup/svn/*
This is for OpenBSD only. Since we raised the open file limit at the beginning of the script, put it back to the default.
# Put open file limit back to default
ulimit -n 128
End it.