Remote Backups with Borg
I want efficient backups, and I want control!
Once upon a time, I used Ubuntu as my main desktop OS, but despite how much I enjoyed tinkering then, these days I want a desktop that just works, so my personal computers are all largely Windows based.
There’s just one problem - in my opinion, there is no better OS for development than Linux.
The command line interface is much more flexible and easier to use, and the installation of dependencies and tooling is far less effort thanks to package managers and a sensible $PATH.
Therefore, I do the majority of my development in a Linux (or Linux like) environment - whether that be on a remote server, a local virtual machine, or on the fantastic Windows Subsystem for Linux (WSL).
WSL is proving to work really well for me, so I set out looking for a backup tool that I could use to back up my personal files on my Windows machine from WSL (and elsewhere).
For my backup tool, I came up with a basic list of requirements:
- Open source
- Multi-platform
- Actively maintained
- Command line interface
- Good support for remote backups
In the simplest case, I could just upload those files to some cloud storage service, but storing things online means I have three more requirements:
- Client-side encryption: No cloud provider is infallible, so if I’m going to be backing up private data, I don’t want a compromised host to result in all of my data falling into the hands of an attacker.
- Deduplication (& compression): Cloud storage is expensive, so I’d like to keep data usage low. My upload bandwidth is also very low, so anything that can reduce the amount of data I need to upload would be ideal.
- No vendor lock-in: I want to be able to move my backups to a different hosting provider, without having to replace the backup tool. Ideally, migration between storage locations should be as low overhead as possible.
In the end, I found two tools that seemed to tick (most of) these boxes: BorgBackup, a Python based tool, and restic, written in Go.
- Borg doesn’t run natively on Windows, but works fine under WSL, which is where I’d prefer to interact with it anyway.
- At the time I made this choice, restic didn’t support compression, although compression may not have a huge impact depending on the makeup of your data.
- Restic is a lot more flexible in terms of cloud storage, whereas Borg is simpler.
In the end, I decided on Borg, which seemed to have more of an emphasis on reducing network usage, and I didn’t really need the flexibility of restic’s cloud storage support - a remote filesystem over SSH was enough for me. Really, either would have suited my needs.
Cloud Storage for Borg
With the backup tool chosen, I then needed somewhere to host the data.
To work with borg, you’ll need an SSH server upon which you can run borg in server mode.
This could be a general purpose VPS, or it could be a specialised storage provider.
In the latter case, you will either need the ability to upload and run your own executables, or the hosting provider will need to provision borg for you.
I decided to look at specialised storage providers with specific borg support, from which I chose to use rsync.net’s special borg accounts[1]. The special borg accounts offer a cut-price do-it-yourself approach to remote storage, in a very Linux-friendly manner. There are a few restrictions in comparison to their normal account offering, with the biggest drawback being that you’re unable to create additional user accounts, which might be desirable if you were backing up multiple systems and wanted to segregate permissions (although this can be alleviated by the use of multiple SSH keys and restrictions in authorized_keys).
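As a rough sketch of the authorized_keys workaround mentioned above: Borg's documentation describes pinning an SSH key to a forced borg serve command with --restrict-to-repository. The helper below only prints such an entry. The function name, repository path, and key are hypothetical placeholders, and whether your provider accepts a relative path here is an assumption you should check.

```shell
# Print an authorized_keys entry that limits one SSH key to one Borg repository.
# Arguments: <remote borg command> <repo path on the server> <public key line>
restricted_key_line() {
    printf 'command="%s serve --restrict-to-repository %s",restrict %s\n' "$1" "$2" "$3"
}

# Hypothetical example values: replace the path and key with your own.
restricted_key_line borg1 laptop/borgrepo "ssh-ed25519 AAAA...example laptop"
```

Each machine you back up would get its own key and its own line, so a stolen key only exposes the one repository it is pinned to.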
I’m a big fan of rsync.net’s no-nonsense offering. You pick your desired quota, you pick one of their 4 hosting locations (San Diego, Denver, Zurich, or Hong Kong), and you pay annually either via credit card or PayPal. You can easily change your quota in future if necessary, and you’re also given a buffer of an extra 10% space in case you haven’t given yourself quite enough room, so you’ve got a chance to finish your backups before reaching for the credit card. Pricing is incredibly simple: you just pay per GB in your quota, with no base service cost and no exorbitant transfer fees like you pay with AWS.
You’ll be assigned a unix username and password, which are separate to the credentials you’ll use to log in to the account management interface on their website. With your unix username and password, you’re able to run a minimal set of commands on the filestore (although not with an interactive shell), including updating your password and setting allowed SSH public keys. From the web interface, you can then disable password login entirely, to increase the security of your filestore. Login to the account management interface can also be protected with TOTP based 2FA.
There’s a small list of commands (such as quota, to check your usage) that are enabled for your user account, as well as borg and borg1, which provide legacy borg 0.x and the current stable borg 1.x release respectively.
Unless you’re dealing with a pre-existing repository, you probably want to use borg1.
Using Borg to Backup over SSH
In the following examples, I’ll be using Borg in Ubuntu for Windows to back up my documents, photos, and videos. The steps should be exactly the same regardless of your chosen distribution and whether you’re running from inside Windows or on an actual Linux box.
Make sure you’ve got key based login configured for SSH - this is highly recommended both for convenience and security.
Setup
First, install borg either using your package manager, or if you want to use the very latest version, then either install from source via pip, or grab one of the standalone binaries from the GitHub releases page.
sudo apt install borgbackup
If you are not using a dedicated borg provider, you will need to repeat the borg installation steps on the remote machine.
Next, load your SSH key:
eval `ssh-agent`
ssh-add
You can test your SSH setup with a quick command. Here I run quota to get an idea of my remaining disk allowance and check I’ve loaded the correct SSH key:
$ ssh 12345@ab-cXXX.rsync.net "quota"
Disk Quotas for User 12345
Filesystem  Usage     SoftQuota  HardQuota  Files
data1       196.824G  267G       294G       4897
You should see output detailing the amount of space allocated to your account.
If you’re using rsync.net, you will need to set the BORG_REMOTE_PATH environment variable to tell Borg that it must invoke borg1 on the remote end, rather than the legacy borg.
export BORG_REMOTE_PATH=borg1
# Optionally, set it on login by adding to your .profile, .bashrc, etc
echo "export BORG_REMOTE_PATH=borg1" >> ~/.profile
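One thing to watch with appending to a profile is that re-running the command adds the same line again. A minimal sketch of a guard against that, using a hypothetical helper name:

```shell
# Append a line to a file only if that exact line is not already present.
# Arguments: <line> <file>
append_once() {
    ao_line=$1
    ao_file=$2
    touch "$ao_file"
    # -x matches whole lines only, -F treats the line as a fixed string
    grep -qxF -- "$ao_line" "$ao_file" || printf '%s\n' "$ao_line" >> "$ao_file"
}
```

For example, append_once 'export BORG_REMOTE_PATH=borg1' ~/.profile can be run any number of times and leaves a single copy of the line.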
Creating a Remote Borg Repo
With borg installed and passwordless SSH login working, you can now create an empty Borg repository. Choices made at this stage are important, as they determine what level of encryption your repo will use. This cannot be changed later without redoing your backups from scratch. This is also where you’ll choose the location for your backups to be stored in - the path is supplied in the familiar SSH format that you will recognise from other commands.
Firstly, run borg init to create an empty repo:
borg init --encryption=keyfile-blake2 --make-parent-dirs 12345@ab-cXXX.rsync.net:path/to/new/borgrepo
- --encryption=keyfile-blake2 - this flag is mandatory, and determines which encryption and authentication settings to use.
  In this example, we select the BLAKE2b hashing algorithm for authentication.
  BLAKE2b is derived from BLAKE, which was a finalist in the SHA-3 competition, and is designed with stricter security requirements than SHA-256, our other option.
  It also has the benefit that, on architectures without hardware-accelerated SHA-256 support, BLAKE2b is faster.
  The keyfile part of the flag tells Borg that we want our backups to be encrypted (there is no choice of algorithm: if encryption is enabled, it is AES based).
  keyfile also means that we want Borg to store the encryption key in a separate file on your local machine, whereas the repokey variant tells Borg to store the key on the remote server alongside the repo itself.
  In both cases, the encryption key is passphrase protected, but in the keyfile case an attacker would need to steal a copy of the encrypted key file from your local machine as well as learn your passphrase - so there is an extra layer of security.
  Of course, the downside here is that if you don’t take care of the keyfile, and it gets lost or corrupted, you will never be able to restore your backups.
  Use keyfile mode for the highest security, but be prepared to work out how to store backups of the keyfile (and no, you cannot back up the keyfile in your Borg repo!).
- --make-parent-dirs - similar to mkdir -p, this will create all of the directory components in path/to/new/borgrepo if they don’t already exist.
- 12345@ab-cXXX.rsync.net:path/to/new/borgrepo - this positional parameter defines where the repo should be stored, in the familiar form of username@hostname:path.
  You can easily move the repository in future, so you don’t need to think too hard about this - but I’d recommend going with a more generic name, in case you decide to expand the scope of your backups in future.
Run this command, and borg will prompt you to enter a passphrase with which to protect your encryption key.
It is crucial that you pick something suitably long and complex, but it is equally crucial that you select something memorable and/or have some way of storing the passphrase such that you cannot forget it.
Without this passphrase and the keyfile, you do not have a backup: you have a useless random blob of binary that’s costing you 1.5 cents/GB/month to host.
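Since losing the keyfile means losing the backups, it is worth copying it somewhere safe straight away. Borg 1.x provides a borg key export command for this. The sketch below wraps it; the function name and output file names are hypothetical, and where you keep the exported copies is up to you.

```shell
# Export a copy of the repository key so it can be stored away from this machine.
# Arguments: <repository> <destination directory>
export_borg_key() {
    ek_repo=$1
    ek_dest=$2
    mkdir -p "$ek_dest" || return 1
    # Plain export, e.g. for an encrypted USB stick or a password manager
    borg key export "$ek_repo" "$ek_dest/borg-key.txt" || return 1
    # Printable variant, intended to be kept on paper
    borg key export --paper "$ek_repo" "$ek_dest/borg-key-paper.txt"
}
```

You would call it with the same repository address you passed to borg init, plus a directory that does not live only on the machine being backed up.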
Improving Borg Ease-of-Use
When you’re making backups, if you’re anything like me you want to get some insight into the status of the repo: how many backups you’ve made; how much storage you’ve used; how much data each backup has added to the repo.
Because of this, you’re going to end up running several borg commands against the remote repo in a session, and if you’ve picked a suitably long passphrase, that means a lot of typing and a lot of opportunity for typos.
Luckily, Borg supports a number of environment variables, two of which we want to add alongside our BORG_REMOTE_PATH definition specified earlier.
The first sets the default value of the repository location, meaning you don’t need to keep supplying the address of your server:
export BORG_REPO="12345@ab-cXXX.rsync.net:path/to/new/borgrepo"
Again, add this line to your .profile or similar if you don’t plan on using multiple Borg repositories.
When a command requires the repository, you can use :: in its place, and it will expand to the value of BORG_REPO.
The next environment variable we may want to set is BORG_PASSPHRASE.
As you might expect, this stores the passphrase required to unlock your repository’s encryption key.
When setting this variable, we don’t want to expose the passphrase in our .bash_history (or in your profile!).
You could potentially make use of Bash’s HISTCONTROL=ignorespace, but an even better solution is to use read's built-in secret mode, that will hide your input similar to a sudo password prompt:
# Prompt for the passphrase without echoing it, and save it to the shell variable $BORG_PASSPHRASE
# (-r stops read from treating backslashes in the passphrase as escape characters)
read -r -s -p "Borg passphrase: " BORG_PASSPHRASE; echo
# In order for the variable to be visible to child processes (i.e. borg), we must export it
export BORG_PASSPHRASE
# Don't leave the passphrase in your env once done! Remove it again with: unset BORG_PASSPHRASE
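If you would rather not export the passphrase into your whole session, one alternative is to hand it to a single command only. This is a sketch rather than anything Borg prescribes: the function name is hypothetical, and it uses stty instead of read -s so that it also works in plain POSIX shells.

```shell
# Run one command with BORG_PASSPHRASE set only in that command's environment.
# Arguments: <command...>
with_borg_passphrase() {
    # Hide typed input when reading from a terminal (skipped when input is piped)
    if [ -t 0 ]; then stty -echo; fi
    printf 'Borg passphrase: ' >&2
    # Tolerate input that ends without a trailing newline
    IFS= read -r wbp_pass || true
    if [ -t 0 ]; then stty echo; fi
    printf '\n' >&2
    # The assignment prefix scopes the variable to this one external command
    if BORG_PASSPHRASE=$wbp_pass "$@"; then wbp_status=0; else wbp_status=$?; fi
    unset wbp_pass
    return "$wbp_status"
}
```

Running with_borg_passphrase borg info :: prompts once and leaves nothing behind in the environment afterwards, at the cost of re-entering the passphrase for every command.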
With these variables set (in addition to BORG_REMOTE_PATH as covered earlier) and passwordless SSH ready, we can now start interacting with the repo easily.
After setting these variables, try running the following command to see if your configuration is correct:
borg info ::
If you see some stats, and don’t get any error messages or prompts asking for passwords, you’re ready to start using your repo.
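If the command does prompt or fail, a quick check of which variables are actually set can narrow things down. The helper below is a hypothetical sketch that reports the unset ones without printing any values; BORG_REMOTE_PATH is only relevant if your provider needs it, as rsync.net does.

```shell
# Report any of the Borg environment variables from this guide that are unset.
# Returns 0 if all are set, 1 otherwise. Values are never printed.
check_borg_env() {
    cbe_missing=0
    for cbe_name in BORG_REPO BORG_REMOTE_PATH BORG_PASSPHRASE; do
        # Indirect lookup of the variable named in $cbe_name
        eval "cbe_value=\${$cbe_name:-}"
        if [ -z "$cbe_value" ]; then
            echo "missing: $cbe_name" >&2
            cbe_missing=1
        fi
    done
    unset cbe_value
    return "$cbe_missing"
}
```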
Creating a Backup
The following command shows an example of creating a backup:
borg -p create -s --list -x -c 900 -C zstd ::my_first_backup path/to/my/files/
- -p - print progress information
- -s - print statistics after creating the backup
- --list - print each file and directory as it is processed
- -x - do not cross filesystem boundaries
- -C zstd - this sets the compression algorithm to use - in this case, zstandard, a modern algorithm that is fast without compromising on compression ratio. See the full list of options with borg help compression.
- -c 900 - this sets the checkpoint interval for the backup process to 900 seconds (15 minutes), down from the default of 1800 seconds in Borg 1.1. In the event that the connection between the client and server is lost while a backup is in progress, only data uploaded since the last checkpoint will need to be repeated. Depending on the quality of your connection, you may wish to leave this as the default, or decrease it further - though note that writing checkpoints will slow down the backup process, so you will need to find a suitable balance that works for you.
- ::my_first_backup - this provides the repo location and backup name to create. As we have already set the BORG_REPO environment variable, we can skip providing the location and only provide the name of the backup. Borg will store the creation time along with this name, so pick a descriptive name that describes what you’ve included (e.g. the device name and the type of content you’re backing up).
- path/to/my/files/ - the final positional arguments to borg create provide the paths of the directories you want to back up.
Run this command, and your backup will begin. Depending on the arguments provided, Borg should show you some output detailing either which files have been processed, and/or how much data has been processed, encrypted, compressed, and transferred.
Depending on your use case, future backups of the same data should be much faster. Borg will break the input files into chunks, and check for the presence of those chunks in the remote repo. If a chunk of data already exists on the remote, then Borg will simply write to the index without needing to upload the data again.
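To make the chunk-and-check idea concrete, here is a toy version of it in shell. It is not how Borg is implemented: it cuts files into fixed 4096-byte chunks (Borg uses content-defined chunk boundaries), and it uses cksum as a stand-in for a proper cryptographic chunk ID. The function name and the index file format are made up for illustration.

```shell
# Toy deduplication: count the chunks whose checksum is not yet in the index.
# Arguments: <file to back up> <index file of known chunk checksums>
# Prints the number of chunks that would need uploading.
count_new_chunks() {
    cnc_file=$1
    cnc_index=$2
    cnc_tmp=$(mktemp -d) || return 1
    touch "$cnc_index"
    # Fixed-size chunks keep the sketch short
    split -b 4096 "$cnc_file" "$cnc_tmp/chunk."
    cnc_new=0
    for cnc_chunk in "$cnc_tmp"/chunk.*; do
        [ -e "$cnc_chunk" ] || continue   # empty input produces no chunks
        cnc_sum=$(cksum < "$cnc_chunk")
        # Only chunks missing from the index count as new; known ones are skipped
        if ! grep -qxF -- "$cnc_sum" "$cnc_index"; then
            printf '%s\n' "$cnc_sum" >> "$cnc_index"
            cnc_new=$((cnc_new + 1))
        fi
    done
    rm -rf "$cnc_tmp"
    echo "$cnc_new"
}
```

Running it twice on the same file reports zero new chunks the second time, which is the effect that makes repeat backups cheap.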
Interacting with the Repo
Once you’ve made a backup, you’ll want to get familiar with how to interact with backups.
First off, you can check the current overall stats of your repo using borg info:
borg info ::
This will print a summary of your repo, including the size on disk on the remote server after both compression and deduplication.
You can see the entire list of backups currently stored in the repo using borg list (I like to date my backups, but you can see this isn’t necessary):
# Note I've truncated the output here for brevity!
$ borg list ::
20181106_first Sat, 2018-11-17 09:36:53 [4463570c6e0263671ab94b...]
20181120_camera Tue, 2018-11-20 22:18:15 [b620f64f2308bb0813c0bb...]
[...]
This will print a list of all backups, along with the date and time that they were initiated.
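If you like dated names such as the ones in the listing above, a small helper can build them for you. This is just a sketch with a hypothetical function name; Borg also documents its own placeholders such as {now} under borg help placeholders, which may suit you better.

```shell
# Build an archive name of the form YYYYMMDD_<label>, e.g. 20181120_camera.
# Arguments: <label>
dated_archive_name() {
    printf '%s_%s\n' "$(date +%Y%m%d)" "$1"
}
```

It slots into the create command in place of a fixed name, for example borg create -s "::$(dated_archive_name camera)" path/to/my/files/.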
You can then use borg info and borg list to view more in-depth information about individual backups:
# Show metadata of a specific backup
$ borg info ::20181106_first
Archive name: 20181106_first
Archive fingerprint: 4463570c6e0263671ab94b...
Comment:
Hostname: GEORGEPC
Username: george
Time (start): Sat, 2018-11-17 09:36:53
Time (end): Sat, 2018-11-17 15:18:16
Duration: 5 hours 41 minutes 23.04 seconds
Number of files: 25734
Command line: /usr/bin/borg -p create -s -x -c 900 -C zstd ::20181106_first files/
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              131.22 GB            128.85 GB              1.04 MB
All archives:                3.70 TB              3.58 TB            206.33 GB
                       Unique chunks         Total chunks
Chunk index:                  120082              2065567
# List (a snippet of) the files inside:
$ borg list ::20181106_first | head -n 5
drwxrwxrwx george george 0 Tue, 2018-11-06 20:20:32 files
drwxrwxrwx george george 0 Tue, 2018-10-23 22:06:34 files/cam
-rwxrwxrwx george george 294295 Tue, 2016-05-10 12:57:48 files/cam/bath/893.jpg
-rwxrwxrwx george george 298811 Tue, 2016-05-10 12:57:48 files/cam/bath/894.jpg
-rwxrwxrwx george george 307632 Tue, 2016-05-10 12:57:48 files/cam/bath/895.jpg
The former command will display the summary of disk usage and other metadata for the given backup, while the latter will list every file contained within that backup, along with its permissions, owner, size, and modification time.
If you want to restore a file (or an entire backup), then simply use borg extract to restore files into the current directory from a specified backup name:
borg extract --list ::my_first_backup
# Optionally, selecting only certain paths from the backup
borg extract --list ::my_first_backup path/to/restore
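Because borg extract writes into the current directory, it is easy to restore on top of live files by accident. A minimal sketch of a wrapper that restores into a dedicated directory instead, assuming BORG_REPO is set as above (the function name is hypothetical):

```shell
# Extract an archive (or selected paths from it) into a dedicated directory.
# Arguments: <target directory> <archive name> [paths to restore...]
restore_into() {
    ri_target=$1
    ri_archive=$2
    shift 2
    mkdir -p "$ri_target" || return 1
    # The subshell keeps the caller's working directory unchanged
    ( cd "$ri_target" && borg extract --list "::$ri_archive" "$@" )
}
```

For example, restore_into ~/restored my_first_backup path/to/restore leaves the restored tree under ~/restored, ready to be compared with the originals before anything is overwritten.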
Dealing with Disconnections
If you’re stuck with a connection like mine, then running a backup with a large amount of new data is going to take a long time. If your connection drops during a backup, the data uploaded up to the point of the last checkpoint will be saved, but anything after that point will not have been backed up.
Incomplete backups will result in your repo having backups listed with a .checkpoint suffix on the name[2]:
$ borg list ::
[...]
20200309_filming.checkpoint Mon, 2020-03-09 00:02:12 [36de8c34ebdf279805d8d7...]
20200309_filming.checkpoint.1 Mon, 2020-03-09 23:26:21 [b215629278ca5c29702c88...]
20200309_filming.checkpoint.2 Tue, 2020-03-10 07:18:38 [bfb1b5a18e1f3ddf5bbc29...]
20200309_filming.checkpoint.3 Tue, 2020-03-10 19:19:02 [ee149f7c4ff8c48da70092...]
20200309_filming.checkpoint.4 Wed, 2020-03-11 00:04:29 [66b9da54bf5700b42e3acf...]
[...]
These checkpoints are normal backups. You can interact with them as if they were a complete backup, e.g. to restore files if you’re in a tight spot. The only difference is that the .checkpoint suffix signifies that the backup ended before the entire set of source directories was written to it.
One of the beautiful results of Borg’s design is that there is no special command required to resume a backup. Simply run the same command again, and Borg will restart the process - but by virtue of the deduplication support, any files that were saved in the checkpoint will not need to be reuploaded[3].
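Since resuming is just a matter of re-running the command, the re-running can be left to a loop. The sketch below is one hypothetical way to do that. It treats exit codes 0 and 1 as finished, on the basis that Borg documents 1 as "completed with warnings", and retries anything higher.

```shell
# Re-run a command until it finishes, up to a maximum number of attempts.
# Arguments: <max attempts> <seconds between attempts> <command...>
retry_backup() {
    rb_max=$1
    rb_delay=$2
    shift 2
    rb_attempt=1
    while :; do
        if "$@"; then rb_status=0; else rb_status=$?; fi
        # Stop on success, on warnings, or once the attempts are used up
        if [ "$rb_status" -le 1 ] || [ "$rb_attempt" -ge "$rb_max" ]; then
            return "$rb_status"
        fi
        echo "attempt $rb_attempt failed with status $rb_status, retrying" >&2
        rb_attempt=$((rb_attempt + 1))
        sleep "$rb_delay"
    done
}
```

Something like retry_backup 10 60 borg create -s -c 900 -C zstd ::my_first_backup path/to/my/files/ would then keep trying through short outages, picking up from the last checkpoint each time.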
Once your backup has successfully completed, you should see it in the borg list output with the expected name alongside the checkpoints.
The checkpoints will use a negligible amount of disk space, thanks to the deduplication. If you want to clean up the repository, you can delete them explicitly with borg delete:
$ borg delete -s ::20200309_filming.checkpoint.1
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
Deleted data:             -131.45 GB           -128.07 GB           -210.57 kB
------------------------------------------------------------------------------
All archives:                3.43 TB              3.33 TB            206.33 GB
                       Unique chunks         Total chunks
Chunk index:                  119971              1929571
Note the negligible Deduplicated size - this shows that the checkpoint we deleted was consuming only a couple of hundred kilobytes of its own accord.
Alternatively, the borg prune command can be used to automatically clean up both checkpoints and backups based on age.
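As a sketch of what a prune invocation might look like, assuming BORG_REPO is set as above: the retention counts below are illustrative placeholders rather than recommendations, and the wrapper name is hypothetical.

```shell
# Apply an age-based retention policy to the repo in $BORG_REPO.
# Extra arguments are passed through, e.g. --dry-run to preview the result.
prune_repo() {
    borg prune --list --keep-daily 7 --keep-weekly 4 --keep-monthly 6 "$@" ::
}
```

Running prune_repo --dry-run first shows which backups would be kept and which removed, without deleting anything. Note that on newer Borg releases (1.2 onwards), disk space is only freed once borg compact has been run as well.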
Wrap-Up
Hopefully you found that intro to Borg useful. I’ve found it to be very easy to use, and very helpful for minimising storage and bandwidth usage.
If you want to make use of Borg yourself, I can highly recommend rsync.net for cheap and reliable hosting - but you could use any SSH server with a decent amount of storage space.
If you do decide you want to use Borg for backups, I’d also recommend looking into borgmatic. This is a wrapper for Borg that allows you to configure your Borg usage in a YAML configuration file, which is ideal if you want to automate your backups with a cron job (or systemd timer). Using borgmatic, you won’t need to remember every borg command invocation each time you want to refresh your backups!
[1] One of the best features of rsync.net’s borg offering is that the account is just one of their normal fileservers. There’s no obligation to use Borg, so you could also use it to upload one-off copies of files or archives with scp, sync directory structures with rsync, back up your code with git, mount it as a filesystem with sshfs, browse it with your favourite SFTP GUI… Alternatively, check out BorgBase for a much more guided approach to Borg hosting.
[2] This is only about half of the checkpoints I ended up with, please send fibre!
[3] There is a slight drawback to this approach - if you’re running on a particularly large source directory, it can take some time for Borg to walk the directory tree and verify that all of the files are already in the remote repository. If this becomes a problem, I’d suggest breaking down your backups into smaller chunks.