July 16, 2013

backup daemon for OSX (and maybe other Unixes) – updated


Hi2all,

project of this month was a backup daemon. It's just a console application, no GUI, and meant to run whenever my Mac is on. And it is always on. ;-)

Whenever it comes to backups i have a bad feeling: what if i lose everything stored at home? Maybe by fire (water is rather unlikely here) or by burglary? A thief won't stop at my Mac but will also take the external hard drive and probably all memory sticks as well. And if i hide them very well then it's likely that accessing them is so awkward that i only update them every now and then, and when i need the backup it's probably several months old.

In c't, the most valued computer magazine around here, there was an article about how to solve this with BoxCrypter and BitTorrent Sync: BoxCrypter to encrypt your files and BT Sync to upload them to a friend, who does the same vice versa.

I tried it with a friend and found this:

If you want to encrypt more than one directory (with its sub directories) or if you want to encrypt file names as well, you must buy BoxCrypter. Then it provides you with one or more virtual drives which you can use like normal drives, and stores the encrypted data in a directory bundle, which you can backup with BT Sync. But these virtual drives are no full replacements for standard drives, OSX complains here or there about unsupported features. And i don't know what happens if you put your user directory into a BoxCrypter volume...

BitTorrent Sync seems to work reliably but it is slow. For unknown reasons it only transmits in bursts and thus uses only a fraction of the already not-so-high upload bandwidth.

After looking at this for two weeks i decided to actually write my own backup daemon.

Kio's backup daemon


The program can be found here:  kio's backup daemon.
A sample config file can be found at  ~/.backup_daemon/config.txt.
If you give it a try please report your experience.

It is now feature-complete and most bugs are eliminated so i think i can offer it for testing to others. If you feel awkward when you think of your backup and the 'worst case', then it may well be worth a try.

The current features are:
  • console application, no GUI. You can make it auto-start whenever you login and hide it in the dock.
  • Backup: synchronize a remote directory with your local directory in regular intervals.
  • File transfer: maintain 'push' and 'pull' folders for automatic transfer of files between you and your friend(s).
  • Encrypted connection, backups are stored encrypted, file and directory names are encrypted as well.
  • Upload speed limiter.
  • Daily snapshots of backups, except if hard links are not supported. (mostly NAS)
  • Include and exclude files or subdirectories from uploading.
  • Upload to a peer over the internet
  • Synchronize a backup folder locally, e.g. a DropBox folder, NAS or external drive.
You need (except for pure local usage):
  • a domain name. Use any dynamic name service you like.
  • an open port, forwarded by your router, if any.
  • a friend
Does not work with:
  • WebDAV shares.
There will be sporadic updates in the next months.

Behavior of sync type "backup"


In this mode the daemon tries to synchronize a local directory with a remote directory. Files which vanished remotely are deleted locally too, modified files or new files are downloaded. Once per day a snapshot is created which contains hard-linked files to other snapshots and the current backup, therefore using only very little additional space on the disk. Files which vanished are moved to the last snapshot if no previous copy exists.
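
The hard-link idea behind the daily snapshots can be sketched in a few lines of Python. This is only an illustration of the technique, not the daemon's actual code: the function name make_snapshot and the snapshot folder name are made up for this example, and the sketch assumes the snapshot directory lies outside the directory being walked.

```python
import os
import tempfile


def make_snapshot(current_dir: str, snapshot_dir: str) -> int:
    """Mirror current_dir into snapshot_dir with hard links; return the file count.

    Hypothetical helper: it assumes snapshot_dir is NOT inside current_dir.
    """
    linked = 0
    for root, _dirs, files in os.walk(current_dir):
        rel = os.path.relpath(root, current_dir)
        target_root = os.path.normpath(os.path.join(snapshot_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            # os.link adds a second directory entry for the same file data,
            # so the snapshot needs almost no additional disk space.
            os.link(os.path.join(root, name), os.path.join(target_root, name))
            linked += 1
    return linked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        current = os.path.join(base, "current_backup")
        os.makedirs(os.path.join(current, "docs"))
        with open(os.path.join(current, "docs", "a.txt"), "w") as f:
            f.write("hello")
        snapshot = os.path.join(base, "snapshot-example")  # made-up name
        print("files linked:", make_snapshot(current, snapshot))
```

On a file system without hard links os.link raises an OSError, which matches the limitation described for NAS volumes.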

This works well with local disks, but not so well with Network Attached Storage 'NAS', because within the last ten years Apple did not find the time to add a 'hard link' command to the Apple Filing Protocol 'AFP'. So basically, this does not work with NAS. In this case the backup daemon still creates snapshot directories, but they will only contain the deleted files, just in case you need to restore them.

You can circumvent this problem in the same way as Time Machine does: With Apple's Disk Utility create a sparse image on the NAS and mount this image locally. Determine the mount point and use this path for your backup destination.

Currently 'backup' folders are polled every 12 hours.

Behavior of sync type "push" / "pull"


In this mode the daemon downloads all new files it finds in the remote folder. Once the download is complete, the file is added to a done list which is stored in "~/.backup_daemon/NAME.done".
So you can remove the file from the receiving folder without triggering a second download of this file.
Files which vanish remotely are not deleted locally, so the sender can remove his files at some point as well.
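
How such a done list can prevent a second download may be sketched like this. The file format of NAME.done is not documented here, so the sketch simply assumes one file name per line; the function names are invented for the example.

```python
import os
import tempfile


def load_done(path: str) -> set:
    """Read the done list; assumed format: one file name per line."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def files_to_download(remote_names, done) -> list:
    """Only names which are not yet in the done list are fetched."""
    return [name for name in remote_names if name not in done]


def mark_done(path: str, name: str) -> None:
    """Append a completed download to the done list."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(name + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        done_file = os.path.join(base, "NAME.done")
        mark_done(done_file, "holiday.zip")
        remote = ["holiday.zip", "song.mp3"]
        print(files_to_download(remote, load_done(done_file)))
```

Because the decision is based on the done list and not on the contents of the receiving folder, a file which was moved away or deleted locally is not fetched again.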

Currently 'pull' folders are polled every 30 minutes.

Command line arguments


While the daemon can be started just 'as is', it may be started with one of two command arguments as well:

's' (single character 's'): Only the server is started.
'c' (single character 'c'): Only the clients are started.

This can be used to start 2 instances which can be debugged or stopped independently.

Settings in ~/.backup_daemon/config.txt


The backup daemon needs a config file, which must be stored at '~/.backup_daemon/config.txt'. (note: the location has changed.) The tilde '~' indicates your home directory.

All changes to the config file only take effect when the backup daemon starts. So if you update something here you must stop and restart the daemon. For an example also see ~/.backup_daemon/config.txt.

verbose : NUMBER
Defines how much log output will be produced.
Possible values are: 0 (nearly none) to 4 (each file transferred produces a log line)

upload_speed : NUMBER
Defines a speed limit for sent data. Setting this to 0 disables the limit.
The speed is set in bits per second and may be followed by a unit 'k' or 'M'.
For a typical ADSL connection with 1500 kbit/sec upload speed this might be set to 1000k.
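
To get a feeling for these values, here is a small Python sketch which converts such a setting into bits per second and estimates how long an upload takes. It is not taken from the daemon: it assumes that 'k' means 1000 and 'M' means 1000000, and both function names are made up.

```python
def parse_speed(value: str) -> int:
    """Convert a value like '1000k' or '2M' into bits per second.

    Assumption: 'k' = 1000 and 'M' = 1000000 (decimal units).
    """
    text = value.strip()
    multipliers = {"k": 1_000, "M": 1_000_000}
    factor = 1
    if text and text[-1] in multipliers:
        factor = multipliers[text[-1]]
        text = text[:-1]
    return int(text) * factor


def seconds_for(num_bytes: int, bits_per_second: int) -> float:
    """Minimum time to send num_bytes at the given limit; 0 means no limit."""
    if bits_per_second == 0:
        return 0.0
    return num_bytes * 8 / bits_per_second


if __name__ == "__main__":
    limit = parse_speed("1000k")
    print(limit, "bit/s")
    print(seconds_for(100 * 1_000_000, limit), "seconds for 100 MB")
```

With the suggested 1000k a 100 MB upload takes at least 800 seconds, which shows why the first full backup needs some patience.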

num_clients : NUMBER
Defines how many client workers are created. 'clients' handle the receive side of a connection. 4 clients are recommended.

num_servers : NUMBER
Defines how many servers are started at most. A server handles the send side of a connection. They are created whenever a client connects to this daemon. If more than N clients try to connect at the same time, only N connections are granted and the others are rejected and will retry later. There should be at least 10 servers allowed.

self : n=MY_NAME  : h=MY_ADDRESS  : p=MY_SERVER_PORT : secret="MY_LOGIN_SECRET"
Defines some settings concerning your own computer.
MY_NAME is your nickname. Any client connecting to you will check this name.
MY_ADDRESS is your static IP address or your static or dynamic server name. Remote clients must know this address to connect to your server.
MY_SERVER_PORT is the socket port which your server uses for incoming connections. Remote clients must know this port address and it must be forwarded in your router, if there is one between your computer and the internet line.
MY_LOGIN_SECRET this is the password which is exchanged and tested with a challenge - response test. Obviously any client must know this too.
The 'self' settings are used to create a 'peer' entry as well, so you can connect to yourself to make backups to a local disk.
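
For readers who don't know what a challenge-response test is, here is a generic sketch in Python. The scheme which the daemon actually uses is not described in this post; HMAC-SHA256 is chosen here purely for illustration and all function names are hypothetical. The point is that the secret itself never travels over the line.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: create a fresh random challenge for each login."""
    return secrets.token_bytes(16)


def answer(challenge: bytes, secret: str) -> bytes:
    """Client side: prove knowledge of the secret without sending it.

    Illustrative choice: HMAC-SHA256 keyed with the shared secret.
    """
    return hmac.new(secret.encode("utf-8"), challenge, hashlib.sha256).digest()


def verify(challenge: bytes, response: bytes, secret: str) -> bool:
    """Server side: compute the expected answer and compare in constant time."""
    return hmac.compare_digest(answer(challenge, secret), response)


if __name__ == "__main__":
    challenge = make_challenge()
    print(verify(challenge, answer(challenge, "MY_LOGIN_SECRET"), "MY_LOGIN_SECRET"))
    print(verify(challenge, answer(challenge, "a wrong guess"), "MY_LOGIN_SECRET"))
```

Since the challenge is random for every connection, a recorded answer cannot simply be replayed later.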

peer : n=MY_NAME  : h=MY_ADDRESS  : p=MY_SERVER_PORT  : secret="MY_LOGIN_SECRET"
Define the same settings for any peer which is allowed to connect to your server. 

peer : *
Defines that any peer is allowed to connect to your server. But they will only have access to folders which are marked with 'p=*'.

push: n=MY_BACKUP : t=BACKUP : p=HIS_NAME : d="/MY/ROOT/DIR" : s="MY_FOLDER_PASSWORD" : x="EXCLUDED" : i="INCLUDED"
Define a folder, which is exported by your server.
MY_BACKUP is the nickname for this export. This must match a corresponding 'pull' entry in the remote clients' settings.
BACKUP is 'backup' for backup-style exports (directories which you want to backup to your friend's HD) or 'push' for a push folder which transmits anything put in here to your friend(s).
HIS_NAME nickname of your friend. This entry can appear multiple times: once for each friend which shall have access to this folder. If you add a p-setting with value '*' (one character) then this folder is exported to anybody who can connect to your server. For a distribution folder which shall be exported to anybody who knows about your server you must add a peer '*', see above.
MY/ROOT/DIR is the base path to your exported directory.
MY_FOLDER_PASSWORD is a password which is used for encryption. All files and file names are encrypted with this password. If you use this in a 'push' folder then all friends should use this password for decryption in their 'pull' jobs, or they'll only get the encrypted files which are probably useless. If you use this for a backup folder, then don't tell your friend your password. Then your files will be stored encrypted on your friend's HD. 
EXCLUDED contains a partial path of files which shall be excluded from the exported directory listing. Initially all files are included. Any file whose path starts with this string is excluded, except if the string starts with '*': then it indicates a file type which will be excluded. Any number of excluded file paths may be defined for an exported folder.
e.g.:
  • x="" excludes all files
  • x="." excludes all hidden files in the root level of this push folder.
  • x="aa" excludes all files starting with "aa"
  • x="a/bb" excludes all files starting with "bb" in folder "a"
  • x="*.txt" excludes all files in all folders which end on ".txt"
INCLUDED contains a partial path of files which shall be included in the exported directory listing. Initially all files are included. Any file whose path starts with this string is included. Any number of included file paths may be defined for an exported folder.
If entries for both included and excluded files match then the longer and more specific entry wins.
e.g.
  • x="" i="Documents" excludes everything except (probably) folder "Documents"
  • i="photos" x="photos/private" exports all 'photos' except (probably) folder 'photos/private'
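
The include and exclude rules above can be written down as a small Python function. This is a sketch of the rules as described here, not the daemon's real code: the name is_exported is made up, paths are assumed to be relative to the root directory, and for two matching entries of equal length (a case not covered above) the sketch lets the exclusion win.

```python
def is_exported(path: str, excluded=(), included=()) -> bool:
    """Decide whether a path (relative to the root dir) is exported."""
    longest_exclude = -1
    for pattern in excluded:
        if pattern.startswith("*"):
            # '*' marks a file type: compare the end of the path.
            matches = path.endswith(pattern[1:])
        else:
            matches = path.startswith(pattern)
        if matches:
            longest_exclude = max(longest_exclude, len(pattern))

    longest_include = -1
    for prefix in included:
        if path.startswith(prefix):
            longest_include = max(longest_include, len(prefix))

    if longest_exclude < 0:
        return True  # initially all files are included
    # The longer, more specific entry wins; a tie is treated as excluded
    # (assumption, the tie case is not documented).
    return longest_include > longest_exclude


if __name__ == "__main__":
    print(is_exported("Documents/letter.txt", excluded=[""], included=["Documents"]))
    print(is_exported("Music/song.mp3", excluded=[""], included=["Documents"]))
    print(is_exported("photos/private/x.jpg",
                      excluded=["photos/private"], included=["photos"]))
```

The three calls in the demo correspond to the two examples above: "Documents" survives x="", everything else does not, and "photos/private" stays at home.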
pull: n=MY_BACKUP : t=BACKUP : p=HIS_NAME : d="/MY/ROOT/DIR" : s="MY_FOLDER_PASSWORD" : x="EXCLUDED" : i="INCLUDED"
This is essentially the same as for 'push' just for the client. If the server exports an encrypted directory and the client has no encryption password set on the pull job, then the directory tree is stored encrypted on his disk. The client may also filter what to download with 'included' and 'excluded' partial paths though normally this won't be used. It is only useful if you want to retrieve only a few files from your backup.

Restore from a backup


What to do when the worst case happens and you need your data back from your friend?

  1. Stop your backup daemon and ask your friend to stop his backup daemon.
  2. Ask your friend to replace the backup pull job by a push job with identical settings, except the path must be extended with "/current_backup".
  3. Replace your backup push job by a pull job with identical settings, except you might want to change the destination path.
  4. Start your backup daemon and ask your friend to do the same.
Modified scenario: You want to restore from a local disk, e.g. a NAS. Then "you" and "your friend" are the same person. ;-)

Modified scenario: You want to restore only certain files. Then add an appropriate 'exclude' and 'include' entry to the pull job.

Modified scenario: Your friend brings your backup on an external drive: Add a push job for the external drive to your config.txt as well and remove the line for the upload limit.


Hints



  • Add the backup daemon as a start item to your login profile.
  • You can test that restoring your backup will work if your friend also adds a push job for the backup folder and you add a pull job to a temp directory, possibly with an include/exclude filter.

