Make an incremental backup

This page introduces how to back up your data with the rsync-time-backup script. It is a good idea to back up your valuable data. A backup should include snapshots, so that you can recover a file that you accidentally deleted, say, three weeks ago. It should also be incremental, so that unchanged files do not take up extra space in every snapshot.

How-to

Setting up the SSH key agent

If you are backing up from or to a remote machine, you will need to log in multiple times. It’s better to set up the SSH key agent before you start backing up.
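A minimal sketch of that setup, assuming OpenSSH and a key pair whose public half is already authorized on the remote machine. The function name and the default key path (~/.ssh/id_ed25519) are illustrative assumptions, not part of the backup script.

```shell
# Start an SSH agent for this shell (if none is known) and load one key.
# The default key path is an assumption; pass your own as the first argument.
start_agent_with_key() {
    key="${1:-$HOME/.ssh/id_ed25519}"
    # Only start a new agent when this shell has no agent socket yet.
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        eval "$(ssh-agent -s)" >/dev/null
    fi
    # Asks for the key's passphrase once; later ssh/rsync calls reuse it.
    ssh-add "$key"
}
```

After calling start_agent_with_key, running ssh-add -l should list the loaded key, and logging in to the remote machine should no longer ask for the passphrase in that shell.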

Install the script

The backup program is a shell script. You should be able to install it in your user directory on any of the HPC servers. You can get it from the GitHub repo or just download the script file itself.

To install it, put it into some directory like [shell]~/.local/bin/[/shell] and make sure the directory is in your [raw]PATH[/raw] environment variable.
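The install step can be sketched as two small helpers, assuming you have already downloaded the script somewhere. The function names and the default directory are illustrative choices, not something the script provides.

```shell
# Copy a downloaded script into a personal bin directory and make it executable.
# Both arguments are illustrative; adjust them to where you saved the script.
install_backup_script() {
    src="$1"
    bin_dir="${2:-$HOME/.local/bin}"
    mkdir -p "$bin_dir"
    cp "$src" "$bin_dir/rsync_tmbackup.sh"
    chmod +x "$bin_dir/rsync_tmbackup.sh"
}

# Return success if the given directory is listed in PATH.
in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, run install_backup_script ./rsync_tmbackup.sh and then in_path "$HOME/.local/bin". If the second command fails, adding export PATH="$HOME/.local/bin:$PATH" to your shell startup file (for example ~/.bashrc, assuming bash) is the usual fix.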

Do the backup

Once the script is installed, you can make backups like this.
[raw] rsync_tmbackup.sh backup_target backup_location
[/raw] The backup target and location can both be local directories like [raw]/home/yunqi/work[/raw] or remote ones like [raw]yunqi@brosnan:~/work_backup[/raw].

The first time you run the backup, you will get something like
[raw] Safety check failed - the destination does not appear to be a backup folder or drive (marker file not found).
[/raw] This is expected. Just run the command the script prints, which creates the backup folder, and then run the backup script again.
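For a local destination, that command amounts to creating the folder and putting an empty marker file in it. Here is a sketch of the same steps, assuming the marker is named backup.marker as in the folder listing in this post. The function name is hypothetical, and the exact command printed by the script should be preferred.

```shell
# Prepare a local directory as a backup destination by adding the marker file.
# Assumes the marker file is called backup.marker.
init_backup_dest() {
    dest="$1"
    mkdir -p -- "$dest"
    touch -- "$dest/backup.marker"
}
```

For example, init_backup_dest ~/work_backup prepares a local folder. For a remote destination the same two steps have to run on the remote machine.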

What you’ll get

If you look into your backup folder, you’ll probably see something like this:

[shell] total 8
drwxr-xr-x 16 yunqi teoroo 4096 Oct 20 10:16 2018-11-09-152638
drwxr-xr-x 16 yunqi teoroo 4096 Oct 20 10:16 2018-11-09-161817
-rw-r--r-- 1 yunqi teoroo 0 Nov 9 15:26 backup.marker
lrwxrwxrwx 1 yunqi teoroo 17 Nov 9 16:18 latest -> 2018-11-09-161817
[/shell]

A sub-folder is created every time you back up; each one is a snapshot of your target. You can delete any snapshot without affecting the others.
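Because every snapshot is an ordinary folder, finding an accidentally deleted file is just a matter of looking through them. A small sketch for a backup folder on the local machine follows; the function name and the file path in the usage note are made up for illustration.

```shell
# Print every snapshot in a local backup folder that contains a given file.
# Arguments: backup folder, path of the file relative to the backup target.
snapshots_with() {
    backup_dir="$1"
    rel_path="$2"
    for snap in "$backup_dir"/*/; do
        snap="${snap%/}"
        # Skip the "latest" symlink so the newest snapshot is not listed twice.
        [ -L "$snap" ] && continue
        [ -e "$snap/$rel_path" ] && printf '%s\n' "${snap##*/}"
    done
    return 0
}
```

For example, snapshots_with ~/work_backup notes/todo.txt lists the snapshots that still hold that file, and you can then copy it back from whichever snapshot you want with cp.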

Tip(s)

You may want to back up regularly, so it’s probably a good idea to add an alias like this.

[shell] # Backup my work directory to Teoroo2
alias back_work='rsync_tmbackup.sh ~/work/ yunqi@teoroo2.kemi.uu.se:/home/yunqi/backups/work_at_rackham'
[/shell]

The mechanism

Hard link

rsync-time-backup uses hard links for files that are unchanged between snapshots. It is therefore safe to delete one snapshot; the others are not affected. A file stays on the disk until every snapshot that includes it has been deleted.

https://en.wikipedia.org/wiki/Hard_link
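The behaviour is easy to try out in a scratch directory. This sketch uses plain ln to mimic two snapshots sharing one file; it is not the script's own code, and the folder and file names are made up.

```shell
# Demonstrate hard-link behaviour in a scratch directory (no backup involved).
demo=$(mktemp -d)
mkdir "$demo/snap1" "$demo/snap2"
echo "trajectory data" > "$demo/snap1/traj.dat"

# A second name for the same data: no extra copy is stored.
ln "$demo/snap1/traj.dat" "$demo/snap2/traj.dat"

# List inode numbers: both names show the same inode.
ls -i "$demo/snap1/traj.dat" "$demo/snap2/traj.dat"

# Deleting one "snapshot" leaves the other name, and the data, intact.
rm -r "$demo/snap1"
cat "$demo/snap2/traj.dat"
```

The data is only freed once the last name pointing at it is gone, for example after rm -r "$demo".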

Drawbacks

Large files

The backup script is currently not smart enough to notice that you have moved a file, so it stores another copy of the same file whenever you move files or just rename your folders. Keep an eye on your disk usage if you are constantly moving your big files (MD trajectories, charge density files) around.
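One way to spot such duplicates is to look for big files in a snapshot that are not hard-linked to any other snapshot. A sketch follows, to be run on the machine that holds the backup; the function name and the 100 MiB default are arbitrary choices. Note that a link count of 1 also matches files that are genuinely new or changed, so treat the output as a list of candidates.

```shell
# List files in one snapshot that are larger than a size threshold (in bytes)
# and have a link count of 1, i.e. are not shared with any other snapshot.
unshared_large_files() {
    snapshot="$1"
    min_bytes="${2:-104857600}"   # default: 100 MiB
    find "$snapshot" -type f -links 1 -size +"${min_bytes}c"
}
```

For example, unshared_large_files ~/work_backup/2018-11-09-161817 prints the large files that only this snapshot holds.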

Remember to specify your username

The backup script is (again) currently not smart enough to know that you can set your user name in your SSH config file. For the script to understand a remote address, you cannot omit your user name.
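If you wrap the script in your own helper, a quick check can catch a forgotten user name before the backup starts. This is only a heuristic sketch based on how addresses look in this post (user@host:path for remote, a plain path for local); it is not how the script itself parses addresses, and the function name is made up.

```shell
# Classify a backup address. Returns 0 for a local path or user@host:path,
# and 1 for a remote-looking address that lacks the "user@" part.
address_ok() {
    case "$1" in
        *@*:*) return 0 ;;   # remote with explicit user name
        /*)    return 0 ;;   # absolute local path (may contain a colon)
        *:*)   return 1 ;;   # looks remote, but no user name
        *)     return 0 ;;   # plain local path
    esac
}
```

For example, address_ok "yunqi@brosnan:~/work_backup" succeeds, while address_ok "brosnan:~/work_backup" fails.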
