The more I centralized the file storage on my home network, the more vulnerable I became to failures or data corruption on the servers. I wound up building a pretty nice backup solution over time to protect the data I really care about. My experience with the Locky ransomware motivated me to ensure my backups weren’t vulnerable to that kind of malware.
Most of the important data on my home network sits on a single big file server. The server holds all my pictures, music, video, documents, email archives, and databases. Windows machines on the network have their Documents library mapped to a folder on the network drive. Linux machines and Macs on the network can access the shared folders just as easily.
This setup is super convenient, especially when it inevitably comes time to wipe one of the Windows machines and reinstall from scratch to clear out all of the accumulated junk. I don’t have to worry about migrating the data. When it’s time to upgrade hardware, I just remap the Documents library on the new computer to the server, and everything’s right there.
The other big benefit is that it gives me just one place where I need to focus on a beefy storage setup, and just one place where I have all the data I care about protecting.
The primary storage on the server is a RAID-5 array of six 1TB drives, giving me about 5TB of usable storage. I don’t consider the RAID-5 a substitute for backups; it just lets me survive a single drive failure without downtime. It’s still vulnerable to simultaneous failures of two or more drives, failure of the RAID controller, software failures that corrupt the data, and stupid human tricks like rm -rf *.
My Backup System
My nightly backup script uses rdiff-backup and saves its results to a separate 2TB drive on the same server, but connected to a different drive controller. I don’t bother backing up most videos from my DVR. Those take up the majority of the space, which is why a 2TB backup drive is sufficient.
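The post describes the script rather than showing it, but a nightly pass along these lines might look like the sketch below. The paths and the DVR exclude are placeholders standing in for the real layout, not the actual script.

```bash
#!/bin/bash
# Hypothetical sketch of a nightly rdiff-backup pass; /data, /backup/data,
# and the DVR exclude are placeholder names, not the real locations.
SRC=/data
DEST=/backup/data

# Mirror the latest versions and keep reverse diffs for older versions,
# skipping the DVR recordings that would overwhelm the 2TB backup drive.
rdiff-backup --exclude "$SRC/dvr" "$SRC" "$DEST"
```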
The rdiff-backup software is a fantastic solution as long as you can do everything on a Unix-like system. It appears there is some degree of Windows support, but it is much more recent and less well tested. The big advantage of rdiff-backup in a setup like mine is the ability to restore any older version of any file, including files that were later deleted. The latest version of each file is just a copy of the original, and older versions are stored as minimal diffs. I’ve found it to be very space efficient.
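For instance, pulling back an old or deleted version is a one-line operation. The paths and the ten-day window here are only illustrative:

```bash
# Show which increments exist for a directory
rdiff-backup --list-increments /backup/data/documents

# Restore a file (even one that was later deleted) as it existed ten days ago
rdiff-backup --restore-as-of 10D /backup/data/documents/report.odt /tmp/report.odt
```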
Because I’m putting my backups on a disk inside the same server, I’m still vulnerable to bigger failures that would damage all the drives. So after finishing the rdiff-backup pass, my nightly script updates an encrypted mirror and uploads any changed files from the encrypted set to Google’s cloud.
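The post doesn’t say how the encrypted mirror is produced, so the sketch below uses per-file symmetric GPG purely as one possible approach; the paths and passphrase file are hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch of refreshing an encrypted mirror of the backup volume.
# GPG symmetric encryption is an assumption here, not necessarily the tool
# used in the original setup; all paths are placeholders.
SRC=/backup
ENC=/backup-encrypted
PASS=/root/.backup-passphrase

find "$SRC" -type f | while read -r f; do
    out="$ENC/${f#$SRC/}.gpg"
    # Re-encrypt only files that are newer than their encrypted copy
    if [ ! -e "$out" ] || [ "$f" -nt "$out" ]; then
        mkdir -p "$(dirname "$out")"
        gpg --batch --yes --symmetric --cipher-algo AES256 \
            --passphrase-file "$PASS" -o "$out" "$f"
    fi
done
```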
The script that copies to the Google cloud uses the gsutil tool. At the time I wrote the script, rsync functionality had just been added to gsutil, but when I tried to use it, I found it was buggy. It would fail to copy files and would incorrectly delete files from the cloud storage. That was especially painful when the deleted files were very large and had taken significant time to upload. Unable to rely on the rsync functionality, I wound up coding my own minimum-copy logic.
Google also charges for the API operations behind every command. Want to know what files are in the cloud? That “gsutil ls” command is going to cost you. Each command may only be a few cents, but the cents turn into dollars very quickly when you have daily automated scripts. In my case, the cost of the commands just to find out what was in the cloud started to rival the monthly cost of the storage itself.
I implemented a local list of the names, sizes, and modification dates of the files I’d pushed to the cloud, and I rely on that list to know which files are new or have changed. Once every couple of months I rebuild the list by querying Google, and so far there haven’t been any discrepancies. Along the way, my Google copy script grew to handle bandwidth limits, optimized gsutil command lines, and other details. The original upload of my encrypted backup took several weeks of nightly chunks; now the script keeps up every night.
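Here’s a minimal sketch of that manifest idea, assuming a plain-text manifest of size, mtime, and path and a per-file gsutil cp. The bucket name and paths are placeholders, and details like deletions, bandwidth limits, and parallel uploads are left out.

```bash
#!/bin/bash
# Hypothetical sketch of manifest-driven uploads: only files whose size or
# mtime changed since the last run are handed to gsutil. The bucket and
# manifest locations are placeholders.
ENC=/backup-encrypted
BUCKET=gs://example-backup-bucket
MANIFEST=/var/lib/backup/manifest.txt    # lines of "size mtime path"

TMP=$(mktemp)
find "$ENC" -type f -printf '%s %T@ %p\n' | sort > "$TMP"

# Lines present in the new listing but not in the old manifest are new or
# changed files; upload just those.
comm -13 <(sort "$MANIFEST" 2>/dev/null) "$TMP" | while read -r size mtime path; do
    gsutil cp "$path" "$BUCKET/${path#$ENC/}"
done

mv "$TMP" "$MANIFEST"
```

An occasional gsutil ls against the bucket can rebuild the manifest from scratch, which matches the every-couple-of-months reconciliation described above.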
The entire backup system runs pretty much on autopilot and works well. I have a fast local copy that lets me easily retrieve old revisions of any file, and a cloud copy should I ever need it. I’ve proven that it works the one time I accidentally wiped out my entire RAID array while trying to add the sixth disk.
Ransomware Protection
In general, my design is reasonably well protected against ransomware, mainly because the backup volume on the server is not accessible from the Windows or Mac machines on the network. A worst-case scenario would involve having files on the server encrypted, corrupted, or deleted, and then having those damaged files swept up in the nightly backup, overwriting the good backups as well.
But the metadata used by rdiff-backup only exists on the backup volume (and on the encrypted copy of the backup volume in the cloud). As long as the malware is kept away from the backup volume, rdiff-backup can restore versions from before the encryption. What happens if the malware somehow reaches the backup volume and manages to corrupt the rdiff-backup metadata?
In that case, the ability to restore pre-encryption versions from the local backup volume would be lost. Since both the local encrypted copy and the cloud copy are overwritten with the versions from the backup volume, the damaged metadata would propagate to every location. Retrieving the original files from that state would be difficult at best, and might not be possible at all.
To protect against this, I added one more layer of protection to my backup script.
In about 30 locations scattered around my network, I placed files that let me detect whether they’ve been changed in any way. Each of these sentinel files has a corresponding signature that must match the file’s contents. If any sentinel file is missing or fails its signature check, the backup script aborts.
The sentinel files are checked both on my primary file systems as well as on the backup volume. Because rdiff-backup keeps an exact copy of the latest version of each file, the backups of the sentinel files also pass the signature checks. If, somehow, the backup volume were corrupted, it would not be copied to the Google cloud.
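A check like this can be a few lines at the top of the backup script. The sketch below assumes the signatures are ordinary SHA-256 hashes stored in sha256sum format; the list locations are placeholders.

```bash
#!/bin/bash
# Hypothetical sketch of the sentinel check: verify fixed lists of files
# against stored SHA-256 hashes before any backup step runs. One list can
# cover the primary file systems and another the backup volume.
for list in /root/sentinels-primary.sha256 /root/sentinels-backup.sha256; do
    # Fails if any sentinel file is missing or its contents have changed
    if ! sha256sum --check --quiet "$list"; then
        echo "Sentinel check failed ($list) - aborting backup" >&2
        exit 1
    fi
done
```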
Is It Worth It?
My solution is not perfect or bulletproof. I have four copies of my data (original, backup, encrypted, and cloud), which is enough to give me comfort that I’m protected against any significant single failure. On the other hand, from the right location on my network, with the right credentials, everything is accessible. If my network were to be severely compromised, an attacker could damage things beyond recovery.
One possible next step would be offline backups. Writing to tape, Blu-ray, or external disk drives isn’t difficult, but it does involve manual servicing. I’d prefer something that is fully automated and doesn’t rely on human attention.
Another consideration is that my solution involves custom code. There could be a bug that I won’t discover until it’s too late, when I try to restore something that was never properly backed up. How comfortable will I be in five years relying on my own old code?
Finally, there’s cost. One of the things that pushed me in this direction was the expense of the various backup services. The fees tend to jump as soon as you start talking about whole networks and Linux servers. I chose the Google cloud because of their attractive storage rates, but the actual invoices are a bit higher than I originally calculated.
More to Come
At this point, I think it’s time to revisit the options. Should I continue using my home-grown solution and paying for Google cloud space? Or is one of the services a better choice? What features would I gain or lose by changing? And what would be the cost difference?
In my next post on this topic, I’ll look at a recent Google Cloud invoice and the hard costs of my home-grown solution.