Howto: etherpad backups with git and cron

Many public Etherpad instances delete pads after a certain period. Instances can also become flaky, delete pads at random, or vanish completely. Sometimes a malicious user deletes everything you typed and the timeline feature breaks, so you can’t get it back. So I wrote a script that grabs a copy of an Etherpad page (or similar) and tracks changes using git. I run it from cron when I’m working on an important pad, to make sure I don’t lose anything.

The first version looks like this:

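#!/bin/bash
# Download the plain-text export of the pad into the repository directory.
# Note: --no-check-certificate skips TLS certificate verification.
wget -N --no-check-certificate -nd -P ~/etherpad-saver/ https://pad.riseup.net/p/B7VT41OAfOe3/export/txt
(
    cd ~/etherpad-saver/ || exit 1
    # Stage everything first, so a newly downloaded (untracked) file is noticed too.
    git add -A
    # Commit only if the staged content differs from the last commit.
    if ! git diff-index --quiet --cached HEAD --; then
        git commit -m "Change detected - automatic commit"
    fi
)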
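
To run the script from cron, a crontab line along these lines should do. This is only a sketch: the script path `$HOME/bin/etherpad-saver.sh` and the five-minute interval are assumptions, so adjust both to taste.

```shell
# Hypothetical location of the saved script; change it to wherever you keep yours.
script="$HOME/bin/etherpad-saver.sh"

# Crontab line: run the script every 5 minutes.
entry="*/5 * * * * $script"

# To install it, append the line to your current crontab:
#   (crontab -l 2>/dev/null; echo "$entry") | crontab -
# or paste the printed line in by hand after running `crontab -e`.
echo "$entry"
```

The script needs to be executable (`chmod +x`) for cron to run it directly.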

Hideous, I know, but it scratches the itch for now. I’ve posted it on GitHub with some more comprehensive instructions. I may get round to removing the hard-coding and making it easier to use later. In the meantime, if you fancy adding some improvements, do submit a pull request 🙂
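
As one possible starting point for removing the hard-coding, here is a minimal sketch that takes the pad URL and the repository directory as arguments. The function names `commit_if_changed` and `save_pad` are made up for illustration and are not taken from the GitHub version. It stages files before checking for changes so that a brand-new file is picked up, and it leaves certificate checking on; add `--no-check-certificate` back to the `wget` call if your instance needs it.

```shell
#!/bin/bash
# Sketch: same idea as the script above, with the pad URL and the
# repository directory passed in instead of hard-coded.

# commit_if_changed DIR (hypothetical helper): stage everything in DIR
# and commit only if the staged content differs from the last commit.
commit_if_changed() {
    local dir=$1
    (
        cd "$dir" || exit 1
        git add -A
        # In a repository with no commits yet, diff-index fails,
        # so the first snapshot gets committed as well.
        if ! git diff-index --quiet --cached HEAD -- 2>/dev/null; then
            git commit -q -m "Change detected - automatic commit"
        fi
    )
}

# save_pad URL DIR (hypothetical helper): download the pad export
# into DIR, then commit it if anything changed.
save_pad() {
    local url=$1 dir=$2
    mkdir -p "$dir"
    # Create the repository on first use.
    [ -d "$dir/.git" ] || git init -q "$dir"
    wget -q -N -nd -P "$dir" "$url" && commit_if_changed "$dir"
}

# Do the work only when called with both arguments, e.g. from cron.
if [ "$#" -eq 2 ]; then
    save_pad "$1" "$2"
fi
```

Saved as, say, `etherpad-saver.sh`, it would be called as `etherpad-saver.sh https://pad.riseup.net/p/B7VT41OAfOe3/export/txt ~/etherpad-saver`, which makes it possible to back up several pads into separate directories from separate cron lines.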