Backups Overview
Date: April 2022
System Backups
Retention: 30 days
Schedule: Full backup monthly; incremental backup daily
System-level backups are coordinated by the `backup-director.sh` script
on `lib-infra1`, which runs daily at 02:00. The script triggers an individual
backup on each system listed in `/usr/local/etc/hosts.backup`.
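The fan-out described above could look roughly like the sketch below. The host-list format (one hostname per line, `#` comments allowed), the use of `ssh`, and the remote script path `/usr/local/sbin/run-backup.sh` are assumptions made for illustration; the real `backup-director.sh` may work differently.

```shell
#!/bin/sh
# Sketch only: the host-list format and the remote command are assumptions.

# Print one hostname per line, skipping blank lines and '#' comments.
list_backup_hosts() {
    sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
        grep -v '^$'
}

# Trigger the backup on every listed host, one at a time.
# With DRY_RUN=1 the commands are printed instead of executed.
run_backups() {
    hosts_file=${1:-/usr/local/etc/hosts.backup}
    list_backup_hosts "$hosts_file" | while read -r host; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh $host /usr/local/sbin/run-backup.sh"
        else
            # -n keeps ssh from consuming the rest of the host list on stdin
            ssh -n "$host" /usr/local/sbin/run-backup.sh ||
                echo "backup failed on $host" >&2
        fi
    done
}

# Example: DRY_RUN=1 run_backups /usr/local/etc/hosts.backup
```

Running the hosts sequentially keeps the load on the uplink predictable; a failure on one host is reported but does not stop the remaining backups.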
System backups use `duplicity` to back up filesystems. Backups are pushed to:
```
s3://osul-backup/servers/$hostname/$filesystem/
```
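As an illustration of the path pattern above, the following sketch builds the destination URL for a given host and filesystem. The mapping from mount point to directory name (`/` becomes `root`, `/etc` becomes `etc`) is inferred from the example listing in this document and is an assumption; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: the mount-point-to-name mapping is inferred from the example
# listing ("/" -> root, "/etc" -> etc) and may not match the real scripts.

BACKUP_BUCKET="s3://osul-backup"

# Map a mount point to the directory name used under servers/$hostname/.
filesystem_name() {
    case "$1" in
        /) echo "root" ;;
        *) echo "$1" | sed -e 's|^/||' -e 's|/$||' ;;
    esac
}

# Build the destination URL for one filesystem on one host.
# $1 = hostname, $2 = mount point
backup_target_url() {
    echo "$BACKUP_BUCKET/servers/$1/$(filesystem_name "$2")/"
}

# Example: backup_target_url lib-web2 /etc
```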
Systems with additional disks grab their disk list from
`/usr/local/etc/backups.conf` and are backed up following the same pattern
as other filesystems.
Backup information is stored in the `synetman.s3_backups` table on
`lib-admin1` in MySQL. Statistics for each backup are stored in the
`synetman.s3_backups_stats` table.
Full backups run once per month. On all other days, the backup is incremental
and contains only the changes since the previous backup. We keep one full
backup and all incrementals since that full backup, for effectively one month
of data retention.
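One way to express this schedule with `duplicity` is sketched below. The options shown (`--full-if-older-than` and the `remove-all-but-n-full` action) are standard `duplicity` features, but whether our scripts use exactly these options is an assumption. The function only prints the commands so the plan can be reviewed without touching S3.

```shell
#!/bin/sh
# Sketch only: shows duplicity options that match the schedule described
# above. That the real scripts use these exact options is an assumption.

# Print the duplicity commands for one filesystem without running them.
# $1 = local path, $2 = destination URL
print_duplicity_plan() {
    # Incremental by default; duplicity starts a new full backup once the
    # latest full backup is older than one month.
    echo "duplicity --full-if-older-than 1M $1 $2"
    # Keep only the newest full backup and the incrementals that depend on it.
    echo "duplicity remove-all-but-n-full 1 --force $2"
}

# Example: print_duplicity_plan /etc s3://osul-backup/servers/lib-web2/etc/
```

With only one full backup chain retained, the restorable history is shortest immediately after a new full backup has replaced the previous chain.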
```
# s3cmd ls s3://osul-backup/servers/lib-web2/
DIR s3://osul-backup/servers/lib-web2/data0/
DIR s3://osul-backup/servers/lib-web2/etc/
DIR s3://osul-backup/servers/lib-web2/log0/
DIR s3://osul-backup/servers/lib-web2/root/
DIR s3://osul-backup/servers/lib-web2/var/
```
Database Backups
Retention: 1 year
Schedule: Full backups nightly
SQL databases are backed up nightly, starting at 02:00. Backups are performed
by the `s3mysqlbackup.sh` script on the database host. Each database is
dumped individually with native backup tools and pushed to AWS in the
`S3 Infrequent Access` storage tier.
```
s3://osul-backup/database/$hostname/$timestamp/$timestamp-${db_name}.sql.gz
```
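The naming scheme above can be sketched as follows. The timestamp format (`%Y%m%d_%H%M%S`) is inferred from the example listing in this document. The dump pipeline itself (`mysqldump`, `gzip`, and `s3cmd put --storage-class=STANDARD_IA`) is an assumption about what "native backup tools" means here, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: the timestamp format is inferred from the example listing
# (e.g. 20220207_020002); the dump pipeline is an assumption.

BACKUP_BUCKET="s3://osul-backup"

# Build the object URL for one database dump.
# $1 = hostname, $2 = timestamp, $3 = database name
db_backup_url() {
    echo "$BACKUP_BUCKET/database/$1/$2/$2-$3.sql.gz"
}

# Dump one MySQL database, compress it, and upload it (not called here).
# Note: in POSIX sh the pipeline status is that of gzip, so a production
# script should also verify that mysqldump itself succeeded.
# $1 = hostname, $2 = timestamp, $3 = database name
dump_database() {
    tmpfile=$(mktemp) || return 1
    mysqldump --single-transaction "$3" | gzip > "$tmpfile" &&
        s3cmd put --storage-class=STANDARD_IA "$tmpfile" \
            "$(db_backup_url "$1" "$2" "$3")"
    status=$?
    rm -f "$tmpfile"
    return $status
}

# Example:
#   dump_database "$(hostname -s)" "$(date +%Y%m%d_%H%M%S)" api_prod
```

Computing the timestamp once per run and passing it to every dump keeps all databases from the same night under one `$timestamp/` prefix.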
```
# s3cmd ls s3://osul-backup/database/lib-mdb1/20220207_020002/
2022-02-07 10:00 919 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-api_prod.sql.gz
2022-02-07 10:00 927 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-api_test.sql.gz
2022-02-07 10:00 557 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-archivesspace_test.sql.gz
2022-02-07 10:00 4734 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-cvm_prod.sql.gz
2022-02-07 10:00 4564 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-cvm_test.sql.gz
2022-02-07 10:00 12379280 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-drupal_biblio_test.sql.gz
2022-02-07 10:00 14602215 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-drupal_dev.sql.gz
2022-02-07 10:01 268816774 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-drupal_oetest.sql.gz
2022-02-07 10:02 111899457 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-drupal_press_prod.sql.gz
2022-02-07 10:02 111513174 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-drupal_press_test.sql.gz
2022-02-07 10:02 13298 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-ics_2020.sql.gz
2022-02-07 10:02 43153 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-kiosks_prod.sql.gz
2022-02-07 10:02 19494 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-kiosks_test.sql.gz
2022-02-07 10:02 548 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-nagios.sql.gz
2022-02-07 10:02 3863301 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-ojs3_prod.sql.gz
2022-02-07 10:02 3616081 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-ojs3_test.sql.gz
2022-02-07 10:03 88926778 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-ojs_test.sql.gz
2022-02-07 10:03 2573 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-pma.sql.gz
2022-02-07 10:12 2282665746 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-scholars_prod.sql.gz
2022-02-07 10:19 1293963284 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-scholars_prod_20201028.sql.gz
2022-02-07 10:29 2061659603 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-scholars_prod_clustertest.sql.gz
2022-02-07 10:30 1021225 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-scholars_stage.sql.gz
2022-02-07 10:30 1019906 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-scholars_test.sql.gz
2022-02-07 10:30 24970 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-sys.sql.gz
2022-02-07 10:30 10867 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-tour_prod.sql.gz
2022-02-07 10:30 10914 s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-tour_test.sql.gz
```
Application Backups
Retention: Indefinite
Schedule: Rolling snapshots nightly
Application data is backed up individually with `rsync` through our
AWS Storage Gateway `lib-sg1`. The AWS Storage Gateway provides an NFS
interface to our `s3://osul-backup` S3 bucket. This backup is a 1:1 copy of
the application data at the filesystem level. Because of the volume of data,
point-in-time recovery was too expensive to provide.
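A 1:1 copy of this kind is typically produced with an `rsync` invocation like the one sketched below. The mount point `/mnt/lib-sg1`, the per-application directory layout, and the use of `--delete` are assumptions for illustration, not a description of the actual jobs.

```shell
#!/bin/sh
# Sketch only: the mount point and directory layout are hypothetical.

# Print the rsync command that mirrors one application directory onto the
# NFS share exported by the Storage Gateway.
# $1 = source directory, $2 = application name
print_app_sync() {
    gateway_mount=${GATEWAY_MOUNT:-/mnt/lib-sg1}
    # The trailing slash on the source copies its contents rather than the
    # directory itself. --delete removes files that no longer exist at the
    # source, which keeps the copy 1:1 but also propagates deletions.
    echo "rsync -a --delete ${1%/}/ $gateway_mount/$2/"
}

# Example: print_app_sync /srv/app/data myapp
```

Because deletions propagate to the copy, a mirror like this protects against loss of the source system but not against accidental deletion that goes unnoticed until after the next nightly run.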
Additional Data Resiliency Notes
- VM storage volumes are backed up separately by UIT
- UIT-managed storage volumes (NetApp, Isilon) have automatic snapshots
- Library NAS servers use ZFS. `gravy` and `mash` use mirrored pairs of
disks with hot spares to avoid data loss when drives fail
- `cornucopia` uses RAIDZ2 to allow for multiple drive failures per pool
- Planning to replicate `gravy` and `mash` NAS servers to `yam` for
Disaster Recovery
- Planning to extend integrated TrueNAS monitoring and scrape its metrics
into our own Prometheus infrastructure for tracking status of disks, ZFS
pools, filesystem usage per volume, etc.
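
As a rough idea of what the planned pool-status tracking could emit, the sketch below converts the output of `zpool list -H -o name,health` into Prometheus text-format gauges. The metric name `zfs_pool_healthy` is hypothetical, and how the result would be exposed (for example through a textfile collector) is left open.

```shell
#!/bin/sh
# Sketch only: the metric name zfs_pool_healthy is hypothetical.

# Read "name<TAB>health" lines (the output of `zpool list -H -o name,health`)
# on stdin and print one Prometheus gauge per pool: 1 if ONLINE, else 0.
zpool_health_metrics() {
    echo '# TYPE zfs_pool_healthy gauge'
    while IFS="$(printf '\t')" read -r pool health; do
        [ -n "$pool" ] || continue
        if [ "$health" = "ONLINE" ]; then value=1; else value=0; fi
        printf 'zfs_pool_healthy{pool="%s"} %s\n' "$pool" "$value"
    done
}

# Example: zpool list -H -o name,health | zpool_health_metrics
```

Reducing the health state to 0 or 1 makes alerting simple: any pool reporting 0 (DEGRADED, FAULTED, and so on) can raise an alert.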