Backups Overview

Date: April 2022

System Backups 

Retention: 30 days

Schedule: Full backup monthly; incremental backups daily

System-level backups are coordinated by the `backup-director.sh` script
on `lib-infra1`, which runs daily at 02:00. The script triggers individual
backups on each system listed in `/usr/local/etc/hosts.backup`.
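
As a rough illustration of this fan-out, the sketch below reads a host list and prints one trigger command per host. The file format (one hostname per line, `#` comments allowed), the use of `ssh`, and the remote script path `/usr/local/sbin/run-backup.sh` are assumptions made for illustration; the actual `backup-director.sh` may work differently.

```shell
#!/bin/sh
# Sketch only: print the per-host commands a director script might run.
# Assumed (not taken from the real script): the hosts file format, the use
# of ssh, and the hypothetical remote script /usr/local/sbin/run-backup.sh.

# Print hostnames from a hosts file, skipping blank lines and # comments.
list_backup_hosts() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Print (do not execute) one trigger command per host.
plan_host_backups() {
    list_backup_hosts "$1" | while read -r host; do
        printf 'ssh %s /usr/local/sbin/run-backup.sh\n' "$host"
    done
}
```

Calling `plan_host_backups /usr/local/etc/hosts.backup` only prints the commands, so the plan can be reviewed before anything runs.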

System backups use `duplicity` to back up filesystems. Backups are pushed to:

```
s3://osul-backup/servers/$hostname/$filesystem/
```

Systems with additional disks grab their disk list from
`/usr/local/etc/backups.conf` and are backed up following the same pattern
as other filesystems.
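
The destination pattern can be sketched as a small helper that maps a host and a filesystem label to its S3 location. This assumes, hypothetically, that `backups.conf` lists one filesystem label per line (for example `data0`); the real file format may differ.

```shell
#!/bin/sh
# Sketch only: derive S3 destinations for a host's filesystems.
# Assumed: the disk list holds one filesystem label per line (e.g. data0).

BUCKET_PREFIX="s3://osul-backup/servers"

# Build the destination URL for one host/filesystem pair.
backup_url() {
    printf '%s/%s/%s/\n' "$BUCKET_PREFIX" "$1" "$2"
}

# Print a destination URL for every filesystem label listed in a file.
plan_destinations() {
    host=$1
    conf=$2
    # The "|| [ -n ... ]" also handles a last line without a newline.
    while read -r fs || [ -n "$fs" ]; do
        if [ -n "$fs" ]; then
            backup_url "$host" "$fs"
        fi
    done < "$conf"
}
```

For example, `backup_url lib-web2 data0` prints `s3://osul-backup/servers/lib-web2/data0/`.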

Backup information is stored in the `synetman.s3_backups` table on
`lib-admin1` in MySQL. Statistics for each backup are stored in the
`synetman.s3_backups_stats` table.

Full backups run once per month. Other days are incremental changes since the
previous backup. We keep one full backup and all incrementals since the last
full backup, for effectively one month of data retention.
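
One way to express this schedule with `duplicity` is sketched below. The options shown (`--full-if-older-than` and `remove-all-but-n-full`) are standard `duplicity` features, but whether our scripts use them is an assumption, and the target URL scheme `duplicity` expects for S3 varies by version. The function only prints the commands.

```shell
#!/bin/sh
# Sketch only: print duplicity commands matching the schedule above.
# Assumed: the real scripts may use different options or URL schemes.

# Print the backup and prune commands for one source directory and target.
plan_duplicity_run() {
    src=$1
    target=$2
    # Start a new full backup when the latest full is over a month old;
    # otherwise duplicity makes an incremental backup.
    printf 'duplicity --full-if-older-than 1M %s %s\n' "$src" "$target"
    # Keep only the newest full backup and the incrementals based on it.
    printf 'duplicity remove-all-but-n-full 1 --force %s\n' "$target"
}
```

A call such as `plan_duplicity_run /data0 s3://osul-backup/servers/lib-web2/data0/` prints both commands for review; the `/data0` mount point is illustrative.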

```
# s3cmd ls s3://osul-backup/servers/lib-web2/
                          DIR  s3://osul-backup/servers/lib-web2/data0/
                          DIR  s3://osul-backup/servers/lib-web2/etc/
                          DIR  s3://osul-backup/servers/lib-web2/log0/
                          DIR  s3://osul-backup/servers/lib-web2/root/
                          DIR  s3://osul-backup/servers/lib-web2/var/
```

Database Backups 

Retention: 1 year

Schedule: Full backups nightly 

SQL databases are backed up nightly, starting at 02:00. Backups are done
with the `s3mysqlbackup.sh` script on the database host. Each database is
dumped individually with native backup tools and pushed to AWS in the
`S3 Infrequent Access` tier, using this path pattern:

```
s3://osul-backup/database/$hostname/$timestamp/$timestamp-${db_name}.sql.gz
```
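
The naming scheme can be sketched as follows. The timestamp format (`YYYYMMDD_HHMMSS`) is inferred from the example listing in this section. The use of `mysqldump`, `gzip`, and `s3cmd put --storage-class=STANDARD_IA` is an assumption about what `s3mysqlbackup.sh` does, so the pipeline is only printed, never executed.

```shell
#!/bin/sh
# Sketch only: build the S3 object URL for one database dump.
# Assumed: timestamp format YYYYMMDD_HHMMSS, inferred from the listing.

# Arguments: hostname, timestamp, database name.
db_backup_url() {
    printf 's3://osul-backup/database/%s/%s/%s-%s.sql.gz\n' "$1" "$2" "$2" "$3"
}

# Print (do not execute) a dump-compress-upload pipeline for one database.
# mysqldump is an assumption; the source only says "native backup tools".
plan_db_backup() {
    url=$(db_backup_url "$1" "$2" "$3")
    printf 'mysqldump --single-transaction %s | gzip | s3cmd put --storage-class=STANDARD_IA - %s\n' "$3" "$url"
}

# A run would typically take one timestamp and reuse it for every database:
# timestamp=$(date +%Y%m%d_%H%M%S)
```

Sharing one timestamp across a run keeps all dumps from the same night under a single prefix, which matches the layout in the listing.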

```
# s3cmd ls s3://osul-backup/database/lib-mdb1/20220207_020002/
2022-02-07 10:00          919  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-api_prod.sql.gz
2022-02-07 10:00          927  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-api_test.sql.gz
2022-02-07 10:00          557  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-archivesspace_test.sql.gz
2022-02-07 10:00         4734  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-cvm_prod.sql.gz
2022-02-07 10:00         4564  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-cvm_test.sql.gz
2022-02-07 10:00     12379280  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-drupal_biblio_test.sql.gz
2022-02-07 10:00     14602215  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-drupal_dev.sql.gz
2022-02-07 10:01    268816774  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-drupal_oetest.sql.gz
2022-02-07 10:02    111899457  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-drupal_press_prod.sql.gz
2022-02-07 10:02    111513174  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-drupal_press_test.sql.gz
2022-02-07 10:02        13298  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-ics_2020.sql.gz
2022-02-07 10:02        43153  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-kiosks_prod.sql.gz
2022-02-07 10:02        19494  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-kiosks_test.sql.gz
2022-02-07 10:02          548  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-nagios.sql.gz
2022-02-07 10:02      3863301  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-ojs3_prod.sql.gz
2022-02-07 10:02      3616081  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-ojs3_test.sql.gz
2022-02-07 10:03     88926778  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-ojs_test.sql.gz
2022-02-07 10:03         2573  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-pma.sql.gz
2022-02-07 10:12   2282665746  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-scholars_prod.sql.gz
2022-02-07 10:19   1293963284  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-scholars_prod_20201028.sql.gz
2022-02-07 10:29   2061659603  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-scholars_prod_clustertest.sql.gz
2022-02-07 10:30      1021225  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-scholars_stage.sql.gz
2022-02-07 10:30      1019906  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-scholars_test.sql.gz
2022-02-07 10:30        24970  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-sys.sql.gz
2022-02-07 10:30        10867  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-tour_prod.sql.gz
2022-02-07 10:30        10914  s3://osul-backup/database/lib-mdb1/20220207_020002/20220207_020002-tour_test.sql.gz
```

Application Backups 

Retention: Indefinite

Schedule: Rolling snapshots nightly

Application data is backed up individually with `rsync` through our
AWS Storage Gateway, `lib-sg1`, which provides an NFS interface to our
`s3://osul-backup` S3 bucket. This backup is a 1:1 copy of the application
data at the filesystem level. Given the volume of data, point-in-time
recovery was too expensive, so only the current copy is kept.
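
A 1:1 copy of this kind can be sketched with `rsync` as shown below. The mount point `/mnt/lib-sg1`, the source path, and the use of `--delete` are assumptions for illustration; the actual jobs may use different options. The function prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: print an rsync command that mirrors application data onto
# an NFS mount exported by the Storage Gateway.
# Assumed: the mount point /mnt/lib-sg1 and the use of --delete.

plan_app_sync() {
    src=${1%/}      # strip a trailing slash so exactly one is added below
    dest=${2%/}
    # -a preserves permissions, ownership, times, and symlinks; --delete
    # removes files from the copy that no longer exist in the source.
    printf 'rsync -a --delete %s/ %s/\n' "$src" "$dest"
}
```

For example, `plan_app_sync /srv/app/data /mnt/lib-sg1/app` prints `rsync -a --delete /srv/app/data/ /mnt/lib-sg1/app/`. With `--delete`, a file removed from the source also disappears from the copy on the next run, which is consistent with having no point-in-time recovery.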


Additional Data Resiliency Notes

  • VM storage volumes are backed up separately by UIT
  • UIT-managed storage volumes (NetApp, Isilon) have automatic snapshots
  • Library NAS servers use ZFS. `gravy` and `mash` use mirrored pairs of
    disks with hot spares to avoid data loss when drives fail
  • `cornucopia` uses RAIDZ2, which tolerates up to two drive failures per
    vdev
  • Planning to replicate the `gravy` and `mash` NAS servers to `yam` for
    disaster recovery
  • Planning to extend the integrated TrueNAS monitoring and scrape its
    metrics into our own Prometheus infrastructure to track the status of
    disks, ZFS pools, filesystem usage per volume, etc.