Back up databases using Kubernetes CronJobs
In this article, we will create a Kubernetes CronJob that runs every 12 hours and backs up a PostgreSQL database. The same concepts apply to any other database.
To get started quickly, check out the Postgres-Backup-Container repository; its readme contains instructions for using it in the Azure cloud. For details on how everything works, read on.
To achieve our goal, we need to do two things:
- Schedule a Kubernetes CronJob that makes a backup of the database.
- Store the backup externally.
Scheduling a Kubernetes CronJob
Every application has recurring tasks, and Kubernetes provides the CronJob resource as a native way to run them.
A CronJob is essentially a Pod with a schedule: at each scheduled time it creates a Job, which in turn runs a Pod to completion.
- The schedule is defined in cron format. For more details, check out this cheat sheet.
- The Pod (in our use case) needs to make a backup of the database. A Pod is a wrapper around a container, so the container is where we specify the job. Let's set the container up.
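As a quick illustration of the cron format, the five fields of the schedule used later in this article (0 */12 * * *) can be read as follows. This is only an annotated fragment of a CronJob spec, not a complete manifest, and the alternative schedules in the comments are made-up examples.

```yaml
# Fields: minute  hour  day-of-month  month  day-of-week
#         0       */12  *             *      *
# Minute 0 of every 12th hour, i.e. at 00:00 and 12:00 every day.
schedule: "0 */12 * * *"

# Other schedules, for comparison (illustrative only):
#   "0 3 * * *"     every day at 03:00
#   "*/30 * * * *"  every 30 minutes
#   "0 0 * * 0"     every Sunday at midnight
```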
Preparing the PgSQL Backup Container
The container is defined in the Dockerfile:
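# Base image is an assumption; the apk commands below imply Alpine Linux.
FROM alpine

# Connection defaults read by pg_dump. libpq expects host and port separately.
ENV PGHOST='localhost'
ENV PGPORT='5432'
ENV PGDATABASE='postgres'
ENV PGUSER='postgres@postgres'
ENV PGPASSWORD='password'

# Install PostgreSQL to get pg_dump.
RUN apk update
RUN apk add postgresql

COPY dumpDatabase.sh .

ENTRYPOINT [ "/bin/sh" ]
CMD [ "./dumpDatabase.sh" ]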
The Dockerfile is mostly self-explanatory: we create an environment with PostgreSQL installed so that we can use pg_dump in the script dumpDatabase.sh. Let's check out the script:
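#!/bin/sh
# Name the dump after the current time.
DUMP_FILE_NAME="backupOn$(date +%Y-%m-%d-%H-%M).dump"
echo "Creating dump: $DUMP_FILE_NAME"

# pg_backup is the directory where the backup volume is mounted.
cd pg_backup || exit 1

# Connection settings come from the PG* environment variables.
# -C includes the CREATE DATABASE command, -w never prompts for a password.
if ! pg_dump -C -w --format=c --blobs > "$DUMP_FILE_NAME"; then
  rm "$DUMP_FILE_NAME"
  echo "Back up not created, check db connection settings"
  exit 1
fi

echo 'Successfully Backed Up'
exit 0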
In the script, we create a file name based on the current time, move into the directory pg_backup, and create the dump of the database. If we reach exit 0, the script, and with it the container, exits successfully. On failure, the container is restarted, because the CronJob sets its restartPolicy to OnFailure.
Now that we know how the container works and that it exports the data to pg_backup, we are left with the Kubernetes resources.
The CronJob resource is defined in the file ./aks/pg-backup-cronJob.yaml. Besides referencing the above container and defining the schedule to be every twelve hours (0 */12 * * *), there is one more important part, namely how we get the backup out of the container:
volumeMounts:
  - mountPath: "/pg_backup"
    name: backup-volume
The volumeMounts property of a container lets us mount an external volume at the mountPath inside the container. The external volume is defined below:
volumes:
  - name: backup-volume
    persistentVolumeClaim:
      claimName: pg-backup-pvc
The volumes property is the part of the Pod specification that declares the volumes available to its containers, in our case one backed by a persistent volume claim. Because there can be many volumes, each needs a name that the container can refer to, which explains the name backup-volume.
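Putting the pieces together, the complete CronJob might look roughly like the sketch below. It is not a copy of ./aks/pg-backup-cronJob.yaml: the image name, the container name, and the connection values are placeholders assumed for illustration, and apiVersion batch/v1 assumes a reasonably recent cluster (older clusters used batch/v1beta1). The resource name, schedule, restart policy, mount path, and claim name are the ones used in this article.

```yaml
apiVersion: batch/v1                 # assumption: Kubernetes 1.21 or newer
kind: CronJob
metadata:
  name: batch-every-twelve-hours
spec:
  schedule: "0 */12 * * *"           # at 00:00 and 12:00 every day
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # restart the container if the script exits non-zero
          containers:
            - name: pg-backup                              # hypothetical name
              image: example.azurecr.io/pg-backup:latest   # hypothetical image
              env:                                         # placeholder connection values
                - name: PGHOST
                  value: "my-db-host"
                - name: PGDATABASE
                  value: "postgres"
                - name: PGUSER
                  value: "postgres@postgres"
                - name: PGPASSWORD
                  value: "password"
              volumeMounts:
                - mountPath: "/pg_backup"   # where dumpDatabase.sh writes the dump
                  name: backup-volume
          volumes:
            - name: backup-volume
              persistentVolumeClaim:
                claimName: pg-backup-pvc
```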
That leaves the last part: what is the persistent volume claim pg-backup-pvc? That brings us to the next section!
Storing the Backup Externally
Our PersistentVolumeClaim is defined in the file ./aks/pg-persistent-volume-claim.yaml. The important points there are:
- We claim 5 GB (you will almost certainly want more).
- We require ReadWriteMany access.
- We use the storage class backup-storage.
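As a rough sketch, a claim with these three properties could look like the following. The manifest is reconstructed from the points above rather than copied from the repository file, and 5Gi is assumed as the way the 5 GB request is written.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-backup-pvc
spec:
  accessModes:
    - ReadWriteMany                  # the share can be mounted read-write by many nodes
  storageClassName: backup-storage   # ties the claim to the storage class
  resources:
    requests:
      storage: 5Gi                   # assumed notation for the 5 GB claim
```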
The backup-storage StorageClass is defined in the file ./aks/pg-storage-class.yaml:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: backup-storage
provisioner: kubernetes.io/azure-file
parameters:
  location: westeurope
  skuName: Standard_GRS
  storageAccount: pgbackupstorage
It sets the provisioner to azure-file; if you are using another cloud, use that cloud's provisioner. Then come the three parameters location, skuName and storageAccount, which specify the Azure storage account that backs the backup-storage StorageClass used throughout your Kubernetes cluster. Note that you currently need to create the storage account manually, in the same resource group as the Kubernetes cluster.
Summing it up
To get it running:
- Create the storage account.
- Update pg-storage-class.yaml with the name of your storage account.
- Update pg-backup-cronJob.yaml to contain your database connection data (this example is purposefully kept simple; for a real project, use Secrets).
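For a real project, the credentials could be moved into a Secret along these lines. This is a minimal sketch: the Secret name pg-backup-credentials is hypothetical, and the second document is only the fragment of the container spec that would replace the plain-text env values.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: pg-backup-credentials   # hypothetical name
type: Opaque
stringData:                     # written as plain text, stored base64-encoded by Kubernetes
  PGUSER: "postgres@postgres"
  PGPASSWORD: "password"
---
# Fragment of the CronJob's container spec (not a standalone resource):
# every key of the Secret becomes an environment variable in the container.
envFrom:
  - secretRef:
      name: pg-backup-credentials
```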
Navigate to the directory that contains these files and execute kubectl create -f . to create all the resources in it:
cronjob.batch "batch-every-twelve-hours" created
persistentvolumeclaim "pg-backup-pvc" created
storageclass.storage.k8s.io "backup-storage" created
Verify that the backups are uploaded to your Azure file share. If they aren't, check your database credentials and correct them.