This tutorial will guide you through the process of setting up Flyway migration tests in a CI pipeline using Spawn to provision a database on the fly.
This tutorial should take you about 20 minutes to complete.
Install Spawn by visiting the getting started documentation and following the installation steps.
This tutorial assumes you already have a database with Flyway migrations set up and committed to your own repository. Alternatively, follow along with the examples taken from this demo repo, which you can fork and clone.
Testing your database migrations before they reach production is critical. A few scenarios can result in a breaking change: you could write a migration which doesn't work on a database that contains data, two developers could merge a versioned migration into production at the same time, or someone could write SQL which leads to data loss. You can read more about the importance of testing your database migrations here.
We will work through a step-by-step guide on setting up migration tests for the PostgreSQL Pagila demo database stored in Amazon RDS, in a GitHub Actions pipeline. The logic can be applied to any other CI pipeline, and also to SQL Server and MySQL databases.
Firstly, we want to get a backup of our production database. This will ultimately be used to run migrations against later. Testing against real data will catch problems which may not arise from testing against an empty database.
In our repository, let's create a folder which will hold our backup file:
> mkdir backups
Following the PostgreSQL docs, we'll run the following to save a database backup file to our folder as pagila.sql. Note that pg_dump must be the same version as (or lower than) your Spawn database, which is 12 in our example:
> export PGPASSWORD=<Password>
> pg_dump -h <Host> -p <Port> -U <Username> --create <DbName> --file backups/pagila.sql
Using Spawn, we can turn this backup file into a data image, following the instructions here. First, we will create a source file pagila-backup.yaml in the directory which contains the backups folder, with the following content:
```yaml
name: Pagila
engine: postgresql
version: 12.0
sourceType: backup
backup:
  folder: ./backups/
  file: pagila.sql
  format: plain
tags:
  - prod
  - latest
```
The source file describes where the data and schema come from. We can now create a data image from this:
> spawnctl create data-image --file ./pagila-backup.yaml
This setup will allow us to create databases from the backup file - we'll come back to this later in the tutorial.
Let's create and set up our GitHub Actions workflow to test database migrations.
To access spawnctl in the pipeline, we will first need to create an access token so that we won't be prompted to authenticate with Spawn in our CI environment:
> spawnctl create access-token --purpose "CI system Spawn access token"
Access token generated: <access-token-string>
Copy the <access-token-string> to your clipboard; you won't be able to retrieve it again. Now let's add it as a secret variable called SPAWNCTL_ACCESS_TOKEN in GitHub Actions by following this tutorial.
In your repository, add a file called migration-test.yaml under the .github/workflows folder. Copy and paste the YAML below, then commit and push the new file:
```yaml
name: Database migration test
on: workflow_dispatch
jobs:
  run_migration_test:
    name: Run Flyway migration tests
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Run database migrations against prod backup
        run: ./migration-test-prod.sh
        env:
          SPAWNCTL_ACCESS_TOKEN: ${{ secrets.SPAWNCTL_ACCESS_TOKEN }}
```
This workflow references a bash script, migration-test-prod.sh, which can be called as-is from any CI pipeline. Let's go ahead and create that script by copying the code below and saving the file at the root of our repository, substituting the databaseName variable for your own database's name:
```bash
#!/bin/bash
set -e

echo "Downloading and installing spawnctl..."
curl -sL https://run.spawn.cc/install | sh > /dev/null 2>&1
export PATH=$HOME/.spawnctl/bin:$PATH
echo "spawnctl successfully installed"

export SPAWN_PAGILA_IMAGE_NAME=Pagila:prod

echo
echo "Creating Pagila backup Spawn data container from image '$SPAWN_PAGILA_IMAGE_NAME'..."
pagilaContainerName=$(spawnctl create data-container --image $SPAWN_PAGILA_IMAGE_NAME --lifetime 10m --accessToken $SPAWNCTL_ACCESS_TOKEN -q)

databaseName="pagila"
pagilaJson=$(spawnctl get data-container $pagilaContainerName -o json)
pagilaHost=$(echo $pagilaJson | jq -r '.host')
pagilaPort=$(echo $pagilaJson | jq -r '.port')
pagilaUser=$(echo $pagilaJson | jq -r '.user')
pagilaPassword=$(echo $pagilaJson | jq -r '.password')
echo "Successfully created Spawn data container '$pagilaContainerName'"

echo
docker pull postgres:12-alpine > /dev/null 2>&1
docker pull flyway/flyway > /dev/null 2>&1

echo
echo "Starting migration of database with flyway"
docker run --net=host --rm -v $PWD/sql:/flyway/sql flyway/flyway migrate -url="jdbc:postgresql://$pagilaHost:$pagilaPort/$databaseName" -user=$pagilaUser -password=$pagilaPassword
echo "Successfully migrated 'Pagila' database"

echo
spawnctl delete data-container $pagilaContainerName --accessToken $SPAWNCTL_ACCESS_TOKEN -q
echo "Successfully cleaned up the Spawn data container '$pagilaContainerName'"
```
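One caveat worth noting: because the script uses set -e, a failing flyway migrate aborts it before the spawnctl delete line runs, and the container is then only removed when its 10-minute lifetime expires. A bash EXIT trap guarantees cleanup either way. Here is a minimal, runnable sketch of the pattern, where the echo stands in for the real spawnctl delete call:

```shell
# Demonstrate that an EXIT trap fires even when the script fails partway
# through. In migration-test-prod.sh, cleanup would instead call
# `spawnctl delete data-container`.
out=$(bash -c '
  cleanup() { echo "container deleted"; }
  trap cleanup EXIT
  echo "simulating a failed flyway migrate..." >&2
  exit 1
' 2>/dev/null) || true

echo "$out"   # prints: container deleted
```

In the real script, you would register the trap immediately after the data container is created.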
Before committing and pushing this file to the repository, let's give it executable permissions:
> chmod +x migration-test-prod.sh
This script accomplishes a few things. We are:

- Downloading and installing spawnctl, so that we can use the spawnctl command from the pipeline agent
- Creating a data container from the image Pagila:prod
- Running flyway migrate on our data container, using migration scripts stored under the sql folder
- Deleting the data container once the migrations have run

That's it - that is our migration test. We have quickly provisioned a database instance from our backup using Spawn, pointed the Flyway connection details at that database, and run the migration scripts in our repository. Any errors will be apparent at this point, before changes get merged into the main branch.
Note: there is an unofficial Flyway Migration action in the GitHub Marketplace which you could borrow from. But the code in migration-test-prod.sh is generic bash and will work across all CI pipelines.
Once you've pushed all your scripts to GitHub, you can manually run the migration test workflow by navigating to the Actions tab in GitHub and clicking on 'Database migration test', then 'Run workflow'. Or, in a more likely scenario, we want this to run automatically on a pull request so the development team can check that any new migrations will run against main before merging.
Alter your migration-test.yaml file to the following, so that the workflow is now triggered on other GitHub events:
```yaml
name: Database migration test
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:
jobs:
  run_migration_test:
    name: Run Flyway migration tests
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Run database migrations against prod backup
        run: ./migration-test-prod.sh
        env:
          SPAWNCTL_ACCESS_TOKEN: ${{ secrets.SPAWNCTL_ACCESS_TOKEN }}
```
One last useful improvement: as things stand, you will have to create a data image manually every time you have migration changes, to get the latest backup of your production database. We can automate this step using Spawn in our existing GitHub Actions pipeline.
Add some more GitHub secrets so that the agent can read from our production database to back it up, but not write to it. We will need the database's <Host>, <Username>, and <Password>. The next step references these secrets using the names PAGILA_HOST, PAGILA_ADMIN_USERNAME and PAGILA_ADMIN_PASSWORD, but you should create secret names prefixed with your own database name.
Let's add another workflow called db-backup.yaml under our .github/workflows folder:
```yaml
name: Take backup of production database daily
on:
  schedule:
    - cron: "0 9 * * MON-FRI"
  workflow_dispatch:
jobs:
  take_db_backup:
    name: Take backup of production database
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Create backup
        run: ./take-db-backup.sh
        env:
          SPAWNCTL_ACCESS_TOKEN: ${{ secrets.SPAWNCTL_ACCESS_TOKEN }}
          PAGILA_HOST: ${{ secrets.PAGILA_HOST }}
          PAGILA_USERNAME: ${{ secrets.PAGILA_ADMIN_USERNAME }}
          PAGILA_PASSWORD: ${{ secrets.PAGILA_ADMIN_PASSWORD }}
```
We are going to create another bash script here called take-db-backup.sh, which you can create at the root of your repository with the following content, substituting the databaseName variable for your own database's name:
```bash
#!/bin/bash
set -e

echo "Downloading and installing spawnctl..."
curl -sL https://run.spawn.cc/install | sh > /dev/null 2>&1
export PATH=$HOME/.spawnctl/bin:$PATH
echo "spawnctl successfully installed"

databaseName="pagila"

mkdir backups
docker pull postgres:12-alpine > /dev/null 2>&1

echo
echo "Backing up Pagila database..."
docker run --net=host --rm -v $PWD/backups:/backups/ -e PGPASSWORD=$PAGILA_PASSWORD postgres:12-alpine pg_dump -h $PAGILA_HOST -p 5432 -U $PAGILA_USERNAME --create $databaseName --file /backups/pagila.sql

echo "Creating Spawn data image..."
echo
pagilaImageName=$(spawnctl create data-image --file ./pagila-backup.yaml --lifetime 336h --accessToken $SPAWNCTL_ACCESS_TOKEN -q)
echo "Successfully created Spawn data image '$pagilaImageName'"

echo
echo "Successfully backed up Pagila database"
```
Once again, this script needs to be made executable:
> chmod +x take-db-backup.sh
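As a defensive touch, you might verify the dump file before spawnctl builds an image from it. Here is a small sketch of such a check, using a temporary directory and a stand-in dump file in place of real pg_dump output (since we run pg_dump with --create, a real dump will contain a CREATE DATABASE statement):

```shell
# Sanity-check a pg_dump output file before creating a data image from it.
backupDir=$(mktemp -d)                                # stand-in for the backups/ folder
backupFile="$backupDir/pagila.sql"
printf 'CREATE DATABASE pagila;\n' > "$backupFile"    # stand-in for real dump content

# The check itself: the file is non-empty and contains the CREATE DATABASE
# statement that `pg_dump --create` emits.
if [ -s "$backupFile" ] && grep -q "CREATE DATABASE" "$backupFile"; then
  echo "backup file looks valid"
else
  echo "backup file is missing or incomplete" >&2
fi
```

Dropping a guard like this into take-db-backup.sh between the pg_dump step and the spawnctl create data-image step would stop a truncated backup from becoming the new prod image.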
Finally, you'll need to add the source file for your data image, pagila-backup.yaml, which you created locally earlier, to your repository. Now you can push all your new files to the repository.
Every weekday at 9am UTC, this workflow will run and create a new data image from a backup of the latest state of the Pagila database. We don't need to change our other workflow - it is already set up to use an image under the name of Pagila:prod, and much like Docker tags, a newer image will carry the prod tag after this script runs.
In this tutorial we saw how to use Spawn to provision a database from a production backup on the fly, and how to run flyway migrate against it to test the latest migrations against a copy of production.