
10. 11. 2023

5 min read

How to create a CI archive for your project


Jovan Blažek

Automating the archiving of CI runs and performance metrics from pull requests is crucial for several reasons.

Firstly, it ensures the preservation of valuable historical data. By automatically archiving these metrics, developers can maintain a comprehensive record of the performance and behavior of their codebase over time. This historical data can be invaluable for debugging and troubleshooting, allowing developers to identify patterns, trends, and regressions.

Secondly, having an accessible archive of CI runs and performance metrics facilitates collaboration and knowledge sharing within the development team. With this readily available information, team members can easily reference past results, learn from previous experiences, and make informed decisions about code changes.

Overall, automating the archiving of CI runs and performance metrics enhances efficiency, promotes informed decision-making, and ensures the long-term integrity of development processes.

That is why we decided to create a CI archive for our project and we would like to take you through our journey of creating it.

In search of a solution

When deciding on the solution, we had to take into account the following requirements:

  • the solution should be easy to implement

  • the solution should be free or cheap

  • the data should be accessible only by team members

Our first thought was to store the data in a separate branch of a Git repository. This would be easy to implement and cheap to serve. Our project repository is already private; however, pages served from it are public. To combat this problem, we decided to use PageCrypt, which encrypts the HTML files, keeping our test results private. For serving these files, we chose Render, which offers free static site hosting and has less strict size and bandwidth limits than GitHub Pages.

Implementation of the Git-based CI archive

With initial research done, we split the implementation into the following steps:

  1. Create a clean branch for archiving CI results and other metrics.

  2. After each CI run, commit the results to this branch using GitHub Actions.

  3. Configure automatic deployment on Render.

The archive will have the following structure:

/
├─ pull-requests/
│  ├─ [PR number]/
│  │  ├─ playwright-report/
│  │  │  ├─ ...
│  │  ├─ jest-report/
│  │  │  ├─ ...
│  │  ├─ index.html - Generated list of directories
│  ├─ index.html - Generated list of directories
├─ index.html - Generated list of directories

GitHub Actions workflow

We started with a simple goal in mind: archive the results of our Playwright E2E tests. To do that, we wrote the following GitHub Actions job, which runs after the Playwright tests complete.

In short, this job:

  • creates or updates the needed folder structure in the archive branch

  • downloads the test results and encrypts them using PageCrypt

  • regenerates the navigation files using a Bash script

  • pushes the changes to the archive branch

Let's take a closer look at this workflow step by step.

Job initialization

deploy-results:
  name: Deploy results to Render
  needs: [playwright-tests]
  permissions:
    contents: write
    pull-requests: write
  env:
    PW_REPORT_FOLDER: ${{github.workspace}}/archive/pull-requests/${{github.event.pull_request.number}}/playwright-report
    HTML_GENERATOR: ${{github.workspace}}/archive/generateHtmlListOfFolders.sh
  runs-on: ubuntu-latest
  steps:
    - name: Checkout ci-archive branch
      uses: actions/checkout@v3
      with:
        ref: ci-archive
        path: archive
    - uses: actions/setup-node@v3
      with:
        node-version: '16.12.0'

First, we clone the archive branch into the archive directory and set up Node.

Creating a folder structure and removing old reports

With the environment ready, we download the test results and move them to the correct place, replacing the old report in the process. When done, we run PageCrypt to encrypt the index.html file. The Playwright report consists of a single HTML file, so we can encrypt it directly.

- name: Create a folder structure
  run: mkdir -p ${{env.PW_REPORT_FOLDER}}
- name: Remove old report
  run: rm -rf ${{env.PW_REPORT_FOLDER}}/*
- name: Download report
  uses: actions/download-artifact@v3
  with:
    name: playwright-report
    path: ${{env.PW_REPORT_FOLDER}}
- name: Move report files
  run: |
    cd ${{env.PW_REPORT_FOLDER}}
    mv html-report/* .
    rm -rf html-report
- name: Encrypt report
  run: npx --yes pagecrypt ${{env.PW_REPORT_FOLDER}}/index.html ${{env.PW_REPORT_FOLDER}}/index.html "${{secrets.PAGECRYPT_PASSWORD}}"

Updating archive navigation

With everything ready and in the right place, we can regenerate the archive navigation by running a Bash script. This script creates a list of links to every directory inside the current one, which lets us regenerate the index.html files after any change to the archive. We run the script three times, once for each directory level in the archive (root, pull-requests, PR number).

#!/bin/bash
TITLE=$1
FILENAME="index.html"

# Check if file exists and remove if it does
if [ -f $FILENAME ]; then
  rm $FILENAME
fi
touch $FILENAME

cat > $FILENAME << EOF
<!DOCTYPE html>
<html>
  <head>
    <meta name="robots" content="noindex" />
    <title>$TITLE</title>
    <link rel="stylesheet" href="/style.css" />
  </head>
  <body>
    <h1>$TITLE</h1>
    <ul>
EOF

# Loop over all subfolders in the current directory
for DIR in ./*; do
  if [ -d "$DIR" ]; then
    # If directory, add link to the unordered list in the HTML file
    DIR_NAME=${DIR:2}
    echo "      <li><a href=\"$DIR/\">$DIR_NAME</a></li>" >> $FILENAME
  fi
done

cat >> $FILENAME << EOF
    </ul>
  </body>
</html>
EOF

- name: Update archive folder lists
  run: |
    cd ${{github.workspace}}/archive
    ${{env.HTML_GENERATOR}} "CI Archive"
    cd ${{github.workspace}}/archive/pull-requests
    ${{env.HTML_GENERATOR}} "Pull Requests"
    cd ${{github.workspace}}/archive/pull-requests/${{github.event.pull_request.number}}
    ${{env.HTML_GENERATOR}} "Pull Request #${{github.event.pull_request.number}}"
  shell: bash

Committing and pushing changes

Finally, we commit and push the changes to the archive branch using git-auto-commit-action. The commit message references the PR and the commit the tests ran on, so we can easily find the test results for a given commit. To make it easier to navigate to the archive, we add a sticky comment with a link to the archive to the PR, using the sticky-pull-request-comment action.

- name: Commit and push changes to the ci-archive branch
  uses: stefanzweifel/git-auto-commit-action@v4
  with:
    commit_message: 'ci: Update test results #${{github.event.pull_request.number}} ${{github.event.pull_request.head.sha}}'
    branch: ci-archive
    repository: archive
- name: Add sticky comment with link
  uses: marocchino/sticky-pull-request-comment@v2
  with:
    header: 'deploy-results'
    message: 'CI results for this PR are available at https://example.com/pull-requests/${{github.event.pull_request.number}}/'

How the turntables

After implementing the Git-based archive, everything was working great: results were being archived, data-driven decisions were being made, and everything was fine. Until it wasn't.

After a few weeks, we noticed that the performance of our CI pipeline was degrading. The time to complete was getting longer and longer, and the archive was to blame. The way we were archiving the results was not efficient: the results of our E2E tests also include screenshots, which were taking up a lot of space in our Git history. This caused the archive branch to grow to an enormous size, to the point where the pipeline would simply time out or run out of space while downloading the archive branch.

We quickly realized that we needed to change the way we archive our results. The incremental history did not matter to us; we only cared about the final state of each PR. So we decided to move away from Git and use a different solution.

Archiving CI results using CDN

To combat the growing size of our repository, we turned to a storage method better suited for "larger" files: a CDN. Luckily, we were already using BunnyCDN on our project, so most of the work was already done.

We migrated the current archive to CDN and changed the workflow to upload the results to CDN instead of Git.

Updates to the GitHub Actions workflow

The logic for checking out the archive branch and pushing to it was removed. In its place, we added a step that uploads the test results to the CDN. The script uses the CDN's API endpoints to replace the old files with the new ones. It also regenerates the index.html files used for navigation. (So long, Bash...)

The final version of the workflow looks like this:

deploy-results:
  name: Deploy results to CDN
  needs: [playwright-tests]
  env:
    PW_REPORT_FOLDER: ${{github.workspace}}/downloadedArtifacts/pull-requests/${{github.event.pull_request.number}}/playwright-report
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3
    - uses: actions/setup-node@v3
      with:
        node-version: '16.12.0'
    - name: Install dependencies
      run: npm ci
    - name: Create a folder structure
      run: mkdir -p ${{env.PW_REPORT_FOLDER}}
    - name: Download report
      uses: actions/download-artifact@v3
      with:
        name: playwright-report
        path: ${{env.PW_REPORT_FOLDER}}
    - name: Move report files
      run: |
        cd ${{env.PW_REPORT_FOLDER}}
        mv html-report/* .
        rm -rf html-report
    - name: Encrypt report
      run: npx --yes pagecrypt ${{env.PW_REPORT_FOLDER}}/index.html ${{env.PW_REPORT_FOLDER}}/index.html "${{secrets.PAGECRYPT_PASSWORD}}"
    - name: Upload report to CDN
      env:
        BUNNY_CDN_CI_ARCHIVE_API_KEY: ${{secrets.BUNNY_CDN_CI_ARCHIVE_API_KEY}}
        BUNNY_CDN_STORAGE_ZONE_NAME: archive
      run: |
        node ${{github.workspace}}/.github/workflows/uploadCiResultsToCdn ${{github.event.pull_request.number}}
    - name: Add sticky comment with link
      uses: marocchino/sticky-pull-request-comment@v2
      with:
        header: 'deploy-results'
        message: 'CI results for this PR are available at https://archive.b-cdn.net/pull-requests/${{github.event.pull_request.number}}/'

Fight the YAGNI (you aren’t gonna need it) principle

Archiving CI results and other project health metrics is a great way to preserve valuable historical data and move your team toward more data-driven decisions. We have learned the hard way that building something quickly and cheaply can come back to bite you later, so sometimes you have to fight the YAGNI principle and build something that will scale better in the future. We hope this article helps you on the journey of building your own archive.
