ruzickap /action-my-broken-link-checker

A GitHub Action for checking broken links

GitHub Actions: My Broken Link Checker ✔

This is a GitHub Action to check for broken links in your static files or web pages. It uses muffet for the URL checking task.

See the basic GitHub Action example to run periodic checks (weekly) against mkdocs.org:

```yaml
on:
  schedule:
    - cron: '0 0 * * 0'

name: Check markdown links
jobs:
  my-broken-link-checker:
    name: Check broken links
    runs-on: ubuntu-latest
    steps:
      - name: Check for broken links
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://www.mkdocs.org
          cmd_params: "--one-page-only --max-connections=3 --color=always" # Check just one page
```

Check out the real demo:

My Broken Link Checker demo

This action can be combined with Static Site Generators (Hugo, MkDocs, Gatsby, GitBook, mdBook, etc.). The following examples expect the generated web pages to be stored in the ./build directory. During the check, a Caddy web server is started that serves those pages under the hostname taken from the URL parameter (see entrypoint.sh for details).

```yaml
- name: Check for broken links
  uses: ruzickap/action-my-broken-link-checker@v2
  with:
    url: https://www.example.com/test123
    pages_path: ./build/
    cmd_params: '--buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --header="User-Agent:curl/7.54.0" --timeout=20' # muffet parameters
```
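The local web server is described as using the hostname from the URL parameter. As a rough illustration of that one step, the sketch below pulls the host out of a URL with plain shell parameter expansion. The function name url_host is hypothetical and this is not the code used in entrypoint.sh; it also assumes a URL without user info or an IPv6 literal.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from entrypoint.sh): print the host part of a URL.
url_host() {
  local rest="${1#*://}"   # drop the scheme, e.g. "https://"
  rest="${rest%%/*}"       # drop the path, e.g. "/test123"
  rest="${rest%%:*}"       # drop an optional port, e.g. ":443"
  printf '%s\n' "$rest"
}

url_host "https://www.example.com/test123"   # prints "www.example.com"
```

With the url value from the example above, the pages in ./build would therefore be served under the name www.example.com.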

Do you want to skip the Docker build step? OK, script mode is also available:

```yaml
- name: Check for broken links
  env:
    INPUT_URL: https://www.example.com/test123
    INPUT_PAGES_PATH: ./build/
    INPUT_CMD_PARAMS: '--buffer-size=8192 --max-connections=10 --color=always --header="User-Agent:curl/7.54.0" --skip-tls-verification' # --skip-tls-verification is a mandatory parameter when using https together with "PAGES_PATH"
  run: wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash
```

Parameters

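Environment variables used by the ./entrypoint.sh script. When the action is referenced with uses:, the same settings are passed as the inputs cmd_params, debug, pages_path, and url, as in the examples above.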

| Variable | Default | Description |
| --- | --- | --- |
| INPUT_CMD_PARAMS | --buffer-size=8192 --max-connections=10 --color=always --verbose | Command-line parameters for the URL checker muffet |
| INPUT_DEBUG | false | Enable debug mode for the ./entrypoint.sh script (set -x) |
| INPUT_PAGES_PATH | | Relative path to the directory with local web pages |
| INPUT_URL | (Mandatory / Required) | URL that will be checked |
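To make the defaults above concrete, here is a minimal sketch of how a shell script could resolve these variables. It is an illustration only: resolve_params is a hypothetical function and is not copied from entrypoint.sh, which may handle the variables differently.

```shell
#!/usr/bin/env bash
# Illustrative only: resolve_params is a hypothetical helper, not code from entrypoint.sh.
resolve_params() {
  # INPUT_URL has no default; stop with an error if it is unset or empty.
  : "${INPUT_URL:?INPUT_URL must be set}"

  # Fall back to the documented defaults for the optional variables.
  local cmd_params="${INPUT_CMD_PARAMS:---buffer-size=8192 --max-connections=10 --color=always --verbose}"
  local debug="${INPUT_DEBUG:-false}"
  local pages_path="${INPUT_PAGES_PATH:-}"

  printf 'url=%s\n' "$INPUT_URL"
  printf 'cmd_params=%s\n' "$cmd_params"
  printf 'debug=%s\n' "$debug"
  printf 'pages_path=%s\n' "$pages_path"
}

# Usage: only the mandatory variable is set, so the defaults apply to the rest.
export INPUT_URL="https://www.example.com"
resolve_params
```

The `${VAR:-default}` form keeps any value the caller exported and substitutes the default otherwise, which mirrors how the table separates optional settings from the single required one.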

Example of Periodic checks

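Workflow for periodic link checking. It runs every Wednesday at 03:03 UTC, on manual dispatch, and whenever the workflow file itself changes: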

```yaml
name: periodic-broken-link-checks

on:
  workflow_dispatch:
  push:
    paths:
      - .github/workflows/periodic-broken-link-checks.yml
  schedule:
    - cron: '3 3 * * 3'

jobs:
  broken-link-checker:
    runs-on: ubuntu-latest
    steps:

      - name: Setup Pages
        id: pages
        uses: actions/configure-pages@v3

      - name: Check for broken links
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: ${{ steps.pages.outputs.base_url }}
          cmd_params: '--buffer-size=8192 --max-connections=10 --color=always --header="User-Agent:curl/7.54.0" --timeout=20'
```

Full example

GitHub Action example:

```yaml
name: Checks

on:
  push:
    branches:
      - main

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Create web page
        run: |
          mkdir -v public
          cat > public/index.html << EOF
          <!DOCTYPE html>
          <html>
          <head>
          My page, which will be stored on the my-testing-domain.com domain
          </head>
          <body>
          Links:
          <ul>
            <li><a href="https://my-testing-domain.com">https://my-testing-domain.com</a></li>
            <li><a href="https://my-testing-domain.com:443">https://my-testing-domain.com:443</a></li>
          </ul>
          </body>
          </html>
          EOF

      - name: Check links using script
        env:
          INPUT_URL: https://my-testing-domain.com
          INPUT_PAGES_PATH: ./public/
          INPUT_CMD_PARAMS: '--skip-tls-verification --verbose --color=always'
          INPUT_DEBUG: true
        run: wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

      - name: Check links using container
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://my-testing-domain.com
          pages_path: ./public/
          cmd_params: '--skip-tls-verification --verbose --color=always'
          debug: true
```

Best practices

Let's try to automate the creation of web pages as much as possible.

The ideal situation requires a repository naming convention where the name of the GitHub repository matches the URL where it will be hosted.

GitHub Pages with custom domain

The repository must be named after the domain it serves, in this case awsug.cz. The web pages are then published through GitHub Pages under that custom domain.

The GitHub Action file may look like:

```yaml
name: hugo-build

on:
  pull_request:
    types: [opened, synchronize]
  push:

jobs:
  hugo-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Checkout submodules
        shell: bash
        run: |
          auth_header="$(git config --local --get http.https://github.com/.extraheader)"
          git submodule sync --recursive
          git -c "http.extraheader=$auth_header" -c protocol.version=2 submodule update --init --force --recursive --depth=1

      - name: Setup Hugo
        uses: peaceiris/actions-hugo@v2
        with:
          hugo-version: '0.62.0'

      - name: Build
        run: |
          hugo --gc
          cp LICENSE README.md public/
          echo "${{ github.event.repository.name }}" > public/CNAME

      - name: Check for broken links
        env:
          INPUT_URL: https://${{ github.event.repository.name }}
          INPUT_PAGES_PATH: public
          INPUT_CMD_PARAMS: '--verbose --buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --exclude="(mylabs.dev|linkedin.com)"'
        run: |
          wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

      - name: Check links using container
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://my-testing-domain.com
          pages_path: ./public/
          cmd_params: '--verbose --buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --header="User-Agent:curl/7.54.0" --exclude="(mylabs.dev|linkedin.com)"'
          debug: true

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        # Deploy only for pushes to main; the whole condition is one expression
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        with:
          # v3 of this action reads its settings from inputs, not environment variables
          deploy_key: ${{ secrets.ACTIONS_DEPLOY_KEY }}
          publish_branch: gh-pages
          publish_dir: public
          force_orphan: true
```

This example uses Hugo.

GitHub Pages with github.io domain

The repository name, k8s-harbor, must match the path segment that follows ruzickap.github.io in the published URL. In this example, the web pages are served from GitHub's github.io domain.

```yaml
name: vuepress-build-check-deploy

on:
  pull_request:
    types: [opened, synchronize]
    paths:
      - .github/workflows/vuepress-build-check-deploy.yml
      - docs/**
      - package.json
      - package-lock.json
  push:
    paths:
      - .github/workflows/vuepress-build-check-deploy.yml
      - docs/**
      - package.json
      - package-lock.json

jobs:
  vuepress-build-check-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install Node.js 12
        uses: actions/setup-node@v1
        with:
          node-version: 12.x

      - name: Install VuePress and build the document
        run: |
          npm install
          npm run build
          cp LICENSE docs/.vuepress/dist
          sed -e "s@(part-@(https://github.com/${GITHUB_REPOSITORY}/tree/main/docs/part-@" -e 's@.\/.vuepress\/public\/@./@' docs/README.md > docs/.vuepress/dist/README.md
          ln -s docs/.vuepress/dist ${{ github.event.repository.name }}

      - name: Check for broken links
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://${{ github.repository_owner }}.github.io/${{ github.event.repository.name }}
          pages_path: .
          cmd_params: '--exclude=mylabs.dev --max-connections-per-host=5 --rate-limit=5 --timeout=20 --header="User-Agent:curl/7.54.0" --skip-tls-verification'

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        # Deploy only for pushes to main; the whole condition is one expression
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        with:
          # v3 of this action reads its settings from inputs, not environment variables
          deploy_key: ${{ secrets.ACTIONS_DEPLOY_KEY }}
          publish_branch: gh-pages
          publish_dir: ./docs/.vuepress/dist
          force_orphan: true
```

In this case I'm using VuePress to create my page.

Both examples can be used as a generic template, and you do not need to change them for your projects.
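The naming convention behind both workflows can be sketched in a few lines of shell. The helper below is hypothetical and not part of this action; it assumes that a repository name containing a dot is meant to be a custom domain and that any other name is published under the owner's github.io address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the URL to check from an "owner/repo" string.
pages_url() {
  local owner="${1%%/*}"   # text before the first slash
  local repo="${1#*/}"     # text after the first slash
  case "$repo" in
    # Assumption: a dot in the name means the repository is named after its domain.
    *.*) printf 'https://%s\n' "$repo" ;;
    # Otherwise the pages live under <owner>.github.io/<repo>.
    *)   printf 'https://%s.github.io/%s\n' "$owner" "$repo" ;;
  esac
}

pages_url "ruzickap/awsug.cz"     # prints "https://awsug.cz"
pages_url "ruzickap/k8s-harbor"   # prints "https://ruzickap.github.io/k8s-harbor"
```

In a workflow, the GITHUB_REPOSITORY environment variable already holds the owner/repo string, so the same idea applies without hard-coding any names.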

Running locally

It's possible to use the checking script locally. It will install Caddy and Muffet binaries if they are not already installed on your system.

```bash
export INPUT_URL="https://debian.cz/info/"
export INPUT_CMD_PARAMS="--buffer-size=8192 --ignore-fragments --one-page-only --max-connections=10 --color=always --verbose"
./entrypoint.sh
```

Output (this sample log was captured from a run against https://www.mkdocs.org):

```text
*** INFO: [2024-01-26 05:12:20] Start checking: "https://www.mkdocs.org"
https://www.mkdocs.org/
    200 https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/highlight.min.js
    200 https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/languages/django.min.js
    200 https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/languages/yaml.min.js
    200 https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/styles/github.min.css
    200 https://github.com/mkdocs/catalog#-theming
    200 https://github.com/mkdocs/mkdocs/blob/master/docs/index.md
    200 https://github.com/mkdocs/mkdocs/wiki/MkDocs-Themes
    200 https://twitter.com/starletdreaming
    200 https://www.googletagmanager.com/gtag/js?id=G-274394082
    200 https://www.mkdocs.org/
    200 https://www.mkdocs.org/#mkdocs
    200 https://www.mkdocs.org/about/contributing/
    200 https://www.mkdocs.org/about/license/
    200 https://www.mkdocs.org/about/release-notes/
    200 https://www.mkdocs.org/about/release-notes/#maintenance-team
    200 https://www.mkdocs.org/assets/_mkdocstrings.css
    200 https://www.mkdocs.org/css/base.css
    200 https://www.mkdocs.org/css/bootstrap.min.css
    200 https://www.mkdocs.org/css/extra.css
    200 https://www.mkdocs.org/css/font-awesome.min.css
    200 https://www.mkdocs.org/dev-guide/
    200 https://www.mkdocs.org/dev-guide/api/
    200 https://www.mkdocs.org/dev-guide/plugins/
    200 https://www.mkdocs.org/dev-guide/themes/
    200 https://www.mkdocs.org/dev-guide/translations/
    200 https://www.mkdocs.org/getting-started/
    200 https://www.mkdocs.org/img/favicon.ico
    200 https://www.mkdocs.org/js/base.js
    200 https://www.mkdocs.org/js/bootstrap.min.js
    200 https://www.mkdocs.org/js/jquery-3.6.0.min.js
    200 https://www.mkdocs.org/search/main.js
    200 https://www.mkdocs.org/user-guide/
    200 https://www.mkdocs.org/user-guide/choosing-your-theme
    200 https://www.mkdocs.org/user-guide/choosing-your-theme/
    200 https://www.mkdocs.org/user-guide/choosing-your-theme/#mkdocs
    200 https://www.mkdocs.org/user-guide/choosing-your-theme/#readthedocs
    200 https://www.mkdocs.org/user-guide/cli/
    200 https://www.mkdocs.org/user-guide/configuration/
    200 https://www.mkdocs.org/user-guide/configuration/#markdown_extensions
    200 https://www.mkdocs.org/user-guide/configuration/#plugins
    200 https://www.mkdocs.org/user-guide/customizing-your-theme/
    200 https://www.mkdocs.org/user-guide/deploying-your-docs/
    200 https://www.mkdocs.org/user-guide/installation/
    200 https://www.mkdocs.org/user-guide/localizing-your-theme/
    200 https://www.mkdocs.org/user-guide/writing-your-docs/
*** INFO: [2024-01-26 05:12:21] Checks completed...
```
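Since the script installs Caddy and Muffet only when they are missing, it can be handy to see beforehand which of the two are already on the PATH. The snippet below is a small sketch for that purpose; missing_tools is a hypothetical helper and does not reproduce the detection logic in entrypoint.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not found on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Lists whichever of the two binaries the script would still need to install.
missing_tools caddy muffet
```

An empty result means both binaries are already available.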

Another example is checking a web page stored locally on your disk. In this case, I'm using the web page created in the ./tests/ directory from this Git repository:

```bash
export INPUT_URL="https://my-testing-domain.com"
export INPUT_PAGES_PATH="${PWD}/tests/"
export INPUT_CMD_PARAMS="--skip-tls-verification --verbose --color=always"
./entrypoint.sh
```

Output:

```text
*** INFO: Using path "/home/pruzicka/git/action-my-broken-link-checker/tests/" as domain "my-testing-domain.com" with URI "https://my-testing-domain.com"
*** INFO: [2019-12-30 14:54:22] Start checking: "https://my-testing-domain.com"
https://my-testing-domain.com/
    200 https://my-testing-domain.com
    200 https://my-testing-domain.com/run_tests.sh
    200 https://my-testing-domain.com:443
    200 https://my-testing-domain.com:443/run_tests.sh
https://my-testing-domain.com:443/
    200 https://my-testing-domain.com
    200 https://my-testing-domain.com/run_tests.sh
    200 https://my-testing-domain.com:443
    200 https://my-testing-domain.com:443/run_tests.sh
*** INFO: [2019-12-30 14:54:22] Checks completed...
```
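When a verbose log like the one above is saved to a file, the entries with a status other than 200 can be pulled out with a small filter. The sketch below is an assumption-laden example: non_ok_links is a hypothetical helper, it expects a log captured without color codes, it only recognizes lines that start with a three-digit status, and the 404 line in the sample input is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "status url" for result lines whose status is not 200.
non_ok_links() {
  awk '$1 ~ /^[0-9][0-9][0-9]$/ && $1 != "200" { print $1, $2 }'
}

# Sample input in the log format shown above; the 404 entry is invented.
printf '%s\n' \
  'https://my-testing-domain.com/' \
  '    200 https://my-testing-domain.com' \
  '    404 https://my-testing-domain.com/missing' | non_ok_links
# prints "404 https://my-testing-domain.com/missing"
```

Failures that are not reported as an HTTP status, such as DNS or connection errors, would not be matched by this filter.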

Examples

Some other examples of building and checking web pages using Static Site Generators and GitHub Actions can be found here: https://github.com/peaceiris/actions-gh-pages/.

The following links contain real examples of My Broken Link Checker: