Scalable self-hosted runner system for GitHub Actions
An updated step-by-step guide on how to deploy a system capable of creating self-hosted runners on demand!
Hello everyone! This article is intended for organizations that develop on private repositories and for which the minutes included with GitHub are not enough for their CI/CD needs. It is a compilation of the steps performed in the GitHub office hours workshop videos. You can see the whole playlist here and the companion repository here. However, I wrote this article for those who do not want to watch all three videos, which are a bit lengthy. Just note that the platform being used is Microsoft Azure.
Let’s begin!
Organize deployments
We are going to create a resource group called GithubActionsRunners. This resource group will contain all the Azure services required to deploy our autoscaling solution for GitHub runners.
You should have the Azure CLI installed; otherwise, check here for how to do that. First, log in to your Azure account:
az login
Now, on your shell run the following commands:
az account set --subscription <subscription>
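If you are not sure which subscription ID to pass to the previous command, you can list the subscriptions available to your account first:
az account list --output table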
Create the resource group:
az group create --name GithubActionsRunners --location <region>
Create the Azure Kubernetes cluster
As my GitHub Actions workloads are not very resource intensive, I will stick to the predefined Standard_DS2 node type for my worker pool. With this command we can create an Azure Kubernetes Service (AKS) cluster:
az aks create \
--resource-group GithubActionsRunners \
--name GithubActionsRunnersAKS \
--enable-addons monitoring \
--node-count 3 \
--node-vm-size Standard_DS2 \
--generate-ssh-keys
Connecting to AKS cluster
Enter the following command:
az aks get-credentials --resource-group GithubActionsRunners \
--name GithubActionsRunnersAKS
This will enable you to use kubectl commands on your AKS cluster. Make sure you have kubectl installed; if not, check this out.
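A quick way to confirm that the credentials work is to list the cluster nodes; you should see the three nodes created earlier:
kubectl get nodes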
Provisioning the networking infrastructure
We need to create an Application Gateway, assign it a public IP, and connect it to our AKS cluster. This way, we can expose a webhook server that allows the controller to spin up our future runners on demand.
First, create a public IP service:
az network public-ip create \
--resource-group GithubActionsRunners \
--name APGWPublicIp \
--allocation-method Static \
--sku Standard
Then, create a VNet for the application gateway. This VNet will be separate from the AKS cluster’s VNet to avoid any overlap in IP assignment.
az network vnet create --name appgwVNet \
--resource-group GithubActionsRunners \
--address-prefix 11.0.0.0/8 \
--subnet-name appgwSubnet \
--subnet-prefix 11.1.0.0/16
Now, create the application gateway itself:
az network application-gateway create \
--resource-group GithubActionsRunners \
--name GithubActionsRunnersAPGW \
--location <region> \
--sku Standard_v2 \
--public-ip-address APGWPublicIp \
--vnet-name appgwVNet \
--subnet appgwSubnet
Fetch its ID by running:
APPGW_ID=$(az network application-gateway show --resource-group GithubActionsRunners --name GithubActionsRunnersAPGW --query "id" --output tsv)
and attach it to the AKS cluster to allow ingress:
az aks enable-addons --resource-group GithubActionsRunners \
--name GithubActionsRunnersAKS \
--addons ingress-appgw \
--appgw-id $APPGW_ID
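Optionally, confirm that the add-on was attached by querying the cluster’s add-on profiles. Note that ingressApplicationGateway below is the profile key I would expect for this add-on; adjust the query if your CLI reports it under a different name:
az aks show --resource-group GithubActionsRunners \
--name GithubActionsRunnersAKS \
--query "addonProfiles.ingressApplicationGateway.enabled" \
--output tsv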
Next, run the following commands to gather the information needed to create the VNet peering in both directions between the application gateway and the AKS cluster.
NODERESOURCEGROUP=$(az aks show --name GithubActionsRunnersAKS --resource-group GithubActionsRunners --query "nodeResourceGroup" --output tsv)
AKSVNETNAME=$(az network vnet list --resource-group $NODERESOURCEGROUP --query "[0].name" --output tsv)
AKSVNETID=$(az network vnet show --name $AKSVNETNAME --resource-group $NODERESOURCEGROUP --query "id" --output tsv)
APPGWVNETID=$(az network vnet show --name appgwVNet --resource-group GithubActionsRunners --query "id" --output tsv)
- Application gateway to AKS peering:
az network vnet peering create --name AppGWtoAKSVnetPeering \
--resource-group GithubActionsRunners \
--vnet-name appgwVNet \
--remote-vnet $AKSVNETID \
--allow-vnet-access
- AKS to application gateway peering:
az network vnet peering create --name AKStoAppGWVnetPeering \
--resource-group $NODERESOURCEGROUP \
--vnet-name $AKSVNETNAME \
--remote-vnet $APPGWVNETID \
--allow-vnet-access
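Optionally, verify that the peering on the gateway side was established; peeringState should report Connected:
az network vnet peering show --name AppGWtoAKSVnetPeering \
--resource-group GithubActionsRunners \
--vnet-name appgwVNet \
--query peeringState --output tsv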
Finally, assign a DNS name to the public IP address. Go to the public IP resource in the Azure portal and, under Configuration, set a DNS name label.
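If you prefer the CLI over the portal, the same DNS label can be set directly on the public IP. Here <dns-label> is a placeholder for the name you choose, and the resulting hostname should look like <dns-label>.<region>.cloudapp.azure.com:
az network public-ip update --resource-group GithubActionsRunners \
--name APGWPublicIp \
--dns-name <dns-label>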
Installing cert-manager on AKS cluster
According to the controller’s documentation, the system uses cert-manager for certificate management of the admission webhook. This means we have to install it before configuring the controller on our cluster. Run the following:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.yaml -n cert-manager
This will install cert-manager under the cert-manager namespace. We should verify that the installation was successful by checking the pods:
kubectl get pods --namespace cert-manager
Wait for a few seconds for the deployments to finish. We should have an output similar to this:
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5c6866597-zw7kh               1/1     Running   0          2m
cert-manager-cainjector-577f6d9fd7-tr77l   1/1     Running   0          2m
cert-manager-webhook-787858fcdb-nlzsq      1/1     Running   0          2m
Next, create an Issuer to test that the webhook works. Create a YAML file called test-resources.yaml and paste this:
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager-test
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: test-selfsigned
  namespace: cert-manager-test
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-cert
  namespace: cert-manager-test
spec:
  dnsNames:
    - example.com
  secretName: selfsigned-cert-tls
  issuerRef:
    name: test-selfsigned
Create the test resources:
kubectl apply -f test-resources.yaml
Now check the status of the newly created certificate:
kubectl describe certificate -n cert-manager-test
We should have an output similar to this:
...
Spec:
  Common Name:  example.com
  Issuer Ref:
    Name:  test-selfsigned
  Secret Name:  selfsigned-cert-tls
Status:
  Conditions:
    Last Transition Time:  2019-01-29T17:34:30Z
    Message:               Certificate is up to date and has not expired
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2019-04-29T17:34:29Z
Events:
  Type    Reason      Age   From          Message
  ----    ------      ----  ----          -------
  Normal  CertIssued  4s    cert-manager  Certificate issued successfully
Finally, clean up the test resources:
kubectl delete -f test-resources.yaml
We have verified that cert-manager works!
Create the ingress and production issuer
Now that the certificate manager works, we need to create a real Issuer and an Ingress resource. Create a YAML manifest called ingress-tls.yaml with the following content (replace the hosts with your hostname):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-main
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
    cert-manager.io/cluster-issuer: letsencrypt-prod
    appgw.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - <your-hostname-defined-in-APGWPublicIp>
      secretName: tlsingress
  rules:
    - host: <your-hostname-defined-in-APGWPublicIp>
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: test-app
                port:
                  number: 80
Now, let’s define a real Issuer by creating the following YAML; call it cluster-issuer-prod.yaml:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <your-email>
    privateKeySecretRef:
      name: tlssprod
    solvers:
      - http01:
          ingress:
            name: ingress-main
Do not apply these resources just yet! Notice that there is a test-app service assigned to the path /? We will deploy a test app to ensure that the ingress is working correctly before moving on to installing the controller. Create a YAML file called test-app.yaml and paste this into it:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
        - name: test-app
          image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
          ports:
            - containerPort: 80
              protocol: TCP
          env:
            - name: TITLE
              value: "AKS Ingress test is successful!"
---
apiVersion: v1
kind: Service
metadata:
  name: test-app
  namespace: default
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  selector:
    app: test-app
Deploy the test app to your cluster:
kubectl apply -f test-app.yaml -n default
Then, apply the ingress and the issuer to AKS using:
kubectl apply -f ingress-tls.yaml
kubectl apply -f cluster-issuer-prod.yaml
Now, check the ingress:
kubectl get ingress
You should be able to see your hostname and public IP address. If you open your browser and navigate to your hostname URL, you should see the AKS hello-world welcome page.
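If the browser complains about the certificate, it may simply not have been issued yet. You can check its status through cert-manager; the Certificate created for the ingress is typically named after the TLS secret, tlsingress in our case:
kubectl get certificate
kubectl describe certificate tlsingress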
This means that we are on the right track and we can move on to configure our webhook and controller!
Configuring GitHub App
This step needs to be done prior to installing the actions runner controller on your Kubernetes cluster, because some secrets are needed for the installation. First, we need to install the GitHub CLI using conda:
conda install gh --channel conda-forge
Now, let’s install a GitHub CLI extension for token management called gh-token. We need its dependency, jq, to be installed first:
wget https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64
sudo chmod +x ./jq-linux64
mv ./jq-linux64 jq
sudo mv ./jq /usr/bin
With that set up, install the GH Token extension:
gh auth login
gh extension install Link-/gh-token
We also need to install jwt-cli, a command-line tool that helps us work with JSON Web Tokens:
wget https://github.com/mike-engel/jwt-cli/releases/download/4.0.0/jwt-linux.tar.gz
tar xvf jwt-linux.tar.gz
sudo mv jwt /usr/bin/
rm -rf ./jwt-linux.tar.gz
Next, on GitHub, go to your organization’s settings, select Developer settings, then GitHub Apps, and click New GitHub App. Give it the following information:
- Name for the App.
- Homepage URL of your organization.
- Uncheck Webhook just for now. We are going to enable that later.
- Permissions
- Actions: Read-only
- Contents: Read-only
- Metadata: Read-only
- Self-hosted runners: Read and Write
Then select “Create GitHub App”. You will be redirected to the app settings, where you should see an “Install App” tab on the left side. Select it, and press the “Install” button.
After the app is installed, in the “About” section, copy the App ID.
Scroll down and generate a new private key. Save it in a secure location on your computer, then use the App ID and the private key with the gh token command to get the installation ID:
gh token installations -i <App-ID> -k <path-to-private-key>
If you encounter a message like this:
{
"message": "'Expiration time' claim ('exp') is too far in the future",
"documentation_url": "https://docs.github.com/rest"
}
It is probably because of the difference between the GitHub server time and your local time. In that case, you might want to try this command:
sudo date +%T -s $(curl -sI https://api.github.com | grep -Fi "date" | awk '{ print $6 }') -u
and re-run the gh token command again. Get the value from the "id" key in the JSON output.
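Since jq is already installed, you can also extract the installation ID directly. This assumes the command prints a JSON array of installation objects; adjust the filter if yours returns a single object:
gh token installations -i <App-ID> -k <path-to-private-key> | jq '.[].id'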
We are finally ready to install the actions-runner-controller!
Installing actions-runner-controller using Helm
Refer to this documentation on how to install the Helm package manager. Use the following commands to add the actions-runner-controller repository:
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
helm repo update
We will need to create a values.yaml manifest in which we are going to insert the three secrets obtained (App ID, installation ID, and private key) plus an additional random token as a webhook secret. It is a pretty long file, so instead of pasting it in this article, you can copy the example file from the original repository here. I suggest you press CTRL+F after the file is copied and search for the following keys:
- github_app_id
- github_app_installation_id
- github_app_private_key
- github_webhook_secret_token
Then, insert the corresponding values next to those keys. For the additional github_webhook_secret_token key, I used the Python standard-library module secrets to generate a random token:
import secrets
# webhook_token variable contains the random token
webhook_token = secrets.token_urlsafe(16)
print(webhook_token) # copy the output value
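If you prefer to generate the token from the shell instead, either of these one-liners works (the openssl variant simply prints a random hex string):
python3 -c 'import secrets; print(secrets.token_urlsafe(16))'
openssl rand -hex 16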
Save the manifest. With all the parameters set up, run the following to deploy the controller:
helm upgrade --install -f "path/to/values.yaml" \
--namespace default \
--create-namespace --wait \
actions-runner-controller \
"actions-runner-controller/actions-runner-controller"
Check that the deployments were successful:
kubectl get pods -n default
We should have the following output:
NAME                                                              READY   STATUS    RESTARTS   AGE
actions-runner-controller-546dfcc8b8-wc7mt                        2/2     Running   0          20s
actions-runner-controller-github-webhook-server-6bb9654dc8ckjgk   2/2     Running   0          20s
test-app-65d7765fdd-6rff6                                         1/1     Running   0          16h
Updating the Ingress for our actions runner controller
Recall our ingress manifest called ingress-tls.yaml. Search for the http key. We want to insert the following path, /runners-scaler, right below the existing path / where our test app is:
          - path: /runners-scaler
            pathType: Prefix
            backend:
              service:
                name: actions-runner-controller-github-webhook-server
                port:
                  number: 80
Then, just update the ingress:
kubectl apply -f ingress-tls.yaml
Verify that everything is working correctly:
kubectl describe ingress ingress-main
On the output, search for Events and you should see something similar to this:
Events:
  Type    Reason             Age   From          Message
  ----    ------             ----  ----          -------
  Normal  CreateCertificate  14s   cert-manager  Successfully created Certificate "tlsingress"
Next, use your browser or curl -G with your URL plus the route that was just created, i.e. <your-url>/runners-scaler.
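For example, from the terminal (replace the hostname with the DNS name you configured earlier):
curl -G https://<your-hostname>/runners-scaler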
It should return a message saying Webhook server is running. That means everything is OK. Eureka!
Adding the Webhook URL to the GitHub APP
Go back to the app settings of your GitHub App and re-enable the webhook (remember we unchecked it earlier). In the “Webhook URL” textbox, insert your URL with the /runners-scaler route. Right after that, there is a textbox called “Webhook secret”. Paste the token from your values.yaml file, the one generated with the previous Python snippet.
Save the changes. Finally, on the “Permissions & Events” tab, scroll down to “Subscribe to events” and check:
- Workflow job
- Workflow dispatch
- Workflow Run
Save your changes again. On the “Advanced” tab, you should see recent webhook deliveries with successful responses, which means that everything is good!
Spinning up the runners
There are two ways the runners can be created: one is using deployments of kind Runner and the other is using RunnerDeployment. In this tutorial, we are using RunnerDeployment, as it allows us to manage sets of runners; the Runner kind only manages them individually.
RunnerDeployment
We need to create a YAML manifest for the deployment; let’s call it basic-runners.yaml. Start by including the RunnerDeployment kind:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: azure-runners
  namespace: default
spec:
  replicas: 0
  template:
    spec:
      organization: <your-organization>
      labels:
        - label1
        - label2
      env:
        - name: RUNNER_FEATURE_FLAG_EPHEMERAL
          value: "true"
Without autoscaling, we can manually define the number of runners to create by setting the replicas key inside the spec key.
HorizontalRunnerAutoscaler
Autoscaling is implemented by backing a RunnerDeployment resource with a HorizontalRunnerAutoscaler kind. With this, every time there is a webhook event, runners can be created automatically and then destroyed when finished. We need to append the specification for the autoscaler to the basic-runners.yaml manifest:
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: azure-runners-deployment-autoscaler
  namespace: default
spec:
  scaleTargetRef:
    name: azure-runners
  minReplicas: 0
  maxReplicas: 6
  scaleUpTriggers:
    - githubEvent: {}
      duration: "1m"
and apply the resources:
kubectl apply -f basic-runners.yaml
With the above example, the webhook server scales azure-runners by one replica, and destroys that runner one minute after it has finished its job.
By now you should be able to automatically create runners whenever a GitHub workflow is queued on your Actions tab. Only the jobs that match the labels we set in basic-runners.yaml will be executed by our runners (you can see in the YAML file that there are two labels, label1 and label2).
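To watch the scaling in action, you can list the controller’s custom resources while a workflow is queued; the resource names below are the ones defined by the actions-runner-controller CRDs:
kubectl get runnerdeployments,horizontalrunnerautoscalers -n default
kubectl get runners -n default --watch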
Note: You can also create customized runners if you need certain tools or dependencies installed before doing any job. But, that is a topic for another article.
If you made it through here, thank you for reading my post. I know there are lots of steps, but it is really worth considering if you need a scalable self-hosted runner solution for GitHub Actions. Check out my other articles if you like.
See you on the next one!