Self-hosted GitHub Actions runners on Google Kubernetes Engine (GKE) that need to access AWS resources present an authentication challenge: the pods run in one cloud and the credentials live in another. This guide walks through setting up OIDC federation between GKE and AWS, troubleshooting common issues, and configuring tools like SOPS/helm-secrets to work with the federated credentials.
The Challenge
You have:
- Self-hosted GitHub Actions runners on GKE (using actions-runner-controller)
- Secrets encrypted with SOPS using AWS KMS
- A need to access AWS resources without storing long-lived credentials
The solution: Workload Identity Federation - allowing GKE pods to authenticate directly with AWS using OIDC tokens.
Architecture Overview
┌─────────────────────────┐     ┌─────────────────────────┐     ┌─────────────────────────┐
│       GKE Cluster       │     │         AWS IAM         │     │      AWS Resources      │
│                         │     │                         │     │                         │
│  ┌───────────────────┐  │     │  ┌───────────────────┐  │     │  ┌───────────────────┐  │
│  │ GitHub Runner Pod │  │     │  │   OIDC Provider   │  │     │  │     KMS Keys      │  │
│  │                   │  │     │  │   (GKE Issuer)    │  │     │  │    S3 Buckets     │  │
│  │ ┌───────────────┐ │  │     │  └─────────┬─────────┘  │     │  │  Other Services   │  │
│  │ │ Projected SA  │─┼──┼────▶│            │            │     │  └───────────────────┘  │
│  │ │ Token         │ │  │     │  ┌─────────▼─────────┐  │     │            ▲            │
│  │ │ (aud: sts.    │ │  │     │  │     IAM Role      │──┼─────┼────────────┘            │
│  │ │  amazonaws.   │ │  │     │  │  (Trust Policy)   │  │     │                         │
│  │ │  com)         │ │  │     │  └───────────────────┘  │     │                         │
│  │ └───────────────┘ │  │     │                         │     │                         │
│  └───────────────────┘  │     │                         │     │                         │
└─────────────────────────┘     └─────────────────────────┘     └─────────────────────────┘
Flow:
- GKE issues a service account token with audience: sts.amazonaws.com
- Pod presents token to AWS STS via AssumeRoleWithWebIdentity
- AWS validates token against registered OIDC provider
- AWS issues temporary credentials for the specified IAM role
- Pod uses credentials to access AWS resources (KMS, S3, etc.)
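The middle of this flow comes down to one HTTPS request to STS that carries the token in place of signed AWS credentials. As a rough sketch, the request body could be assembled as follows. The function name, the default session name, and the placeholder role and token are illustrative assumptions; in practice the AWS CLI and SDKs build this call for you.

```python
from urllib.parse import urlencode

STS_ENDPOINT = "https://sts.amazonaws.com/"  # global STS endpoint


def build_assume_role_request(role_arn, token, session_name="gke-runner"):
    """Return the form-encoded body for an AssumeRoleWithWebIdentity call.

    The request is not signed with AWS credentials: the OIDC token
    itself is what STS validates against the registered provider.
    """
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,  # hypothetical default name
        "WebIdentityToken": token,
    }
    return urlencode(params)


if __name__ == "__main__":
    body = build_assume_role_request(
        "arn:aws:iam::123456789012:role/MyRole", "header.payload.signature"
    )
    print("POST", STS_ENDPOINT)
    print(body)
```

Seeing the raw parameters makes the later troubleshooting easier: every failure mode in this guide is either a problem with WebIdentityToken (wrong claims, expired) or with how RoleArn trusts the provider.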
Step 1: Configure GKE Runner Pods with Projected Tokens
The default Kubernetes service account token has the wrong audience for AWS. You need to mount a projected service account token with audience: sts.amazonaws.com.
Helm Chart Configuration
Add this to your actions-runner-controller Helm chart:
values.yaml:
runnerDefaults:
  awsOidc:
    enabled: false
    audience: "sts.amazonaws.com"
    expirationSeconds: 86400
    roleArn: ""  # Optional: set default or let workflows specify
templates/runnerdeployment.yaml:
spec:
  template:
    spec:
      # Mount projected token for AWS OIDC
      {{- if .Values.runnerDefaults.awsOidc.enabled }}
      volumeMounts:
        - name: aws-iam-token
          mountPath: /var/run/secrets/aws
          readOnly: true
      volumes:
        - name: aws-iam-token
          projected:
            sources:
              - serviceAccountToken:
                  audience: {{ .Values.runnerDefaults.awsOidc.audience }}
                  expirationSeconds: {{ .Values.runnerDefaults.awsOidc.expirationSeconds }}
                  path: token
      env:
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/aws/token
        {{- if .Values.runnerDefaults.awsOidc.roleArn }}
        - name: AWS_ROLE_ARN
          value: {{ .Values.runnerDefaults.awsOidc.roleArn | quote }}
        {{- end }}
      {{- end }}
Verify Token Claims
After deploying, verify the token has correct claims:
kubectl exec -n arc-system <runner-pod> -c runner -- \
cat /var/run/secrets/aws/token | \
cut -d. -f2 | base64 -d 2>/dev/null | jq '{sub, aud, iss}'
Expected output:
{
"sub": "system:serviceaccount:arc-system:default",
"aud": ["sts.amazonaws.com"],
"iss": "https://container.googleapis.com/v1/projects/<PROJECT>/locations/<LOCATION>/clusters/<CLUSTER>"
}
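JWT segments are unpadded base64url, which plain base64 -d can reject or truncate depending on the payload length. If the shell pipeline gives odd results, a small Python decoder is a more forgiving alternative. This is a minimal sketch: the function names are made up for illustration, and the token signature is not verified.

```python
import base64
import json


def decode_jwt_claims(token):
    """Decode the (unverified) payload segment of a JWT into a dict."""
    payload = token.strip().split(".")[1]
    # JWTs use base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def summarize(token):
    """Return only the claims AWS cares about for federation."""
    claims = decode_jwt_claims(token)
    return {name: claims.get(name) for name in ("sub", "aud", "iss")}
```

On a runner, something like summarize(open("/var/run/secrets/aws/token").read()) should return the same three fields as the jq command above.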
Step 2: Register GKE as OIDC Provider in AWS
Get Cluster OIDC Issuer
gcloud container clusters describe <CLUSTER_NAME> \
--location=<LOCATION> \
--format="value(selfLink)"
The issuer URL has the following format and must match the iss claim of the projected token from Step 1 exactly:
https://container.googleapis.com/v1/projects/<PROJECT>/locations/<LOCATION>/clusters/<CLUSTER>
Get Current Certificate Thumbprint
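AWS needs a SHA-1 thumbprint from the TLS certificate chain of the host serving the OIDC provider (container.googleapis.com). The command below fingerprints the first certificate the server presents; AWS’s documentation describes using the top intermediate CA certificate instead, which is the last one printed when -showcerts is added to openssl s_client.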
echo | openssl s_client -servername container.googleapis.com \
-connect container.googleapis.com:443 2>/dev/null | \
openssl x509 -fingerprint -sha1 -noout | \
sed 's/://g' | cut -d= -f2 | tr '[:upper:]' '[:lower:]'
Important: Google rotates certificates periodically. You may need to update the thumbprint when this happens.
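To make the thumbprint format concrete: it is the SHA-1 digest of the certificate's DER bytes, written as 40 lowercase hex characters without colons. The sketch below computes that from a PEM string using only the standard library. The helper names are illustrative, and it does not decide which certificate in the chain you should fingerprint.

```python
import base64
import hashlib


def pem_to_der(pem):
    """Strip the PEM armor lines and base64-decode the body."""
    body = "".join(
        line.strip()
        for line in pem.strip().splitlines()
        if line.strip() and not line.startswith("-----")
    )
    return base64.b64decode(body)


def sha1_thumbprint(pem):
    """Lowercase hex SHA-1 of the DER-encoded certificate, no colons."""
    return hashlib.sha1(pem_to_der(pem)).hexdigest()
```

For a quick comparison against the openssl output, ssl.get_server_certificate(("container.googleapis.com", 443)) from the standard library returns the server's leaf certificate as PEM, which can be passed straight to sha1_thumbprint.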
Create OIDC Provider in AWS
Using AWS CLI:
aws iam create-open-id-connect-provider \
--url "https://container.googleapis.com/v1/projects/<PROJECT>/locations/<LOCATION>/clusters/<CLUSTER>" \
--client-id-list "sts.amazonaws.com" \
--thumbprint-list "<THUMBPRINT>"
Alternatively, manage the provider with Terraform or Pulumi so that it lives alongside the rest of your infrastructure-as-code.
Step 3: Configure IAM Role Trust Policy
Create an IAM role with a trust policy that allows your GKE service account to assume it:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GKEWorkloadIdentity",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<AWS_ACCOUNT>:oidc-provider/container.googleapis.com/v1/projects/<GCP_PROJECT>/locations/<LOCATION>/clusters/<CLUSTER>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "container.googleapis.com/v1/projects/<GCP_PROJECT>/locations/<LOCATION>/clusters/<CLUSTER>:sub": "system:serviceaccount:<NAMESPACE>:<SERVICE_ACCOUNT>",
          "container.googleapis.com/v1/projects/<GCP_PROJECT>/locations/<LOCATION>/clusters/<CLUSTER>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
Key fields:
- :sub - The Kubernetes service account (system:serviceaccount:arc-system:default for runners using the default SA)
- :aud - Must match the token audience (sts.amazonaws.com)
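Because the issuer path appears three times in the policy (once in the principal and once in each condition key), it is easy to mistype one copy. One way to avoid that is to generate the document from a single issuer URL. The following is a sketch only: the function name and arguments are assumptions, not part of any AWS tooling.

```python
import json


def build_trust_policy(account_id, issuer_url, namespace, service_account,
                       audience="sts.amazonaws.com"):
    """Build the trust policy shown above from one issuer URL."""
    # The provider ARN and condition keys use the issuer without the scheme.
    provider = issuer_url.split("://", 1)[-1].rstrip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "GKEWorkloadIdentity",
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{provider}:sub":
                    f"system:serviceaccount:{namespace}:{service_account}",
                f"{provider}:aud": audience,
            }},
        }],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = build_trust_policy(
        "123456789012",
        "https://container.googleapis.com/v1/projects/demo/locations/us-east4/clusters/demo",
        "arc-system", "default",
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and supplied as the role's trust policy.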
Step 4: Use in GitHub Actions Workflows
Basic Usage
jobs:
  deploy:
    runs-on: [self-hosted, gke-runner]
    steps:
      - name: Access AWS resources
        env:
          AWS_ROLE_ARN: arn:aws:iam::123456789012:role/MyRole
          AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/aws/token
          AWS_REGION: us-east-1
        run: |
          aws sts get-caller-identity
          aws s3 ls s3://my-bucket/
With aws-actions/configure-aws-credentials
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/MyRole
    aws-region: us-east-1
    web-identity-token-file: /var/run/secrets/aws/token
- name: Access AWS resources
  run: aws s3 ls
Troubleshooting Common Issues
Issue 1: “Not authorized to perform sts:AssumeRoleWithWebIdentity”
Symptoms:
AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
Possible causes and fixes:
A. Thumbprint Mismatch
Google rotates certificates. Check and update the thumbprint:
# Get current thumbprint
echo | openssl s_client -servername container.googleapis.com \
-connect container.googleapis.com:443 2>/dev/null | \
openssl x509 -fingerprint -sha1 -noout | \
sed 's/://g' | cut -d= -f2 | tr '[:upper:]' '[:lower:]'
# Update in AWS
aws iam update-open-id-connect-provider-thumbprint \
--open-id-connect-provider-arn "arn:aws:iam::<ACCOUNT>:oidc-provider/container.googleapis.com/v1/projects/<PROJECT>/locations/<LOCATION>/clusters/<CLUSTER>" \
--thumbprint-list "<NEW_THUMBPRINT>"
B. Wrong Audience
Verify your token has aud: sts.amazonaws.com:
kubectl exec -n arc-system <pod> -c runner -- \
cat /var/run/secrets/aws/token | cut -d. -f2 | base64 -d | jq '.aud'
If it shows a different audience, check your projected volume configuration.
C. Subject Mismatch
The trust policy :sub condition must exactly match the token’s subject:
# Get actual subject from token
kubectl exec -n arc-system <pod> -c runner -- \
cat /var/run/secrets/aws/token | cut -d. -f2 | base64 -d | jq '.sub'
Common format: system:serviceaccount:<namespace>:<service-account-name>
D. Wrong Token File Being Used
Check which token file the AWS SDK is using:
echo $AWS_WEB_IDENTITY_TOKEN_FILE
Another component (such as GKE Workload Identity) may have set this variable to a different path. If so, override it explicitly in your workflow.
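Causes B and C can be checked in one pass by comparing the decoded claims with what the trust policy and OIDC provider expect. The helper below is a sketch under the assumption that you already have the token payload as a dict (for example from the jq commands above). It does not cover thumbprint problems or a wrong token file.

```python
def find_claim_mismatches(claims, expected_iss, expected_sub,
                          expected_aud="sts.amazonaws.com"):
    """Return a list of human-readable mismatches (empty if all match)."""
    problems = []

    # Kubernetes emits aud as a list; tolerate a bare string as well.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_aud not in audiences:
        problems.append(f"aud is {aud!r}, expected {expected_aud!r}")

    if claims.get("sub") != expected_sub:
        problems.append(
            f"sub is {claims.get('sub')!r}, expected {expected_sub!r}")

    # The registered OIDC provider URL must equal the iss claim.
    if claims.get("iss") != expected_iss:
        problems.append(
            f"iss is {claims.get('iss')!r}, expected {expected_iss!r}")

    return problems
```

An empty list means the token side is consistent, which points the investigation back at the thumbprint or at which token file the SDK is reading.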
Issue 2: SOPS “role ARN is not set”
Symptoms:
could not load AWS config: role ARN is not set
Cause: The AWS SDK’s web identity provider requires both AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN.
Fix: Set AWS_ROLE_ARN in your workflow:
env:
  AWS_ROLE_ARN: arn:aws:iam::123456789012:role/KMSRole
  AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/aws/token
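A preflight check early in the job can surface this before SOPS does. The sketch below is one possible shape for it: the function name is hypothetical, and it only verifies that both variables are set and that the token file exists.

```python
import os

REQUIRED = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_web_identity_settings(env=None):
    """List what is missing for web identity auth (empty list if OK)."""
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way.
    problems = [name for name in REQUIRED if not env.get(name)]
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if token_file and not os.path.isfile(token_file):
        problems.append(f"token file not found: {token_file}")
    return problems


if __name__ == "__main__":
    for problem in missing_web_identity_settings():
        print("missing:", problem)
```

Run as a workflow step before helm-secrets, this prints nothing when the environment is complete.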
Issue 3: SOPS Role Self-Assumption Error
Symptoms:
User: arn:aws:sts::123456789012:assumed-role/KMSRole/...
is not authorized to perform: sts:AssumeRole on resource:
arn:aws:iam::123456789012:role/KMSRole
Cause: SOPS encrypted files often specify a role with the kms_key+role_arn format. If you’ve already assumed the role via AWS_ROLE_ARN, SOPS tries to assume it again.
Fix: Add self-assumption permission to the role’s trust policy:
{
  "Sid": "AllowSelfAssume",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::123456789012:role/KMSRole"
  },
  "Action": "sts:AssumeRole"
}
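To tell in advance whether a given SOPS key entry will trigger this second assumption, you can split the key+role string and compare the role with AWS_ROLE_ARN. This is a sketch that assumes the entry uses a single "+" separator as described above; the function names and the example key ARN in the comments are hypothetical.

```python
def split_sops_kms_entry(entry):
    """Split 'KEY_ARN+ROLE_ARN' into (key_arn, role_arn or None)."""
    key_arn, sep, role_arn = entry.partition("+")
    return key_arn, (role_arn if sep else None)


def needs_self_assume(entry, current_role_arn):
    """True if SOPS would re-assume the role the job already runs as."""
    _, role_arn = split_sops_kms_entry(entry)
    return role_arn is not None and role_arn == current_role_arn


# Example (hypothetical key ARN):
# needs_self_assume(
#     "arn:aws:kms:us-east-1:123456789012:key/example"
#     "+arn:aws:iam::123456789012:role/KMSRole",
#     "arn:aws:iam::123456789012:role/KMSRole",
# )
```

When the check is true, the AllowSelfAssume statement above is what lets the second AssumeRole call succeed.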
Issue 4: Token Expiration
Symptoms: Authentication works initially but fails after some time.
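Cause: Projected tokens expire (default 1 hour in Kubernetes, configurable via expirationSeconds). The kubelet refreshes the mounted file before the token expires, so a long-running process that read the token once and cached it can still end up presenting a stale copy.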
Fix: Set appropriate expiration in your volume configuration:
- serviceAccountToken:
    audience: sts.amazonaws.com
    expirationSeconds: 86400  # 24 hours
    path: token
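To confirm that expiry is really the cause, compare the token's exp claim with the current time. The sketch below assumes the payload has already been decoded into a dict; the function names and the 300-second margin are arbitrary illustrative choices.

```python
import time


def seconds_until_expiry(claims, now=None):
    """Seconds left before the token's exp claim (negative if expired)."""
    now = time.time() if now is None else now
    return claims["exp"] - now


def is_expiring_soon(claims, now=None, margin=300):
    """True if the token expires within `margin` seconds."""
    return seconds_until_expiry(claims, now) <= margin
```

A negative value from seconds_until_expiry on a token that a process is still using suggests the process cached an old copy instead of re-reading the file.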
Complete Working Example
Helm Values (staging.yaml)
runnerDefaults:
  awsOidc:
    enabled: true
    audience: "sts.amazonaws.com"
    expirationSeconds: 86400
AWS IAM Trust Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GKEWorkloadIdentity",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::631169009083:oidc-provider/container.googleapis.com/v1/projects/staging-477819/locations/us-east4/clusters/gha-staging-cluster"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "container.googleapis.com/v1/projects/staging-477819/locations/us-east4/clusters/gha-staging-cluster:sub": "system:serviceaccount:arc-system:default",
          "container.googleapis.com/v1/projects/staging-477819/locations/us-east4/clusters/gha-staging-cluster:aud": "sts.amazonaws.com"
        }
      }
    },
    {
      "Sid": "AllowSelfAssume",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::631169009083:role/KMSHandling"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
GitHub Actions Workflow
name: Deploy to Staging
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: [self-hosted, gcp-staging-us-east4]
    steps:
      - uses: actions/checkout@v4
      - name: Deploy with helm-secrets
        env:
          AWS_ROLE_ARN: arn:aws:iam::631169009083:role/KMSHandling
          AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/aws/token
          AWS_REGION: us-east-1
        run: |
          # Verify credentials work
          aws sts get-caller-identity
          # Deploy with SOPS-encrypted secrets
          helmfile --environment staging apply
Security Best Practices
- Use specific service accounts - Don’t use the default service account. Create dedicated SAs for runners.
- Scope trust policies narrowly - Use specific :sub conditions, not wildcards.
- Set appropriate token expiration - Balance between convenience (longer) and security (shorter).
- Use separate roles per environment - Don’t share roles between staging/production.
- Audit role usage - Enable CloudTrail logging for STS operations.
- Rotate OIDC thumbprints proactively - Monitor for certificate rotations.
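The advice about narrow :sub conditions lends itself to an automated check. The following is a hedged sketch of a lint that flags web-identity statements with a missing or wildcarded :sub condition. The function name and finding messages are made up, and it is no substitute for a proper policy review.

```python
def broad_subject_findings(policy):
    """Flag AssumeRoleWithWebIdentity statements with a loose :sub check."""
    findings = []
    for stmt in policy.get("Statement", []):
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            continue
        sid = stmt.get("Sid", "<no Sid>")
        has_sub = False
        for operator, conditions in stmt.get("Condition", {}).items():
            for key, value in conditions.items():
                if not key.endswith(":sub"):
                    continue
                has_sub = True
                values = value if isinstance(value, list) else [value]
                # Wildcards are only interpreted by the *Like operators.
                if "Like" in operator and any(
                        "*" in v or "?" in v for v in values):
                    findings.append(f"{sid}: wildcard in {key}")
        if not has_sub:
            findings.append(f"{sid}: no :sub condition")
    return findings
```

Run against the trust policy from the complete example, this should return an empty list, since its :sub condition uses StringEquals with one exact subject.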
Conclusion
Cross-cloud OIDC federation eliminates the need for long-lived AWS credentials in your GKE clusters. While the initial setup requires careful attention to token claims, trust policies, and certificate thumbprints, the result is a secure, maintainable authentication flow.
Key takeaways:
- Mount projected tokens with audience: sts.amazonaws.com
- Keep OIDC provider thumbprints updated
- Match trust policy conditions exactly to token claims
- For SOPS, allow role self-assumption when using kms+role format
- Always test with aws sts get-caller-identity before running workloads