All My CI Broke with BuildFailed, and the Cause Was a Billing Lock



When all your CI suddenly turns red, you usually suspect the code or the config first. But this time it wasn't the code, the YAML, or the runner. Today's rabbit hole was less of a dev problem and more of an ops problem.

It started simply enough. Across several private repos in our organization, GitHub Actions CI was failing everywhere. At first I assumed it was a self-hosted runner problem. I'd recently attached a local Mac mini as a GitHub Actions runner and switched several repos' CI over to it.

But the symptoms were strange.

Normally, when a runner has a problem, the job gets created and then hangs in the queued state, or it fails with something like "no matching runner." This time, there was no job at all.

Here's what a representative run looked like.

conclusion: startup_failure
workflowName: ""
path: BuildFailed
jobs: []
logs: 404

There were no logs either. More precisely, it died before any logs could even be created.

First suspect: self-hosted runner configuration

The first thing I suspected was the local runner.

The earlier setup really did have a problem. The runner was using the default _work folder, and putting the work directory on an external SSD requires passing the --work option when the runner is configured.

By this point it had already been fixed like this.

workFolder: /mnt/ssd/actions-work/my-org

The runner was online, too.

runner: local-mac-mini
status: online
busy: false
labels:
- self-hosted
- macOS
- ARM64
- local
- mac-mini

And yet the problem persisted.

So I put together a very simple smoke workflow that doesn't use the self-hosted runner at all.

name: Actions Smoke Test
 
on:
  push:
    branches:
      - test/actions-smoke
  workflow_dispatch:
 
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - name: Print smoke marker
        run: echo "actions-smoke-ok"

But this failed in exactly the same way.

conclusion: startup_failure
workflowName: ""
path: BuildFailed
jobs: []

At this point I had to accept that it wasn't just a runner problem. Even ubuntu-latest was dying before it ever reached a runner.
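
The reasoning in this step can be written down as a small check. This is only a sketch: the `RunResult` record and the `failure_scope` helper are hypothetical names introduced for illustration, and the fields mirror the run summaries shown above rather than any particular API response.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One workflow run, reduced to the fields used for triage (hypothetical shape)."""
    runner_kind: str  # "self-hosted" or "github-hosted"
    conclusion: str   # e.g. "startup_failure", "failure", "success"


def failure_scope(runs: list[RunResult]) -> str:
    """Guess whether failures are tied to one runner kind or hit every kind."""
    all_kinds = {r.runner_kind for r in runs}
    failed_kinds = {r.runner_kind for r in runs if r.conclusion != "success"}
    if not failed_kinds:
        return "no-failure"
    if len(all_kinds) == 1:
        # Only one runner kind was observed, so nothing can be ruled out yet.
        return "inconclusive"
    if failed_kinds == all_kinds:
        # Every runner kind fails: the runner itself is unlikely to be the cause.
        return "not-runner-specific"
    return "runner-specific"


runs = [
    RunResult("self-hosted", "startup_failure"),
    RunResult("github-hosted", "startup_failure"),  # the smoke workflow
]
print(failure_scope(runs))  # not-runner-specific
```

Before the smoke test there was only self-hosted data, which this sketch would label "inconclusive". Adding a single GitHub-hosted run is what moved the diagnosis away from the runner.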

Second suspect: a tangled Enterprise / Organization runner group

Next, I suspected the GitHub Enterprise settings.

Our Enterprise usage had recently ended, and the Organization's Actions runner group screen was showing this warning.

Runner group Default already exists at the Enterprise level.
Please consider renaming this group.

This was suspicious enough.

The organization had a runner group called Default, and it looked like a Default also remained at the Enterprise level. On top of that, through the API it looked like there were two Default runner groups.

id 1: Default, visibility=all
id 3: Default, visibility=selected

So for a while I thought, "Enterprise ended, but the runner group policy stuck around and tangled up the org's Actions, right?"

This hypothesis wasn't a bad one. It really is a good idea to clean up leftover settings after ending Enterprise, and it's better to rename the runner group from Default to something distinct like local-mac-mini.

But this wasn't the decisive cause either. It wasn't just the private repos' existing CI; even the smoke workflow using a GitHub-hosted runner died with BuildFailed. If a runner group were the cause, the ubuntu-latest workflow should at least have failed differently.
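
Even though it wasn't the root cause here, spotting duplicate runner group names is easy to automate as part of post-Enterprise cleanup. The sketch below assumes the groups have already been fetched and reduced to dicts with `id`, `name`, and `visibility` keys, matching the listing above; `duplicate_group_names` is a hypothetical helper, not part of any GitHub tooling.

```python
from collections import defaultdict


def duplicate_group_names(groups):
    """Return {name: [ids]} for runner group names that appear more than once."""
    ids_by_name = defaultdict(list)
    for group in groups:
        ids_by_name[group["name"]].append(group["id"])
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# The two entries from the listing above, written as dicts (assumed shape).
groups = [
    {"id": 1, "name": "Default", "visibility": "all"},
    {"id": 3, "name": "Default", "visibility": "selected"},
]
print(duplicate_group_names(groups))  # {'Default': [1, 3]}
```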

GitHub Support's answer: a billing lock

In the end I contacted GitHub Support. The answer was surprisingly simple.

The BuildFailed behavior occurs when an account's billing is locked,
blocking access to billable features, like Actions.

So the cause wasn't the code, the YAML, or the runner. Actions itself was blocked because of a billing lock.

With this explanation, every symptom lines up.

  • Simultaneous failures across several private repos
  • The self-hosted runner failing too
  • ubuntu-latest failing too
  • No job being created
  • The logs endpoint returning 404
  • An empty workflow name
  • A synthetic workflow called BuildFailed appearing

GitHub Actions is treated as a billable feature on private repos. When the billing account is locked, it can get blocked at the workflow startup stage regardless of the runner type.

The problem was the organization account

When I checked further with Support, the affected billing account turned out to be the organization account, not a personal one.

That pinned down the scope.

At first it could have been a personal account problem, a leftover Enterprise problem, or an organization problem. But in the end it was the organization's billing lock.

Support advised me to open a new ticket with the billing team.

What I learned

The lessons from this rabbit hole are pretty clear.

1. BuildFailed + startup_failure + jobs=[] may not be a runner problem

If it were a runner problem, a job would usually get created. For example, you'd see something like this.

queued
waiting for runner
no matching runner

But when it comes out like this instead:

workflowName: ""
path: BuildFailed
jobs: []
logs: 404

then the workflow died before it ever reached a runner. In that case, if you only fixate on the YAML, the runner labels, and whether the runner is online, you can waste a lot of time.
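
As a rough sketch, the distinction can be turned into a check that runs before any runner debugging. The dict keys below mirror the summaries in this post and are an assumption about how you've collected the run data; `died_before_runner` is a hypothetical helper. A match only says "look above the runner level" (billing, account state, policy). It does not prove a billing lock.

```python
def died_before_runner(run: dict) -> bool:
    """True when a run shows the 'never reached a runner' pattern from this post."""
    return (
        run.get("conclusion") == "startup_failure"
        and not run.get("jobs")          # no job was ever created
        and not run.get("workflowName")  # workflow name is empty
    )


# Pattern seen in this incident (values copied from the post).
blocked = {"conclusion": "startup_failure", "workflowName": "", "path": "BuildFailed", "jobs": []}
# Illustrative runner-side problem: a job exists but is stuck waiting.
stuck = {"conclusion": None, "workflowName": "CI", "jobs": [{"status": "queued"}]}

print(died_before_runner(blocked))  # True
print(died_before_runner(stuck))    # False
```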

2. Private repo Actions is affected by billing state

Even if you use a self-hosted runner, Actions itself falls into GitHub's billable feature territory. So you shouldn't think, "It's running on my own runner, so why would billing be the problem?"

That's because Actions orchestration, the process of queuing work and assigning it to a runner, is itself a GitHub feature.

3. After ending Enterprise, be sure to check the billing/account scope

Ending Enterprise doesn't seem to make every leftover state disappear cleanly. At least this time, all of these were tangled together.

  • Ending Enterprise
  • Organization billing
  • Actions runner group
  • Private repo Actions
  • Support ticket routing

If your Enterprise has ended, check the following.

  • Personal billing
  • Organization billing
  • Enterprise billing
  • Actions access
  • Runner groups
  • Outstanding balance
  • Seats
  • Billing cycle

In particular, before adding payment info, it's important to confirm "how much gets charged immediately."

Wrapping up

Here's how you can lay out the root cause of this problem.

organization billing lock
→ GitHub Actions billable feature blocked
→ all private repo workflows fail at startup
→ synthetic BuildFailed workflow
→ jobs=[]
→ logs=404

The next action isn't to reinstall the runner. It's to check with the billing team. The key things to confirm are roughly these.

  1. Whether there's an outstanding balance, and if so, the exact amount
  2. How much gets charged immediately if you add payment info (including whether you can switch to monthly billing)
  3. Whether you're still tied to the ended Enterprise's billing scope

Today's takeaway is this: broken CI isn't always a code or runner problem. Sometimes billing state shows up like an infrastructure outage at the lowest level. If the symptoms look like infrastructure no matter how you look at them but nothing turns up, take a look at your billing state too.


When you have eliminated the impossible, whatever remains, however improbable, must be the truth.

— Arthur Conan Doyle

