Workspace CI/CD optimization — concurrency, dead code removal, docs-only skip filters

Type: decision | Status: accepted

Baseline (pre-optimization)

  • 30 repos with 184 workflow files
  • 167 / 184 missing concurrency: groups (91%)
  • 171 / 184 missing timeout-minutes: (93%)
  • 50 / 184 flagged push-no-paths (push trigger with no path filter, so they fire on every push)
  • 16 / 184 run npm run build without uploading an artifact (likely duplicating Vercel's own build)
  • 271 Vercel production deployments in the last 7 days (~39/day)
  • Top offender: kohyr at 58 builds/week (21% of workspace total)
  • 3-way parallel production deploys on repz: Vercel native git integration + deploy.yml (vercel CLI) + deploy-production.yml (amondnet/vercel-action)
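The two largest baseline gaps (no concurrency group, no timeout) can be approximated with a line-based grep. This is a minimal sketch, not the script actually used for the baseline. It assumes each repo keeps its workflows in .github/workflows/ and that top-level YAML keys start at column 0. `audit_workflow` and `audit_repos` are hypothetical helpers. Because the check is line-based, a job-level `concurrency:` counts as "missing", so its numbers may differ slightly from the audit above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag one workflow file for the two most common gaps.
# Line-based heuristic, not a YAML parser:
#   no-concurrency -> no top-level (column-0) `concurrency:` key
#   no-timeout     -> no `timeout-minutes:` key anywhere in the file
audit_workflow() {
  local f=$1
  local flags=()
  grep -q '^concurrency:' "$f" || flags+=(no-concurrency)
  grep -q 'timeout-minutes:' "$f" || flags+=(no-timeout)
  # Print "<file> <flag>..." only when something is missing.
  if ((${#flags[@]} > 0)); then
    printf '%s %s\n' "$f" "${flags[*]}"
  fi
}

# Hypothetical helper: audit every .yml/.yaml workflow under the given repo roots.
audit_repos() {
  local repo f
  for repo in "$@"; do
    for f in "$repo"/.github/workflows/*.yml "$repo"/.github/workflows/*.yaml; do
      [ -f "$f" ] || continue   # an unmatched glob stays literal; skip it
      audit_workflow "$f"
    done
  done
}
```

For example, `audit_repos ~/GitHub/alawein/*/ | grep -c no-concurrency` would give a count comparable to the 167 / 184 figure. The repo root path in that example is an assumption.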

Phase A — Concurrency Safety Net

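Added the following concurrency block to every workflow that lacked one:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

Runs that share a group (same workflow, same ref) are serialized. When a new run starts, any in-progress run in that group is cancelled, so a fresh push to a branch or PR cancels the still-running CI from the previous push.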

  • 29 repos, 125 workflow files modified
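  • Commits: one per affected repo, with message chore(ci): add concurrency groups to cancel stale workflow runs
  • Risk: Low. Cancelling a superseded run is the intended effect for CI. However, cancel-in-progress does abort the older run mid-flight, so on workflows with side effects (deploys, DB migrations, releases) two quick pushes to main can leave the first run half-finished.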
  • Expected savings: 15-25% reduction on burst-push scenarios (amend-push-again patterns)
  • Rollback: git revert <commit> per repo. Each commit is self-contained.
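Below is a minimal sketch of how the Phase A change could be applied idempotently, assuming nothing about how it was actually scripted. It relies on one YAML property: appending a column-0 key at the end of a workflow file adds a new top-level key, whatever precedes it. `add_concurrency` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the Phase A concurrency block to a workflow file
# unless a top-level `concurrency:` key already exists (safe to re-run).
add_concurrency() {
  local f=$1
  grep -q '^concurrency:' "$f" && return 0   # already present: no-op
  # If the file lacks a trailing newline, add one so the key starts a new line.
  [ -n "$(tail -c1 "$f")" ] && printf '\n' >> "$f"
  # Quoted delimiter keeps ${{ ... }} literal; GitHub expands it at run time.
  cat >> "$f" <<'EOF'

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
EOF
}
```

Some workflows have side effects on main, such as deploys or migrations. For those, the same group with `cancel-in-progress: false` lets a running job finish, and the newer run waits as pending instead.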

Phase B — Surgical Deletes

B.1 Safe deletions

  • scribd/build.yml (commit 561df08): pure subset of scribd/ci.yml's build job. ci.yml does install + lint + build + security + preview-comment; build.yml only did install + type-check + build. Every step is covered by ci.yml.
  • atelier-rounaq/deploy.yml (commit afabc15): dead Lovable.dev stub. The deploy job's entire implementation was run: echo "Deploying to Lovable.dev...", and the project migrated to Vercel months ago. The test job was redundant with Vercel's native build.
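A "pure subset" claim like the one for scribd/build.yml can be spot-checked by comparing the `run:` commands of the two workflows. This is a rough sketch that assumes single-line `run:` steps. `run_cmds` and `missing_steps` are hypothetical helpers. The check compares command text, so a step that is covered indirectly (for example a type-check run inside an npm script) still shows up as missing and needs a manual look.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the single-line `run:` commands in a workflow file.
run_cmds() {
  # Strip indentation, an optional "- ", and "run:"; keep the command text.
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*run:[[:space:]]*//p' "$1" | sort -u
}

# Hypothetical helper: print run commands of workflow $1 that never appear in $2.
# Empty output means every single-line run step of $1 also exists in $2.
missing_steps() {
  # comm -23 keeps lines that occur only in the first sorted input.
  comm -23 <(run_cmds "$1") <(run_cmds "$2")
}
```

Usage: `missing_steps scribd/.github/workflows/build.yml scribd/.github/workflows/ci.yml`.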

B.2 repz three-way deploy refactor (commit cb1d843)

Discovery: repz had three separate paths deploying to production on every push to main:

  1. Vercel native git integration (canonical, kept)
  2. deploy.yml: vercel CLI with --prebuilt --prod, plus a dead staging path referencing a non-existent dev branch
  3. deploy-production.yml: amondnet/vercel-action@v25 frontend deploy, plus Supabase edge function deployment (the only workflow doing this)
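A crude grep for Vercel production-deploy markers can produce a shortlist of other repos with the same pattern. This sketch assumes that `vercel-action` or a `vercel ... --prod` command line is what identifies such a workflow. It cannot see the Vercel native git integration, which lives in the Vercel project settings rather than in the repo. `vercel_prod_workflows` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list workflow files in a repo that look like they
# deploy to Vercel production. Heuristic only; review each hit by hand.
vercel_prod_workflows() {
  local dir="$1/.github/workflows"
  [ -d "$dir" ] || return 0
  # -l: print file names only; -r: recurse; -E: extended regex.
  grep -lrE 'vercel-action|vercel .*--prod' "$dir" | sort
}
```

Any repo where this prints two or more files, with the native git integration also enabled, is a candidate for the same refactor.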

Refactor:

  • Deleted repz/deploy.yml (-222 lines): dead staging path + redundant Vercel deploy
  • Deleted repz/deploy-production.yml (-186 lines): redundant Vercel deploy; its Supabase step was extracted into the new workflow below
  • Created repz/deploy-supabase-functions.yml (+44 lines): isolated Supabase edge function deployment, triggered only by changes under supabase/functions/** or by manual dispatch. Preserves all 12 edge functions (ai-coach, stripe-webhook, analyze-form, calculate-nutrition, client-dashboard, client-onboarding, create-checkout, generate-workout, health, plan-auth, send-email, sync-wearable) plus the _shared directory of shared code.

Net: -364 lines of workflow YAML. Canonical production deploy is now Vercel native git integration only; Supabase functions deploy only when actually changed.
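By Supabase's convention for shared code, folders under supabase/functions/ whose names start with an underscore (such as _shared) hold modules imported by other functions rather than deployable functions. This small sketch lists what a deploy would actually cover, for example to compare against the retained function list after the refactor. `list_edge_functions` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print deployable edge function names under
# <repo>/supabase/functions, skipping underscore-prefixed shared folders.
list_edge_functions() {
  local d name
  for d in "$1"/supabase/functions/*/; do
    [ -d "$d" ] || continue      # no match: the glob stays literal
    name=$(basename "$d")
    case $name in
      _*) continue ;;            # shared code such as _shared
    esac
    printf '%s\n' "$name"
  done
}
```

Usage: `list_edge_functions repz | wc -l`.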

Retained repz workflows (16 total): ci, db-migrations, release, docs, docs-doctrine, docs-validation, pr-checks, e2e-tests, lighthouse, visual-audit, auto-merge, branch-cleanup, structure-enforce, notion-sync, ops-sync-report, deploy-supabase-functions.

Phase C — Docs-Only Skip Filter

Added the following paths-ignore block to push + pull_request triggers on CI-type workflows:

paths-ignore:
  - '**/*.md'
  - 'docs/**'
  - '.github/ISSUE_TEMPLATE/**'
  - 'LICENSE'

Semantics: with paths-ignore, GitHub Actions skips the workflow only when ALL changed files match the ignore patterns. Any commit that mixes code and docs changes still triggers normally, so the filter cannot hide a real code change. One caveat: if any of these workflows is a required status check under branch protection, a docs-only PR leaves that check pending and blocks the merge.
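The all-files-must-match rule can be approximated locally before pushing. This is a minimal sketch covering only the four patterns above. In a Bash `case` glob, `*` also matches `/`, so `*.md` stands in for GitHub's `**/*.md` and `docs/*` for `docs/**`. `docs_only` is a hypothetical helper, not part of GitHub Actions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read changed paths (one per line) on stdin.
# Returns 0 if EVERY path matches the Phase C ignore list (workflow skipped),
# 1 if any path is outside it (workflow runs).
docs_only() {
  local path seen=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    seen=1
    case $path in
      *.md | docs/* | .github/ISSUE_TEMPLATE/* | LICENSE) ;;  # ignored file
      *) return 1 ;;                 # one non-ignored file: workflow runs
    esac
  done
  # No paths at all: report "not docs-only" so the caller errs toward running.
  [ "$seen" -eq 1 ]
}
```

Usage: `git diff --name-only origin/main...HEAD | docs_only && echo "CI would be skipped"`.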

Applied to workflows with one of these 10 filenames (the "safe list"): ci.yml, test.yml, ci-smoke.yml, accessibility.yml, lighthouse.yml, visual-regression.yml, benchmark.yml, performance.yml, code-quality.yml, structure-enforce.yml

Scope intentionally NOT applied (judgment call):

  • security.yml, codeql.yml, codeql-analysis.yml — security scans should still run on docs changes to catch secret leaks in README/markdown
  • release.yml, autonomous.yml — specialized workflows needing per-repo review
  • deploy-production.yml — already handled in Phase B

Result: 23 repos, 39 workflow files modified. In adil and meshal-web, only the workflow files were committed (via git commit <pathspec>) to avoid mixing them with in-progress WIP on unrelated files.

Rollback: git revert <commit> per repo.

Phase C misread — correction

Initial Phase C plan (drafted before this session's execution) incorrectly targeted docs-doctrine.yml, notion-sync.yml, and ops-sync-report.yml across all repos under the assumption they fired on every push. Direct inspection during execution revealed:

  • docs-doctrine.yml already has on: push: paths: ["**/*.md"] — already filtered
  • notion-sync.yml is on: workflow_dispatch: only — doesn't fire on push
  • ops-sync-report.yml is on: workflow_dispatch: only — doesn't fire on push

The original audit had flagged these for no-concurrency and no-timeout only, not for push-no-paths, and Phase A's concurrency addition already covered them. The real Phase C targets were the 50 push-no-paths workflows, dominated by ci.yml (21 instances).
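The misread came from judging workflows by name rather than by their `on:` block. Below is a crude line-based classifier. It is sketched under the assumption that triggers are written in block style or as a single `on: <event>` line. Inline lists like `on: [push, pull_request]` are only partly handled. The classifier also does not tell which trigger a `paths:` filter belongs to. `trigger_kind` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a workflow file as
#   no-push          (e.g. workflow_dispatch-only or schedule-only)
#   push-filtered    (push trigger plus a paths / paths-ignore filter)
#   push-unfiltered  (fires on every push: a Phase C candidate)
trigger_kind() {
  local block
  # Keep the top-level `on:` line and its indented children; the next
  # column-0 key (e.g. `jobs:`) ends the block.
  block=$(awk '
    /^on:/                { inblk = 1; print; next }
    inblk && /^[A-Za-z_]/ { exit }
    inblk                 { print }
  ' "$1")
  if ! grep -qw push <<<"$block"; then
    echo no-push
  elif grep -qE 'paths(-ignore)?:' <<<"$block"; then
    echo push-filtered
  else
    echo push-unfiltered
  fi
}
```

Run over docs-doctrine.yml, notion-sync.yml, and ops-sync-report.yml, this should report push-filtered, no-push, and no-push, matching the direct inspection above.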

Expected Aggregate Impact

| Lever | Mechanism | Est. savings |
|---|---|---|
| Phase A concurrency | Cancels stale runs on rapid pushes | 15-25% on burst pushes |
| Phase B repz refactor | 3× → 1× production deploys | 60-70% on repz deploy workflows |
| Phase B dead code | Removes ~360 lines of dead or duplicate workflow YAML | Small but clean |
| Phase C docs-only skip | Skips CI on pure-docs commits | 25-35% on CI/test/perf workflows |
| Total (conservative) | | 30-45% reduction in GH Actions minutes |

Re-measurement Plan (2026-04-15)

Run these diagnostics one week after execution and compare to the 2026-04-08 baseline:

# 1. Vercel deployment count per project, last 7 days
python3 ~/scripts/vercel_deployment_count.py --days 7 --team team_cGFXe2xrRySciNomITsbHNPE

# 2. GitHub Actions minutes used (requires admin:org or billing scope)
gh api orgs/morphism-systems/settings/billing/actions
gh api users/alawein/settings/billing/actions

# 3. Workflow run counts by repo + workflow, last 7 days (GNU date, as in Git Bash)
since=$(date -u -d '-7 days' +%Y-%m-%dT%H:%M:%SZ)
while read -r repo; do
  echo "== $repo"
  gh run list --repo "alawein/$repo" --limit 1000 --created ">=$since" \
    --json workflowName --jq '.[].workflowName' | sort | uniq -c | sort -rn
done < ~/tmp/all-repos.txt

Success criteria:

  • Vercel production deployments trending down (target: <200/week)
  • GitHub Actions minutes consumption down 30%+
  • No broken CI: every code push still triggers appropriate workflows
  • No silent deployment failures: Supabase edge functions still updating when changed
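Checking the two numeric criteria is simple arithmetic against the baseline. Here is a sketch with hypothetical helpers `pct_drop` and `meets_targets`. The thresholds (<200 deploys/week, a drop of 30% or more in Actions minutes) come from the list above. The input values are whatever the re-measurement returns.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage drop from a baseline to a new value,
# printed with one decimal place. Positive means usage went down.
pct_drop() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    if (before <= 0) { print "n/a"; exit 1 }   # avoid dividing by zero
    printf "%.1f\n", (before - after) / before * 100
  }'
}

# Hypothetical helper: apply the two numeric success criteria.
#   $1 = Vercel production deploys this week       (target: < 200)
#   $2 = Actions minutes at baseline, $3 = now     (target: >= 30% drop)
meets_targets() {
  local deploys=$1 drop
  drop=$(pct_drop "$2" "$3") || return 1
  [ "$deploys" -lt 200 ] && awk -v d="$drop" 'BEGIN { exit !(d >= 30) }'
}
```

Usage: `pct_drop 271 <deploys this week>` gives the change against the 2026-04-08 deployment baseline.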

If savings < 20%: investigate residual burn. Likely candidates: cron-scheduled audits firing frequently, matrix builds multiplying cost, security scans running on every push.

Outstanding Follow-ups

  1. Phase D was this document. Complete.
  2. timeout-minutes rollout: deferred from Phase A because it needs per-job judgment. A blanket timeout-minutes: 60 would be a conservative catch-all for hung jobs; without it, GitHub's default of 360 minutes per job applies.
  3. Security/codeql paths-ignore: revisit in a dedicated security-focused session to decide whether these workflows should be filtered.
  4. repz/release.yml references prj_YUp9XfhmLNNea44s8AYzCtoZjbi0 and prj_00lQaotYs5qA9kCQtg92xZSEUVLI — different project IDs than the canonical prj_tPY4Oxc1ofEQFNJjlAlsjNTvjccV. May be dead references. Zero automation cost (workflow_dispatch only) so not urgent.
  5. workspace-root CLAUDE.md is not git-tracked — the 7 new gotchas persist via Dropbox only. Could be moved to the tracked alawein/alawein/CLAUDE.md sub-repo version, though the two have diverged (11.5KB workspace root vs 6KB sub-repo).
  6. Monorepo-specific Vercel path filters: add a commandForIgnoringBuildStep to kohyr's Vercel project (the #1 offender at 58 builds/week) using git diff HEAD^ HEAD --quiet -- apps/morphism/ packages/. This skips the build whenever a commit touches nothing under those paths (e.g. docs-only commits). Likely higher ROI than any GitHub Actions work on kohyr alone.
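Vercel's Ignored Build Step treats exit code 0 as "skip this build" and exit code 1 as "build". That lines up with `git diff --quiet`, which exits 0 when nothing under the given paths changed. This sketch wraps the command from item 6. It assumes, as item 6 states, that kohyr's app lives under apps/morphism/ and depends on packages/. `should_skip_build` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical ignored-build-step check for kohyr.
#   exit 0 -> Vercel skips the build (nothing under the watched paths changed)
#   exit 1 -> Vercel builds (something under the watched paths changed)
should_skip_build() {
  # Watched paths: defaults from the follow-up note, overridable via arguments.
  local paths=("$@")
  ((${#paths[@]} > 0)) || paths=(apps/morphism/ packages/)
  # No parent commit (first commit or too-shallow clone): always build.
  git rev-parse --verify --quiet 'HEAD^' >/dev/null || return 1
  # --quiet: exit 0 if no differences under the paths, 1 if there are some.
  git diff HEAD^ HEAD --quiet -- "${paths[@]}"
}
```

Design note: the check watches code paths rather than listing docs patterns. As a result, any commit touching nothing under those paths (not just docs) is skipped, including changes to root-level files such as lockfiles. Add those paths if they affect the build.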

Commit Reference

All commits tagged Part of workspace CI/CD optimization — Phase {A,B,C}. Search by commit message:

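# Find all Phase A commits across the workspace
for d in /c/Users/mesha/Desktop/Dropbox/GitHub/alawein/*/; do
  [ -e "$d/.git" ] || continue   # skip folders that are not git repos
  echo "== $(basename "$d")"
  git -C "$d" log --all --oneline --grep="workspace CI/CD optimization" --grep="Phase A" --all-match
done

# Find all Phase B/C commits
for d in /c/Users/mesha/Desktop/Dropbox/GitHub/alawein/*/; do
  [ -e "$d/.git" ] || continue
  echo "== $(basename "$d")"
  git -C "$d" log --all --oneline --grep="workspace CI/CD optimization" --grep="Phase [BC]" --all-match
done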