Engineering Manager OS: Master Team Output (–29% PR p50 in 4 weeks)

Categories: Engineering Manager

TL;DR: Outcome: cut pull request (PR) cycle-time median (p50) by 29% in 4 weeks. Action: run the “Manager OS” cadence (daily/weekly/monthly) and measure PR p50 weekly.

Context: New engineering managers often default to “more coding.” I ran a 4-week experiment to test whether small managerial habits (priorities, delegation, feedback, rhythm) could improve whole-team throughput more than another sprint of me writing code.


What I changed

  • Daily discipline (Manager OS)
    • Wrote Top 3 priorities each morning (only tasks I must do).
    • Used delegation prompt: “What do you think we should do?” before giving answers.
    • Delivered 1 micro-feedback per day (positive or constructive).
    • Energy check: blocked 90 minutes for strategy during my peak hours.
  • Weekly rhythm
    • 1:1s using 3Q: How are you? What’s blocking you? Where do you want to grow?
    • Team sync (≤30 min): Wins → Priorities → Risks (with named owners).
    • Calendar audit (Fri): killed or delegated one recurring meeting weekly.
    • Weekly reflection (15 min): one tweak committed for next week.
  • Monthly system
    • Team health pulse (anonymous, 3 questions).
    • Role clarity: refreshed each person’s top 3 responsibilities.
    • Decision log review: merged, archived, or escalated stale items.
  • Definitions used (explicit)
    • PR cycle time: open → first approval or merge, whichever comes first (a query sketch follows this list).
    • p50 / p90: median and 90th percentile.
    • WIP (work in progress): open issues/PRs assigned to a person.
    • Planned work %: story points or items tagged “Planned” ÷ total completed.
    • DORA: DevOps Research and Assessment; we reference Deployment Frequency and Lead Time for Changes but tracked PR cycle time as a proxy.
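
For concreteness, here is a minimal Postgres sketch of the PR cycle-time p50/p90 computation under the definition above. The pull_requests and pr_reviews tables and their columns are assumptions standing in for whatever your Git host export provides; adapt the names and the 28-day window to your data.

-- Minimal sketch (Postgres): PR cycle time p50/p90, open → first approval or merge.
-- pull_requests / pr_reviews are assumed table names; adjust to your export.
WITH first_signal AS (
  SELECT
    pr.id,
    pr.opened_at,
    LEAST(
      COALESCE(pr.merged_at, 'infinity'::timestamp),
      COALESCE(MIN(r.submitted_at) FILTER (WHERE r.state = 'approved'),
               'infinity'::timestamp)
    ) AS done_at
  FROM pull_requests pr
  LEFT JOIN pr_reviews r ON r.pr_id = pr.id
  WHERE pr.opened_at >= now() - INTERVAL '28 days'
  GROUP BY pr.id, pr.opened_at, pr.merged_at
)
SELECT
  ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (done_at - opened_at)) / 3600))::numeric, 1) AS p50_h,
  ROUND((PERCENTILE_CONT(0.9) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (done_at - opened_at)) / 3600))::numeric, 1) AS p90_h,
  COUNT(*) FILTER (WHERE done_at - opened_at > INTERVAL '7 days') AS outliers_over_7d
FROM first_signal
WHERE done_at < 'infinity';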

Baseline (4 weeks prior, n values shown)

Data window: 4 weeks, n=96 PRs, n=27 deploys.

| Metric | Definition (short) | Baseline | Notes |
|---|---|---|---|
| PR cycle time p50 | PR open → first approval/merge | 42 h | Skewed; p90 = 120 h; 6 outliers >7 days |
| Deployment frequency (median/wk) | Prod deploys per week | 5 | Total n=27 in 4 weeks |
| WIP per engineer (p50) | Open items per person | 3 | Range 1–7 |
| Planned work share | Planned ÷ total done | 60% | Unplanned interrupts high |
| 1:1 completion rate | Held ÷ scheduled | 67% | 6 cancellations |
| On-call interrupts (p50/day) | Incidents >5 min per engineer | 3 | p90 = 6 |

Observation: Skewed PR times demand medians; the outliers were caused by reviewer vacations and unclear ownership.

How to replicate Engineering Manager OS (one-week start)

  1. Instrument: Capture 4 weeks of PR cycle times, deploy counts, 1:1 completion, WIP, planned %. Use medians; list outliers with causes (a WIP and planned-share query sketch follows this list).
  2. Install daily loop: Top 3, delegation prompt, one micro-feedback, 90-min strategy block at your personal peak.
  3. Run 3Q 1:1s this week with every direct; record “blockers → owner → date due”.
  4. Team sync ≤30 min: Wins → Priorities → Risks (each risk has an owner and a next date).
  5. Calendar audit (Fri): remove or delegate exactly one recurring meeting ≥30 min.
  6. Role clarity refresh: write top 3 responsibilities per person; share in team doc.
  7. Decision log: list open decisions, assign a DRI (directly responsible individual), due date, and escalation path.
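
If your tracker lives in a warehouse, a sketch like the following (Postgres) can back step 1. The issues table and its assignee, status, planned, and closed_at columns are assumptions; map them to your own schema.

-- Minimal sketch (Postgres): WIP per engineer (p50) and planned-work share, last 28 days.
-- The issues table and its assignee/status/planned/closed_at columns are assumptions.
WITH wip AS (
  SELECT assignee, COUNT(*) AS open_items
  FROM issues
  WHERE status = 'open' AND assignee IS NOT NULL
  GROUP BY assignee
),
done AS (
  SELECT COUNT(*) FILTER (WHERE planned) AS planned_n,
         COUNT(*) AS total_n
  FROM issues
  WHERE closed_at >= now() - INTERVAL '28 days'
)
SELECT
  (SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY open_items) FROM wip) AS wip_p50,
  ROUND(100.0 * planned_n / NULLIF(total_n, 0), 1) AS planned_pct
FROM done;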

Paste-ready artifact (copy/print)

Engineering Manager OS – Weekly Checklist
Data window to monitor (rolling 4 weeks):
[ ] PR cycle time p50, p90; list outliers + causes
[ ] Deploys/week (median); Lead Time proxy if available
[ ] WIP per engineer (p50)
[ ] Planned work % vs unplanned
[ ] 1:1 completion rate

Daily:
[ ] Write Top 3 (only I can do)
[ ] Ask: “What do you think we should do?” before answering
[ ] Give 1 micro-feedback
[ ] 90-min strategy block during peak energy

Weekly:
[ ] 1:1s (3Q): How are you? Blocking? Growth?
[ ] Team sync ≤30 min: Wins → Priorities → Risks (owner)
[ ] Calendar audit: cut/delegate 1 recurring
[ ] Reflection 15 min: 1 tweak for next week

Monthly:
[ ] Team health pulse (3 Qs)
[ ] Role clarity: top 3 responsibilities/person
[ ] Decision log: close, escalate, or commit dates

Risks & gotchas

  • Risk: Engineering Manager drifts back to coding to “help.”
    Mitigation: Protect the 90-min strategy block; track % time on multiplication tasks vs coding.
  • Risk: Delegation stalls due to unclear ownership.
    Mitigation: In syncs, every risk gets a named owner and next date; publish in the team doc.
  • Risk: PR outliers hide systemic issues (vacations, neglected reviews).
    Mitigation: Track outliers explicitly with causes; add backup reviewers; rotate review ownership.
  • Risk: Meetings creep back.
    Mitigation: Enforce weekly calendar audit with a visible count of meetings cut/delegated.
  • Risk: 1:1 cancellations reduce signal.
    Mitigation: Set a 90% completion SLO (service level objective); reschedule within the same week.

Results after 4 weeks (n=104 PRs, n=31 deploys)

  • PR cycle time p50: 42 h → 30 h (–29%); p90: 120 h → 88 h; outliers reduced from 6 to 2.
  • Deployment frequency (median/wk): 5 → 6.
  • WIP per engineer (p50): 3 → 2.
  • Planned work %: 60% → 72%.
  • 1:1 completion: 67% → 93%.

Primary drivers: clearer ownership, faster reviews via named backups, fewer interrupts, and less managerial context thrash.

Next (2025-11-04 target)

  • Goal: PR cycle time p50 ≤ 24 h, p90 ≤ 72 h; 1:1 completion ≥ 90%; WIP p50 = 2; Planned work ≥ 75%.
  • Actions: introduce “review SLAs” (first review on a PR by the next business day), expand backup reviewers, and pilot a 2-hour weekly “no-meeting build window” for the team (a breach-report query sketch follows).
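
As a rough illustration of what tracking the review SLA could look like, the sketch below lists open PRs with no review yet after 24 elapsed hours, reusing the hypothetical pull_requests and pr_reviews tables from the earlier sketch; true next-business-day logic (skipping weekends) is left out for brevity.

-- Rough sketch (Postgres): open PRs waiting >24 h for a first review (SLA breach list).
-- Reuses the hypothetical pull_requests / pr_reviews tables; weekend handling omitted.
SELECT pr.id,
       pr.opened_at,
       ROUND(EXTRACT(EPOCH FROM (now() - pr.opened_at))::numeric / 3600, 1) AS waiting_h
FROM pull_requests pr
WHERE pr.merged_at IS NULL
  AND NOT EXISTS (SELECT 1 FROM pr_reviews r WHERE r.pr_id = pr.id)
  AND pr.opened_at < now() - INTERVAL '24 hours'
ORDER BY pr.opened_at;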

Bottom line: A manager improving personal output by 10% loses to a team improving by 30%. The Engineering Manager OS shifts time from “doing” to “multiplying,” and the numbers move accordingly.

A quick DORA roll-up: what to standardize org-wide?

Categories: DORA, DORA metrics

TL;DR: If you’re rolling up DORA across teams, prioritize MTTR median + change size over raw frequency; run a 30-min MTTR review after every incident.

Try this week: add a pre-merge risk checklist and hold an MTTR review within 48h.


Context: I pulled Q2→Q3 DORA (DevOps Research & Assessment) metrics for our team (13 weeks each) and shared a simple dashboard with peer leaders. The method works with any CI + incident system and scales to an org roll-up when definitions are consistent.

What I did to add DORA analytics

  • Set a fixed 13-week window (Q2 vs Q3 2025); UTC timestamps only.
  • DORA Definitions used:
    • Deployment Frequency = prod releases/week from CI/CD (retries deduped).
    • CFR (Change Failure Rate) = % of deployments that caused a customer-impacting incident or rollback within 7 days.
    • MTTR (Mean Time To Restore) = incident start → resolved; we report median and mean, plus an adjusted MTTR (excluding incidents >72h) to highlight long-tail risk, not to replace the median (Appendix A shows the Postgres queries).
  • Linked incidents to the prior deployment (or explicit deployment_id) and tagged by workstream/owner.
  • Shared results with a VP of Engineering and department heads.

Baseline & results

Window: Q2 vs Q3 2025 (13 weeks each). n: Deployments Q2=79, Q3=83; incidents with customer impact in Q3=3 (one 5-day outlier).

| Metric | Window (n) | Baseline → Result | Method/Definition | Note |
|---|---|---|---|---|
| Deployment Frequency | 13 wks (n=79→83) | 6.58 → 6.92 /wk | Prod releases per week | Top-quartile frequency; verify against seasonality |
| Deployments (count) | 13 wks | 79 → 83 | CI/CD release count | +4 releases |
| CFR (Change Failure Rate) | 13 wks | 1.3% → 3.6% | Incidents ÷ deploys (≤7 days) | Below org average |
| MTTR (median) | 13 wks (n=3) | 1.7 h → 25.8 h | Incident start → resolved | Skewed; use median |
| MTTR (mean) | 13 wks | 1.7 h → 49.8 h | As above | Inflated by 5-day outlier |
| Lead Time for Changes | 13 wks | Not tracked → tracking in Q4 | PR merge → prod | Baseline pending |

Org readers: copy this structure, keep your linking rule and MTTR start/stop identical quarter-to-quarter, and publish MTTR P50/P90 next time.

Interpretation

  • For org roll-ups, lead with MTTR median and CFR; qualify frequency. Frequency without change size/rollback paths risks vanity metrics.
  • Sample size note: n=3 incidents → treat MTTR trends as directional; set a rule of thumb: don’t draw strong MTTR conclusions until n≥10/quarter.
  • With the 5-day outlier removed, adjusted MTTR is ≈10.7h, still not top-quartile for web apps (a practical range many teams target: ~5–12h).

What I want to do now

  1. Adopt a 48h MTTR review standard (org-wide). 30 minutes, blameless, produce 1–3 operational fixes and a runbook update. Expected effect: MTTR median <12h.
  2. Adopt a PR risk checklist (org-wide). Require feature flags/kill-switch, rollback plan, observability, and max change size. Expected effect: keep CFR ≤2% while maintaining ≥7/wk frequency.
  3. Concentrated incidents: one workstream produced 2 incidents → blameless review with senior dev and QA. Expected effect: stream CFR ≤2%.
  4. Lead time gap: instrument PR-merge→prod and publish P50/P90. Expected effect: median lead time <1 day; exposes review/release bottlenecks.
  5. Release hygiene: avoid Friday cutovers unless flagged + on-call coverage. Expected effect: CFR controlled without reducing frequency.

How you can replicate (one-week plan)

  1. Choose the last 13 weeks; export deployment IDs (prod only) and incidents with start/resolved times.
  2. Normalize to UTC; define linking: incident ↔ nearest prior deployment within 7 days (or explicit deployment_id).
  3. Compute: deployments/week, CFR, MTTR median/mean; also adjusted MTTR (exclude >72h) for long-tail analysis. Appendix A covers CFR and MTTR; a sketch for the weekly median and adjusted MTTR follows this list.
  4. Tag incidents by service/workstream/on-call/PR author to find concentration.
  5. Produce a single page (table + two small charts).
  6. Host definitions in the same doc; don’t change them mid-quarter.
  7. Review with peers; agree on 1–3 changes tied to metrics; schedule an automated weekly refresh to Slack/Notion.
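
Appendix A covers CFR and MTTR; the sketch below fills in the two numbers it does not produce, the weekly deployment-frequency median and the adjusted MTTR (incidents ≤72 h). It assumes the same deployments and incidents tables as Appendix A.

-- Sketch (Postgres): weekly deploy median and adjusted MTTR (incidents ≤72 h), Q3 2025.
-- Assumes the same deployments / incidents tables as Appendix A.
WITH weekly AS (
  SELECT date_trunc('week', deployed_at) AS wk, COUNT(*) AS deploys
  FROM deployments
  WHERE environment = 'prod'
    AND deployed_at >= TIMESTAMP '2025-07-01' AND deployed_at < TIMESTAMP '2025-10-01'
  GROUP BY 1
)
SELECT
  (SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY deploys) FROM weekly) AS deploys_per_week_p50,
  ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (resolved_at - started_at)) / 3600))::numeric, 1) AS adjusted_mttr_median_h
FROM incidents
WHERE resolved_at IS NOT NULL
  AND resolved_at - started_at <= INTERVAL '72 hours'
  AND started_at >= TIMESTAMP '2025-07-01' AND started_at < TIMESTAMP '2025-10-01';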

Paste-ready artifact — “Pre-merge risk checklist” (drop into your PR template)

[ ] Change is behind a feature flag or can be disabled at runtime
[ ] Rollback path documented (how + who + max time to rollback)
[ ] Observability in place (logs + metrics + alerts) for new paths
[ ] Change size within team standard (e.g., <300 LOC or split by flag)
[ ] Post-deploy smoke check defined (owner + steps)
[ ] Risk label set (Low/Med/High) and, if High, release window approved
[ ] Deployment ID will be attached to incident tickets automatically

Risks & gotchas with DORA analysis

  • Small n distorts MTTR. Mitigation: headline the median, publish mean and adjusted MTTR; treat MTTR trends as directional until n≥10/quarter.
  • Incidents not linked to deploys. Mitigation: require deployment ID in every incident; auto-attach from CI metadata.
  • Friday freezes reduce throughput. Mitigation: exception process with flags + on-call coverage.
  • Owner tagging creates blame. Mitigation: blameless reviews; use tags to target systemic fixes.

Next measurement

Re-measure on 03 Jan 2026 (or next quarter). Targets:

  • Deployment Frequency ≥ 7.0/wk (13-week median).
  • CFR ≤ 2.0%.
  • MTTR median <12h (stretch: <6h); adjusted MTTR <8h.
  • Lead Time (PR merge→prod) median <1 day.
    Org note: we’ll publish org-level MTTR percentiles next quarter for benchmarking.

Appendix A — CFR & MTTR (Postgres)

Works on Postgres; adjust PERCENTILE_CONT syntax for other warehouses.

-- Window: Q3 2025 (upper bound is exclusive, so 30 Sep deployments are included)
WITH win AS (
  SELECT TIMESTAMP '2025-07-01' AS start_ts, TIMESTAMP '2025-10-01' AS end_ts
),
deploys AS (
  SELECT id, service, deployed_at
  FROM deployments, win
  WHERE environment = 'prod'
    AND deployed_at >= start_ts AND deployed_at < end_ts
),
incs AS (
  SELECT id, service, started_at, resolved_at, deployment_id
  FROM incidents, win
  WHERE started_at >= start_ts AND started_at < end_ts
),
link AS (
  -- Link each incident to one deployment: explicit deployment_id first,
  -- otherwise the nearest prior deploy of the same service within 7 days
  SELECT i.id AS incident_id, d.id AS deploy_id, i.started_at, i.resolved_at
  FROM incs i
  LEFT JOIN LATERAL (
    SELECT dd.id
    FROM deploys dd
    WHERE dd.id = i.deployment_id
       OR (dd.service = i.service
           AND dd.deployed_at <= i.started_at
           AND i.started_at < dd.deployed_at + INTERVAL '7 days')
    ORDER BY (dd.id = i.deployment_id) DESC, dd.deployed_at DESC
    LIMIT 1
  ) d ON TRUE
),
mttr AS (
  SELECT
    PERCENTILE_CONT(0.5) WITHIN GROUP (
      ORDER BY EXTRACT(EPOCH FROM (resolved_at - started_at))/3600
    ) AS mttr_median_h,
    AVG(EXTRACT(EPOCH FROM (resolved_at - started_at))/3600) AS mttr_mean_h,
    COUNT(*) AS incidents_n
  FROM link
  WHERE resolved_at IS NOT NULL
),
cfr AS (
  SELECT
    COUNT(DISTINCT deploy_id)::float / NULLIF((SELECT COUNT(*) FROM deploys), 0) * 100 AS cfr_pct
  FROM link
  WHERE deploy_id IS NOT NULL
)
SELECT
  (SELECT COUNT(*) FROM deploys) AS deployments_n,
  ROUND((SELECT COUNT(*) FROM deploys)::numeric / 13.0, 2) AS deploys_per_week,
  ROUND((SELECT cfr_pct FROM cfr)::numeric, 2) AS cfr_pct,
  ROUND((SELECT mttr_median_h FROM mttr)::numeric, 1) AS mttr_median_h,
  ROUND((SELECT mttr_mean_h FROM mttr)::numeric, 1) AS mttr_mean_h,
  (SELECT incidents_n FROM mttr) AS incidents_n;