Engineering Manager OS: Master Team Output (–29% PR p50 in 4 weeks)

Categories: Engineering Manager

TL;DR: Outcome: cut pull request (PR) cycle-time median (p50) by 29% in 4 weeks. Action: run the “Manager OS” cadence (daily/weekly/monthly) and measure PR p50 weekly.

Context: New engineering managers often default to “more coding.” I ran a 4-week experiment to test whether small managerial habits (priorities, delegation, feedback, rhythm) could improve whole-team throughput more than another sprint of me writing code.


What I changed

  • Daily discipline (Manager OS)
    • Wrote Top 3 priorities each morning (only tasks I must do).
    • Used delegation prompt: “What do you think we should do?” before giving answers.
    • Delivered 1 micro-feedback per day (positive or constructive).
    • Energy check: blocked 90 minutes for strategy during my peak hours.
  • Weekly rhythm
    • 1:1s using 3Q: How are you? What’s blocking you? Where do you want to grow?
    • Team sync (≤30 min): Wins → Priorities → Risks (with named owners).
    • Calendar audit (Fri): killed or delegated one recurring meeting weekly.
    • Weekly reflection (15 min): one tweak committed for next week.
  • Monthly system
    • Team health pulse (anonymous, 3 questions).
    • Role clarity: refreshed each person’s top 3 responsibilities.
    • Decision log review: merged, archived, or escalated stale items.
  • Definitions used (explicit)
    • PR cycle time: open → first approval or merge, whichever comes first (a query sketch follows this list).
    • p50 / p90: median and 90th percentile.
    • WIP (work in progress): open issues/PRs assigned to a person.
    • Planned work %: story points or items tagged “Planned” ÷ total completed.
    • DORA: DevOps Research and Assessment; we reference Deployment Frequency and Lead Time for Changes but tracked PR cycle time as a proxy.
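
For concreteness, here is a minimal Postgres sketch of the PR cycle-time p50/p90 computation under the definition above. The pull_requests and pr_reviews tables and their columns are assumptions standing in for whatever your Git host export provides; adapt the names and the 28-day window to your data.

-- Minimal sketch (Postgres): PR cycle time p50/p90, open → first approval or merge.
-- pull_requests / pr_reviews are assumed table names; adjust to your export.
WITH first_signal AS (
  SELECT
    pr.id,
    pr.opened_at,
    LEAST(
      COALESCE(pr.merged_at, 'infinity'::timestamp),
      COALESCE(MIN(r.submitted_at) FILTER (WHERE r.state = 'approved'),
               'infinity'::timestamp)
    ) AS done_at
  FROM pull_requests pr
  LEFT JOIN pr_reviews r ON r.pr_id = pr.id
  WHERE pr.opened_at >= now() - INTERVAL '28 days'
  GROUP BY pr.id, pr.opened_at, pr.merged_at
)
SELECT
  ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (done_at - opened_at)) / 3600))::numeric, 1) AS p50_h,
  ROUND((PERCENTILE_CONT(0.9) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (done_at - opened_at)) / 3600))::numeric, 1) AS p90_h,
  COUNT(*) FILTER (WHERE done_at - opened_at > INTERVAL '7 days') AS outliers_over_7d
FROM first_signal
WHERE done_at < 'infinity';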

Baseline (4 weeks prior, n values shown)

Data window: 4 weeks, n=96 PRs, n=27 deploys.

| Metric | Definition (short) | Baseline | Notes |
|---|---|---|---|
| PR cycle time p50 | PR open → first approval/merge | 42 h | Skewed; p90 = 120 h; 6 outliers >7 days |
| Deployment frequency (median/wk) | Prod deploys per week | 5 | Total n=27 in 4 weeks |
| WIP per engineer (p50) | Open items per person | 3 | Range 1–7 |
| Planned work share | Planned ÷ total done | 60% | Unplanned interrupts high |
| 1:1 completion rate | Held ÷ scheduled | 67% | 6 cancellations |
| On-call interrupts (p50/day) | Incidents >5 min per engineer | 3 | p90 = 6 |

Observation: Skewed PR times demand medians; the outliers were caused by reviewer vacations and unclear ownership.

How to replicate Engineering Manager OS (one-week start)

  1. Instrument: Capture 4 weeks of PR cycle times, deploy counts, 1:1 completion, WIP, planned %. Use medians; list outliers with causes (a WIP and planned-share query sketch follows this list).
  2. Install daily loop: Top 3, delegation prompt, one micro-feedback, 90-min strategy block at your personal peak.
  3. Run 3Q 1:1s this week with every direct; record “blockers → owner → date due”.
  4. Team sync ≤30 min: Wins → Priorities → Risks (each risk has an owner and a next date).
  5. Calendar audit (Fri): remove or delegate exactly one recurring meeting ≥30 min.
  6. Role clarity refresh: write top 3 responsibilities per person; share in team doc.
  7. Decision log: list open decisions, assign a DRI (directly responsible individual), due date, and escalation path.
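
If your tracker lives in a warehouse, a sketch like the following (Postgres) can back step 1. The issues table and its assignee, status, planned, and closed_at columns are assumptions; map them to your own schema.

-- Minimal sketch (Postgres): WIP per engineer (p50) and planned-work share, last 28 days.
-- The issues table and its assignee/status/planned/closed_at columns are assumptions.
WITH wip AS (
  SELECT assignee, COUNT(*) AS open_items
  FROM issues
  WHERE status = 'open' AND assignee IS NOT NULL
  GROUP BY assignee
),
done AS (
  SELECT COUNT(*) FILTER (WHERE planned) AS planned_n,
         COUNT(*) AS total_n
  FROM issues
  WHERE closed_at >= now() - INTERVAL '28 days'
)
SELECT
  (SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY open_items) FROM wip) AS wip_p50,
  ROUND(100.0 * planned_n / NULLIF(total_n, 0), 1) AS planned_pct
FROM done;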

Paste-ready artifact (copy/print)

Engineering Manager OS – Weekly Checklist
Data window to monitor (rolling 4 weeks):
[ ] PR cycle time p50, p90; list outliers + causes
[ ] Deploys/week (median); Lead Time proxy if available
[ ] WIP per engineer (p50)
[ ] Planned work % vs unplanned
[ ] 1:1 completion rate

Daily:
[ ] Write Top 3 (only I can do)
[ ] Ask: “What do you think we should do?” before answering
[ ] Give 1 micro-feedback
[ ] 90-min strategy block during peak energy

Weekly:
[ ] 1:1s (3Q): How are you? Blocking? Growth?
[ ] Team sync ≤30 min: Wins → Priorities → Risks (owner)
[ ] Calendar audit: cut/delegate 1 recurring
[ ] Reflection 15 min: 1 tweak for next week

Monthly:
[ ] Team health pulse (3 Qs)
[ ] Role clarity: top 3 responsibilities/person
[ ] Decision log: close, escalate, or commit dates

Risks & gotchas

  • Risk: Engineering Manager drifts back to coding to “help.”
    Mitigation: Protect the 90-min strategy block; track % time on multiplication tasks vs coding.
  • Risk: Delegation stalls due to unclear ownership.
    Mitigation: In syncs, every risk gets a named owner and next date; publish in the team doc.
  • Risk: PR outliers hide systemic issues (vacations, neglected reviews).
    Mitigation: Track outliers explicitly with causes; add backup reviewers; rotate review ownership.
  • Risk: Meetings creep back.
    Mitigation: Enforce weekly calendar audit with a visible count of meetings cut/delegated.
  • Risk: 1:1 cancellations reduce signal.
    Mitigation: Set a 90% completion SLO (service level objective); reschedule within the same week.

Results after 4 weeks (n=104 PRs, n=31 deploys)

  • PR cycle time p50: 42 h → 30 h (–29%); p90: 120 h → 88 h; outliers reduced from 6 to 2.
  • Deployment frequency (median/wk): 5 → 6.
  • WIP per engineer (p50): 3 → 2.
  • Planned work %: 60% → 72%.
  • 1:1 completion: 67% → 93%.

Primary drivers: clearer ownership, faster reviews via named backups, fewer interrupts, and less managerial context thrash.

Next (2025-11-04 target)

  • Goal: PR cycle time p50 ≤ 24 h, p90 ≤ 72 h; 1:1 completion ≥ 90%; WIP p50 = 2; Planned work ≥ 75%.
  • Actions: introduce “review SLAs” (first review on a PR by the next business day), expand backup reviewers, and pilot a 2-hour weekly “no-meeting build window” for the team (a breach-report query sketch follows).
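
As a rough illustration of what tracking the review SLA could look like, the sketch below lists open PRs with no review yet after 24 elapsed hours, reusing the hypothetical pull_requests and pr_reviews tables from the earlier sketch; true next-business-day logic (skipping weekends) is left out for brevity.

-- Rough sketch (Postgres): open PRs waiting >24 h for a first review (SLA breach list).
-- Reuses the hypothetical pull_requests / pr_reviews tables; weekend handling omitted.
SELECT pr.id,
       pr.opened_at,
       ROUND(EXTRACT(EPOCH FROM (now() - pr.opened_at))::numeric / 3600, 1) AS waiting_h
FROM pull_requests pr
WHERE pr.merged_at IS NULL
  AND NOT EXISTS (SELECT 1 FROM pr_reviews r WHERE r.pr_id = pr.id)
  AND pr.opened_at < now() - INTERVAL '24 hours'
ORDER BY pr.opened_at;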

Bottom line: A manager improving personal output by 10% loses to a team improving by 30%. The Engineering Manager OS shifts time from “doing” to “multiplying,” and the numbers move accordingly.

A quick DORA roll-up: what to standardize org-wide?

Categories: DORA, DORA metrics

TL;DR: If you’re rolling up DORA across teams, prioritize MTTR median + change size over raw frequency; run a 30-min MTTR review after every incident.

Try this week: add a pre-merge risk checklist and hold an MTTR review within 48h.


Context: I pulled Q2→Q3 DORA (DevOps Research & Assessment) metrics for our team (13 weeks each) and shared a simple dashboard with peer leaders. The method works with any CI + incident system and scales to an org roll-up when definitions are consistent.

What I did to add DORA analytics

  • Set a fixed 13-week window (Q2 vs Q3 2025); UTC timestamps only.
  • DORA Definitions used:
    • Deployment Frequency = prod releases/week from CI/CD (retries deduped).
    • CFR (Change Failure Rate) = % of deployments that caused a customer-impacting incident or rollback within 7 days.
    • MTTR (Mean Time To Restore) = incident start → resolved; we report median and mean, plus an adjusted MTTR (excluding incidents >72h) to highlight long-tail risk, not to replace the median (Appendix A shows the Postgres queries).
  • Linked incidents to the prior deployment (or explicit deployment_id) and tagged by workstream/owner.
  • Shared results with a VP of Engineering and department heads.

Baseline & results

Window: Q2 vs Q3 2025 (13 weeks each). n: Deployments Q2=79, Q3=83; incidents with customer impact in Q3=3 (one 5-day outlier).

| Metric | Window (n) | Baseline → Result | Method/Definition | Note |
|---|---|---|---|---|
| Deployment Frequency | 13 wks (n=79→83) | 6.58 → 6.92 /wk | Prod releases per week | Top-quartile frequency; verify against seasonality |
| Deployments (count) | 13 wks | 79 → 83 | CI/CD release count | +4 releases |
| CFR (Change Failure Rate) | 13 wks | 1.3% → 3.6% | Incidents ÷ deploys (≤7 days) | Below org average |
| MTTR (median) | 13 wks (n=3) | 1.7 h → 25.8 h | Incident start → resolved | Skewed; use median |
| MTTR (mean) | 13 wks | 1.7 h → 49.8 h | As above | Inflated by 5-day outlier |
| Lead Time for Changes | 13 wks | Not tracked → tracking in Q4 | PR merge → prod | Baseline pending |

Org readers: copy this structure, keep your linking rule and MTTR start/stop identical quarter-to-quarter, and publish MTTR P50/P90 next time.

Interpretation

  • For org roll-ups, lead with MTTR median and CFR; qualify frequency. Frequency without change size/rollback paths risks vanity metrics.
  • Sample size note: n=3 incidents → treat MTTR trends as directional; set a rule of thumb: don’t draw strong MTTR conclusions until n≥10/quarter.
  • With the 5-day outlier removed, adjusted MTTR is ≈10.7h, still not top-quartile for web apps (a practical range many teams target: ~5–12h).

What I want to do now

  1. Adopt a 48h MTTR review standard (org-wide). 30 minutes, blameless, produce 1–3 operational fixes and a runbook update. Expected effect: MTTR median <12h.
  2. Adopt a PR risk checklist (org-wide). Require feature flags/kill-switch, rollback plan, observability, and max change size. Expected effect: keep CFR ≤2% while maintaining ≥7/wk frequency.
  3. Concentrated incidents: one workstream produced 2 incidents → blameless review with senior dev and QA. Expected effect: stream CFR ≤2%.
  4. Lead time gap: instrument PR-merge→prod and publish P50/P90. Expected effect: median lead time <1 day; exposes review/release bottlenecks.
  5. Release hygiene: avoid Friday cutovers unless flagged + on-call coverage. Expected effect: CFR controlled without reducing frequency.

How you can replicate (one-week plan)

  1. Choose the last 13 weeks; export deployment IDs (prod only) and incidents with start/resolved times.
  2. Normalize to UTC; define linking: incident ↔ nearest prior deployment within 7 days (or explicit deployment_id).
  3. Compute: deployments/week, CFR, MTTR median/mean; also adjusted MTTR (exclude >72h) for long-tail analysis. Appendix A covers CFR and MTTR; a sketch for the weekly median and adjusted MTTR follows this list.
  4. Tag incidents by service/workstream/on-call/PR author to find concentration.
  5. Produce a single page (table + two small charts).
  6. Host definitions in the same doc; don’t change them mid-quarter.
  7. Review with peers; agree on 1–3 changes tied to metrics; schedule an automated weekly refresh to Slack/Notion.
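
Appendix A covers CFR and MTTR; the sketch below fills in the two numbers it does not produce, the weekly deployment-frequency median and the adjusted MTTR (incidents ≤72 h). It assumes the same deployments and incidents tables as Appendix A.

-- Sketch (Postgres): weekly deploy median and adjusted MTTR (incidents ≤72 h), Q3 2025.
-- Assumes the same deployments / incidents tables as Appendix A.
WITH weekly AS (
  SELECT date_trunc('week', deployed_at) AS wk, COUNT(*) AS deploys
  FROM deployments
  WHERE environment = 'prod'
    AND deployed_at >= TIMESTAMP '2025-07-01' AND deployed_at < TIMESTAMP '2025-10-01'
  GROUP BY 1
)
SELECT
  (SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY deploys) FROM weekly) AS deploys_per_week_p50,
  ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (resolved_at - started_at)) / 3600))::numeric, 1) AS adjusted_mttr_median_h
FROM incidents
WHERE resolved_at IS NOT NULL
  AND resolved_at - started_at <= INTERVAL '72 hours'
  AND started_at >= TIMESTAMP '2025-07-01' AND started_at < TIMESTAMP '2025-10-01';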

Paste-ready artifact — “Pre-merge risk checklist” (drop into your PR template)

[ ] Change is behind a feature flag or can be disabled at runtime
[ ] Rollback path documented (how + who + max time to rollback)
[ ] Observability in place (logs + metrics + alerts) for new paths
[ ] Change size within team standard (e.g., <300 LOC or split by flag)
[ ] Post-deploy smoke check defined (owner + steps)
[ ] Risk label set (Low/Med/High) and, if High, release window approved
[ ] Deployment ID will be attached to incident tickets automatically

Risks & gotchas with DORA analysis

  • Small n distorts MTTR. Mitigation: headline the median, publish mean and adjusted MTTR; treat MTTR trends as directional until n≥10/quarter.
  • Incidents not linked to deploys. Mitigation: require deployment ID in every incident; auto-attach from CI metadata.
  • Friday freezes reduce throughput. Mitigation: exception process with flags + on-call coverage.
  • Owner tagging creates blame. Mitigation: blameless reviews; use tags to target systemic fixes.

Next measurement

Re-measure on 03 Jan 2026 (or next quarter). Targets:

  • Deployment Frequency ≥ 7.0/wk (13-week median).
  • CFR ≤ 2.0%.
  • MTTR median <12h (stretch: <6h); adjusted MTTR <8h.
  • Lead Time (PR merge→prod) median <1 day.
    Org note: we’ll publish org-level MTTR percentiles next quarter for benchmarking.

Appendix A — CFR & MTTR (Postgres)

Works on Postgres; adjust PERCENTILE_CONT syntax for other warehouses.

-- Window: Q3 2025 (upper bound is exclusive, so 30 Sep deployments are included)
WITH win AS (
  SELECT TIMESTAMP '2025-07-01' AS start_ts, TIMESTAMP '2025-10-01' AS end_ts
),
deploys AS (
  SELECT id, service, deployed_at
  FROM deployments, win
  WHERE environment = 'prod'
    AND deployed_at >= start_ts AND deployed_at < end_ts
),
incs AS (
  SELECT id, service, started_at, resolved_at, deployment_id
  FROM incidents, win
  WHERE started_at >= start_ts AND started_at < end_ts
),
link AS (
  -- Link each incident to one deployment: explicit deployment_id first,
  -- otherwise the nearest prior deploy of the same service within 7 days
  SELECT i.id AS incident_id, d.id AS deploy_id, i.started_at, i.resolved_at
  FROM incs i
  LEFT JOIN LATERAL (
    SELECT dd.id
    FROM deploys dd
    WHERE dd.id = i.deployment_id
       OR (dd.service = i.service
           AND dd.deployed_at <= i.started_at
           AND i.started_at < dd.deployed_at + INTERVAL '7 days')
    ORDER BY (dd.id = i.deployment_id) DESC, dd.deployed_at DESC
    LIMIT 1
  ) d ON TRUE
),
mttr AS (
  SELECT
    PERCENTILE_CONT(0.5) WITHIN GROUP (
      ORDER BY EXTRACT(EPOCH FROM (resolved_at - started_at))/3600
    ) AS mttr_median_h,
    AVG(EXTRACT(EPOCH FROM (resolved_at - started_at))/3600) AS mttr_mean_h,
    COUNT(*) AS incidents_n
  FROM link
  WHERE resolved_at IS NOT NULL
),
cfr AS (
  SELECT
    COUNT(DISTINCT deploy_id)::float / NULLIF((SELECT COUNT(*) FROM deploys), 0) * 100 AS cfr_pct
  FROM link
  WHERE deploy_id IS NOT NULL
)
SELECT
  (SELECT COUNT(*) FROM deploys) AS deployments_n,
  ROUND((SELECT COUNT(*) FROM deploys)::numeric / 13.0, 2) AS deploys_per_week,
  ROUND((SELECT cfr_pct FROM cfr)::numeric, 2) AS cfr_pct,
  ROUND((SELECT mttr_median_h FROM mttr)::numeric, 1) AS mttr_median_h,
  ROUND((SELECT mttr_mean_h FROM mttr)::numeric, 1) AS mttr_mean_h,
  (SELECT incidents_n FROM mttr) AS incidents_n;