
Eight years of shipping software: what I'd do differently

December 1, 2025

More than eight years ago, midway through medical school, I shipped my first production code at Asuqu. Since then I've worked at multiple companies, written code in eight languages, deployed to three cloud providers, and broken production exactly three times. The breaks taught me more than the successes.

This is not a list of principles. Principles are clean and abstract. Real experience is messy and specific. These are specific things that happened and what I took from them.

The feature shipped too early

At a consumer fintech app in 2021, we had a feature for recurring transfers. The product manager wanted it for a marketing deadline. The deadline was two weeks away. The feature needed four weeks of work if you included edge cases: failed transfers, insufficient funds mid-recurrence, timezone handling for users in different regions, and the retry logic for bank API failures.

We shipped in two weeks. The happy path worked. The edge cases didn't.

Within the first month:

  • 12 users had duplicate transfers because the retry logic didn't track what had already been sent
  • 3 users in different timezones had transfers execute on the wrong day
  • 1 user had a transfer fail silently because the bank API returned an error code we didn't handle
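The duplicate transfers came from retries that never recorded what had already been sent. A minimal sketch of the idempotency-key pattern that prevents this, with all names hypothetical (the real fix also needs durable storage, not an in-memory set):

```typescript
// Hypothetical sketch: deduplicate retried transfers with an idempotency key.
// `sendToBank` stands in for the real bank API call.
type BankCall = (transferId: string, amountCents: number) => boolean;

class TransferSender {
  // In production this set would live in durable storage, not memory.
  private sent = new Set<string>();

  constructor(private sendToBank: BankCall) {}

  // Returns true if this call actually sent the transfer, false if it skipped.
  send(transferId: string, amountCents: number): boolean {
    if (this.sent.has(transferId)) return false; // already sent: retry is a no-op
    const ok = this.sendToBank(transferId, amountCents);
    if (ok) this.sent.add(transferId); // record success before any retry fires
    return ok;
  }
}
```

With the key recorded on success, a retry after a timeout can never produce a second bank call for the same transfer.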

The support team spent the next three weeks manually correcting transactions. The engineering team spent two weeks building the edge case handling we should have built originally. The marketing moment passed without anyone noticing.

What I took from this: the cost of shipping something incomplete isn't zero. It's the support cost plus the remediation cost plus the trust cost. These costs are real and quantifiable. If I had presented them alongside the deadline pressure, the decision might have been different. I didn't because I assumed the deadline was non-negotiable. It wasn't. The product manager later said she would have moved it if we had explained the risk clearly.

The missing tests

In 2022, I built a notification service. It started as a simple function: receive an event, compose a message, send it via push notification. It grew to handle email, SMS, in-app notifications, digest batching, quiet hours, user preferences, and retry logic.

I didn't write tests. The function was simple when it started, and by the time it was complex, the absence of tests felt like too large a gap to fill.

Six months later, the notification service was the most critical service in the system. Every product feature used it. A bug in the digest batching logic caused users to receive 50+ notifications at midnight. The fix took 20 minutes. Finding the bug took 8 hours because there were no tests to narrow down the behaviour.

After the incident, I spent a week writing tests for the notification service. Every edge case I tested revealed behaviour I had forgotten I implemented. The tests were documentation of the system's behaviour, written six months too late.

The lesson is obvious in retrospect: write the tests when the code is small enough to test easily. Not when it becomes critical. Not after the first incident. Now. "It's simple enough to not need tests" is always true today and never true in six months.
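As a concrete illustration of "small enough to test easily": a quiet-hours check is a few lines, but the window usually wraps midnight, which is exactly the kind of edge a test written on day one would have pinned down. The function and signature here are hypothetical, not the actual service's code:

```typescript
// Hypothetical sketch: decide whether to deliver a notification now or hold it.
// Hours are 0-23; the quiet window may wrap midnight, e.g. 22 -> 7.
function shouldDeliverNow(hour: number, quietStart: number, quietEnd: number): boolean {
  if (quietStart <= quietEnd) {
    // Window does not wrap: quiet from start (inclusive) to end (exclusive).
    return hour < quietStart || hour >= quietEnd;
  }
  // Window wraps midnight: quiet from start to 24, then from 0 to end.
  return hour >= quietEnd && hour < quietStart;
}
```

Three assertions covering the wrap case would have documented this behaviour from the start, instead of six months later.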

The schema migration

In 2023, I ran a database migration on a production PostgreSQL database. The migration added a column with a default value to a table with 2 million rows. On my local database with 1,000 rows, it ran in 200 milliseconds. On production, it locked the table for four hours.

The issue: in PostgreSQL (before version 11), adding a column with a non-null default value rewrites the entire table. The rewrite holds an exclusive lock. No reads or writes can happen on the table during the rewrite. Our API returned errors for every request that touched that table. For four hours.

What I should have done:

  1. Add the column as nullable without a default
  2. Backfill the values in batches
  3. Set the default for new rows
  4. Add the NOT NULL constraint after all rows have values

This is the standard pattern for zero-downtime migrations. I knew it existed. I didn't use it because the migration was "simple" and I didn't test it against production-scale data.
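A sketch of what the four-step version looks like in practice. The SQL outline is the standard Postgres pattern; the helper below is a hypothetical illustration of step 2, splitting the backfill into bounded id ranges so no single UPDATE holds a long lock (table and column names are invented):

```typescript
// The four steps as SQL (invented table and column names):
//   1. ALTER TABLE transfers ADD COLUMN region text;              -- metadata-only, no rewrite
//   2. UPDATE transfers SET region = 'us'
//        WHERE id BETWEEN $1 AND $2 AND region IS NULL;           -- once per batch below
//   3. ALTER TABLE transfers ALTER COLUMN region SET DEFAULT 'us';
//   4. ALTER TABLE transfers ALTER COLUMN region SET NOT NULL;    -- fast once no NULLs remain

// Compute the inclusive [start, end] id ranges for the batched backfill in step 2.
function batchRanges(maxId: number, batchSize: number): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= maxId; start += batchSize) {
    ranges.push([start, Math.min(start + batchSize - 1, maxId)]);
  }
  return ranges;
}
```

Each batch commits separately, so locks are held for milliseconds at a time rather than for the duration of a 2-million-row rewrite.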

I won't make that mistake again. Always test migrations against a dataset that matches production in size. Not in content (use synthetic data for privacy), but in volume. A migration that takes 200ms on 1,000 rows can take hours on 2 million rows, and the time scaling isn't always linear.

We now have a rule: no migration runs in production without first running it against a staging database with production-scale data. The staging database is refreshed weekly from a production backup with PII scrubbed.

Learning TypeScript late

I started with JavaScript and resisted TypeScript for about eighteen months. My reasons were the usual ones: it slows you down, it adds boilerplate, the type system is complex, the compiler errors are unreadable.

All of those were true, and none of them mattered compared to the bugs TypeScript would have prevented. The navigation param bug, the Redux action payload bug, the API response shape bug: these were recurring issues that TypeScript eliminates entirely.

When I finally adopted TypeScript, the productivity loss was about two weeks. After that, I was faster because the IDE could tell me things that I previously had to look up or remember. Autocomplete for API response shapes. Type errors when I renamed a field but missed a reference. Compile-time guarantees that an object had the properties I thought it had.
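A small example of the "API response shape" class of bug, with hypothetical types. In plain JavaScript, renaming a server field (say, amountCents to amount_cents) silently turns the formatted value into NaN at runtime; with the interface declared, every stale reference fails to compile instead:

```typescript
// Hypothetical response shape for a transfer endpoint.
interface Transfer {
  id: string;
  amountCents: number;  // rename this field and the compiler flags formatAmount
  scheduledFor: string; // ISO date string
}

// Format the amount for display, e.g. 1234 cents -> "$12.34".
function formatAmount(t: Transfer): string {
  return `$${(t.amountCents / 100).toFixed(2)}`;
}
```

The check costs nothing at runtime; the interface compiles away entirely.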

Eighteen months of JavaScript bugs, multiplied by the debugging time for each one, was far more expensive than two weeks of learning TypeScript.

Reading the codebase before proposing changes

Early in my career, I would join a project and immediately start proposing improvements. "This should use TypeScript." "This architecture is wrong." "We should switch to this library."

I was usually technically correct. But the existing choices had context I didn't have. The architecture was "wrong" because it was built for constraints that no longer applied, and the team knew this but had higher priorities. The library was chosen because an alternative had a critical bug at the time, and they didn't switch back when the bug was fixed.

At Tarjimly, I spent my first two weeks reading code before proposing any changes. When I did propose changes, they were grounded in understanding of the existing system. The team was receptive because the proposals were informed, not reflexive.

It took a few rounds of this before I learned: understanding why things are the way they are is more valuable than knowing how they should be. Every codebase has history. The developer who reads the history before suggesting changes builds trust. The developer who suggests changes on day one doesn't.

The medical background

People outside of engineering are often surprised that a medical degree is useful in software. People inside engineering are less surprised, because they recognise that software development isn't primarily about writing code. It's about understanding problems, diagnosing failures, communicating clearly, and making decisions with incomplete information.

I didn't bring these skills from medicine into engineering as a career switcher. I was developing them in parallel. The clinical training shaped my debugging approach while I was actively shipping production code. The documentation discipline from medical records bled into my pull request descriptions in real time, not in retrospect. I was doing ward rounds in the morning and code reviews in the evening for years.

But the most valuable thing from medicine isn't a transferable skill. It's a perspective. Knowing what it feels like to have someone's wellbeing depend on your competence changes how you think about reliability. I don't ship code that isn't ready because I've been in situations where "not ready" had consequences that no post-mortem could fix.

What stays the same

Eight years in, the fundamentals haven't changed. Read the code before you change it. Write tests before the code becomes critical. Test migrations at scale. Ship when the edge cases are handled, not when the happy path works. Understand the system before proposing improvements.

These aren't novel insights. They're the same things experienced engineers have been saying for decades. The difference between knowing them and believing them is the cost of learning them the hard way. I have paid that cost on all of them.

If I could go back, I wouldn't change the path. The mistakes were expensive but instructive. The career has been built not on avoiding errors but on extracting everything useful from each one and making different mistakes next time.

RESPONSES
David Okonkwo · Dec 14, 2025

The schema migration story is something that should be required reading for every junior engineer who says "we don't need a staging environment." Thank you for writing it this honestly.

Chloe Durand · Dec 22, 2025

The note about reading the codebase before proposing changes — I've worked with engineers who skip this and it's always obvious. Worth saying plainly.
