Journal/Case Study

Rebuilt a 70-screen advisor platform without pausing the roadmap

How we redesigned and rebuilt Northwind's core advisor platform: 70 screens, active users, live roadmap, zero production incidents.

Muhammad AsfandyarMay 10, 20257 min read

Rebuilding a live system is like replacing the engine of a plane mid-flight. Nobody wants to do it. But sometimes the engine is genuinely on fire and you don't really have a choice.

That's where Northwind was when they came to us. They needed to tear down their advisor platform and rebuild it from scratch, while their advisors were actively using it with real clients every single day.

The platform

Seventy screens. An advisor-facing web app handling client portfolios, compliance workflows, and reporting. Built over five years by multiple teams, each leaving their own unique brand of chaos behind. The kind of codebase where you read three files just to understand one function, and the answer is usually "it depends."

Everyone had a wish list. The business wanted coherence. The engineering team wanted a component architecture they could actually maintain. The advisors just wanted it to stop being slow in client meetings, which, fair enough.

How we approached it

The obvious move is a parallel rebuild: build the new thing in isolation, cut over when it's ready. We rejected that immediately. You end up maintaining two systems, two bug queues, and two sets of confused users, followed by a big-bang migration where all the risk lands at once. Also it takes forever. A full parallel rebuild here would have been twelve months minimum. Northwind didn't have twelve months.

We used the strangler pattern instead. Replace screens incrementally. Route traffic section by section. Retire old code as each part stabilises. Ship continuously throughout.

The test: if a production incident hits, can you revert in under five minutes? We built the routing layer so the answer was always yes.

The results

Eight months. Forty features shipped to production in parallel. The roadmap never paused. Activation went up 38%. Zero production incidents across the entire migration window. Not zero-ish. Zero.

More from the Journal