Reflections on over-engineering: Cozy
The Good
We were processing over $100m/mo in rent at Cozy, nearly all of that on the 1st of the month. We were able to do that in a Ruby monolith by maintaining a disciplined programming team, with thorough automated tests and monitoring. Every time we had a performance issue, we fixed it, whether that meant adding indices, redesigning queries, or building auxiliary systems, for example for better parallelization.
I consider Cozy’s payment system an example of appropriately-complex engineering. It wasn’t under-engineered and then glued together with hacks, nor was it over-engineered and left with a bunch of stuff that didn’t work. We were diligent and rigorous. We carried this discipline to all parts of Cozy’s code base and it paid off.
The payments system was necessarily complex from a logic perspective, but not a technical one. I believe it could have scaled another order of magnitude in volume without any sort of technical redesign. The work we did on the payments system went into the innate complexity of the problem being solved, not into wrestling with complexity of our own making.
The Bad
On the other hand, Cozy was running its monolith directly on AWS. This required one full-time Operations (DevOps) Engineer at a minimum, and eventually a second for redundancy. We easily could have run our entire setup on Heroku, which would have given us better tooling and more features. I would have done this, too, but it turns out the value of our first Ops Engineer was immense outside of Ops (he is the first to point out this “host it in Heroku” situation btw).
The problem Cozy was solving had no essential technical complexity, and thankfully we avoided adding much of our own. We ended up over-spending on Ops, but that did not infect the rest of the organization with workaround complexity.