Here are some rough notes on how to do a big, staged refactor in a codebase.
Understand what is possible now
People want to do certain things in the codebase, and the way they currently do them is bad, hence the desire to refactor.
The first step is to understand the things they are doing.
Decide what should be possible
It’s possible that the set of things currently possible is not the set of things that should be possible.
If the refactor is going to take away people’s ability to do things, get buy-in from them on that.
If this will make it possible to do more things, that will probably go over well!
Think about how to make it possible
Before making any changes to existing call sites, think about the new way you want people to do the things.
Roll out the new thing under a new name
Let people opt in to your new thing with either a new version or a new function name.
This doesn’t break existing callers of the old thing.
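As a minimal sketch of the "new name" approach: the old function stays exactly as it was, and the new behavior lives next to it under a different name. Both `parse_size` and `parse_size_v2` are hypothetical names invented for this illustration, not part of any real codebase.

```python
def parse_size(text):
    """Old way (hypothetical): returns a bare int and silently assumes bytes.

    Left untouched, so every existing caller keeps working.
    """
    return int(text)


def parse_size_v2(text, *, unit="B"):
    """New way (hypothetical): callers opt in by name and state the unit."""
    multipliers = {"B": 1, "KB": 1000, "MB": 1000**2}
    return int(text) * multipliers[unit]


# Old callers are unaffected; new callers choose the new name explicitly.
legacy = parse_size("42")
migrated = parse_size_v2("2", unit="KB")
```

Because the two functions coexist, each call site can be migrated on its own schedule.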
List where the old things are happening
Write a little tool (using stuff like grep) that finds the current call sites of the old thing.
Run this tool, then check the result into version control.
Add some code-owner bits to enforce that this list can only ever shrink, not grow.
Add a test that re-runs the tool on every CI run and makes sure the current state matches what is expected.
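The tool and the CI check could look roughly like this. It is a sketch under assumptions: the old thing is a hypothetical Python function called `parse_size`, the search is a plain regex over `.py` files (a stand-in for grep), and the checked-in list is passed in as a list of paths rather than read from a particular file.

```python
import re
from pathlib import Path

# Hypothetical old function name. The lookbehind skips the definition itself,
# and requiring "(" right after the name avoids matching parse_size_v2(...).
OLD_CALL = re.compile(r"(?<!def )\bparse_size\(")


def find_call_sites(root):
    """Return sorted relative paths of .py files that still call the old thing."""
    root = Path(root)
    hits = []
    for path in root.rglob("*.py"):
        if OLD_CALL.search(path.read_text(encoding="utf-8")):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


def check_allowlist(current, allowed):
    """Compare the tool's output against the checked-in list.

    Returns (new_offenders, stale_entries):
      new_offenders: files using the old thing that are not on the list
      stale_entries: listed files that no longer use the old thing
    """
    current, allowed = set(current), set(allowed)
    return sorted(current - allowed), sorted(allowed - current)
```

A CI test would call `find_call_sites` on the repository root, load the checked-in list, and fail unless both returned lists are empty. Failing on stale entries is what forces the checked-in list to shrink as call sites get migrated.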
Make that list shrink
Go through the allow-listed call sites and migrate them over to the new way. If possible, it might be nice to do this migration mechanically.
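For simple call sites, a mechanical migration can be as small as a regex rewrite. The sketch below assumes a hypothetical old function `parse_size(x)` being replaced by a hypothetical `parse_size_v2(x, unit="B")`. It only rewrites calls whose arguments contain no nested parentheses and leaves everything else for a human. A syntax-aware tool would be more robust; this is just the cheapest version of the idea.

```python
import re

# Matches parse_size(<args>) where <args> is non-empty and has no nested
# parentheses. The lookbehind avoids rewriting the function definition.
SIMPLE_CALL = re.compile(r"(?<!def )\bparse_size\(([^()]+)\)")


def migrate(source):
    """Rewrite simple old-style calls; return (new_source, number_rewritten)."""
    return SIMPLE_CALL.subn(r'parse_size_v2(\1, unit="B")', source)


before = "n = parse_size(raw)\nm = parse_size(clean(raw))\n"
after, count = migrate(before)
# The first call is rewritten; the second has nested parentheses and is
# left as-is for manual migration.
```

Re-running the call-site tool after a pass like this shows which files still need attention by hand.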
Maybe grow the list temporarily
As you go, you may discover interesting use cases of the old thing that your new thing doesn’t entirely cover. You might then need to temporarily grow the allowlist.
Go back and work on enabling the use-case in the new way, so the allowlist stops growing.
Get the list to zero
Then your refactor is done!