Reformatting Your Codebase with git filter-branch

An all too common scenario…

At work we use the PSR-2 Standard for code style when writing PHP. This works great for new code but creates a new problem… If you’re editing an old file that does not use this style, you have a few options:

  1. Format the whole file. This incremental approach makes sense, but it can create an enormous diff that is painful to code review and puts your name on lots of lines you didn’t actually modify (you only changed whitespace).
  2. Format a portion of the file, perhaps just the functions you have worked on. This may be the worst solution: it minimises the noise in the diff, but it leaves multiple styles in the file for the next person to be confused and frustrated by.

Enter git filter-branch!

Git has tonnes of great, powerful tools, and filter-branch is one of them. It can be used for lots of things we won’t explore here; we will use it for one very specific purpose: running a command against the tree of every commit.

git filter-branch --tree-filter 'phpcbf src' -- --all
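
To see what a tree filter does without touching a real project, here is a minimal sketch that runs the same kind of command in a throwaway repository. Everything in it is an assumption for illustration: fake-formatter is a made-up stand-in for phpcbf that only converts tabs to four spaces, and the authors and file names are invented. The "|| true" is there because filter-branch aborts as soon as the filter command exits non-zero, and some formatters (phpcbf included, depending on version) use non-zero exit codes to report that they changed something, so check how yours behaves.

```shell
#!/bin/sh
# Demo in a throwaway repository; nothing here touches a real project.
set -eu

demo=$(mktemp -d)
cd "$demo"

# Stand-in for phpcbf (assumption): converts tabs to four spaces in each
# file it is given.
cat > "$demo/fake-formatter" <<'EOF'
#!/bin/sh
for f in "$@"; do
    [ -f "$f" ] || continue
    expand -t 4 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
EOF
chmod +x "$demo/fake-formatter"

git init -q repo
cd repo
git config user.email "demo@example.com"
git config user.name "Demo Committer"

# Two commits by two different (invented) authors, both indented with tabs.
mkdir src
printf '<?php\n\techo 1;\n' > src/a.php
git add src
git commit -q -m "Add a.php" --author="Alice <alice@example.com>"
printf '<?php\n\techo 1;\n\techo 2;\n' > src/a.php
git commit -q -am "Extend a.php" --author="Bob <bob@example.com>"

# Rewrite every commit. The glob is expanded inside each checked-out tree.
# FILTER_BRANCH_SQUELCH_WARNING skips the warning pause in newer Git versions.
FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --tree-filter "'$demo/fake-formatter' src/*.php || true" -- --all
```

Afterwards, git log --format='%an: %s' in the demo repository should still show Alice and Bob as the authors, while git show HEAD~1:src/a.php shows the first commit already indented with spaces.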

It’s Time to Get Tricky

We can do a lot to speed this up:

  1. Run all the file formatting in parallel. For a given commit all the files could be formatted independently and at the same time. I originally tried this in bash but it’s more trouble than it’s worth, especially if it hits a commit with a lot of changes that spawns thousands of processes.
  2. Only focus on files that have been changed. This is easily the largest performance boost, and used in conjunction with the above it should turn hours of formatting into minutes.
git filter-branch --tree-filter 'phpcbf $(\
git show $GIT_COMMIT --name-status | grep -E "^[AM]" |\
grep "\.php$" | cut -f2)' -- --all
  • grep -E "^[AM]" filters the statuses down to Added and Modified only. No need to try to format files that are being Deleted.
  • grep "\.php$" keeps only files whose names end in .php.
  • cut -f2 removes the status prefix from the list so we just get the raw file paths.
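
The two speed-ups from the list can be sketched as a pair of small shell helpers. This is only a sketch under assumptions: changed_php_files and format_in_parallel are hypothetical names, the sample paths are made up, and the xargs options -0, -r and -P are GNU/BSD extensions rather than POSIX. Capping the number of processes is one way to avoid the "thousands of processes" problem mentioned above.

```shell
#!/bin/sh
# changed_php_files: reads "git show --name-status" style lines on stdin
# (status letter, tab, path) and prints the Added/Modified PHP paths.
changed_php_files() {
    grep -E '^[AM]' |   # keep Added and Modified entries
    grep '\.php$' |     # keep paths ending in .php
    cut -f2             # drop the status column
}

# format_in_parallel FORMATTER [MAX_JOBS]: reads newline-separated paths on
# stdin and runs FORMATTER on them, 20 paths per process and at most
# MAX_JOBS processes at once (default 4).
format_in_parallel() {
    formatter=$1
    max_jobs=${2:-4}
    # NUL-separate the paths so names containing spaces survive xargs;
    # -r skips the run entirely when there is nothing to format.
    tr '\n' '\0' | xargs -0 -r -n 20 -P "$max_jobs" "$formatter"
}

# Example with made-up paths: only the two A/M PHP files are listed.
printf 'A\tsrc/New.php\nM\tsrc/Old.php\nD\tsrc/Gone.php\nM\tREADME.md\n' |
    changed_php_files
```

The example at the bottom prints src/New.php and src/Old.php. To use the helpers inside a tree filter they would need to be saved as executable scripts on your PATH, because the filter is evaluated by the filter-branch script and generally does not see functions defined in your own shell. Note too that xargs exits non-zero if any formatter run does, so the same "|| true" caveat applies.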

Verifying the Result

Use git blame to look at a file that was not formatted previously. You should see that this file is now formatted nicely and has all the original authors and dates on the left of the output.
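
Eyeballing git blame works for one file. For the whole history, a rough check is to compare authors and dates before and after the rewrite. filter-branch keeps the pre-rewrite refs under refs/original/, so the two histories can be compared commit for commit. The helper names below are hypothetical, and the branch name in the usage comment is an assumption.

```shell
#!/bin/sh
# authorship REF: one line per commit with author name, email and author date.
authorship() {
    git log --format='%an|%ae|%ad' "$1"
}

# same_authorship OLD NEW: succeeds only if both histories list the same
# authors and dates, commit for commit.
same_authorship() {
    [ "$(authorship "$1")" = "$(authorship "$2")" ]
}

# Usage, run inside the rewritten repository (assumes the branch is "master"):
#   same_authorship refs/original/refs/heads/master master \
#       && echo "authors and dates preserved"
```

This only confirms that the metadata survived; it says nothing about whether the formatter produced the code you wanted, so a manual look at a few files is still worthwhile.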

I’m a data nerd and TDD enthusiast originally from Sydney. Currently working for Uber in New York. My thoughts here are my own. 🤓 elliotchance@gmail.com
