Sunday, 27 March 2016

Your Code As A Crime Scene

I've just finished reading this book by Adam Tornhill and wanted to share a quick review of it.

Really interesting book. It's part theory, part hands-on experience, which makes it a different style from a lot of programming books, and I think that keeps you engaged after the enticing introduction, which tells you that the same techniques used to track serial killers can help you find problems in your software. It's a premise like that which gets you interested.

Sometimes it feels like a bit of a gimmick. Often the forensic techniques don't feel like anything more than a touch point that helps you understand the premise of some pretty nifty statistical analysis of your code, and particularly of your source code history. We see the map that indicates where Jack the Ripper may have lived, but then our own code map seems a bit more straightforward. It still acts as a nice narrative and a vehicle for getting the concepts across, but unfortunately I don't feel I'm any closer to starting my private detective business.

Section 1 starts with the theory to help you understand where the bottlenecks may be in your code, and in particular how to tell from the way a code base evolves and grows where it may have suffered from growing pains. Essentially, V1.0 of your code is normally written by a smaller group with a singular purpose, but as time goes on more people come into the project. That can very easily lead to sprawl in the code base and to differing standards and implementation styles, so it's great if you can identify those areas because they may need a proper ground-up redesign.

This is done through a couple of different mechanisms, one of which is mining the richness of information in your git history. The tooling provided will allow you to find code which is complex or could be home to many errors. Another way of doing that, as suggested by colleagues, was just to keep an eye out for anywhere I had made commits... dickheads.
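To give a flavour of what mining git history can look like, here is a minimal sketch of my own (it is not the tooling from the book). It counts how often each file shows up in the commit log, on the assumption that files which change a lot are worth a closer look. The function names `count_revisions` and `read_git_log` are made up for this example.

```python
import subprocess
from collections import Counter


def count_revisions(log_text: str) -> Counter:
    """Count how often each path appears in `git log --name-only` output."""
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:  # blank lines only separate one commit from the next
            counts[path] += 1
    return counts


def read_git_log(repo_dir: str = ".") -> str:
    """Ask git for the files touched by each commit, without commit headers."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Running `count_revisions(read_git_log()).most_common(10)` inside a repository should list the ten most frequently changed files. It's a crude measure (renames and generated files will skew it) but it's a starting point.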

So while you may not be pounding the beat looking for clues as to where the issues are, what you get instead is some great heuristics and insights. A lot of them were new to me, like indentation-based complexity, which says you can judge the complexity of code visually from how it's indented. All of these techniques are based on academic findings, and Adam acts almost as a conduit bringing them to industry to say: hey, this could be useful to you today, don't wait for it to become a fancy start-up with a .io address.
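The indentation idea is simple enough to try in a few lines. This is my own rough sketch, not the book's implementation: it treats every 4 spaces (or one tab) of leading whitespace as one logical level, which is an assumption you would tune per code base, and it ignores blank lines. The function names are hypothetical.

```python
def indentation_levels(source: str, spaces_per_level: int = 4) -> list:
    """Return the logical indentation level of every non-blank line."""
    levels = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines say nothing about nesting
        expanded = line.expandtabs(spaces_per_level)  # a leading tab = one level
        leading = len(expanded) - len(expanded.lstrip(" "))
        levels.append(leading // spaces_per_level)
    return levels


def indentation_complexity(source: str, spaces_per_level: int = 4) -> dict:
    """Summarise indentation as a cheap proxy for how deeply nested code is."""
    levels = indentation_levels(source, spaces_per_level)
    if not levels:
        return {"lines": 0, "total": 0, "mean": 0.0, "max": 0}
    return {
        "lines": len(levels),
        "total": sum(levels),
        "mean": sum(levels) / len(levels),
        "max": max(levels),
    }
```

Because it only looks at whitespace, it works the same on any language, which is a big part of the appeal. A file with a high mean or max is probably one with a lot of nested conditionals and loops.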

We also get worked examples on code bases ranging from the one that's used to analyse code bases (how meta ;) ) to NHibernate and others. It's brilliant to get this level of hands-on work in a book which could easily disappear into theory and intellectual grandstanding. It makes the examples feel like something you can actually aspire to. All in all, this made me interested in checking out Microsoft R (pretty sure there is a free edX course on it) and the world of data analysis.

Section 2 talks at length about automated testing, and also about some of the shortcomings you can run into when you have an 'extensive' set of tests that sit too close to the code and, counter-intuitively, slow down the progress you can make.

I found the chapter on this to be probably the most interesting and thought-provoking of the book. It's not often a developer will talk about writing tests, and even less often that they want to draw attention to the tests they've written, but this is a frank discussion about how the best intentions around testing can end up weighing you down because the discipline and thought we bring to the application code often isn't present in the test code.

Section 3 is on social psychology, so it's very different from what has come before. It's less actionable (to start with, though it does come back around) and I guess more of intellectual value. That's not to say there aren't touch points with the real world, but initially it's removed from conscious thoughts etc. Interesting points include a discussion of why Brooks's law on increasing team size doesn't fit open source projects.

This feels like a manual for sitting down and working out how to mine the information-rich source that is your source code history. As an industry with such a steely focus on big data etc, we often seem wary of turning the microscope around to look at ourselves and see what can be learned about how we work and what makes for successful software delivery. Though the techniques may not stay the same, I truly believe this book could make a big difference to how software gets delivered. Actionable metrics for software development are such a bloody minefield that it's hard not to be really floored when someone can not only show you actionable and scientifically backed ones but also make it an interesting read.

I'm going to make a conscious effort to use the tools listed in the book and see what information we can bring out. We are in a pretty distributed project set-up (which I guess is becoming the norm via micro-services), so I think some of our temporal dependencies will be lost because they sit across multiple git repos, but I still think a lot of what is discussed and reasoned about here will be useful. When you have a book like this, which isn't explicitly about a technology or a class of problem, I think there's no higher praise than to say it's thought-provoking and has driven me to want to implement what's talked about as soon as I can.
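For anyone wondering what I mean by temporal dependencies: files that keep changing in the same commits are probably coupled, whether or not the code says so. Here is a rough sketch of my own (again, not the book's tooling) that counts how often pairs of files change together. It assumes the input is `git log --name-only --pretty=format:` output with a blank line between commits, and the function names are made up for the example.

```python
from collections import Counter
from itertools import combinations


def split_commits(log_text: str) -> list:
    """Split name-only log output into one set of file paths per commit."""
    commits = []
    current = set()
    for line in log_text.splitlines():
        path = line.strip()
        if path:
            current.add(path)
        elif current:  # a blank line closes the commit we were collecting
            commits.append(current)
            current = set()
    if current:
        commits.append(current)
    return commits


def co_change_counts(commits) -> Counter:
    """Count how many commits each pair of files appears in together."""
    pairs = Counter()
    for files in commits:
        for a, b in combinations(sorted(files), 2):
            pairs[(a, b)] += 1
    return pairs
```

This also shows the problem with a multi-repo set-up: a commit only ever contains files from one repository, so a service and its client living in different repos will never show up as a pair unless you find some other way to line the histories up.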

Well worth a read!
