Using code metrics effectively
As a team lead and later a project director, I relied heavily on code metrics. I talked about using Sonar for code metrics in my previous post. Using metrics gets to the very core of what I believe about software development:
- Great teams create great software.
- Most people want to, and can be, great.
- Systematic problems conspire to cheat people into doing less than their best work.
It’s this last point that is key. Code metrics are a conversation starter. Metrics are a great way to start the conversation that says, “Hey, I notice there may be a problem here, what’s up?” In this post, I’ll go through a few cases where I’ve used metrics effectively in concrete ways. This is personal; each case is different and your conversations will vary.
Helping to recognize bad code
A number of times, I’ve worked with one of those programmers who can do amazing things but write code that is unintelligible to mortal minds. This is usually a difficult situation, because they’ve been told what an amazing job they’ve been doing, but people who have to work with that code know otherwise. Metrics have helped me have conversations with these programmers about how to tell good code apart from bad.
While we may not agree what good code is, we all know bad code when we see it, don’t we? Well, it turns out we don’t. Or maybe good code becomes bad code when we aren’t looking. I often use cyclomatic complexity (CC) as a way to tell good code from bad. There is almost never good code with a high CC. I help educate programmers about what CC is and how it causes problems, giving ample references for further learning. I find that because metrics have a basis in numbers and science, they can counteract the bad behaviors some programmers have that are continually reinforced because they get their work done. These programmers cannot argue against CC, and without exception have no desire to. They’re happy to have learned how they can keep themselves honest and write better code.
It’s important to help these programmers change their style. I demonstrate basic strategies for reducing CC. Usually this just means helping them split up monolithic functions or methods. Eventually I segue into more advanced techniques. I’ve seen lightbulbs go off, and people go from writing monolithic procedures to well-designed functions and classes, just because of a conversation based in code metrics and followup mentoring.
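To make that conversation concrete, here is a minimal, hypothetical sketch (the function name and pricing rules are invented for illustration): a branchy order-processing function whose CC climbs with every `if`, and the same logic split into small helpers so each function stays simple.

```python
# Hypothetical example: a branchy function where validation, pricing,
# and shipping all live together, so CC grows with every branch.
def process_order_monolithic(order):
    if not order.get("items"):
        raise ValueError("empty order")
    total = 0
    for item in order["items"]:
        price = item["price"]
        if item.get("on_sale"):
            price *= 0.9
        total += price * item["qty"]
    if total > 100:
        shipping = 0
    elif order.get("express"):
        shipping = 20
    else:
        shipping = 5
    return total + shipping

# After the conversation: one concern per function, each with CC of 1-2.
def item_price(item):
    price = item["price"]
    return price * 0.9 if item.get("on_sale") else price

def order_total(order):
    if not order.get("items"):
        raise ValueError("empty order")
    return sum(item_price(i) * i["qty"] for i in order["items"])

def shipping_cost(total, express):
    if total > 100:
        return 0
    return 20 if express else 5

def process_order(order):
    total = order_total(order)
    return total + shipping_cost(total, order.get("express", False))
```

Both versions compute the same result; the difference is that each helper can now be read, tested, and changed on its own.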
I use CC to keep an eye on progress. If the programmer keeps writing code with high CC, I have to work harder. Maybe we exclusively pair until they can stand on their own feet again. Bad code is a cancer, so I pay attention to the CC alarm.
Writing too much code
A curious thing happens in untested codebases: code grows fast. I think this happens because the code cannot be safely reused, so people copy and paste with abandon (also, the broken windows theory is alive and well). I’ve used lines of code (LoC) growth to see where it seems too much code is being written. Maybe a new feature should grow a thousand lines a week (based on your gut feeling), but if it has grown 3000 lines a week for the last few weeks, I must investigate. Maybe I learn about some deficiency in the codebase that caused a bunch of code to be written, maybe I find a team that overlooked an already available solution, maybe I find someone who copy and pasted a bunch of stuff because they didn’t know better.
Likewise, bug fixing and improvements are good, so I expect some growth in core libraries. But why are a hundred lines a week consistently added to some core library? Is someone starting to customize it for a single use case? Is code going into the right place, do people know what the right place is, and how do they find out?
LoC change is my second favorite metric after CC, especially in a mature codebase. It tells me a lot about what sort of development is going on. While I usually can’t pinpoint problems from LoC like I can with CC, it does help start a conversation about the larger codebase: what trends are going on, and why.
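As a sketch of how LoC growth can be tracked without any special tooling, here is a small hypothetical helper that turns pre-captured `git log --numstat` output into net lines added per period. The input format is an assumption: one date line per commit (e.g. produced with `--format=%ad --date=format:%Y-%W`) followed by numstat rows of `added<TAB>deleted<TAB>path`.

```python
from collections import defaultdict

def weekly_loc_growth(numstat_output):
    """Return {week: net lines added} from pre-captured git numstat output."""
    growth = defaultdict(int)
    week = None
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            added, deleted, _path = parts
            if week and added != "-":  # "-" marks binary files
                growth[week] += int(added) - int(deleted)
        elif line.strip():
            week = line.strip()  # the commit's date/week line
    return dict(growth)
```

A graph of these numbers per package is usually enough to spot a module growing faster than your gut says it should.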
Tests aren’t being written
A good metrics collection and display will give you a very clear overview of which projects or modules have tests and which do not. Test count and coverage numbers, and how they change, can tell you loads about not just the quality of your code, but how your programmers are feeling.
If coverage is steadily decreasing, there is some global negative pressure you aren’t seeing. Find out what it is and fix it.
- Has the team painted themselves into a corner at the end of the release, and are they now cutting quality?
- Is the team being required to constantly redo work, instead of releasing and getting feedback on what’s been done? Are they frustrated and disillusioned and don’t want to bother writing tests for code that is going to be rewritten?
- Are people writing new code without tests? Find out why, whether it’s due to a lack of rigor or a lack of training. Work with them to fix either problem.
- Is someone adding tests to untested modules? Give them a pat on the back (after you check their tests are decent).
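The “steadily decreasing coverage” check can be sketched as a tiny hypothetical helper; the function name, window, and tolerance are my own inventions, and the history would come from wherever you collect coverage numbers (Sonar’s API, coverage reports, etc.):

```python
def coverage_alert(history, window=3, tolerance=0.5):
    """Alert if coverage fell in each of the last `window` samples
    and the total drop exceeds `tolerance` percentage points."""
    if len(history) < window + 1:
        return False  # not enough data to call it a trend
    recent = history[-(window + 1):]
    strictly_falling = all(b < a for a, b in zip(recent, recent[1:]))
    return strictly_falling and (recent[0] - recent[-1]) > tolerance
```

The point is not the exact rule but that the trend, not any single number, is what starts the conversation.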
Driving across-the-board change
I’ll close with a more direct anecdote.
Last year, we ‘deprecated’ our original codebase and moved new development into less coupled Python packages. I used all of the above techniques, along with a number of (private) metrics, to drive this effort, and most of them went up on highly visible information radiators:
- Job #1 was to reduce the LoC in the old codebase. We had dead code to clean up, so watching that LoC graph drop each day or week was a pleasure. Then it became a matter of ensuring the graph stayed mostly flat.
- Job #2 was to work primarily in the new codebase. I used LoC to ensure the new code grew steadily: not too fast (which would indicate poor reuse), and not too slow relative to the old codebase (which would indicate too much new code still going into the old codebase).
- Job #3 was to make sure new code was tested. I used test count and coverage, both absolute numbers and of course growth.
- Job #4 was to make sure new code was good. I used violations (primarily cyclomatic complexity) to know when bad code was submitted.
- Job #5 was to fix the lowest-hanging debt, whether in the new or old codebase. Sometimes this was breaking up functions that were too long, more often it was merely breaking up gigantic (10k+ lines) files into smaller files. I was able to look at the worst violations to see what to fix, and work with the programmers on fixing them.
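Job #5’s triage can be sketched with a hypothetical helper that ranks files by size so the gigantic ones surface first (the function name and threshold are illustrative; the line counts would come from your metrics tool):

```python
def worst_files(line_counts, threshold=10_000):
    """line_counts: {path: lines}. Return offenders, largest first."""
    offenders = [(p, n) for p, n in line_counts.items() if n >= threshold]
    return sorted(offenders, key=lambda pn: pn[1], reverse=True)
```

Working down a list like this, worst first, keeps the debt payoff focused and visibly shrinking.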
Aside from deleting dead code, I did only a small portion of the coding work directly. The real work was done by the project’s programmers. Code metrics allowed me to focus my time where it was needed: pairing, training, and mentoring. Metrics also allowed the other programmers to see their own progress and the overall progress of the deprecation. Having metrics behind us seemed to give everyone a new view on things; people were not defensive about their code at all, and there was nowhere to hide. It gave the entire effort an air of believability and achievability, and made it seem much less arbitrary than it could have been.
I’ve used metrics a lot, but this was certainly the largest and most visible application. I highly suggest investing in learning about code metrics, and getting something like Sonar up on your own projects.
Good read. I also like to use code metrics; however, for some reason I’ve never seen them integrated into the dev process anywhere I’ve worked.
Mostly I just run some metrics locally in my IDE to see how the new code looks.
Do you have other recommendations for metric tools? You mentioned SonarQube. Is there anything else worth looking at?
We are doing Java/C#, so I know FxCop and StyleCop, and for Java I’m using FindBugs, Checkstyle and PMD.