This week I was looking over a pull request at work and I noticed that there was a long-ish method in the code. This particular method was around 100 lines long. I know for some people that’s huge, for others, it’s tiny. As I read the method, I put a comment on the PR that something like “This is a pretty long method. One possible candidate for extraction is like 140-160.”
The submitter reviewed all of our comments and essentially replied “I don’t see the value of splitting this up just because it’s long. If there’s other reasons, I’m open to it.”
This caused me to stop and think for a minute. His comment was actually quite good. He’s new to our team so he could have just said “Ok, I’ll split it up.” But he didn’t. Looking back at my original comment, I didn’t give him any reason to, either. I was just stating a fact “This method is around 100 lines long.”
So this made me evaluate my own reasonings, why do I
hate dislike long methods? At the core, it’s readability. Long methods are more difficult to read and really figure out what’s going on. I don’t remember the exact details of the PR, but I know it involved putting some data on a timeline based on some geographical coordinates. Something akin to “Let me know when something in this Lat/Long window happens between these two times.” If we were to write out, in pseudo-code, what the method was going to do, it might look something like:
- Check all the input data
- Convert the times specified to UTC
- Get all the geographical data for that window
- Filter geo data for specified time
- Create a new notification if there is an intersection
Those are probably the thoughts we’d have when writing this. And really, that’s all this method would do, just those 5 things. It’s actually, only doing 1 thing, but it takes 5 actions to get there. Now, imagine each of those actions is a single line, the method isn’t too long. But perhaps you want to add logging or notification if the input data doesn’t pass the check. Perhaps you have to create an array of objects for the notification, one for each event that happens in the window. It doesn’t take much work for us to wind up at 50 or 100 lines of code, and that’s essentially what happened.
The problem is, as I’m reading those 100 lines later, when I have to make a change, I again have to process all the information in that method. My thought process will be something like “Ok, he’s checking the inputs…looks like he’s converting the time to UTC. Where did that variable come from? Oh it’s back there as the output of one of the validity checks. Ok, where was I?…Oh yeah, time converted….ok we’re getting some time from a service, looks like it’s returning an array of coordinates, is Latitude in the 0 position, or Longitude? Looks like Latitude…all right, so we’re checking that latitude against the window….<a few more minutes>…Ok, so what is it this method is supposed to be doing?” This goes on until I’ve parsed the entire method.
However, if I can see a few methods instead, essentially mirroring those bullet points above, then I’ll know the goal of the createNotificationForLatLongWindow method.
The other main aspect for keeping methods short is code reuse. As I re-reviewed the PR I noticed there was a method that was very similar right above this one. One of the main differences had been that this new method allowed you to pass in the time you wanted to check against, the first method always just assumed the time of right now.
Once I saw that, I thought about how we could extract a handful of methods and reuse them between the two methods. And that’s typically when developers start looking at extracting methods.
However, by that point, it’s almost too late. This PR is a great example of why. The first method had been written some time before by some other developer. For this new developer to see the commonality between this old method and his new method he’d have to do all the parsing I was talking about above. Once he does all that parsing of the first method, he has to understand how his is different, and make a decision of if he can just jam his in to the first method or write a second method.
It might not sound like it, but that’s actually a lot of work. Sure, one method like that isn’t too bad, but if you’re doing this all over the code base, you’ll be doing that same evaluation over and over again.
If the original developer had used small, well named methods, then the developer working on the second method would have been able to glance at the first method, see the 5 things it’s doing, know that he’s doing 3 of those the same and 2 different. He then can call the 3 methods that are identical, and write 2 smaller methods. His total work goes down, he gets to take advantage of the mysterious “code reuse” that managers everywhere are raving about, and he gets to move on to the next problem.
Some people might want to state the “You Aren’t Going to Need It” rule here about premature refactoring. But I don’t think that applies in this case. We’re not talking about creating a bunch of extra objects or features. We’re not saying that the original author should have created a Factory Pattern implementation so that the following authors wouldn’t have to. We’re talking about readable code. When we write code with the mindset of refactoring later, it’s not YAGNI, it’s IWGD — It Won’t Get Done.