Code Optimization

29 November 2014

For starters, a friend reached out to me on Facebook and asked for my thoughts on optimizing code, especially now that hardware is so cheap. I don’t know that this will answer his question, but here are my thoughts on the subject.

Early Optimization

Premature optimization is the root of all evil (or at least most of it) in programming. – Donald Knuth

That is a rather famous quote, and developers use it quite often. A lot of the time it’s used as an excuse for writing bad code; other times, it’s used to remind people not to spend time optimizing their code too early.

Too often developers spend time and energy trying to lay out the perfect design and the perfect implementation because they fear an area of the application will be too slow. Keep in mind, they haven’t tested the performance. They don’t know that it’s a bottleneck; they just have a hunch that it will cause a problem. The reason this is bad is that, in isolation, it’s difficult to tell what code will actually be slow. For example, if you’re reading through code and you see something that raises a red flag, you could spend a fair amount of time resolving that issue. Only after resolving it and getting it merged into mainline development do you realize the code you just optimized gets executed once a week, or a month, or a year. Meanwhile, there is code in the application that gets executed multiple times a minute, or even multiple times a second. Until you know where the actual bottleneck in the application is, it is irresponsible to spend time optimizing.
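As a concrete illustration of measuring before optimizing, here is a minimal Python sketch using the standard library’s cProfile module. The function names and workloads are hypothetical stand-ins for a rarely-run report and a hot request path; the point is simply that the profiler, not a hunch, tells you which one matters.

```python
# Profile first, optimize second: measure where time is actually spent
# before touching any code. cProfile is Python's built-in profiler.
import cProfile
import pstats

def generate_weekly_report():      # hypothetical: runs once a week
    return sum(i * i for i in range(10_000))

def handle_request():              # hypothetical: runs many times a second
    return [str(i) for i in range(1_000)]

def main():
    generate_weekly_report()
    for _ in range(500):           # simulate the hot path
        handle_request()

if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.runcall(main)
    # Sort by cumulative time so the real bottleneck rises to the top.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```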

Hardware Optimization

That’s the first problem with code optimization. At the other end of the spectrum is a different one. A lot of the time this problem starts with a line like “Space is cheap” or “RAM is cheap.” The thinking goes like this: developers get paid a lot of money. A senior developer in the Midwest typically makes around $50 per hour, but the total cost of development is usually closer to $75 per hour, and if you’re a consultant, you’re probably being billed out at $100–150 per hour. So it doesn’t take long before the cost of an implementation gets pretty high. For example, instead of paying a developer for 10 hours, you could buy a pretty decent server, and the developer could be doing other things.
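For a rough sense of the math behind that argument, here is a back-of-the-envelope sketch using the figures above; the $750 server price is an assumption for illustration, not a quote.

```python
# Back-of-the-envelope: when does developer time cost more than hardware?
fully_loaded_rate = 75          # dollars per hour (from the figures above)
server_cost = 750               # assumed price of a decent commodity server

break_even_hours = server_cost / fully_loaded_rate
print(f"Hardware pays for itself after ~{break_even_hours:.0f} hours of tuning")
# -> Hardware pays for itself after ~10 hours of tuning
```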

However, this makes a couple of assumptions. First, it assumes that RAM and CPU solve the problems developers are trying to solve. Sure, RAM and CPU help a lot, and they cover over a multitude of [programming] sins. But they’re not the only subsystems that can become bottlenecks. For example, network throughput could be a bottleneck. This can happen when an application sends too much unnecessary data: instead of sending only the data that is required (say, 100 bytes), a developer may send an entire 1 KB record, ten times more data than is needed. That isn’t solved by building a bigger or faster server. It’s only “solved” by either expanding the network pipe or by sending less data.
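As a sketch of what “sending less data” can look like, here is a minimal Python example that serializes only the fields a caller needs instead of the whole record. The record layout and field names are hypothetical; the sizes are only meant to show the order-of-magnitude difference on the wire.

```python
# Sketch: send only the fields the caller needs instead of the whole record.
import json

full_record = {
    "id": 42,
    "name": "Jane Doe",
    "address": "123 Main St",
    "notes": "x" * 900,          # large field the caller never uses
    "last_login": "2014-11-29",
}

def project(record, fields):
    """Return a copy of the record containing only the requested fields."""
    return {key: record[key] for key in fields}

payload_full = json.dumps(full_record)
payload_lean = json.dumps(project(full_record, ["id", "name"]))

print(len(payload_full), "bytes vs", len(payload_lean), "bytes")
# The lean payload is roughly an order of magnitude smaller on the wire.
```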

The second assumption behind this mentality is that if a developer delegates performance to a server, the rest of the code will not have problems. In fact, it’s probably just the opposite: a developer who takes such a lackadaisical attitude toward performance probably isn’t writing good code anyway.

Make the Most Out of What You Have

While typing this up, I kept thinking about one of the first articles I read about MapReduce years ago. It discussed how Google used commodity hardware to achieve their blazing-fast speeds. They did that so that if one of their servers died, they could easily, quickly (and cheaply) pull the server out of the rack, throw a new one in, and keep running. They did this not by stockpiling a lot of really expensive servers, but by writing code that was optimized for the hardware they had. They wrote frameworks and libraries built on the realization that the bottleneck was processing a request serially when distributing it would be much quicker. It wasn’t that the hardware they had was bad and couldn’t process a million records; it was that it couldn’t easily process billions of records. However, they realized that if they took those billion records and split them up across 1,000 computers, then each of those 1,000 computers could easily process a million records.

That is, they matched the code they wrote to the hardware they had available. They didn’t buy bigger servers or more CPU. Instead, they wrote good code to take advantage of what was there.
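As a scaled-down illustration of that divide-and-conquer idea (not Google’s actual implementation), here is a minimal Python sketch that splits a dataset across a pool of worker processes and combines the partial results. The numbers are tiny compared to Google’s, but the principle is the same: each worker only ever sees a chunk it can comfortably handle.

```python
# Split a large dataset into chunks, let each worker process its chunk,
# then combine the partial results.
from multiprocessing import Pool

def process_chunk(chunk):
    # "map" step: each worker handles a slice it can comfortably process
    return sum(chunk)

def split(data, parts):
    size = len(data) // parts
    return [data[i * size:(i + 1) * size] for i in range(parts)]

if __name__ == "__main__":
    records = list(range(1_000_000))        # stand-in for a huge dataset
    chunks = split(records, 4)              # 4 workers instead of 1,000 machines
    with Pool(processes=4) as pool:
        partials = pool.map(process_chunk, chunks)   # map across workers
    total = sum(partials)                   # "reduce" step: combine results
    print(total)
```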

I will most likely never work on a MapReduce algorithm, much less one used by such a large company and most people on the face of the earth. However, that doesn’t excuse me from the responsibility of writing smart software. The application I’m working on right now will run on a corporate network that I have absolutely no control over. I can’t dictate the specs of the users’ computers, nor can I dictate when they should be upgraded, or even what OS version they’re running.

If I were to divest myself of that responsibility and say “when they upgrade, it will be faster,” I’d be opening myself up to bad coding practices and building something that might take an inordinate amount of hardware in order to perform.