Years ago, my attention was drawn to this car forum discussion about someone who had hooked up a garden hose to flush out the crankcase of his Ford Mustang. If you don’t want to read it, supposedly this guy bought a Mustang and was doing an oil change. He thought the oil was too dark and sludgy, so he drained it, put a garden hose into the crankcase, and turned it on. He then ran the car to get the water shooting through all the engine parts. It lasted about 3 minutes before the engine locked up. The next 25 pages of forum posts alternate between calling the guy an idiot and claiming the post has to be a troll.
I’m not personally convinced the guy wasn’t just a good troll. However, even if he’s a troll, he’s hit on something that is believable. That is, a lot of people (myself included) just don’t know that much about cars. Now, I DO know enough to not flood my engine with water, but if it’s not a tune-up, oil change, or brake change, I’m calling a mechanic.
This example is a bit on the absurd side. However, it makes a good point: if you don’t understand the systems involved, you run the risk of causing yourself more problems. Best case, you just waste a lot of time, energy, and money with your misunderstanding. Worst case, you end up causing injury or death.
So what are the systems involved in software engineering? It largely depends on the area you’re working in. I’m currently working on a web application, so for me the systems involved are the CPU and RAM of the web servers and the network infrastructure between my web and SQL servers. There’s also the memory, CPU, and disk space of the SQL servers. Finally, there are the individual pieces of software that are running, which means, for example, understanding how SQL Server handles specific queries.
If you’re doing embedded system development, your systems change. Instead of worrying about the CPU and RAM of a server, you are now concerned with any devices you might be controlling (such as a power supply or a servo motor). You’ll also be worried about the input signals and voltage levels.
Let’s move a bit from the abstract to the concrete. I want to look at a specific issue that I’ve run into in the past 12 months. We have an application that manages sales leads. This application displays all sorts of information on each lead, and it has a summary page that lets users sort and page through the results. Finally, there is no limit to the number of leads a particular user may have.
After migrating users from an old system to the new system, we got a few complaints of slow page loads. The customers involved all had over 1,000,000 leads they were dealing with. Even showing 100 leads at a time, that comes to more than 10,000 pages.
So where should we look? We have a lot of options. We have an ASP.NET MVC page with some JavaScript on the front end. This website sits on several load-balanced virtual machines. Additionally, we have the data spread across several SQL Server databases. However, each customer’s data is located in only one database.
If we’re unclear on the particulars of each system, this issue might take a while to diagnose. We might start by putting a breakpoint on the controller method that gets called when the “Next” button is pressed, then step through the code to see if anything is out of the ordinary.
But perhaps we’re a bit smarter than that, so instead of just putting a breakpoint somewhere we decide to profile our application. After profiling a customer that has a million records, we profile one that has only a couple thousand leads and then start to see if anything jumps out at us.
However, as we look at our system, we should be able to get a jump on what the problem is. Either the database is having issues serving up the data, or our application is designed poorly and is querying the database once for each record (100 times) instead of a single time returning a list. A quick check tells us that, in fact, the webserver makes a single call to GetLeads() which returns a list of records.
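To make the two possibilities concrete, here is a minimal sketch of the difference between querying once per record and making a single call that returns a list. It uses Python and an in-memory SQLite database purely as a stand-in for our real stack. The `leads` table, its columns, and the function names are all made up for illustration, and the one-by-one version assumes contiguous IDs to keep things short.

```python
import sqlite3


def make_db(n_leads=500):
    """Build a small in-memory table of hypothetical leads."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO leads (id, name) VALUES (?, ?)",
        [(i, f"lead-{i}") for i in range(1, n_leads + 1)],
    )
    return conn


def get_leads_one_by_one(conn, page, page_size=100):
    """The poorly designed version: one query per record on the page.

    Assumes IDs are contiguous, which is only true in this toy data set.
    """
    first_id = (page - 1) * page_size + 1
    rows, queries = [], 0
    for lead_id in range(first_id, first_id + page_size):
        row = conn.execute(
            "SELECT id, name FROM leads WHERE id = ?", (lead_id,)
        ).fetchone()
        queries += 1
        if row is not None:
            rows.append(row)
    return rows, queries


def get_leads(conn, page, page_size=100):
    """The single-call version: one query returns the whole page."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT id, name FROM leads ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    return rows, 1


if __name__ == "__main__":
    conn = make_db()
    slow_rows, slow_queries = get_leads_one_by_one(conn, page=2)
    fast_rows, fast_queries = get_leads(conn, page=2)
    print(slow_queries, fast_queries, slow_rows == fast_rows)
```

Both functions hand back the same page of data. The difference is the number of round trips to the database, and against a real server across a network each of those costs time.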
So now we turn our attention to the database. SQL Server is an enterprise solution, so it should be able to handle a million records in less than the 30-40 seconds it was taking on the website. A bit of digging in the stored procedure highlights the issue. I was creating a temporary table for sorting (using the OVER clause of SQL Server). This temporary table really just needed to hold the ID and the Rank of each record. Instead, it was being loaded with the entire record (30 or so fields), and it was doing this for all 1,000,000+ records for a customer.
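The shape of the fix can be sketched roughly like this. It is not our actual stored procedure. It is a small Python example against in-memory SQLite (which needs version 3.25 or later for window functions), with a hypothetical `leads` table and a made-up `score` column to sort on. The idea is that the temporary table holds only the ID and the rank, and the wide record is fetched only for the rows on the requested page.

```python
import sqlite3


def make_db(n_leads=1000):
    """Build a small in-memory table of hypothetical leads."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leads "
        "(id INTEGER PRIMARY KEY, name TEXT, city TEXT, score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO leads VALUES (?, ?, ?, ?)",
        [
            (i, f"lead-{i}", f"city-{i % 7}", (i * 37) % 1001)
            for i in range(1, n_leads + 1)
        ],
    )
    return conn


def get_page(conn, page, page_size=100):
    """Rank every lead in a narrow temp table, then join back for one page."""
    conn.execute("DROP TABLE IF EXISTS temp.ranked")
    # Only two columns go into the temp table: the ID and its rank.
    conn.execute(
        """
        CREATE TEMP TABLE ranked AS
        SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC, id) AS rnk
        FROM leads
        """
    )
    first = (page - 1) * page_size + 1
    last = page * page_size
    # The full record is read only for the rows on the requested page.
    return conn.execute(
        """
        SELECT l.id, l.name, l.city, l.score
        FROM ranked AS r
        JOIN leads AS l ON l.id = r.id
        WHERE r.rnk BETWEEN ? AND ?
        ORDER BY r.rnk
        """,
        (first, last),
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    for row in get_page(conn, page=1)[:3]:
        print(row)
```

With 30 or so fields per record and over a million records, the difference between copying two columns and copying the whole row into the temporary table adds up quickly.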
In the end, we were able to diagnose the issue and implement a solution in a matter of hours. I credit that to our team having a good grasp of the systems involved. None of us had exhaustive knowledge of all the systems, but each of us had a firm grasp of at least one system. So we were able to see that the way the temporary table was implemented was not optimal based on how SQL Server handled it.
There are times that bugs can disguise themselves and send you down the wrong path. However, the majority of the time, having a solid understanding of how the systems you’re using operate will go a long way towards diagnosing an issue.