Thursday, May 13, 2010

A Thought About Minimizing System Latency

When programmers think about optimizing things, they often think about optimizing code.  After all, code is where they live.  It's what they do.  If it's not the code that's to be optimized, what is?  I confess that this was my way of thinking for many years.  Many years. Want to know how to make things faster?  Fire up the profiler!

I no longer think that way.  Now I think about optimizing the system.  (This recognition, in fact, has convinced me that I'm going to have to give Fastware! a new subtitle. It's currently "Straight Talk about Fast Code," which I think has a nice ring to it, but that's too codecentric.  I'll have to find a way to say something catchy that corresponds to "Straight Talk about Fast Systems." But I digress.)

In rare cases, the system is a single executable over which you have complete control, so, sure, fire up the profiler. More commonly, the system consists of multiple cooperating components, often broken across multiple executables.  Web-based systems, for example, may have different components handling network traffic, business logic, database queries, etc.  In that case, if the system is slow, it's the system that's slow, so before you fire up the profiler, you need to know where to point it.

At first glance, the general way you approach this problem is to start a timer when an event brings the system into action (e.g., a browser sends a request to your web site), stop the timer when the system produces the appropriate response (e.g., packets are sent back to the browser), then look at how much time was taken in the various parts of the system.  The problem is that in many cases, this is the wrong way to think about things.

Steve Souders has done a great job of demonstrating one aspect of this idea as it applies to web sites, describing in various forms (book, article, video) how he derived what he calls the Performance Golden Rule:
Only 10-20% of the end user response time is spent downloading the HTML document.  The other 80-90% is spent downloading all the components in the page.
His approach takes the view that you start the timer when the initial request for an HTML page comes in, but you don't stop it until everything on the page has been rendered in the browser.  From a user's point of view, this is much more reasonable.  After all, until the page has been displayed, he or she is still waiting, even if the web site itself thinks the job was finished long ago.

One of the most interesting implications of this observation is that a critical part of the user's perception of the system's performance is determined by software (in this case, the browser) over which "the system" has no control.  For example, a big part of how fast a site like Yahoo (where Souders worked when he wrote High Performance Web Sites;  he's at Google now) seems to be is determined by the user's browser, and of course Yahoo has no control over which browser the user has chosen to use.  The idea that the performance of a system is influenced by software over which it has no control generalizes.  Even native applications, for example, are typically at some degree of mercy with respect to the libraries they link with and the operating system they run on.

But that's an issue for another day.  What I want to pose now is the idea that when determining the latency of an interactive system, the timer should not be started when something triggers the system, it should be triggered when the person using that system decides that they want to do something.  Once you've decided you want to do something, everything between then and when you actually get it done is wait time, even if during that time you're typing in commands or pulling down menus or wading through search results, etc.  One of the nice things about thinking about things this way is that it offers a framework for thinking about such disparate performance issues as UI design (minimize the time needed to get from deciding what you want to do and expressing it) and prefetching and speculative execution (both of which entail satisfying requests that have not yet been expressed).


No comments: