System Tuning With Queues

| | Comments (1) | TrackBacks (1)

This blog entry has inspired me to make a few notes to work out some ideas through this rather meandering essay.

In very simple terms (read convenient simplification, see below), a queue is a serialized list of pending requests. And it is typically used to broker/filter/throttle requests to a naturally concurrent system behind it. What's interesting is that systems are typically compositions of other naturally concurrent (sub)systems. Turtles all the way down, if you will...

What I mean by a 'naturally concurrent system' is one that will continue to accept requests at nearly any arrival rate, but as the arrival rate increases, the service times increase once the arrival rate passes the natural level of concurrency of the system. Service time is how long it takes for a system to handle a request.

This increase is non-linear because as the more requests are in the system, it takes more 'effort' to manage the requests than to process them. Think I/O blocking against shared resources or thread context switching as common types of overhead. This is usually manifest as the throughput decreasing.

The point of a queue, as described above, is to keep the service time constant with respect to the arrival rate and to keep the overhead of managing concurrent requests down by simply preventing them from entering the system. The boundary is between the queue and the system.

Now, the best way tune a complex system would be to find the natural level of concurrency for every concurrent sub-system. But isolating every system and testing for it's concurrency level it would probably be pointless since it would be different once you put everything back together again. And remember, it's turtles all the way down...

But considering network effects, some sub-systems could be considered hubs, and would be easily identifiable as bottlenecks. If a hub is a bottleneck because it has a low concurrency (as compared to the systems that feed it), you have the choice of putting a queue in front of it (like a connection pool), or by putting a queue further up the chain.

Regardless, the only thing that matters really is the system wide level of concurrency. You must define what the system is and if it includes the tcp stack or just your appserver.

Once that is decided, you must test the system to determine its concurrency. Then you must identify bottlenecks and hubs and place queues in front of them if it makes sense (if they are getting too many requests), and then retest the system.

The point of all this is to improve the efficiency of your system, and to fail gracefully when the incoming requests get out of hand. Your system will fail in some fashion, so you are best off knowing under what circumstances that will happen (at what arrival rate) and how your system will behave when it get's there.

Of course, the best way to accomplish this is by stressing your system early and often and creating a load profile with each test.

In short, the point of optimizing with queues is to reduce the inefficiency of handling too many concurrent requests at any given sub-system, so the throughput does not decrease as the load increases. And you don't need a large complex model to find improvements, you only need to tackle the bottlenecks, and measure if they actually helped.

Consider this image. The blue line is arrival rate vs. throughput. It breaks at about 6 req/sec. The pink line is arrival rate vs. requests in the system. Overlaying these lines, you see with an arrival rate of 6 req/sec, the system begins queuing. The number of requests in the sysetm is about 2. Thus, with a queue size of zero and two requests being handled, there must be two resource handlers. As you improve the efficiency of your system, this number will increase, typically towards the number of cpu's you have if profiling a single application. This value will represent the throttling of your system by the critical path bottleneck.

Do note some bottlenecks are good if they are tuned to the maximum throughput possible. Letting more requests through could (as mentioned above) reduce the efficiency of the system (lower the throughput).

[addendum]

What I'm leaving out is that a system can be modeled as a queue and X request processors (CPU's, etc). That is, response time is the sum of wait time (in the queue) and the service time (time to get processed).

The assumptions above are that you have no control over the 'queue' in the system in question, or that there really is no queue. An example of this is threading. You efffectively can spawn as many threads as you want in java till the vm crashes or things get really slow. To manage things better you use a thread pool that throttles the number of threads and queues requests for new (finished) ones.

So for simplicity, I'm saying '(sub-)system'. And not discussing the effective queue in it. And am treating service time as response time.

[udpate 12-2-04: the point]

After all that, it would be fair to say:

The point of a request queue is to locally optimize the throughput of a concurrent system.

Note this is different than an async message queue and other similiar things whose responsibilities are to improve modularity and reliability...

1 TrackBacks

Listed below are links to blogs that reference this entry: System Tuning With Queues.

TrackBack URL for this entry: http://www.manamplified.org/cgi-bin/mt-tb.cgi/254

A friend this afternoon pointed me to someone that read my earlier post about tuning and queueing and went into quite a bit more detail. It's a pretty good intro to the subject, and a lot more detailed than the... Read More

1 Comments

It was exactly these considerations that convinced us to place a configurable throttle on NetKernel (http://www.1060research.com)

A couple of addition points of interest though are:

1) when processing XML we have found that heap usage is the biggest constraint on the number of parallel requests that can be processed.

2) when accepting requests from heterogeneous sources (say message queues and http) it is not enough to rely on factors such as tcp queuing. You need a common queue across everything.

We did some scalability testing with a number of different scenarios. See (http://www.1060research-server-1.co.uk/docs/2.0.0/book/userguide/doc_whitepaper_scalability.html">http://www.1060research-server-1.co.uk/docs/2.0.0/book/userguide/doc_whitepaper_scalability.html)">http://www.1060research-server-1.co.uk/docs/2.0.0/book/userguide/doc_whitepaper_scalability.html)">http://www.1060research-server-1.co.uk/docs/2.0.0/book/userguide/doc_whitepaper_scalability.html">http://www.1060research-server-1.co.uk/docs/2.0.0/book/userguide/doc_whitepaper_scalability.html)

The graphs show that you really can keep the efficiency high and degrade gracefully if you apply queueing/throttling.

Leave a comment