When developers are asked to load test a system, most will start up a million threads and hammer the server until it eventually becomes unresponsive. That shows the server can be overwhelmed, but it leaves them with no numbers from which to build a load profile, so they cannot properly plan for peak times in a production environment.
Little's Law states:
The average number of customers in a queueing system, N, is equal to the average arrival rate of customers to that system, λ, times the average time spent in that system, T. That is, N = λT.
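As a quick sanity check of the formula, here is a minimal Python sketch. The function name `average_in_system` is purely illustrative and not part of any library.

```python
def average_in_system(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law: N = lambda * T.

    arrival_rate: average arrivals per second (lambda).
    time_in_system: average seconds a request spends in the system (T).
    Returns the average number of requests in the system (N).
    """
    return arrival_rate * time_in_system


# 5 req/sec, each spending 0.2 sec in the system: on average 1 request in flight.
n = average_in_system(5.0, 0.2)
```

Note that the law says nothing about the shape of the arrivals; it holds for long-run averages, which is why the steady-state measurements discussed below matter.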
When load testing (or stress testing) a system, the only variable you can control is the arrival rate. The values you can measure are the response time and the throughput, that is, the number of requests completed per unit of time.
As long as your arrival rate is equal to your throughput, you are not queueing requests. At the point where the throughput is less than the arrival rate, you have reached 100% utilization in the system.
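The test for saturation can be expressed as a small Python check. This is a sketch under stated assumptions: `is_saturated` is a hypothetical helper, and the 2% `tolerance` is an arbitrary allowance for measurement noise, since a measured throughput will rarely match the arrival rate exactly even when nothing is queueing.

```python
def is_saturated(arrival_rate: float, throughput: float, tolerance: float = 0.02) -> bool:
    """Return True when completed requests/sec fall measurably behind arrivals/sec.

    tolerance is a hypothetical noise allowance (2% by default); choose a value
    that suits your own test harness.
    """
    return throughput < arrival_rate * (1.0 - tolerance)


# Made-up measurements for illustration: (arrival rate, measured throughput) in req/sec.
runs = [(1, 1.0), (2, 2.0), (4, 3.98), (5, 4.1)]
for arrival, throughput in runs:
    status = "saturated" if is_saturated(arrival, throughput) else "keeping up"
    print(f"{arrival} req/sec in, {throughput} req/sec out: {status}")
```

The first arrival rate at which the check flips to "saturated" marks the point of 100% utilization for that run.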
In the real world, load reaches a system in bursts, but to simplify matters we only want the steady-state values. Note that queueing during bursts is good, provided the load really is just a burst and settles down long enough for the queue to drain.
To determine the load on a system in steady state, you need to hold the arrival rate constant for a sufficiently long period of time. To do this, you must not have concurrent threads or clients blindly firing one request after another into the system.
For an arrival rate of 1 req/sec, a client must start a request every second, on the second. It must not block waiting for the reply and then fire the next request. The requests should fire like the tick of a metronome.
By increasing the number of clients or threads making a request every second, you increase the arrival rate: 5 clients each making one request per second give an arrival rate of 5 req/sec.
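One way to build such a metronome-style (open-loop) load generator is sketched below in Python. It is an illustration, not a finished tool: `send_request` is a hypothetical placeholder that just sleeps for 50 ms, and `metronome_client`, `run_load` and `max_in_flight` are names invented for this example. Each client sleeps until its next scheduled tick and hands the request to a thread pool, so it never waits for a reply before firing again.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def send_request() -> float:
    """HYPOTHETICAL stand-in for one call to the server under test.

    Replace the body with a real request. Returns the response time in seconds.
    """
    start = time.monotonic()
    time.sleep(0.05)  # pretend the server takes about 50 ms to answer
    return time.monotonic() - start


def metronome_client(pool, interval, ticks, results, lock):
    """Start one request every `interval` seconds, on the tick, `ticks` times.

    The client never waits for a reply: each request is handed to the pool,
    and its response time is recorded by a callback when it completes.
    """
    def record(future):
        with lock:
            results.append(future.result())

    start = time.monotonic()
    for tick in range(ticks):
        # Sleep until this tick's scheduled time, not "interval after the reply".
        delay = start + tick * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        pool.submit(send_request).add_done_callback(record)


def run_load(clients, interval=1.0, ticks=10, max_in_flight=256):
    """Drive an arrival rate of clients / interval requests per second.

    max_in_flight caps the worker threads. It must exceed the expected number
    of requests in flight (arrival rate * response time); otherwise requests
    wait for a free worker and the arrival rate is no longer constant.
    """
    results = []
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        threads = [
            threading.Thread(
                target=metronome_client,
                args=(pool, interval, ticks, results, lock),
            )
            for _ in range(clients)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    # Leaving the `with` block waits for every in-flight request to finish.
    return results
```

For example, `run_load(clients=5, interval=1.0, ticks=60)` would hold an arrival rate of 5 req/sec for one minute and return the 300 observed response times. Sizing `max_in_flight` with Little's Law (arrival rate times worst expected response time, plus headroom) keeps the generator itself from becoming the queue.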
What is of interest here is whether the average response time remains constant up to 100% utilization, or whether it grows non-linearly as the arrival rate increases.
For example, suppose that with an arrival rate of 1 req/sec the average response time is 0.2 sec. If you plan for 80% utilization at average peak load by extrapolating from these values, you get 0.8 utilization / 0.2 sec = 4 req/sec per resource. But this may not hold: at 4 req/sec, your average response time in the real world might be 0.25 sec, and 4 req/sec × 0.25 sec = 1.0, which is 100% utilization.
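The arithmetic in that example can be written out as a short Python sketch. It assumes, as the example does, a single resource whose utilization is the arrival rate times the time each request occupies it; `utilization` and `max_rate` are illustrative names only.

```python
def utilization(arrival_rate: float, response_time: float) -> float:
    """Utilization of one resource: U = arrival rate * time per request."""
    return arrival_rate * response_time


def max_rate(target_utilization: float, response_time: float) -> float:
    """Arrival rate per resource that yields target_utilization: rate = U / time."""
    return target_utilization / response_time


# Extrapolated from the 1 req/sec measurement (0.2 sec response time): 4 req/sec.
planned = max_rate(0.8, 0.2)
# If the response time actually grows to 0.25 sec at that rate: 1.0, i.e. 100%.
actual_u = utilization(planned, 0.25)
```

The gap between the planned 80% and the resulting 100% is exactly what a measured response-time curve, rather than a single low-load sample, reveals.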
Another thing worth watching is the actual level of concurrency. If you push the arrival rate up to the point where the throughput cannot keep up (100% utilization), the value of N should equal the number of effective resources. In a very simple system, this should approach the number of CPUs on the server. But if the server passes requests on to a database, you may see far fewer effective resources (a lower number of concurrent requests) at 100% utilization.
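If the load generator records when each request started and finished, the concurrency can be measured directly. The Python sketch below assumes a hypothetical log of `(start, end)` timestamps in seconds. Summing the time each request spent inside a measurement window and dividing by the window length gives the average number in flight, which is the N of Little's Law; a sweep over start/end events gives the peak.

```python
def average_concurrency(intervals, window_start, window_end):
    """Average number of requests in flight over [window_start, window_end].

    intervals: (start, end) timestamps in seconds for each request.
    Each request contributes the part of its lifetime that overlaps the window.
    """
    window = window_end - window_start
    busy = 0.0
    for start, end in intervals:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            busy += overlap
    return busy / window


def peak_concurrency(intervals):
    """Largest number of requests simultaneously in flight (sweep over events)."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort by time, then delta, so at equal timestamps a departure (-1)
    # is counted before an arrival (+1).
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

Measured at 100% utilization, an average well below the server's CPU count would point to the kind of downstream bottleneck, such as a database, described above.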
[update: 1/5/05]
Also see System Tuning With Queues.