I was part of a project where we had to tune one of our WCF (.NET Framework 4.5) services hosted on IIS7. The target was a throughput of 1500+ messages per second with sub-second response times. The service was synchronous and blocking, with XML payloads for both request and response. There are a lot of good articles scattered over the internet, and I have tried to summarize them here so you get a good starting point. The reference section below contains links to all the articles I have referred to. Going through them helped me understand what happens under the hood and, more importantly, what options are available to tweak to achieve one's performance goals. I hope you find this write-up useful.
How Is a Request Processed by IIS7?
The request processing model varies significantly between versions of IIS. This section gives a quick overview of how an HTTP request is processed in IIS7. The following diagram represents how a typical WCF request is processed.
HTTP.SYS is the Kernel driver that receives all HTTP / HTTPS requests and sends them to IIS for processing. For each application hosted in IIS, an ASP.NET Worker Process receives the request for processing. Internally this has a pipeline of handlers, through which the request is passed. One of those handlers is the WCF “Service Model” handler. This is responsible for instantiating the WCF Service instance to process the request.
HTTP.SYS delegates the call to the ASP.NET Worker Process over an I/O Thread from the CLR Thread Pool. Once ASP.NET gets the request, it enqueues it for a Worker Thread from the CLR Thread Pool. If ASP.NET gets a Thread, it returns HSE_STATUS_PENDING to the HTTP.SYS I/O Thread, freeing it to take the next request.
The Worker Thread that picked up the request delegates the call to WCF on an I/O Thread. This again is non-blocking (in IIS6, the WCF handler was invoked synchronously), so the ASP.NET Worker Thread no longer waits on WCF to process the request. WCF internally uses I/O Threads from the CLR Thread Pool to process the request.
While the request is being processed, it goes through multiple stages. Each stage is completely asynchronous, so the thread from the previous stage that hands over the request is not blocked for the lifetime of the request at that stage. Being asynchronous, each stage maintains its own internal queue to stage and process requests. In an un-optimized, badly tuned setup, requests get trapped in these queues, leading to low throughput and high latency. This section explains what those queues are, why requests get queued in them, and which performance counters to watch to identify a problem.
All requests are queued in the HTTP.SYS queue, which is maintained for each Worker Process. The default queue limit is 1000, which can be modified using the “ApplicationPool’s Queue Length” property in IIS Manager. Requests get queued when the IIS Worker Process is not able to de-queue at the required rate. If the queue is full, a 503 response is sent back to the client. Watch the performance counter “Http Service Request Queues\CurrentQueueSize” to see how many items are queued when there is latency. The default limit is usually good enough, unless you have huge traffic with large payloads being exchanged on each request.
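To make the "not able to de-queue at the required rate" condition concrete, here is a rough back-of-the-envelope sketch in Python. It assumes constant arrival and de-queue rates, which real traffic never has, and the function name and the rates in the example are purely hypothetical; only the queue limit of 1000 comes from the text above.

```python
def seconds_until_queue_full(queue_limit, arrival_rate, dequeue_rate, queued_now=0):
    """Estimate how long a bounded queue lasts before it starts rejecting requests.

    arrival_rate and dequeue_rate are requests per second, assumed constant.
    Returns None when the queue never fills at these rates.
    """
    if arrival_rate <= dequeue_rate:
        return None  # the worker process keeps up, so the queue does not grow
    free_slots = max(queue_limit - queued_now, 0)
    return free_slots / (arrival_rate - dequeue_rate)


# Hypothetical load: 1500 requests/s arriving, worker process de-queuing 1400/s,
# default HTTP.SYS queue limit of 1000.
print(seconds_until_queue_full(1000, 1500, 1400))  # 10.0 seconds until 503s start
```

The point of the sketch is that a larger queue limit only buys time. If the de-queue rate stays below the arrival rate, the 503 responses still arrive, just later.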
ASP.NET queues all incoming requests from HTTP.SYS to the CLR Thread Pool. If no CLR Worker Threads are available to pick up a request, this queue starts filling up. The queue limit is controlled by the “Process Model’s RequestQueueLimit”. If the queue is full, a 503 response is sent. The default limit of this queue is 5000. Watch the performance counter “ASP.NET v4.0.30319\Requests Queued” to see how many items are queued when there is latency. Again, the default is usually good enough and need not be tweaked unless necessary.
After the ASP.NET Worker Thread picks up a request for processing, one more parameter is checked before the request is actually handed over to WCF: the “Application Pool’s MaxConcurrentRequestsPerCPU”. This is a safety net that prevents an uncontrolled flood of requests at the ASP.NET layer. The setting limits the number of concurrent requests executed per CPU, which keeps thread usage under control. The default value is 5000. Watch the performance counter “ASP.NET v4.0.30319\Requests Queued” to see how many items are queued when there is latency.
ASP.NET finally hands over the request to WCF over an I/O Thread. The WCF Service Model is responsible for processing a WCF Service request. The number of requests processed is controlled by the “ServiceModel/serviceThrottling” settings: maxConcurrentCalls, maxConcurrentSessions and maxConcurrentInstances. These three parameters cap the concurrency of the service. For example, if maxConcurrentCalls is set to 500, at most 500 calls can be executing concurrently at any given point. If the service receives 600 calls, the remaining 100 are queued, waiting for their turn. WCF 4.0 introduced a good set of counters that indicate whether the throttle settings are fully utilized or underutilized. The value is the percentage of the throttle setting that is actually in use. For example, a value of 50 with maxConcurrentInstances = 500 means only 50% of maxConcurrentInstances is used, i.e., only 250 concurrent instances have been created to service the load. The default values are listed in the table below.
|Property||.NET 4.0 Default||Previous Default|
|MaxConcurrentCalls||16 * ProcessorCount||16|
|MaxConcurrentInstances||116 * ProcessorCount||26|
|MaxConcurrentSessions||100 * ProcessorCount||10|
WCF 4.0 Default Throttling Values
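As a quick illustration of the table, the defaults can be computed for a given machine. This is only a sketch of the arithmetic in Python; the function name is made up for this example, and the 4-core server is a hypothetical.

```python
def wcf4_default_throttles(processor_count):
    """Return the WCF 4.0 default throttle values from the table above."""
    if processor_count < 1:
        raise ValueError("processor_count must be at least 1")
    return {
        "maxConcurrentCalls": 16 * processor_count,
        "maxConcurrentInstances": 116 * processor_count,
        "maxConcurrentSessions": 100 * processor_count,
    }


# Hypothetical 4-core server
for name, value in wcf4_default_throttles(4).items():
    print(f"{name} = {value}")
# maxConcurrentCalls = 64
# maxConcurrentInstances = 464
# maxConcurrentSessions = 400
```

Note that the instances default is simply the sum of the calls and sessions defaults (16 + 100 = 116 per processor).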
Look for the following Performance Counters to monitor the utilization of the Throttle settings.
- Percent of Max Concurrent Calls
- Percent of Max Concurrent Instances
- Percent of Max Concurrent Sessions
Arriving at the optimum throttle values for your WCF Services is not always straightforward. Setting arbitrarily high values for the throttle parameters does not give better results. For example, let’s say you expect a constant load of 1000 requests per second. You would be tempted to set maxConcurrentCalls and maxConcurrentInstances to 1000 each, assuming that no request will be made to wait, since you have provisioned WCF resources in proportion to the load you expect. But this works against you. When you allow a higher number of concurrent calls to execute, that many Threads are created. Not all of those Threads can run at the same time; how many can depends on the processors you have. The result is excessive context switching, which wastes CPU cycles that should be spent processing the actual requests.
The best approach is to start with the values listed in the table “WCF 4.0 Default Throttling Values” and tweak them as you go, until you reach the best response time and throughput. The CallDuration and CallsOutstanding performance counters are good indicators of how long a request takes to process and how many calls are in progress.
Keep watching the CPU utilization as well. Keeping the utilization around 70%-80% would give you better performance. Tweak maxConcurrentCalls and maxConcurrentInstances accordingly.
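One way to sanity-check a throttle value before load testing is Little's law: in steady state, the average number of calls in flight equals the arrival rate multiplied by the average time each call takes. This is my own addition rather than something from the referenced articles, and the sketch below is in Python purely for the arithmetic. The function names, the 50 ms call duration and the 1.5 headroom factor are all assumptions for illustration; substitute your own measured CallDuration.

```python
import math


def estimated_calls_in_flight(calls_per_second, avg_call_duration_ms):
    """Little's law: calls in flight = arrival rate x average time per call."""
    return calls_per_second * avg_call_duration_ms / 1000


def starting_max_concurrent_calls(calls_per_second, avg_call_duration_ms, headroom=1.5):
    """Round the estimate up with some headroom. The 1.5 factor is an arbitrary placeholder."""
    if headroom < 1:
        raise ValueError("headroom must be at least 1")
    in_flight = estimated_calls_in_flight(calls_per_second, avg_call_duration_ms)
    return math.ceil(in_flight * headroom)


# The 1000 requests/s example from above, with a hypothetical 50 ms average call duration.
print(estimated_calls_in_flight(1000, 50))      # 50.0 calls in flight on average
print(starting_max_concurrent_calls(1000, 50))  # 75
```

Under these assumptions, a load of 1000 requests per second needs roughly 50 concurrent calls, not 1000. The estimate is only a starting point for the measure-and-tweak loop described above, since call duration itself changes as concurrency and CPU utilization go up.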
Garbage collection happens when there is no space available to create new objects. GC runs on a high-priority thread, which completely suspends the processing of the WCF request. Excessive GC adversely impacts throughput and response time. Make sure that objects are created only when they are needed and released as early as possible. Keep an eye on the “% Time in GC” counter and the “# Gen 0 Collections”, “# Gen 1 Collections” and “# Gen 2 Collections” counters. These should give a fair idea of how memory allocation happens and what its impact is. Excessive GC shows up directly in CallDuration and CallsOutstanding.
CPU usage will be high during a garbage collection. If a significant amount of process time is spent in garbage collection, collections are either too frequent or lasting too long. An increased allocation rate of objects on the managed heap causes garbage collection to occur more frequently. Decreasing the allocation rate reduces the frequency of garbage collections. You can monitor allocation rates by using the “Allocated Bytes/sec” performance counter.
Threads are involved at every stage of processing a WCF request. These threads are requested from the .NET CLR Thread Pool. The minimum and maximum number of Threads play a role in getting you optimum performance. For a WCF Service, whether self-hosted or IIS-hosted, the number of Threads available in the ThreadPool determines how far the service can scale. As mentioned earlier, more threads do not mean higher scalability. Set the maxWorkerThreads and maxIoThreads attributes of the processModel section of the ASP.NET application to an optimum value. To start with, follow the general recommendation:
- Set the values of the maxWorkerThreads parameter and the maxIoThreads parameter to 100.
- Set the value of the maxconnection parameter to 12*N (where N is the number of CPUs).
- Set the value of the minFreeThreads parameter to 88*N.
- Set the value of minWorkerThreads to 50.
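The recommendation above depends on the number of CPUs, so it can help to work the numbers out for your server first. A minimal sketch of that arithmetic in Python follows; the function name is made up for this example, and the 4-CPU machine is hypothetical.

```python
def recommended_thread_settings(cpu_count):
    """Return the starting values from the general recommendation above for N CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return {
        "maxWorkerThreads": 100,
        "maxIoThreads": 100,
        "maxconnection": 12 * cpu_count,
        "minFreeThreads": 88 * cpu_count,
        "minWorkerThreads": 50,
    }


# Hypothetical 4-CPU server
settings = recommended_thread_settings(4)
print(settings["maxconnection"])   # 48
print(settings["minFreeThreads"])  # 352
```

Treat these as a baseline to measure against, not as final values.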
By setting autoConfig to true, you can get to know what it takes for your service to handle the load. Based on feedback, the .NET ThreadPool recalibrates itself by creating the number of threads required to handle the load that your service receives. But sometimes a much smaller value gives you much better performance, and that’s what happened in our case.
This section can serve as a quick reference for the monitoring counters and throttling settings to look at for each stage.
|Measuring messages received at each point helps you identify areas where requests are made to wait. Watch these counters.||HTTP Service Request Queues\ArrivalRate||ASP.NET Applications\Requests/Sec||Calls Per Second|
|To identify where requests are queued, use these counters.||ASP.NET\Requests Queued||ASP.NET\Request Wait Time|
|To check how well the throttle settings are utilized||Percent of Max Concurrent Calls||Percent of Max Concurrent Instances||Percent of Max Concurrent Sessions|
|To throttle, tweak these parameters.||<applicationPool maxConcurrentRequestsPerCPU="5000" maxConcurrentThreadsPerCPU="0" />||<serviceThrottling maxConcurrentCalls="10" maxConcurrentSessions="10" maxConcurrentInstances="10"/>|
|To throttle threads||<processModel autoConfig="true|false" maxWorkerThreads="num" maxIoThreads="num" minWorkerThreads="num" minIoThreads="num" />|
CallsOutstanding also indicates how many calls are effectively being handled by your service. In my experience, the lower the number, the better the performance.
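The quick reference can also be read as a checklist that walks the stages from HTTP.SYS down to WCF. The Python sketch below shows one way to express that triage. The dictionary keys are hypothetical short names for the counters (not real counter paths), and the 90% threshold is an arbitrary assumption. It also does not collect the counters for you; you would feed it readings taken from Performance Monitor.

```python
def find_bottlenecks(counters, near_limit_percent=90):
    """Map counter readings to the stage where requests are likely waiting.

    counters is a dict of readings keyed by hypothetical short names.
    Missing readings are treated as zero.
    """
    findings = []
    if counters.get("http_sys_current_queue_size", 0) > 0:
        findings.append("HTTP.SYS queue: the worker process is not de-queuing fast enough")
    if counters.get("aspnet_requests_queued", 0) > 0:
        findings.append("ASP.NET queue: no free CLR worker threads, or the per-CPU request limit is reached")
    for name in ("Calls", "Instances", "Sessions"):
        key = f"percent_of_max_concurrent_{name.lower()}"
        if counters.get(key, 0) >= near_limit_percent:
            findings.append(f"WCF throttle: maxConcurrent{name} is nearly exhausted")
    return findings


# Hypothetical readings taken during a latency spike
sample = {
    "http_sys_current_queue_size": 0,
    "aspnet_requests_queued": 12,
    "percent_of_max_concurrent_calls": 100,
    "percent_of_max_concurrent_instances": 40,
}
for finding in find_bottlenecks(sample):
    print(finding)
```

A finding only tells you where to look. Whether the fix is a higher limit or a lower one still has to come from measuring, as discussed in the throttling section.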
To wrap up, my simple rules would be:
1. Know the internals. Bypassing them gets you nowhere.
2. Start with the recommended settings and tweak only when you know you have to.
3. Profile with counters.
4. Find out what each counter value means for your application.
5. Make one change at a time. Don’t jump, or you will not know what made things better or worse.
6. Repeat steps 3, 4, 5.
Should any information here be stated incorrectly, please drop me a note. Your feedback is appreciated. Thank you.