Throttling WCF Services On IIS7

I was part of a project where we had to tweak one of our WCF (.NET Framework 4.5) services hosted on IIS7. The target was to achieve a throughput of 1500+ messages per second with sub-second response. The service was synchronous blocking and had an XML payload as request-response. There are a lot of good articles scattered over the internet. I have tried to summarize all those so you get a good starting point. The reference section below contains links to all the articles that I have referred to. Going through them helped me understand what happens under the hood and, more importantly, what options are available to tweak to achieve one's performance goals. Hope you find this write-up useful.

How Is A Request Processed By IIS7?

The request processing models vary significantly between versions of IIS. This section provides a quick gist of how an HTTP request is processed in IIS7. The following diagram represents how a typical WCF request is processed.

IIS WCF Processing

HTTP.SYS is the Kernel driver that receives all HTTP / HTTPS requests and sends them to IIS for processing. For each application hosted in IIS, an ASP.NET Worker Process receives the request for processing. Internally this has a pipeline of handlers through which the request is passed. One of those handlers is the WCF “Service Model” handler, which is responsible for instantiating the WCF Service instance to process the request.

HTTP.SYS delegates the call to the ASP.NET Worker Process over an I/O Thread from the CLR Thread Pool. Once ASP.NET gets the request, it enqueues it on a Worker Thread from the CLR Thread Pool. If ASP.NET gets a Thread, it returns HSE_STATUS_PENDING to the HTTP.SYS I/O Thread, freeing it to take the next request.

The Worker Thread that picked up the request delegates the call to WCF on an I/O Thread. This again is non-blocking (in IIS6, the WCF handler was invoked synchronously), so the ASP.NET Worker Thread no longer waits on WCF to process the request. WCF internally uses I/O Threads from the CLR Thread Pool to process the request.

Request Processing & Intermediary Queues

While the request is being processed, it goes through multiple stages. Each stage is completely asynchronous, to ensure that the thread from the previous stage that hands over the request is not blocked for the lifetime of the request at that stage. Being asynchronous, each stage maintains its own internal queue to stage and process the requests. An un-optimized, badly tuned setup will leave requests trapped in these queues, leading to low throughput and high latency. This section helps you understand what those queues are, why requests get queued, and which performance counters you can look at to identify a problem.

IIS WCF Processing Queues


HTTP.SYS: Kernel Queue

All Requests are queued in the HTTP.SYS Queue, which is maintained for each Worker Process. The default queue limit is 1000, which can be modified using the Application Pool’s “Queue Length” property in the IIS Manager. Requests get queued when the IIS Worker Process is not able to de-queue them at the required rate. If the Queue is full, a 503 response is sent back to the client. Watch the performance counter “Http Service Request Queues\CurrentQueueSize” to see the count of items queued when there is latency. Mostly the default value is good enough, unless you have huge traffic with large payloads being exchanged for each request.
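The IIS Manager property is persisted in applicationHost.config; as a sketch (the pool name “MySvcPool” is hypothetical), raising the limit looks like this:

```xml
<!-- applicationHost.config: queueLength is the HTTP.SYS queue limit
     maintained per Worker Process (default 1000) -->
<applicationPools>
  <add name="MySvcPool" queueLength="5000" />
</applicationPools>
```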

ASP.NET: CLR Thread Pool queue

ASP.NET queues all incoming requests from HTTP.SYS to the CLR Thread Pool. If there are no CLR Worker Threads available to pick up the request, this queue starts filling up. The queue limit is controlled by the Process Model’s “RequestQueueLimit”. If the queue is full, a 503 response is sent. The default limit of this queue is 5000. Watch the performance counter “ASP.NET v4.0.30319\Requests Queued” to see the count of items queued when there is latency. Again, the default value is good enough and need not be tweaked unless necessary.
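If you do need to change it, the limit lives in the processModel element; a hedged sketch (7500 is an arbitrary illustrative value, not a recommendation):

```xml
<!-- machine.config: requestQueueLimit caps the requests ASP.NET
     will queue before returning 503 (default 5000) -->
<system.web>
  <processModel autoConfig="true" requestQueueLimit="7500" />
</system.web>
```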

ASP.NET: Worker Process Native Queue

After the ASP.NET Worker Thread picks up a request for processing, one more parameter is checked before the request is actually handed over to WCF: the Application Pool’s “MaxConcurrentRequestsPerCPU”. This is a safety net to prevent an uncontrolled number of requests flooding the ASP.NET layer. This setting limits the number of concurrent requests executed per CPU, keeping Thread usage optimal. The default value is 5000. Watch the performance counter “ASP.NET v4.0.30319\Requests Queued” to see the count of items queued when there is latency.
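For reference, this setting is configured in Aspnet.config (under the .NET Framework directory) for IIS7 integrated mode; a sketch with the defaults spelled out:

```xml
<!-- Aspnet.config: per-CPU concurrency limits for the ASP.NET layer -->
<system.web>
  <applicationPool maxConcurrentRequestsPerCPU="5000"
                   maxConcurrentThreadsPerCPU="0"
                   requestQueueLimit="5000" />
</system.web>
```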

WCF: Throttling & Utilization

ASP.NET finally hands over the request to WCF over an I/O Thread. The WCF Service Model is responsible for processing a WCF Service request. The number of requests processed concurrently is controlled by the serviceModel/serviceThrottling settings: maxConcurrentCalls, maxConcurrentSessions and maxConcurrentInstances. These 3 parameters limit the maximum concurrent behavior. For e.g., if maxConcurrentCalls is set to 500, at any given point there can only be a maximum of 500 calls executing concurrently. If the service receives 600 calls, the remaining 100 would be queued, waiting for their turn. WCF 4.0 introduced a good set of counters to indicate whether the throttle settings are fully utilized or underutilized. The value indicates what percentage of the Throttle setting is actually used. E.g., a value of 50 with maxConcurrentInstances = 500 means only 50% of maxConcurrentInstances is used, i.e., only 250 concurrent instances are created to service the load. The default values are listed in the table below.

Property               | .NET 4.0 Default     | Previous Default
MaxConcurrentCalls     | 16 * ProcessorCount  | 16
MaxConcurrentInstances | 116 * ProcessorCount | 26
MaxConcurrentSessions  | 100 * ProcessorCount | 10

WCF 4.0 Default Throttling Values
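For illustration, on, say, an 8-core box the .NET 4.0 defaults work out to 128, 800 and 928. Setting them explicitly in web.config looks like the sketch below (the numbers are assumptions for that hypothetical box, not recommendations):

```xml
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior>
        <!-- 16 * 8, 100 * 8 and 116 * 8 respectively -->
        <serviceThrottling maxConcurrentCalls="128"
                           maxConcurrentSessions="800"
                           maxConcurrentInstances="928" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```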

Look for the following Performance Counters to monitor the utilization of the Throttle settings.

  • Percent of Max Concurrent Calls
  • Percent of Max Concurrent Instances
  • Percent of Max Concurrent Sessions
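Note that the WCF counters are not collected unless you turn them on; a minimal web.config fragment to enable the service-level counters:

```xml
<system.serviceModel>
  <!-- "ServiceOnly" publishes the ServiceModelService 4.0.0.0 counters
       (including the Percent Of Max Concurrent * counters) without the
       heavier per-endpoint and per-operation counters -->
  <diagnostics performanceCounters="ServiceOnly" />
</system.serviceModel>
```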

Why High Throttling Values Won’t Help?

Arriving at the optimum Throttle values for your WCF Services may not always be straightforward. Setting arbitrarily high values for the throttle parameters would not provide better results. For e.g., let’s say you expect a constant concurrent load of 1000 Requests per second. You would be tempted to set maxConcurrentCalls and maxConcurrentInstances to 1000 each, assuming that no request shall be made to wait, since you have provisioned WCF resources proportional to the load you expect. But this works on the contrary. When you allow a higher number of concurrent calls to execute, that many Threads are created. Obviously not all Threads can execute at the same time; that depends on the Processor cores you have. This leads to excessive context switching, wasting your CPU cycles rather than processing the actual request.

Where Do I Start?

The best values to start with are those listed in the table “WCF 4.0 Default Throttling Values”; tweak them as you go until you get the best response time and throughput. The CallDuration and CallsOutstanding performance counters are good indicators of how long a request takes to process and how many calls are being executed.

Keep watching the CPU Utilization as well. Keeping the utilization around 70%-80% would give you better performance. Accordingly tweak the maxConcurrentCalls and maxConcurrentInstances.

Watch your Garbage Collection cycles

Garbage collection happens when there is no space available to create new objects. A blocking GC runs on a high-priority thread, which suspends the processing of the WCF Request. Excessive GC would adversely impact Throughput and response time. Make sure that objects are created only when they are needed and released as early as possible. Keep looking at “% Time Spent in GC” and the Gen0, Gen1, Gen2 Collections counters. These should give a fair idea of how memory allocation happens and its impact. This directly impacts the CallDuration and reduces the CallsOutstanding.

CPU usage will be high during a garbage collection. If a significant amount of process time is spent in garbage collection, the collections are too frequent or are lasting too long. An increased allocation rate of objects on the managed heap causes garbage collection to occur more frequently. Decreasing the allocation rate reduces the frequency of garbage collections. You can monitor allocation rates by using the “Allocated Bytes/second” performance counter.
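One GC knob worth knowing while tuning is the GC flavor. ASP.NET under IIS uses the server GC by default, but if you ever self-host the same service you may have to opt in; a sketch for the host’s app.config:

```xml
<configuration>
  <runtime>
    <!-- Server GC: one GC heap and GC thread per core; usually better
         throughput for a service under constant load than workstation GC -->
    <gcServer enabled="true" />
  </runtime>
</configuration>
```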

.NET ThreadPool Tweaks

At every stage of processing a WCF request, Threads are involved. These threads are drawn from the .NET CLR Thread Pool. The minimum and maximum number of Threads plays a role in getting you optimum performance. For a WCF Service, whether self-hosted or IIS-hosted, the number of Threads available in the ThreadPool determines how far the Service can scale. As mentioned earlier, more threads do not mean higher scalability. Set maxWorkerThreads and maxIoThreads in the processModel section of the ASP.NET configuration to optimum values. To start with, follow the general recommendations.

  • Set the values of the maxWorkerThreads parameter and the maxIoThreads parameter to 100.
  • Set the value of the maxconnection parameter to 12*N (where N is the number of CPUs).
  • Set the value of the minFreeThreads parameter to 88*N.
  • Set the value of minWorkerThreads to 50.
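Put together, the recommendations above could look like the sketch below on a 2-CPU box (N = 2, so minFreeThreads = 176 and maxconnection = 24); treat the numbers as a starting point, not final values:

```xml
<configuration>
  <system.web>
    <processModel autoConfig="false"
                  maxWorkerThreads="100"
                  maxIoThreads="100"
                  minWorkerThreads="50" />
    <httpRuntime minFreeThreads="176" />
  </system.web>
  <system.net>
    <connectionManagement>
      <!-- 12 * N outbound connections per remote host -->
      <add address="*" maxconnection="24" />
    </connectionManagement>
  </system.net>
</configuration>
```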

By setting autoConfig to true, you can get to know what it takes for your service to handle the load. Based on feedback, the .NET ThreadPool recalibrates itself by creating the required number of threads to handle the load that your Service receives. But sometimes a much smaller value could give you much better performance. And that’s what happened in our case.

Performance Counters & Throttle Settings Quick Reference

This section can serve as a quick reference for the monitoring counters and throttling settings to look for at each stage.

Measuring messages received at each point helps you identify where requests are made to wait. Watch these counters:

  • HTTP Service Request Queues\Arrival Rate
  • ASP.NET Applications\Requests/Sec
  • Calls Per Second

To identify where requests are queued, use these counters:

  • ASP.NET Requests Queued
  • ASP.NET Request Wait Time

To check how well the Throttle Settings are utilized:

  • Percent of Max Concurrent Calls
  • Percent of Max Concurrent Instances
  • Percent of Max Concurrent Sessions

CallsOutstanding also gives an indication of how many calls are effectively being handled by your service. My experience is that the lower the number, the better the performance.

To Throttle, tweak these parameters:

<applicationPool maxConcurrentRequestsPerCPU="5000"
                 maxConcurrentThreadsPerCPU="0"
                 requestQueueLimit="5000" />

<serviceThrottling maxConcurrentCalls="10"
                   maxConcurrentSessions="10"
                   maxConcurrentInstances="10" />

To Throttle Threads:

<processModel autoConfig="true|false"
              maxWorkerThreads="num"
              maxIoThreads="num"
              minWorkerThreads="num"
              minIoThreads="num" />

To wrap up, my simple rules would be,

  1. Know the internals. Bypassing them takes you nowhere.
  2. Start with recommended settings and tweak only when you know you have to.
  3. Profile with counters.
  4. Find out what each counter value means to your application.
  5. Go one change at a time. Don’t jump ahead; you really would not know what made things better or worse.
  6. Repeat steps 3, 4, 5.


Should there be any information that is incorrectly stated, please drop me a note. Your feedback is appreciated. Thank you.

Get Started!

In the world of the Internet, there’s so much and it’s all over the place. So getting started with something new can sometimes be daunting. The following are some good technical reference materials that I found useful when exploring new / existing topics. Hope these cut down your cycles!

Reactive Extensions ( Rx )

  • A very good introductory video on Reactive Extensions (video).


REST

  • REST explained in very simple, understandable terms. Recommended for anyone who wants to get started with REST (video).
  • Now read the original dissertation of Roy Thomas Fielding.
  • A good collection of articles on InfoQ related to REST. Do read the REST implementation of the “how to GET a coffee” use case. It gives a perspective on how REST can be used to model a complete state machine (article).

Twitter Bootstrap

  • Watch this video if you want a quick hands on intro.
  • A simple step-by-step introduction to the Bootstrap framework. It’s a multi-part series, with inline links to subsequent articles.
  • A crisp introduction to LESS & Sass


Node.js

There has been a lot of talk and hype 🙂 about this. So I just wanted to give it a read and see if it’s really worth all the hype.

  • Read this and this to get to know what Node.js is all about.
  • A very crisp, unbiased view of Node.js from Chris Richardson. This really helped me understand where Node.js fits from an architect’s perspective. I have decided to try out a few samples myself.

Microservices Pattern

The video on Node.js above touches upon using Node.js as an API gateway in architectures based on Microservices. I was curious to know what those were. The ones below came to my rescue :).

  • Read this good write-up, again by Chris Richardson, to understand the Microservices pattern.
  • Another deep-dive introduction by Martin Fowler & James Lewis.

Angular JS

A lot of buzz around this SPA framework. Check out this video for a very thorough overview of the nuts and bolts of Angular JS. Another good hands-on video for Angular JS.

Using Disruptor-net With WCF – Part 2

This is a continuation of my earlier post on using Disruptor-net with WCF. I have attempted to try out Disruptor-net v1.1.0.

I have renamed the classes from my previous example to suit the naming conventions of Disruptor-net v1.1.0 (a port of the Java Disruptor 2.7.1). Events are published by Publishers and consumed by a Consumer.

The Events (of type Event) are binary serialized using ProtoBufNet and published by multiple WCF Clients (EventPublisher) to a WCF Service (EventService) over NetNamedPipeBinding (this can be replaced with other types of Bindings based on need).

The Consumer consists of the Disruptor, set up to invoke a chain of EventHandlers in a Diamond Path configuration. The DiamondPathEventHandlingSequencer is a simple DSL-based class that encapsulates the setup of the Disruptor for a Diamond Path event handling sequence. The RingBuffer is filled with instances of type ISequencerEntry. This interface provides methods for the IEventHandlers to read and write the state of the Event as it traverses the Diamond Path event handling sequence.

Three event handlers are hooked up using the Disruptor in a Diamond Path Configuration. They resemble the architecture of LMAX explained here. They are the Journaler, UnMarshaller and Business Logic Processor. To keep the example simple I have not included the Replicator.

The EventJournaler journals all incoming events to a persistent store provided by Redis.

The EventAssembler deserializes the incoming binary stream and writes the assembled Event instance back to the RingBuffer.

The EventTracker plays the role of the Business Logic Processor. In the context of this example, it just reference counts the number of events coming from each EventSource.

The above 3 handlers are setup inside the EventSequencer, a singleton wrapper over the DiamondPathEventHandlingSequencer.

The EventService publishes the incoming events to the EventSequencer, which internally publishes them to the RingBuffer through the DiamondPathEventHandlingSequencer. Subsequently, the RingBuffer takes care of invoking the Event Handlers in the appropriate sequence.

I also ran a few simple tests on my laptop, a Core i7 2.20GHz machine with 4GB memory. Below are the results.

A very trivial inference is that the Throughput of the Consumer almost equals that of the Publisher when there is minimal latency in the Event Handling path. So the Disruptor does its job of sequencing the events across threads in a most effective way without incurring lock overhead.

The source is available here. The code is self-explanatory. It has two parts, a Publisher and a Consumer. Ensure that the Consumer is started before running the Publisher.

This has been my second adventure with the Disruptor, so any feedback on incorrect or better usage of the Disruptor would be much appreciated.

Using Disruptor-net With WCF – Part 1

Many of you are probably quite familiar with the Disruptor from the LMAX team. I happened to read through it and tried out a sample project to see how the RingBuffer can be used with WCF.

To get started, I picked up the 1.0 .NET version of the project from here. Although the 2.7 port is available, I thought I would first get familiar with 1.0, so I can better appreciate the 2.7 improvements.

The sample code is very simple and straightforward. It illustrates how the RingBuffer can be used to sequence calls from Multiple Producers to a Single Consumer.

The code is self explanatory if you are familiar with Disruptor internals. Here is a very brief summary of the code.

A Consumer WCF Service receives byte stream messages from Producers and dumps them into the MP1CSequencingBuffer, a wrapper class around the RingBuffer. This class abstracts away the details of setting up the RingBuffer, the logic of adding an entry to the RingBuffer and, finally, getting notified when a new item is available through the IBatchHandler. The ByteArrayItemProcessor simulates the Business Logic processing. The incoming byte stream is de-serialized and a simple message counter is maintained to track messages from each Producer.

The Solution has 4 projects: the Disruptor, a Producer (WCF Client Console App), a Consumer (WCF Service Console App) and a shared project with utility and shared classes.

Make sure to start the Consumer before starting the Producer.

This is my first attempt on understanding and using the Disruptor. So any feedback or improvements would be gladly accepted and greatly appreciated.

The code can be downloaded from here.

.Net Generics – 2

Generic Singleton Type

We have all written Singleton implementations in our code at some time or other. The basic code structure is almost the same: have a static member of a type, check for null before initialization and return the instance. Additionally, lock the object if you need synchronization logic. The same boilerplate code can be re-written using Generics so that it can be re-used.

public class TSingleton<TType>
    where TType : new()
{
    private static TType _instance;

    public static TType Instance
    {
        get
        {
            if (_instance == null)
                _instance = new TType();

            return _instance;
        }
    }
}
For example, let’s say we have classes called DBService and Logger which you would like to be singletons. Using the above type, you can get your singletons as,

DBService dbService = TSingleton<DBService>.Instance;
Logger logger = TSingleton<Logger>.Instance;

That’s TSingleton!
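As noted above, you may want locking if the singleton is touched from multiple threads. Here is a thread-safe sketch of the same type using double-checked locking (on the CLR memory model this pattern is generally considered safe for lazily creating an instance):

```csharp
public class TSingleton<TType>
    where TType : new()
{
    private static TType _instance;
    private static readonly object _sync = new object();

    public static TType Instance
    {
        get
        {
            if (_instance == null)          // fast path: no lock once created
            {
                lock (_sync)
                {
                    if (_instance == null)  // re-check inside the lock
                        _instance = new TType();
                }
            }
            return _instance;
        }
    }
}
```

In .NET 4 you could also lean on System.Lazy<T>, which gives you the same lazy, thread-safe initialization without hand-rolled locking.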