Erlang 'C' & Scheduling for Call Centres - II

Escalation Matrix -

OK, great, we've got an SLA and we have the appropriate staff in place to take the call when customers ring in ... now, what happens if your staff are not able to fix the problem? Easy! You get the problem to the right people, who can fix it in a timely manner. This is where the escalation matrix comes into play.

Assume that it's 2am and you've got a Tier 1 customer (remember, you defined your customer tiers before this) that has no telephone service (hard down, etc.). This is impacting them and potentially costing them $$$/hr. Your engineer has taken the call and started working on the issue.

Now, as this company is paying you lots of money for the service (that IS why they are Tier 1, after all), you need to ensure that you've got ALL the right people available and working on their problem as quickly as possible. A sample two-stage internal escalation matrix that I've used with great success in the past is presented below. You will also need a separate matrix to give to customers, which I shall provide in a later post.

Provided below is a table detailing the different groups and the times at which they need to be notified, based on the problem and its impact.

Please note: it's fairly easy to remove the additional Escalation Group step mentioned below if your escalation goes to only one group! This type of structure only applies to larger companies, where the problem and the responsible party could be in a variety of different locations. The groups mentioned below also vary with the type of organization; for example, Ops/NOC is applicable to a telco environment but not necessarily to a manufacturing one.
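As a rough sketch of how a two-stage internal matrix could be encoded so that nobody has to work it out at 2am, the Python below maps a (customer tier, priority) pair to the minutes after which each stage is notified. Every time threshold, and every group name apart from "Ops/NOC" and "Escalation Group", is a hypothetical placeholder rather than a value from this post.

```python
# Hypothetical two-stage internal escalation matrix. All thresholds, and all
# group names except "Ops/NOC" and "Escalation Group", are placeholders.

# (customer_tier, priority) -> (notify stage 1 after N min, stage 2 after N min)
ESCALATION_MINUTES = {
    (1, 1): (15, 60),
    (1, 2): (60, 240),
    (2, 1): (30, 120),
    # ... extend for your own tiers and priorities
}

# Groups notified at each stage (placeholder names, see above)
STAGE_GROUPS = {
    1: ["Ops/NOC", "Duty Manager"],
    2: ["Escalation Group", "Head of Operations"],
}


def groups_to_notify(tier: int, priority: int, minutes_open: int) -> list:
    """Return every group that should have been notified by now."""
    stage1_after, stage2_after = ESCALATION_MINUTES[(tier, priority)]
    groups = []
    if minutes_open >= stage1_after:
        groups.extend(STAGE_GROUPS[1])
    if minutes_open >= stage2_after:
        groups.extend(STAGE_GROUPS[2])
    return groups
```

For example, `groups_to_notify(1, 1, 20)` returns the stage 1 groups only, while after 90 minutes both stages are included. Removing the Escalation Group step for a single-group organization amounts to deleting the second stage.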


SLA & Tiered Service Levels

SLA - this is a difficult one. You obviously want to offer all of your customers the premier, best-in-the-world, platinum level of service but, unfortunately, that does not always make financial sense. Customers need to be tiered according to the amount of money they pay you (see my post on the 80/20/30 rule), and incidents/problems need to be prioritized according to the impact on their business.

Combining the two makes for an interesting measurement matrix; a basic one, which you will need to customize for your business, is provided below.

Priority (service impact)                            Tier 1          Tier 2          Tier 3
Priority 1: 70%+ service impact or total loss        5min / 4hr      30min / 8hr     1hr / 24hr
Priority 2: 50% - 70% service impact                 15min / 8hr     1hr / 24hr      4hr / 72hr
Priority 3: up to 50% service impact                 30min / 12hr    4hr / 72hr      24hr / 96hr

(Each cell shows Response time / Resolution time.)

Please note the difference here between 'Response' time and 'Resolution' time!! Use this to your advantage: problems generally cannot be resolved immediately on first contact, because analyzing the problem takes time. Don't kid yourself otherwise!

As you can see from the matrix above (hope it's not too confusing!), reading from top left to bottom right, your SLA follows a specified path: the higher the customer's tier and the greater the impact on their business, the faster the response and resolution you commit to.
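A minimal sketch of the matrix as a lookup table, assuming incidents are classified by percentage of service impact. It returns separate response and resolution deadlines, which keeps the two clocks distinct. How the row boundaries are handled (exactly 70% counts as Priority 1, exactly 50% as Priority 2) is an assumption, since the table leaves it open.

```python
from datetime import datetime, timedelta

# The tier/priority matrix above: (response, resolution) targets.
SLA_MATRIX = {
    # (tier, priority): (response, resolution)
    (1, 1): (timedelta(minutes=5), timedelta(hours=4)),
    (2, 1): (timedelta(minutes=30), timedelta(hours=8)),
    (3, 1): (timedelta(hours=1), timedelta(hours=24)),
    (1, 2): (timedelta(minutes=15), timedelta(hours=8)),
    (2, 2): (timedelta(hours=1), timedelta(hours=24)),
    (3, 2): (timedelta(hours=4), timedelta(hours=72)),
    (1, 3): (timedelta(minutes=30), timedelta(hours=12)),
    (2, 3): (timedelta(hours=4), timedelta(hours=72)),
    (3, 3): (timedelta(hours=24), timedelta(hours=96)),
}


def priority_for_impact(impact_pct: float, total_loss: bool = False) -> int:
    """Map % service impact to a priority row.

    Assumption: exactly 70% is Priority 1 and exactly 50% is Priority 2.
    """
    if total_loss or impact_pct >= 70:
        return 1
    if impact_pct >= 50:
        return 2
    return 3


def sla_deadlines(opened: datetime, tier: int, impact_pct: float,
                  total_loss: bool = False):
    """Return (priority, respond_by, resolve_by) for a new incident."""
    priority = priority_for_impact(impact_pct, total_loss)
    response, resolution = SLA_MATRIX[(tier, priority)]
    return priority, opened + response, opened + resolution
```

For the 2am hard-down example, `sla_deadlines(datetime(2024, 1, 1, 2, 0), 1, 100, total_loss=True)` gives Priority 1 with a response due by 02:05 and resolution due by 06:00 (the date is arbitrary).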

A key point to make is that the SLA needs to be achievable. A customer-facing SLA that is more stringent than your own internal OLA (the service level offered by your own internal departments) is doomed to failure and, unfortunately, can carry some fairly large financial repercussions!
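One way to catch this before a customer does is to compare each customer-facing target with the internal OLA target that supports it. The sketch below assumes both are kept as (response, resolution) pairs keyed by (tier, priority); the OLA figures in the example are hypothetical. A missing OLA is treated as a conflict, since nothing internal backs that commitment.

```python
from datetime import timedelta


def sla_conflicts(sla: dict, ola: dict) -> list:
    """Return the (tier, priority) keys where the internal OLA is slower than
    the customer-facing SLA, or where no OLA exists at all."""
    conflicts = []
    for key, (sla_response, sla_resolution) in sla.items():
        internal = ola.get(key)
        if internal is None:
            conflicts.append(key)  # no internal commitment behind this SLA
            continue
        ola_response, ola_resolution = internal
        if ola_response > sla_response or ola_resolution > sla_resolution:
            conflicts.append(key)
    return conflicts


sla = {
    (1, 1): (timedelta(minutes=5), timedelta(hours=4)),
    (2, 1): (timedelta(minutes=30), timedelta(hours=8)),
}
# Hypothetical OLA: the resolver group promises 6hr, but Tier 1 customers were sold 4hr.
ola = {
    (1, 1): (timedelta(minutes=5), timedelta(hours=6)),
    (2, 1): (timedelta(minutes=15), timedelta(hours=6)),
}
print(sla_conflicts(sla, ola))  # [(1, 1)]
```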

Erlang 'C' & Scheduling for Call Centres

Erlang 'C' is a queueing formula, named after the Danish mathematician A. K. Erlang, that is used in the Call Centre and Operations industries to determine the correct and appropriate level of staffing based on key call metrics. The scary-looking formula for this is below, and the even scarier explanation from Wikipedia is here.
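In its standard form, the Erlang C formula gives the probability $P_W$ that a caller has to wait, for $N$ agents and an offered traffic of $A = \lambda h$ Erlangs, where $\lambda$ is the call arrival rate and $h$ the average talk (handling) time in the same time unit. It is only valid for $N > A$:

$$
P_W(N, A) = \frac{\dfrac{A^N}{N!}\,\dfrac{N}{N-A}}{\displaystyle\sum_{i=0}^{N-1}\frac{A^i}{i!} + \dfrac{A^N}{N!}\,\dfrac{N}{N-A}}
$$

The fraction of calls answered within a target time $t$ (the "20" in 80/20) then follows as:

$$
SL(t) = 1 - P_W(N, A)\, e^{-(N-A)\,t/h}
$$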

From a call centre staffing point of view, the primary elements to consider are as follows:
  • Average Talk Time
  • Calls per specified period (15min is a good benchmark)
  • Specified service metrics or SLA (e.g. 80/20: 80% of calls answered in 20s or less) ... correspondingly, you also want to consider your abandon percentage here. Are you willing to accept that some of your customers will hang up? If so, how many, and what impact will that have on your business in the long run?
With this information in hand, you can use the formula to determine how many resources you need in a given period to meet your customer demand. Using some free online tools (links provided below), you can also determine your required resources based on a specified timetable and rotation. For example, if the formula states you need 8 resources between 8am and 9am and you are running a 24/7 call centre, the actual number of staff you need to employ is 'X'.
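The calculators linked below do this for you, but a minimal Python sketch shows what they compute from the three inputs listed above: offered traffic in Erlangs, the Erlang C wait probability, the resulting service level, and the smallest agent count that meets an 80/20 target. The function names and the example interval (100 calls in 15 minutes, 3-minute average talk time) are hypothetical.

```python
import math


def erlang_c(agents: int, traffic: float) -> float:
    """Probability that a call has to wait (Erlang C); traffic is in Erlangs."""
    if agents <= traffic:
        return 1.0  # not enough agents: the queue grows without bound
    # Build A^k / k! iteratively to avoid huge factorials.
    term = 1.0   # k = 0
    total = 1.0  # running sum of A^k / k! for k = 0 .. agents-1
    for k in range(1, agents):
        term *= traffic / k
        total += term
    term *= traffic / agents  # now A^N / N!
    waiting = term * agents / (agents - traffic)
    return waiting / (total + waiting)


def service_level(agents: int, traffic: float, aht_s: float, target_s: float) -> float:
    """Fraction of calls answered within target_s seconds."""
    if agents <= traffic:
        return 0.0
    p_wait = erlang_c(agents, traffic)
    return 1.0 - p_wait * math.exp(-(agents - traffic) * target_s / aht_s)


def agents_required(calls: int, period_s: float, aht_s: float,
                    target_sl: float = 0.80, target_s: float = 20) -> int:
    """Smallest number of agents meeting the target (default 80/20)."""
    if not 0 < target_sl < 1:
        raise ValueError("target_sl must be strictly between 0 and 1")
    traffic = calls * aht_s / period_s  # offered load in Erlangs
    agents = max(1, math.ceil(traffic))
    while service_level(agents, traffic, aht_s, target_s) < target_sl:
        agents += 1
    return agents


# Hypothetical 15-minute interval: 100 calls, 180s average talk time (20 Erlangs).
print(agents_required(calls=100, period_s=15 * 60, aht_s=180))
```

Note that Erlang C assumes callers wait indefinitely and never abandon, so it tends to err on the side of more staff; tools that model abandonment will usually return somewhat lower figures.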

Some Good Free Erlang 'C' Calculators -


With this in mind, you still need to plan for excess capacity to cover staff absenteeism, whether planned or unplanned. So although the formula only called for 8 staff, and your headcount for a 24/7 call centre is 'X' ... you should actually plan to have 'Y' resources available to cover these gaps!!
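A simplified sketch of the step from 'X' to 'Y', assuming the same hourly requirement every day, a 40-hour contracted week and absence expressed as a single "shrinkage" fraction. All three are placeholder assumptions, and the sketch ignores shift-pattern constraints, which the rostering tools handle properly.

```python
import math


def staff_to_employ(hourly_agents, hours_per_fte=40.0, days_per_week=7):
    """'X': FTE needed to cover a 24-entry per-hour requirement every day."""
    weekly_agent_hours = sum(hourly_agents) * days_per_week
    return weekly_agent_hours / hours_per_fte


def staff_with_shrinkage(fte, shrinkage):
    """'Y': gross FTE up so that, after planned and unplanned absence,
    fte people are still available. shrinkage is a fraction, e.g. 0.25."""
    if not 0 <= shrinkage < 1:
        raise ValueError("shrinkage must be a fraction in [0, 1)")
    # Round before ceil so float noise (e.g. 48.00000000000001) doesn't add a head
    return math.ceil(round(fte / (1 - shrinkage), 6))


x = staff_to_employ([8] * 24)      # 8 agents every hour, 24/7
print(x)                           # 33.6
print(staff_with_shrinkage(x, 0.25))  # 45
```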

KPIs and the Importance of Measurements (Part 2)

Continuing from my previous post here, we're now going to get more in depth into KPIs and how to measure them.

How do I measure KPIs?

Get the data into your spreadsheet or other tracking tool (whatever is important to you: if you use the examples previously mentioned, track service outages in minutes against a specified date, for example), then keep adding more information every time you have another service interruption or outage.

The key here is consistency and ensuring that you reflect as realistic a picture as possible, so the more information you can capture, the better. If you are measuring outages, make sure you record the customers impacted, the total duration, the volume of calls or interactions it created and the reason for the outage (even a simple 3rd party vs. internal tag is important, as it tells you where to focus your attention).
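A minimal sketch of an outage record carrying the fields listed above, plus a summary by cause tag that shows where attention is needed. The field names, tags and sample records are hypothetical; a spreadsheet row with the same columns would serve equally well.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Outage:
    when: date
    minutes: int              # total duration of the outage
    customers_impacted: int
    calls_generated: int      # calls/interactions the outage created
    cause: str                # simple tag, e.g. "internal" or "3rd party"


def summarise_by_cause(outages):
    """Total outages, minutes and calls per cause tag."""
    summary = defaultdict(lambda: {"outages": 0, "minutes": 0, "calls": 0})
    for o in outages:
        s = summary[o.cause]
        s["outages"] += 1
        s["minutes"] += o.minutes
        s["calls"] += o.calls_generated
    return dict(summary)


outages = [
    Outage(date(2024, 3, 1), 30, 120, 45, "internal"),
    Outage(date(2024, 3, 9), 45, 80, 30, "internal"),
    Outage(date(2024, 3, 15), 120, 400, 210, "3rd party"),
]
print(summarise_by_cause(outages))
```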

Once the data has been captured, you need to come up with an appropriate means of analysis, so make sure you have, and are using, the right tool for this. A spreadsheet, as mentioned, is great in the early stages, but if you can tie this back into a good Incident & Problem Management system and/or a database, you're going to do really well! (I'll get into ITIL and Six Sigma in later posts.)

We are all familiar with the disparaging quotes about statistics (including "There are three kinds of lies: lies, damned lies, and statistics", attributed to either Mark Twain or Disraeli, depending on whom you ask), and it's no secret that many people harbor a vague distrust of statistics as commonly used.

Averages on their own don't tell you very much. A single data point that is extremely far outside the curve will skew the average towards it, so take care to ensure that you are measuring information correctly.
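A quick illustration using Python's standard `statistics` module, with hypothetical outage durations: one 600-minute outlier drags the mean far above every other value, while the median barely moves.

```python
from statistics import mean, median


def summarise(values):
    """Mean vs median: the median is far less sensitive to a single outlier."""
    return {"mean": round(mean(values), 1), "median": median(values)}


# Hypothetical outage durations in minutes, including one 600-minute outlier.
durations = [12, 15, 9, 14, 11, 13, 600]
print(summarise(durations))       # {'mean': 96.3, 'median': 13}
print(summarise(durations[:-1]))  # {'mean': 12.3, 'median': 12.5}
```

Reporting the median (or a high percentile) alongside the mean makes it obvious when a single event is distorting the picture.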

Good analysis is an ongoing process, so set targets and assess whether any changes you make are improving your KPIs or not.
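A small sketch of that review loop, assuming one KPI value per period: each period is checked against its target and against the previous period. The monthly outage-minute figures and the 250-minute target in the example are hypothetical.

```python
def review_kpi(history, target, lower_is_better=True):
    """For each (period, value), report whether the KPI met its target and
    whether it improved on the previous period."""
    report = []
    previous = None
    for period, value in history:
        met = value <= target if lower_is_better else value >= target
        if previous is None:
            improved = None  # nothing to compare the first period against
        else:
            improved = value < previous if lower_is_better else value > previous
        report.append((period, value, met, improved))
        previous = value
    return report


# Hypothetical monthly outage minutes against a 250-minute target.
for row in review_kpi([("Jan", 300), ("Feb", 240), ("Mar", 260)], target=250):
    print(row)
```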

KPIs and the Importance of Measurements

There is a great quote that goes something like -

"If you cannot measure it, you cannot manage it!"

... this is so true, especially in the Technical Support, Customer Service and Operations areas.

There are great KPIs (Key Performance Indicators) and not-so-great ones. The key is choosing the right ones for your business, and you need to choose them from a CUSTOMER point of view.

There is no use choosing your KPIs from any other point of view: if you lose your customers, you lose your revenue and, obviously, you lose your business!!

When defining a set of KPIs to control and measure performance, most of the debate is likely to be about how to measure them. Another way to think about KPIs is as measurements designed to assess performance.

The Traditional Mantra is -
“Measure. Analyse. Act” 

KPIs are the middle stage, but they're defined by the first and they should drive the third.

What KPIs should I use?

Your choice of KPIs depends on your intention and target audience. Which problem or issue are you trying to solve, who is it impacting, what is the impact, and what outcome would you like to see afterward? These are all good questions to ask when building a KPI plan.

Two common KPIs are 1st Call Resolution and Downtime (please note I have not said these are good ones; that is something you will need to determine for yourself, depending on your interpretation of what's important to your customer ... I shall discuss this in greater detail in later posts).

Similarly, KPIs should be measured over time. Don't expect your initial snapshot to give you the full picture, as you will frequently have to 'massage' and/or revise your measurement criteria and focus until you are measuring the correct information.

1st Call Resolution - 

Measurement of the percentage of customer issues resolved on the first call.
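A minimal sketch of the calculation, assuming you can record how many contacts each resolved issue took. Whether unresolved or reopened issues belong in the denominator is a definition you will need to settle for your own business; here they are simply left out.

```python
def first_call_resolution(contacts_per_issue):
    """% of issues resolved on the first contact.

    contacts_per_issue: for each resolved issue, the number of contacts it took.
    """
    if not contacts_per_issue:
        return 0.0
    first_time = sum(1 for n in contacts_per_issue if n == 1)
    return 100.0 * first_time / len(contacts_per_issue)


print(first_call_resolution([1, 1, 2, 1, 3]))  # 60.0
```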

% Uptime/Downtime -

Measurement of the percentage of time the service is available (or not).
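A minimal sketch of the uptime/downtime calculation from a list of outage durations over a measurement period. The 30-day month and the outage figures in the example are hypothetical.

```python
def availability(period_minutes, outage_minutes):
    """Return (uptime %, downtime %) for a period, given outage durations."""
    down = sum(outage_minutes)
    if not 0 <= down <= period_minutes:
        raise ValueError("outage minutes must fit inside the period")
    downtime = 100.0 * down / period_minutes
    return 100.0 - downtime, downtime


# Hypothetical 30-day month (43,200 minutes) with three outages.
uptime, downtime = availability(30 * 24 * 60, [30, 45, 120])
print(f"{uptime:.3f}% up, {downtime:.3f}% down")  # 99.549% up, 0.451% down
```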

These are just 2 of the hundreds of different KPIs out there ... a great place to find more is here, and it is well worth a visit!

Another problem you might have, though, is that you don't have any way to measure these ... that is something I will discuss further in later posts.