How do we know the way to increase an MSP’s profitability? Because we have sat in your seat.
How much can we increase the bottom line with Service Delivery improvements? Our track record is 10% year over year (or, to be more specific, $31,200 per tech per year).
Recently, I was consulting with one of our nimble Customers and we were bogged down in a discussion around Request Segmentation. Now, if you have been following me for any length of time, you know I have strong opinions in this area. You also know that when it comes to Service Delivery improvement and moving the four KPIs that impact profit, nothing is more central than Request Segmentation.
It was so frustrating to know something and not be able to explain it. To make matters worse, I have been living and breathing the subject for more than ten years, and at times thinking of nothing else (yes, really).
Then it happened… the definition of Request Segmentation burst out of me… at last!
Defining Request Segmentation (finally!)
Many years ago, at a very mature MSP, every non-Help Desk** ticket that came in was scheduled. Even if it took less than an hour to complete, 2 hours was the scheduled time in the Autotask Dispatcher Workshop calendar.
**Help Desk Tickets were End-User issues for Customers with a Help Desk agreement in place. Phone and Email requests were handled by the remote Help Desk Team. All others were handled and scheduled by the Customer Service Team.
Can you imagine the waste? Yes, time was wasted in overscheduling, but also in the noise caused by Customers, Account Managers, Management, and Techs (and especially the Owner) trying to circumvent the scheduling system.
In 2015, Incident Tickets for Managed Service Customers and Non-Contract Customers were moved to a queue for remediation, instead of hard-scheduling a Service Call for every ticket.
This was much more efficient, as it used the Next SLA Event Due Date to sort the tickets in the queue. A Non-Contract SLA was then created so that Contract and Non-Contract Customers could be mixed in the same queue. The Non-Contract SLA was set at 2x the Contract SLA, and it worked well. At 4x, upsell opportunities flew off the shelf as Non-Contract Customers started to understand the value of having a Managed Service Agreement: it is about response time, not $$$.
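The queue ordering described above can be sketched in a few lines of Python. This is a rough illustration, not Autotask's SLA engine: the `Ticket` fields, the 4-hour Contract target, and the function names are assumptions made up for the example, and business hours are ignored. Only the 2x Non-Contract multiplier comes from the story above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed value for illustration; the article does not state the Contract SLA target.
CONTRACT_SLA = timedelta(hours=4)
# From the article: the Non-Contract SLA was set at 2x the Contract SLA.
NON_CONTRACT_MULTIPLIER = 2


@dataclass
class Ticket:
    # Hypothetical fields, not an Autotask schema.
    title: str
    created: datetime
    has_contract: bool


def next_sla_due(ticket: Ticket) -> datetime:
    """Due date of the ticket's next SLA event (wall-clock time, no business hours)."""
    target = CONTRACT_SLA if ticket.has_contract else CONTRACT_SLA * NON_CONTRACT_MULTIPLIER
    return ticket.created + target


def order_queue(tickets: list[Ticket]) -> list[Ticket]:
    """Sort the queue so the ticket whose SLA event is due soonest comes first."""
    return sorted(tickets, key=next_sla_due)


morning = datetime(2015, 6, 1, 8, 0)
queue = order_queue([
    Ticket("Non-contract printer offline", morning, has_contract=False),
    Ticket("Contract VPN down", morning + timedelta(hours=2), has_contract=True),
    Ticket("Contract email bounce", morning + timedelta(hours=5), has_contract=True),
])
for t in queue:
    print(next_sla_due(t).strftime("%H:%M"), t.title)
```

In this sample, the Non-Contract ticket was opened first but lands between the two Contract tickets, because its SLA clock runs twice as long. That is the whole trick: one queue, one sort key, and the SLA does the prioritizing.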
Things improved vastly once we implemented a NOC system and sent all Incidents** to the NOC queue. It worked so well that it was not long before most of the noise was coming from Quick Hit Service Requests. You know, those things that are working as designed, but as a Managed Service Provider, we cannot tell the Customer with a straight face that they need to wait three days for something that will take less than an hour to complete. Think Password Reset, DigiCert license renewal, adding a new user to Office 365 (oh wait, that was not around in 2015).
So, we came up with the Priority: Quick Hit. Yes, it is a Service Request for the ITIL police, but because we are a Managed Service Provider and a d@*! good one at that, we are not going to make the Customer wait for these. We are going to put them in the NOC queue with the rest of the Incidents and give them a High SLA so that they are engaged the same day and completed within a business day.
** Incidents: Not working as designed (ITIL), Break/Fix (Industry Jargon), Working yesterday, not working today (Stephen D Buyze).
We closed out 2015 and 2016 with very high profit thanks in part to the NOC model, even though the MSP industry had a downturn in 2015. Up until 2017, most of the Service Request** work was still being done on-site.
** Service Request: everything that is not an Incident.
The reason for the on-site scheduling was to keep a competitive advantage over the out-of-market NOC-only providers. But by 2017, the majority of on-site work was putting the MSP at a cost disadvantage, and more and more work was being done remotely. We were still using a Service Call in the Dispatcher Workshop Calendar for every ticket, and still scheduling more time than needed… and it was still very inefficient.
Years before Incidents were moved into a NOC system, we had mapped every Recurring Ticket and aligned them, so that no Service Call was disrupted by any other scheduling, including the other recurring schedules. Once the mapping was done, the schedule was aligned to maximize Project Availability while leaving enough room for everything else.
From then on, the Recurring Tickets were never a problem within the MSP’s Service Delivery operations. The proactive maintenance program (for which Recurring Tickets were used) accounted for 30% of the labor utilization. The mapping reduced the percentage of disruptions from over 40% to below 8%, and made a Service Manager very happy, not to mention the all-in Managed Service Customers.
Four Things to do Before Project Scheduling
In 2013, we defined a Project as any Service Request with estimated labor of over 16 hours. We needed the definition because Project Scheduling was disrupting everything we did in Service Delivery.
We also realized that before scheduling a Project, we needed to know four things:
- SoW – which would tell the dispatcher what skillset was needed.
- Estimated Hours – how many hours it would take to complete the project. We always scheduled a 10% buffer to cover the time lost to the disruptions that remain no matter how efficiently the operation is running. Scheduling a buffer prevents the multi-project train wreck caused by the first project overrunning its end date.
- BoM with Estimated Delivery Date of Parts – no point in scheduling the engagement to start before the parts would arrive.
- Scheduling Pattern – a network upgrade (new Firewalls going in first, followed by rack and stack of physical servers, VMware installs, an Azure or NetApp backup schema, then an Office 365 migration) is scheduled very differently from a PC refresh.
As you can see, based on the need for this additional information, following an incident workflow just isn’t going to work. From my experience, I can tell you that following any other Service Request workflow is not going to work either.
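The four prerequisites and the buffer rule can be sketched as a simple pre-scheduling check. Treat this as an illustration under stated assumptions, not the tooling we used: the `ProjectRequest` class, its field names, and the three helper functions are hypothetical. The 10% buffer and the "do not start before the parts arrive" rule come from the list above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

BUFFER_RATE = 0.10  # from the article: always schedule a 10% buffer


@dataclass
class ProjectRequest:
    # All field names are hypothetical, chosen for this example.
    sow: Optional[str] = None                 # Statement of Work (tells the dispatcher the skillset)
    estimated_hours: Optional[float] = None
    parts_delivery: Optional[date] = None     # latest estimated delivery date on the BoM
    scheduling_pattern: Optional[str] = None  # e.g. "network upgrade" or "PC refresh"


def missing_items(req: ProjectRequest) -> list[str]:
    """Name the prerequisites that are still missing before the project can be scheduled."""
    checks = {
        "SoW": req.sow,
        "Estimated Hours": req.estimated_hours,
        "BoM delivery date": req.parts_delivery,
        "Scheduling Pattern": req.scheduling_pattern,
    }
    return [name for name, value in checks.items() if value is None]


def hours_to_schedule(req: ProjectRequest) -> float:
    """Estimated hours plus the 10% buffer, rounded to a tenth of an hour."""
    if req.estimated_hours is None:
        raise ValueError("Estimated Hours are required before scheduling")
    return round(req.estimated_hours * (1 + BUFFER_RATE), 1)


def earliest_start(req: ProjectRequest, requested: date) -> date:
    """Never start the engagement before the parts are expected to arrive."""
    if req.parts_delivery is None:
        raise ValueError("BoM delivery date is required before scheduling")
    return max(requested, req.parts_delivery)


req = ProjectRequest(
    sow="Firewall replacement",
    estimated_hours=40,
    parts_delivery=date(2013, 5, 20),
)
print(missing_items(req))                        # the Scheduling Pattern is still missing
print(hours_to_schedule(req))                    # 40 hours plus the 10% buffer
print(earliest_start(req, date(2013, 5, 13)))    # pushed out to the parts delivery date
```

The point of the check is that a Project with any item missing goes back to whoever sold or scoped it, not onto the calendar.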
Projects Cause the Wheels to Come Off at Most MSPs
I would like to digress from the storyline for a moment to say that it is Projects that cause the wheels to come off most MSP operations. Before there is a significant amount of Project work (10% of total available hours), life goes along with all tickets simply dumped into a single queue, in the hope that they all get done within SLA.
It is when Projects don’t fit – which means they never get done & no one knows the status – that the Techs are forced to work after-hours. The noise, confusion, disruption, & lost productivity are deafening.
The noise gets so loud that the MSP actually notices it. The same noise occurs when Incidents are mixed with Service Requests; it is just at a much lower volume. Since the SLA is misconfigured, no one really listens – but it is there, just the same. Along with confusion, disruption, and lost productivity…
Finally, in 2018, it dawned on me that we could use the SLA engine to order all the Service Request Tickets (not Recurring or Projects, but everything else) with the Incidents in the queue. Thus, for the first time, Service Request engagements were just as efficient as Incident remediation. Voilà, life is good.
If only it were that easy. While we successfully filtered out Incidents, Quick Hits, Projects, and Recurring tickets, what was left was a pile that still needed further sorting, filtering, rinsing, and fabric softener (ok, maybe not the fabric softener).
What to do with the Leftovers
Here are the challenges with the remaining Service Requests.
- Not all of them can fit in the queue. It is one thing to grab a ticket out of the queue and work on it for a few hours, but for a Tech to be out of the queue for more than four hours means either their eyes are off the queue for too long, or it will take them more than 4 hours to complete the engagement – neither is acceptable.
  - The solution: any Service Request over 4 hours needs to be scheduled.
- All Service Requests take planning to be effective and efficient (this is true for Incidents also). Most of the time the planning can be done as part of the engagement, but sometimes it needs to be scheduled separately from the engagement (this is always true for Projects – planning time should be about a week before the start date of the project).
- We must always keep Customers’ expectations in mind:
  - Something that takes less than an hour to complete, they expect today or tomorrow morning.
  - Something that takes up to half a day (4 hours) to complete, they expect within a few days and always within a week.
  - Something that takes a full day (8 hours) to complete, they expect within a week (they have to arrange schedules also) and always within two weeks.
  - Something that takes more than a day to complete (8-16 hours), but is still not a Project (over 16 hours), they expect within two weeks and always within three weeks.
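Pulling the rules in this article together, Request Segmentation can be sketched as a single routing function. This is a simplified illustration, not a definitive implementation: the `Request` fields, the workflow labels, and the handling of exact boundaries (for example, a request of exactly 1 hour, or one between 4 and 8 hours) are assumptions. The hour thresholds and the "always within" windows come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    # Hypothetical fields for this sketch.
    is_incident: bool = False
    is_recurring: bool = False
    estimated_hours: float = 0.0


def segment(req: Request) -> tuple[str, Optional[int]]:
    """Return (workflow, outer limit in days that the Customer expects).

    The limit is None where the article gives no Customer-expectation window.
    """
    if req.is_incident:
        return ("NOC queue (Incident)", None)
    if req.is_recurring:
        return ("Recurring schedule", None)
    hours = req.estimated_hours
    if hours > 16:
        return ("Project", None)                       # needs the four prerequisites first
    if hours < 1:
        return ("NOC queue (Quick Hit, High SLA)", 1)  # today or tomorrow morning
    if hours <= 4:
        return ("Queue (Service Request)", 7)          # always within a week
    if hours <= 8:
        return ("Scheduled Service Call", 14)          # always within two weeks
    return ("Scheduled Service Call", 21)              # always within three weeks


for hours in (0.5, 3, 6, 12, 24):
    print(hours, segment(Request(estimated_hours=hours)))
```

Note that the 4-hour line does double duty: it is both the limit on how long a Tech can be out of the queue and the point where the Customer's expectation shifts from "this week" to "schedule it with me."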
We’re finished now, right? Whoa, hold your horses.
There you have it. Now that all Requests are segmented into their own workflow, everyone plays nicely: no confusion, noise, disruption – you have reached Zen or Nirvana.
Well, not so fast. We have used the term “Queue” throughout the article. By using Priorities and the SLA engine to order all tickets in the queue, we have introduced new challenges – go figure. It kinda feels like a marketing research project ending with the recommendation to do more marketing research.
When queues no longer work (see Out with the Queue in with the … article), Widgets are here to save the day…
For more details on Request Segmentation, check out this video: