Category Archives: Incident Management

Understanding How Your Help Desk and NOC Services Connect

This post about the Help Desk and NOC was initially published on CX Expert. It has been updated to reflect current market trends and information.

Perhaps no obstacle causes managed services providers more concern than deciding how they will manage, monitor and deliver services to their clients. For companies moving into managed services or cloud, one obstacle must be tackled: what to do about the help desk and NOC. You must determine the best practices to follow, the cost, and many other essential details.

Origin of the NOC

The help desk and NOC are the heart of a managed services provider. Together they provide redundancy, physical security and a secure area where technicians can collaborate to manage and monitor customer environments.

Every managed service provider (MSP) service delivery model needs the NOC. However, many people don’t know where the idea of the NOC came from and how the early MSPs came to rely on them. With the advent of cloud computing and other business process and technological advances, it is essential to know how different factors influenced the NOC for the cloud provider and MSP today.

The idea of the NOC is not new – it has been around for a very long time. It began in telecommunications, where it was used to monitor and manage telecommunications networks. Inside the NOC, technicians sit and receive information in real time. The NOC's physical configuration gives technicians a secure place to collaborate on issues and problems that would be unsafe or unsuitable to discuss around non-approved personnel.

MSPs came into existence in the mid-1990s. At the time, nearly all of them had a business plan that included a physically secure NOC, operational 24/7, from which to deliver their managed services. The configurations, procedures, and tools differed from one MSP to the next, but the presence of the physical NOC was an essential and consistent characteristic.

The Help Desk and its Importance

A lot of people confuse the terms help desk and NOC, even though they serve two different and vital functions. The confusion is easy to understand when you look back at how both came into being and how they have gradually been split apart in modern managed services.

We have already discussed the role of the NOC, but it is important to look at how the help desk fits into the equation. Older NOCs have the help desk integrated into them to maximize the benefits of security, redundancy and a collaborative work environment. The simplest way to distinguish the two is that the NOC performs system-based work, centered on managing and monitoring the objects that are under management with the MSP. The help desk, on the other hand, is a customer-facing department that is responsible for interfacing with end users. It responds to their problems and helps them get solutions.

It is easy to see why the two areas seem similar. Each company should decide for itself how to organize them, but it is important to understand that there is a difference between the two and that they serve unique and important functions within a managed services practice.

The Interaction Between the Help Desk and the NOC

Now that we know the difference between help desk and the NOC, we must look at how they should interact with each other in managed services practices. It is important to acknowledge that there are different ways in which help desk and the NOC interact and co-exist.

The Help Desk Existing Within the NOC

The benefits of having the help desk inside the NOC should be evident. All of the MSP's logical and physical security controls can be addressed in one physical space. Physical access to workstations, the way technicians log in to client systems, and change management are all important to control, and they can be monitored and enforced effectively if the help desk resides in the NOC.

In other configurations, the MSP dedicates an entire floor of its premises to the help desk and NOC teams, complete with secure access, with each team working in its own section of the secure area. The main purpose of mingling the help desk and the NOC is to take advantage of the process efficiencies and security. If both parts of your operation need the same level of security, there is no need to build out different facilities. Instead, you can just build it once.

There are many other benefits to this configuration. Any interaction between the help desk and the NOC is naturally easier when they are both located within the same secure area. It can also help with redundancy and with business continuity plans should anything happen to the facility.

The Help Desk Existing Outside the NOC

In larger MSP environments, you will find the help desk outside the NOC. This is common, but it will depend on your unique situation. In most cases, large companies locate the help desk outside the NOC because the help desk team won't fit within it. Typically, MSPs that need multi-time-zone or multilingual help desks do not run the help desk centrally. Instead, they have several help desk facilities in different locations.

It is not always necessary to build a full NOC around every help desk area, which is why help desks exist outside the NOC. Regardless of how you operate and configure your NOC and help desk, there should be enough controls governing how the two interact with one another. There should be documentation on the handling of trouble tickets, the handling of connectivity within the MSP, and how power redundancy is handled within the MSP organization to maintain operational effectiveness.

How to Build a NOC

This post was initially published on CX Expert. It has been updated and refreshed before being republished here.

A network operations center (NOC) is the central point of monitoring for your network. It helps to ensure uptime in your business. NOCs are not limited to networks. They can also provide visibility into systems management, virtualized infrastructure, IT security and more. In most cases, only large companies have the resources necessary to create an effective NOC infrastructure. However, even small and medium businesses can gain visibility into the performance and availability of their networks by creating their own NOCs or using the tools that NOCs use.

To create your NOC, you don't need a modern room full of expensive, high-tech network surveillance gear. You can create your NOC just about anywhere and still learn where and how network issues occur in time to troubleshoot them. We will look at some essential capabilities to build into your network management system to give you a NOC field of vision.

Centralize Alert Management

When dealing with a growing network with a lot of devices from different manufacturers, alerts can be useful. You should be able to receive them in a single and central location for actionable and easy access and insight.

Alerts may include but are not limited to:

  • performance metrics,
  • availability statistics,
  • error messages,
  • hardware thresholds, and a host of other relevant factors.

The primary challenge with alert management lies in receiving alerts on time and managing them at a centralized level to help in comparing alerts, tracking alert history, eliminating false positives and deducing alert functions.
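As a rough illustration of what a central collection point could look like, here is a minimal Python sketch. The `Alert` and `AlertHub` names, the device names and the five-minute de-duplication window are all assumptions made up for this example, not part of any particular monitoring product. It keeps a history per device and alert type and flags repeats so they can be filtered out.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    device: str            # hypothetical device name
    kind: str              # e.g. "performance", "availability", "error", "hardware"
    message: str
    received_at: datetime


@dataclass
class AlertHub:
    """Single collection point for alerts from every device (illustrative only)."""
    dedup_window: timedelta = timedelta(minutes=5)   # assumed window
    history: dict = field(default_factory=lambda: defaultdict(list))

    def receive(self, alert: Alert) -> bool:
        """Store the alert. Return True if it is new, or False if the same
        device and kind already alerted within the de-duplication window."""
        key = (alert.device, alert.kind)
        past = self.history[key]
        is_repeat = bool(past) and (
            alert.received_at - past[-1].received_at <= self.dedup_window
        )
        past.append(alert)   # history is kept either way, for later comparison
        return not is_repeat
```

Because every alert lands in `history`, the same structure can be used to compare alerts and review what a device reported over time.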

Group your Network Elements

Your network hardware may span different models, types and versions, come from different makers, sit in different locations and be compatible with different platforms. With such a medley of network devices, it is difficult to get a clear understanding of network issues. The best solution is to create logical groups of your devices. This will allow you to monitor your devices as a group and not as disparate entities. You can create static groups, to which you add network nodes manually, or dynamic groups, to which nodes are added automatically based on a pre-defined condition.

Grouping devices for network monitoring gives you a logical understanding of the situation. You can also use grouping to set parent-child dependencies between your network elements. This lets you eliminate redundant alerts and understand the impact of any faulty device on its dependents.
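To make the grouping and dependency idea concrete, here is a small Python sketch. The node names and the `PARENT` mapping are a made-up inventory, and the helper names are hypothetical. It shows a condition-based group, how alerts from children of a failed parent can be suppressed, and how to list the dependents of a faulty device.

```python
# Hypothetical inventory: each node maps to the upstream node it depends on.
PARENT = {
    "core-router": None,
    "branch-switch": "core-router",
    "branch-ap-1": "branch-switch",
    "branch-ap-2": "branch-switch",
}


def dynamic_group(nodes, condition):
    """Build a group automatically from a pre-defined condition."""
    return sorted(n for n in nodes if condition(n))


def root_causes(down):
    """Keep only the down nodes that have no down ancestor, so that
    alerts from their dependents can be treated as redundant."""
    down = set(down)
    roots = []
    for node in sorted(down):
        parent = PARENT.get(node)
        while parent is not None and parent not in down:
            parent = PARENT.get(parent)   # keep walking up the tree
        if parent is None:                # reached the top without a down ancestor
            roots.append(node)
    return roots


def impacted(node):
    """List every dependent (child, grandchild, ...) of a faulty node."""
    result = []
    for child in (n for n, p in PARENT.items() if p == node):
        result.append(child)
        result.extend(impacted(child))
    return sorted(result)
```

For example, if the switch and both access points are reported down, `root_causes` returns only the switch, and `impacted("branch-switch")` lists the two access points behind it.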

Customize Your Network Diagnostics

It can be frustrating to have access to network performance and health data and still be unable to dissect it as fast as you need to. This ties directly to the KPIs you are measuring, but in many cases the problem comes from the dashboard view used to display the network performance data.

A web-based dashboard is a good choice because it can be viewed from anywhere. The ability to customize the dashboard also lets you understand your data more easily, because the things that need attention are visible at a glance, such as:

  • The top interfaces by traffic
  • The top errors and discards
  • Top interfaces that face maximum percent utilization
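A dashboard widget for lists like these usually boils down to sorting interface statistics by one metric and keeping the first few. The following Python sketch assumes a hypothetical `InterfaceStats` record and sample values that are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class InterfaceStats:
    name: str
    bits_per_second: float   # current traffic
    speed_bps: float         # link capacity
    errors: int
    discards: int

    @property
    def utilization_pct(self) -> float:
        return 100.0 * self.bits_per_second / self.speed_bps


def top_n(stats, key, n=3):
    """Return the names of the n interfaces ranked highest by `key`."""
    return [s.name for s in sorted(stats, key=key, reverse=True)[:n]]


# Invented sample data for the sketch.
SAMPLE = [
    InterfaceStats("Gi0/1", 800e6, 1e9, errors=2, discards=0),
    InterfaceStats("Gi0/2", 90e6, 100e6, errors=40, discards=5),
    InterfaceStats("Te1/1", 3e9, 10e9, errors=0, discards=1),
]

by_traffic = top_n(SAMPLE, key=lambda s: s.bits_per_second, n=2)
by_errors = top_n(SAMPLE, key=lambda s: s.errors + s.discards, n=2)
by_utilization = top_n(SAMPLE, key=lambda s: s.utilization_pct, n=2)
```

Note that the busiest link by raw traffic is not necessarily the most utilized one, which is why it helps to show both views side by side.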

Map Device Topology

As a network administrator, you are sometimes left searching for the reason your network went down without any clue where to start. It is much more straightforward to pin the problem on a map and trace it back to its source, and topology mapping lets you do that. It helps you monitor availability by looking at a map. You can use the following steps to do it.

  • Discover the nodes of your network such as interfaces, network devices, and servers
  • Place the nodes of your network on a custom map
  • Connect the elements of your network using ARP table data to get a graphical depiction of both virtual links and physical links.
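The steps above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the node names, the MAC addresses and the shape of the ARP data are invented, and a real tool would collect them from the devices. Keep in mind that ARP entries show neighbours on the same subnet, so the links this produces are only a first approximation of the physical cabling.

```python
# Step 1 (discovery), assumed output: the MAC address owned by each node.
NODE_MACS = {
    "router1": "aa:aa:aa:aa:aa:01",
    "switch1": "aa:aa:aa:aa:aa:02",
    "server1": "aa:aa:aa:aa:aa:03",
}

# Assumed ARP tables: node -> list of (local interface, neighbour MAC).
ARP_TABLES = {
    "router1": [("Gi0/0", "aa:aa:aa:aa:aa:02")],
    "switch1": [("Gi0/1", "aa:aa:aa:aa:aa:01"), ("Gi0/2", "aa:aa:aa:aa:aa:03")],
}


def build_links(node_macs, arp_tables):
    """Steps 2 and 3: place nodes on the map and connect any two nodes
    where one of them has the other's MAC address in its ARP table."""
    owner = {mac: node for node, mac in node_macs.items()}
    links = set()
    for node, entries in arp_tables.items():
        for _interface, mac in entries:
            neighbour = owner.get(mac)
            if neighbour is not None and neighbour != node:
                # Sort the pair so A-B and B-A count as one link.
                links.add(tuple(sorted((node, neighbour))))
    return sorted(links)


LINKS = build_links(NODE_MACS, ARP_TABLES)
```

The resulting list of node pairs is what a mapping view would draw as lines between the icons on the map.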

Unify Management Platforms

Running different network management platforms for different requirements costs more than budget. It also takes operational expertise and management overhead. You can cut back on time and money if you unify the management platform.

Unifying your platforms can also give you a comprehensive view of the NOC functions. Therefore, it is important to look for a solution that can stand alone and still be compatible with other management modules for systems management, network configuration management, and virtualization management. You can simplify your operations if you have the same management platform. It also allows you to customize your interface and doesn’t require a lot of work to manage your NOC.

Make sure you can access the network performance monitoring data of your organization from the comfort of your workstation because this can be the most effective NOC that any network admin could ever have. Also, it is crucial to have the right NOC dashboard to get a comprehensive view that is always available to you. This clearly shows how the network devices are doing and the things that are causing your network downtime. You don’t need a chief network engineer when designing your NOC.

Conclusion

If your company manages multiple networks, then you should be aware of the challenges involved when it comes to monitoring them at once. The data from your clients and your data is private, and your networks should be running without any delays.

Unless you have a team in your business that can handle this high level of network management, you may want to outsource professional support. You can use NOC engineers and technicians to monitor the health and capacity of your infrastructure.

With all this important information, they can make informed decisions for your business and adjust the systems to optimize the productivity and performance of your organization. They will send out alerts in case of any issues based on the type, severity, and level of expertise needed to solve the issues and any other things your NOC team specifies. After resolving the issue, you can alter things in your system and monitor your system to prevent the problem from recurring.

WHAT IS A HELPDESK?

OK, to start with it’s not a desk that helps people! A help desk is a team of individuals (generally support staff) that provide solutions and resolutions to customers experiencing problems. Generally working at the 1st tier of the support model they are responsible for Incident reporting and resolution vs. Problem Management (I shall discuss those terms in greater depth below).


What is an Incident?
Simply put, an Incident is anything related to customer contact (Incidents are also reported by automatic means via monitoring tools and I will discuss those types of incidents in greater depth in later posts). Incidents related to customers can be anything really – Information requests, Account Updates, Issue reporting are all examples of Incidents. Incidents can also be reported through a variety of different methods – this could include the phone (probably the most common), email (a close 2nd) and even chat. As mentioned previously, automated monitoring tools can also generate incidents.


All of these different Incidents coming from/through different sources would get routed to your Incident Management tool. For smaller teams, this could be something as simple as a spreadsheet, but in larger organizations either in-house custom-built applications or enterprise-level tools prevail.



Incident Management (in a nutshell)
Your helpdesk is responsible for reviewing the information in each of these incidents and checking if there is an appropriate solution already available to the customer. For example, where the customer wishes to update their Account Information, the helpdesk would look at the Incident, obtain the correct new information and (assuming that all appropriate security questions had been reviewed) log into the customer's account and update the information. Once the information had been updated, they would inform the customer and then close the Incident. This is probably one of the simpler examples of an Incident from start to finish.


If the customer is reporting a problem or an issue, the Helpdesk staff are responsible for updating the Incident with all the relevant details as supplied by the customer. If the customer’s issue matches a known fix they are able to inform or supply that fix to the customer, however, if that is not the case they would need to escalate the issue to the Problem Management team. The simplest way to think of the Incident Management (Helpdesk/Tier1) team and the issues they resolve is that if a “band-aid” exists they can apply it. If more drastic attention is required they will need to call the Doctor!
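Here is a minimal Python sketch of that "band-aid or Doctor" decision. The `KNOWN_FIXES` table and the incident fields are invented for illustration. A real helpdesk would look these up in its knowledge base rather than a hard-coded dictionary.

```python
# Hypothetical knowledge base: symptom keyword -> known fix.
KNOWN_FIXES = {
    "password reset": "Send the self-service reset link.",
    "email not syncing": "Remove and re-add the mail account.",
}


def triage(incident: dict) -> dict:
    """Apply a known fix at Tier 1 if one matches, otherwise escalate
    the incident to Problem Management (Tier 2)."""
    summary = incident["summary"].lower()
    for symptom, fix in KNOWN_FIXES.items():
        if symptom in summary:
            return {**incident, "status": "resolved", "resolution": fix, "tier": 1}
    return {**incident, "status": "escalated", "resolution": None, "tier": 2}


resolved = triage({"id": 1, "summary": "Password reset needed for new starter"})
escalated = triage({"id": 2, "summary": "Database corrupts records on save"})
```

The first incident matches a known fix and is closed at Tier 1, while the second has no band-aid available and goes up to the next tier.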



Problem Management
Problem Management is where the interesting work really happens. Incident Management due to its repetitive nature can get tedious and is definitely a drain on the more skilled staff in your organization … if you have people like that, think about moving them into Problem Management if you have such a team or create one if you don’t! Problem Management is more in-depth. It’s where more often than not a single Problem is the cause of multiple Incidents from multiple customers … as such you want your best people at this level. Generally, you would consider this Tier 2 or Tier 3 from an escalation and staffing perspective and dependent on your product or service you would have some very technically oriented people there. Their goal is not to just provide a band-aid, but rather to find out why the problem happened in the first place and fix it. Ideally, they should be looking at ways to fix it in such a way as to ensure that it doesn’t happen again!!



KPIs
Now each of these teams would have different metrics in place. Obviously, your Tier1 team (Incident Management/Customer Service/Helpdesk) needs to get back to the customer in a timely manner. Their goal as already mentioned is to fix it, fix it fast and move on. A band-aid will not always reattach the finger though, so it’s up to the Tier2 team to ensure that the surgery goes smoothly which obviously takes a lot more time as you don’t want the surgeon doing a shoddy job!




Response Time – So with that analogy in mind … you want to have an aggressive goal set for your Helpdesk. Try to work with the 80/20 rule: 80% of incidents responded to in 20 seconds, if you have the resources. Otherwise, maybe 20 minutes? Or 20 hours (that’s less than 1 day so might still be good, especially if you’re doing email support)? Or 20 days? Well, that’s probably not really worthwhile, but hopefully you get the point. You want to set a specific goal for measuring how quickly your customers are getting a response.
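If you want to check a goal like this against real numbers, the arithmetic is simple. The Python sketch below uses invented response times and hypothetical function names. The target and the goal are parameters, so you can swap in 20 minutes or 20 hours just as easily.

```python
def service_level(response_seconds, target_seconds=20):
    """Percentage of incidents that got a response within the target."""
    if not response_seconds:
        return 0.0
    within = sum(1 for s in response_seconds if s <= target_seconds)
    return 100.0 * within / len(response_seconds)


def meets_goal(response_seconds, target_seconds=20, goal_pct=80.0):
    """True if the team hit the goal, e.g. 80% answered within 20 seconds."""
    return service_level(response_seconds, target_seconds) >= goal_pct


# Invented sample: seconds taken to respond to five incidents.
GOOD_DAY = [5, 10, 15, 20, 45]
BAD_DAY = [5, 30, 40, 50]
```

On the good day four out of five incidents were answered within 20 seconds, which is exactly 80%, so the goal is met. On the bad day only one in four was.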



Resolve Time – notice that I have separated these out. As much as you’d like to be able to resolve 100% of issues at that first contact, it’s not always going to be possible. However, you can have another measurement in place that tracks this which is the Resolve Time (sometimes called MTTR (Mean Time to Repair)). The Goal here is also to get that band-aid on as quickly as possible so you need to ensure that your Incident Management system has some sort of a knowledge base which helps your staff find the solution to commonly placed issues/questions. If they have the answer every time, then a 100% resolution at 1st contact is achievable! If not, however … it gets a bit more complicated because all of a sudden your Incident Management team becomes the customer and the team they go to is the Problem Management team. Guess what? They have a different measurement for Response Time and Resolve Time too!
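Both measurements are easy to calculate once your Incident Management tool records when an incident was opened, when it was resolved and how many contacts it took. This Python sketch assumes that data arrives as simple tuples. The sample incidents and function names are invented for illustration.

```python
from datetime import datetime

# Invented sample: (opened, resolved, number of contacts needed).
INCIDENTS = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 10), 1),
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 20), 1),
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30), 1),
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 0), 2),
]


def mean_time_to_resolve_minutes(incidents):
    """Average time from opening an incident to resolving it, in minutes."""
    seconds = [(resolved - opened).total_seconds() for opened, resolved, _ in incidents]
    return sum(seconds) / len(seconds) / 60.0


def first_contact_resolution_pct(incidents):
    """Percentage of incidents resolved on the very first contact."""
    first = sum(1 for _, _, contacts in incidents if contacts == 1)
    return 100.0 * first / len(incidents)


MTTR = mean_time_to_resolve_minutes(INCIDENTS)
FCR = first_contact_resolution_pct(INCIDENTS)
```

With this sample, the resolve times of 10, 20, 30 and 60 minutes average out to 30 minutes, and three of the four incidents were resolved at first contact.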


Problem Management Response Time – now as previously mentioned these are generally your more senior staff and as much as you’d like them to be available 24/7, unless you have an extremely large organization this is probably fairly unlikely. So you are going to have to build or determine some relevant response times based on their availability. In addition, as these escalated issues are generally issues that cannot easily be resolved, your resolution time is going to be extended also. Pick some appropriate intervals that meet your customers’ SLAs. Your main goal for this team (in addition to resolving the problem of course) is communication, communication, communication!!! They must inform your customer-facing agents what the issue is, what they are doing to resolve it and when they expect to have it resolved. If they cannot provide an estimated resolution time, they MUST provide your Tier1 team with an estimated update time.

The Difference Between Incident Management and Problem Management

Incident Management and Problem Management are both key components of the ITIL service model and have been defined and created in an effort to provide a better and more streamlined service to consumers.

ITIL itself stands for the Information Technology Infrastructure Library and comprises the following books:

  • ITIL Service Strategy 
  • ITIL Service Design 
  • ITIL Service Transition 
  • ITIL Service Operation 
  • ITIL Continual Service Improvement

Incident Management and Problem Management are both elements of the fourth volume, ITIL Service Operation, which tries to define the best practice for dealing with interruptions to a customer's service.

What is an Incident?

An incident is a single, unique issue impacting one specific customer and their service. While there can be many similar incidents impacting multiple customers, each one is unique in its own way and needs to be logged and treated as such.

An example of an incident is you losing your home Internet connection. While the underlying root cause could be related to a fiber cut impacting hundreds of houses, your individual issue is one specific incident as it is unique to you. 

What is the objective of the Incident Management team?

The Incident Management team is the group responsible for dealing with your issue. They could be called by a variety of different names (Helpdesk, Service Desk, Technical Support Team, etc.), but their primary role is to get your service restored in as timely a manner as possible. They are basically there to put a “band-aid” on your problem and not necessarily resolve the root cause.

How are Incidents Tracked?

Incidents are tracked and responded to through a variety of different automated and manual tools. The ideal function of the Incident Management team is to resolve the issue before it has an impact on your business/life and they track these issues through a variety of different alarms and monitoring tools.

The worst type of reporting is one in which a manual report is needed. If a customer has been impacted, then in some fashion the Incident Management team has already failed in one of its primary roles!

What is a problem?

In the context of Incident Management, a Problem is one that comprises multiple incidents. If you take into account my previous example of an Internet failure at your home, the problem, in this case, would be the actual fiber cut which is the root cause of the issue.

As such, this “problem” would have multiple incidents attached to it.
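In a tracking tool, that relationship is simply one Problem record with many Incident records attached. The Python sketch below assumes each incident has already been tagged with its root cause, which in practice is the hard part and the job of Problem Management. The incident IDs, customers and field names are invented for illustration.

```python
from collections import defaultdict

# Invented sample incidents, already tagged with a root cause.
INCIDENTS = [
    {"id": "INC-1", "customer": "house 12", "root_cause": "fiber cut"},
    {"id": "INC-2", "customer": "house 14", "root_cause": "fiber cut"},
    {"id": "INC-3", "customer": "house 20", "root_cause": "faulty modem"},
]


def group_into_problems(incidents):
    """Attach each incident to the problem that caused it."""
    problems = defaultdict(list)
    for incident in incidents:
        problems[incident["root_cause"]].append(incident["id"])
    return dict(problems)


PROBLEMS = group_into_problems(INCIDENTS)
```

Here the fiber cut is one problem with two incidents attached, while the faulty modem is a separate problem with a single incident.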

What is Problem Management?

In contrast to Incident Management, Problem Management is a lot more than just slapping a band-aid on an Incident. With Problem Management the underlying root cause of an issue must be discovered and steps are taken to ensure that similar issues do not occur in the future. Problem Management is a significantly more involved process and takes quite a bit more time and resources to achieve correctly.