In my previous post in this series, I posed a question: “Can perfect monitoring still end in failure?” I hope that you have come to the same two conclusions that I’ve come to: nothing made by humans is perfect, and even if monitoring were perfect, we can most certainly fail at the desired outcome. How?
Remember that the goal of monitoring is to detect the incident. Detection is merely one of many steps in a great incident management process. Of course, even if we are perfect in one process step, we can fail at other points along the incident management continuum. End-to-end incident management is outside the scope of this service monitoring series, but it is worth our time to discuss several key points.
Communication is just as important as monitoring
There are at least three key types of communication that must be successful to have a good outcome.
- Communicate to the service desk so they can communicate with end users who call them.
  - This topic takes us back to the very first post in this series: Service Monitoring as a Strategic Opportunity.
  - Have you ever called your internet service provider (ISP) to let them know that you are impacted, only to find that they had no idea? Has that happened even in the midst of a service outage? I hope that experience has not happened to you. If it has, you probably were not very confident in the provider’s ability to run their service.
  - Often, a lack of communication is worse than the outage itself. People understand service impact. As humans, we have all had things break. We have run out of fuel in our cars. We have had the soup that we were heating boil over. We understand mistakes and we understand failure. However, we all tend to loathe incompetence. The perception of ‘not being aware’ of a service incident smells of incompetence. What a shame it is for us to detect the incident with monitoring, but fail at communication.
- Communicate to the stakeholders so they do not get surprised.
  - Has a service that you run ever had an outage during which your boss was surprised? If so, I am sure that he or she was not very happy.
- Communicate to the end user so they do not even have to call the service desk.
  - If we go back to the ISP example, would you be delighted if they proactively notified you?
  - You probably would not want them to wake you up to tell you they were experiencing a service incident. However, if you were trying to use the internet and you could not get to your desired website due to a service incident on the ISP’s side, it would be a great experience if they told you about the service incident right inline with your experience (perhaps in the browser).
  - I recognize that there are all sorts of technology barriers to that sort of inline experience (e.g. how can they tell me that I’m impacted if they are completely down? How would they integrate into my browser securely?). Those are fair questions, but they are beside the point: the goal should be to deliver that experience to the end user.
  - Start with the belief that there is at least one solution to every problem and find a way to succeed (e.g. the ISP’s router in your home recognizes the service failure and intercepts your traffic to present you with a user-friendly inline notification).
Communications only work when information is consumed
For each audience we communicate with, we must choose the channels and timing that achieve the highest consumption rate: the share of recipients who actually take in the message.
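One way to make "consumption rate" concrete is to compare, per channel, how many recipients received a message with how many actually took it in. The sketch below is only an illustration: the channel names and counts are hypothetical, and what counts as "consumed" (an opened email, a status page view, an acknowledgement) is an assumption that each team would have to define for itself.

```python
def consumption_rate(delivered, consumed):
    """Share of delivered messages that recipients actually consumed."""
    if delivered == 0:
        return 0.0
    return consumed / delivered


def best_channel(stats):
    """Return the channel with the highest consumption rate.

    `stats` maps a channel name to a (delivered, consumed) pair.
    """
    return max(stats, key=lambda channel: consumption_rate(*stats[channel]))


# Hypothetical counts for one incident notification.
channel_stats = {
    "email": (1000, 220),
    "status page": (400, 300),
    "sms": (800, 560),
}

for name, (delivered, consumed) in channel_stats.items():
    print(f"{name}: {consumption_rate(delivered, consumed):.0%}")

print("highest consumption rate:", best_channel(channel_stats))
```

Tracking this per audience (service desk, stakeholders, end users) rather than overall keeps the comparison aligned with the three types of communication described earlier.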
Like everything else, drive improvements via metrics
Let’s look at how we might measure this monitoring and communication topic from an outcome perspective so that we can drive the right improvements. Let’s take the end user scenario as our example. What are the possible outcomes?
a. We caught the incident with monitoring + end user did not call.
  - This is the desired outcome.
  - Success with monitoring (confirmed), success with communication (assumed), success with communication consumption (assumed), and no help desk call (confirmed).
b. We caught the incident with monitoring + end user called the help desk.
  - This is a failure. What are the possible causes?
    - We were slow to communicate for some reason.
    - We communicated on time, but the communication was not consumed.
    - We communicated on time and perhaps the user would have consumed the information, but we scoped the communication wrongly (e.g. we emailed 999 impacted users, but we left off the 1000th person’s email address).
c. We did not catch the incident with monitoring.
If we record which of the three outcomes above (a, b, or c, in the order listed) applies to every service incident, and if we break b down into its three sub-categories, we will have great data with which to measure our success. We will also have data that shows where to focus our improvement efforts.
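The outcome tracking described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the `Incident` record, its field names, and the labels a, b, and c (assigned to the three outcomes in the order listed) are all hypothetical, and a real incident record would come from your own service management tooling.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DETECTED_NO_CALL = "a"    # caught by monitoring, end user did not call
    DETECTED_WITH_CALL = "b"  # caught by monitoring, end user still called
    NOT_DETECTED = "c"        # monitoring did not catch the incident


class CallCause(Enum):
    SLOW = "slow to communicate"
    NOT_CONSUMED = "communication not consumed"
    WRONG_SCOPE = "communication scoped wrongly"


@dataclass
class Incident:
    # Hypothetical fields; adapt to whatever your incident records contain.
    detected_by_monitoring: bool
    help_desk_calls: int
    call_cause: Optional[CallCause] = None  # only meaningful for outcome b


def classify(incident):
    """Map one incident to outcome a, b, or c."""
    if not incident.detected_by_monitoring:
        return Outcome.NOT_DETECTED
    if incident.help_desk_calls > 0:
        return Outcome.DETECTED_WITH_CALL
    return Outcome.DETECTED_NO_CALL


def summarize(incidents):
    """Return the share of incidents per outcome and a count of causes for b."""
    outcomes = Counter(classify(i) for i in incidents)
    causes = Counter(
        i.call_cause
        for i in incidents
        if classify(i) is Outcome.DETECTED_WITH_CALL and i.call_cause is not None
    )
    total = len(incidents)
    shares = {o: (outcomes[o] / total if total else 0.0) for o in Outcome}
    return shares, causes


# Hypothetical sample data.
incidents = [
    Incident(detected_by_monitoring=True, help_desk_calls=0),
    Incident(detected_by_monitoring=True, help_desk_calls=3, call_cause=CallCause.SLOW),
    Incident(detected_by_monitoring=True, help_desk_calls=1, call_cause=CallCause.WRONG_SCOPE),
    Incident(detected_by_monitoring=False, help_desk_calls=5),
]

shares, causes = summarize(incidents)
for outcome, share in shares.items():
    print(f"outcome {outcome.value}: {share:.0%}")
for cause, count in causes.items():
    print(f"  b caused by {cause.value}: {count}")
```

Reviewing these shares over time shows whether the next improvement should target detection (outcome c) or one of the communication causes behind outcome b.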
Wrapping things up
We have covered a lot of content, and we have covered many months - the first post in this series was in September of 2014. Time surely passes quickly.
The changes that we are seeing in the industry (and the changes in the world that our industry supports) are unprecedented. As cloud services and new modalities like virtual and mixed reality continue to change our lives, I encourage you to notice all of the services that underpin that transformation. As you continue to provide these services within your business, remember that service monitoring is one of the most important, foundational things that must be great for services to be great. Also remember that service monitoring is a foundational service in its own right. And as we ‘ops folks’ continue to figure out where we provide value in the modern DevOps world, we should see service monitoring as a tremendous opportunity through which we can invent the future ourselves. So please go invent the service monitoring future for your business. Our businesses need each of us to lead that transformation.
Since we have come to the end, some thanks are in order. I would like to thank the AXELOS ITIL team for giving me the opportunity to write this series on service monitoring as a service. In my view, service monitoring is the first and most important step to succeeding operationally in the modern service management world, so I appreciate the opportunity to share my opinions in this blog series and in the Getting Started with CloudOps and DevOps: Service Monitoring as a Service webinar in October 2015.
Finally, I would like to thank YOU for reading the series. I hope that you found the posts helpful in some way. If the series was helpful and/or if you would like to see further content like this, please leave a comment in the box below.
More blog posts in the Building Service Monitoring as a Service with an Eye on the Cloud series
Read the first blog post from Carroll Moon, Service Monitoring as a Strategic Opportunity.
Read the second post, The Future of Service Management in the Era of the Cloud.
Read the third post, One Team - One Set of Service Management Objectives.
Read the fourth post, Service Monitoring Service Outputs.
Read the fifth post, Service Management Outputs.
Read the sixth post, Building Trust in the Service Monitoring Service.
Read the seventh post, Making the Service Viral.
Read the eighth post, Service Monitoring Application Development.
Read the ninth post, Monitoring Service Health.
Read the tenth post, Delivering the Service Monitoring Service.