Am checking up on a friend who is using Office 365 within a team of 20 people, mainly for SharePoint 2013, and who relies on SharePoint support provided externally. My friend stated they want to ramp up the usage, but were concerned about service availability, and wanted to know whether it was possible to get a record of service uptime for Office 365. They were particularly interested in SharePoint Online service uptime.
That got me thinking. Not all customers using Office 365 will understand how to read the service status provided in Office 365. Even if they saw that page, without really understanding the meaning will probably gloss over some of the features within the service status offerings on that page. Also, considering the methods by which Office365 tenants are provisioned (off-the-shelf buy, through a re-seller, directly) it could be likely that literally any computer literate person could be drafted in to support the customer who takes on Office 365 who potentially has never worked in customer support!
So, yes, taking some time to understand the service statuses provided within Office 365 is useful. Particularly if you are responsible for managing SharePoint online through Office 365 (plus its other offerings), or if you are considering a move / hybrid approach and need to inform the client of the service expectation and what is used to measure service status in Office 365.
What is the Office 365 Service dashboard? The service dashboard is located in the Office 365 Admin centre, and then by clicking the View Details and History link at the foot of the current health list in the centre of the screen as shown in Figure 1.
Figure 1: The Office 365 Service Dashboard
When View Details and History is clicked, a screen showing the service health of Office 365 is displayed. These Service are Exchange Online, Identity Service, Lync Online, Office 365 Portal, Office Subscription, Rights Management Service and SharePoint Online and all are shown in expanded format showing the sub-services and their statuses. The service status is indicated by an icon which is displayed for each. Figure 2 shows each service icon, its meaning, the definition associated with that icon.
Figure 2: Office 365 Service Status icons, description and Definitions
Most of the above are self-explanatory, and for each service listed if there is any status reported other than ‘Normal service availability’ there is a link which is provided which when clicked allows the individual to get more information about the service issue.
An interesting one above is ‘False Positive’. For those working in email land you may have heard of this term before. A False Positive is essentially a message which is legitimate but marked as spam, which is then rejected or returned to the sender. So, what that means is that a report is provided that indicates a problem but does not provide clear proof. It is very important that these are noted, because if that’s not done, it will result in False Negative.
Without going into False Negatives (which I will write about in a companion article), let’s take a look at a False Positive example starring on-Premise SharePoint. Assume that a SharePoint 2013 on premise platform has a third party app deployed on it and a monitoring service carrying out remote scans of all apps and services on that platform. Note that it does not matter what the third party app does. The app identifies itself as version “1.6.11”. A remote scan provided on the platform identifies the app to be a vulnerable version (could be due to security, compatibility or other issues). The remote scan does not have further knowledge of the app, however, the scanner has reason to believe that a vulnerability exists and includes this in its reports. However, the SharePoint administrator (a human!) of the target system may know that this app has already been security-fixed to “1.6.11-1”, however, the app still identifies itself as version 1.6.11. Hence, this is a False Positive because the platform is deemed healthy.
I would suggest that False Positives are useful when identified, and should be recorded – so it has good reason to be there listed as a service status. One thing I should point out, however, is that the problem with tools to remote scan is that if they are configured strongly enough to be effective, there’s a significant chance of receiving false positives. If too many false positives are received, the monitoring becomes less proactive to the point were real issues are ignored, because of the volume of false positives and the assumption therefore that a human already ‘understands’ that there is no issue (when there is!).
Going back to getting more information concerning the service status. Figure 3 shows a list of the current health against each service in Office 365, and for any there there are issues links are provided for more information. SharePoint shows as being in extended recovery in the screenshot. You can click the View details link to get more information concerning the issue. In the figure, I have deliberately scored over the date…
Figure 3. Example of the Current health of services section in the Service Overview page
When clicking on the view details against a particular service status (not normal service status), another page is displayed giving further information concerning the issue covering issue, resolution action and date of next update. Figure 4 shows the page displayed when the view details link is clicked (again I have deliberately scored over the date).
Figure 4. The details page regarding the relevant incident when clicking the view details link at the foot of any service which has an issue on the service overview page
So, the service overview page is a good resource for identifying how the products provided in a Office 365 tenant are performing. However, without understanding the lifecycle of a service incident, it will be problematic in identifying whether a service incident is being fixed, has been fixed, is still under investigation and so on. What is the lifecycle concerning a service incident and how is that reflected on the service health dashboard? Here is an outline of that process:
- The incident occurs
- The service health dashboard is updated to ‘investigating’
- The incident status is posted to the service health dashboard
- The incident is resolved
- The closure summary is posted to the service health dashboard
- The post incident review is posted to the service health dashboard
I think this lifecycle is important to describe to your customer, as well as your service desk team (if for example you have an internal support team). When explaining the lifecycle of a service incident to a new customer, do this as part of providing Office 365 in the first place. If this is not done, there will be an assumption from the customer that they have a direct line to Microsoft Support. They will assume that they can quite literally pick up the phone, report a user challenge which they think is resolved using the product, and expect it be resolved immediately and that the resolution meets their exact requirements. Besides which, even understanding the sheer wealth of information provided on the service dashboard will overwhelm the customer, particularly if the customer has not been taken to identify, using the Service Level Agreement, the products listed of the service dashboard which will be of applicable to the them. That’s not to say the support you provide does not monitor the other services, it is just that in terms of priority that there is information going back to the customer that clearly identifies what is supported.
Conclusion.
Understanding the service dashboard is only a part of the picture in providing a successful support service for an Office 365 customer. In order for it to be effective and measurable, the results being displayed on the dashboard needs to be made meaningful and each service status to be relayed to the relevant customer in a way that makes it useful to them. So, to effectively support Office 365 and to manage customer expectation, you should define a Service Level Agreement which maps onto exactly what will be supported, since that is the key that will help you and the customer able to map the service status provided by the Office 365 service, give that meaning and provides a useful resource which can be measured.
I repeat, the principle here is a matter of what is supported. Your job as SharePoint support is to support the information worker, and in that sense, you support the usage of SharePoint. Your job is intertwined with how SharePoint is exploited to the benefit of the business. Your job is user support. However, Microsoft will see things from an entirely different perspective. Their job is product support. Microsoft cannot be expect to support your users, and certainly cannot be expected to precisely understand how your business applies to their products. So, when your query goes to Microsoft Office365 support, that query will be of several to do with the products provided within Office 365 of which SharePoint is one. That support will answer the question from a product perspective point of view. Your job, is to translate that inherently generic information into the specific information your end user needs. That means that the service status messages that are provided within the dashboard are not specific to your users concerning their use of the product, and instead the service status of the entire product provided to all customers. You must therefore take the information provided by the service status messages and relay that to the customer in a way they understand using the Service Level Agreement.
The Service Level Agreement is vital, for in the end, the only support service truly qualified to support your users is your own. The service dashboard is a resource, for that is the best it can be. If you depend 100% on Microsoft Office 365 support, the best you can hope for is an accidental or actual coincidence of purpose. I believe that is a foolish prospect, whereby one would hope or even attempt to engineer that the two widely differing set of goals (those of Microsoft and your end users) coincide somewhere, so that your company can extract some real benefit from ‘supplier’ support. I would not put all my eggs in relying on complete Microsoft support. You will still need to provide real user support and that means building a proper support model which exposes the service status. The alternative (getting the supplier to provide end user support) will simply not happen in the way the customer will expect or fully understand.
References:
SharePoint Service Level Agreement Guide
Book – SharePoint 2013 Adoption and Governance Guide