The SLA is one of the more interesting topics in cloud computing. In my many discussions with most cloud IaaS providers, I have heard them describe the merits of their SLAs. In reality, though, a cloud SLA is an oxymoron. When you move to the cloud, you need to adopt a different way of thinking about SLAs. Let me explain.
The main issue an SLA addresses is any degradation in Quality of Service, such as failures, brownouts and latency, that harms the availability and performance of the online service itself. This is true whether my application is deployed in the cloud or on-premises. The consequences of downtime, and the resulting liability, are basically the same.
So why doesn’t a cloud SLA really exist, at least not in the traditional sense?
I believe three properties define the cloud clearly: endless capacity, on-demand provisioning, and self-service. An SLA is more a reflection of the Mean Time Between Failures (MTBF), as it is based on the core infrastructure. There is no such thing as a system that never experiences degradation in Quality of Service, for reasons ranging from a simple bug all the way to human error. That said, some current IaaS vendors’ SLAs are presented more as a market differentiator than as a true indicator of their cloud’s performance.
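To make the MTBF point concrete, here is a minimal sketch of how an availability percentage relates to failure and repair times. It assumes the commonly used steady-state formula availability = MTBF / (MTBF + MTTR). MTTR (mean time to repair), the function names, and the sample numbers are illustrative assumptions, not figures from any vendor.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures (MTBF)
    and mean time to repair (MTTR). Both inputs are assumed values."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Downtime budget implied by an availability percentage over a period."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Hypothetical resource: fails on average every 1000 hours, 1 hour to recover.
    print(f"availability: {availability(1000, 1):.4%}")
    # Downtime budget for a hypothetical 99.95% monthly target.
    print(f"budget: {allowed_downtime_minutes(99.95):.1f} minutes per 30 days")
```

Even a high percentage leaves a non-zero downtime budget, which is why the number describes a probability rather than a guarantee.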
It is easy to picture a small, limited IT environment and expect a great SLA from it. However, expectations need to change when it comes to the endless resources of the public IaaS, with its granularity, complexity, and dynamic behavior, not to mention the IaaS economies of scale enabled by shared infrastructure, centralized storage, and centralized networks. All of these make it necessary to understand the probability of service quality degradation, but real certainty will never be an option. It is impossible for the giant public cloud vendors to guarantee each and every point in the system.
SLA per a Single Resource
Let’s take a look at the SLA for a specific cloud resource. Can AWS provide a specific SLA for a single resource? The simple, straight answer is no. Amazon’s giant cloud platform is built around the probability of failure. Make no mistake, AWS on-demand and reserved instances are not SLA-based but price-based, and both might fail at any time.
“Region Unavailable” and “Region Unavailability” mean that more than one Availability Zone in which you are running an instance, within the same Region, is “Unavailable” to you.
Service Credits are calculated… or either Amazon EC2 or Amazon EBS (whichever was Unavailable, or both if both were Unavailable) in the Region affected for the monthly billing cycle in which the Region Unavailability occurred in accordance with the schedule below.
Source: AWS EC2 SLA white paper
If you still haven’t read your cloud provider’s SLA, I encourage you to do so. As quoted above, AWS handles SLA exceptions, which result in “Service Credits” and not compensation, based on Availability Zone status within a Region rather than on any single resource. If you run an e-commerce site, your infrastructure vendor won’t be able to compensate you, and you might need to consider insurance for your online business. On these matters the Amazon SLA actually makes some sense and can serve as a good example for others.
The only way to guarantee an SLA is to wrap the actual application with managed services that include complete high-availability tooling. This brings us back to the traditional managed service vendor, where the SLA can be negotiated and is hence more tangible, but this is not the cloud.
The Cloud Shared Responsibility model
I find this model very important and relevant to this discussion. Amazon, the most mature cloud in the market, coined it, and it defines the relationship between the IaaS vendor and its platform consumers. The public cloud vendor is responsible for providing all the necessary building blocks that enable the user to create a robust cloud operation.
Recognizing that the nature of IaaS contradicts the basics of a real SLA does not mean that the cloud is not a good, viable computing solution for the enterprise. Instead of relying on the IaaS vendor, understand the platform’s weaknesses and strengths and design your systems to cope with failure. Don’t forget that your SLA will be determined by your customers, in keeping with the level of responsibility and liability you accept in case of an event.
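As a rough illustration of what designing for failure buys you, the sketch below estimates the availability of a service built from redundant replicas and from chained dependencies. It assumes failures are statistically independent, which is an idealization since real outages are often correlated. The function names and sample numbers are hypothetical.

```python
from typing import Iterable


def redundant_availability(single: float, replicas: int) -> float:
    """Availability when at least one of `replicas` identical, independent
    copies must be up (for example, one copy per Availability Zone)."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - single) ** replicas


def serial_availability(components: Iterable[float]) -> float:
    """Availability when every component in the chain must be up."""
    result = 1.0
    for a in components:
        result *= a
    return result


if __name__ == "__main__":
    # Hypothetical instance that is up 99% of the time.
    for n in (1, 2, 3):
        print(n, "replica(s):", f"{redundant_availability(0.99, n):.6f}")
    # A web tier and a database tier, each with two replicas, chained together.
    tier = redundant_availability(0.99, 2)
    print("two-tier service:", f"{serial_availability([tier, tier]):.6f}")
```

Under these assumptions, redundancy raises the estimate while each added dependency lowers it, so the figure you can offer your own customers depends on your architecture more than on any single resource.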
About Ravello Systems
Ravello is the industry’s leading nested virtualization and software-defined networking SaaS. It enables enterprises to create cloud-based development, test, UAT, integration and staging environments by automatically cloning their VMware-based applications in AWS. Ravello is built by the same team that developed the KVM hypervisor in Linux.