End user perspectives
The service abstracts the complexity of everything behind the interface from the end user. From the end user perspective, think of more mature services such as electricity or telephony. Electricity can be generated by steam, hydro or nuclear power. What matters to the end user is, first and foremost, whether the electricity service is available and, secondarily, the size and shape of the socket and the voltage. With the telephone, we care about getting a dial tone and a fast, clear connection to the person on the other end, no matter what time of day we pick it up. The key thing to remember about providing any service is that end users expect it to work as advertised, all the time. In fact, the best services simply work and are noticed only when there is an obvious interruption or obstacle to getting that service.
Continuous availability of the application service is the new frontier in how IT operates. This reflects a change in the way that consumers are conditioned. Back in 1990, you would call your travel agent and leave a message saying you wanted to travel between London and Amsterdam. They would call you back a few hours later with your options. You would pick one and they would mail you your tickets. In 2000, you would go to an Internet site to see your options on your own, buy your ticket and use electronic tickets. You would even try a few websites to see if the prices were different and pick the one with the most favourable itinerary and price.
Today, for the most part, you will get the same itineraries and prices across the sites. If the first site you try is unavailable, you won’t wait half an hour to revisit it; you’ll move on to the next one and make your purchase there. The former requirement was presence: the presence and accessibility of information. The new requirement is “persistent presence”, defined by Silicon Valley luminary Regis McKenna as “... the combination of applications enabling sustained and repeatable consumer access experience...”
SOA services
The service oriented architecture paradigm creates application services with well-defined interfaces that abstract underlying complexity from the consumers (subscribers) of the service. Just like the electric socket doesn’t care if you plug in a computer or a television, an SOA service doesn’t care if its subscriber is a CRM application or an ERP application. If different subscribers of the service require different service levels, the provider of the service has to account for the most stringent needs. SOA fuels the need for continuous availability. This in turn introduces a new level of service quality challenge.
An interruption to an SOA service impacts both critical and non-critical applications. When turned into a service, applications that have traditionally tolerated planned and unplanned outages can no longer afford disruptions.
Traditionally, availability has been an operations concern. The new generation of application development has to treat continuous availability as part of design as well as implementation. Building continuously available applications requires close collaboration between development teams and their counterparts in systems, networks, and database administration.
Enterprises that achieve the highest levels of application availability successfully break down the silos between applications and operations and deploy “always available” applications.
Developers need to take operations issues into account when designing and building “always available” applications. The first step is to understand the service level requirements. The next is to gain a better understanding of the enterprise infrastructure and the deployment environment. Building adequate instrumentation and graceful recovery into the application design significantly improves the quality of the service and reduces the cost of ownership.
Unexpected application failures are a fact of life in IT. Root causes include hardware faults, software bugs, human errors and disasters. The two important availability metrics are the mean time to failure (MTTF), which captures how often the application service crashes, and the mean time to recovery (MTTR), which captures how long it takes to bring the service back up. By definition, availability is the ratio of MTTF to the sum of MTTF and MTTR. The goal in a continuously available application is not only to maximise the mean time to failure but also to minimise the mean time to recovery.
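The availability ratio defined above can be worked through in a few lines of Python. This is a minimal sketch: the function names and the sample figures (1,000 hours between failures, one hour to recover) are illustrative assumptions, not measurements from any real service.

```python
import math

HOURS_PER_YEAR = 8760.0  # assumption: a non-leap year of continuous service


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR), as defined in the text."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_downtime_hours(avail: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Expected hours of downtime over a period for a given availability."""
    return (1.0 - avail) * period_hours


# Hypothetical baseline: a failure every 1,000 hours, one hour to recover.
baseline = availability(1000.0, 1.0)

# Two ways to improve on it: double the MTTF, or halve the MTTR.
doubled_mttf = availability(2000.0, 1.0)
halved_mttr = availability(1000.0, 0.5)

# Both changes yield the same availability, since 2000/2001 == 1000/1000.5.
assert math.isclose(doubled_mttf, halved_mttr)
assert halved_mttr > baseline

print(f"baseline availability: {baseline:.6f}")
print(f"improved availability: {halved_mttr:.6f}")
print(f"baseline downtime per year: {expected_downtime_hours(baseline):.2f} hours")
```

Under these assumed figures, halving the recovery time buys exactly as much availability as doubling the time between failures, which is why recovery time deserves as much design attention as failure prevention.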
MTTF is typically hardware-driven. Operations’ focus on reducing the frequency of failures, that is, on increasing mean time to failure, has contributed to the lack of communication between applications and operations teams. A disk crash or a CPU failure can be handled in isolation by the operations staff. With the advancements in hardware, application disruptions due to component failures can be isolated, managed or even eliminated should the operations team choose to spend the money.