ITIC 2014 Reliability Survey: IBM Servers Most Reliable for Sixth Straight Year, Cisco UCS Comes on Strong, HP Reliability Rebounds

For the sixth year in a row, corporate enterprise users said IBM server hardware delivered the highest levels of reliability/uptime among 14 server hardware and 11 different server hardware virtualization platforms. A 58% majority of IBM servers achieved “five nines” or 99.999% availability – the equivalent of 5.25 minutes of unplanned per-server downtime per year – compared to 46% of Hewlett-Packard servers and 40% of Oracle server hardware.
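The arithmetic behind the “five nines” figure can be sketched in a few lines of Python. This is a minimal illustration and not part of the ITIC methodology; the function name `annual_downtime_minutes` and the 365-day year are assumptions made for the example.

```python
# Assumption: a non-leap year of 365 days (525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    # Downtime is the fraction of the year during which the server is unavailable.
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Hypothetical comparison of common availability targets.
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% availability -> {annual_downtime_minutes(pct):.2f} minutes/year")
```

Under these assumptions, 99.999% availability works out to about 5.26 minutes of downtime per server per year, close to the 5.25 minutes cited above, while 99.99% allows roughly ten times as much.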

Those are the results of the latest independent ITIC 2014 Global Server Hardware and Server OS Reliability Survey, which polled C-level executives and IT managers at over 600 organizations worldwide during March and April 2014.

The survey results showed that the overall reliability of HP’s servers increased significantly in 2014 compared to the 2012 and 2013 polls, surpassing the uptime of rival Oracle servers, which remained the same or declined slightly compared to prior polls. Cisco Systems, Inc.’s Cisco Unified Computing System (UCS) servers, which appeared for the first time in this year’s ITIC Reliability poll, made a very strong showing, posting uptime equal to or better than HP (depending on the category) and bested only by IBM server reliability. Half – 50% – of Cisco UCS server hardware users said they achieved 99.999% per server/per annum availability.

The IBM System z Enterprise high-end mainframe-class server was the only hardware platform that had no – 0% – unplanned server outages of over four (4) hours’ duration. On the opposite end of the spectrum, Dell PowerEdge servers racked up the highest percentage of downtime exceeding four (4) hours’ duration at 13%, followed by HP ProLiant systems at 10%.

However, an in-depth analysis of the results indicates that the prolonged unplanned downtime of over four hours has less to do with the inherent reliability or instability of the Dell PowerEdge and HP ProLiant servers and is more indicative of end user behavioral patterns. Some 60% of Dell users and 53% of HP ProLiant users said they kept their servers for four, five or even six years or longer without upgrading/retrofitting or right-sizing the servers to accommodate more compute-intensive workloads. By contrast, only 21% of IBM System x or IBM Power Systems users retained their servers for four or more years without upgrading or retrofitting them.

IBM Rock Solid Reliability

The fact that corporate enterprise users gave IBM hardware the highest reliability ratings for the six years that ITIC has conducted its Global Server Hardware and Server OS Reliability poll speaks to the technical excellence and robustness of the servers. The rock solid reliability of its servers also reflects and underscores the consistency of IBM’s technical service, support, security and customer responsiveness over the last six years, while many of its rivals were beset with management woes (HP) or in transition due to acquisition (Oracle’s purchase of Sun Microsystems).

Additionally, IBM servers recorded the lowest incidences of the more significant Tier 2 and Tier 3 server outages lasting from one-to-four hours or more. IBM System x, System z and Power Systems i and p servers averaged the lowest percentage (4%) of one to four hours of per server/per annum server outages compared to 6% of HP ProLiant and Integrity servers and 8% of Oracle x86 and SPARC servers that recorded Tier 2 and Tier 3 outages.

Survey Highlights

Among the other top survey highlights:

  • A 79% majority of corporations now require a minimum of 99.99% uptime or better for mission critical hardware, operating systems and main line of business (LOB) applications, up from 67% in the 2012-2013 survey.
  • Cisco, Dell, IBM and Apple (in that order) achieved the highest customer satisfaction ratings for products, service and support.
  • Nearly one-third or 31% of businesses don’t provide for hardware failover and redundancy and 12% of companies don’t bother to track hardware failure rates.
  • Human error is the top issue that negatively impacts network reliability, as identified by 44% of survey respondents, surpassing inherent flaws in either server hardware or the server operating system. Bugs or flaws in the server OS came in second, with 33% of those polled stating they undercut overall reliability, while 30% of survey participants said the fact that their IT departments were overworked and understaffed undercut reliability.
  • Some 45% of respondents rely on the built-in redundant hardware capabilities of their servers to provide high availability and failover protection.
  • High end system reliability: Among the high-end mainframe class enterprise systems, the IBM System z delivered the highest reliability: it did not record any instances of the most severe Tier 3 outages lasting four hours or more of duration.
  • A 51% majority of respondents said their main line of business (LOB) servers were two-to-four years old. Of that number 30% revealed their LOB servers were two-to-three years old; 21% said they were three-to-four years old. Another 21% indicated their servers were four, five or greater than five years old.
  • Mainstream server reliability: Among the mainstream “work horse” servers, IBM’s Power Systems recorded the least amount of unplanned downtime, approximately 13 minutes per server/per year. By contrast, some eight percent of organizations using Oracle (formerly Sun Microsystems) x86-based servers and five percent of those using HP ProLiant servers experienced over four (4) hours of per server/per annum downtime.
  • Server operating system reliability: On the server operating system side, IBM AIX v 7.1, Ubuntu v 12.04, Red Hat Enterprise Linux 6.5, HP UX v 11.3 and SUSE Linux Enterprise 11 (in that order) registered the least amount of unplanned downtime due to inherent flaws in the operating system. IBM’s AIX v 7.1 running on Power Systems averaged approximately eight minutes of per server/per annum downtime and recorded the least amount of overall downtime due to Tier 1, Tier 2 and Tier 3 outages for the best reliability among survey respondents. Overall, Red Hat Enterprise Linux 6.5 and Canonical Ltd.’s Ubuntu v 12.04 were tied for second, although custom implementations of Canonical Ltd.’s Ubuntu v 12.04, which has been steadily gaining mainstream adoption, came in a close second to IBM, averaging about 12 minutes of annual server downtime. Hot on their heels was SUSE Linux Enterprise 11, which perennially scores very high on reliability.

ITIC’s 2014 Global Server Hardware, Server OS Reliability survey results indicate that the inherent reliability and uptime of nearly all of the 12 major server hardware and 18 server operating system distributions generally continue to improve year-over-year.

These reliability improvements are based on technical advances in the underlying processor technology. For example, Intel Corp.’s latest Xeon Processor E7 v2 family is optimized for high reliability and availability; when embedded in a server system that is properly configured, managed and right-sized to fit the application workload, these processors can boost performance by 35% to 60%. Enhancements in memory and disk technology, as well as improvements to the core server hardware and server operating systems, have also served to fortify server reliability. All of these technology advances combine to improve performance, scalability and security, and enable the servers to support larger applications, data-intensive transactions and heavier workloads.

Vendors’ technical service and support, along with the ready availability of documentation and fixes/patches for known issues and security vulnerabilities, also play a crucial role in either mitigating outages or exacerbating and extending them.

At the same time, ITIC’s 2014 Reliability poll reveals that a variety of external factors are having more of a direct impact on system downtime and overall availability. These include such factors as human error; overworked and understaffed IT departments; the rapid mainstream adoption of complex new technologies such as virtualization and increasing cloud computing deployments; and an explosion of Bring Your Own Device (BYOD) technology. And to reiterate, on the server hardware side it is apparent that when corporations opt to retain their servers for four, five and even six years or more without upgrading, they will experience significant increases in unplanned downtime.