For the fifth year in a row, IBM servers delivered the highest levels of reliability and uptime among 14 server platforms.
Those are the results of the latest independent ITIC 2013 Global Server Hardware and Server OS Reliability Survey, which polled C-level executives and IT managers at over 550 organizations worldwide from August 2012 through January 2013.
Among the high-end mainframe class systems, both the IBM System z and the Stratus Technologies’ ftServer 6310 delivered the highest inherent reliability: both had no instances – 0% – of the most severe Tier 3 outages, those lasting four hours or more. Among the mainstream “work horse” servers, IBM’s Power Systems recorded the least unplanned downtime, approximately 13 minutes per server, per year. By contrast, some 6 percent of organizations using Oracle (formerly Sun Microsystems) x86-based servers experienced over four (4) hours of per server, per annum downtime. This was the highest percentage of lengthy Tier 3 server outages among the 14 platforms surveyed.
On the server operating system side, IBM AIX v 7.1, Ubuntu v 11.10, Windows Server 2008 R2, Red Hat Enterprise Linux 6 and SUSE Linux Enterprise 11 (in that order) registered the least unplanned downtime due to inherent flaws in the OS.
IBM’s AIX v 7.1 running on Power Systems averaged approximately 10 minutes of per server, per annum downtime and recorded the least overall downtime due to Tier 1, Tier 2 and Tier 3 outages, giving it the best reliability among survey respondents. Canonical Ltd.’s Ubuntu v 11.10, which has been steadily gaining mainstream adoption, came in a close second to IBM, averaging about 12 minutes of annual server downtime. Hot on its heels were Ubuntu’s chief Linux server OS rival Red Hat Enterprise Linux 6 (which also runs on IBM Power Systems) and Microsoft’s Windows Server 2008 R2, both of which registered about 12.5 minutes of unplanned yearly downtime. SUSE Linux Enterprise 11 (which also runs on IBM Power Systems) rounded out the top five, logging just under 13 minutes of unplanned server OS downtime. On the other end of the spectrum, Oracle x86 and HP ProLiant servers had the highest percentage of systems that experienced more than four hours of per server annual downtime: six percent of Oracle x86 systems and five percent of HP ProLiant systems.
Overall, 52% of IBM server hardware users reported experiencing 1 to 5 minutes of per server, per annum downtime. Downtime of about 5.25 minutes per year is the equivalent of 99.999% uptime, or “five nines.” By comparison, just 41% of Oracle server users and 39% of HP server users recorded 99.999% uptime.
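The relationship between an uptime percentage and minutes of annual downtime is simple arithmetic. The following is a minimal sketch of that conversion, not part of the ITIC survey methodology; the function names are illustrative, and it assumes a 365-day year of round-the-clock operation.

```python
# Assumption: a non-leap year of 24x7 operation (525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Annual downtime, in minutes, implied by an uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_percent(downtime_minutes: float) -> float:
    """Uptime percentage implied by a given number of downtime minutes per year."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # "Five nines" works out to a little over five minutes per year.
    print(f"{downtime_minutes_per_year(99.999):.2f} minutes")
    # A Tier 3 outage of four hours, taken alone, implies this uptime.
    print(f"{availability_percent(4 * 60):.3f}% uptime")
```

Under these assumptions, five nines allows roughly 5.26 minutes of downtime per year, in line with the approximately 5.25 minutes cited above, while a single four-hour outage already drops a server below 99.96% uptime for the year.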
Dell received the highest marks for customer satisfaction with technical service and support and product performance. Three-quarters or 75% of Dell survey respondents rated its service and support as “excellent” or “very good,” followed by 70% of IBM survey participants and 69% of Stratus users who gave those vendors “excellent” or “very good” ratings. Oracle’s customer satisfaction ratings were the lowest in the survey: only 45% ranked its technical service and support as “excellent” or “very good.” Oracle also had the highest percentage of dissatisfied customers, with seven percent giving the company’s technical service and support a “poor” grade and 9% calling it “unsatisfactory.” This is the third year in a row that Oracle has had the dubious distinction of having the highest percentage of businesses that are unhappy with its technical service and support.
None of the survey participants gave IBM or Stratus “unsatisfactory” ratings for technical service and support. Likewise, only one percent of respondents gave Dell, HP and Fujitsu an “unsatisfactory” rating and just two percent of Toshiba customers rated its technical service and support unsatisfactory.
Among server operating system vendors, Microsoft scored the highest marks for customer satisfaction with 71% of respondents rating the Redmond, Washington software firm’s technical service and support as “excellent” or “very good.”
In fact, Windows Server 2008 R2’s reliability renaissance continues to impress. Microsoft’s Windows Server OS noticeably lagged behind the majority of the UNIX, Linux and Open Source distributions in ITIC’s 2008 and 2009 Server Reliability surveys. In 2008, Windows Server 2008 survey respondents experienced 3.77 hours of downtime; that figure dropped to 2.42 hours of annual downtime in the 2009 survey. In the latest 2013 survey, Windows Server 2008 R2 has reached parity with the top tier server OSes, recording 13.2 minutes of unplanned per server annual downtime.
New Questions, Revealing Answers
ITIC also updated the 2012-2013 Reliability survey with several new questions to reflect the continuing changes in the industry, gaining further insights into user and system behavior:
- IBM server administrators spend the least time rebooting their server OS, including planned reboots to add or reconfigure system resources. Three-quarters (75%) of IBM administrators “rarely or never” reboot the server, compared with 66% of HP managers and 51% of Oracle administrators. Microsoft administrators spend the most time rebooting Windows Server 2008 and 2008 R2 servers; approximately one-quarter or 24% of respondents said they rebooted Windows weekly or daily to add or reconfigure system resources. The need for IT managers to take Windows servers offline to apply patches and reconfigure system resources remains a weak point for Microsoft.
- IBM Power VM, Stratus ftServer, Dell servers w/Windows Hyper-V and HP ProLiant iVirtualization (in that order) were the four most reliable hardware virtualization platforms. Virtualization market leader VMware was in the middle of the pack with respectable reliability rankings. Surprisingly, 60% of Dell Virtualization systems running Microsoft Hyper-V experienced one to 10 minutes of unplanned downtime compared to the 54% of Dell Virtualization systems running VMware.
- Some 31% of businesses don’t provide for hardware failover and redundancy, which puts them at greater risk of system downtime. It also potentially lengthens the duration of an outage when something goes awry.