IBM z Systems Enterprise; IBM Power Systems Servers Most Reliable for Ninth Straight Year; Lenovo x86 Servers Deliver Highest Uptime/Availability among all Intel x86-based Systems
For the ninth year in a row, corporate enterprise users said IBM’s z Systems Enterprise mainframe class server achieved near flawless reliability, recording less than 10 seconds of unplanned per server downtime each month. Among mainstream servers, IBM Power Systems devices and the Lenovo x86 platform delivered the highest levels of reliability/uptime among 14 server hardware and 11 different server hardware virtualization platforms.
Those are the results of the ITIC 2017 Global Server Hardware and Server OS Reliability survey, which polled 750 organizations worldwide during April/May 2017.
Among the top survey findings:
- IBM z Systems Enterprise mainframe class systems had the lowest incidence (0%) of > 4 hours of per server/per annum downtime of any hardware platform. Specifically, IBM z Systems mainframe class servers exhibit true mainframe fault tolerance, experiencing just 0.96 minutes of unplanned per server annual downtime. That equates to 8 seconds per month or “blink and you miss it,” 2 seconds of unplanned weekly downtime. This is an improvement over the 1.12 minutes of per server/per annum downtime the z Systems servers recorded in ITIC’s 2016 – 2017 Reliability poll nine months ago.
- Among mainstream hardware platforms, IBM Power Systems and Lenovo System x running Linux had the least unplanned downtime of any mainstream Linux server platforms: 2.5 and 2.8 minutes per server/per year, respectively.
- 88% of IBM Power Systems and 87% of Lenovo System x users running RHEL, SuSE or Ubuntu Linux experience fewer than one unplanned outage per server, per year.
- Only two percent of IBM and Lenovo servers recorded >4 hours of unplanned per server/per annum downtime, followed by six percent of HPE servers, eight percent of Dell servers and 10% of Oracle servers.
- IBM and Lenovo hardware and the Linux operating system distributions were either first or second in every reliability category, including virtualization and security.
- Lenovo x86 servers achieved the highest reliability ratings among all competing x86 platforms.
- Lenovo Takes Top Marks for Technical Service and Support: Lenovo tech support was rated the best, followed by Cisco and IBM.
- Some 66% of survey respondents said aged hardware (3 ½+ years old) had a negative impact on server uptime and reliability vs. 21% that said it has not impacted reliability/uptime. This is a 22 percentage point increase from the 44% who said outmoded hardware negatively impacted uptime in 2014.
- Reliability continues to decline for the fifth year in a row on the HP ProLiant and Oracle’s SPARC & x86 hardware and Solaris OS. Reliability on the Oracle platforms declined slightly mainly due to aging. Many Oracle hardware customers are eschewing upgrades, opting instead to migrate to rival platforms.
- Some 16% of Oracle customers rated service & support as Poor or Unsatisfactory. Dissatisfaction with Oracle licensing and pricing policies has remained consistently high over the last three years.
- Only 1% of Cisco, 1% of Dell, 1% of IBM and Lenovo, 3% of HP, 3% of Fujitsu and 4% of Toshiba users gave those vendors “Poor” or “Unsatisfactory” customer support ratings.
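To put the per server downtime figures above in perspective, annual downtime in minutes can be converted into an availability percentage. The following is a minimal Python sketch, not part of the ITIC methodology; the function name and the 365-day year are assumptions made for illustration.

```python
# Illustrative only: converts annual unplanned downtime into availability.
# Assumes a 365-day year; the function name is hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability_percent(downtime_minutes_per_year: float) -> float:
    """Return availability as a percentage for a given yearly downtime."""
    return 100.0 * (1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR)


# Per server annual downtime figures quoted in the survey findings above.
survey_downtime_minutes = {
    "IBM z Systems": 0.96,
    "IBM Power Systems (Linux)": 2.5,
    "Lenovo System x (Linux)": 2.8,
}

for platform, minutes in survey_downtime_minutes.items():
    print(f"{platform}: {availability_percent(minutes):.5f}% available")
```

By this arithmetic, all three platforms sit above “five nines” (99.999%) of availability, since each reports fewer than roughly 5.3 minutes of unplanned downtime per server per year.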
And continuing a trend that has manifested over the past three years, Human Error and Security, respectively, are the chief issues that negatively impact server hardware/server operating system reliability and cause downtime.
Unsurprisingly, in the 21st Century Digital Age, the functionality and reliability of the core, foundation server hardware and server operating systems are more crucial than ever. The server hardware and the server OSes are the bedrock upon which the organization’s mainstream line of business (LOB) applications rest. High reliability and near continual system and application availability are imperative for organizations’ core on-premises, cloud based and Network Edge/Perimeter environments. Infrastructure, irrespective of location, is essential to the overall health of business operations.
The inherent reliability and robustness of server hardware and the server operating systems are the single most critical factors that influence, impact and ultimately determine the uptime and availability of mission critical line of business applications, the virtual machines (VMs) that run on top of them and the connectivity devices that access them.
On a positive note, the inherent reliability of server hardware and server operating system software as well as advancements in the underlying processor technology, all continue to improve year over year. But the survey results also reveal that external issues, most notably human error and security breaches, have also assumed greater significance in undermining system and network accessibility and performance.
The overall health of network operations, applications, management and security functions all depend on the core foundation elements: server hardware, server operating systems and virtualization to deliver high availability, robust management and solid security. The reliability of the server, server OS and virtualization platforms forms the foundation of the entire network infrastructure. The individual and collective reliability of these platforms has a direct, immediate and long lasting impact on daily operations and business results.
The ITIC survey also polled customers on the minimum acceptable reliability requirements for their organizations’ main line of business servers and applications.
Reliability Trends
- Majority of Corporations Need “Four Nines” of Uptime. Some 79% of corporations now require a minimum of 99.99% uptime for mission critical hardware, operating systems & main line of business (LOB) applications. This is a 30 percentage point increase from the 49% of respondents who said their firms required a minimum “four nines” of uptime in the 2014 survey.
- Cost of Hourly Downtime Increases: 98% of firms say hourly downtime costs exceed $150K; 31% of respondents estimate that hourly downtime costs their companies up to $400K, a seven percent increase from the 2014 survey; and 33% indicate that one hour of downtime now costs $1M to >$5M.
- Security, BYOD and mobility pose the biggest technology threats to reliability
- Technical service & support and fast, efficient vendor responsiveness are crucial
- The overall top issues negatively impacting network reliability are:
- Human Error (e.g., misconfiguration, right-sizing server workloads etc.) – 80% vs. 49% in 2015 poll
- Complexity – involving provisioning, deployment & usage of new technologies e.g. Data Analytics, IoT, Network Edge/Perimeter and mobile apps
- Increased workloads on aging hardware
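The “four nines” requirement and the hourly downtime cost figures above can be tied together with simple arithmetic. The sketch below, in Python, computes the yearly downtime budget implied by an uptime target and a rough cost for that downtime. The function names and the 365-day year are assumptions for illustration, and the $150K hourly figure is used only because it is the threshold cited in the survey; it is not an estimate for any particular organization.

```python
# Illustrative only: downtime budget and cost for a given uptime target.
# Assumes a 365-day year; function names are hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    return MINUTES_PER_YEAR * (1.0 - uptime_percent / 100.0)


def downtime_cost(downtime_minutes: float, hourly_cost: float) -> float:
    """Return the cost of a downtime period at a flat hourly rate."""
    return downtime_minutes / 60.0 * hourly_cost


HOURLY_COST = 150_000  # the $150K per hour threshold cited in the survey

for target in (99.9, 99.99, 99.999):
    budget = allowed_downtime_minutes(target)
    cost = downtime_cost(budget, HOURLY_COST)
    print(f"{target}% uptime: {budget:.2f} min/year, about ${cost:,.0f} at risk")
```

Under these assumptions, a 99.99% target leaves a budget of roughly 52.6 minutes of downtime per server per year, which at $150K per hour corresponds to a little over $130K.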
Human Error Overtakes Security as Chief Cause of Downtime
The three technology issues of most concern in this year’s survey are Security, Disaster Recovery and Backup, and Business Continuity. At the same time, the survey results find that 80% of respondents cited human error as the chief culprit of unplanned downtime, surpassing Security issues, which were pinpointed by 59% of those polled.
Additionally, ITIC’s latest 2017 Reliability research reveals that a variety of external factors are having more of a direct impact on system downtime and overall availability. These include overworked and understaffed IT departments; the rapid mainstream adoption of complex new technologies such as the aforementioned IoT, Big Data Analytics, virtualization and increasing cloud computing deployments; and the continuing proliferation of BYOD and mobility technologies.
In the context of its Reliability Surveys, ITIC broadly defines human error to encompass both the technology and business mistakes organizations make with respect to their network equipment and strategies.
Human error as it relates to technology includes but is not limited to:
- Configuration, deployment and management mistakes
- Failure to upgrade or right size servers to accommodate more data and compute intensive workloads.
- Failure to migrate and upgrade outmoded applications that are no longer supported by the vendor.
- Failing to keep up to date on patches and security.
Human error with respect to business issues includes:
- Failure to allocate the appropriate Capital Expenditure and Operational Expenditure funds for equipment purchases and ongoing management and maintenance functions
- Failure to devise, implement and upgrade the necessary computer and network to address issues like Cloud computing, Mobility, Remote Access, and Bring Your Own Device (BYOD).
- Failure to construct and enforce strong computer and network security policies.
- Ignorance of Total Cost of Ownership (TCO) and Return on Investment (ROI).
- Failure to track hourly downtime costs.
- Failure to track and assess the impact of Service Level Agreements and regulatory compliance issues like Sarbanes-Oxley (SOX) and the Health Insurance Portability and Accountability Act (HIPAA).
Conclusions
Reliability is and will continue to be among the most crucial metrics in the organization. Improvements or declines in reliability can either mitigate or increase technical and business risks to the organization’s end users and its external customers. The ability to meet service-level agreements (SLAs) hinges on server reliability, uptime and manageability. These are key indicators that enable organizations to determine which server operating system platform or combination thereof is most suitable.
To ensure business continuity and increase end user productivity, it is imperative that businesses maximize the reliability and uptime of their server hardware and server operating systems. A 79% majority of corporations now require “four nines” or 99.99% minimum uptime. Organizations are advised to “right size” their server hardware to accommodate increased workloads and larger applications. Businesses should also regularly replace, retrofit and refresh their server hardware and server operating systems with the necessary patches, updates and security fixes as needed to maintain system health. At the same time, server hardware and server operating system vendors should be up front and provide their customers with realistic recommendations for system configurations to achieve optimal performance. Vendors also bear the responsibility to deliver patches, fixes and updates in a timely manner and to inform customers to the best of their ability regarding any known incompatibility issues that may potentially impact performance. Vendors should also be honest with customers in the event there is a problem or delay with delivering replacement parts.