
ITIC Poll: Human Error and Security are Top Issues Negatively Impacting Reliability

Multiple issues influence the reliability ratings of the various server hardware platforms. ITIC’s 2018 Global Server Hardware, Server OS Reliability Mid-Year Update reveals that three issues in particular stand out as negatively impacting reliability: human error, security and increased workloads.

ITIC’s 2018 Global Server Hardware, Server OS Reliability Mid-Year Update polled over 800 customers worldwide from April through mid-July 2018. In order to obtain the most objective and unbiased results, ITIC accepted no vendor sponsorship for the Web-based survey.

Human Error and Security Are Biggest Reliability Threats

ITIC’s latest 2018 Reliability Mid-Year Update poll also chronicled the strain that external issues place on organizations and their IT departments as they work to ensure that servers and operating systems deliver a high degree of reliability and availability. As Exhibit 1 illustrates, human error and security (encompassing both internal and external hacks) continue to rank as the chief culprits behind unplanned downtime among servers, operating systems and applications for the fourth straight year. After that, there is a drop-off of 22 to 30 percentage points for the remaining issues ranked among the top five downtime causes. Human error and security have held the dubious distinction of being the top two factors precipitating unplanned downtime in the past five ITIC reliability polls.

Analysis

Reliability is a two-way street: server hardware, OS and application vendors as well as corporate users bear responsibility for the reliability of their systems and networks.

On the vendor side, there are obvious reasons why mission critical servers from hardware makers like HPE, IBM and Lenovo consistently earn top reliability ratings. As ITIC noted in Part 1 of its reliability survey findings, the reliability gap between high end systems and inexpensive, commodity servers with basic features continues to grow. The reasons include:

  • Research and Development (R&D). Vendors like Cisco, HPE, Huawei, IBM and Lenovo have made an ongoing commitment to R&D and continually refresh and update their solutions.
  • RAS 2.0. The higher end servers incorporate the latest Reliability, Availability and Serviceability (RAS) 2.0 features/functions and are fine-tuned for manageability and security.
  • Price is not the top consideration. Businesses that purchase higher end mission critical and x86 systems like Fujitsu’s Primergy, HPE’s Integrity, Huawei’s KunLun, IBM Z and Power Systems and Lenovo System x want a best-in-class product offering, first and foremost. These corporations, in verticals like banking/finance, government, healthcare, manufacturing, retail and utilities, are motivated more by the vendor’s historical ability to act as a true, responsive “partner” delivering highly robust, leading edge hardware. They also want top-notch aftermarket technical service and support, quick response to problems and fast, efficient access to patches and fixes.
  • More experienced IT Managers. In general, IT managers, application developers, systems engineers and security professionals at corporations that purchase higher end servers from IBM, HPE, Lenovo and Huawei tend to have more experience. The survey found that organizations that buy mission critical servers have IT and technical staff with approximately 12 to 13 years’ experience. By contrast, the average experience among IT managers and systems engineers at companies that purchase less expensive, commodity-based servers is about six years.

Highly experienced IT managers are more likely to spot problems before they become major issues and lead to downtime. In the event of an outage, they are also more likely to remediate quickly, shortening the time it takes to identify the problem and get servers and applications up and running again compared with less experienced peers.

In an era of increasingly connected servers, systems, applications, networks and people, there are myriad factors that can potentially undercut reliability. They include:

  • Human Error and Security. To reiterate, these two factors constitute the top threats to reliability, and ITIC does not anticipate this changing in the foreseeable future. Some 59% of respondents cited Human Error as their number one issue, followed by 51% that said Security problems caused downtime. And nearly two-thirds of businesses – 62% – indicated that their security and IT administrators grapple with a near constant deluge of ever more pervasive and pernicious security threats. If availability, reliability or access to servers, operating systems and mission critical line of business (LOB) applications is compromised or denied, end user productivity and business operations suffer immediate consequences.
  • Heavier, more data-intensive workloads. The latest ITIC survey data finds that workloads have increased by 14% to 39% over the past 18 months.
  • A 60% majority of respondents say increased workloads negatively impact reliability, up 15 percentage points since 2017. Of that 60%, approximately 80% of the firms experiencing reliability declines run commodity servers: e.g., white box systems and older Dell, HPE ProLiant and Oracle hardware more than 3½ years old that has not been retrofitted or upgraded.
  • Provisioning complex new applications that must integrate and interoperate with legacy systems and applications. Some 40% of survey respondents rate application deployment and provisioning as one of their biggest challenges and one that can negatively impact reliability.
  • IT Departments Spending More Time Applying Patches. Some 54% of those polled indicated they spend anywhere from one hour to over four hours applying patches – especially security patches. Users said the security patches are large, time consuming and often complex, necessitating that they test and apply them manually. The percentage of firms automatically applying patches commensurately decreased from 30% in 2016 to just 9% in the latest 2018 poll. Overall, the latest ITIC survey shows that as of July 2018 companies are applying 27% more patches than at any time since 2015.
  • Deploying new technologies like Artificial Intelligence (AI) and Big Data analytics, which require special expertise from IT managers and application developers as well as a high degree of compatibility and interoperability.
  • A rise in Internet of Things (IoT) and edge computing deployments, which in turn increases the number of connections that organizations and their IT departments must oversee and manage.
  • Seven-in-10, or 71%, of survey respondents said aged hardware (3½+ years old) had a negative impact on server uptime and reliability, compared with just 16% that said their older servers had not experienced any declines in reliability or availability. This is an increase of five percentage points from the 66% of those polled who responded affirmatively to that survey question in the ITIC 2017 Reliability Survey, and a 27 percentage point increase from the 44% who said outmoded hardware negatively impacted uptime in the ITIC 2014 Reliability poll.

Corporations’ Minimum Reliability Requirements Rise

At the same time, corporations now require higher levels of reliability than they did even two or three years ago. The reliability and continuous operation of the core infrastructure and its component parts – server hardware, server operating system software, applications and other devices (e.g. firewalls, unified communications devices and uninterruptible power supplies) – are more crucial than ever to the organization’s bottom line.

It is clear that corporations – from the smallest companies with fewer than 25 people to the largest multinational concerns with over one hundred thousand employees – are more risk averse and concerned about the potential for lawsuits and damage to their reputation in the wake of an outage. ITIC’s survey data indicates that an 84% majority of organizations now require a minimum of “four nines” – 99.99% – reliability and uptime.

This is the equivalent of 52 minutes of unplanned annual downtime for mission critical systems and applications, or just 4.33 minutes of unplanned monthly downtime for servers, applications and networks.
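For readers who want to translate an availability target into a concrete downtime budget, the arithmetic is straightforward. The short Python sketch below is purely illustrative and is not part of ITIC’s survey methodology; the function name is hypothetical and it assumes a 365-day year (ITIC’s published figures round to roughly 52 minutes annually), simply converting an availability percentage into allowable annual and monthly downtime.

    # Illustrative only: convert an availability target (e.g. "four nines")
    # into an approximate unplanned downtime budget. Assumes a 365-day year.
    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

    def downtime_budget(availability_pct):
        """Return (annual_minutes, monthly_minutes) of allowable unplanned downtime."""
        unavailable_fraction = 1 - availability_pct / 100
        annual_minutes = MINUTES_PER_YEAR * unavailable_fraction
        return annual_minutes, annual_minutes / 12

    for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
        annual, monthly = downtime_budget(pct)
        print(f"{label} ({pct}%): {annual:.1f} min/year, {monthly:.2f} min/month")

At 99.99% this works out to roughly 52.6 minutes per year and about 4.4 minutes per month, which ITIC rounds to the 52 and 4.33 minute figures cited above; at 99.999% (“five nines”) the budget shrinks to roughly 5.25 minutes per year.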

Conclusions

The vendors are one-half of the equation. Corporate users also bear responsibility for the reliability of their servers and applications based on configuration, utilization, provisioning, management and security.

To minimize downtime and increase system and network availability, it is imperative that corporations work with vendor partners to ensure that reliability and uptime are inherent features of all their servers, network connectivity devices, applications and mobile devices. This requires careful tactical and strategic planning.

Human error and security pose, and will continue to pose, the greatest threats to the underlying reliability and stability of server hardware, operating systems and applications. A key element of every firm’s reliability strategy is obtaining the necessary training and certification for IT managers, engineers and security professionals. Companies should also have their security professionals take security awareness training. Engaging third party vendors to conduct security vulnerability testing to identify and eliminate potential vulnerabilities is also highly recommended. Corporations must also deploy the appropriate auditing, BI and network monitoring tools. Every 21st century network environment needs continuous, comprehensive end-to-end monitoring of its complex, distributed applications in physical, virtual and cloud environments.

Ask yourself: “How much reliability does the infrastructure require and how much risk can the company safely tolerate?”


ITIC 2018 Server Reliability Mid-Year Update: IBM Z, IBM Power, Lenovo System x, HPE Integrity Superdome & Huawei KunLun Deliver Highest Uptime

August 8, 2018

For the tenth straight year, IBM and Lenovo servers again achieved top rankings in ITIC’s 2017 – 2018 Global Server Hardware and Server OS Reliability survey.

IBM’s Z Systems Enterprise server is in a class of its own. The IBM mainframe continues to exhibit peerless reliability, besting all competitors. The Z recorded less than 10 seconds of unplanned downtime per server each month. Additionally, less than one-half of one percent of all IBM Z customers reported unplanned outages totaling more than four (4) hours of system downtime in a single year.

Among mainstream servers, IBM Power Systems 7 and 8 and the Lenovo x86 X6 mission critical hardware consistently deliver the highest levels of reliability/uptime among 14 server hardware platforms and 11 different mainstream server hardware virtualization platforms. Each platform averaged just 2.1 minutes of unplanned per server, per annum downtime (see Exhibit 1).

That makes the IBM Power Systems and Lenovo x86 servers approximately 17 to 18 times more stable and available than the least reliable platforms – the rival Oracle and HPE ProLiant servers.

Additionally, the latest ITIC survey results indicate just one percent of IBM Power Systems and Lenovo System x servers experienced over four (4) hours of unplanned annual downtime. This is the best showing among the 14 different server platforms surveyed.

ITIC’s 10th annual independent ITIC 2017 – 2018 Global Server Hardware and Server OS Reliability survey polled 800 organizations worldwide from August through December 2017.  In order to obtain the most accurate and unbiased results, ITIC accepted no vendor sponsorship. …


RizePoint Emerges as Market Leader in Audit, Compliance and BI Market

Protecting and maintaining brand reputation is essential for any company. As a result, enterprises must proactively monitor and manage all activities – operational and experiential – that influence a consumer’s overall brand experience. Ignorance involving any aspect of business operations will have ongoing, significant consequences: it will damage a corporation’s reputation, adversely impact customers, and result in operational inefficiencies, business losses, potential litigation and even criminal penalties. It also raises the corporation’s risk of non-compliance with crucial local, state, federal and international industry regulations.

This is especially true for firms in fast-paced, competitive and highly regulated industries, including but not limited to the food, hospitality, hotel, restaurant, retail and transportation vertical markets. Typically, these organizations have dozens, hundreds or even thousands of stores, restaurants and hotels located in multiple, geographically remote locations. They must collect, aggregate and analyze a veritable data deluge in real time. And they must respond proactively and take preventative measures to correct issues as they arise. Organizations that do business across multiple states and internationally face other challenges. They must synchronize and integrate processes and data across the entire enterprise. Businesses must also ensure that every restaurant, hotel or retail store in the chain achieves and maintains compliance with a long list of complex standards and health and safety laws.

ITIC’s research indicates that companies across a wide range of industries are deploying a new class of Quality Experience Management software. These solutions let businesses access the latest information on daily operations, policies, procedures and safety mechanisms in an automated fashion. They also let companies take preventative and remedial action irrespective of time, distance or physical location.

Quality Experience Management software with built-in Business Intelligence tools can deliver immediate and long-term benefits and protect the corporate brand. ITIC’s customer-based research shows that RizePoint, based in Salt Lake City, UT – with 20 years’ experience in audit compliance monitoring, reporting and correction – is the clear market leader. Its software delivers brand protection and risk mitigation with mobile and cloud capabilities, increasing efficiency and productivity. …


Hourly Downtime Tops $300K for 81% of Firms; 33% of Enterprises Say Downtime Costs >$1M

The cost of downtime continues to increase, as do the business risks. An 81% majority of organizations now require a minimum of 99.99% availability. This is the equivalent of 52 minutes of unplanned annual downtime for mission critical systems and applications, or just 4.33 minutes of unplanned monthly downtime for servers, applications and networks.

Over 98% of large enterprises with more than 1,000 employees say that, on average, a single hour of downtime per year costs their company over $100,000, while 81% of organizations report that the cost exceeds $300,000. Even more significantly, one-third of enterprises – 33% – indicate that hourly downtime costs their firms $1 million or more (see Exhibit 1). It’s important to note that these statistics represent the “average” hourly cost of downtime. In a worst case scenario, if any device or application becomes unavailable for any reason, the monetary losses to the organization can reach millions per minute. Devices, applications and networks can become unavailable for myriad reasons. These include natural and man-made catastrophes; faulty hardware; bugs in the application; security flaws or hacks; and human error. Business-related issues, such as a regulatory compliance inspection or litigation, can also force the organization to shutter its operations. Whatever the reason, when the network and its systems are unavailable, productivity grinds to a halt and business ceases.

Highly regulated vertical industries like Banking and Finance, Food, Government, Healthcare, Hospitality, Hotels, Manufacturing, Media and Communications, Retail, Transportation and Utilities must also factor in the potential losses related to litigation as well as civil penalties stemming from an organization’s failure to meet Service Level Agreements (SLAs) or compliance regulations. Moreover, for a select three percent of organizations whose businesses are based on high-level data transactions – banks, stock exchanges, online retailers or even utility firms – losses may be calculated in millions of dollars per minute. …
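As a rough, back-of-the-envelope illustration of how these figures compound, the Python sketch below is illustrative only: the hourly cost tiers mirror the survey ranges cited above and the “four nines” downtime budget comes from the preceding paragraphs; the function name is hypothetical and the result is not a per-company figure.

    # Illustrative only: estimate annual downtime exposure from an average
    # hourly downtime cost and an availability target. Hourly cost tiers
    # mirror the survey ranges cited above; they are assumptions.
    MINUTES_PER_YEAR = 365 * 24 * 60

    def annual_exposure(hourly_cost_usd, availability_pct):
        """Estimated annual downtime cost (USD) at a given availability level."""
        downtime_hours = MINUTES_PER_YEAR * (1 - availability_pct / 100) / 60
        return downtime_hours * hourly_cost_usd

    for hourly_cost in (100_000, 300_000, 1_000_000):
        exposure = annual_exposure(hourly_cost, 99.99)  # "four nines" uptime target
        print(f"${hourly_cost:,}/hour at 99.99% uptime: about ${exposure:,.0f} per year")

Under those assumptions, a firm averaging $300,000 per downtime hour that stays within its four nines budget still faces roughly $260,000 of annual exposure; at $1 million per hour the figure approaches $900,000, before accounting for litigation, SLA penalties or reputational damage.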


Q & A: Mike Flannagan, VP & GM, Cisco’s Data & Analytics Group

ITIC’s coverage areas continue to expand and evolve based on your feedback. We will now feature Q&As with industry luminaries and experts discussing hot industry trends and technologies.

Cisco is one of the preeminent high technology companies and has been a market leader in networking for the last three decades. Cisco’s technologies and market strategies continue to evolve along with those of the overarching high tech industry and its expanding customer base. Cisco is extending its presence beyond networking and becoming a driving force in the Internet of Things (IoT) and data analytics. Michael Flannagan is Vice President and General Manager of Cisco’s Data & Analytics Group. He is responsible for the company’s data and analytics strategy and leads multiple software business units, including Cisco’s Data Virtualization, Analytics, ServiceGrid and Energy Management Business Units. ITIC Principal Analyst Laura DiDio spoke with Flannagan in depth about Cisco’s recent analytics acquisitions and the increasingly prominent role analytics will play in Cisco’s products and strategy.

Laura DiDio: Cisco is upping its game with IoT Edge Analytics/Data Analytics, the acquisition of ParStream and its recent partnership with IBM to incorporate Watson’s cognitive computing and AI capabilities into Cisco edge routers. Can you provide us with insight into the tangible positive impact that IoT analytics is having, both in the data center and at the edge, in terms of business and technical advantages – e.g. performance gains, positive impact on manpower and device resources, cost savings, driving top line revenue, lowering TCO, accelerating ROI, and helping to increase reliability and mitigate risk? …


Cost of Hourly Downtime Soars: 81% of Enterprises Say it Exceeds $300K On Average

The only good downtime is no downtime.

ITIC’s latest survey data finds that 98% of organizations say a single hour of downtime costs over $100,000; 81% of respondents indicated that 60 minutes of downtime costs their business over $300,000. And a record one-third or 33% of enterprises report that one hour of downtime costs their firms $1 million to over $5 million.

For the fourth straight year, ITIC’s independent survey data indicates that the cost of hourly downtime has increased. The average cost of a single hour of unplanned downtime has risen by 25% to 30% since 2008, when ITIC first began tracking these figures.

In ITIC’s 2013 – 2014 survey, just three years ago, 95% of respondents indicated that a single hour of downtime cost their company $100,000. However, just over 50% said the cost exceeded $300,000, and only one in 10 enterprises reported that hourly downtime costs their firms $1 million or more. In ITIC’s latest poll, one-third of businesses – 33% of survey respondents – said that hourly downtime costs top $1 million or even $5 million.

Keep in mind that these are “average” hourly downtime costs. In certain use case scenarios – such as the financial services industry or stock transactions – the downtime costs can conceivably exceed millions per minute. Additionally, an outage that occurs during peak usage hours may also cost the business more than the average figures cited here. …


IBM z13s Delivers Power, Performance, Fault Tolerant Reliability and Security for Hybrid Clouds

Security. Reliability. Performance. Analytics. Services.

These are the most crucial considerations for corporate enterprises in choosing a hardware platform. The underlying server hardware functions as the foundational element for the business’ entire infrastructure and interconnected environment. Today’s 21st century Digital Age networks are characterized by increasingly demand-intensive workloads and the need to use Big Data analytics to analyze and interpret massive volumes and varieties of data in order to make proactive decisions and keep the business competitive. Security is a top priority: it is absolutely essential to safeguard sensitive data and Intellectual Property (IP) from sophisticated, organized external hackers and to defend against threats posed by internal employees.

The latest IBM z13s enterprise server delivers embedded security, state-of-the-art analytics and unparalleled reliability, performance and throughput. It is fine-tuned for hybrid cloud environments, and it’s especially useful as a secure foundational element in Internet of Things (IoT) deployments. The newly announced z13s is highly robust: it supports the most compute-intensive workloads in hybrid cloud and on-premises environments. The newest member of the z Systems family, the z13s incorporates advanced, embedded cryptography features in the hardware that allow it to encrypt and decrypt data twice as fast as previous generations, with no reduction in transactional throughput, owing to the updated cryptographic coprocessor on every chip core and tamper-resistant, hardware-accelerated cryptographic coprocessor cards. …


IBM, Lenovo Top ITIC 2016 Reliability Poll; Cisco Comes on Strong

IBM Power Systems Servers Most Reliable for Seventh Straight Year; Lenovo x86 Servers Deliver Highest Uptime/Availability among all Intel x86-based Systems; Cisco UCS Stays Strong; Dell Reliability Ratchets Up; Intel Xeon Processor E7 v3 chips incorporate advanced analytics; significantly boost reliability of x86-based servers

In 2016 and beyond, infrastructure reliability is more essential than ever.

The overall health of network operations, applications, management and security functions depends on the core foundational elements: server hardware, server operating systems and virtualization delivering high availability, robust management and solid security. The reliability of the server, server OS and virtualization platforms is the cornerstone of the entire network infrastructure. The individual and collective reliability of these platforms has a direct, immediate and long lasting impact on daily operations and business results. For the seventh year in a row, corporate enterprise users said IBM server hardware delivered the highest levels of reliability/uptime among 14 server hardware and 11 different server hardware virtualization platforms. A 61% majority of IBM Power Systems servers and Lenovo System x servers achieved “five nines” or 99.999% availability – the equivalent of 5.25 minutes of unplanned per server, per annum downtime – compared with 46% of Hewlett-Packard servers and 40% of Oracle server hardware. …


IBM z/OS, IBM AIX, Debian and Ubuntu Score Highest Security Ratings

Eight out of 10 – 82% – of the over 600 respondents to ITIC’s 2014-2015 Global Server Hardware and Server OS Reliability survey say security issues negatively impact overall server, operating system and network reliability. Of that figure, a 53% majority of those polled say that security vulnerabilities and hacks have a “moderate,” “significant” or “crucial” impact on network availability and uptime (see Exhibit 1).

Overall, the latest ITIC survey results showed that organizations are still more reactive than proactive regarding security threats. Some 15% of the over 600 global corporate respondents are extremely lax: seven percent said that security issues have no impact on their environment, while another eight percent indicated that they don’t keep track of whether or not security issues negatively affect the uptime and availability of their networks. In contrast, 24% of survey participants – roughly one in four – said security has a “significant” or “crucial” negative impact on network reliability and performance.

Still, despite the well documented and high profile hacks into companies like Target, eBay, Google and other big name vendors this year, the survey found that seven-out-of-10 firms – 70% – are generally confident in the security of their hardware, software and applications – until they get hacked. …


IBM Platform Resource Scheduler Automates, Accelerates Cloud Deployments

One of the most daunting and off-putting challenges for any enterprise IT department is how to efficiently plan and effectively manage cloud deployments or upgrades while still maintaining the reliability and availability of the existing infrastructure during the rollout.

IBM solves this issue with its newly released Platform Resource Scheduler, which is part of the company’s Platform Computing portfolio and an offering within the IBM Software Defined Environment (SDE) vision for next generation cloud automation. The Platform Resource Scheduler is a prescriptive set of services designed to ensure that enterprise IT departments get a trouble-free transition to a private, public or hybrid cloud environment by automating the most common placement and policy procedures for their virtual machines (VMs). It also helps guarantee quality of service while greatly reducing the most typical human errors that occur when IT administrators manually perform tasks like load balancing and memory balancing. The Platform Resource Scheduler is sold with IBM’s SmartCloud Orchestrator and PowerVC and is available as an add-on with IBM SmartCloud OpenStack Entry products. It also features full compatibility with Nova APIs and fits into all IBM OpenStack environments. It is built on open APIs, tools and technologies to maximize client value, skills availability and easy reuse across hybrid cloud environments. It supports heterogeneous (both IBM and non-IBM) infrastructures and runs on Linux, UNIX and Windows as well as IBM’s z/OS operating system. …

