
ITIC 2024 Hourly Cost of Downtime Part 2

There’s No Right Time for Downtime

                                                      

ITIC’s 11th annual Hourly Cost of Downtime Survey data indicates that 97% of large enterprises with more than 1000 employees say that on average, a single hour of downtime per year costs their company over $100,000. Even more significantly: four in 10 enterprises – 41% – indicate that hourly downtime costs their firms $1 million to over $5 million (See Exhibit 1). It’s important to note that these statistics represent the “average” hourly cost of downtime.  In a worst-case scenario – such as a catastrophic outage that occurs during peak usage times or an event that disrupts a crucial business transaction – the monetary losses to the organization can reach and even exceed millions per minute.

Additionally, organizations in highly regulated vertical industries like Banking and Finance, Food, Energy, Government, Healthcare, Hospitality, Hotels, Manufacturing, Media and Communications, Retail, Transportation and Utilities must also factor in the potential losses related to litigation. Businesses may also be liable for civil penalties stemming from their failure to meet Service Level Agreements (SLAs) or compliance regulations. Moreover, for select organizations whose businesses are based on compute-intensive data transactions, like stock exchanges or utilities, losses may be calculated in millions of dollars per minute.

ITIC’s most recent poll – conducted in conjunction with the ITIC 2023 Global Server Hardware Server OS Reliability Survey – found that a 90% majority of organizations now require a minimum of 99.99% availability. This is up from 88% over the last two and a half years. The so-called 99.99% or “four nines” of reliability equals roughly 52 minutes of unplanned downtime per server per annum for mission critical systems and applications, or about 4.33 minutes of unplanned monthly outages for servers, applications, and networks.
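The downtime allowance implied by an availability target follows from simple arithmetic. The Python sketch below is illustrative only: the function name and the 365-day year are assumptions, not part of ITIC's survey methodology. Computed this way, 99.99% availability leaves about 52.56 minutes of downtime per year and about 4.38 minutes per month; the figures quoted in the text are rounded.

```python
# Illustrative sketch: converts an availability target into a downtime allowance.
# Assumes a 365-day year; the function name is hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted in the period at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return period_minutes * unavailable_fraction


if __name__ == "__main__":
    per_year = allowed_downtime_minutes(99.99)
    per_month = per_year / 12
    print(f"99.99% availability: {per_year:.2f} min/year, {per_month:.2f} min/month")
```

Passing a different `period_minutes` (for example, the minutes in a 30-day month) gives the allowance for other reporting windows.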

All categories of businesses were represented in the survey respondent pool: 27% were small/midsized (SMB) firms with up to 200 users; 28% came from the small/midsized (SME) enterprise sector with 201 to 1,000 users and 45% were large enterprises with over 1,000 users.

The above statistics are not absolute. They are the respondents’ estimates of the cost of one hour of downtime due to interrupted transactions, lost/damaged data, and end user productivity losses that negatively impacted corporations’ bottom line. These figures exclude the cost of litigation, fines or civil or criminal penalties associated with regulatory non-compliance violations. These statistics are also exclusive of any voluntary “good will” gestures a company elects to make of its own accord to its customers and business partners that were negatively affected by a system or network failure. Protracted legal battles and out-of-court settlements, fines and voluntary good-will gestures do take a toll on the company’s revenue and cause costs to skyrocket further – even if they do help the firm retain the customer account. There are also “soft costs” that are more elusive and difficult to measure, but nonetheless negative. These include the damage to the company’s reputation, which may result in untold lost business and persist for months and years after a highly publicized incident.

To reiterate: in today’s Digital Age of “always on” networks and connectivity, organizations have no tolerance for downtime. It is expensive and risky. And it is just plain bad for business.

Only four percent (4%) of enterprise respondents said that downtime costs their companies less than $100,000 in a single 60-minute period, and of that number an overwhelming 93% majority were micro SMBs with fewer than 10 employees. Downtime costs are similarly high for small and midsized businesses (SMBs) with 11 to 200 employees. To reiterate, these figures are exclusive of penalties, remedial action by IT administrators and any ensuing monetary awards that result from litigation or civil or criminal non-compliance penalties.

 

Do the Math: Downtime Costs Quickly Add Up

 

Downtime is expensive for all businesses – from global multinational corporations to small businesses with fewer than 20, 50 or 100 employees. Losses of hundreds of thousands or millions of dollars per hour, or even per minute in transaction-heavy environments, are unfortunately commonplace. Exhibit 3 depicts the per server/per minute monetary costs of downtime involving a single server to as many as 1,000 servers for businesses that calculate hourly downtime costs from $100,000 to $10,000,000 (USD).

As Exhibit 3 illustrates, one minute of downtime for a single server costs $1,667 in a company that calculates its hourly cost of downtime for a mission critical server or application at $100,000. The cost rises to $16,670 per minute when downtime affects 10 servers and main line of business applications/data assets. Exhibit 3 graphically emphasizes how quickly downtime costs add up for corporate enterprises.

Small businesses are equally at risk, even if their potential downtime costs are a fraction of those of large enterprises. For example, an SMB that estimates that one hour of downtime “only” costs the firm $10,000 could still incur a cost of $167 for a single minute of per server downtime on its business-critical server. Similarly, an SMB that assumes that one hour of downtime costs the business $25,000 could still potentially lose an estimated $417 per server/per minute. With few exceptions, micro SMBs – with 1 to 20 employees – typically would not rack up hourly downtime losses of hundreds of thousands or millions of dollars. Small companies, however, typically lack the deep pockets, larger budgets, and reserve funds of their enterprise counterparts to absorb financial losses or potential litigation associated with downtime. Therefore, the resulting impact could be as devastating for them as it is for enterprise firms.
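The per-minute figures quoted above can be reproduced by dividing an hourly estimate by 60. The Python sketch below is a minimal illustration; the function name is hypothetical, and it assumes that cost scales linearly with minutes and with the number of affected servers, which is how the quoted figures appear to be derived. Note that the unrounded 10-server result is about $16,667; the $16,670 in the text corresponds to multiplying the rounded single-server value of $1,667 by 10.

```python
# Illustrative sketch: per-minute downtime cost derived from an hourly estimate.
# Assumes cost scales linearly with minutes and with the number of affected
# servers; the function name is hypothetical.


def cost_per_minute(hourly_cost_usd: float, servers: int = 1) -> float:
    """Return the estimated downtime cost per minute across the affected servers."""
    return hourly_cost_usd / 60 * servers


if __name__ == "__main__":
    scenarios = [(10_000, 1), (25_000, 1), (100_000, 1), (100_000, 10)]
    for hourly_cost, servers in scenarios:
        per_minute = cost_per_minute(hourly_cost, servers)
        print(f"${hourly_cost:,}/hour x {servers} server(s): ${per_minute:,.0f} per minute")
```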

Hourly downtime costs of $25,000; $50,000 or $75,000 (exclusive of litigation or civil and even criminal penalties) may be serious enough to put the SMB out of business – or severely damage its reputation and cause it to lose business.

 

Hourly Downtime Costs Exceed $5 Million for Top Verticals

 

As Exhibit 4 shows, ITIC’s Hourly Cost of Downtime survey revealed that the costs associated with a single hour of downtime are much higher for large enterprises. Average hourly outage costs topped the $5 Million (USD) mark for the top verticals. These include Banking/Finance; Government; Healthcare; Manufacturing; Media & Communications; Retail; Transportation and Utilities.

Once again, except in specific and rare instances, in 2024, corporations have a near total reliance on their personal and employer-owned interconnected networks and applications to conduct business. Corporate revenue and productivity are inextricably linked to the reliability and availability of the corporate network and its data assets. When servers, applications and networks are unavailable for any reason, business and productivity slow down or cease completely.

The minimum reliability/uptime requirements for the top vertical market segments are even more stringent and demanding than the corporate averages in over 40 other verticals as Exhibit 5 below illustrates.

The above industries are highly regulated and incorporate strict compliance laws. But even without regulatory oversight the top vertical market segments are highly visible. Their business operations demand near flawless levels of uninterrupted, continuous operation. In the event of an unplanned outage of even a few minutes, when users cannot access data and applications, business stops and productivity ceases.

These statistics reinforce what everyone knows: infrastructure, security, data access and data privacy and adherence to regulatory compliance are all imperative.

Server hardware, server OS and application reliability all have direct and far-reaching consequences on the corporate bottom line and ongoing business operations.  Unreliable and unavailable server hardware, server operating systems and applications will irreparably damage companies’ reputation.

In certain extreme cases, business and monetary losses caused by unreliable servers can cause an enterprise to miss its quarterly or annual revenue forecasts or even go out of business as a direct consequence of sustained losses and litigation brought on by the outage.

 

Minimum Reliability Requirements Increase Year over Year

Time is money. Time also equates to productivity and the efficiency and continuity of ongoing, uninterrupted daily operations. If any of these activities are compromised by outages for any reason – technical or operational failure that renders the systems and the data unavailable – business grinds to a halt. This negatively impacts the corporate enterprise. The longer the outage lasts, the higher the likelihood of having a domino effect on the corporation’s customers, business partners and suppliers. This in turn will almost certainly raise Total Cost of Ownership (TCO) and undermine the return on investment (ROI).

High reliability and high availability are necessary to manage the corporation’s level of risk and exposure to liability and potential litigation resulting from unplanned downtime and potential non-compliance with regulatory issues. This is evidenced by corporations’ reliability requirements which have increased every year for the past 11 years that ITIC has polled organizations on these metrics.

Consider the following: in 2008, the first year that ITIC surveyed enterprises on their Reliability requirements, 27% of businesses said they needed just 99% uptime; four-in-10 corporations – 40% – required 99.9% availability. In that same 2008 survey, only 23% of firms indicated they required a minimum of “four nines” or 99.99% uptime for their servers, operating systems, virtualized and cloud environments, while a seven percent (7%) minority demanded the highest levels of “five nines” – 99.999% or greater availability.

A decade ago, in ITIC’s 2014 Hourly Cost of Downtime poll, 49% of businesses required 99.99% or greater reliability/uptime; this figure rose by 39 percentage points over the six years to fall 2020. Four nines – 99.99% or greater reliability – is now the minimum standard for mission-critical systems. In our 2020 survey, none (0%) of the survey respondents indicated their organizations could live with just “two nines” – 99% uptime or 88 hours of annual unplanned per server downtime!

As Exhibit 5 illustrates, “four nines” or 99.99% uptime and availability is the average minimum requirement for 88% of organizations. However, more and more companies – an overall average of 25% of respondents across all vertical industries as of November 2020 – say their businesses now require “five nines” or 99.999% server and operating system uptime. This equates to 5.26 minutes of per server/per annum unplanned downtime. And three percent (3%) of leading-edge businesses need “six nines” or 99.9999% near-flawless, mainframe class fault tolerant server availability, which equates to 31.5 seconds of downtime per server/per annum.
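The downtime allowances quoted in this section can be cross-checked by running the calculation in reverse, from an allowance to a number of "nines". The Python sketch below is illustrative only; the function name is hypothetical, and it assumes a 365-day year and that every allowance is measured per year. On that basis, 88 hours, 52 minutes, 5.26 minutes and 31.5 seconds work out to roughly two, four, five and six nines respectively.

```python
# Illustrative sketch: expresses an annual downtime allowance as a number of "nines".
# Assumes a 365-day year and that each allowance is measured per year;
# the function name is hypothetical.
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def nines_from_annual_downtime(downtime_minutes: float) -> float:
    """Return -log10 of the unavailable fraction, e.g. about 4.0 for 99.99% availability."""
    unavailable_fraction = downtime_minutes / MINUTES_PER_YEAR
    return -math.log10(unavailable_fraction)


if __name__ == "__main__":
    allowances = {
        "88 hours": 88 * 60,
        "52 minutes": 52,
        "5.26 minutes": 5.26,
        "31.5 seconds": 31.5 / 60,
    }
    for label, minutes in allowances.items():
        nines = nines_from_annual_downtime(minutes)
        print(f"{label} per year is about {nines:.2f} nines")
```

The result is left as a fractional value rather than rounded down, since rounded allowances such as 88 hours land slightly under a whole number of nines.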

Increasingly, many organizations have even more stringent reliability needs. Requirements of “five and six nines” – 99.999% and 99.9999% – reliability and availability are becoming much more commonplace among all classes of businesses. The reasons are clear: corporations have no tolerance for downtime. They, their end users, business partners, customers, and suppliers all demand uninterrupted access to data and applications to conduct business 24x7 irrespective of geographic location.

Security Attacks: End Users are Biggest Culprits in Downtime

 

ITIC’s latest survey results found that security issues and end user carelessness were among the top causes of unplanned system and network downtime in 2020. ITIC expects this trend to continue throughout 2021 and beyond as organized hackers launch ever more sophisticated and pernicious targeted attacks.

 

  • Security attacks – including targeted attacks by organized hackers, Ransomware attacks, Phishing and Email scams and CEO fraud hacks – now rank as the top cause of downtime, according to 84% of ITIC survey respondents (See Exhibit 6).
  • User error is also increasingly contributing to corporate downtime and is now rated among the top three causes of company outages along with software flaws/bugs, according to 69% of ITIC survey respondents. End user carelessness includes company employees being careless with and losing their own BYOD and company owned devices like laptops, tablets, and mobile phones. Many users fail to properly secure their devices, and when they’re lost or stolen, the intellectual property (IP) and sensitive data are easily accessible to prying eyes and thieves. End user carelessness also manifests itself in other ways: many naïve users click on bad links, fall prey to Phishing scams and CEO fraud, and leave themselves and their company wide open to security hacks.

The percentage of enterprises unable to calculate the hourly cost of downtime has consistently outpaced those able to estimate downtime costs over the last 10 years. Of the 39% that responded “Yes”, only 42% can make detailed downtime estimates. Only 22% of organizations, or approximately one in five, can accurately assess the hourly cost of downtime and its impact on productivity, daily operations/transactions and the business’ bottom line.

 

Consequences of Downtime

 

There is never an opportune time for an unplanned network, system, or service failure. The hourly costs associated with downtime paint a grim picture. But to reiterate, they do not tell the whole story of just how devastating downtime can be to the business’ bottom line, productivity, and reputation.

 

The ITIC survey data revealed that although monetary losses topped users’ list of downtime concerns, it was one of several factors worrying organizations. The top five business consequences that concerned users are (in order):

  • Transaction/sales losses.
  • Lost/damaged data.
  • Customer dissatisfaction.
  • Restarting/return to full operation.
  • Regulatory compliance exposure.

 

The National Archives and Records Administration statistics indicate that 93% of organizations that experience a data center failure go bankrupt within a year.

 

Consider these scenarios:

 

  • Healthcare: A system failure during an operation could jeopardize human lives. Additionally, targeted hacks by organized groups of professional “black hat” hackers increasingly seek out confidential patient data like Social Security numbers, birth records and prescription drug information. Healthcare is one of the most highly regulated vertical industries and the U.S. and other countries’ government agencies worldwide are aggressively penalizing physicians, clinics, hospitals and healthcare organizations that fail to live up to regulatory compliance standards with respect to privacy and security.
  • Banking and Finance: Unplanned outages during peak transaction time periods could cause business to grind to a halt. Banks and stock exchanges could potentially be unable to complete transactions such as processing deposits and withdrawals and customers might not be able to access funds in ATM machines. Brokerage firms and stock exchanges routinely process millions and even billions of transactions daily. The exchanges could lose millions of dollars if transactions or trading were interrupted for just minutes during normal business hours. Financial institutions and exchanges are also among the most heavily regulated industries. Any security breaches will be the subject of intense scrutiny and investigation.
  • Government Agencies: A system failure within the Social Security Administration (SSA) that occurs when the agency is processing checks could result in delayed payments, lost productivity and require administrators to spend hours or days in remedial action.
  • Manufacturing: The manufacturing vertical is one of the top verticals targeted by hackers, surpassing the healthcare industry. According to the US National Center for Manufacturing Sciences (NCMS), 39% of all cyber-attacks in 2016 were against the manufacturing industry. Since January of 2017 and continuing to the present, March 2024, attacks against manufacturing firms are up 38% thanks to technologies like Machine Learning (ML), Artificial Intelligence (AI) and IoT. Manufacturers are often viewed as “soft targets” or easy points of entry into other types of enterprises and even government agencies. Efficiency and uninterrupted productivity are staples and stocks in trade in the manufacturing arena. Any slips are well documented and usually well publicized. The manufacturing shop floor has a near total reliance on robotics, machines and automated networks to get the job done. There are literally thousands of potential entry points – or potential vulnerability points – into the network. The implementation of industrial control systems (ICS), centralized command centers that control and connect processes and machines, and the integration of external Internet of Things (IoT) devices like cameras and robotics add multiple points of process failure and access points with possible wormholes allowing hackers to infiltrate larger networks.
  • Retail: Retailers and sales force personnel trying to close end-of-quarter results would be hard pressed if an outage left them unable to access order entries, log sales or issue invoices, or delayed their access to those functions. This could have a domino effect on suppliers, customers, and shareholders.
  • Travel, Transportation and Logistics (TT&L): An outage at the Federal Aviation Administration’s (FAA) air traffic control systems could cause chaos: air traffic controllers would find it difficult to track flights and flight paths, raising the risk of massive delays and, in a worst-case scenario, airborne and even runway collisions. An airline reservation system outage of even a few minutes would leave the airline unable to process reservations and issue tickets and boarding passes via online systems. This scenario has occurred several times in the past several years. Just about every major airline has experienced costly outages over the last five years; this continues to the present day. A June 2019 report released by the U.S. Government Accountability Office (GAO) confirmed that 34 airline IT outages occurred over a three-year span encompassing the years 2015 through 2018. According to the GAO, “about 85% of these led to flight delays or cancellations.” In 2023 and 2024, the aviation industry has been hit hard by flight cancellations and delays due to faulty aircraft components before and during flights. This almost always results in a domino effect that impacts other businesses and causes supply chain disruptions. Additionally, the U.S. Department of Transportation’s Transportation Statistics Annual Report 2023[1] noted that the entire U.S. transportation system is vulnerable to cyber and electronic disruptions. This is particularly true in the aviation system, which is dependent on electronic and digital navigation aids, communication systems, command and control technologies, and public information systems. Outages and cybersecurity issues also plague other transportation sectors like the trucking and auto industry. Cyber incidents pose a variety of threats to transportation systems.
Cyber vulnerabilities have been documented in multimodal operational systems, control centers, signaling and telecommunications networks, draw bridge operations, transit and rail operations, pipelines, and other existing and emerging technologies. State and local governments face growing threats from hackers and cybercriminals, including those who use ransomware software that hijacks computer systems, encrypts data, and locks machines, holding them hostage until victims pay a ransom or restore the data on their own. In February 2018 hackers struck the Colorado Department of Transportation in two ransomware attacks that disrupted operations for weeks. State officials had to shut down 2,000 computers, and transportation employees were forced to use pen and paper or their personal devices instead of their work computers. A 2021 report by SOTI, a global management firm headquartered in Ontario, Canada, found that in the trucking industry “…When trucks are on the road, T&L companies make money. When they’re not, they’re losing money due to the cost of downtime which averages $448 to $760 USD per vehicle per day.”

 

When downtime occurs, business grinds to a halt, productivity is impaired and the impact on customers is almost immediate. Aside from the immediate consequences of being unable to conduct business and loss of revenue, unreliable systems also undermine quality of service, potentially causing firms to be unable to meet their SLAs. This can lead to breach of contract, cause the company to lose business and customers, and put it at risk for expensive penalties and litigation. Even in the absence of formal litigation, many organizations will offer their customers, business partners and suppliers some goodwill concessions in the form of future credits or discounts for inconvenience and as a way of retaining the business and mitigating the damage to the company’s reputation. These goodwill actions/concessions, while advisable, inevitably affect the corporate bottom line.

 

Conclusions

 

The hourly cost of downtime will continue to rise. Downtime of any duration is expensive and disruptive. When a mission critical application, server or network is unavailable for even a few minutes, the business risks increase commensurately. They include:

 

  • Lost productivity.
  • Lost, damaged, destroyed, changed or stolen data.
  • Damage to the company’s reputation, which can potentially result in lost business.
  • Potential for litigation by business partners, customers, and suppliers.
  • Regulatory compliance exposure.
  • Potential for civil and criminal liabilities, penalties and even jail time for company executives.
  • Potential for unsustainable losses which can result in companies going out of business.

 

All these issues create pressure on organizations and their IT and security departments to ensure very high levels of system availability and avoid outages at all costs. Some 90% of businesses now require a minimum of 99.99% or greater system and network availability. Additionally, in 2024, 44% of companies say they strive for 99.999% uptime and availability which is the equivalent of 5.26 minutes of per server annual unplanned downtime. ITIC anticipates that these numbers will continue to increase.

 

If the network or a critical server or application stops, so does the business.

 

Ensuring network availability is challenging in today’s demanding business environment. Mission critical servers and crucial line of business applications are increasingly located or co-located in public and hybrid cloud environments. Cloud deployments are almost exclusively virtualized, with multiple instances of mission critical applications housed in a single server.  Without proper management, security and oversight, there is the potential for higher collateral damage in the event of any unplanned outage or system disruption.

 

Additional risk of downtime is also posed by the higher number of devices that are interconnected via IoT networks utilizing Analytics, BI and AI. IoT networks facilitate communications among applications, devices, and people. In IoT ecosystems in which devices, data, applications, and people are all interconnected, there is a heightened risk of collateral damage and potential security exposures in the event of an unplanned outage.

 

Technologies like cloud, virtualization, mobility, BYOD, IoT, Analytics, BI, AI and AIOps all deliver tangible business and technology benefits and economies of scale that can drive revenue, lower Total Cost of Ownership (TCO) and accelerate Return on Investment (ROI). But they are not foolproof and there are no panaceas.

 

In many businesses, mobility – whether because of employee travel or working remotely – is commonplace. BYOD usage is also common. Employees routinely use their personal mobile devices as well as company owned laptops, tablets, smart phones, and other devices to access the corporate data networks remotely. IT administrators have more devices to monitor, larger and more complex applications to provision and monitor and more endpoints and network portals to secure. Many enterprise IT shops are under-staffed and overworked. And many SMB firms with 20, 50 or even 100 employees may have limited, part-time or even no dedicated onsite IT or security administrators.

 

Time is money.

To reiterate, downtime will almost certainly have a negative impact on companies’ relationships with customers, business suppliers and partners.

To minimize downtime, increase system and network availability, and minimize risk, enterprises must ensure that robust reliability is an inherent feature in all servers, network connectivity devices, applications, and mobile devices. This requires careful tactical and strategic planning to construct a solid business and technology strategy. A crucial component of this strategy is to deploy the appropriate device and network security and monitoring tools. Every 21st Century network environment needs robust security on all of its core infrastructure devices and systems (e.g., servers, firewalls, routers, etc.) and continuous, comprehensive end-to-end monitoring for complex, distributed applications in physical, virtual and cloud environments.

 

In summary, it is imperative that companies of all sizes and across every vertical market discipline thoroughly review every instance of downtime and estimate all the associated monetary costs; the impact on internal productivity; remediation efforts and the business risk to the organization. Companies should also determine whether customers, suppliers and business partners experienced any negative impact, e.g. unanticipated downtime, lost productivity/lost business, security exposures because of the outage.

 

All appropriate corporate stakeholders, from C-suite executives to IT and security administrators, department heads and impacted workers, should have a hand in correctly calculating the hourly cost of downtime. Companies should then determine how much downtime and risk the corporation can withstand.

 

It is imperative that all businesses from micro SMBs to the largest global enterprises calculate the cost of employee and IT and security administrative time in terms of monetary and business costs. This includes the impact on productivity; data assets as well as the time it takes to remediate and restore the company to full operation. Companies must also fully assess and estimate how much risk their firms can assume, including potential liability for municipal, state, federal and even international regulatory compliance laws.

 

[1] U.S. Department of Transportation, “Transportation Statistics Annual Report 2023,” Pg. 1 -39, URL: https://www.bts.gov/sites/bts.dot.gov/files/2023-12/TSAR-2023_123023.pdf

 


ITIC 2024 Hourly Cost of Downtime Report Part 1

Cost of Hourly Downtime Exceeds $300,000 for 90% of Firms; 41% of Enterprises Say Hourly Downtime Costs $1 Million to Over $5 Million

ITIC Position

In the 21st century Digital Age of “always on” IoT interconnected systems, AI, analytics and cloud computing, organizations have zero tolerance for downtime. This is true for all organizations – from micro SMBs with 1 to 20 users to Fortune 100 global multinational enterprises with 100,000+ workers. Outages of even a few minutes’ duration cause business and productivity to grind to a halt, negatively impact reliability and security, and place companies at higher risk of regulatory non-compliance as well as civil and even criminal penalties.

ITIC’s latest research indicates the cost of hourly downtime continues to spike.  The average cost of a single hour of downtime now exceeds $300,000 for over 90% of mid-size and large enterprises. These costs are exclusive of litigation, civil or criminal penalties. These are the results of ITIC’s 2024 Hourly Cost of Downtime Survey, an independent Web survey that polled over 1,000 firms worldwide from November 2023 through mid-March 2024.

Downtime Dangers in a Post-Pandemic World Economy

 ITIC survey data finds the escalating cost of computing/network outages is attributable to several factors:

  • An increase in the number of interconnected devices, systems and networks via the Cloud and the Internet of Things (IoT) ecosystems. Connectivity is a two-edged sword. It facilitates faster, more efficient transmissions and data access. But it also creates a limitless “attack surface” and exponentially increases the number of vulnerability points across the entire corporate ecosystem.
  • An ongoing sharp spike in security vulnerabilities. These include targeted security and ransomware attacks by organized hackers; Email Phishing scams; CEO fraud and a wide range of malware, viruses and rogue code. The spike in security and data breaches was further exacerbated by the COVID-19 global pandemic, which forced countries to go on lockdown and businesses to mandate that employees work from home. This, in turn, gave rise to a spate of opportunistic COVID-19 related security scams which continue today.
  • End user carelessness. Everyone from CEOs, knowledge workers, IT and Security administrators and developers to full and part-time employees and contract workers accesses corporate servers, applications and information. Users regularly access sensitive data assets and intellectual property (IP) via a wide array of devices and networks. These include company and employee-owned (BYOD) mobile phones, tablets, laptops, and desktops as well as public networks. Unfortunately, many of these devices and networks lack adequate security. Absent up-to-date security and encryption, a lost or stolen device leaves the company’s data assets as well as personal employee information and the data of corporate customers, business partners and suppliers all potentially vulnerable and exposed.
  • Organizations’ near-total reliance on computers and networks to conduct business. Downtime of even a few minutes interrupts productivity and daily business operations. Downtime also has a domino effect, even if no data is lost, stolen, changed, destroyed, or hacked.

ITIC anticipates that all these trends – particularly security and data breaches as well as the trend towards remote working and remote learning — will continue unabated. The hourly cost of downtime will continue to rise.  It is imperative that organizations implement the necessary measures to ensure the reliability and security of their hardware, software applications and connectivity devices across the entire network ecosystem. Security and security awareness training are necessary to maintain the uptime and availability of devices and data assets. This will ensure continuous business operations and mitigate risk.

 

Security and Human Error are Chief Culprits Causing Downtime

Continuing a trend that has manifested over the past three years, security and human error, followed by software flaws and bugs, are the chief issues that undermine the reliability of server hardware, server operating systems, application software, appliances and networks, resulting in unplanned downtime.

Unsurprisingly, in the 21st Century Digital Age, the functionality and reliability of the core foundation server hardware and server operating systems are more crucial than ever. The server hardware and the server OSes are the bedrock upon which the organization’s mainstream line of business (LOB) applications rest. High reliability and near continual system and application availability are imperative for organizations’ on-premises, cloud based and Network Edge/Perimeter environments. Infrastructure – irrespective of location – is essential to the overall health of business operations.

The inherent reliability and robustness of server hardware and the server operating systems are the single most critical factors that influence, impact, and ultimately determine the uptime and availability of mission critical line of business applications, the virtual machines (VMs) that run on top of them and the connectivity devices that access them.

Additionally, ITIC’s latest 2024 Reliability research reveals that a variety of external factors are having more of a direct impact on system downtime and overall availability. These include overworked and understaffed IT departments; the rapid mainstream adoption of complex new technologies such as the aforementioned IoT, Big Data Analytics, virtualization and increasing cloud computing deployments and the continuing proliferation of BYOD and mobility technologies.

In the context of its annual Reliability and Security surveys, ITIC broadly defines human error to encompass both the technology and business mistakes organizations make with respect to their network equipment and strategies.

Human error as it relates to technology includes but is not limited to:

  • Configuration, deployment, and management mistakes.
  • Failure to upgrade or right size servers to accommodate more data and compute intensive workloads.
  • Failure to migrate and upgrade outmoded applications that are no longer supported by the vendor.
  • Failing to keep up to date on patches and security.

Human error with respect to business issues includes:

  • Failure to allocate the appropriate Capital Expenditure and Operational Expenditure funds for equipment purchases and ongoing management and maintenance functions.
  • Failure to devise, implement and upgrade the necessary computer and network policies to address issues like Cloud computing, Mobility, Remote Access, and Bring Your Own Device (BYOD).
  • Failure to construct and enforce strong computer and network security policies.
  • Ignorance of Total Cost of Ownership (TCO) and Return on Investment (ROI).
  • Failure to track hourly downtime costs.
  • Failure to track and assess the impact of Service Level Agreements and regulatory compliance issues like Sarbanes-Oxley (SOX) and the Health Insurance Portability and Accountability Act (HIPAA).

On a positive note, the inherent reliability of server hardware and server operating system software as well as advancements in the underlying processor technology, all continue to improve year over year. But the survey results also reveal that external issues, most notably human error and security breaches, have also assumed greater significance in undermining system and network accessibility and performance.

The overall health of network operations, applications, management, and security functions all depend on the core foundation elements: server hardware, server operating systems and virtualization to deliver high availability, robust management and solid security. The reliability of the server, server OS and virtualization platforms form the foundation of the entire network infrastructure. The individual and collective reliability of these platforms has a direct, immediate, and long-lasting impact on daily operations and business results.

[1] U.S. Department of Transportation, “Transportation Statistics Annual Report 2023,” Pg. 1 -39, URL: https://www.bts.gov/sites/bts.dot.gov/files/2023-12/TSAR-2023_123023.pdf

 


ITIC 2024 Sexual Harassment, Gender Bias & Equal Pay Survey

This survey polls professional women (including students and interns) in Science, Technology, Engineering, and Math (STEM) disciplines on their real-world experiences dealing with the very serious issues of Sexual Harassment, Gender Bias, and Equal Pay in the workplace and how they deal with them.

 

Take the survey here: https://www.surveymonkey.com/r/VWXRC97

 

Leave a comment along with your email address for a chance to win one of three (3) $100 Amazon gift cards.

All responses are confidential.

 

 


ITIC 2023 Reliability Survey IBM Z Results

The IBM z16 mainframe lives up to its reputation for delivering “zero downtime.”

 

The latest z16 server, introduced in April 2022, delivers nine nines—99.9999999%—of uptime and reliability. This is just over 30 milliseconds – 31.56 milliseconds to be precise – of per server annual downtime, according to the results of the ITIC 2023 Global Server Hardware, Server OS Reliability Survey.

ITIC’s 2023 Global Server Hardware, Server OS Reliability independent web-based survey polled nearly 1,900 corporations worldwide across over 30 vertical market segments on the reliability, performance and security of the leading mainstream on-premises and cloud-based servers from January through July 2023. To maintain objectivity, ITIC accepted no vendor sponsorship.

ITIC’s 2023 Global Server Hardware, Server OS Reliability survey also found that an 88% majority of users of the newest IBM Power10 server (shipping since September 2021) say their organizations achieved eight nines – 99.999999% – of uptime. This is 315 milliseconds of unplanned, per server, per annum outage time due to underlying system flaws or component failures. As a result, Power10 corporate enterprises spend just $7.18 per server/per year performing remediation due to unplanned server outages caused by inherent flaws in the server hardware or component parts.

The IBM z16 and Power10 server-specific uptime statistics were obtained by breaking out the results of more than 200 respondent organizations that deployed the z16 since it began shipping in April/May 2022. A 96% majority of these z16 enterprises say their businesses achieved nine nines – 99.9999999% – of server uptime. This is the equivalent of a near-imperceptible 31.56 milliseconds of per server annual downtime due to any inherent flaws in the server hardware and its various components (See Table 1).

An IBM spokesperson says that currently the IBM Z mainframe achieves an average of “eight nines” or 99.999999% reliability overall and that statistic includes the various versions (the z13, z14, z15 and z16) of its mainframe enterprise system. IBM has not yet reviewed ITIC’s independent survey data on the z16 results.

To put these statistics into perspective: The latest z16 corporate enterprises and their IT managers spend mere pennies per server/per year performing remediation activities due to unplanned per server outages that occurred due to inherent system failures.

This is the 15th consecutive year that the IBM Z and IBM Power Systems have dominated with the best across-the-board uptime reliability ratings among 18 mainstream distributions.

Additionally, the z16 customers say their firms experienced a 20% to 30% improvement in overall reliability, performance, response times and critical security metrics versus older iterations of the zSystems platforms.

Previous versions of the IBM Z mainframe—the z13, z14 and z15—always delivered best-in-class reliability. ITIC’s 2023 Global Reliability study found that the aggregate average results from all z13, z14 and z15 customers ranged between seven and eight nines of uptime depending on the version, age, server configurations and specific use cases.

There is an order of magnitude difference between each of the “nines” of uptime and reliability. For example, four nines of uptime, which is the current acceptable level of uptime for many mainstream businesses, equals 52.56 minutes of unplanned annual per server downtime. In contrast, five nines of uptime is the equivalent of just 5.26 minutes of unplanned annual per server downtime.

Meanwhile, the fault-tolerant levels of reliability – seven and eight nines, or 99.99999% and 99.999999% – represent 3.15 seconds and 315 milliseconds, respectively, of unplanned per server annual outages due to server or component failures.
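
The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration rather than ITIC’s methodology: the function name annual_downtime_seconds and the default 365-day year are assumptions made here. The 31.56 millisecond figure quoted for nine nines corresponds to a 365.25-day year, which is why the year length is left as a parameter.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def annual_downtime_seconds(nines: int, days_per_year: float = 365.0) -> float:
    """Return unplanned downtime per year, in seconds, for a given number of nines.

    Assumption: "n nines" means an unavailable fraction of 10**-n
    (for example, four nines = 99.99% uptime = 0.0001 unavailable).
    """
    unavailable_fraction = 10.0 ** -nines
    return unavailable_fraction * days_per_year * SECONDS_PER_DAY


if __name__ == "__main__":
    # Each additional nine divides the downtime budget by ten.
    for n in range(4, 10):
        seconds = annual_downtime_seconds(n)
        print(f"{n} nines: {seconds:.4f} seconds of downtime per year")
```

Dividing the four nines result by 60 gives the 52.56 minutes cited above, and each additional nine cuts the allowance by a factor of ten.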

 

The z16: A Quantum Leap Forward in Reliability, Performance and Cloud Functionality

 

The IBM Z mainframes have always delivered best-in-class reliability, performance and security. However, the z16 quite literally takes a quantum leap forward by providing advanced capabilities like on-chip AI inferencing and quantum-safe computing.

The IBM z16 and Power10 servers also delivered the strongest server security, experiencing the fewest successful data breaches, the least amount of downtime due to security-related incidents and the fastest mean time to detection (MTTD). ITIC’s latest 2023 Global Server Hardware Security Survey found that 97% of IBM z16 enterprises were able to detect, isolate and shut down attempted data breaches immediately or within the first 10 minutes. Additionally, 92% of IBM Power10 customers detected and repelled attempted hacks immediately or within the first 10 minutes. An organization’s ability to identify and thwart security breaches minimizes downtime, saves money and mitigates risk.

ITIC’s 2023 survey data found that 84% of respondent enterprises cited security issues as the top cause of unplanned downtime. And 67% of respondents cite human error as the cause of unplanned server and application outages. Human error encompasses everything from accidentally disconnecting a server, to misconfiguration issues and incompatibilities among disparate hardware and application and server OS software to failure to properly right size the server to adequately accommodate mission critical workloads.

Overall, the IBM z16 offers near perfect reliability and the most incredibly robust mainstream security available today.


IBM Z, IBM Power Systems & Lenovo ThinkSystem Servers Most Secure, Toughest to Crack

For the fourth straight year, enterprises ranked mission critical servers from IBM, Lenovo, Huawei and Hewlett-Packard Enterprise (in that order) as the most secure platforms, which experienced the fewest successful data breaches and proved the most formidable for hackers to crack.

Only a minuscule 0.1% of IBM Z mainframes suffered unplanned downtime due to a successful data breach. And just two percent (2%) of IBM Power Systems; two percent (2%) of Lenovo ThinkSystem servers; three percent (3%) of Huawei KunLun and four percent (4%) of HPE Superdome servers experienced downtime, application inaccessibility and productivity disruptions due to security attacks.

Those are the results of ITIC’s 2022 Global Server Hardware Security survey which compared the security features and functions of 18 different server platforms. ITIC’s independent Web-based survey polled 1,550 businesses worldwide across 30 different vertical market sectors from January through mid-November 2022.

ITIC’s latest study found that strong security enabled IBM, Lenovo, Huawei and HPE corporate enterprises to lower annual IT operational costs related to cyberattacks by 27% to over 60%, compared to the least secure server hardware distributions.

IBM, Lenovo, Huawei, HPE and Cisco hardware (in that order) recorded the top overall scores in every security category, successfully solidifying and improving their top positions as the most secure and reliable server platforms despite a significant 86% spike in security hacks and data breaches over the past two and a half years.

The top servers led by the IBM Z; IBM POWER; the Lenovo ThinkSystem; the Huawei KunLun and HPE (in that order), all scored their respective best security performances in the latest poll. These vendors achieved the best security results among 18 mainstream server hardware platforms in every security category, including:

  • The fewest number of successful security hacks/data breaches.
  • The least amount of overall unplanned server downtime for any reason and the least amount of unplanned server downtime due to a data breach incident.
  • The fastest Mean Time to Detection (MTTD) from the onset of the attack until the company isolated and shut it down.
  • The fastest Mean Time to Remediation (MTTR) to restore servers, applications and networks to full operation.
  • The least amount of lost, stolen, destroyed, damaged or changed data as a direct consequence of a security data breach (e.g. Ransomware, phishing scam or CEO fraud).
  • The least amount of monetary losses due to a successful security hack.
  • The highest confidence in the embedded security of the server hardware to deliver alerts/warnings and repel security attacks and data breaches.

The IBM Z mainframe outperformed all other server distributions – delivering near foolproof security and true fault tolerant seven nines or better (99.99999%) uptime and reliability. Only a minuscule 0.1% of IBM Z mainframes and 0.2% of IBM LinuxONE III systems experienced a successful security breach.

IBM standalone Power Systems and the Lenovo ThinkSystem servers were in a statistical tie, with only two percent (2%) of respondents reporting a successful hack over the last 12 months. The IBM Power8, Power9 and Power10 servers again delivered top notch security among all mainstream hardware distributions with 95% of survey respondents reporting their firms were able to identify and thwart attempted security penetrations immediately or within the first 10 minutes of detection.

The Lenovo ThinkSystem servers achieved the best security scores among all x86 server distributions for the fourth year in a row. Lenovo ThinkSystem servers similarly delivered the best MTTD rates among all Intel x86-based servers. A 95% majority of Lenovo ThinkSystem survey respondents said their IT and security administrators detected and repelled attempted hacks and data breaches immediately or within the first 10 minutes of the penetration.

Huawei’s KunLun mission critical platform was close behind with three percent (3%) of customers experiencing a successful hack, while four percent (4%) of HPE Integrity Superdome customers said they had a successful security breach over the last year.

Just over one-in-ten or 11% of Cisco UCS servers were successfully hacked. Cisco’s hardware performed extremely well, particularly considering that a large portion of UCS servers are deployed in remote locations and at the network edge. Inexpensive unbranded White box servers again proved the most porous – nearly half – 48% – of survey respondents said their businesses were hacked. This is a four percent (4%) increase compared to ITIC’s 2021 survey.

Security is, and will remain, the number one issue that either fortifies or undermines the reliability of mission critical server hardware, server operating systems and applications. Businesses that hope to keep their data assets secure and ensure continuous, uninterrupted operations are well advised to deploy the most secure server hardware, server OS and application infrastructure. Security will continue to rank as the number one cause of unanticipated downtime for the foreseeable future. Any organization that ignores security does so at its own risk. Ask yourselves: what does my organization have to lose and how much is my company willing to risk?


Server and Application Reliability by the Numbers: Understanding “The Nines”

Reliability/Uptime by the Numbers

Organizations measure server and application reliability percentages in “nines.” There is an order of magnitude difference of server and application reliability and uptime between each additional “nine.”  Four nines – 99.99% – reliability equals 52.56 minutes of unplanned per server/per annum downtime or 4.32 minutes of per server monthly unplanned downtime (See Table 1). By contrast, five nines – 99.999% – is the equivalent of 5.26 minutes of unplanned per server/per annum and just 25.9 seconds of monthly unplanned system downtime. The highly sought after continuous uptime and availability levels of six nines equals a near-imperceptible 2.59 seconds of per server unplanned monthly downtime, while seven nines equals 3.15 seconds of yearly system downtime.

Table 1 below depicts the availability percentages and the equivalent amount of per server downtime per year, per month and per week. It illustrates the business and monetary impact on operations. ITIC publishes this table in every one of its Global Server Hardware, Server OS Reliability reports. It serves as a useful reference guide to enable organizations to calculate downtime and determine their levels of server uptime.

Table 1: Reliability/Uptime by the Numbers

Reliability %               Downtime per year   Downtime per month   Downtime per week
90% (one nine)              36.5 days           72 hours             16.8 hours
95%                         18.25 days          36 hours             8.4 hours
97%                         10.96 days          21.6 hours           5.04 hours
98%                         7.30 days           14.4 hours           3.36 hours
99% (two nines)             3.65 days           7.20 hours           1.68 hours
99.5%                       1.83 days           3.60 hours           50.4 minutes
99.8%                       17.52 hours         86.23 minutes        20.16 minutes
99.9% (three nines)         8.76 hours          43.8 minutes         10.1 minutes
99.95%                      4.38 hours          21.56 minutes        5.04 minutes
99.99% (four nines)         52.56 minutes       4.32 minutes         1.01 minutes
99.999% (five nines)        5.26 minutes        25.9 seconds         6.05 seconds
99.9999% (six nines)        31.5 seconds        2.59 seconds         0.605 seconds
99.99999% (seven nines)     3.15 seconds        0.259 seconds        0.0605 seconds

Source: ITIC 2022 Global Server Hardware, Server OS Reliability Survey
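
For organizations that want to reproduce or extend Table 1 for their own availability targets, the following Python sketch shows one way to do the conversion. It is an illustration under stated assumptions, not ITIC’s published method: the helper names downtime_minutes and downtime_row are hypothetical, and the period lengths (365-day year, 30-day month, 7-day week) are assumed here. A few monthly entries in the table appear to use a slightly different month length, so results may differ from those entries in the second decimal place.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes(availability_percent: float, period_days: float) -> float:
    """Minutes of downtime allowed in a period at the given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * MINUTES_PER_DAY


def downtime_row(availability_percent: float) -> dict:
    """Downtime per year, month and week, all expressed in minutes.

    Assumptions: 365-day year, 30-day month, 7-day week.
    """
    return {
        "year": downtime_minutes(availability_percent, 365),
        "month": downtime_minutes(availability_percent, 30),
        "week": downtime_minutes(availability_percent, 7),
    }


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        row = downtime_row(pct)
        print(f"{pct}%: {row['year']:.2f} min/year, "
              f"{row['month']:.2f} min/month, {row['week']:.3f} min/week")
```

Returning every value in minutes keeps the function simple; converting to days, hours or seconds for display is left to the caller.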

The aforementioned metrics clearly underscore that the IBM z14, z15 and the newest z16, along with the LinuxONE III platform, continue to deliver the highest levels of reliability, with just 0.0043 minutes of unplanned monthly per server downtime. This equates to just 3.15 seconds of unplanned per server annual downtime, which is the equivalent of “seven nines” of true fault tolerant uptime. They were followed closely by the IBM Power8, Power9 and Power10 with one (1) minute of per server unplanned monthly downtime and the Lenovo x86-based ThinkSystem with 1.10 minutes of per server unplanned downtime each month. In practical terms, this means there is minimal or imperceptible impact on daily business operations, end user productivity and corporate revenue.

In 2022 and heading into 2023, a price tag of $100,000 (USD) for one hour of downtime for a single server is extremely conservative for all but the smallest micro SMBs with one to 25 employees. It equates to $1,670 per minute/per server. Hourly cost of downtime calculated at $300,000 equals about $5,000 per server/per minute. The cost of a more severe or protracted hourly outage that a business estimated at $1 million (USD) is the equivalent of $16,700 per server/per minute.
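
The per minute figures above follow from dividing the hourly cost by 60. A minimal Python sketch of that conversion is shown here; the function name cost_per_minute is an assumption introduced for illustration. The text rounds the results to $1,670 and $16,700, while the unrounded values are closer to $1,667 and $16,667.

```python
def cost_per_minute(hourly_cost_usd: float) -> float:
    """Convert an hourly downtime cost for one server into a per minute cost."""
    return hourly_cost_usd / 60.0


if __name__ == "__main__":
    # Hourly cost levels taken from the paragraph above.
    for hourly in (100_000, 300_000, 1_000_000):
        print(f"${hourly:,} per hour = ${cost_per_minute(hourly):,.2f} per minute/per server")
```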

ITIC’s 2022 Global Server Hardware and Server OS Reliability Survey found that 91% of respondents now estimate that one hour of downtime costs the firm $301,000 or more; this is an increase of two (2) percentage points in less than two years. Of that number, 44% of those polled indicated that hourly downtime costs now exceed $1 million. Since 2021, only one percent (1%) of respondents said a single hour of downtime costs them $100,000 or less. Nine percent (9%) of respondents valued hourly downtime at $101,000 to $300,000.

There are many cost variables. For instance, an issue that takes down one or more servers running a non-business essential application, or downtime that occurs in off-peak or non-usage hours, may have minimal to no impact on business operations and negligible financial consequences.

On the other end of the spectrum, cloud-based server outages involving a virtualized server running two, three or four instances of a business-critical application housed in a single physical machine have the potential to double, triple or quadruple business losses when daily business operations are interrupted and employees and business partners, suppliers and other stakeholders are denied access to critical data.

The most expensive hourly downtime scenario presented in Table 2 depicts the per minute outage expense for 1,000 servers at an organization that values an hour of downtime at $10 million. In this example, a large enterprise could conceivably sustain crippling losses of about $166,667 per server/per minute, or $166,667,000 per minute across all 1,000 servers.
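
The worst-case figure can be reproduced with a short calculation. The Python sketch below is illustrative only: the function name fleet_outage_cost is hypothetical, and it assumes that the $10 million hourly valuation applies to each server and that losses scale linearly with the number of affected servers and the length of the outage. Real outages rarely scale this cleanly.

```python
def fleet_outage_cost(hourly_cost_per_server_usd: float,
                      servers: int,
                      outage_minutes: float = 1.0) -> float:
    """Estimate total outage cost across a group of servers.

    Assumptions: the hourly cost applies to each server, and losses
    grow linearly with server count and outage duration.
    """
    per_server_per_minute = hourly_cost_per_server_usd / 60.0
    return per_server_per_minute * servers * outage_minutes


if __name__ == "__main__":
    one_server = fleet_outage_cost(10_000_000, servers=1)
    all_servers = fleet_outage_cost(10_000_000, servers=1_000)
    print(f"One server, one minute: ${one_server:,.0f}")
    print(f"1,000 servers, one minute: ${all_servers:,.0f}")
```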

The aforementioned ITIC Hourly Downtime monetary figures represent only the costs associated with remediating the actual technical issues and business problems that caused the server or OS to fail. They do not include legal fees, criminal or civil penalties the company may incur or any “goodwill gestures” that the firm may elect to pay customers (e.g., discounted or free equipment or services).


ITIC 2022 Global Server Reliability Survey Finds IBM Z, IBM Power Systems, Lenovo ThinkSystem deliver top reliability

  • The IBM Z and IBM Power Systems continue to dominate, delivering the best server reliability, uptime and security for the 14th straight year.
  • Lenovo’s ThinkSystem servers provide the top reliability and security among all x86 server distributions for nearly nine straight years.
  • Huawei KunLun, Hewlett-Packard Enterprise (HPE) Superdome mission critical servers also register high reliability and security rankings challenging the leaders. Cisco continues to up its game with robust network edge reliability and security.
  • IBM Z and IBM Power Systems deliver over 40x more uptime than least efficient “White box” platforms and 60x lower Total Cost of Ownership (TCO). The Lenovo ThinkSystem, Huawei KunLun and HPE Superdome (in that order) delivered the highest reliability among x86 platforms.
  • Over three-quarters of businesses – 78% – cite security as the top cause of unplanned downtime and 64% said human error causes unplanned outages.

 

 

Mission critical server and server OS distributions from IBM, Lenovo, Hewlett-Packard Enterprise (HPE), Huawei and Cisco continue to deliver the highest levels of inherent reliability and availability among 18 different server platforms despite a continuing spike in security hacks, increasing ecosystem complexities and ongoing supply chain challenges.

For the 14th consecutive year, the IBM Z, the LinuxONE III and the IBM Power Systems remained the preeminent server platforms, posting the best across-the-board reliability ratings among 18 mainstream distributions. Some 96% of IBM Z mainframe and LinuxONE III server customers recorded seven nines (99.99999%) of true fault tolerant reliability and availability. The IBM Z and LinuxONE III recorded a near-imperceptible 0.0043 minutes of per server unplanned monthly outages or just 3.15 seconds of unplanned per server downtime annually (See Table 1). They were followed by the IBM Power Systems: 93% of clients said their IBM systems achieved five and six nines of system reliability and availability (See Exhibit 1). The IBM Power8, Power9 and Power10 servers posted just one (1) minute each of unplanned per server monthly downtime.

The Lenovo ThinkSystem servers followed closely and posted the highest levels of reliability among all x86 hardware distributions for the eighth consecutive year. A 92% majority of Lenovo servers attained five and six nines of reliability, posting just over one minute – 1.10 – of unplanned per server monthly downtime. The Huawei KunLun and Fusion servers, the HPE Superdome and the Cisco UCS hardware (in that order), rounded out the top five most reliable server platforms.

Those are the results of the ITIC 2022 Global Server Hardware, Server OS Reliability independent Web-based survey. It polled 1,550 corporations across 30 vertical market segments worldwide on the reliability, performance and security of the leading mainstream on-premises and cloud-based servers from July through mid-November 2022. In order to maintain objectivity, ITIC accepted no vendor sponsorship.

The increased server and server operating system uptime and availability enabled the IBM, Lenovo, Huawei, HPE and Cisco servers (in that order) to deliver the most economical Total Cost of Ownership (TCO) among all mainstream distributions in datacenters, at the network edge and in hybrid cloud environments.

The Lenovo ThinkSystem servers likewise improved their uptime and availability recording the best reliability among all x86 servers – a scant 1.10 minutes of per server unplanned monthly outages. The Huawei KunLun and Fusion platforms also improved uptimes with 1.27 minutes each of unplanned per server outage, along with the HPE Superdome platform which averaged 1.44 minutes of unanticipated per server downtime. Cisco’s UCS servers also hung tough. Cisco servers frequently are installed at the network edge/perimeter, which is often the first line of attack. The Cisco UCS servers registered two (2) minutes of monthly unplanned per server downtime.

The top server reliability vendors – led by IBM, Lenovo, HPE and Huawei – also delivered the strongest server security, experiencing the fewest number of successful data breaches and the least amount of downtime due to security-related incidents.
