ITIC 2024 Hourly Cost of Downtime Part 2

There’s No Right Time for Downtime

                                                      

ITIC’s 11th annual Hourly Cost of Downtime Survey data indicates that 97% of large enterprises with more than 1,000 employees say that, on average, a single hour of downtime per year costs their company over $100,000. Even more significantly, four in 10 enterprises – 41% – indicate that hourly downtime costs their firms $1 million to over $5 million (See Exhibit 1). It’s important to note that these statistics represent the “average” hourly cost of downtime. In a worst-case scenario – such as a catastrophic outage that occurs during peak usage times or an event that disrupts a crucial business transaction – the monetary losses to the organization can reach and even exceed millions of dollars per minute.

Additionally, businesses in highly regulated vertical industries – Banking and Finance, Food, Energy, Government, Healthcare, Hospitality, Hotels, Manufacturing, Media and Communications, Retail, Transportation and Utilities – must also factor in the potential losses related to litigation. Businesses may also be liable for civil penalties stemming from their failure to meet Service Level Agreements (SLAs) or Compliance Regulations. Moreover, for select organizations whose businesses are based on compute-intensive data transactions, like stock exchanges or utilities, losses may be calculated in millions of dollars per minute.

ITIC’s most recent poll – conducted in conjunction with the ITIC 2023 Global Server Hardware, Server OS Reliability Survey – found that a 90% majority of organizations now require a minimum of 99.99% availability. This is up from 88% two and a half years ago. The so-called “four nines” of reliability equals 52.56 minutes of unplanned per server/per annum downtime for mission critical systems and applications, or 4.32 minutes of unplanned monthly outages for servers, applications, and networks.

All categories of businesses were represented in the survey respondent pool: 27% were small/midsized business (SMB) firms with up to 200 users; 28% came from the small/midsized enterprise (SME) sector with 201 to 1,000 users; and 45% were large enterprises with over 1,000 users.

These statistics are not absolute. They are the respondents’ estimates of the cost of one hour of downtime due to interrupted transactions, lost/damaged data, and end user productivity losses that negatively impact corporations’ bottom lines. These figures exclude the cost of litigation, fines, or civil or criminal penalties associated with regulatory non-compliance violations. These statistics are also exclusive of any voluntary “good will” gestures a company elects to make of its own accord to its customers and business partners that were negatively affected by a system or network failure. Protracted legal battles, out-of-court settlements, fines and voluntary good-will gestures do take a toll on the company’s revenue and cause costs to skyrocket further – even if they help the firm retain the customer account. There are also “soft costs” that are more elusive and difficult to measure, but nonetheless negative. These include damage to the company’s reputation, which may result in untold lost business and persist for months and years after a highly publicized incident.

To reiterate: in today’s Digital Age of “always on” networks and connectivity, organizations have no tolerance for downtime. It is expensive and risky. And it is just plain bad for business.

Only four percent (4%) of enterprise respondents said that downtime costs their companies less than $100,000 in a single 60-minute period; of that number, an overwhelming 93% majority were micro SMBs with fewer than 10 employees. Downtime costs are similarly high for small and midsized businesses (SMBs) with 11 to 200 employees. To reiterate, these figures are exclusive of penalties, remedial action by IT administrators and any ensuing monetary awards resulting from litigation or civil or criminal non-compliance penalties.

 

Do the Math: Downtime Costs Quickly Add Up

 

Downtime is expensive for all businesses – from global multinational corporations to small businesses with fewer than 20, 50 or 100 employees. Hourly losses of hundreds of thousands or millions of dollars – or even losses per minute in transaction-heavy environments – are unfortunately commonplace. Exhibit 3 depicts the per server/per minute cost of downtime for outages involving a single server up to as many as 1,000 servers, for businesses that calculate hourly downtime costs at anywhere from $100,000 to $10 million (USD).

As Exhibit 3 illustrates, one minute of downtime for a single server at a company that calculates its hourly cost of downtime for a mission critical server or application at $100,000 is $1,667; that figure rises to $16,670 per minute when downtime affects 10 servers and main line of business applications/data assets. The chart graphically emphasizes how quickly downtime costs add up for corporate enterprises.
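
For readers who want to reproduce this arithmetic, the sketch below is a minimal, illustrative Python calculation, not an ITIC tool: it simply divides the estimated hourly cost by 60 and multiplies by the number of affected servers, mirroring the Exhibit 3 figures. The function name and sample values are hypothetical.

```python
# Illustrative sketch: per-minute downtime cost derived from an hourly estimate,
# scaled by the number of affected servers (mirrors the Exhibit 3 arithmetic).

def downtime_cost_per_minute(hourly_cost_usd: float, servers_affected: int = 1) -> float:
    """Estimated cost of one minute of downtime across the affected servers."""
    return (hourly_cost_usd / 60.0) * servers_affected

if __name__ == "__main__":
    # $100,000/hour, single server: roughly $1,667 per minute
    print(f"1 server @ $100K/hr:   ${downtime_cost_per_minute(100_000, 1):,.0f}/minute")
    # Same hourly estimate across 10 affected servers: roughly $16,670 per minute
    print(f"10 servers @ $100K/hr: ${downtime_cost_per_minute(100_000, 10):,.0f}/minute")
    # SMB examples cited below: $10,000/hour and $25,000/hour
    print(f"SMB @ $10K/hr:         ${downtime_cost_per_minute(10_000):,.0f}/minute")
    print(f"SMB @ $25K/hr:         ${downtime_cost_per_minute(25_000):,.0f}/minute")
```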

Small businesses are equally at risk, even if their potential downtime costs are a fraction of those of large enterprises. For example, an SMB that estimates one hour of downtime “only” costs the firm $10,000 could still incur a cost of $167 for a single minute of per server downtime on its business-critical server. Similarly, an SMB that assumes one hour of downtime costs the business $25,000 could still potentially lose an estimated $417 per server/per minute. With few exceptions, micro SMBs – with 1 to 20 employees – typically would not rack up hourly downtime costs of hundreds of thousands or millions of dollars. Small companies, however, typically lack the deep pockets, larger budgets, and reserve funds of their enterprise counterparts to absorb financial losses or potential litigation associated with downtime. Therefore, the resulting impact can be as devastating for them as it is for enterprise firms.

Hourly downtime costs of $25,000, $50,000 or $75,000 (exclusive of litigation or civil and even criminal penalties) may be serious enough to put the SMB out of business – or severely damage its reputation and cause it to lose business.

 

Hourly Downtime Costs Exceed $5 Million for Top Verticals

 

As Exhibit 4 shows, ITIC’s Hourly Cost of Downtime survey revealed that for large enterprises the costs associated with a single hour of downtime are much higher. Average hourly outage costs topped the $5 million (USD) mark for the top verticals. These include Banking/Finance; Government; Healthcare; Manufacturing; Media & Communications; Retail; Transportation and Utilities.

Once again, in 2024, except in rare and specific instances, corporations rely almost totally on their personal and employer-owned interconnected networks and applications to conduct business. Corporate revenue and productivity are inextricably linked to the reliability and availability of the corporate network and its data assets. When servers, applications and networks are unavailable for any reason, business and productivity slow down or cease completely.

The minimum reliability/uptime requirements for the top vertical market segments are even more stringent and demanding than the corporate averages in over 40 other verticals as Exhibit 5 below illustrates.

The above industries are highly regulated and subject to strict compliance laws. But even without regulatory oversight, the top vertical market segments are highly visible. Their business operations demand near flawless levels of uninterrupted, continuous operation. In the event of an unplanned outage of even a few minutes, when users cannot access data and applications, business stops and productivity ceases.

These statistics reinforce what everyone knows: infrastructure, security, data access and data privacy and adherence to regulatory compliance are all imperative.

Server hardware, server OS and application reliability all have direct and far-reaching consequences on the corporate bottom line and ongoing business operations.  Unreliable and unavailable server hardware, server operating systems and applications will irreparably damage companies’ reputation.

In certain extreme cases, business and monetary losses resulting from unreliable servers can cause an enterprise to miss its quarterly or annual revenue forecasts or even go out of business as a direct consequence of sustained losses and litigation brought on by the outage.

 

Minimum Reliability Requirements Increase Year over Year

Time is money. Time also equates to productivity and the efficiency and continuity of ongoing, uninterrupted daily operations. If any of these activities are compromised by outages for any reason – technical or operational failure that renders the systems and the data unavailable – business grinds to a halt. This negatively impacts the corporate enterprise. The longer the outage lasts, the higher the likelihood of having a domino effect on the corporation’s customers, business partners and suppliers. This in turn will almost certainly raise Total Cost of Ownership (TCO) and undermine the return on investment (ROI).

High reliability and high availability are necessary to manage the corporation’s level of risk and exposure to liability and potential litigation resulting from unplanned downtime and potential non-compliance with regulatory issues. This is evidenced by corporations’ reliability requirements which have increased every year for the past 11 years that ITIC has polled organizations on these metrics.

Consider the following: in 2008, the first year that ITIC surveyed enterprises on their Reliability requirements, 27% of businesses said they needed just 99% uptime; four-in-10 corporations – 40% – required 99.9% availability. In that same 2008 survey, only 23% of firms indicated they required a minimum of “four nines” or 99.99% uptime for their servers, operating systems, virtualized and cloud environments, while a seven percent (7%) minority demanded the highest levels of “five nines” – 99.999% or greater availability.

A decade ago, in ITIC’s 2014 Hourly Cost of Downtime poll, 49% of businesses required 99.99% or greater reliability/uptime; that figure increased 39 percentage points over the following six years, through fall 2020. Four nines – 99.99% – or greater reliability is now the minimum standard for mission critical systems and applications. In our 2020 survey, none – 0% – of survey respondents indicated their organizations could live with just “two nines” – 99% uptime, or 88 hours of annual unplanned per server downtime.

As Exhibit 5 illustrates, “four nines” or 99.99% uptime and availability is the average minimum requirement for 88% of organizations. However, more and more companies – an overall average of 25% of respondents across all vertical industries as of November 2020 – say their businesses now require “five nines” or 99.999% server and operating system uptime. This equates to 5.26 minutes of per server/per annum unplanned downtime. And three percent (3%) of leading-edge businesses need “six nines” – 99.9999% – near-flawless, mainframe-class fault tolerant server availability, which equals just 31.5 seconds of unplanned per server downtime annually.

Increasingly many organizations have even more stringent reliability needs. Requirements of “five and six nines” – 99.999% and 99.9999% – reliability and availability are becoming much more commonplace among all classes of businesses.  The reasons are clear: corporations have no tolerance for downtime. They, their end users, business partners, customers, and suppliers all demand uninterrupted access to data and applications to conduct business 24 x7 irrespective of geographic location.

Security Attacks: End Users are Biggest Culprits in Downtime

 

ITIC’s latest survey results found that security issues and end user carelessness were among the top causes of unplanned system and network downtime in 2020. ITIC expects this trend to continue throughout 2021 and beyond as organized hackers launch ever more sophisticated and pernicious targeted attacks.

 

  • Security attacks – including targeted attacks by organized hackers, Ransomware attacks, Phishing and Email scams and CEO fraud hacks – now rank as the top cause of downtime, according to 84% of ITIC survey respondents (See Exhibit 6)
  • User Error is also an increasingly common contributor to corporate downtime and is now rated among the top three causes of company outages, along with software flaws/bugs, according to 69% of ITIC survey respondents. End user carelessness encompasses everything from employees being careless with and losing their own and company-owned BYOD devices – laptops, tablets, and mobile phones – to falling for online scams. Many users fail to properly secure their devices; when those devices are lost or stolen, the intellectual property (IP) and sensitive data on them is easily accessible to prying eyes and thieves. End user carelessness also manifests itself in other ways: many naïve users click on bad links and fall prey to Phishing scams and CEO fraud, leaving themselves and their companies wide open to security hacks.

Over the last 10 years, the percentage of enterprises unable to calculate the hourly cost of downtime has consistently outpaced those able to estimate it. Of the 39% that responded “Yes,” only 42% can make detailed downtime estimates. Overall, only 22% of organizations – approximately one in five – can accurately assess the hourly cost of downtime and its impact on productivity, daily operations/transactions and the business’ bottom line.

 

Consequences of Downtime

 

There is never an opportune time for an unplanned network, system, or service failure. The hourly costs associated with downtime paint a grim picture. But to reiterate, they do not tell the whole story of just how devastating downtime can be to the business’ bottom line, productivity, and reputation.

 

The ITIC survey data revealed that although monetary losses topped users’ list of downtime concerns, it was one of several factors worrying organizations. The top five business consequences that concerned users are (in order):

  • Transaction/sales losses.
  • Lost/damaged data.
  • Customer dissatisfaction.
  • Restarting/return to full operation.
  • Regulatory compliance exposure.

 

The National Archives and Records Administration statistics indicate that 93% of organizations that experience a data center failure go bankrupt within a year.

 

Consider these scenarios:

 

  • Healthcare: A system failure during an operation could jeopardize human lives. Additionally, targeted hacks by organized groups of professional “black hat” hackers increasingly seek out confidential patient data like Social Security numbers, birth records and prescription drug information. Healthcare is one of the most highly regulated vertical industries and the U.S. and other countries’ government agencies worldwide are aggressively penalizing physicians, clinics, hospitals and healthcare organizations that fail to live up to regulatory compliance standards with respect to privacy and security.
  • Banking and Finance: Unplanned outages during peak transaction time periods could cause business to grind to a halt. Banks and stock exchanges could potentially be unable to complete transactions such as processing deposits and withdrawals and customers might not be able to access funds in ATM machines. Brokerage firms and stock exchanges routinely process millions and even billions of transactions daily. The exchanges could lose millions of dollars if transactions or trading were interrupted for just minutes during normal business hours. Financial institutions and exchanges are also among the most heavily regulated industries. Any security breaches will be the subject of intense scrutiny and investigation.
  • Government Agencies: A system failure within the Social Security Administration (SSA) that occurs when the agency is processing checks could result in delayed payments, lost productivity and require administrators to spend hours or days in remedial action.
  • Manufacturing: The manufacturing vertical is one of the top verticals targeted by hackers, surpassing the healthcare industry. According to the US National Center for Manufacturing Sciences (NCMS), 39% of all cyber-attacks in 2016 were against the manufacturing industry. Since January of 2017 and continuing to the present, March 2024, attacks against manufacturing firms are up 38%, driven by technologies like Machine Learning (ML), Artificial Intelligence (AI) and IoT. Manufacturers are often viewed as “soft targets” or easy points of entry into other types of enterprises and even government agencies. Efficiency and uninterrupted productivity are staples and stocks in trade in the manufacturing arena. Any slips are well documented and usually well publicized. The manufacturing shop floor has a near total reliance on robotics, machines and automated networks to get the job done. There are literally thousands of potential entry points – or vulnerability points – into the network. The implementation of industrial control systems (ICS), centralized command centers that control and connect processes and machines, and Internet of Things (IoT) external device integration like cameras and robotics, add multiple points of process failure and access points with possible wormholes allowing hackers to infiltrate larger networks.
  • Retail: Retailers and sales force personnel trying to close end-of-quarter results would be hard pressed if an outage occurred, which rendered them unable to access or delay access to order entries, the ability to log sales and issue invoices. This could have a domino effect on suppliers, customers, and shareholders.
  • Travel, Transportation and Logistics (TT&L): An outage at the Federal Aviation Administration’s (FAA) air traffic control systems could cause chaos: air traffic controllers would find it difficult to track flights and flight paths, raising the risk of massive delays and, in a worst-case scenario, airborne and even runway collisions. An airline reservation system outage of even a few minutes would leave the airline unable to process reservations and issue tickets and boarding passes via online systems. This scenario has occurred several times in recent years; just about every major airline has experienced costly outages over the last five years, and this continues to the present day. A June 2019 report released by the U.S. Government Accountability Office (GAO) confirmed that 34 airline IT outages occurred over a three-year span encompassing the years 2015 through 2018. According to the GAO, “about 85% of these led to flight delays or cancellations.” In 2023 and 2024, the aviation industry has been hit hard by flight cancellations and delays due to faulty aircraft components before and during flights. This almost always results in a domino effect that impacts other businesses and causes supply chain disruptions. Additionally, the U.S. Department of Transportation’s 2023 Transportation Statistics Annual Report[1] noted that the entire U.S. transportation system is vulnerable to cyber and electronic disruptions. This is particularly true in the aviation system, which is dependent on electronic and digital navigation aids, communication systems, command and control technologies, and public information systems. Outages and cybersecurity issues also plague other transportation sectors like the trucking and auto industries. Cyber incidents pose a variety of threats to transportation systems. Cyber vulnerabilities have been documented in multimodal operational systems, control centers, signaling and telecommunications networks, drawbridge operations, transit and rail operations, pipelines, and other existing and emerging technologies. State and local governments face growing threats from hackers and cybercriminals, including those who use ransomware that hijacks computer systems, encrypts data, and locks machines, holding them hostage until victims pay a ransom or restore the data on their own. In February 2018, hackers struck the Colorado Department of Transportation in two ransomware attacks that disrupted operations for weeks. State officials had to shut down 2,000 computers, and transportation employees were forced to use pen and paper or their personal devices instead of their work computers. A 2021 report by SOTI, a global management firm headquartered in Ontario, Canada, found that in the trucking industry, “…When trucks are on the road, T&L companies make money. When they’re not, they’re losing money due to the cost of downtime which averages $448 to $760 USD per vehicle per day.”

 

When downtime occurs, business grinds to a halt, productivity is impaired and the impact on customers follows almost immediately. Aside from the immediate consequences of being unable to conduct business and the loss of revenue, unreliable systems also undermine quality of service, potentially causing firms to be unable to meet SLAs. This can lead to breach of contract, cause the company to lose business and customers, and put it at risk for expensive penalties and litigation. Even in the absence of formal litigation, many organizations will offer their customers, business partners and suppliers goodwill concessions in the form of future credits or discounts for the inconvenience and as a way of retaining the business and mitigating the damage to the company’s reputation. These goodwill actions/concessions, while advisable, inevitably affect the corporate bottom line.

 

Conclusions

 

The hourly cost of downtime will continue to rise. Downtime of any duration is expensive and disruptive. When a mission critical application, server or network is unavailable for even a few minutes, the business risks increase commensurately. They include:

 

  • Lost productivity.
  • Lost, damaged, destroyed, changed or stolen data.
  • Damage to the company’s reputation, which can potentially result in lost business.
  • Potential for litigation by business partners, customers, and suppliers.
  • Regulatory compliance exposure.
  • Potential for civil, criminal liabilities, penalties and even jail time for company executives.
  • Potential for unsustainable losses which can result in companies going out of business.

 

All these issues create pressure on organizations and their IT and security departments to ensure very high levels of system availability and avoid outages at all costs. Some 90% of businesses now require a minimum of 99.99% or greater system and network availability. Additionally, in 2024, 44% of companies say they strive for 99.999% uptime and availability which is the equivalent of 5.26 minutes of per server annual unplanned downtime. ITIC anticipates that these numbers will continue to increase.

 

If the network or a critical server or application stops, so does the business.

 

Ensuring network availability is challenging in today’s demanding business environment. Mission critical servers and crucial line of business applications are increasingly located or co-located in public and hybrid cloud environments. Cloud deployments are almost exclusively virtualized, with multiple instances of mission critical applications housed in a single server.  Without proper management, security and oversight, there is the potential for higher collateral damage in the event of any unplanned outage or system disruption.

 

Additional risk of downtime is also posed by the higher number of devices that are interconnected via IoT networks utilizing Analytics, BI and AI. IoT networks facilitate communications among applications, devices, and people. In IoT ecosystems in which devices, data, applications, and people are all interconnected, there is a heightened risk of collateral damage and potential security exposures in the event of an unplanned outage.

 

Technologies like cloud, virtualization, mobility, BYOD, IoT, Analytics, BI, AI and AIOps all deliver tangible business and technology benefits and economies of scale that can drive revenue, lower Total Cost of Ownership (TCO) and accelerate Return on Investment (ROI). But they are not foolproof and there are no panaceas.

 

In many businesses, mobility – whether because of employee travel or remote work – is commonplace. BYOD usage is also common. Employees routinely use their personal mobile devices as well as company owned laptops, tablets, smart phones, and other devices to access the corporate data networks remotely. IT administrators have more devices to monitor, larger and more complex applications to provision and monitor and more endpoints and network portals to secure. Many enterprise IT shops are under-staffed and overworked. And many SMB firms with 20, 50 or even 100 employees may have limited, part-time or even no dedicated onsite IT or security administrators.

 

Time is money.

To reiterate, downtime will almost certainly have a negative impact on companies’ relationships with customers, business suppliers and partners.

To minimize downtime, increase system and network availability, and minimize risk, enterprises must ensure that robust reliability is an inherent feature in all servers, network connectivity devices, applications, and mobile devices. This requires careful tactical and strategic planning to construct a solid business and technology strategy. A crucial component of this strategy is to deploy the appropriate device and network security and monitoring tools. Every 21st Century network environment needs robust security on all of its core infrastructure devices and systems (e.g., servers, firewalls, routers, etc.) and continuous, comprehensive end-to-end monitoring for complex, distributed applications in physical, virtual and cloud environments.

 

In summary, it is imperative that companies of all sizes and across every vertical market discipline thoroughly review every instance of downtime and estimate all the associated monetary costs; the impact on internal productivity; remediation efforts and the business risk to the organization. Companies should also determine whether customers, suppliers and business partners experienced any negative impact, e.g. unanticipated downtime, lost productivity/lost business, security exposures because of the outage.

 

All appropriate corporate stakeholders, from C-suite executives and IT and security administrators to department heads and impacted workers, should have a hand in correctly calculating the hourly cost of downtime. Companies should then determine how much downtime and risk the corporation can withstand.

 

It is imperative that all businesses from micro SMBs to the largest global enterprises calculate the cost of employee and IT and security administrative time in terms of monetary and business costs. This includes the impact on productivity; data assets as well as the time it takes to remediate and restore the company to full operation. Companies must also fully assess and estimate how much risk their firms can assume, including potential liability for municipal, state, federal and even international regulatory compliance laws.

 

[1] U.S. Department of Transportation, “Transportation Statistics Annual Report 2023,” pp. 1-39, URL: https://www.bts.gov/sites/bts.dot.gov/files/2023-12/TSAR-2023_123023.pdf

 


ITIC 2024 Hourly Cost of Downtime Report Part 1

Cost of Hourly Downtime Exceeds $300,000 for 90% of Firms; 41% of Enterprises Say Hourly Downtime Costs $1 Million to Over $5 Million

ITIC Position

In the 21st century Digital Age of “always on” IoT interconnected systems, AI, analytics and cloud computing, organizations have zero tolerance for downtime. This is true for all organizations – from micro SMBs with 1 to 20 users to Fortune 100 global multinational enterprises with 100,000+ workers. Outages of even a few minutes duration cause business and productivity to grind to a halt; negatively impact reliability and security and place companies at higher risk for regulatory compliance as well as civil and even criminal penalties.

ITIC’s latest research indicates the cost of hourly downtime continues to spike.  The average cost of a single hour of downtime now exceeds $300,000 for over 90% of mid-size and large enterprises. These costs are exclusive of litigation, civil or criminal penalties. These are the results of ITIC’s 2024 Hourly Cost of Downtime Survey, an independent Web survey that polled over 1,000 firms worldwide from November 2023 through mid-March 2024.

Downtime Dangers in a Post-Pandemic World Economy

 ITIC survey data finds the escalating cost of computing/network outages is attributable to several factors:

  • An increase in the number of interconnected devices, systems and networks via the Cloud and the Internet of Things (IoT) ecosystems. Connectivity is a two-edged sword. It facilitates faster, more efficient transmissions and data access. But it also creates a limitless “attack surface” and exponentially increases the number of vulnerability points across the entire corporate ecosystem.
  • An ongoing sharp spike in security vulnerabilities. These include targeted security and ransomware attacks by organized hackers; Email Phishing scams; CEO fraud and a wide range of malware, viruses and rogue code. The spike in security and data breaches was further exacerbated by the COVID-19 global pandemic that forced countries to go on lockdown and businesses to mandate that employees work from home. This, in turn, gave rise to a spate of opportunistic COVID-19 related security scams which continue today.
  • End user carelessness. Everyone from CEOs, knowledge workers, IT and Security administrators, developers, full and part-time employees, and contract workers access corporate servers, applications and information. Users regularly access sensitive data assets and intellectual property (IP) via a wide array of devices and networks. These include company and employee-owned (BYOD) mobile phones, tablets, laptops, and desktops as well as public networks. Unfortunately, many of these devices and networks lack adequate security. Absent up-to-date security and encryption, a lost or stolen device leaves the company’s data assets as well as personal employee information and the data of corporate customers, business partners and suppliers all potentially vulnerable and exposed.
  • Organizations’ near-total reliance on computers and networks to conduct business. Downtime of even a few minutes interrupts productivity and daily business operations. Downtime also has a domino effect, even if no data is lost, stolen, changed, destroyed, or hacked.

ITIC anticipates that all these trends – particularly security and data breaches as well as the trend towards remote working and remote learning — will continue unabated. The hourly cost of downtime will continue to rise.  It is imperative that organizations implement the necessary measures to ensure the reliability and security of their hardware, software applications and connectivity devices across the entire network ecosystem. Security and security awareness training are necessary to maintain the uptime and availability of devices and data assets. This will ensure continuous business operations and mitigate risk.

 

Security and Human Error are Chief Culprits Causing Downtime

Continuing a trend that has manifested over the past three years, security and human error, followed by software flaws and bugs, are the chief issues that undermine server hardware/server operating system, application software, appliance and network reliability, resulting in unplanned downtime.

Unsurprisingly, in the 21st Century Digital Age, the functionality and reliability of the core foundation server hardware and server operating systems are more crucial than ever. The server hardware and the server OSes are the bedrock upon which the organization’s mainstream line of business (LOB) applications rest. High reliability and near continual system and application availability are imperative for organizations’ on-premises, cloud based and Network Edge/Perimeter environments. Infrastructure – irrespective of location – is essential to the overall health of business operations.

The inherent reliability and robustness of server hardware and the server operating systems are the single most critical factors that influence, impact, and ultimately determine the uptime and availability of mission critical line of business applications, the virtual machines (VMs) that run on top of them and the connectivity devices that access them.

Additionally, ITIC’s latest 2024 Reliability research reveals that a variety of external factors are having more of a direct impact on system downtime and overall availability. These include overworked and understaffed IT departments; the rapid mainstream adoption of complex new technologies such as the aforementioned IoT, Big Data Analytics, virtualization and increasing cloud computing deployments and the continuing proliferation of BYOD and mobility technologies.

In the context of its annual Reliability and Security surveys, ITIC broadly defines human error to encompass both the technology and business mistakes organizations make with respect to their network equipment and strategies.

Human error as it relates to technology includes but is not limited to:

  • Configuration, deployment, and management mistakes.
  • Failure to upgrade or right size servers to accommodate more data and compute intensive workloads.
  • Failure to migrate and upgrade outmoded applications that are no longer supported by the vendor.
  • Failing to keep up to date on patches and security.

Human error with respect to business issues includes:

  • Failure to allocate the appropriate Capital Expenditure and Operational Expenditure funds for equipment purchases and ongoing management and maintenance functions.
  • Failure to devise, implement and upgrade the necessary computer and network infrastructure to address issues like Cloud computing, Mobility, Remote Access, and Bring Your Own Device (BYOD).
  • Failure to construct and enforce strong computer and network security policies.
  • Ignorance of Total Cost of Ownership (TCO), Return on Investment (ROI).
  • Failure to track hourly downtime costs.
  • Failure to track and assess the impact of Service Level Agreements and regulatory compliance issues like Sarbanes-Oxley (SOX) and the Health Insurance Portability and Accountability Act (HIPAA).

On a positive note, the inherent reliability of server hardware and server operating system software as well as advancements in the underlying processor technology, all continue to improve year over year. But the survey results also reveal that external issues, most notably human error and security breaches, have also assumed greater significance in undermining system and network accessibility and performance.

The overall health of network operations, applications, management, and security functions all depend on the core foundation elements: server hardware, server operating systems and virtualization to deliver high availability, robust management and solid security. The reliability of the server, server OS and virtualization platforms form the foundation of the entire network infrastructure. The individual and collective reliability of these platforms has a direct, immediate, and long-lasting impact on daily operations and business results.


 


ITIC 2024 Sexual Harassment, Gender Bias & Equal Pay Survey

This survey polls professional women (including students and interns) in Science, Technology, Engineering, and Math (STEM) disciplines on their real-world experiences dealing with the very serious issues of Sexual Harassment, Gender Bias, and Equal Pay in the workplace and how they deal with them.

 

Take the survey here: https://www.surveymonkey.com/r/VWXRC97

 

Leave a comment along with your email address for a chance to win one of three (3) $100 Amazon gift cards.

All responses are confidential.

 

 


ITIC 2023 Reliability Survey IBM Z Results

The IBM z16 mainframe lives up to its reputation for delivering “zero downtime.”

 

The latest z16 server, introduced in April 2022, delivers nine nines—99.9999999%—of uptime and reliability. This is just over 30 milliseconds – 31.56 milliseconds to be precise – of per server annual downtime, according to the results of the ITIC 2023 Global Server Hardware, Server OS Reliability Survey.

ITIC’s 2023 Global Server Hardware, Server OS Reliability independent web-based survey polled nearly 1,900 corporations worldwide across over 30 vertical market segments on the reliability, performance and security of the leading mainstream on-premises and cloud-based servers from January through July 2023. To maintain objectivity, ITIC accepted no vendor sponsorship.

ITIC’s 2023 Global Server Hardware, Server OS Reliability survey also found that an 88% majority of the newest IBM Power10 server (shipping since September 2021) users say their organizations achieved eight nines—99.999999%—of uptime. This is 315 milliseconds of unplanned, per server, per annum outage time due to underlying system flaws or component failures. So, Power10 corporate enterprises spend just $7.18 per server/per year performing remediation due to unplanned server outages that occurred due to inherent flaws in the server hardware or component parts.

The IBM z16 and Power10 server-specific uptime statistics were obtained by breaking out the results of more than 200 respondent organizations that deployed the z16 since it began shipping in April/May 2022. A 96% majority of these z16 enterprises say their businesses achieved nine nines—99.9999999%—of server uptime. This is the equivalent of a near-imperceptible 31.56 milliseconds of per server annual downtime due to any inherent flaws in the server hardware and its various components (See Table 1).

An IBM spokesperson says that currently the IBM Z mainframe achieves an average of “eight nines” or 99.999999% reliability overall and that statistic includes the various versions (the z13, z14, z15 and z16) of its mainframe enterprise system. IBM has not yet reviewed ITIC’s independent survey data on the z16 results.

To put these statistics into perspective: The latest z16 corporate enterprises and their IT managers spend mere pennies per server/per year performing remediation activities due to unplanned per server outages that occurred due to inherent system failures.

This is the 15th consecutive year that the IBM Z and IBM Power Systems have dominated with the best across-the-board uptime reliability ratings among 18 mainstream distributions.

Additionally, the z16 customers say their firms experienced a 20% to 30% improvement in overall reliability, performance, response times and critical security metrics versus older iterations of the zSystems platforms.

Previous versions of the IBM Z mainframe—the z13, z14 and z15—always delivered best-in-class reliability. ITIC’s 2023 Global Reliability study found that the aggregate average results from all z13, z14 and z15 customers ranged between seven and eight nines of uptime depending on the version, age, server configurations and specific use cases.

There is an order of magnitude of difference between each of the “nines” of uptime and reliability. For example, four nines of uptime—which is the current acceptable level of uptime for many mainstream businesses—equals 52.56 minutes of unplanned annual per server downtime. In contrast, five nines of uptime is the equivalent of just 5.26 minutes of unplanned annual per server downtime.

Meanwhile, the fault-tolerant levels of reliability – seven and eight nines, or 99.99999% and 99.999999% – represent 3.15 seconds and 315 milliseconds, respectively, of unplanned per server annual outages due to server or component failures.

 

The z16: A Quantum Leap Forward in Reliability, Performance and Cloud Functionality

 

The IBM Z mainframes have always delivered best-in-class reliability, performance and security. However, the z16 quite literally takes a quantum leap forward by providing advanced capabilities like on-chip AI inferencing and quantum-safe computing.

The IBM z16 and Power10 servers also delivered the strongest server security, experiencing the fewest number of successful data breaches, the least amount of downtime due to security-related incidents and the fastest mean time to detection (MTTD). ITIC’s latest 2023 Global Server Hardware Security Survey found that 97% of IBM z16 enterprises were able to detect, isolate and shut down attempted data breaches immediately or within the first 10 minutes. Additionally, 92% of IBM Power10 customers detected and repelled attempted hacks immediately or within the first 10 minutes. An organization’s ability to identify and thwart security breaches minimizes downtime, saves money and mitigates risk.

ITIC’s 2023 survey data found that 84% of respondent enterprises cited security issues as the top cause of unplanned downtime. And 67% of respondents cite human error as the cause of unplanned server and application outages. Human error encompasses everything from accidentally disconnecting a server, to misconfiguration issues and incompatibilities among disparate hardware and application and server OS software to failure to properly right size the server to adequately accommodate mission critical workloads.

Overall, the IBM z16 offers near-perfect reliability and the most robust mainstream security available today.


IBM Z, IBM Power Systems & Lenovo ThinkSystem Servers Most Secure, Toughest to Crack

For the fourth straight year, enterprises ranked mission critical servers from IBM, Lenovo, Huawei and Hewlett-Packard Enterprise (in that order) as the most secure platforms which experienced the least amount of successful data breaches and proved the most formidable for hackers to crack.

Only a miniscule 0.1% of IBM Z mainframes suffered unplanned downtime due to a successful data breach. And just two percent (2%) of IBM Power Systems; two percent (2%) of Lenovo Think Systems; three percent (3%) of Huawei KunLun and four percent (4%) of HPE Superdome servers experienced downtime, application inaccessibility and productivity disruptions due to security attacks.

Those are the results of ITIC’s 2022 Global Server Hardware Security survey which compared the security features and functions of 18 different server platforms. ITIC’s independent Web-based survey polled 1,550 businesses worldwide across 30 different vertical market sectors from January through mid-November 2022.

ITIC’s latest study found that strong security enabled IBM, Lenovo, Huawei and HPE corporate enterprises to lower annual IT operational costs related to cyberattacks by 27% to over 60%, compared to the least secure server hardware distributions.

IBM, Lenovo, Huawei, HPE and Cisco hardware (in that order) recorded the top overall scores in every security category, successfully solidifying and improving their top positions as the most secure and reliable server platforms despite a significant 86% spike in security hacks and data breaches over the past two and a half years.

The top servers led by the IBM Z; IBM POWER; the Lenovo ThinkSystem; the Huawei KunLun and HPE (in that order), all scored their respective best security performances in the latest poll. These vendors achieved the best security results among 18 mainstream server hardware platforms in every security category, including:

  • The fewest number of successful security hacks/data breaches.
  • The least amount of overall unplanned server downtime for any reason and the least amount of unplanned server downtime due to a data breach incident.
  • The fastest Mean Time to Detection (MTTD) from the onset of the attack until the company isolated and shut it down.
  • The fastest Mean Time to Remediation (MTTR) to restore servers, applications and networks to full operation.
  • The least amount of lost, stolen, destroyed, damaged or changed data as a direct consequence of a security data breach (e.g. Ransomware, phishing scam or CEO fraud).
  • The least amount of monetary losses due to a successful security hack.
  • The highest confidence in the embedded security of the server hardware to deliver alerts/warnings and repel security attacks and data breaches.

The IBM Z mainframe outperformed all other server distributions – delivering near foolproof security and true fault tolerant seven nines or better (99.99999%) uptime and reliability. Only a minuscule 0.1% of IBM Z mainframes and 0.2% of IBM LinuxONE III systems experienced a successful security breach.

IBM standalone Power Systems and the Lenovo ThinkSystem servers were in a statistical tie, with only two percent (2%) of respondents reporting a successful hack over the last 12 months. The IBM Power8, Power9 and Power10 servers again delivered top notch security among all mainstream hardware distributions, with 95% of survey respondents reporting their firms were able to identify and thwart attempted security penetrations immediately or within the first 10 minutes of detection.

The Lenovo ThinkSystem servers achieved the best security scores among all x86 server distributions for the fourth year in a row. Lenovo ThinkSystem servers similarly delivered the best MTTD rates among all Intel x86-based servers. A 95% majority of Lenovo ThinkSystem survey respondents said their IT and security administrators detected and repelled attempted hacks and data breaches immediately or within the first 10 minutes of the penetration.

Huawei’s KunLun mission critical platform was close behind, with three percent (3%) of customers experiencing a successful hack, while four percent (4%) of HPE Integrity Superdome customers said they had a successful security breach over the last year.

Just over one-in-ten or 11% of Cisco UCS servers were successfully hacked. Cisco’s hardware performed extremely well, particularly considering that a large portion of UCS servers are deployed in remote locations and at the network edge. Inexpensive unbranded White box servers again proved the most porous – nearly half – 48% – of survey respondents said their businesses were hacked. This is a four percent (4%) increase compared to ITIC’s 2021 survey.

Security is, and will remain the number one issue that either fortifies or undermines the reliability of mission critical server hardware, server operating system and applications. Businesses that hope to keep their data assets secure and ensure continuous, uninterrupted operations are well advised to deploy the most secure server hardware, server OS and application infrastructure. Security is and will continue to rank as the number one cause of unanticipated downtime for the foreseeable future. Any organization that ignores security does so at its own risk. Ask yourselves: what does my organization have to lose and how much is my company willing to risk?


Server and Application Reliability by the Numbers: Understanding “The Nines”

Reliability/Uptime by the Numbers

Organizations measure server and application reliability percentages in “nines.” There is an order of magnitude difference of server and application reliability and uptime between each additional “nine.”  Four nines – 99.99% – reliability equals 52.56 minutes of unplanned per server/per annum downtime or 4.32 minutes of per server monthly unplanned downtime (See Table 1). By contrast, five nines – 99.999% – is the equivalent of 5.26 minutes of unplanned per server/per annum and just 25.9 seconds of monthly unplanned system downtime. The highly sought after continuous uptime and availability levels of six nines equals a near-imperceptible 2.59 seconds of per server unplanned monthly downtime, while seven nines equals 3.15 seconds of yearly system downtime.

Table 1 below depicts the availability percentages and the equivalent number of annual, monthly and weekly hours and minutes of per server/per annum downtime. It illustrates the business and monetary impact on operations. ITIC publishes this table in every one of its Global Server Hardware, Server OS Reliability reports. It serves as a useful reference guide to enable organizations to calculate downtime and determine their levels of server uptime.

Table 1: Reliability/Uptime by the Numbers

Reliability %               Downtime per year    Downtime per month    Downtime per week
90% (one nine)              36.5 days            72 hours              16.8 hours
95%                         18.25 days           36 hours              8.4 hours
97%                         10.96 days           21.6 hours            5.04 hours
98%                         7.30 days            14.4 hours            3.36 hours
99% (two nines)             3.65 days            7.20 hours            1.68 hours
99.5%                       1.83 days            3.60 hours            50.4 minutes
99.8%                       17.52 hours          86.23 minutes         20.16 minutes
99.9% (three nines)         8.76 hours           43.8 minutes          10.1 minutes
99.95%                      4.38 hours           21.56 minutes         5.04 minutes
99.99% (four nines)         52.56 minutes        4.32 minutes          1.01 minutes
99.999% (five nines)        5.26 minutes         25.9 seconds          6.05 seconds
99.9999% (six nines)        31.5 seconds         2.59 seconds          0.605 seconds
99.99999% (seven nines)     3.15 seconds         0.259 seconds         0.0605 seconds

Source: ITIC 2022 Global Server Hardware, Server OS Reliability Survey
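
As a reference point, the Table 1 conversions can be reproduced with simple arithmetic: unplanned downtime is the unavailable fraction (1 minus the availability expressed as a fraction) multiplied by the minutes in the period. The Python sketch below is illustrative only, not part of any ITIC calculator, and assumes the 30-day month convention the table appears to use.

```python
# Illustrative sketch: convert an availability percentage ("nines") into the
# implied unplanned downtime per year and per month, as in Table 1.

MINUTES_PER_YEAR = 365 * 24 * 60        # 525,600 minutes
MINUTES_PER_MONTH = 30 * 24 * 60        # 43,200 minutes (30-day month convention)

def downtime_minutes(availability_pct: float, period_minutes: float) -> float:
    """Unplanned downtime (in minutes) implied by an availability percentage."""
    return (1 - availability_pct / 100.0) * period_minutes

for label, pct in [("four nines", 99.99), ("five nines", 99.999),
                   ("six nines", 99.9999), ("seven nines", 99.99999)]:
    yearly = downtime_minutes(pct, MINUTES_PER_YEAR)
    monthly = downtime_minutes(pct, MINUTES_PER_MONTH)
    # e.g. four nines -> 52.56 min/year and 4.32 min/month (259.2 sec/month)
    print(f"{label} ({pct}%): {yearly:.2f} min/year, {monthly * 60:.2f} sec/month")
```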

The aforementioned metrics clearly underscore that the IBM z14, z15 and the newest z16, along with the LinuxONE III platform, continue to maintain continuous levels of reliability, with just 0.0043 minutes of unplanned monthly per server downtime. This equates to just 3.15 seconds of unplanned per server annual downtime, the equivalent of “seven nines” of true fault tolerant uptime. They were followed closely by the IBM Power8, Power9 and Power10 with one (1) minute of per server unplanned monthly downtime and the Lenovo x86-based ThinkSystem with 1.10 minutes of per server unplanned downtime each month. In practical terms, this means there is minimal or imperceptible impact on daily business operations, end user productivity and corporate revenue.

In 2022 and heading into 2023, a price tag of $100,000 (USD) for one hour of downtime for a single server is extremely conservative for all but the smallest micro SMBs with one to 25 employees. It equates to $1,670 per minute/per server. Hourly cost of downtime calculated at $300,000 equals about $5,000 per server/per minute. The cost of a more severe or protracted hourly outage that a business estimated at $1 million (USD) is the equivalent of $16,700 per server/per minute.

ITIC’s 2022 Global Server Hardware and Server OS Reliability Survey found that 91% of respondents now estimate that one hour of downtime costs the firm $301,000 or more; this is an increase of two (2) percentage points in less than two years. Of that number, 44% of those polled indicated that hourly downtime costs now exceed $1 million. Since 2021, only one percent (1%) of respondents said a single hour of downtime costs them $100,000 or less. Nine percent (9%) of respondents valued hourly downtime at $101,000 to $300,000.

There are many cost variables. For instance, an issue that takes down a server(s) running a non-business essential application; or downtime that occurs in off-peak or non-usage hours, may have minimal to no impact on business operations and negligible financial consequences.

On the other end of the spectrum, cloud-based server outages involving a virtualized server running two, three or four instances of a business-critical application housed in a single physical machine have the potential to double, triple or quadruple business losses when daily business operations are interrupted and employees and business partners, suppliers and other stakeholders are denied access to critical data.

The most expensive hourly downtime scenario presented in Table 2 depicts the per server/per minute outage expense impacting 1,000 servers at an organization that values an hour of downtime at $10 million. In this example, a large enterprise could conceivably sustain crippling losses of approximately $166,667,000 per minute across the 1,000 affected servers.

The aforementioned ITIC Hourly Downtime monetary figures represent only the costs associated with remediating the actual technical issues and business problems that caused the server or OS to fail. They do not include legal fees, criminal or civil penalties the company may incur or any “goodwill gestures” that the firm may elect to pay customers (e.g., discounted or free equipment or services).


ITIC 2022 Global Server Reliability Survey Finds IBM Z, IBM Power Systems, Lenovo ThinkSystem deliver top reliability

  • The IBM Z and IBM Power Systems continue to dominate, delivering the best server reliability, uptime and security for the 14th straight year.
  • Lenovo’s ThinkSystem servers provide the top reliability and security among all x86 server distributions for nearly nine straight years.
  • Huawei KunLun, Hewlett-Packard Enterprise (HPE) Superdome mission critical servers also register high reliability and security rankings challenging the leaders. Cisco continues to up its game with robust network edge reliability and security.
  • IBM Z and IBM Power Systems deliver over 40x more uptime than least efficient “White box” platforms and 60x lower Total Cost of Ownership (TCO). The Lenovo ThinkSystem, Huawei KunLun and HPE Superdome (in that order) delivered the highest reliability among x86 platforms.
  • Over three-quarters of businesses – 78% – cite security as the top cause of unplanned downtime and 64% said human error causes unplanned outages.

 

 

Mission critical server and server OS distributions from IBM, Lenovo, Hewlett-Packard Enterprise (HPE), Huawei and Cisco continue to deliver the highest levels of inherent reliability and availability among 18 different server platforms despite a continuing spike in security hacks, increasing ecosystem complexities and ongoing supply chain challenges.

For the 14th consecutive year, the IBM Z, the LinuxONE III and the IBM Power Systems remained the preeminent server platforms, posting the best across-the-board reliability ratings among 18 mainstream distributions. Some 96% of IBM Z mainframe and LinuxONE III server customers recorded seven nines (99.99999%) of true fault tolerant reliability and availability. The IBM Z and LinuxONE III recorded a near-imperceptible 0.0043 minutes of per server unplanned monthly outages, or just 3.15 seconds of unplanned per server downtime annually (See Table 1). They were followed by the IBM Power Systems: 93% of IBM Power Systems clients said the IBM systems achieved five and six nines of system reliability and availability (See Exhibit 1). The IBM Power8, Power9 and Power10 servers posted just one (1) minute each of unplanned per server monthly downtime.

The Lenovo ThinkSystem servers followed closely and posted the highest levels of reliability among all x86 hardware distributions for the eighth consecutive year. A 92% majority of Lenovo servers attained five and six nines of reliability, posting just over one minute – 1.10 – of unplanned per server monthly downtime. The Huawei KunLun and Fusion servers, the HPE Superdome and the Cisco UCS hardware (in that order), rounded out the top five most reliable server platforms.

Those are the results of the ITIC 2022 Global Server Hardware, Server OS Reliability independent Web-based survey. It polled 1,550 corporations across 30 vertical market segments worldwide on the reliability, performance and security of the leading mainstream on-premises and cloud-based servers from July through mid-November 2022. In order to maintain objectivity, ITIC accepted no vendor sponsorship.

The increased server and server operating system uptime and availability enabled the IBM, Lenovo, Huawei, HPE and Cisco servers (in that order) to deliver the most economical Total Cost of Ownership (TCO) among all mainstream distributions in datacenters, at the network edge and in hybrid cloud environments.

The Lenovo ThinkSystem servers likewise improved their uptime and availability recording the best reliability among all x86 servers – a scant 1.10 minutes of per server unplanned monthly outages. The Huawei KunLun and Fusion platforms also improved uptimes with 1.27 minutes each of unplanned per server outage, along with the HPE Superdome platform which averaged 1.44 minutes of unanticipated per server downtime. Cisco’s UCS servers also hung tough. Cisco servers frequently are installed at the network edge/perimeter, which is often the first line of attack. The Cisco UCS servers registered two (2) minutes of monthly unplanned per server downtime.

The top server reliability vendors – led by IBM, Lenovo, HPE and Huawei – also delivered the strongest server security, experiencing the fewest number of successful data breaches and the least amount of downtime due to security-related incidents.

ITIC 2022 Global Server Reliability Survey Finds IBM Z, IBM Power Systems, Lenovo ThinkSystem deliver top reliability

De-mystifying Cloud Computing: the Pros and Cons of Cloud Services

Cloud computing has been a part of the corporate and consumer lexicon for the past 15 years. Despite this, many organizations and their users are still fuzzy on the finer points of cloud usage and terminology.

De-mystifying the cloud

So what exactly is a cloud computing environment?

The simplest and most straightforward definition is that a cloud is a grid- or utility-style, pay-as-you-go computing model that uses the web to deliver applications and services in real time.

Organizations can opt to deploy a private cloud infrastructure, hosting their services on-premises behind the safety of the corporate firewall. The advantage here is that the IT department always knows what's going on with all aspects of the corporate data, from bandwidth and CPU utilization to all-important security issues.

Alternatively, organizations can choose a public cloud deployment in which a third-party vendor such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud, IBM Cloud or Oracle Cloud hosts the services at an off-premises location. This scenario saves businesses money and manpower hours by utilizing the host provider's equipment and management. All that's needed is a web browser and a high-speed internet connection to access the host's applications, services and data.

However, the public cloud is also a shared model in which corporate customers share bandwidth and space on the host's servers. Enterprises that prioritize privacy, require near impenetrable security, or need more data control and oversight typically opt for a private cloud infrastructure, in which the hosted services are delivered to the corporation's end users from behind the safe confines of an internal corporate firewall. However, a private cloud is more than just a hosted services model that sits behind a firewall. Any discussion of private and/or public cloud infrastructure must also include virtualization. While most virtualized desktop, server, storage and network environments are not yet part of a cloud infrastructure, just about every private and public cloud will feature a virtualized environment.

Organizations contemplating a private cloud also need to ensure that it delivers very high (near fault tolerant) availability: at least "five nines" (99.999%) or "six nines" (99.9999%), and ideally true fault tolerant "seven nines" (99.99999%) uptime, to ensure uninterrupted operations.

Private clouds should also be able to scale dynamically to accommodate the needs and demands of the users. And unlike most existing, traditional datacenters, the private cloud model should also incorporate a high degree of user-based resource provisioning. Ideally, the IT department should also be able to track resource usage in the private cloud by user, department or groups of users working on specific projects for chargeback purposes. Private clouds will also make extensive use of AI, analytics, business intelligence and business process automation to guarantee that resources are available to the users on demand.
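
To make the chargeback idea concrete, the sketch below shows one simple way a private cloud team might roll up metered usage records by department. The record format and unit rates are hypothetical illustrations, not taken from any particular product.

```python
# Hypothetical chargeback roll-up: aggregate metered usage by department.
# Record format and unit rates are illustrative assumptions only.

from collections import defaultdict

RATES = {"cpu_hours": 0.05, "gb_storage": 0.02}  # assumed $ per unit

usage_records = [
    {"dept": "Finance",   "cpu_hours": 1200, "gb_storage": 500},
    {"dept": "Marketing", "cpu_hours": 300,  "gb_storage": 2000},
    {"dept": "Finance",   "cpu_hours": 450,  "gb_storage": 250},
]

charges = defaultdict(float)
for record in usage_records:
    for metric, rate in RATES.items():
        charges[record["dept"]] += record[metric] * rate

for dept, amount in sorted(charges.items()):
    print(f"{dept}: ${amount:,.2f}")
```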

All but the most cash-rich organizations (and there are very few of those) will almost certainly have to upgrade their network infrastructure in advance of migrating to a private cloud environment. Organizations considering outsourcing any of their datacenter needs to a public cloud will also have to perform due diligence to determine the bona fides of their potential cloud service providers.

In 2022 and beyond, a hybrid cloud environment is the most popular model, chosen by over 75% of corporate enterprises. The hybrid cloud theoretically gives businesses the best of both worlds: some services and applications are hosted on a public cloud, while specific, crucial business applications and services remain in a private or on-premises cloud behind a firewall.

Types of Cloud Computing Services

There are several types of cloud computing models. They include:

  • Software as a Service (SaaS), which uses the Internet to deliver software applications to customers. Examples include Salesforce.com, which offers one of the earliest and most widely deployed cloud-based CRM applications, and Google Apps, which is among the market leaders. Google Apps comes in three editions—Standard, Education and Premier (the first two are free)—and provides consumers and corporations with customizable versions of the company's applications like Google Mail, Google Docs and Calendar.
  • Platform as a Service (PaaS) offerings; examples include the above-mentioned Amazon Web Services and Microsoft's top-tier Azure platform. The Microsoft Azure offering contains all the elements of a traditional application stack, from the operating system up to the applications and the development framework. It includes the Windows Azure Platform AppFabric (formerly .NET Services for Azure) as well as the SQL Azure Database service. Customers that build applications for Azure host them in the cloud. However, it is not a multi-tenant architecture meant to host your entire infrastructure. With Azure, businesses rent resources that reside in Microsoft datacenters. The costs are based on a per-usage model, which gives customers the flexibility to rent fewer or more resources depending on their business needs.
  • Infrastructure as a Service (IaaS) is exactly what its name implies: the entire infrastructure becomes a multi-tiered, hosted cloud model and delivery mechanism. Public, private and hybrid IaaS deployments should all be flexible and agile; the resources should be available on demand and should scale up or back as business needs dictate.
  • Serverless. This is a more recent technology innovation, and it can be a bit confusing to the uninitiated. A serverless cloud is a cloud-native development model that enables developers to build and run applications without having to manage servers. The developers do not manage, provision or maintain the servers when deploying code; the actual code execution is fully managed by the cloud provider, in contrast to the traditional method of writing and developing applications and then deploying them on servers. To be clear, there are still servers in a serverless model, but they are abstracted away from application development (a minimal handler sketch follows this list).
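
As a concrete illustration of the serverless model described above, the fragment below sketches an AWS Lambda-style Python handler. The function name, event fields and the idea of wiring it to an HTTP trigger are assumptions for illustration; the point is that the developer supplies only this function, while the provider owns the servers that run it.

```python
# Minimal AWS Lambda-style handler sketch (illustrative only).
# The developer supplies this function; the cloud provider provisions,
# scales and bills for the compute that actually executes it.

import json

def handler(event, context):
    # 'event' carries the trigger payload (e.g., an HTTP request);
    # the field names below are hypothetical.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```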

Cloud computing—pros and cons

Cloud computing, like any technology, is not a panacea. It offers potential benefits as well as possible pitfalls. Before beginning any infrastructure upgrade or migration, organizations are well advised to gather all interested parties and stakeholders and construct a business plan that best suits their organization's needs and budget. When it comes to the cloud, there are no absolutes. Many organizations will have hybrid clouds that include public and private cloud networks. Additionally, many businesses may have multiple cloud hosting providers present in their networks. Whatever your firm's specific implementation, it's crucial to create a realistic set of goals, a budget and a deployment timetable.

Prior to beginning any technology migration organizations should first perform a thorough inventory and review of their existing legacy infrastructure and make the necessary upgrades, revisions and modifications. All stakeholders within the enterprise should identify the company’s current tactical business goals and map out a two-to-five year cloud infrastructure and services business plan. This should incorporate an annual operational and capital expenditure budget. The migration timetable should include server hardware, server OS and software application interoperability and security vulnerability testing; performance and capacity evaluation and final provisioning and deployment.

Public clouds—advantages and disadvantages

The biggest allure of a public cloud infrastructure over traditional premises-based network infrastructures is the ability to offload the tedious and time-consuming management chores to a third party. This in turn can help businesses:

  • Shave precious capital expenditure monies, because they avoid the expensive investment in new equipment including hardware, software and applications, as well as the attendant configuration planning and provisioning that accompanies any new technology rollout.

  • Accelerate the deployment timetable. Having an experienced third-party cloud services provider do all the work also speeds deployment and most likely means less time spent on trial and error.

  • Construct a flexible, scalable cloud infrastructure that is tailored to their business needs. A company that has performed its due diligence and is working with an experienced cloud provider can architect a cloud infrastructure that will scale up or down according to the organization's business and technical needs and budget.

Public Cloud Downsides

Shared Tenancy: The potential downside of a public cloud is that the business is essentially "renting" or sharing common virtualized servers and infrastructure with other customers, much like being a tenant in a large apartment building. Depending on the resources of the particular cloud model, there is the potential for performance, latency and security issues, as well as for unacceptable response times, service and support from the cloud provider.

Risk: Risk is another potential pitfall associated with outsourcing any of your firm’s resources and services to a third party. To mitigate risk and lower it to an acceptable level, it’s essential that organizations choose a reputable, experienced third party cloud services provider very carefully. Ask for customer references. Cloud services providers must work closely and transparently with the corporation to build a cloud infrastructure that best suits the business’ budget, technology and business goals. To ensure that the expectations of both parties are met, organizations should create a checklist of items and issues that are of crucial importance to their business and incorporate them into service level agreements (SLAs). Be as specific as possible. These should include but are not limited to:

  • What types of equipment do they use?
  • How old is the server hardware? Is the configuration powerful enough?
  • How often is the data center equipment/infrastructure upgraded?
  • How much bandwidth does the provider have?
  • Does the service provider use open standards or is it a proprietary datacenter?
  • How many customers will you be sharing data/resources with?
  • Where is the cloud services provider’s datacenter physically located?
  • What specific guarantees, if any, will it provide for securing sensitive data?
  • What level of guaranteed response time will it provide for service and support?
  • What is the minimum acceptable latency/response time for its cloud services?
  • Will it provide multiple access points to and from the cloud infrastructure?
  • What specific provisions will apply to Service Level Agreements (SLAs)?
  • How will financial remuneration for SLA violations be determined?
  • What are the capacity ceilings for the service infrastructure?
  • What provisions will there be for service failures and disruptions?
  • How are upgrade and maintenance provisions defined?
  • What are the costs over the term of the contract agreement?
  • How much will the costs rise over the term of the contract?
  • Does the cloud services provider use Secure Sockets Layer/Transport Layer Security (SSL/TLS) and state-of-the-art AES encryption to transmit data? (See the TLS check sketch after this list.)
  • Does the cloud services provider encrypt data at rest to prohibit and restrict access?
  • How often does the cloud services provider perform audits?
  • What mechanisms will it use to quickly shut down a hack, and can it track a hacker?
  • If your cloud services provider is located outside your country of origin, what are the privacy and security rules of that country and what impact will that have on your firm’s privacy and security issues?
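
As a small due-diligence aid for the encryption questions above, the sketch below uses Python's standard ssl library to confirm which TLS protocol version and cipher a provider endpoint negotiates. The hostname is a placeholder, and a check like this is only one small piece of a real security review.

```python
# Check which TLS version a cloud provider endpoint negotiates.
# Hostname is a placeholder; this is a due-diligence aid, not a full audit.

import socket
import ssl

def check_tls(hostname: str, port: int = 443) -> None:
    context = ssl.create_default_context()  # validates the certificate chain
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            print(f"{hostname}: negotiated {tls.version()} "
                  f"with cipher {tls.cipher()[0]}")

check_tls("example-cloud-provider.com")  # placeholder hostname
```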

Finally, the corporation should appoint a liaison who meets regularly with a designated counterpart at the cloud services provider. While a public cloud does provide managed hosting services, that does not mean the company should forget about its data assets as though they really did reside in an amorphous cloud. Regular meetings between the company and its cloud services provider will ensure that the company attains its immediate goals and is always aware of and working on future technology and business goals. It will also help the corporation understand usage and capacity issues and ensure that its cloud services provider(s) meet SLAs. Outsourcing any part of your infrastructure to a public cloud does not mean forgetting and abandoning it.

Private clouds—advantages and disadvantages

The biggest advantage of a private cloud infrastructure is that your organization retains control of its corporate assets and can safeguard and preserve its privacy and security. Your organization is in command of its own destiny. That can be a double-edged sword.

Before committing to build a private cloud model the organization must do a thorough assessment of its current infrastructure, its budget, and the expertise and preparedness of its IT department. Is your firm ready to assume responsibility for such a large burden from both a technical and an ongoing operational standpoint? Only you can answer that. Remember that the private cloud should be highly reliable and highly available—at least 99.999% uptime with built-in redundancy and failover capabilities. Many organizations struggle to attain and maintain even 99.99% uptime and reliability, which is the equivalent of roughly 52.6 minutes of per server, per annum downtime. When your private cloud is down for any length of time, your employees, business partners, customers and suppliers will be unable to access resources.

Private Cloud Downsides

The biggest potential upside of a private cloud is also potentially its biggest disadvantage: the onus falls entirely on the corporation to achieve the company's performance, reliability and security goals. To do so, the organization must ensure that its IT administrators and security professionals are up to date on training and certification. To ensure optimal performance, the company must regularly upgrade and rightsize its servers and stay current on all versions of mission critical applications, particularly with respect to licensing, compliance and installing the latest patches and fixes. Security must be a priority. Hackers are professionals, and hacking is big business. The hacks themselves (ransomware, email phishing scams, CEO fraud and the like) are more pervasive and more pernicious, and the cost of hourly downtime is more expensive than ever. ITIC's latest survey data shows that 91% of midsize and large enterprises estimate that the average cost of a single hour of downtime is $300,000 or more. These statistics are just industry averages; they do not include any additional costs a company may incur due to civil or criminal litigation or compliance penalties. In other words: in a private cloud, the buck stops with the corporation.
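
A rough way to translate those availability and cost figures into a budget number is sketched below. The hourly cost and availability inputs are simply examples drawn from the figures cited above, and real losses (litigation, compliance penalties, reputational damage) would come on top of this estimate.

```python
# Back-of-the-envelope annual downtime cost estimate (illustrative only).
# Excludes litigation, compliance penalties and reputational damage.

HOURS_PER_YEAR = 365 * 24

def annual_downtime_cost(availability_pct: float, hourly_cost: float) -> float:
    downtime_hours = HOURS_PER_YEAR * (1 - availability_pct / 100)
    return downtime_hours * hourly_cost

# Example: "four nines" vs. "five nines" at $300,000 per hour of downtime.
print(f"${annual_downtime_cost(99.99, 300_000):,.0f} per year")   # ~$262,800
print(f"${annual_downtime_cost(99.999, 300_000):,.0f} per year")  # ~$26,280
```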

Realistically, in order for an organization to successfully implement and maintain a private cloud, it needs the following:

  • Robust equipment that can handle the workloads efficiently during peak usage times.
  • An experienced, trained IT staff that is familiar with all aspects of virtualization, virtualization management, grid, utility and chargeback computing models.
  • An adequate capital expenditure and operational expenditure budget.
  • The right set of private cloud product offerings and service agreements.
  • Appropriate third party virtualization and management tools to support the private cloud.
  • Specific SLA agreements with vendors, suppliers and business partners.
  • Operational level agreements (OLAs) to ensure that each person in the organization is responsible for specific routine tasks, as well as for defined duties in the event of an outage.
  • A disaster recovery and backup strategy.
  • Strong security products and policies.
  • Efficient chargeback utilities, policies and procedures.

Other potential private cloud pitfalls include deciding which applications to virtualize, vendor lock-in, and integration and interoperability issues. Businesses grapple with these same issues today in their existing environments.

Conclusions

Hybrid, public and private cloud infrastructure deployments will continue to experience double-digit growth for the foreseeable future. The benefits of cloud computing will vary according to the individual organization's implementation. Preparedness and planning prior to deployment are crucial. Cloud vendors are responsible for maintaining performance, reliability and security. However, corporate enterprises cannot simply cede total responsibility to their vendor partners because the data assets are housed off-premises. Businesses must continue to perform their due diligence. All appropriate corporate enterprise stakeholders must regularly review and monitor performance and capacity, security, compliance and SLA results – preferably on a quarterly or semi-annual basis. This will ensure your organization achieves the optimal business and technical benefits. Keeping a watchful eye on security is imperative. Cloud vendors and businesses must work in concert as true business partners to achieve optimal TCO and ROI and mitigate risk.


The Cloud Gets Crowded and more Competitive

The cloud is getting crowded.

In 2022 the cloud computing market – particularly the hybrid cloud – is hotter and more competitive than ever.

Corporate enterprises are flocking to the cloud as a way to offload onerous IT administrative tasks and more easily and efficiently manage increasingly complex infrastructure, storage and security. Migrating operations from the data center to the cloud can also greatly reduce their operational and capital expenditure costs.

Cloud vendors led by market leaders like Amazon Web Services (AWS), Microsoft Azure, Google Cloud, IBM Cloud, Oracle Cloud Infrastructure, SAP, Salesforce, Rackspace Cloud, and VMware, as well as China's Alibaba and Huawei Cloud, are all racing to meet demand. The current accelerated shift to the cloud was fueled by the COVID-19 global pandemic, which created supply chain disruptions and upended many aspects of traditional work life. Since 2020, government agencies, commercial businesses and schools have shifted to remote working and learning. Although COVID is generally waning (albeit with continuing flare-ups), a hybrid work environment is the new normal. This, in turn, makes a compelling business case for furthering cloud migrations.

In 2022, more than $1.3 trillion in enterprise IT spending is at stake from the shift to cloud, and that revenue will increase to nearly $1.8 trillion by 2025 according to the February 2022 report “Market Impact: Cloud Shift – 2022 Through 2025” by Gartner, Inc. in Stamford, Conn.  Furthermore, Gartner’s latest research forecasts that enterprise IT spending on public cloud computing, within addressable market segments, will outpace traditional IT spending in 2025.

Hottest cloud trends in 2022

Hybrid Clouds

Hybrid cloud is exactly what its name implies: it’s a combination of public, private and dedicated on-premises datacenter infrastructure and applications. Companies can adopt a hybrid approach for specific use cases and applications – outsourcing some portions of their operations to a hosted cloud environment, while keeping others onsite. This approach lets companies continue to leverage and maintain their legacy data infrastructure as they migrate to the cloud.

Cloud security and compliance: There is no such thing as too much security. ITIC's 2022 Global Server Hardware Security survey indicates that businesses experienced an 84% surge in security incidents such as ransomware, email phishing scams and targeted data breaches over the last two years. The hackers are extremely sophisticated; they choose their targets with great precision with the intent to inflict maximum damage and net the biggest payback. This trend shows no signs of abating. In 2021, the average cost of a successful data breach increased to $4.24 million (USD), a 10% increase from $3.86 million in 2020, according to the 2021 Cost of a Data Breach Study, jointly conducted by IBM and the Ponemon Institute. That $4.24 million average is the highest figure in the 17 years since IBM and Ponemon began conducting the survey, representing an increase of 10% in the last 12 months and 20% over the last two years. Not surprisingly, in 2021, 61% of malware directed at enterprises targeted remote employees via cloud applications. Any security breach will have a domino effect on regulatory compliance. In response, cloud vendors are doubling down on security capabilities and compliance certifications. There is now a groundswell of demand for Secure Access Service Edge (SASE), a cloud security architecture designed to safeguard, monitor and control access to myriad cloud application services, as well as datacenter IT infrastructure and end user devices. SASE gives users single sign-on capability across multiple cloud applications while ensuring compliance.

Cloud-based disaster recovery (DR): The ongoing concerns around security and compliance have also shone a spotlight on the importance of cloud-based disaster recovery. DR uses cloud computing to back up data and keep necessary business processes running in the event of a disaster. Organizations can utilize cloud-based DR for load balancing and to replicate cloud services across multiple cloud environments and providers. The result: enterprise transactions continue uninterrupted even if the organization loses access to its physical infrastructure during an outage.

Cloud-based Artificial Intelligence (AI) and Machine Learning (ML): Another hot cloud trend is the use of Artificial Intelligence (AI) and Machine Learning (ML). Both AI and ML allow organizations to cut through the data deluge and process and analyze the data to make informed business decisions and quickly respond to current and future market trends.

Top cloud vendors diversify, differentiate their offerings

There are dozens of cloud providers, with more entering this lucrative market all the time. However, the top four vendors – Amazon AWS, Microsoft Azure, Google Cloud and IBM Cloud – currently account for over 70% of the installed base.

Amazon AWS: Amazon AWS has been the undisputed cloud market leader for the past decade, and it remains the number one vendor in 2022. Simply put, Amazon is everywhere and it has enormous brand recognition. Amazon AWS offers a wide array of services that appeal to companies of all sizes. The AWS cloud-based platform enables companies to build customized business solutions using integrated Web services. AWS also offers a broad portfolio of Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) offerings, including Elastic Compute Cloud (EC2), Elastic Beanstalk, Simple Storage Service (S3) and Relational Database Service (RDS). AWS also enables organizations to customize their infrastructure requirements and provides them with a wide variety of administrative controls via its secure Web-based client. Other key features include: data backup and long-term storage; a "four nines" – 99.99% – guaranteed SLA uptime; AI and ML capabilities; automatic capacity scaling; support for virtual private clouds; and free migration tools.
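
As an example of the kind of backup and long-term storage task AWS customers typically automate, the sketch below uses the boto3 SDK to push a nightly archive into S3. The bucket name and file path are placeholders, and the snippet assumes AWS credentials are already configured in the environment.

```python
# Upload a backup archive to Amazon S3 using boto3 (illustrative sketch).
# Assumes AWS credentials are configured (environment, profile or IAM role);
# bucket name and file path are placeholders.

from datetime import date

import boto3

def upload_backup(path: str, bucket: str) -> None:
    s3 = boto3.client("s3")
    key = f"backups/{date.today().isoformat()}/{path.split('/')[-1]}"
    s3.upload_file(path, bucket, key)  # standard boto3 upload call
    print(f"Uploaded {path} to s3://{bucket}/{key}")

upload_backup("/var/backups/nightly.tar.gz", "example-backup-bucket")
```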

As with all of the cloud vendors, the devil is in the details when it comes to pricing and cost. On the surface, the pricing model appears straightforward: AWS offers three different pricing options – "Pay as you Go," "Save when you reserve" and "Pay less using more" – plus a free 12-month plan. Once the trial period has expired, the customer must either choose a paid plan or cancel its AWS subscription. While Amazon does provide a price calculator to estimate potential cloud costs, the many variables make real-world costs difficult to predict.

Microsoft Azure: Microsoft Azure ranks close behind Amazon AWS, and the platform has been the catalyst for the Redmond, Washington software giant's resurgence over the last 12 years. As Microsoft transitioned away from its core Windows-based business model, it used a tried-and-true success strategy: the integration and interoperability of its various software offerings. Microsoft also moved its popular and well-entrenched legacy on-premises software application suites like Microsoft Office, SharePoint, SQL Server and others to the cloud. This gave customers a sense of confidence and familiarity when it came to adoption. Microsoft also boasts one of the tech industry's largest partner ecosystems, and it regularly refreshes and updates its cloud portfolio. In February, Microsoft unveiled three industry-specific cloud offerings: Microsoft Cloud for Financial Services, Microsoft Cloud for Manufacturing and Microsoft Cloud for Nonprofit. All of these services leverage the company's security and AI functions. For example, a new feature in Microsoft Cloud for Financial Services, called Loan Manager, will enable lenders to close loans faster by streamlining workflows and increasing transparency through automation and collaboration. Microsoft Azure offers all the basic and advanced cloud features and functions including: data backup and storage; business continuity and DR solutions; capacity planning; business analytics; AI and ML; single sign-on (SSO) and multifactor authentication; and serverless computing. Ease of configuration and management are among its biggest advantages, and Microsoft does an excellent job of regularly updating the platform, although documentation and patches may lag a bit. Azure also offers a 99.95% SLA uptime guarantee, which is a bit less than "four nines." Again, the biggest business challenge for existing and prospective Azure customers is figuring out the licensing and pricing model to get the best deal.

Google Cloud Platform (GCP): Like Amazon, Google is a ubiquitous entity with strong brand name recognition. Google touts its ability to enable customers to scale their business as needed using flexible, open technology. Google Cloud consists of over 150 products and developer tools. GCP is a public cloud computing platform consisting of a variety of IaaS and PaaS services such as compute, storage, networking, application development and Big Data analytics. The GCP services all run on the same cloud infrastructure that Google uses internally for its end-user products, such as Google Search, Photos, Gmail and YouTube. The GCP services can be accessed by software developers, cloud administrators and IT professionals over the internet or through a dedicated network connection. Notably, Google developed Kubernetes, the open source container orchestration platform that automates software deployment, scaling and management. GCP offers a wide array of cloud services including: storage and backup, application development, API management, virtual private clouds, monitoring and management services, migration tools, and AI and ML. In order to woo customers, Google does offer very steep discounts and flexible contracts.

IBM: It’s no secret that IBM Cloud lagged behind market leaders AWS and Microsoft Azure, but Big Blue shifted into overdrive to close the gap. Most notably, IBM’s 2019 acquisition of Red Hat for $34 billion gave IBM much needed momentum, solidifying its hybrid cloud foundation and expanding its global cloud reach to 175 countries with over 3,500 hybrid cloud customers. And it shows. On April 19, IBM told Wall Street it expects to hit the top end of its revenue growth forecast for 2022. IBM’s Cloud & Data Platforms unit is the growth driver: cloud revenue grew 14% to $5 billion during the quarter ended March 31. Software and consulting sales, which represent over 70% of IBM’s business, were up 12% and 13%, respectively.

IBM Cloud incorporates a host of cloud computing services that run on IaaS or PaaS, and the Red Hat OpenShift platform further fortifies IBM’s hybrid cloud initiatives. OpenShift is an enterprise-ready Kubernetes container platform built for an open hybrid cloud strategy; it provides a consistent application platform to manage hybrid cloud, multicloud and edge deployments. According to IBM, 47 of the Fortune 50 companies use IBM as their private cloud provider.

IBM has upped its cloud game with several key technologies. They include advanced quantum-safe cryptography, which safeguards applications running on the IBM z16 mainframe, popular with high-end IBM enterprise customers. Quantum-safe cryptography is as close to unbreakable or impenetrable encryption as a system can get: it uses cryptographic algorithms designed to withstand attacks from quantum computers, which currently makes the data near-impossible to hack. Another advanced feature is the AI on-chip inferencing, available on the newly announced IBM z16 mainframe. It can deliver up to 300 billion deep learning inference operations per day with 1ms response time. This will enable even non-data scientist customers to cut through the data deluge and predict and automate for “increased decision velocity.” AI on-chip inferencing can help customers prevent fraud before it happens by scoring up to 100% of transactions in real time without impacting Service Level Agreements (SLAs). AI on-chip inferencing can also assist companies with compliance, automating the process to allow firms to cut audit preparation time from one month to one week to maintain compliance and avoid fines and penalties.

The IBM Cloud also incorporates Keep Your Own Key (KYOK), which uses IBM Hyper Protect services in the IBM public cloud. Another key security differentiator is IBM’s Confidential Computing, which protects sensitive data by performing computation in a hardware-based trusted execution environment (TEE). IBM Cloud goes beyond confidential computing by protecting data across the entire compute lifecycle. This provides customers with a higher level of privacy assurance, giving them complete authority over data at rest, data in transit and data in use. IBM further distinguishes its IBM Cloud from competitors via its extensive work in supporting and securing regulated workloads, particularly for Financial Services companies. The company’s Power Systems enterprise servers are supported in the IBM Cloud as well. IBM Cloud also offers full server customization; everything included in the server is handpicked by the customer so they don’t have to pay for features they may never use. IBM is targeting its Cloud offering at customers that want a hybrid, highly secure, open, multi-cloud and manageable environment.

Conclusions

Cloud computing adoption – most especially the hybrid cloud model – will continue to accelerate throughout 2022 and beyond. At the same time, vendors will continue to promote AI, machine learning and analytics as advanced mechanisms to help enterprises derive immediate, greater value and actionable insights to drive revenue and profitability.

Security and compliance issues will also be must-have, crucial elements of every cloud deployment. Organizations now demand a minimum of four nines (99.99%) of uptime – and preferably five and six nines (99.999% and 99.9999%) of availability – to ensure uninterrupted business continuity. Vendors, particularly IBM with its new quantum-safe cryptography capabilities for its infrastructure and IBM Z mainframe, will continue to fortify cloud security and deploy AI.

IBM’s New z16 Aims for the Cloud; Delivers Quantum-safe Cryptography & AI on-Chip Inferencing

IBM has once again outdone itself with its latest z16 mainframe server.

This latest offering has it all: unbreakable security; fast low-latency performance; top notch, easy-to-use analytics and true fault tolerant reliability that provides the lowest total cost of ownership (TCO) and immediate return on investment (ROI) among 15 mainstream servers, industrywide.

The new IBM z16 delivers a cornucopia of embedded and enhanced functions, including hardened security, leading-edge AI and performance improvements. The result: the z16 delivers even greater cost efficiencies and a solid “seven nines” – 99.99999% – of uptime and reliability. The AI on-chip inferencing is icing on the cake: it makes AI readily accessible to all employees, not just data scientists.

The IBM z16 is indisputably the most powerful enterprise system from the zSystems family, to date. It incorporates 7nm technology with clock speeds of 5.2GHz, and it supports a maximum of 200 cores and up to 40TB of memory. According to IBM, this results in 25% more processor capacity per drawer and an 11% per core performance improvement. Overall, IBM said the z16 will deliver 40% better performance than the prior z15 models. And it’s engineered for hybrid cloud environments and provides interoperability with a wide range of environments including Linux and open source.

As impressive as those performance statistics are, the immediate and strategic impact of the IBM z16 is far more than a laundry list of “speeds and feeds.”

In a pre-briefing with analysts, Ross Mauri, General Manager of IBM zSystems and LinuxONE, said IBM designed the IBM z16 to address enterprise customers’ need for top notch system performance, resiliency, security/data privacy and protection, dedicated workload accelerators and optimization across IBM’s entire product stack. Barry Baker, VP of Product Management for IBM zSystems, said the openness of the IBM z16 enterprise system in supporting multiple operating system environments, including Linux, z/OS and a variety of open source distributions like Ubuntu, is a win for customers.
Mauri and Baker said the IBM z16 delivers automation, predictive and security capabilities across environments to help enterprise customers on their journey to hybrid cloud and AI. “We are focused on the entire ecosystem. IBM’s strategy has the zSystems platform integrated throughout our products and services offerings to build more value to our clients,” Mauri said.

IBM’s z16 addresses all of the hot button issues confronting organizations in the digital age: AI; performance and low latency; resiliency/security; hybrid cloud; workload optimization; cost efficiencies; interoperability and application modernization.

IBM z16 Quantum Cryptographic Security and AI on-chip Inferencing
The IBM z16 also includes several ground-breaking technology “firsts.” Two of the most noteworthy are the AI on-chip inferencing function and the quantum-safe cryptographic security capability.
The AI on-chip inferencing, which is available at no extra cost, is “a game changer”, IBM executives said. It can deliver up to 300 billion deep learning inference operations per day with 1ms response time. IBM executives also said that the IBM z16’s accelerated on-chip AI “effectively eliminates” latency in inferencing. The result: businesses can cut through the data deluge and predict and automate for “increased decision velocity.” It enables even “non data scientist” customers and users to analyze data and derive insights at heretofore unprecedented speeds. Additionally, leveraging AI in routine daily operational processes can proactively assist businesses to take preventive actions, like identifying and stopping outages before they occur.

AI on-chip inferencing can assist customers in preventing fraud before it happens by scoring up to 100% of transactions in real time without impacting Service Level Agreements (SLAs), and it helps companies keep up to date on fast-changing regulatory issues. The AI on-chip inferencing can also assist companies with compliance, automating the process to allow firms to cut audit preparation time from one month to one week to maintain compliance and avoid fines and penalties.
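
To illustrate what scoring every transaction in-line means in practice, the generic Python sketch below applies a pre-trained model’s score to each transaction as it arrives and flags the risky ones. It is a hypothetical illustration of the pattern only, not IBM’s on-chip inferencing API, and the feature names, weights and threshold are invented.

```python
# Generic in-line transaction scoring sketch (not IBM's on-chip API).
# Feature names, weights and the 0.9 threshold are invented for illustration.

import math

WEIGHTS = {"amount_usd": 0.0004, "foreign_merchant": 1.2, "night_time": 0.8}
BIAS = -3.0

def fraud_score(txn: dict) -> float:
    """Logistic score in [0, 1]; higher means more likely fraudulent."""
    z = BIAS + sum(WEIGHTS[f] * txn.get(f, 0) for f in WEIGHTS)
    return 1 / (1 + math.exp(-z))

def process(txn: dict) -> str:
    score = fraud_score(txn)
    return "HOLD FOR REVIEW" if score > 0.9 else "APPROVE"

print(process({"amount_usd": 9500, "foreign_merchant": 1, "night_time": 1}))
```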

On the security front, the IBM z16 takes the pervasive encryption introduced in the z14 model and the System Recovery Boost introduced in the z15 and turbocharges them with quantum-safe cryptographic security. The z14’s pervasive encryption provided security at every layer of the stack. The z15’s System Recovery Boost capability allowed businesses to drastically reduce the time it takes to shut down, restart and process the backlog that occurred during a system outage.

Quantum-safe cryptography is as close to unbreakable or impenetrable encryption as a system can get. It uses cryptographic algorithms designed to withstand attacks from both conventional and quantum computers, so the data cannot be hacked (at least not yet).

The IBM z16 is the preeminent mainstream server platform for digital enterprises requiring nothing less than seven nines (99.99999%) of best-in-class fault tolerant reliability, quantum-safe cryptographic security and AI on-chip acceleration across multiple platforms, from datacenters to hybrid clouds and the network edge, while delivering the lowest TCO and immediate ROI.
