When Software Errors Destroy Millions of Dollars Every Second

Stock market glitches as a result of inadequate IT modernization

Computer problems have repeatedly caused breakdowns on the trading floor, and recently a software failure paralyzed the New York Stock Exchange for almost four hours. This latest example shows how complex software has become, and that companies that cut back on maintenance and modernization are putting themselves at grave risk.

Usually, the floor of the New York Stock Exchange (NYSE) resembles a beehive, but on July 8, 2015, the venerable exchange remained unusually quiet. At 11:32 a.m. New York time, trading on one of the world's oldest and most renowned stock exchanges collapsed completely. The NYSE, on which a total of 2,300 companies with a combined value of 27 trillion US dollars are traded, was down for three hours and 38 minutes, the longest breakdown in its history. For more than half of the trading day, investors were unable to buy or sell shares there. Fortunately, only the New York Stock Exchange was affected, and trading continued on other exchanges.

The NYSE had fallen victim to a serious software error. The exchange cited "major technical difficulties" and "a problem with the configuration" of its computer systems as the cause. Tom Farley, president of the stock exchange, suspected that a faulty configuration introduced by a system update had triggered the collapse. In other words, a failure during the installation of new program components took the exchange down for several hours. (A graphical representation showing the abrupt interruption of stock trading can be found here.)

The consequences of the outage were correspondingly far-reaching: U.S. President Barack Obama had to be informed about the trading interruption, the U.S. Department of Homeland Security investigated the possibility of a cyber attack, the Treasury Department and the Federal Bureau of Investigation were called in, and the U.S. Securities and Exchange Commission (SEC) took up the matter. Nevertheless, affected stock market traders reacted relatively calmly to the outage and tried to shift their trades elsewhere. That the failure did not result in an economic catastrophe or even a market collapse was due only to how complex the stock exchange business has become.

This Time the Complexity of the Stock Market Prevented a Catastrophe

Ironically, it was the complexity of U.S. stock trading and its increasing deregulation and fragmentation that ensured that the total failure of the NYSE was not associated with an economic calamity. The face of the stock market landscape has simply changed: Whereas at the end of the 1990s the New York Stock Exchange still accounted for around 80 percent of the market, this figure has since fallen to between 20 and 25 percent. More and more trading venues have been able to establish themselves in recent years, and while around 40 percent of stock trading now takes place in so-called "dark pools" (bank- and exchange-internal trading platforms for anonymous trading in financial products) and on alternative trading platforms, a total of eleven exchanges share the rest of the trading volume. (A graph showing the redistribution to other trading venues can be found here).

A Long List of Curious Stock Market Blunders

Or, to put it another way: an economic crisis that could easily have been caused by the serious software error was averted primarily because business at other major trading venues continued as usual, and because for some time now it has been possible to trade NYSE shares on other stock exchanges and alternative trading venues. For the NYSE, this may have been of little consolation: What trader is pleased when his customers migrate to the competition because the software is on strike? And even if the complexity of modern stock trading showed its good side for once, that same complexity, driven by increased fragmentation and the growing share of high-frequency trading, in which computer systems execute numerous transactions every millisecond, has already produced numerous software glitches in the stock market environment.

Not the First Software Debacle on the Stock Market

And the nearly four-hour NYSE outage is by no means the first incident in which a software problem has caused headaches on the trading floor. A particularly prominent case was stockbroker Knight Capital: faulty trading software nearly forced the solid, billion-dollar company into bankruptcy on August 1, 2012, after a software update caused the company's fully automated trading systems to use an incorrect algorithm, buy stocks at inflated prices and rack up losses with every trade.

The critical incident wiped out tens of millions of dollars per minute, and after the system was taken offline in despair after 40 minutes, the losses clocked in at around 440 million dollars. According to the Wall Street Journal, Knight Capital was subsequently sitting on a mountain of unintentionally overpriced shares worth around seven billion dollars, which it had to sell at a discount to Goldman Sachs to avoid bankruptcy. It was a glaring example of how software permeates corporate operations to the core, and in this case resulted in the pun "Knightmare on Wall Street." Knight Capital was forced to provide debt securities to a group of investors in exchange for about $400 million, which saved the company from ruin but caused the stock price to plummet and about 75 percent of the company to fall into investor hands.

And Knight Capital is by no means the only prominent victim of defective software in this regard – the list of curious stock market breakdowns is unfortunately much longer: For example, in May 2010, a "flash crash", a violent collapse of the stock markets, caused prices to plummet within minutes – the Dow Jones lost almost 1,000 points and around a trillion dollars were destroyed. The cause was subsequently identified as computer-controlled high-frequency trading.

BATS IPO fails: At the initial listing of the third-largest U.S. exchange, BATS Global Markets, in March 2012, errors in new software triggered incorrect transactions that caused the new BATS shares to drop from $16 to less than a cent within minutes, and the IPO had to be withdrawn.
Facebook crash: When the well-known social network Facebook went public in May 2012, errors in the trading system of the Nasdaq exchange caused the company's share price to plummet, resulting in losses of millions for the companies involved.
Eurex comes to a standstill: On the morning of March 26, 2013, a faulty time synchronization forced the derivatives exchange Eurex to suspend services for one hour and reset all products to the status they had before the exchange opened.
Nasdaq paralyzed: Due to a technical glitch, the U.S. technology exchange Nasdaq was at a standstill for around three hours on August 22, 2013, after the transmission of price data to the New York Stock Exchange apparently broke down due to a software bug.
Goldman places orders unintentionally: The U.S. investment bank Goldman Sachs also purchased securities unintentionally after a computer glitch on August 21, 2013, caused mere expressions of interest in stock options to be mistakenly sent as trading orders. The potential loss was in the millions.

Putative Conflict between Time-to-Market and Quality Assurance

In the face of such glaring economic threats from defective software, the question of causes and remedies quickly arises. In the case of the NYSE, the causes are likely to be primarily related to software maintenance.

The bottom line is that the NYSE's failure casts a poor light on its operator, Intercontinental Exchange (ICE), which took over the traditional exchange in 2012 for 8.2 billion U.S. dollars and probably hoped above all for cost synergies from the acquisition. Even though ICE denied the accusations shortly after the outage, it is easy to get the impression that the Georgia-based company simply cut costs in the wrong place and thereby introduced weaknesses into its software, which then led to the error that caused the total outage.

And ICE would certainly not be alone with such a cost-cutting attitude towards the maintenance and modernization of IT and software. From a management perspective, there is a conflict of objectives between a fast time-to-market with extensive functions and the investment in quality assurance and maintainability of software. To avoid misunderstandings: Of course companies invest in software quality, otherwise there would be many more security gaps. However, when the deadline approaches or the pressure increases, priorities are regularly shifted towards more functionality in the heat of the moment. This is paradoxical, since the subsequent closing of security gaps and the potential (reputational) damage of software errors are much more expensive than focusing on software quality from the very beginning.

The price to be paid for the rapid production of new features is the incurrence of "quality debt" or "technical debt." It is imperative to reduce this debt as quickly as possible because it produces rising costs over time. After all, it is primarily the abstract, "invisible" nature of software that makes it difficult for managers to distinguish between low- and high-quality software. And since bugs and malfunctions only appear over time, it is expensive and time-consuming to discover and correct these errors at such a late stage. In the worst case – as in the case of the NYSE – there is a risk of total failure due to a software error.

Cutting Corners in Software Modernization Is Not an Option

The importance of software modernization has therefore increased significantly in view of the massive increase in the complexity of software. At the same time, however, the immediate relevance of modernization and maintenance measures is often not readily apparent, especially since senior management usually lacks a deeper technical understanding of IT. "Sorry before safe" is consequently a popular approach when it comes to software modernization, and it is correspondingly difficult for technical managers to obtain budgets for this purpose. But how can they convince their budgeters, decision-makers and internal customers of the need for quality- and productivity-enhancing measures if software quality remains largely invisible?

To reconcile both points of view, namely the software quality that matters from a technical perspective and the fast time-to-market sought by management, it is necessary to create a common understanding even of highly technical issues. To be granted enough time by management, a technical manager must ultimately convince the non-technical stakeholders to invest money in the modernization and sustainability of applications. He can do this by making technical risks visible in a way that is intuitive even to non-technical people, for example through automated, data-driven software management. Such software analytics processes can detect software weaknesses in a semi-automated manner and provide recommendations for decisions. This helps eliminate bugs and security vulnerabilities and allows valuable resources to be reallocated to meet time-to-market goals.
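What "making technical risks visible" might look like in practice can be sketched in a few lines of Python. The example is purely illustrative and does not describe any particular analytics product: the module names, the three metrics, the scoring formula and the thresholds are all assumptions chosen for demonstration. The idea is simply to condense technical measurements into a ranked traffic-light list that a non-technical decision-maker can read at a glance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleMetrics:
    """Hypothetical per-module measurements; names and values are illustrative."""
    name: str
    test_coverage: float    # fraction of lines covered by tests, 0.0 to 1.0
    avg_complexity: float   # e.g. mean cyclomatic complexity per function
    changes_last_year: int  # number of commits that touched the module


def risk_score(m: ModuleMetrics) -> float:
    """Assumed heuristic: complex, frequently changed, poorly tested code is riskier."""
    untested = 1.0 - m.test_coverage
    return untested * m.avg_complexity * m.changes_last_year


def traffic_light(score: float, amber: float = 50.0, red: float = 200.0) -> str:
    """Map a score to a label; the thresholds are arbitrary assumptions."""
    if score >= red:
        return "RED"
    if score >= amber:
        return "AMBER"
    return "GREEN"


def risk_report(modules):
    """Return (name, score, label) tuples, riskiest module first."""
    scored = [(m.name, round(risk_score(m), 1)) for m in modules]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, score, traffic_light(score)) for name, score in scored]


if __name__ == "__main__":
    # Invented sample data, not measurements from any real system.
    sample = [
        ModuleMetrics("order_router", 0.35, 18.0, 40),
        ModuleMetrics("reporting", 0.80, 6.0, 12),
        ModuleMetrics("config_loader", 0.50, 9.0, 25),
    ]
    for name, score, label in risk_report(sample):
        print(f"{name:15s} {score:8.1f} {label}")
```

With the invented sample data, the heavily changed and poorly tested order_router module lands at the top of the list in red, which is the kind of signal a budget discussion can be built on. In a real setting the inputs would come from coverage tools, static analysis and version control history, and the weighting would need to be calibrated against actual defect data.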

In addition, software managers should integrate the focus on software quality into the internal incentive and reward systems from the very beginning. In concrete terms, this means that a developer is only rated as good if he implements many features in a short time and these features are of high quality and well tested. To ensure this, transparency is particularly important. Both management and developers must be able to see at any time whether solutions have been programmed cleanly or sloppily. Otherwise, software quickly threatens to become a dilapidated bridge that is heavily traveled and whose structural damage has not been repaired, but merely covered with new paint. And what company likes to see its customers switch to other marketplaces in the event of the collapse of this important trading bridge?

Original article in German published in June 2015 in "Manager Magazin".