CISQ Sponsors Meet in Bangalore to Improve the Sizing of Maintenance Work

Dr. Bill Curtis, Executive Director, CISQ

 

During May 25-27, the sponsors of CISQ met in Bangalore, India, to develop a specification for automating a Function Point-style measure for analyzing the productivity of maintenance and enhancement activity. Current Function Point-based measures do not account for significant portions of the code in a modern application, namely the non-functional code required to operate large multi-language, multi-layer IT applications. Developers and maintenance staff can therefore perform extensive work enhancing, modifying, and deleting code without affecting traditional Function Point counts, and consequently their productivity cannot be accurately measured. Although NESMA has proposed an adjustment for this problem, the IT community needs an automatable solution that analyzes the full application.

 

The goal for this new measure is to size the portion of an application affected during maintenance and enhancement activity in a way that is strongly related to the effort expended. The fundamental question related to this goal is how non-functional code should be measured when it is involved in changes. This spring several CISQ sponsors ran scripts on some of their applications to determine what portions of their code went unmeasured in traditional Function Point counting. In Bangalore they compared their results and discussed options for measuring the application code affected in maintenance activity. After two days of debate and discussion they coalesced around an approach that, once formalized, will be submitted to the Object Management Group for consideration as a supported specification (an OMG standard).

 

Although the sponsors started from traditional methods for counting Function Points, they did not limit themselves to the constraints of these counting techniques. This specification might therefore be more accurately thought of as Automated Implementation Points, since it measures more than just the functional aspects of an application. The new measure will supplement traditional Function Point measures by providing a more complete sizing of the work performed during maintenance and enhancement, enabling more accurate analysis of productivity for use in benchmarking, estimating maintenance effort, and understanding the factors that cause variation in results. We will provide additional updates as the specification progresses.

 

You’ve Been Cloned

We no longer need biology to clone people. Electronics will do nicely. Hieu Minh Ngo, an enterprising young citizen of Vietnam, has just been arraigned in New Hampshire for posing as a private investigator from Singapore and running an underground service that sold clients identity information, including Social Security numbers. The data came from Court Ventures, an Experian subsidiary that provides access to court records, and from US Info Search, a firm that provides identity verification information. While it is unknown how many identities were breached, the likely count is in the millions.

 

How many databases hold shards of information about you? Start with what you have published openly on Facebook, LinkedIn, Twitter, and similar social sites. Then add the information saved by companies with which you do business, electronically or face-to-face, such as credit cards, purchases, preferences, and the like. Then add all the companies that gather data from them and collate it into records about your financial, criminal, shopping, and charitable history, which they sell to others. Then add all the medical data held by your doctors, hospitals, insurance companies, and record collating firms. Then add all the data retained by educational institutions you attended about your grades, disciplinary problems, test scores, diplomas, and other achievements. Then add all the data held by your employers such as salary, performance, retirement accounts, automatic deposit accounts, and tax information. Then add all the information local, state, and federal governments maintain on your driving, marital history, criminal record, real estate, tax, travel, and so many other aspects of your life. And who knows what the NSA has gotten ahold of? With such rich sources, who needs your DNA?

 

In Jurassic Park they needed DNA preserved in amber for eons to clone dinosaurs. On the internet they only need a few SQL injections and big data analytics to fully clone you. Integrate the medical data with the pictures you posted on Facebook, tie it to a 3D printer, and presto — instant you.

 

Now add the driver’s license, social security, and passport data held by government agencies and presto — you can pass through security. Now add financial, educational, and employment data, and presto — your alter ego thrives. Who needs the witness protection program when you can become whoever you want? On the run or just gone bankrupt? No problem, just become someone else with a clean record.

 

The bottom line is that unless we clean up the security weaknesses in software that maintains personal information, we are headed for an IDENTITY HOLOCAUST. Anyone who has been affected by credit card fraud or identity theft has already experienced the nightmare. And it will be worse next time. In the late 1990s the world attacked the Y2K problem with a vengeance so that civilization as we know it would not end at midnight. We need the same internationally-coordinated determination to harden the software that defends our electronic identities.

 

Wow, you just texted me from Sydney, but didn’t I just see you over at………

What Does Application Security Cost? – Your Job!

Today Target Stores announced that Beth Jacob, their CIO since 2008, has resigned.  Estimates vary, but the confidential data of at least 70 million of Target’s customers were compromised.  Target’s profits and sales have declined as a result, and it faces over $100 million in legal settlements.  Not surprisingly, CEO Gregg Steinhafel announced that Target will hire an interim CIO charged with dramatically upgrading its information security and compliance infrastructure. 

 

Whether it’s security breaches at Target, humiliating performance at Healthcare.gov, outages in airline ticketing systems, or 30 minutes of disastrous trading at Knight Capital, the costs of poor structural quality can be staggering.  In fact, they are now so high that CEOs are being held accountable for IT’s misses and messes.  Consequently, Ms. Jacob will not be the last CIO to lose a job over an application quality problem.

 

Don’t be surprised if the next CIO survey from one of the IT industry analysts reports that a CIO’s top concern is some combination of application security, resilience, and risk reduction.  These issues just moved from variable to fixed income.  That is, rather than having improvements in security and dependability affect a CIO’s bonus, they will instead affect a CIO’s salary continuation plan.

 

Regardless of what the org chart says, the CIO is now the head of security.  The threats online overwhelm those onsite.  The CIO’s new top priority is to guard the premises of the firm’s electronic business.  Failing to accomplish this is failing, period.  CIOs and VPs of Application Development, Maintenance, and Quality Assurance must arrive on the job already knowing these techniques.  On-the-job learning is too expensive to be tolerated for long.

 

By its nature, size, and complexity, software is impossible to completely protect from disruptions and breaches.  However, if you want to keep your job, it shouldn’t be the CEO calling for an overhaul of information security and compliance with industry standards.

Tough Love for Software Security

Each day brings more reports of hacked systems.  The security breaches at Target, TJ Maxx, and Heartland Payment Systems are reported to have cost well beyond $140,000,000 each.  Are we near a tipping point where people stop trusting online and electronic systems and go back to buying over-the-counter with cash and personal checks?  When does the financial services industry reach the breaking point and start charging excessive fees to cover their losses?  Before we arrive there, IT needs to apply some tough love to software security.

 

Reports following the shutdown of a crime ring last summer that had stolen 130,000,000+ credit card numbers indicated that the weakness most frequently exploited to gain entry was SQL injection.  SQL injection???  Haven’t we known about that weakness for two decades?  How can we still be creating these types of vulnerabilities?  How can we not have detected them before putting the code into production?  Don’t you validate your input?  Don’t you wash your hands before eating?
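For readers who have not seen the weakness up close, here is a minimal sketch in plain JDBC (the table and column names are hypothetical). The contrast is between concatenating user input into the SQL text and binding it as a parameter so the database never interprets the input as SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class AccountLookup {

    // Vulnerable: user input is concatenated directly into the SQL text.
    // Input such as  ' OR '1'='1  turns this into a query that returns every row.
    public static ResultSet findAccountUnsafe(Connection conn, String customerId)
            throws SQLException {
        Statement stmt = conn.createStatement();
        return stmt.executeQuery(
                "SELECT * FROM accounts WHERE customer_id = '" + customerId + "'");
    }

    // Safer: the input is bound as a parameter, so the database treats it as data,
    // never as SQL syntax.
    public static ResultSet findAccountSafe(Connection conn, String customerId)
            throws SQLException {
        PreparedStatement stmt =
                conn.prepareStatement("SELECT * FROM accounts WHERE customer_id = ?");
        stmt.setString(1, customerId);
        return stmt.executeQuery();
    }
}
```

Parameterized queries, or the equivalent bindings in an ORM, combined with server-side input validation close off the entry point most often cited in these breach reports.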

 

What do we have to do to derail this hacking express?  What will it take to develop a global profession of software engineers who understand the structural elements of secure code?  We need some tough love for those who continue to leave glaring holes in critical applications.

 

Here is a tough love recommendation.  It is admittedly a bit whacky, but you’ll get the point.  First, we rate each of the code-based weaknesses in the Common Weakness Enumeration (cwe.mitre.org) on a severity scale from ‘1 – very minor and difficult to exploit’, to ‘9 – you just rolled out a red carpet to the confidential data’.  Next, we implement technology that continually scans code during development for security vulnerabilities.  Finally, we immediately enforce the following penalties when a security-related flaw is detected during a coding session.

 

  • Severity rating 1, 2 — “Come on dude, that’s dumb” flashes on the developer’s display
  • Severity rating 3, 4 — developer placed in ‘timeout’ for 2 hours by auto-locking IDE
  • Severity rating 5, 6 — developer’s name and defect published on daily bozo list
  • Severity rating 7, 8 — mild electric shock administered through the developer’s keyboard
  • Severity rating 9 — developer banished to database administration for 1 month

 

Okay, this is a bit much, but with the cost of security flaws to business running well into nine figures, the status quo in development is no longer tolerable.  Here are some reasonable steps to take on the road to tough love.

 

  1. All applications touching confidential information should be automatically scanned for security weaknesses during development, with immediate feedback provided to developers (a toy illustration of such a check follows this list).
  2. Before each release into production, all application code should be scanned at the system level for security weaknesses. 
  3. All high severity security weaknesses should be removed before the code enters production.
  4. All other security weaknesses should be prioritized on a maintenance or story backlog for future remediation.
  5. All developers should be trained in developing secure code for each of their languages and platforms.
  6. Developers who continue to submit components to builds that harbor security weaknesses should receive additional training and/or mentoring.
  7. Developers who are unable to produce secure code even after additional training and/or mentoring should be assigned to other work.
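As a toy illustration of the first recommendation, the sketch below scans a Java source file and flags lines that appear to build SQL by string concatenation. It is only a pattern match, not a real static analyzer; the CWE-based tools discussed here do far more, but the sketch shows what “immediate feedback during development” means mechanically.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Toy check: flag lines that appear to build SQL by concatenating values
 * into the query string. A real analyzer tracks data flow; this only
 * pattern-matches, so expect false positives and misses.
 */
public class NaiveSqlConcatCheck {

    private static final Pattern SUSPECT =
            Pattern.compile("(?i)\"\\s*(SELECT|INSERT|UPDATE|DELETE)\\b.*\"\\s*\\+");

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java NaiveSqlConcatCheck <path-to-java-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        for (int i = 0; i < lines.size(); i++) {
            if (SUSPECT.matcher(lines.get(i)).find()) {
                System.out.printf("%s:%d: possible SQL built by concatenation; "
                        + "use a parameterized query%n", args[0], i + 1);
            }
        }
    }
}
```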

 

The latter recommendations may upset some developers.  However, as the financial damage of security breaches escalates, the industry must take steps necessary to ensure that those entrusted to develop secure systems have the knowledge, skill, and discipline necessary to the task.  Organizations must accept some responsibility for preparing developers and sustaining their skills.  Academic institutions need to incorporate cyber-security as a requirement into their computer science and software engineering curricula.

 

The cyber-security community is supporting many important initiatives, and IT needs to take advantage of them.  Good places to start include the CERT website (www.cert.org) supported by the Software Engineering Institute at Carnegie Mellon University, the SANS Institute (www.sans.org), and the Common Weakness Enumeration (cwe.mitre.org) repository supported by Mitre on behalf of the US Department of Homeland Security.  Ultimately, developers must be held accountable for their capability and work results, since the risk to which they expose a business has grown unacceptably large.  Tough love for tougher security.

CISQ – 2013 Review and 2014 Plans

Happy New Year


Since our formation in 2011, the Consortium for IT Software Quality (CISQ) has taken the lead within the IT industry in measuring and improving the quality and productivity of business application software. We are the collective voice of global IT leaders, explaining the costs and risks of poor application quality along with the best practices for improving it.

 

Highlights of CISQ in 2013

  1. The CISQ specification for Automated Function Points (AFP) was approved as a supported specification of the Object Management Group, making it an international standard.
  2. We conducted CISQ Deployment Workshops on Productivity and Quality Measurement at OMG meetings in Reston, Virginia (March) and Berlin, Germany (June).
  3. We signed up industry giants Huawei and Wipro as CISQ Silver Sponsors.
  4. We now have over 750 worldwide CISQ members across a broad range of industries.
  5. We hosted a lively CISQ IT Executive Roundtable in New York City on Software Robustness and Resiliency in Capital Markets.
  6. We submitted comments on CISQ’s behalf to the Securities and Exchange Commission (SEC) regarding a proposed new regulation governing software that affects securities trading markets, “Regulation Systems Compliance and Integrity – Rule 1000(b)(1)”.
  7. We provided guidance globally on best practices for deploying industry-standard measures of software size and structural quality.

 

Plans for CISQ in 2014

As exciting as the first three years have been, they have laid the foundation for what should be a banner year for CISQ accomplishments in 2014. Not only will we be submitting additional measures for approval by OMG, but we will be initiating work on additional measures that build the next level of measurement infrastructure atop our existing definitional work. Our plans for 2014 include:

 

  1. We will submit the CISQ Quality Characteristic Measures for Reliability, Performance Efficiency, Security, and Maintainability for approval as supported specifications of the OMG.
  2. We will initiate work with CISQ Sponsors to define CISQ measures for Technical Debt and Quality-Adjusted Productivity. Initial definition workshops to be held in the second quarter in North America, Europe, and Asia.
  3. We will begin developing a CISQ Certification Program for individuals, technologies, and organizations that provide quality diagnostics and services using CISQ/OMG measures.
  4. We will conduct CISQ Deployment Workshops at OMG meetings quarterly, beginning at Reston, Virginia in March.  We will also begin conducting CISQ Workshops in Europe and Asia either as standalone events or in conjunction with other meetings or conferences.
  5. We will provide comments and other input on policies and regulations regarding software quality to public organizations requesting input.
  6. We will substantially expand the structural quality content available on the CISQ Website.
  7. We will begin providing internal consulting on productivity and structural quality measures for CISQ Sponsors.

 

I am looking forward to a productive 2014 and what we will do together to improve IT software quality in the wake of the recurrent disasters that have plagued IT in financial services, retail, and other industry segments.

 

My best wishes to you, your family, and your organization for a happy and successful 2014!

Leaving Software Health Uninsured Part 1 – The Healthcare.gov Front End

 

Dr. Bill Curtis

Director, Consortium for IT Software Quality

 

 

According to testimony before a US Congressional Subcommittee, government administrators knew about the performance problems of Healthcare.gov long before the American public was used as system testers. Of course they did. I have never seen a system disaster of this magnitude in which the technical folks weren’t alerting management about operational risks long before the system went live, if it ever did.

 

I will leave it to journalists to report the decisions behind an immutable Oct. 1 go-live date regardless of the operational consequences. As a result of these and other decisions, the health and quality of the software in Healthcare.gov suffered from the engineering equivalent of medical malpractice. As with all uninsured patients, the costs will be borne by the public. This series of posts will focus on the constellation of snafus in requirements, code, acquisition, system integration strategy, etc. that collectively created the Healthcare.gov fiasco. So let’s start where the American public attempted to start, at the start page with the now banished smiling lady who was the friendly face of Healthcare.gov.

 

Amateurish. In a word, the front end to Healthcare.gov was amateurish in its construction and execution. Experts have posted numerous analyses of the front end code they could access, and the common conclusion was that this was not code produced by experienced developers of high-volume commercial websites. An excellent review of some of these problems indicated that during the initial stage of the signup process the user’s browser was loading 50+ JavaScript files simultaneously, and that they had not been optimized.1 Why? Isn’t optimizing the interaction with browsers to increase responsiveness standard practice?

 

Other pages responded slowly because they were downloading large custom font and typeface files.1 Has the Department of Health and Human Services (HHS) created a new architectural principle, Function Follows Format? Occasionally JavaScript files loaded incorrectly, creating bizarre pages with text overwriting text. An entertaining technical discussion of some of these issues can be found on Reddit.com.2 Why were obvious problems like this not fixed before Oct. 1?

 

Worse than this were obvious and easily detected security problems. For instance, Michael Scherer, writing for Time’s Swampland site, reported that, “An error message from the site relayed personal information over the internet without encryption, while the email verification system could be bypassed without access to the email account. Both security vulnerabilities could be exploited to hijack an account.” 3 This latter security problem was independently verified, and in the process a new severe vulnerability was discovered that appeared to have been created while fixing another website problem.1 The sloppiness of easily discovered security holes such as these reveals coding performed in such haste that novice-level mistakes were made without even minimally adequate attention devoted to their detection.

 

The onslaught of Americans trying to explore health care options on Oct. 1 created a national denial of service attack on a website that reportedly failed with a load of only several hundred users during pre-launch testing. Although the unanticipated size of the response was blamed for the website’s problems, the analyses cited here indicate that the website’s resources would have been overloaded by even the anticipated light customer load. In effect, the American public performed the beta test while the system integrator, the Centers for Medicare and Medicaid Services (CMS) within HHS, had barely entered alpha test.

 

How could this happen at a taxpayer expense that will easily exceed the nine figures currently reported? Here are a few hypotheses for journalists to explore. Some of these may be culprits and others may not be involved, but they are common problems that lead to these types of software quality fiascos.

 

  1. The development team was too inexperienced at building high-volume websites because the contractor either did not have adequate talent available or had to staff the project with less experienced developers to win a cost-conscious competitive bid.

  2. Development work started too late because of the government’s contracting bureaucracy or the inability to stabilize site policies and requirements until late in the schedule.

  3. Changing requirements slowed development work and may have forced significant rework.

  4. By the time development started, the schedule was too compressed to adequately test the system or optimize the front end code.

  5. When signing up for an account became a requirement for using Healthcare.gov, the front end had to interact with so many different systems hosted by different agencies that interaction with the browser became hopelessly overloaded if the session was required to be conducted synchronously.

  6. System integrators at CMS did not have sufficient experience with systems at this scale to anticipate the effort and time involved or the breadth of quality assurance practices required.

  7. Warnings from knowledgeable IT people were ignored at many stages.

  8. Political priorities were weighted far higher than software quality, user experience, protection of confidential information, effective operation, or eventual system costs in the go-live decision.

 

Usually in IT fiascos like this, the correction of the initial problems reveals new layers of problems. So it is with Healthcare.gov. As performance problems were addressed, new waves of problems involving incorrect data being transmitted to insurance companies have begun to emerge. As of this writing we are far from having an effective, efficient, and secure system. The answer to how long it will take to correct the problems is in the defect and incident logs, not in the promises of administrators. In upcoming blogs I will explore several dimensions of the software quality fiasco that is Healthcare.gov. However, the ultimate lesson is simple. Software quality controls its own pace, and not all the dictates of administrators, legislators, and even Presidents can shorten it.

 

 

1. http://blog.castsoftware.com/investigating-healthcare-gov-what-went-wrong

2. http://www.reddit.com/r/webdev/comments/1nifc5/i_guess_a_couple_of_are_trying_to_sign_up_for/ccivseb

3. http://swampland.time.com/2013/10/24/traffic-didnt-crash-the-obamacare-site-alone-bad-coding-did-too/

Insecure Software and My Supersonic Trip around the World

A year and a half ago I registered for the spring semester at Baruch College in New York City.  The same morning I had an eye procedure in Florida.  Shortly after that I bought $4000 of art from a dealer in Kansas City.  By midday I had bought several thousand dollars more art in Australia.  Apparently I was having a fine time at supersonic speeds.  Then my credit card company’s neural nets caught up with me.  Well, not me exactly.

 

Within an hour that fine morning I received a call, an email, and a text message telling me my credit card had been terminated and wanting to verify recent charges.  Apparently I was joined on this round-the-world foray by several thousand other credit card customers.  The credit card company figured the only way we could have executed this spending spree was on the Concorde, which of course had been grounded years earlier…and rarely flew to Australia anyway.  Yep, somebody had been hacked.

 

I received a new credit card the next day, and all the fraudulent charges were reversed so I survived far better than the merchants who had parted with goods or services.  And how much did it cost the credit card company to express mail thousands of credit cards to victimized customers in addition to the hours it took to clean up the financial debris?  Yet my biggest question is what kind of idiot thinks he can enroll at Baruch College with a stolen credit card and not get caught?

 

These memories came flooding back today when US federal prosecutors working with international authorities announced they had broken a hacker ring that spent almost ten years fleecing millions of unsuspecting customers and merchants using 160 million credit card numbers stolen from the IT systems of several large companies.  HOW DOES THIS KEEP HAPPENING?  EVEN WORSE HOW DO YOU NOT GET DETECTED IN THE MIDST OF USING 160 MILLION STOLEN CREDIT CARDS FOR ALMOST A DECADE?

 

Five hackers, ranging in age from 26 to 32, have been charged, and two are in custody.  The damage to merchants and creditors is reported to be at least $300 million.  Once breached, the card numbers were sold around the world: $10 for an American card, $15 for a Canadian card, and $50 for a European card.  My good credit is worth only ten crummy bucks!  Wow, that’s humiliating.

 

In addition to hacking NASDAQ (a US stock exchange), the hackers also penetrated Heartland Payment Systems and Global Payment Systems, companies that clear large amounts of credit card transactions for merchants.  It looks like PCI compliance is no more a guarantee of impenetrable systems than being CMMI Level 5 is a guarantee of impeccable quality.  A solid defense against hackers apparently requires more than just compliant processes.

 

Here’s the worst of it.  Apparently the hackers used a weakness known as ‘SQL injection’ to break into the systems.  SQL injection!  That’s one of the oldest attack patterns in the book and we knew about it in the last century.  The ‘book’ in this case is the Common Weakness Enumeration (CWE) repository maintained by Mitre Corporation with support from the US Department of Homeland Security (cwe.mitre.org).  Out of the 800+ software weaknesses enumerated in the repository, the CWE team lists the Top 25 most commonly exploited by hackers.  Guess which weakness is ranked #1 or #2 year after year.  Right, SQL injection!

 

But Mitre is not alone in assailing SQL injection.  The Open Web Application Security Project (OWASP) publishes a list of the top 10 security vulnerabilities every three years.  And of course, there is SQL injection right at the top. 

 

If everyone agrees SQL injection is a huge security problem, how come we still see these weaknesses in critical business systems?  How can one of the best-known security vulnerabilities still be a primary source of unauthorized penetration?  As a profession do we ever learn?  How sophisticated do hackers have to be if they don’t even have to read very far down the list of weaknesses to find a way in?  Why don’t developers and testers know about these weaknesses and how to detect and avoid them?  If software systems are this easily penetrated, can we really call software development ‘engineering’, or even a profession?

 

The Consortium for IT Software Quality (CISQ) has recently defined a measure of software security based on the top 25 weaknesses in the CWE that were measurable.  The effort was led by Bob Martin, who is in charge of the CWE repository.  The specification for CISQ’s security measure is available free on the CISQ website.  CISQ will hold a seminar on Wednesday, September 25 at the Hyatt Regency in New Brunswick, NJ, in which much of the afternoon session will focus on how to measure and manage software security.

 

Vulnerable customers need immediate preventative action on all systems that access confidential information.  Here are some of the actions every IT organization with these systems needs to take ASAP.

 

  1. Ensure that every developer working on a system accessing confidential information is trained in secure architectural and coding practices as well as at least the top 25 CWE weaknesses.
  2. Ensure everyone involved in testing or any other form of defect detection is trained in how to detect violations of secure architectural and coding practices, and especially the top 25 CWE weaknesses.
  3. Implement and enforce quality assurance practices that are capable of detecting at a minimum the top 25 CWE weaknesses.
  4. Require all systems that access confidential information to undergo automated analysis at both the code unit and system levels for structural (non-functional) flaws that affect security.
  5. Enforce a process that provides sufficient time and resources to detect security weaknesses.
  6. Prepare an evidence-based case supporting claims that any system accessing confidential information is protected against known threats, attack patterns, and security weaknesses.
  7. Report security measures and other types of audit information regarding system security to upper management on a periodic basis.

 

This sounds expensive—until you get the bill for a major unauthorized penetration.  $300 million makes these recommendations seem inexpensive.  Investing in secure software should be guided by a risk analysis that evaluates potential losses against improvement costs.  There is no way to guarantee a foolproof system.  But there is no sense in being the fool who proves it.

CISQ/OMG Automated Function Point specification available on the CISQ Website

The CISQ specification for Automated Function Points has been approved as a Supported Specification of CISQ’s co-sponsor, the Object Management Group. The preliminary draft of the specification currently undergoing finalization in OMG is available in the members area as 13-02-01 Automated Function Points. Membership in CISQ is free and we will be posting more materials related to the use of this specification over time.
 
This Thursday, February 14 at 10:30 AM EST (15:30 GMT) CISQ is sponsoring a Webinar on how automated Function Points will affect the future of software sizing and application development. The Featured Speaker will be David Herron, co-author of the book Function Point Analysis, co-founder of the David Consulting Group, and the technical lead of the multi-national CISQ work group that developed the Automated Function Point specification.
 
This specification mirrors as closely as possible the counting guidelines of the International Function Point User Group (IFPUG). However, the specification necessarily resolved any decisions involving subjective judgment in order to support automation. Therefore, although counts may differ from those of manual counters, the cost of counting will be dramatically reduced and the consistency increased. CISQ members believe that these cost and consistency benefits will dramatically increase the use of Function Points as a sizing measure of choice in IT as well as expand the opportunities for Function Point experts to engage with IT organizations to support productivity analysis, benchmarking, estimating, vendor management, and other activities to which sizing is important.

Can’t Anybody Here Play This Game?

This famous quote was allegedly uttered by Casey Stengel, manager of the New York Mets baseball team during their inaugural season in 1962. It became the title of a book by Jimmy Breslin documenting a season of endless bumbling by the Mets. I wonder how many senior corporate or government executives feel like Stengel when they face yet another IT disaster.

 

 

I just heard yet another news report about a $1 billion system development failure in the U.S. government. This one was cancelled by the Air Force because after 7 years and $1B it didn’t work and showed no signs of ever working. “Can’t anybody here build these things?”

 

When asked how this colossal waste of taxpayer money compared with the infamous $600 toilet seats exposed by Congress years ago, U.S. Senator John McCain quipped, “Well in that case at least we got a toilet seat.” In IT you get nothing, not even something for flushing it all down!

 

The U.S. government seems perpetually susceptible to these catastrophes. Failed IT projects in the IRS, FAA, FBI, and many other agencies have left the government hobbling along with outdated systems and Jurassic work processes. But it is not just the government that can’t seem to play this game. Corporations have their own tales of woe, with the lawsuits and firings to prove it.

 

What are the characteristics of IT projects that seem unable to play this game? Here are just a few:

  1. They lack project or acquisition managers with experience at the scale of the project to be undertaken. As Ross Perot once quipped in a Presidential debate, “Just because you ran a mom-and-pop-shop doesn’t mean you can run Wal-Mart.”
  2.  They start with incomplete or ambiguous requirements matched to fixed budgets and delivery dates mandated on a crisis schedule. Then they add more requirements…lots more.
  3. They do not conduct regular risk analysis. If you’re not tracking progress and the status of risks, you’re functioning at the level of lemmings. If you can’t play the game, the least you could do is forfeit early.
  4. They are afraid to utter the two universally unspoken words in the management chain, “No!” and “Stop!” While these words can be career threatening, massive IT failures are guaranteed career crushers.
  5. Coordination among organizations building different parts of the system is an assumption rather than an activity. It reminds me of playing touch football as a kid when I would say, “Everybody go long and I’ll throw the ball as far as I can.” “Hail Mary” is a prayer, but as a football play it rarely succeeds.
  6. No one is evaluating whether the product has a chance of working. The senior architects almost always see the wreck coming, but no one wants to listen. Since there is no hard data recording the failing structural integrity and performance of the product, no insight leaks through the impermeable executive sediment.

 

Here are seven characteristics of IT folks who can play the game:

  1. They don’t take on more risk or ambiguity than they can manage based on past experience, staff capability, and organizational maturity.
  2. They don’t launch without capable project managers and technical leads.
  3. They measure and track progress and risks—and occasionally say, “No.”
  4. They build things in bite-size pieces to make requirements easier to manage, progress easier to see, and calamities easier to squelch early.
  5. They enforce sensible discipline.
  6. They continually measure not only the status of the project, but also the status and quality of the product.
  7. They listen to their technical experts more often than to motivational speakers.

 

The continuing epidemic of expensive project failures and cancellations makes it tempting to paraphrase the old adage: “Those who can, do; those who can’t, work on IT projects.” Those who govern on boards of directors or in Congress need to demand more accountability for monitoring in-game performance. No coach can excuse failure by saying, “I just can’t recruit people who can play the game.” Either recruit or develop IT people who can play the game, or cut back to a simpler, slower game you can play. We’re fed up with paying for losers.

Architecturally Complex Defects


There’s a new strain of detection-resistant bugs going around—architecturally complex defects. The bugs are difficult to diagnose and even more difficult to remove. And they are often the most deadly.
 

What Are They?

 
An architecturally complex defect involves interactions between several components, often spread across different levels of an application. These defects reside at the subsystem or system level rather than at the code unit level. Architecturally complex defects are even more complicated if the modules or code units involved are written in different languages or reside on different technology platforms.
 
When tested or analyzed at the unit level, the modules or code units may show no signs of defect. The problems emerge at the subsystem or system level, frequently resulting from incorrect assumptions about how different components will interact. Usually such defects can only be detected by quality techniques performed after software exits the build process, such as integration testing and static or dynamic analysis of the integrated software. Thus, quality analysis after integration must be just as intense as unit testing, but with a focus on a different class of defects.
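To make the pattern concrete, here is a deliberately minimal, hypothetical two-layer sketch in Java (class and method names are invented for illustration). Each class looks reasonable in isolation and would pass unit tests written by its own author; the defect appears only when the layers interact, because each makes a different assumption about who handles the unknown-customer case. Real architecturally complex defects span more components, languages, and platforms, but the mechanism is the same.

```java
import java.util.HashMap;
import java.util.Map;

// Data-access layer: its author assumes callers will check for null.
class CustomerRepository {
    private final Map<String, String> emailById = new HashMap<>();

    String findEmail(String customerId) {
        return emailById.get(customerId);   // returns null when the id is unknown
    }
}

// Service layer: its author assumes the repository never returns null.
class NotificationService {
    private final CustomerRepository repository = new CustomerRepository();

    String buildGreeting(String customerId) {
        // Unit tests that stub only known customers never exercise the null path,
        // so the NullPointerException surfaces only after integration.
        return "Hello, " + repository.findEmail(customerId).toLowerCase();
    }
}

public class ArchitecturalDefectDemo {
    public static void main(String[] args) {
        // Throws NullPointerException: the cross-layer assumption mismatch in action.
        System.out.println(new NotificationService().buildGreeting("unknown-id"));
    }
}
```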
 

Why Are They Uniquely Bad?

 
According to recent research published in the highly regarded journal Empirical Software Engineering, architecturally complex defects account for only about 8% of the total defects in an application. However, they absorb 52% of the total effort spent repairing defects. In a low-maturity organization where 40% of the total effort is spent on rework, the remediation of architecturally complex defects can therefore absorb roughly 20% of an application’s development and maintenance budget (52% of that 40% of effort).
These staggering costs accrue from making up to 20 times more fixes to eliminate these defects than are required to fix single-component defects. In essence, fixes have to be made to numerous files, and the fixing can become iterative as subsequent fixes are added to ensure the remediation is complete across all the components involved. It’s the old problem of fixing something and then discovering there was more to the problem than you first realized. Because remediating these defects is like peeling an onion, they tend to survive across multiple releases.
 

Why Are They Hard to Find?

 
Architecturally complex defects are harder to detect for three reasons. First, they tend to be structural rather than functional. Consequently, you cannot write a test case based on the functional requirements or specifications to find them. In fact, Diomidis Spinellis (Code Quality, Addison Wesley, 2006) and others have noted the difficulty of detecting non-functional, structural defects through traditional testing. They are more often detected through techniques such as peer reviews and inspections, static and dynamic analysis, and load/stress testing.
 
Second, since architecturally complex defects reside at the architectural level they can rarely be detected until a significant portion of the system has been built. As mentioned earlier, unit test and static analysis tools at the IDE/developer level will not detect their presence. In fact, without the system-level context that triggers architecturally complex defects, there may be no trace of the lurking problem. This context is only available when the system can be tested as an integrated whole.
 
Third, applications have become too large and complex for any single individual or team to fully understand them. Although developers may be expert in one or two of the languages and technologies in the application stack, they make assumptions about technologies with which they are less familiar. Enough of these assumptions will be wrong to provide the initial vulnerability from which an architecturally complex defect emerges. It is difficult to detect defects formed from system-level interactions you did not understand.
 

What Are Architectural Hotspots?

 
The map of an architecturally complex defect often looks like a snake slithering through the code, since the faulty interactions typically trace a transaction, control path, or data flow across multiple components. A single component may sit in the paths of several architecturally complex defects. These components are called architectural hotspots because they lie at the intersection of several defective interactions.
 
Rather than being fixed, architectural hotspots should probably be rewritten from scratch. Their involvement in so many defects is usually the result of poor design or construction that cannot be remediated through incremental fixes. Eliminating architectural hotspots offers the greatest opportunity to reduce the risk and cost of IT applications.
 

What Should We Do?

 
In order to improve the early detection of architecturally complex defects, those responsible for quality assurance must take two steps. First, they must implement a suite of quality management techniques that supplements traditional testing with static and dynamic analysis. Schedule pressure must never be used as an excuse to skimp on full system testing and analysis. The assumption that test and analysis results at the component or subsystem level can be extrapolated to system level results is dangerous. The context changes dramatically when multiple languages, technologies, and platforms are lashed together. The system-level environment is different because full system knowledge is incomplete and spread much more thinly than knowledge at the individual technology or component levels.
 
Second, focus on detecting and redeveloping architectural hotspots. This involves identifying the paths of architecturally complex defects and tracing their intersections. The resulting heat map of the application is an excellent method for identifying architectural hotspots and prioritizing the components to be fixed.
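As a sketch of how such a heat map might be assembled (the defect paths below are invented input; in practice they would come from static analysis or defect-tracing tooling), the core idea is simply to count how often each component appears on the path of an architecturally complex defect and rank the components by that count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HotspotHeatMap {

    /** Counts how many defect paths each component participates in. */
    static Map<String, Integer> rankHotspots(List<List<String>> defectPaths) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> path : defectPaths) {
            for (String component : path) {
                counts.merge(component, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical traces of three architecturally complex defects.
        List<List<String>> defectPaths = List.of(
                List.of("web.OrderController", "svc.PricingService", "dao.OrderDao"),
                List.of("web.CartController", "svc.PricingService", "dao.InventoryDao"),
                List.of("batch.Reprice", "svc.PricingService", "dao.OrderDao"));

        List<Map.Entry<String, Integer>> ranked =
                new ArrayList<>(rankHotspots(defectPaths).entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        // svc.PricingService appears on every path: a candidate for rewrite, not patching.
        ranked.forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Components at the top of such a ranking are the candidates for redevelopment rather than incremental patching.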
 

Does This Apply to Agile Methods?

 
This system-level analysis is even more important in an agile or iterative environment. Both the nature of development and the timescale in highly iterative methods often curtail system-level analysis. Since the stories that serve as system requirements are functional, the test cases derived from them will be functional and not designed to detect structural flaws. Short delivery schedules truncate the time for system-level testing and analysis since many of the components are being developed and integrated toward the end of the sprint or cycle. Architecturally complex defects are the ones least likely to be detected under these circumstances.
 
With the pressure to produce a potentially runnable increment on a short schedule, time must be reserved, even if outside the context of the sprint, to complete a thorough system-level structural analysis before releasing the code into operations. System-level structural analysis would not be conducted after every daily build, but it could be conducted weekly or bi-weekly, or at least once before release to Operations. Since the emerging DevOps roles tend to focus on the non-functional, structural attributes of operational software, issues regarding architecturally complex defects should be among their primary concerns.
 

Summary

 
Although architecturally complex defects constitute only a small proportion of the defects in an application, they consume a disproportionate amount of cost and often cause a disproportionate amount of damage. They are the reason that thorough structural analysis must be conducted at the system level as well as within individual technologies and components. As the technology stacks underlying most applications become more complex, the cost and risk attributable to architecturally complex defects will continue growing in comparison to other IT and application development challenges. They must be addressed by strengthening quality analysis at the system level.