There’s a new strain of detection-resistant bugs going around—architecturally complex defects. The bugs are difficult to diagnose and even more difficult to remove. And they are often the most deadly.
What Are They?
An architecturally complex defect involves interactions between several components, often spread across different levels of an application. These defects reside at the subsystem or system level rather than at the code unit level. Architecturally complex defects are even more complicated if the modules or code units involved are written in different languages or reside on different technology platforms.
When tested or analyzed at the unit level, the modules or code units may show no signs of defect. The problems emerge at the subsystem or system level, frequently as a result of incorrect assumptions about how different components will interact. Usually such defects can only be detected by quality techniques performed after software exits the build process, such as integration testing and static or dynamic analysis of the integrated software. Thus, quality analysis after integration must be just as intense as unit testing, but with a focus on a different class of defects.
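A minimal sketch of how this can happen, using two hypothetical components invented for illustration (none of these names come from the research discussed here). Each component passes its own unit checks, but the integration point silently mixes seconds and milliseconds:

```python
# Hypothetical two-component example; all names are illustrative assumptions.

def get_timeout():
    """Config component: returns the request timeout, in seconds."""
    return 30


def should_abort(elapsed_ms, timeout_ms):
    """Network component: expects both arguments in milliseconds."""
    return elapsed_ms > timeout_ms


# Unit-level checks: each component behaves as its own author intended.
assert get_timeout() == 30
assert should_abort(elapsed_ms=31_000, timeout_ms=30_000)
assert not should_abort(elapsed_ms=500, timeout_ms=30_000)


def request_is_aborted(elapsed_ms):
    """Integration point: passes seconds where milliseconds are expected."""
    return should_abort(elapsed_ms, get_timeout())


# System-level check: a request only 500 ms old is wrongly aborted,
# because 500 > 30 once the units are mixed.
print(request_is_aborted(500))  # True, although the intended answer is False
```

Neither function is wrong in isolation. The defect exists only in the assumption that connects them, which is why only a check on the integrated path can expose it.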
Why Are They Uniquely Bad?
According to recent research published in the highly regarded scientific journal Empirical Software Engineering, architecturally complex defects account for only about 8% of the total defects in an application. However, they absorb 52% of the total effort spent repairing defects. In a low maturity organization where 40% of the total effort is spent on rework, the remediation of architecturally complex defects can absorb 20% of an application’s development and maintenance budget.
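The 20% figure follows from multiplying the two percentages quoted above. The short calculation below is only a worked check of that arithmetic, and it assumes that all rework effort is defect repair effort; the variable names are illustrative.

```python
# Figures quoted in the text.
rework_share = 0.40      # share of total effort spent on rework
acd_repair_share = 0.52  # share of repair effort absorbed by architecturally complex defects

# Assumption: rework effort and defect repair effort are the same thing.
acd_budget_share = rework_share * acd_repair_share

print(f"{acd_budget_share:.1%}")  # 20.8%
```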
These staggering costs accrue because eliminating these defects requires up to 20 times more fixes than are required to fix single-component defects. Fixes have to be made to numerous files, and the fixing can become iterative as subsequent fixes are added to ensure the remediation is complete across all the components involved. It’s the old problem of fixing something and then discovering there was more to the problem than you originally realized. Because remediating these defects is like peeling an onion, they tend to survive across multiple releases.
Why Are They Hard to Find?
Architecturally complex defects are harder to detect for three reasons. First, they tend to be structural rather than functional. Consequently, you cannot write a test case based on the functional requirements or specifications to find them. In fact, Diomidis Spinellis (Code Quality, Addison-Wesley, 2006) and others have noted the difficulty of detecting non-functional, structural defects through traditional testing. They are more often detected through techniques such as peer reviews and inspections, static and dynamic analysis, and load/stress testing.
Second, since architecturally complex defects reside at the architectural level, they can rarely be detected until a significant portion of the system has been built. As mentioned earlier, unit testing and static analysis tools at the IDE/developer level will not detect their presence. In fact, without the system-level context that triggers architecturally complex defects, there may be no trace of the lurking problem. This context is only available when the system can be tested as an integrated whole.
Third, applications have become too large and complex for any single individual or team to fully understand them. Although developers may be expert in one or two of the languages and technologies in the application stack, they make assumptions about technologies with which they are less familiar. Enough of these assumptions will be wrong to provide the initial vulnerability from which an architecturally complex defect emerges. It is difficult to detect defects formed from system-level interactions you did not understand.
What Are Architectural Hotspots?
The map of an architecturally complex defect often looks like a snake slithering through the code since the faulty interactions typically trace a transaction, control path, or data flow through the code. A single component may sit in the paths of several architecturally complex defects. These components are called architectural hotspots because they are centrally located in the paths of several defective interactions.
Rather than being fixed, architectural hotspots should probably be rewritten from scratch. Their involvement in so many defects is usually the result of poor design or construction that cannot be remediated through incremental fixes. Eliminating architectural hotspots offers the greatest opportunity to reduce the risk and cost of IT applications.
What Should We Do?
In order to improve the early detection of architecturally complex defects, those responsible for quality assurance must take two steps. First, they must implement a suite of quality management techniques that supplements traditional testing with static and dynamic analysis. Schedule pressure must never be used as an excuse to skimp on full system testing and analysis. The assumption that test and analysis results at the component or subsystem level can be extrapolated to system level results is dangerous. The context changes dramatically when multiple languages, technologies, and platforms are lashed together. The system-level environment is different because full system knowledge is incomplete and spread much more thinly than knowledge at the individual technology or component levels.
Second, focus on detecting and redeveloping architectural hotspots. This involves identifying the paths of architecturally complex defects and tracing their intersections. The resulting heat map of the application is an excellent method for identifying architectural hotspots and prioritizing the components to be fixed.
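One way to picture the intersection tracing is to count how many defect paths pass through each component. The sketch below is illustrative only: the defect IDs, component names, and the `min_defects` threshold are hypothetical, and it assumes each defect path has already been recorded as a list of components.

```python
from collections import Counter

# Hypothetical defect paths: the components each architecturally complex
# defect passes through. IDs and component names are illustrative.
defect_paths = {
    "DEF-1": ["web_form", "order_service", "data_access", "orders_table"],
    "DEF-2": ["batch_job", "data_access", "orders_table"],
    "DEF-3": ["web_form", "order_service", "data_access", "audit_log"],
}


def hotspot_counts(paths):
    """Count, for each component, how many defect paths pass through it."""
    counts = Counter()
    for components in paths.values():
        counts.update(set(components))  # count a component once per defect
    return counts


def hotspots(paths, min_defects=2):
    """Return components on at least min_defects paths, busiest first."""
    ranked = sorted(hotspot_counts(paths).items(),
                    key=lambda item: (-item[1], item[0]))
    return [(name, n) for name, n in ranked if n >= min_defects]


print(hotspots(defect_paths))
# [('data_access', 3), ('order_service', 2), ('orders_table', 2), ('web_form', 2)]
```

In this made-up data, `data_access` sits on every defect path, so it would be the first candidate for redevelopment. A real heat map would weight components by more than a raw count, but the ranking idea is the same.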
Does This Apply to Agile Methods?
This system-level analysis is even more important in an agile or iterative environment. Both the nature of development and the timescale in highly iterative methods often curtail system-level analysis. Since the stories that serve as system requirements are functional, the test cases derived from them will be functional and not designed to detect structural flaws. Short delivery schedules truncate the time for system-level testing and analysis since many of the components are being developed and integrated toward the end of the sprint or cycle. Architecturally complex defects are the ones least likely to be detected under these circumstances.
With the pressure to produce a potentially runnable increment on a short schedule, time must be reserved, even if outside the context of the sprint, to complete a thorough system-level structural analysis before releasing the code into operations. System-level structural analysis would not be conducted after every daily build, but it could be conducted weekly or bi-weekly, or at least once before release to operations. Since the emerging DevOps roles tend to focus on the non-functional, structural attributes of operational software, issues regarding architecturally complex defects should be among their primary concerns.
Although architecturally complex defects constitute only a small proportion of the defects in an application, they consume a disproportionate amount of cost and often cause a disproportionate amount of damage. They are the reason that thorough structural analysis must be conducted at the system level as well as within individual technologies and components. As the technology stacks underlying most applications become more complex, the cost and risk attributable to architecturally complex defects will continue growing in comparison to other IT and application development challenges. They must be addressed by strengthening quality analysis at the system level.