Adjusting Agile for Remote Environments

Bill Dickenson, Independent Consultant, Strategy On The Web

 

In most commercial environments the developers are distributed, rarely occupying the same physical site and often working very different hours. Faced with this reality, Agile struggles. Among the 12 principles of the “Agile Manifesto” is the statement that “The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.” This is clearly true, and taken as a fixed principle it would rule out Agile for remote teams.

 

Research from CISQ (Consortium for IT Software Quality) has recently evaluated the effectiveness of software teams using Agile as well as Waterfall and found a surprising result. While both Agile and Waterfall produced quality software, organizations that used both methodologies produced higher quality software than organizations that used either one exclusively. This opens up some interesting approaches for Agile in a distributed environment.

 

Start with the Right Projects

 

In general, Agile is best suited when the requirements are too high level or unclear for detailed up-front specification, and therefore benefit from a more rapid, iterative approach. One approach that successful companies have used is to separate the well-defined, clearly documented changes from those that benefit from the interactions that Agile provides.

 

Smaller Work Packets

 

One of the strengths of Agile is speed to value. The smaller the project, the more likely it is to deliver that value (Capers Jones). Agile work packets should be small enough to be completed as quickly as possible and, if the organization is moving to DevOps, released to production as soon as practical. Understand the problem clearly; resist solving it until you understand the problem that needs to be solved.

 

Form Consistent Teams

 

Creating a team requires more than physically grouping developers together. Organizational dynamics dictate that even high-performing individuals need time and practice to become a team. Create some stability by naming teams in advance and finding ways for the group to interact. A trip to a common location tends to jump-start team dynamics. The goal is improved communication, and communication becomes far more effective when the team has a level of trust. Chances are that at some point one of the team members will talk to the business about a function worked on by a remote member. Trust makes that process smoother.

 

Create a Team Room

 

Teams need persistent communications, and shared team rooms help make that possible. Many virtual team rooms support a full spectrum of shared notes, virtual post-its, drawings, and other artifacts that rival a live team room. At some point everyone will be remote, and these tools need to be as robust as possible.

 

The business users should be included here as well. The “community” of people who have an interest in the results of this change should be included. One highly successful remote Agile group borrowed the rules from an online university, with required posts, answers, and instructional techniques. As a side note, many of the “problems” the team was asked to solve were solved by other members of the business community who simply had not talked to each other. Communities are a major source of business effectiveness.

 

Include some “personal” space here as well for non-work related posts. Teams have common interests outside the project and again, the more these are used, the higher the trust.

 

Defect Free Code

 

Inexperienced Agile teams tend to lump bugs and defects into the same “group” and then hide behind “Our highest priority is to satisfy the customer through early and continuous delivery of valuable software” while ignoring the “Continuous attention to technical excellence and good design enhances agility” principle a few items down. Defects are the measure of technical excellence, and Agile teams need to understand these measures, adhere to them, and be audited on compliance. Few issues destroy business value and credibility more than defects in code.

 

Consider the following benchmarks for quality:

 

1) Security violations per Automated Function Point: The MITRE Common Weakness Enumeration (CWE) database contains very clear guidance on unacceptable coding practices that lead to security weaknesses. In a perfect world, delivered code should not violate any of these practices. More realistically, all code developed should have no violations of the Top 25 most dangerous and severe security weaknesses, 22 of which are measurable in the source code and constitute the CISQ Automated Source Code Security Measure.

 

2) Reliability below 0.1 violations per Automated Function Point: In any code there are data conditions that could cause it to break in a way that allows an antagonist to gain access to the system, or that cause delivery failures in the expected functionality of the code. Reliability measures how well the code handles unexpected events and how easily system performance can be reestablished. It can be measured as weaknesses in the code that can cause outages, data corruption, or unexpected behaviors. See the CISQ Automated Source Code Reliability Measure.

 

3) Performance Efficiency below 1.0 violations per Automated Function Point: Performance Efficiency measures how efficiently the application performs or uses resources such as processor or memory capacity. Performance Efficiency is measured as weaknesses in the code base that cause performance degradation or excessive processor or memory use. See the CISQ Automated Source Code Performance Efficiency Measure.

 

4) Maintainability violations below 3.0 per Automated Function Point: As code becomes more complex, the change effort to adapt to evolving requirements also increases. Organizations that focus on Maintainability have a lower cost to operate, faster response to change, and a higher return on investment for operating costs. It is important that code can be easily understood by different teams that inherit its maintenance. Maintainable, easily changed code is more modular, more structured, less complex, and less interwoven with other system components, making it easier to understand and change, conforming to “good design enhances agility”. See the CISQ Automated Source Code Maintainability Measure.
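Taken together, these four benchmarks can be expressed as a simple automated gate. Below is a minimal Python sketch, assuming the per-characteristic violation counts and the Automated Function Point total come from a static analysis tool; the function and variable names are illustrative, not part of any CISQ tooling.

```python
# Sketch of a quality gate based on the CISQ benchmarks above.
# The violation counts and function point total would come from a
# static analysis tool; the names here are illustrative.

THRESHOLDS = {            # maximum violations per Automated Function Point
    "security": 0.0,      # no violations of the CISQ Security measure
    "reliability": 0.1,
    "performance_efficiency": 1.0,
    "maintainability": 3.0,
}

def quality_gate(violations, automated_function_points):
    """Return the list of characteristics that exceed their benchmark."""
    failures = []
    for characteristic, limit in THRESHOLDS.items():
        density = violations.get(characteristic, 0) / automated_function_points
        if density > limit:
            failures.append((characteristic, round(density, 2), limit))
    return failures

# Example: a 500-AFP application that misses only the reliability target.
print(quality_gate(
    {"security": 0, "reliability": 80, "performance_efficiency": 300,
     "maintainability": 900},
    automated_function_points=500,
))  # [('reliability', 0.16, 0.1)]
```

A gate like this can be run on every build, turning the benchmarks from a policy statement into unambiguous pass/fail feedback.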

 

Feedback

 

Most Agile shops are familiar with the traditional burn down charts, user acceptance of the functionality delivered, time and productivity measures. All Agile teams should incorporate the quality measures above as well as a tracking across time. Teams function best when the feedback is unambiguous and frequent.
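As a minimal illustration of tracking a quality measure across time, the sketch below compares the latest sprint's violation density against the average of the earlier sprints; the function name and the sample numbers are hypothetical.

```python
# Sketch of tracking a CISQ quality density across sprints so the team
# gets unambiguous, frequent feedback. Names and numbers are illustrative.

from statistics import mean

def trend(history):
    """Compare the latest sprint's density to the average of prior sprints."""
    *prior, latest = history
    baseline = mean(prior)
    if latest < baseline:
        return "improving"
    if latest > baseline:
        return "degrading"
    return "flat"

# Reliability violations per Automated Function Point over five sprints:
print(trend([0.12, 0.11, 0.09, 0.08, 0.06]))  # improving
```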

 

Conclusion

 

Agile is a powerful tool that can enhance the time to value in many organizations. These principles will guide an organization to finding the benefits of Agile while taking advantage of the best available resources globally. The extra effort in planning and execution pays off with better software and business value.

 

What Developers Should Expect from Operations in DevOps

Bill Dickenson, Independent Consultant, Strategy On The Web

 

Expectation Management

As DevOps becomes increasingly mainstream, it is essential that expectations are met for each group involved in the process. Part 1 of this blog focused on what operations should expect from developers in DevOps, while this part (Part 2) focuses on what developers should expect from Operations. Managing both sides is essential to a successful flow.

 

To be successful, software must operate efficiently on the target platform, handle exceptions without intervention, and be easily changed while remaining secure. It must deliver the functionality at the lowest cost possible. CISQ has evolved a set of quality characteristic measures that, when combined with automated software tools, provide a way to make sure that the code delivered, delivers. To deliver on this, Operations must provide the right tools and the right processes.

 

Specifications for Continuous Release

 

DevOps dramatically increases the speed at which application code is developed and moved into production, and the first requirement is to design for speed. Specifications should be designed to be delivered in work “packets” that are smaller than a typical waterfall design. CISQ research has shown that designing even long projects as a series of smaller fixed-scope projects in the 1-3 month range dramatically improves stability and cost control. When staggered to allow continuous releases, the smaller packet design can make DevOps easier. As releases get “bigger,” the corresponding risk management problems also get bigger, and the success rate for projects increases with the reduced time frame.

 

Tools, Tools, Tools

 

As speed increases, there is no room for manual processes, which are not only unpredictable but inefficient as well. One of the goals of streamlining the process is to deliver business value rapidly, and that requires a better approach. The code delivery “pipeline” must be optimized to deliver an increasingly rapid flow.

  • Software Quality: In the previous blog we discussed the CISQ recommendations for software quality. These should be part of the developer’s toolkit. Select a tool that can look at the whole portfolio, as many security violations are in the spaces between programs. While there are some worthy open source analysis tools, this is an area where getting the best tool not only reduces the risk but also makes the process smoother. The open source tools are evolving rapidly, but the business case will more than support high-quality tools. The entire pipeline should start with quality objectives.
  • Source Code Control/Packet Repository: One area where DevOps implementations report issues is the software control process. Increasing the speed of development puts source code control at risk, especially in legacy environments where the release cycle was measured in months. The faster “packet” design will stress the existing toolset. The packet repository should hold the products of the entire process, and deployment tools become more important.
  • Codified and Comprehensive Risk Management: Many DevOps implementations fail when an unusually large amount of risk is introduced rapidly. Data center operations are not typically application risk aware, and there is usually no codified process beyond the dangerous High-Medium-Low scale. In addition to investing in a better risk management process, the approach must cover both application and infrastructure risk.
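To illustrate the point about codified risk management, here is a hypothetical sketch of a release risk score that weighs both application and infrastructure factors instead of a bare High-Medium-Low label. The factors and weights are invented for illustration and would need calibration against an organization's own release history.

```python
# Hypothetical sketch of a codified release risk score that considers both
# application and infrastructure factors. Weights are illustrative only.

def release_risk(afp_changed, security_violations, new_infrastructure,
                 rollback_tested):
    """Score a release packet; higher means riskier."""
    score = 0
    score += min(afp_changed / 100, 5)       # size of the change, capped
    score += security_violations * 3          # unresolved CISQ security findings
    score += 4 if new_infrastructure else 0   # new servers, middleware, etc.
    score -= 2 if rollback_tested else 0      # a rehearsed rollback lowers risk
    return max(score, 0)

# A small, clean, rehearsed release scores 0; a large risky one scores 15.
print(release_risk(afp_changed=120, security_violations=0,
                   new_infrastructure=False, rollback_tested=True))   # 0
print(release_risk(afp_changed=500, security_violations=2,
                   new_infrastructure=True, rollback_tested=False))   # 15.0
```

The point is not these particular weights, but that the score is computed from named, auditable factors rather than assigned by gut feel.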

 

Environments

 

As the pace quickens, environments need to be defined and made available far more aggressively. Cloud-based services shine at this, but hybrid environments work as well.

 

  • Test environments: Testing volume will increase as the more continuous flow drives repetitive testing, so considerably more test capacity will be needed.
  • Test Data Management: Unlike quarterly and even longer cycles, it becomes almost impossible to manually manage test data. The “golden transaction” process, where the data necessary to test is preloaded into the image, becomes increasingly critical. The test system images now need to include replicated environments that can be tested rapidly.
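The “golden transaction” idea can be sketched in miniature: a small, deliberately chosen set of transactions is preloaded into each test image so tests can run repeatedly without manual data setup. The schema and transaction values below are illustrative, using Python's standard sqlite3 module as a stand-in for the real test database.

```python
# Sketch of a "golden transaction" loader: the data each test needs is
# preloaded into the environment image, so tests run repeatedly without
# manual test data management. Table and values are illustrative.

import sqlite3

GOLDEN_TRANSACTIONS = [
    ("ORD-1001", "open",    250.00),   # covers the happy path
    ("ORD-1002", "shipped", 0.00),     # covers the zero-amount edge case
    ("ORD-1003", "error",  -50.00),    # covers the refund/exception path
]

def build_test_image(path=":memory:"):
    """Create a database image preloaded with the golden transactions."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", GOLDEN_TRANSACTIONS)
    db.commit()
    return db

db = build_test_image()
print(db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 3
```

Because the image is rebuilt from the same golden set every time, each test run starts from a known state, which is what makes rapid, repetitive testing practical.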

From Operations, Developers should expect specifications designed to be implemented more frequently, tools to support the process, and environments designed for application services. Both groups benefit from understanding each other’s needs.

 

What Operations Should Expect from Developers in DevOps

Bill Dickenson, Independent Consultant, Strategy On The Web

 

Expectation Management

DevOps brings both the developers and operations processes into alignment. This blog focuses on what operations should expect from the developers while my next blog will focus on what developers should expect from Operations. Managing both sides is essential to a successful flow.

 

One of the major weaknesses in application development is that, while software only delivers value when it is running, few universities or professional training organizations focus on how to make software operate smoothly. To be successful, software must operate efficiently on the target platform, handle exceptions without intervention, and be easily changed while remaining secure. Security may sound like an odd addition here, but studies continue to validate that many security violations are at the application level. Software must also deliver its functionality at the lowest cost possible. CISQ has evolved a set of quality characteristic measures that, when combined with automated software tools, provide a way to make sure that the code delivered, delivers. Operations has every reason to expect that software will be delivered with these characteristics.

 

Setting SLA Measurements for Structural Quality Characteristics

CISQ recommends the following four OMG standard measures be engineered into the DevOps process. The CISQ measures for Security, Reliability, Performance Efficiency, and Maintainability were developed by representatives from 24 CISQ member companies, including large IT organizations, software service providers, and software technology vendors.

 

1) Security Violations per Automated Function Point

 

The MITRE Common Weakness Enumeration (CWE) database contains very clear guidance on unacceptable coding practices. Delivered code should not violate any of these practices; however, the Top 25 are considered the most egregious. They place an unreasonable burden on the infrastructure to protect, and Operations cannot plug the leaks between modules where the security issues occur. The CISQ Security measure covers the 22 of the Top 25 CWEs that can be measured in source code.

 

2) Reliability below 0.1 violations per Automated Function Point

 

In any code there are data conditions that could cause it to break in a way that allows an antagonist to gain access to the system. These can also cause delivery failures in the expected functionality of the code. Reliability can be measured as weaknesses in the code that can cause outages, data corruption, or unexpected behaviors. The CISQ Reliability measure is composed from 29 severe violations of good architectural and coding practice that can cause applications to behave unreliably.

 

3) Performance Efficiency below 1.0 violations per Automated Function Point

 

Performance Efficiency measures how efficiently the application performs or uses resources such as processor or memory capacity. It is measured as weaknesses in the code base that cause performance degradation or excessive processor or memory use. This has been operationalized in the CISQ Performance Efficiency measure. In today’s relatively cheap hardware environment, these violations have become common. Unfortunately, they also degrade cloud readiness.

 

4) Maintainability violations below 3.0 per Automated Function Point

 

As code becomes more complex, the change effort to adapt to evolving requirements also increases. Organizations that focus on Maintainability have a lower cost to operate, faster response to change, and a higher return on investment for operating costs. Up to 50% of maintenance effort is spent understanding the code before modification. The CISQ Maintainability measure is composed from 20 severe violations of good architectural and coding practice that make code unnecessarily complex.

 

These four are the minimum requirements that operations should expect from developers. In the next blog we will discuss what developers should expect from operations!

 

How to Identify Architecturally Complex Violations

Bill Dickenson, Independent Consultant, Strategy On The Web

 

Dr. Richard Soley, the Chairman and CEO of OMG, published a paper for CISQ titled How to Deliver Resilient, Secure, Efficient, and Easily Changed IT Systems in Line with CISQ Recommendations, which outlines the software quality standard for IT business applications. The last post explored the relationship between unit and system level issues.

 

The logical and obvious conclusion is to dramatically increase the effort focused on detecting the few really dangerous architectural software defects. Unfortunately, identifying such ‘architecturally complex violations’ is anything but easy. It requires holistic analysis at both the Technology and System Levels, as well as a comprehensive, detailed understanding of the overall structure and layering of an application. For those needing further confirmation and explanation of such problems, the most common examples for the CISQ characteristics are described below.

 

#1 Reliability & Resiliency: Lack of reliability and resilience is often rooted in error handling. Local, Unit Level analysis can help find missing error handling when it is related to local issues, but checking the consistency of error management across multiple technology stacks, which is practically always the case in sophisticated business applications, requires a contextual understanding at the Technology and System Levels. A full analysis of the application is mandatory because developers may simply bypass data manipulation frameworks, approved access methods, or layers. As a result, multiple programs may touch the data in an uncontrolled, chaotic way. Bad coding practices at the Technology Level lead to two-thirds of the serious problems in production.

 

#2 Performance Efficiency: Performance and efficiency problems are well known to damage end-user productivity and customer loyalty, and to consume more IT resources than they should. ‘Remote calls inside loops’ (remote programs executed on another device, called from within a loop in the calling program) are a well-known example that creates performance problems. A top-down, System Level analysis is required to search the entire system calling graph to identify the source of the problem. In the vast majority of cases, performance issues reside at the System Level.
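The ‘remote calls inside loops’ problem can be shown in miniature. In this hypothetical Python sketch, the slow version makes one remote round trip per loop iteration while the batched version makes one in total; the fetch functions are stand-ins for any remote service (web service, remote database, etc.).

```python
# Illustration of the "remote calls inside loops" anti-pattern and the
# batched alternative. The fetch functions simulate a remote service.

def total_slow(order_lines, fetch_price):
    # Anti-pattern: one remote round trip per loop iteration.
    return sum(fetch_price(item) * qty for item, qty in order_lines)

def total_fast(order_lines, fetch_prices):
    # Fix: a single batched remote call, then a purely local loop.
    prices = fetch_prices([item for item, _ in order_lines])
    return sum(prices[item] * qty for item, qty in order_lines)

# Simulate the remote service and count round trips.
calls = {"n": 0}

def fetch_price(item):
    calls["n"] += 1
    return 10.0

def fetch_prices(items):
    calls["n"] += 1
    return {item: 10.0 for item in items}

lines = [("A", 2), ("B", 1), ("C", 3)]
print(total_slow(lines, fetch_price), calls["n"])   # 60.0 3  (three round trips)
calls["n"] = 0
print(total_fast(lines, fetch_prices), calls["n"])  # 60.0 1  (one round trip)
```

Both versions compute the same result; only a view of the calling graph reveals that one of them multiplies remote round trips by the loop count.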

 

#3 Security & Vulnerability: Detecting backdoors or insecure dynamic SQL queries through multiple layers requires a deep understanding of all the data manipulation layers as well as the data structure itself. Overall, security experts Greg Hoglund and Gary McGraw believe cross-layer security issues account for 50% of all security issues. The Common Weakness Enumeration database maintained by MITRE is essential for removing common defects.
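As a small illustration of the insecure dynamic SQL problem, the sketch below (using Python's standard sqlite3 module as a stand-in for any data layer) shows how a concatenated query lets a crafted input match every row, while a parameterized query treats the same input as plain data.

```python
# Insecure dynamic SQL versus a parameterized query, using the standard
# library sqlite3 module for illustration.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, role TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"

# Insecure: the input is concatenated into the SQL string, so the crafted
# value above rewrites the query and matches every row.
unsafe = db.execute(
    "SELECT role FROM users WHERE name = '" + user_input + "'").fetchall()

# Safe: a bound parameter is treated as data, never as SQL.
safe = db.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)).fetchall()

print(unsafe)  # [('admin',)]  -- injection succeeded
print(safe)    # []            -- no user literally has that name
```

When the string concatenation happens in one layer and the query executes in another, only an analysis that follows the data across layers will catch it, which is exactly the cross-layer point made above.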

The Relationship Between Unit and System Level Issues

Bill Dickenson, Independent Consultant, Strategy On The Web

 

Dr. Richard Soley, the Chairman and CEO of OMG, published a paper for CISQ titled How to Deliver Resilient, Secure, Efficient, and Easily Changed IT Systems in Line with CISQ Recommendations, which outlines the software quality standard for IT business applications. He classified software engineering best practices into two main categories:

  • Rules of good coding practice within a program at the Unit Level without the full Technology or System Level context in which the program operates, and
  • Rules of good architectural and design practice at the Technology or System level that take into consideration the broader architectural context within which a unit of code is integrated.

Correlations between programming defects and production defects revealed something really interesting and, to some extent, counter-intuitive. Basic Unit Level errors account for 92% of the total errors in the source code. That is a staggering number: it implies that coding at the individual program level is much weaker than expected, even with quality checks built into the IDE. However, these code level issues eventually account for only 10% of the defects in production. There is no question that they drive up the cost of support and maintenance and decrease flexibility, but the translation into production defects is not as large as might be expected. It also calls into question the effectiveness of development-level IDE checks at eliminating production defects.

 

On the other hand, bad software engineering practices at the Technology and System Levels account for only 8% of total defects, but consume over half the effort spent on fixing problems. This eventually leads to 90% of the serious reliability, security, and efficiency issues in production. This means that tracking and fixing bad programming practices at the Unit Level alone may not translate into the anticipated business impact, since many of the most devastating defects can only be detected at the Technology and System Levels.

 

When we review the information from the CRASH database, this is not wholly unexpected. Many of the more serious defects are undetected until the components interact.