Project Wipe-out: Big Failures
Architects help manage risk. We
learn from mistakes, so the best mistakes to learn from are someone
else's! Here are some big failures to learn from.
System faults and failures: some examples that made the news
Software and systems project failure:
Standish: Project Success Rates Improved Over 10 Years, SoftwareMag.com, Jan 15, 2004.
Imagination, process failures doom software projects, by David Worthington, November 2, 2009
Why projects fail? It's all the business' fault, Matt Deacon
Software Project Failure: The Reasons, The Costs, Carmine Mangione
Hall of Shame, Robert N. Charette, IEEE Spectrum,
Rate, summarizing statistics on IT project failures, collected by IT Cortex.
by David F. Carr and Edward Cone, Baseline,
McDonald's: McBusted, by Larry Barrett and Sean Gallagher, Baseline, July 2, 2003.
Ford and Oracle mega-software project crumbles, by Patricia Keefe, ADTMag, November 11, 2004.
Who Killed the Virtual Case File?, IEEE Spectrum, September 2005.
Software system failure:
Who Needs Hackers, John Schwartz, The New York Times
Top 25 programming errors in software security, Ellen Messmer
Why Software Fails, Robert Charette
Big IT Projects Fail Worldwide, podcast by Robert Charette
Resonance and the Stock Market, Grady Booch
Intermittent Behavior and
Intermittent Behavior Revisited, Grady Booch
Television for Software Engineers, Dan Prichett
The RISKS Digest: ACM Forum on Risks to the Public in Computers and Related Systems, Volume 24, Issue 29, May 26, 2006.
History's Worst Software Bugs, Simson Garfinkel,
Horror Stories, Nachum Dershowitz, Tel Aviv University
Software failure cited in August blackout investigation, Computerworld, November 20, 2004.
Health Plans Case Study, Robert N. Charette, IEEE Spectrum, September 2005.
Teched-out Cars Bug Drivers, by Julia Sheers,
Glitch in iTunes deletes drives, by Farhad Manjoo, November 05, 2001.
Classic examples of system failures
Bugs, Slugs and Work-arounds
Learning from failure:
Software Failure, IEEE Spectrum, September 2005.
Why Software Fails, Robert N. Charette, IEEE Spectrum, September 2005.
Causes, summarizing statistics on causes of IT project failures, by IT Cortex.
To Engineer is Human: The Role of Failure in Successful Design, by Henry Petroski, 1992.
Success through Failure: The Paradox of Design, by Henry Petroski, Princeton University Press, 2006.
Early, Fail Often, Jeff Atwood, May 1, 2006
Robert Wears, and Richard Cook, Automation, Interaction, Complexity and Failure
database: D.R. Kuhn, D.R. Wallace, A.J. Gallo, Jr., Software Fault Interactions and Implications for Software Testing, IEEE Trans. on Software Engineering, vol. 30, no. 6, June 2004.
Learning from failure
From Ruth Malan's Journal
What to do
Steve McConnell's "Classic Mistakes Enumerated" is a well-researched treatise on mistakes to avoid on software projects.
2010 CWE/SANS Top 25 Most Dangerous Software Errors, Assessing the Odds of Catastrophe; see also Wikipedia; see also The Rugged Software Manifesto.
'Rugged Manifesto' promotes secure coding, NetworkWorld, 2/28/10, and Rugged Software Manifesto, Vikas Hazrati, InfoQ, June 22, 2010.
Architecture Reviews, Grady Booch, IEEE Software on architecture #25, July 2010
Gems and Keepers
The most precious gem found on this
"...a club that began in 1945 when engineers found a moth in Panel F, Relay #70 of the Harvard Mark II system. The computer was running a test of its multiplier and adder when the engineers noticed something was wrong. The moth was trapped, removed and taped into the computer's logbook with the words: 'first actual case of a bug being found.'" from
History's Worst Software Bugs, Simson Garfinkel, Wired,
I might keep this one around for when my kids are teenagers:
"Good judgment comes from experience, and experience comes from bad judgment." Barry
And this is a good one for a chuckle in workshops:
"None of us is as dumb as all of
These are neat too:
"The nicest thing about not planning is that failure comes as a complete surprise and is not preceded by a period of worry and depression." -- Sir
"Not only are a system’s desired operating modes influenced by its architecture, but so are some of its failure modes. Thus an architecture that permits only one path between elements may fail if a leg of any path breaks. All of a tree below a broken node is isolated from the rest of the tree."
-- Edward Crawley, Olivier de Weck, Steven Eppinger, Christopher Magee, Joel Moses, Warren Seering, Joel Schindall, David Wallace and Daniel Whitney, "Influence of Architecture in Engineering Systems," MIT ESD, March 2004
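The tree point in the quote above can be checked mechanically. A small sketch, assuming a hypothetical five-element topology (the element names root, A, B, A1 and A2 are illustrative, not from the report): a breadth-first search from root that never enters a failed element shows that losing A in a pure tree cuts off A1 and A2, while adding a second path through B keeps them reachable.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def undirected(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build an adjacency list in which every link can be traversed both ways."""
    links: Dict[str, List[str]] = {}
    for a, b in edges:
        links.setdefault(a, []).append(b)
        links.setdefault(b, []).append(a)
    return links


def reachable(links: Dict[str, List[str]], start: str, failed: Set[str]) -> Set[str]:
    """Breadth-first search from start that never enters a failed element."""
    if start in failed:
        return set()
    seen = {start}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in links.get(node, []):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


# Pure tree (hypothetical): A is the only path from root to A1 and A2.
tree = undirected([("root", "A"), ("root", "B"), ("A", "A1"), ("A", "A2")])
# Same elements plus redundant links from B, giving A1 and A2 a second path.
meshed = undirected(
    [("root", "A"), ("root", "B"), ("A", "A1"), ("A", "A2"), ("B", "A1"), ("B", "A2")]
)

print(sorted(reachable(tree, "root", failed={"A"})))    # ['B', 'root']
print(sorted(reachable(meshed, "root", failed={"A"})))  # ['A1', 'A2', 'B', 'root']
```

The redundant links cost extra connections and extra states to test, which is exactly the kind of trade-off the architecture has to make explicit.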
"Success breeds complacency. Complacency breeds failure. Only the paranoid survive." -- Andrew Grove
“If automobiles had followed the same development cycle as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year, killing everyone inside” -- Robert Cringely
Editorial Comment on Software
These horror stories (I can't let them enter my mind when I'm car!) neglect several points of note:
- Software is everywhere, and most of the time we don't even notice it. That means it is doing its job well, most of the time.
- Small software failures do not make the news. They may add up to big losses for our economy and our businesses, but they stay under the radar.
Even when these failures cost nothing more than lost hours, they are a big drain on productivity and a heavy stress burden. The gain from software is high, while the pain is sporadic and usually falls below our pain-versus-gain tolerance threshold, so we grumble but put up with it. Still, ignoring the pain is not a very resourceful way to approach our future health!
Software development is a complex, human endeavor. Even the very smartest, best people make mistakes. It is a hard truth, but bugs are hard to eliminate entirely. Some of the approaches to bug control are:
- reduce complexity: separation of concerns, partitioning the problem, encapsulation, ...
- bug elimination: use proven parts where available; insist that parts come with test suites and bug reports; design tests for the interactions among parts; ...
- bug damage control: identify failure modes and their consequences, and create strategies to reduce or at least contain the damage. We're getting better at doing this for security; we need to do it for all kinds of failure modes. We need to explore scenarios like "what happens if the Web server goes down?" so we don't end up in this situation:
"One day, one of the credit bureaus' Web servers went down for hours. When Lydian Trust's 'get credit' service tried to make the call, there was no answer. Because the connection to the server was loosely linked, the system didn't know what to do. 'Get credit' hadn't been built to make more than one call. So while it waited for a response, hundreds of loan applications stalled." (Koch, Christopher, "A New Blueprint For The Enterprise," CIO Magazine, May 1, 2005.)
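The 'get credit' failure above comes down to a caller that makes exactly one call and then waits. A minimal sketch of the damage-control alternative, assuming a hypothetical bureau client that takes an application ID and a timeout and raises TimeoutError or ConnectionError when the server does not answer: each attempt is bounded, a second bureau is tried if the first is down, and an application that cannot be scored is parked in a deferred queue instead of stalling. The names get_credit, CreditDecision, bureau_down and bureau_up are illustrative, not taken from the Lydian Trust system.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical client signature: (application_id, timeout_s) -> credit score.
# The client is assumed to enforce timeout_s itself (e.g. via a socket timeout)
# and to raise TimeoutError or ConnectionError when the bureau does not answer.
BureauCall = Callable[[str, float], int]


@dataclass
class CreditDecision:
    application_id: str
    status: str                 # "scored" or "deferred"
    score: Optional[int] = None


def get_credit(
    application_id: str,
    bureaus: Sequence[BureauCall],
    deferred: List[str],
    attempts_per_bureau: int = 2,
    timeout_s: float = 2.0,
    backoff_s: float = 0.5,
) -> CreditDecision:
    """Score an application without letting one dead server stall it."""
    for call_bureau in bureaus:                 # fall back to the next bureau
        for attempt in range(attempts_per_bureau):
            try:
                score = call_bureau(application_id, timeout_s)
                return CreditDecision(application_id, "scored", score)
            except (TimeoutError, ConnectionError):
                if attempt < attempts_per_bureau - 1:
                    # Exponential backoff before retrying the same bureau.
                    time.sleep(backoff_s * (2 ** attempt))
    # Every bureau failed: park the application instead of blocking the caller.
    deferred.append(application_id)
    return CreditDecision(application_id, "deferred")


# Illustrative fake bureaus for a local run.
def bureau_down(application_id: str, timeout_s: float) -> int:
    raise TimeoutError(f"no answer within {timeout_s}s")


def bureau_up(application_id: str, timeout_s: float) -> int:
    return 712  # fixed, made-up score


if __name__ == "__main__":
    queue: List[str] = []
    print(get_credit("A-1", [bureau_down, bureau_up], queue, backoff_s=0.0))
    print(get_credit("A-2", [bureau_down], queue, backoff_s=0.0))
    print(queue)
```

The design choice is to treat an unanswered call as an expected outcome: the caller always gets a decision back, either a score or "deferred", within a bounded number of attempts, so one dead server degrades service rather than halting it.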
And some approaches to bug control work through process (as poisonous as that might be to some):
- have another pair of eyes help detect bugs (e.g., pair programming in XP, design and code reviews, ...)
- start testing before any code is written, and test every day from then on!
And some approaches to bug control work through management discipline:
- set realistic schedules and don't overestimate what can be done
- set realistic expectations (up and down the organization)
- don't press to move on to the next iteration (feature, storyline, etc.) if quality is already slipping; call a halt to decide on the severity of the situation and how to move forward. Eric Sink's approach (Why we all sell code with bugs) to the fix-and-delay versus release-and-face-the-music decision is pragmatic. But we need fewer of those bugs to decide upon! After all, if there are bugs we know about, how many more are lurking behind the cladding?