Thursday, March 23, 2023

Software Design – history and principles – how to fix our software nightmare

I first wrote this in 2018 - and just discovered it. I was writing a bit to one of my grandsons about his career in architecture, and wanted to give him an example of “patterns.” I know it does not fit with most of the stuff here - but I thought it was worth the time when I wrote it. Worst case - just skip this one. I will not be offended. Thanks.

We have had some software design debacles of late. The Affordable Care Act website craziness, and the MN vehicle registration system are two examples. I worked in software for 30 years, I taught software design and architecture, and was a practitioner. I actually stumbled on the right way to do this stuff, and always assumed that everyone else would eventually get the message. I did not create the solution – a bunch of other people created it – but it amazes me how poorly understood the problem is at this late date.

As is my wont in these pieces, I will explain the problem and the solution as succinctly and briefly as possible. Then I will expand on each point with a few examples and stories, to drive it home.

Brief Outline Summary

“Structured Design” is the key. The alternative is spaghetti design, amorphous design, uncontrolled design – whatever else you want to name it. Structured Design has a very simple set of principles:

  1. Single Entry – Single Exit.
    Every component in a software system must have these attributes. You invoke it by a “Perform” or a BAL – but never by a GOTO or a branch. The software component starts up – does its thing – and then does a RETURN or goes back to where it was invoked. It CANNOT go anywhere else. This seems relatively simple, but it is rarely accomplished in practice. I have stories! In structured software methodologies, these components might be called “methods”, or “subroutines”.  This is the MOST important principle, and would, all by itself, solve 90% of the software quality problems. Sometimes you will see this referred to as “object oriented” design, and implemented via languages like Smalltalk, C++, etc., but it can be done in any language, any technical infrastructure.

  2. Reusable Components.
    Every component is designed to be reused. This is also a brilliantly simple principle, but rarely if ever used. Trust me on this. To enable reuse, each component is published to the larger community, and given a descriptive name, so that people can find it when they need it. This also requires that the interface be as flexible and simple as possible, for which see the next principle. This is more focused on productivity, but it also enhances quality. If we figured out how to do something once without errors, why do it again?

  3. Standard Descriptive Interfaces.
    Every invocation of a component follows standard rules for flexible interfaces. By “standard” I mean STANDARD. This means people from an industry or domain have met, discussed and agreed upon what things are named, and how they are described. If it isn’t a standard, you have to make it one before you can use it. The best example of this today is XML. When you invoke a process, the interface requires that you describe the components you are passing, and the components you are expecting, according to a standard protocol, which requires that you use the descriptive terms as part of the interface. If you know html or xml, this will seem obvious, but others might need an example.
    I invoke a routine to calculate premium. I pass it two fields in this way: <TypeOfRisk>Auto</TypeOfRisk>  <ValueOfRisk>1000</ValueOfRisk>. The values are surrounded by the descriptor – and it indicates when it starts, and when it ends. The subroutine likely would return a single field: <Premium>50</Premium>. You get it? Simple, eh? No length variable, no figuring out a location in a string – THIS FIELD RIGHT HERE is the one. It starts here, and ends there.
    The way things used to be done, and the way idiots continue to do it, is to pass to the subroutine a single address pointing at the location of the beginning of the data, or to actually send the data in a bit stream, consisting purely of the data – nothing else. That seems effective and efficient, but it is totally dangerous. See below for a fuller understanding of how this works.

  4. Use a Common Message Queue.
    This is not an absolute requirement, but a really useful one. To maintain logical isolation, all subroutine invocation is done through a common message queue. This means that the subroutines must have names and published, standardized interfaces – see above. To invoke a subroutine, you name it, and provide the standard descriptive data it needs. The subroutine processes the data, and then RETURNS the result in the same way. This also implies that a QUEUE can only hand the message to one subroutine, and then MUST return the results to the invoker – no one else – see principle 1 above. This might be called a “Message Broker” if you are looking stuff up here.

  5. Use Software Design Patterns.
    An architect by the name of Christopher Alexander first came up with the idea that physical architecture could be reduced to repeatable “patterns” that worked well. A few pioneers in software picked this up and expanded it into software patterns. This is more focused on productivity than quality, but it helps both. If someone has a genius solution to a difficult problem, and they publish it, then everyone else faced with that problem does not need to reinvent the wheel. This is called a software design pattern. The pattern description should appear in a standard journal or thesaurus, which might be collected into reference works, education classes, etc. Just for a simple example, if I described a design pattern as “publish and subscribe”, what does that mean? One use would be where you have a variable or a data item which changes every January 1. You want everyone that uses that variable to get the change as soon as it appears. How to do that? You could keep a catalog of every piece of software that uses it, and then go change them all at about the same time and hope for the best. Or you could broadcast it to the whole world using the queue above. Hit and miss. Instead, you establish a publish / subscribe relationship. The owner of the data builds a subscription list – anyone that sends that owner a “subscribe” message is put on the list. If they are no longer interested, they send an “unsubscribe” message. When the data changes, the change is sent to all subscribers via a queued message, instantly. Problem solved. 

  6. Use a Framework.
    This is a little more advanced, and it is not as fundamental a requirement, but the benefits are similarly advanced. A “framework” is a collection of sub-routines or methods, constructed around a common problem, or a common industry. Think about it. How much of your HR system is unique to your company? How much is common to a whole raft of companies? Make the common stuff into a standard framework. Publish it, or sell it, and everyone benefits. If you use the design principles above, you can even upgrade it and change it, pretty much on the fly. If you add new functionality, you can drop the new framework right in, and NONE of the new functions will be invoked, because none of the higher routines even know about them. But they can be used immediately – as soon as you make the needed changes. Using a standard descriptive data interface and a message queue delivers that capability.
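
To make the publish / subscribe pattern from principle 5 a bit more concrete, here is a minimal sketch in Python. The class name RateOwner, the subscriber names and the rate values are all made up for illustration, and the sketch calls each subscriber directly rather than going through a real message queue, so treat it as an outline of the idea, not a finished design.

```python
from typing import Callable, Dict, List


class RateOwner:
    """Hypothetical owner of one data item. It keeps the subscription list."""

    def __init__(self, value: float) -> None:
        self._value = value
        self._subscribers: Dict[str, Callable[[float], None]] = {}

    def subscribe(self, name: str, callback: Callable[[float], None]) -> None:
        # Anyone who sends a "subscribe" request is put on the list.
        self._subscribers[name] = callback

    def unsubscribe(self, name: str) -> None:
        # Subscribers that are no longer interested are taken off the list.
        self._subscribers.pop(name, None)

    def publish(self, new_value: float) -> None:
        # When the data changes, every current subscriber is told.
        self._value = new_value
        for callback in list(self._subscribers.values()):
            callback(new_value)


received: List[str] = []
owner = RateOwner(0.05)
owner.subscribe("billing", lambda v: received.append(f"billing got {v}"))
owner.subscribe("quotes", lambda v: received.append(f"quotes got {v}"))
owner.unsubscribe("quotes")
owner.publish(0.06)
print(received)  # ['billing got 0.06']
```

The owner never needs a catalog of who uses the data. The list maintains itself as users subscribe and unsubscribe.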

Why Would Anyone Do This?

Back when I was gainfully employed, I was researching this stuff, and found a noted expert who had published several studies on the topics. I asked him to come and visit with the senior staff of our IT division, and we could change the world. I contacted him, we negotiated a fee, and he agreed to come and talk to the senior management. He had one proviso – he had to meet with the division head, the VP, before he would address the group. Fine by me. We set up the appointment, and I accompanied him to the meeting. After introductions and some general discussion about the field and what he was about, he asked the boss, “What do you think the potential is in your staff to improve the productivity and quality of their software development?” The boss said – 10 to 15%. The discussion continued politely for a bit, and then we made our exit. As we walked out, he said – it’s not worth the time and effort. If the boss’s vision is that narrow, you are not going to get anywhere. He and I both knew that the potential for using the above techniques is not a percentage – but a multiple – 2 to 3 times as productive. You don’t need to believe me – look it up.

Why Isn’t Everyone Doing This?

I think it is like a religion. You can explain things to people until you turn blue. If it is contrary to what they believe, it goes nowhere. The field of economics is exactly like this, and politics is as well. The other side of that, once you do see this, you cannot let it go. It is so obvious, you cannot just forget it. But tread carefully and watch your step. If you try to foment a revolution in one big step, the powers that be will class you as a radical and you will be terminated, removed, etc. I know this from experience. If you aren’t THE boss, go with the flow, move it gently, bide your time.

More Details

In this section, we will expand on the above principles, to explain why they work and what they can do. I have seen enough examples of the wrong way to do things that my experience might persuade you whereas the dry explanation of the right way might not. We shall see.

Single Entry Single Exit

You will see this described in more complex fashion in the literature, but let me give you the way I learned it. I took a software design class in 1970 which really brought it home to me. At the time, I was writing a piece of software which would be used as a utility to make global changes to some software instructions called JCL stored in a procedure library. The boss wanted all of the JCL procedures standardized and simplified, and suggested that I write some software to do it rather than trying to code all of the things by hand. It seemed easy enough, but it was a nightmare. The logic was very complex. So, I took a class in “structured design”.

The instructor started with this example, which I used in every class I ever taught on this topic. In this example you have a file with 50 variables or fields. To simplify things a bit, every field can have only 2 states – they are binary – either one or the other value. If you were to build software the standard way, design around functions and inputs and outputs, in order to completely test this software, you would need to test every possible combination of the 50 fields that might appear in the input file. There is a formula for that: 2 to the 50th power. It means, there are 2 possible states for each field, and the number of combinations doubles with every field you add – 50 times. I actually calculated that number and proceeded to write it out on the black or white board for all of the class to see. Written in scientific notation: 1.1259E+15. OR: 1,125,899,906,842,624. That is 1 quadrillion – the number above trillion. You get that – it’s a LOT of test cases. Guess what – NO ONE is ever going to run all combinations to test that program. In my prior life, there were many examples of software design that had even more complex situations, and software is daily becoming ever more complex. 
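
If you want to check the arithmetic yourself, a few lines of Python will do it. The test rate of one million cases per second is purely an assumption, picked to show the scale.

```python
FIELDS = 50
STATES_PER_FIELD = 2

# Every field doubles the number of combinations: 2 * 2 * ... * 2, 50 times.
combinations = STATES_PER_FIELD ** FIELDS
print(f"{combinations:,}")    # 1,125,899,906,842,624
print(f"{combinations:.4E}")  # 1.1259E+15

# Assumed rate: one million test cases per second, running nonstop.
TESTS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
years = combinations / TESTS_PER_SECOND / SECONDS_PER_YEAR
print(f"about {years:.1f} years")  # about 35.7 years
```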

Now how can you design software to handle these 50 fields and be able to thoroughly test it? What is principle number 1? One entry, one exit. You design a separate routine to process each field, or each combination of fields in a hierarchical fashion. In Cobol you do a Perform for any routine you need – a perform goes to the routine, and then returns to the caller. ALL subroutines can only process, invoke another subroutine with a perform, and then end or return. NO OTHER branching is allowed. If you structure it as a hierarchy, at the top level, you say, perform premium payment, perform coverage change, perform whatever. You are not going to have 50 routines for 50 fields at that level, but even if you do – that is only 50. At the lowest level, you could have 50 tests times 2 – one for each field and each of its two values. Then you have a test for each routine that processes any combination of fields – say we have another 50 or 100. Then another test for every routine that invokes any combination of those, etc.  The result might be several hundred test cases, or even a thousand if the thing is really complex. But if you are certain that all routines stay in the confines of their data and code – the problem is greatly simplified. You can test it completely with a modest number of carefully designed test cases. And if a test fails, you know exactly where it happened.
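
Here is a small sketch of that hierarchy in Python instead of Cobol. The three field names and the adjustment amounts are invented for the example. What matters is the shape: each low-level routine is called, does its one job and returns, and the higher routine only performs the lower ones.

```python
from typing import Callable, Dict


# Hypothetical field-level routines. Each has one entry (the call) and one
# exit (the return), and touches nothing but its own argument.
def garage_adjustment(has_garage: bool) -> int:
    return -5 if has_garage else 0


def commuter_adjustment(is_commuter: bool) -> int:
    return 10 if is_commuter else 0


def young_driver_adjustment(is_young_driver: bool) -> int:
    return 25 if is_young_driver else 0


FIELD_ROUTINES: Dict[str, Callable[[bool], int]] = {
    "has_garage": garage_adjustment,
    "is_commuter": commuter_adjustment,
    "is_young_driver": young_driver_adjustment,
}


def total_adjustment(record: Dict[str, bool]) -> int:
    """Higher-level routine: performs each lower routine and adds the results."""
    total = 0
    for field_name, routine in FIELD_ROUTINES.items():
        total += routine(record[field_name])
    return total


def isolated_test_count(field_count: int, states_per_field: int = 2) -> int:
    """Tests needed at the lowest level when every field has its own routine."""
    return field_count * states_per_field


def exhaustive_test_count(field_count: int, states_per_field: int = 2) -> int:
    """Tests needed when any field may affect the handling of any other."""
    return states_per_field ** field_count


record = {"has_garage": True, "is_commuter": True, "is_young_driver": False}
print(total_adjustment(record))   # 5
print(isolated_test_count(50))    # 100
print(exhaustive_test_count(50))  # 1125899906842624
```

Because no routine can reach into another routine's data or jump into the middle of its code, testing each one on its own, plus the routines that combine them, covers the program.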

Trust me on this – very few people in the published literature on programming design really understand this and the power it has. Isolating each logical component and structuring them in a hierarchy gives enormous gains in simplicity.

An Example:
I remember one program in particular called “Main Run.” It was a collection of COBOL programs that processed the Personal Lines insurance for a large insurance company. It ran every night, so it could process the payments received, and keep track of the changes to every policy. It was classic spaghetti code. It would run for hours – and it NEVER ran without encountering a problem. The programming staff were on call every single night to try to fix the thing and get it running again. When I heard about this, I came up with a “genius” solution. IBM had a feature on their 360 generation computers called “checkpoint / restart”. Every so many records or minutes, the system would stop the program, and take a complete dump or checkpoint of the state of the program and store it on disk. Then it would start things up again at that point. That way, in case something died in the system, or the power went out, they could restart the program from one of those checkpoints. To take advantage of this feature, I wrote a little software routine which used the STAE routine to catch the program abend in flight. When this routine got control from the OS, it would take a copy of the data in the buffers – which record was being processed – and write that to a disk file, and then let the abend proceed so there was a dump of the problem. Every time the software started a new record, it passed the data to this subroutine and asked – “Have we been here before?” The routine would hold that data and then write it to disk if it caught an abend. And it would read the records on the disk and compare them. If this one caused an abend the first time, the program was instructed to write out a log record and just skip it this time – thank you very much. You could set how many abends you wanted to process – 10 would usually get them through the night. So, instead of testing the thing to death, you just catch the failures and then avoid them. Worked slick. 
I know for a fact that people used that routine for over 20 years, because I would get questions about it all the time.
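
For the curious, the idea behind that routine can be sketched in a few lines of Python. This is not the original code, which was written for the IBM operating system and its STAE facility. Here a Python exception stands in for the abend, a small log file stands in for the disk file, and the record keys and file name are made up. The sketch also restarts each run from the first record instead of from a checkpoint, to keep it short.

```python
import json
import os
import tempfile
from typing import Callable, List, Set


def load_known_failures(path: str) -> Set[str]:
    """Read the keys of records that caused a failure on an earlier run."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as handle:
        return {json.loads(line)["key"] for line in handle if line.strip()}


def run_once(records: List[str], process: Callable[[str], None],
             failure_log: str) -> List[str]:
    """One run: skip records that failed before. On a new failure, log the
    record key and let the error proceed so the problem is still reported."""
    known = load_known_failures(failure_log)
    done: List[str] = []
    for key in records:
        if key in known:  # "Have we been here before?"
            continue
        try:
            process(key)
        except Exception:
            with open(failure_log, "a", encoding="utf-8") as handle:
                handle.write(json.dumps({"key": key}) + "\n")
            raise
        done.append(key)
    return done


def run_with_restarts(records: List[str], process: Callable[[str], None],
                      failure_log: str, max_failures: int = 10) -> List[str]:
    """Restart after each failure, up to max_failures times."""
    for _ in range(max_failures + 1):
        try:
            return run_once(records, process, failure_log)
        except Exception:
            continue
    raise RuntimeError("too many failures in one night")


def process(key: str) -> None:
    # Hypothetical processing step: two records are bad.
    if key in {"P2", "P4"}:
        raise ValueError(f"bad data in {key}")


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "failures.log")
    completed = run_with_restarts(["P1", "P2", "P3", "P4", "P5"], process, log_path)
    skipped = sorted(load_known_failures(log_path))

print(completed)  # ['P1', 'P3', 'P5']
print(skipped)    # ['P2', 'P4']
```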

In a perverse approach to this same problem from the other angle, the original space shuttle software development team set a “zero defect” goal with their software. After all, any problem there would endanger the lives of the astronauts. Their approach was literally, to run every single possible test against the software. That works, but it is pricey! 

Just so you know, in the original descent to the moon, astronaut Buzz Aldrin had to deal with five alarms from the lunar module computer warning that its memory was full. He had to restart it three times. Imagine the excitement around that. The “memory full” was a known bug! 

As I noted above, be very careful using the term “zero defect”, lest you be labeled a crazy, irrational idealistic nut, and be shown the door. Rather, talk about defect reduction, etc. Slow but easy, we can get there.

Reusable Components

This makes perfect sense, but almost no one is doing this. Instead of looking around for well designed software that can do some parts of what we want to do, virtually every system design starts from scratch. Let’s say you want a really fancy bathroom. You hire a first class architect – he does a wonderful, unique design, just for you. Now you get a contractor to build it. Let me ask you, does the contractor go fell trees, to cut them into 2 x 4s? Does he dig in the dirt and get clay and build a kiln to make the tiles you need for the wall and the floor? Does he refine copper to make the wire, make steel to create the faucets, build a plastic factory for the tub and shower curtain? Of course not – he finds things off the shelf that will do the job. And, amazingly, these things will all work together, because, even more amazingly, they all share STANDARD DESIGN INTERFACES. You can almost buy any sink or tub, and almost any faucet will fit on it. You can get any light fixture, and it will connect to the standard power line. Talk about standardized design. If you want to get real creative, you can design your own sink, but you better have a standard set of holes for the faucet, or you are not going to make it work. Reusable components go hand in hand with the standard design interfaces. Which is our next topic. 

The other side of this problem is that even if people are building reusable components, they are not publishing them for others. At a minimum, within one software operation, you should be able to document and publish “utility software” – standard, reusable components. Not everyone needs to store and update name and address on every person that is connected to your company. Put them all in one system, give it a clear interface with standard data and your problem is solved. If a customer has more than one policy, or is also an agent or a supplier – they still have one address, one bank, etc.

Standard Descriptive Interfaces

This is another key component of this approach, since it reduces enormously the problem of passing data back and forth between systems. It used to be, if you wanted to send information to a government entity, and then get results back, you had to conform to the standard bit oriented data structure published by that agency. And it might be very different from state to state, agency to agency. But, in this modern era, you can define standard terms for every possible data entity. The problems with bits is that they can change, and you can drop one or two, which totally messes up everything. When we were shuttling data back and forth with agents, every time we added a new system, we had to redo the whole business, because no one actually implemented the standards the way they were designed.
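
A short Python sketch shows the difference between the two styles. The positional layout (4 characters of risk type, then 6 digits of value) and the wrapper element PremiumRequest are assumptions made up for this example. The element names are written without spaces, since XML names cannot contain them.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

Fields = Dict[str, Union[str, int]]


def parse_positional(record: str) -> Fields:
    """Old style: the meaning of the data depends on where it sits."""
    return {"type": record[0:4].strip(), "value": int(record[4:10])}


def parse_tagged(message: str) -> Fields:
    """Descriptive style: every value carries its own name."""
    root = ET.fromstring(message)
    return {
        "type": root.findtext("TypeOfRisk"),
        "value": int(root.findtext("ValueOfRisk")),
    }


good = "Auto001000"
truncated = "Auto00100"  # one character lost in transit
print(parse_positional(good))       # {'type': 'Auto', 'value': 1000}
print(parse_positional(truncated))  # {'type': 'Auto', 'value': 100}

tagged = ("<PremiumRequest><TypeOfRisk>Auto</TypeOfRisk>"
          "<ValueOfRisk>1000</ValueOfRisk></PremiumRequest>")
reordered = ("<PremiumRequest><ValueOfRisk>1000</ValueOfRisk>"
             "<TypeOfRisk>Auto</TypeOfRisk></PremiumRequest>")
print(parse_tagged(tagged))     # {'type': 'Auto', 'value': 1000}
print(parse_tagged(reordered))  # {'type': 'Auto', 'value': 1000}
```

The positional record quietly turns 1000 into 100 when a character goes missing. The tagged message gives the same answer in either field order, and a damaged message fails to parse instead of producing a wrong number.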

Now, with XML and its derivatives, the entire insurance industry has designed a standard set of terms for all of the data elements that exist in this business domain. And, there are tools which can transform data elements from one domain into another, like from building maintenance to insurance. It’s called XSLT – check it out.

For an example, let’s say that 3M wants to shop ALL of its risks for a new insurance bid. Guess how many data elements 3M would have to provide to the underwriters to give them a reasonable guess on the cost. How many? Remember, they are worldwide, and they have numerous plants and facilities, and numerous different businesses. I think it is safe to say that no one is going to do the bid based on all of that data. But, what if building and infrastructure maintenance had an XML standard for every location, every building, every component that they maintain. What if the insurance industry had a standard for every insurable risk and option – that standard actually exists – I helped create the first version. Now, with a handy little conversion tool – I also helped design one of them – you can define an XSLT transform which would accept all of the 3M data and spit out the insurance data. It is the same data – you are just changing the descriptors required by the different industry. Would that we could adopt universal descriptors for common things, like, say name and address and telephone – maybe someday. Until then, you can actually transform from one standard to another.
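
To show what “changing the descriptors” looks like, here is a rough Python sketch. It is not XSLT. It is a plain lookup table applied with the standard xml.etree.ElementTree module, standing in for what an XSLT transform would do. All of the element names on both sides are invented for illustration and are not taken from any real maintenance or insurance standard.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical mapping: facilities-maintenance descriptor -> insurance descriptor.
TAG_MAP: Dict[str, str] = {
    "Facility": "InsuredLocation",
    "BuildingName": "LocationName",
    "ReplacementCost": "ValueOfRisk",
}


def transform(source_xml: str, tag_map: Dict[str, str]) -> str:
    """Rename every element found in the map; leave the data itself untouched."""
    root = ET.fromstring(source_xml)
    for element in root.iter():
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


maintenance_xml = (
    "<Facility><BuildingName>Plant 7</BuildingName>"
    "<ReplacementCost>2500000</ReplacementCost></Facility>"
)
insurance_xml = transform(maintenance_xml, TAG_MAP)
print(insurance_xml)
```

A real transform would also have to handle differences in structure, units and codes, which is where a tool like XSLT earns its keep.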

Having set up your transform utility, your company is now in the position to be able to quote virtually any business they would like, with almost NO EFFORT. I am on the board of a townhome association. We have 111 homes – they are of 8 different architecture styles. We have a heck of a time getting an insurance agent to bid on them – it is just too much work. I know the insurance standard exists – I would be pleased to hand code the 8 different models and identify them in the 111 risks, if I could find any carrier sophisticated enough to pull this off. When will the future arrive? 

Common Message Queue

This seems pretty simple. You set up, or better, buy, a message queuing facility. Every piece of software in your enterprise is connected to this. Every software routine only gets its input from a message that is delivered via this common queue, and it only puts its results back into that queue. You define the standard transactions and standard descriptive data – and you are done. For example, let’s say you are very forward thinking and have designed a common name and address database, so that all of your systems only have to look in one place to locate the contact information for any entity, any risk, any person that you deal with. If you define the interface correctly, all requests for that data, and all changes to it are conducted via a message with a variable amount of data that has standard descriptors attached. Problem resolved. 
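
Here is a toy version of the idea in Python, using the standard queue module. The Broker class, the routine name calculate_premium and the 5 percent rate are assumptions for illustration only. A real message queuing product does far more, but the shape is the same: you name the routine, hand over described data, and the result comes back to you and no one else.

```python
import queue
from typing import Callable, Dict, Tuple

Message = Dict[str, str]
Request = Tuple[str, Message, "queue.Queue[Message]"]


class Broker:
    """Hypothetical common queue: routines are reached only by name."""

    def __init__(self) -> None:
        self._routines: Dict[str, Callable[[Message], Message]] = {}
        self._requests: "queue.Queue[Request]" = queue.Queue()

    def register(self, name: str, routine: Callable[[Message], Message]) -> None:
        self._routines[name] = routine

    def send(self, routine_name: str, data: Message,
             reply_to: "queue.Queue[Message]") -> None:
        # The invoker names the routine and supplies described data.
        self._requests.put((routine_name, data, reply_to))

    def run_pending(self) -> None:
        # Each message goes to exactly one routine, and the result
        # goes back to the invoker's own reply queue.
        while not self._requests.empty():
            name, data, reply_to = self._requests.get()
            reply_to.put(self._routines[name](data))


def calculate_premium(data: Message) -> Message:
    # Assumed rating rule for the example: 5 percent of the value at risk.
    value = int(data["ValueOfRisk"])
    return {"Premium": str(value * 5 // 100)}


broker = Broker()
broker.register("calculate_premium", calculate_premium)

my_replies: "queue.Queue[Message]" = queue.Queue()
broker.send("calculate_premium",
            {"TypeOfRisk": "Auto", "ValueOfRisk": "1000"}, my_replies)
broker.run_pending()
reply = my_replies.get()
print(reply)  # {'Premium': '50'}
```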

In an actual example, a long time ago, my employer had built just such a common database. They had, however, neglected some of the necessary design features. The database was a single entity, but the support staff decided, probably under some pressure from the other programmers, to adapt their interface to almost every requestor. They published 60 interfaces. I had the staff in a design review session, and we asked them how they could support 60 interfaces. They told us that there were 250 interfaces – they only published 60 of them. Mamma mia! What a nightmare. Some of these were passed via sequential files, so nothing could be done in real time. 

Software Design Patterns

Let’s say you are a mechanical engineer. You want to design a throttle regulator. The problem is that the throttle is an in and out gadget – it increases fuel by pushing down, decreases by pulling up. The engine you want to throttle is a rotating gadget – it spins a flywheel or axle, or something that rotates. How do you translate the rotating speed of one gadget into an in and out or up and down motion in another gadget? You need a bit of genius thinking. That, or go to any standard engineering workbook or catalogue and look it up. It has been done – it is as simple as pie. You have the rotating component spin a control with a weighted pair of arms. The faster it spins, the further out the arms go, which pulls up a slide on the center of the rotating gadget, which is connected to your throttle.


ALL kinds of engineering problems are designed and available just like that. New ones are created and added all the time. If you are a mechanical engineer or electrical engineer or a carpenter or a plumber, you do not design things from scratch – you go look and see what someone has already contributed to the field. There are literally catalogs of software design patterns. But precious few system software engineers even know they exist. People building a motor vehicle registration system should NOT be designing it from scratch – please!

For a simple example of how not to do this, I used to teach a COBOL programming class. And I was good at it. I would teach the students all of the commands, and give them exercises so that they could use them. Then I would throw a final exam at them – here’s the transaction file, the master file, and the required reports. Figure it out and show me a successful test. The students would literally spend weeks trying to figure out how to apply transactions to a master file. I could have shown them that pattern in about 30 minutes, and we could have spent the rest of the time learning more and different design patterns. But that simple idea never occurred to me, and it does not occur to most system developers to this day. In my day, there were no design patterns – we were making the things up. Now there are hundreds if not thousands of them. Look around a bit.
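
For anyone who has never seen it, here is one common form of the master file update pattern, sketched in Python rather than COBOL. It assumes both files are already sorted by key and that transactions only change existing records. The policy keys, the PAY and CHARGE actions and the amounts are invented for the example.

```python
from typing import List, Tuple

Master = Tuple[str, int]            # (policy key, balance owed)
Transaction = Tuple[str, str, int]  # (policy key, action, amount)


def update_master(masters: List[Master], transactions: List[Transaction]
                  ) -> Tuple[List[Master], List[Transaction]]:
    """Walk both sorted files once, matching transactions to masters by key."""
    new_master: List[Master] = []
    rejected: List[Transaction] = []
    t = 0
    for key, balance in masters:
        # Transactions whose key sorts before this master have no match.
        while t < len(transactions) and transactions[t][0] < key:
            rejected.append(transactions[t])
            t += 1
        # Apply every transaction that matches this master record.
        while t < len(transactions) and transactions[t][0] == key:
            _, action, amount = transactions[t]
            if action == "CHARGE":
                balance += amount
            elif action == "PAY":
                balance -= amount
            else:
                rejected.append(transactions[t])
            t += 1
        new_master.append((key, balance))
    # Anything left over sorts after the last master record.
    rejected.extend(transactions[t:])
    return new_master, rejected


old_master = [("A100", 500), ("B200", 0), ("C300", 250)]
todays_transactions = [
    ("A100", "PAY", 200),
    ("A100", "CHARGE", 50),
    ("B150", "PAY", 10),   # no such policy on the master file
    ("C300", "PAY", 250),
]
new_master, rejected = update_master(old_master, todays_transactions)
print(new_master)  # [('A100', 350), ('B200', 0), ('C300', 0)]
print(rejected)    # [('B150', 'PAY', 10)]
```

The rejected list is what feeds the error report, and the new master file becomes tomorrow's input.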

Frameworks

This is getting out there a bit, but a framework is just a collection of the above components that is tailored to an industry or a problem. No one designs a new automobile by figuring out how to cast steel and how to make plastic. That infrastructure already exists. You can specify what you want from the suppliers, and they will deliver it. 

Similarly, in the software world, find a framework that works for your problem, and tailor it. Most of the current literature treats a framework as a very generic set of infrastructure – similar to an operating system, or functions of a coding language. Those are very low level frameworks – just above the hardware level. The genius here is to build or find a framework designed for an industry or domain. This type of framework provides the basic functions needed for every part of that business domain, and allows the end user to combine them in creative ways to solve their unique problem. IBM constructed an insurance framework back in the 80s and 90s, and it is still being advertised. Look at the reference listed below to get a decent pitch on how this all works. IBM was the clear leader in this field way back when, and they seem to be marketing a wide variety of industry specific frameworks. 

Conclusion

Once you see this, you can’t forget it, right? Then why on earth are we still building nightmare software? We don’t design buildings or cars that way – why can’t we teach software design and engineering like we teach mechanical engineering? You can do it. I’m pulling for you!

For an OLD but GOOD set of software design principles, you could do worse than this set: https://sites.google.com/site/carlscheider/carls-papers/design-principles

Bibliography

Anonymous, this discussion reaches the conclusion that it is impossible to fully test a highly complex system. I would agree, if they are not using these basic design principles. See what you think.
https://softwareengineering.stackexchange.com/questions/195571/is-it-possible-to-reach-absolute-zero-bug-state-for-large-scale-software?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa 

Fishman, Charles, on NASA’s zero defect attempt.
https://www.fastcompany.com/28121/they-write-right-stuff 

Gamma, Erich, et al: Design Patterns, 1995.
https://www.amazon.com/Design-Patterns-Elements-Reusable-Object-Oriented/dp/0201633612/ref=sr_1_2?ie=UTF8&qid=1525208081&sr=8-2&keywords=software+patterns&dpID=51szD9HC9pL&preST=_SX218_BO1,204,203,200_QL40_&dpSrc=srch 

https://en.wikipedia.org/wiki/Software_design_pattern 

Rouse, Margaret – a definition of structured programming – or modular programming. Introduces the ideas “micro services”, “container technology”, “service oriented architecture”.
https://searchsoftwarequality.techtarget.com/definition/structured-programming-modular-programming 

Saltarello, Andrea and Dino Esposito: Design principles and patterns for “software engineering”. OK, but beware, this is from Microsoft, after all.
https://www.microsoftpressstore.com/articles/article.aspx?p=2233326 

https://www.codingame.com/playgrounds/503/design-patterns/origin-of-design-patterns
Nice explanation of Christopher Alexander’s 253 architecture patterns that worked, and how that led to software patterns. Gamma et al applied the idea to software with their book: Design Patterns: Elements of Reusable Object-Oriented Software

Insurance Software Framework:
ftp://public.dhe.ibm.com/software/industries/frameworks/insurance/2010_04_11_Insurance_Framework__Detailed-V4.pdf
http://www.fasttechnology.com/software/components/ 

Software Frameworks:
https://en.wikipedia.org/wiki/Software_framework
https://techterms.com/definition/framework
https://www.quora.com/What-is-a-framework-in-programming
https://www.infoworld.com/article/2902242/application-development/7-reasons-why-frameworks-are-the-new-programming-languages.html
https://www.ibm.com/us-en/marketplace/hospitality-framework 

Fayad, Mohamed E., and Johnson, Ralph E.: Domain Specific Application Frameworks Experience, 1999. https://www.amazon.com/Domain-Specific-Application-Frameworks-Experience-Industry/dp/0471332801 

Governor for a steam turbine
https://en.wikipedia.org/wiki/Steam_turbine_governing