Category Archive: Quality Management

These posts cover general topics in quality management.

Test Approaches for Green and Brown Field Projects

The picture on the right side shows the relationships between the different test levels. In green field projects, these levels act as test stages. Usually, the stages are worked through from bottom to top, by different parties or members of the team.


Green Field Projects (and also new code in brown field if possible)

It should be pointed out that the usual order of testing (and code writing) in Green Field Projects is the following:

  1. Writing Code (developer)
  2. Checking Code (developer and CI system with plugins)
  3. Writing Unit Tests (performed as block in TDD with steps 1 and 2)
  4. Writing Component Tests (developer and QA)
  5. Writing Integration Tests (developer and QA)
  6. Setting up automated GUI tests (QA)
  7. Manual Testing (QA and, if needed, developers guided by QA)
  8. Endurance Tests (QA)
  9. Performance Tests
  10. Load Tests (Stress Tests)
  11. Decoupled: Penetration Tests (continuously by all, but systematically by QA)

The first three steps go hand in hand. In TDD, the order seems to be reordered to 3, 1, and 2, but test code is code as well, so the three steps still form one block.
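As a minimal sketch of this rhythm (the class, method, and numbers are invented for illustration), the test is written first and drives the production code:

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Step 3 comes first in TDD: the test defines the expected behavior.
    public class PriceCalculatorTest {

        @Test
        public void netPriceIsGrossPriceWithoutVat() {
            PriceCalculator calculator = new PriceCalculator(0.19); // 19% VAT
            assertEquals(100.0, calculator.netPrice(119.0), 1e-9);
        }
    }

    // Steps 1 and 2 follow: just enough production code to make the test pass.
    class PriceCalculator {
        private final double vatRate;

        PriceCalculator(double vatRate) {
            this.vatRate = vatRate;
        }

        double netPrice(double grossPrice) {
            return grossPrice / (1.0 + vatRate);
        }
    }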

From one step to the next, the tests become less detailed, but cover more use cases and real world scenarios. The tests also become more and more realistic in configuration and setup with respect to the target environment. Even patch levels need to be controlled, if necessary.

Brown Field Projects

In Brown Field Projects it is the other way around. A legacy code base is available and not under test. The approach here cannot be to write unit tests first, because it is far too much work, and in most cases it takes a long time to understand the old legacy code. For a good unit test suite, a rough rule of thumb is at least 1 unit test for every 100 lines. For a legacy code base of only 100,000 lines this would mean writing at least 1,000 meaningful unit tests. This work would really suck…

The approach here is to write integration tests first to assure the basic functionality. The basic use cases need to be right, and the functionality is assured. Integration tests cannot check all execution paths, but they can check the basic behavior, the correct results for the most important use cases, and the error handling for the most common error scenarios.
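As a hedged sketch (the service and its behavior are invented stand-ins for real legacy code), such a test pins down one basic use case end to end instead of a single method:

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class OrderServiceIntegrationTest {

        // Stand-in for a legacy component; in a real brown field project this is the
        // existing, untested production code, not something written for the test.
        static class OrderService {
            private int reservedStock = 0;

            String placeOrder(String articleId, int amount) {
                reservedStock += amount;
                return "confirmed:" + articleId;
            }

            int reservedStock() {
                return reservedStock;
            }
        }

        @Test
        public void placingAnOrderReservesStockAndConfirms() {
            OrderService service = new OrderService();
            assertEquals("confirmed:article-7", service.placeOrder("article-7", 3));
            assertEquals(3, service.reservedStock());
        }
    }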

As soon as new functionality is developed, or existing functionality is changed or extended, new integration tests are implemented first to assure that the still needed functionality stays unchanged and that the new functionality is tested as required. Unit tests and other tests are implemented as needed and practical.

As soon as an integration test fails, the failure analysis reveals the details of the issue. These details are then checked with additionally developed component and unit tests. As long as this procedure is followed, more and more meaningful unit tests are developed and test coverage steadily increases.

The same holds for GUI tests. At first, there is only the chance to perform manual GUI tests. Later on, the most common use cases and the general GUI behavior can be tested automatically. This also leads to better test coverage.

In Brown Field Projects, the tests of the different test groups are also developed in parallel. Integration tests are developed while manual testing is performed, which assures that the software currently remains deliverable. The situation at hand dictates what is to be done.

Programming is both: A Craft and a Science

There is a latent conflict in software development. I have experienced it several times in different contexts. It is not always obvious, but nevertheless it is there… There are two different approaches to software development: an academic and a craftsman's approach. One needs to be a good craftsman to write solid, maintainable, and efficient code, but one also needs to be an academic to understand IT systems and their specialties.

The academic approach is all about theory. Students at universities are taught that theory. These students become very good and very knowledgeable, but they lack the experience and practice to develop real world software. They also learn a lot of mathematics like calculus, numerical mathematics, statistics, and linear algebra. For academics it is often difficult to realize that real world software cannot be perfect (too expensive), needs to be maintainable (software has to be maintained, bug fixed, and developed further in the future) and cost efficient, and that robust solutions might not be modern, cool, or state-of-the-art. Academics also tend to fall in love with certain problems and try to solve them in the most beautiful way possible. This is economically meaningless and too expensive.

The other approach sees software as a craft. Real craftsmanship needs a lot of time, practice, and guidance by a senior. It is like building physical things: the first attempts will be messy, unusable, ugly… But the more practice a craftsman gets, the better the outcome. Good code alone, however, does not solve hard and complex problems. These need an academic approach. For example, some problems need to be translated into problems of linear algebra to be solved efficiently. A pure programmer who only learned to program is not able to do that.

The problem is: students are intellectually trained very well, but they are no craftsmen. Pure craftsmen can write good code, but may not be intellectually prepared for complex systems. As a consultant I see both sides. I train scientists and engineers who are experts in their specific domain to write good, maintainable code, but I also meet people who write good code and need some input on how to tackle hard problems.

What can be done!?

Academics: Science and engineering are ruled by laws and intellectual challenges. That is fine, but in software there is more to be done. For academics I recommend reading books like „The Pragmatic Programmer“ by Andrew Hunt et al. (http://astore.amazon.de/pureso-21/detail/020161622X), thinking about initiatives like Software Craftsmanship (http://manifesto.softwarecraftsmanship.org), and observing their own daily work. Academics are in most cases very good at self reflection. It is easily understood what needs to be done to avoid hassling later on with bad code and unmaintainable software. (For questions, I can be booked, too… ;-)) The main concern should be: How can the job be done with a maximum of automation, a minimum of repetitive work, and as maintainably as possible?

Craftsmen: A craft is doing, evaluating, and redoing until perfection is reached. That is a good approach, but some problems cannot be solved this way so easily. The easiest way to proceed is to team up with a domain expert; the second best is to read about the missing domain of expertise to become enough of an expert in that field. The main concern should be: How can the problem be solved efficiently? Most problems are already solved to a certain degree. This can be copied and studied. Only the remaining parts need to be invented.

The best software developers combine the skills of academics and craftsmen. Both sides are challenging on their own, and the combination needs to be developed over time. The result is worth it: the software developed by these people is amazing: maintainable, understandable, simple, efficient…

„We are not a software company!“

„We are not a software company!“ is a phrase I have heard many times. Sometimes I heard it in exactly this wording and sometimes hidden behind nice words, explanations, and epic speeches. The problem is: companies telling you something like that are software companies already. What people try to say is: „We do not sell software to customers.“ But the real meaning is: „We do not need tools, processes, and QA like companies who sell software to customers.“ In most cases it is only an excuse not to do something important and meaningful, but maybe expensive. Please, let me disagree here…

There are two major issues with that sentence. The first is the distinction between companies who sell software to customers, either as box product or as service, and oneself. This distinction is correct and in most cases obvious, but it says nothing about the second point, the claim that something is therefore not needed in software development. The second issue is that the need for QA or similar measures is a business requirement and not a need of the customer itself.

The big problem is distinguishing only between software products which are directly relevant for revenue and those which are not. For box software and services sold directly to customers, it is obvious that the software is a product, and for most people it is very important to assure its quality for the customer. But when it comes to software used only internally, some people come to the conclusion that QA and certain tools and procedures are not needed.

Internally used software has one major motivation: It saves money! It does not make any revenue, but it increases profit by cutting losses. What a motivation!

The financial savings come from saving time, making better decisions, better monitoring to reduce defect costs and waste, and much more… But why do QA there? Because unchecked software may increase defect rates, produce wrong data which leads to wrong decisions, and may perform badly, which increases wait times and reduces throughput… And there are many more negative influences a software product may have. Think about testing the security functionality which should prevent the own intellectual property from being stolen.

Issues in in-house software may lead to serious trouble, to increased costs and reduced profits, without influencing revenue in any direction. So why do a lot of managers focus only on revenue?

As soon as a company uses software which is created internally, there is a strong motivation to do proper QA. The business may depend on it more heavily than expected…

Reblog: Why “Agile” and especially Scrum are terrible

Have a look at the blog post: https://michaelochurch.wordpress.com/2015/06/06/why-agile-and-especially-scrum-are-terrible/

A very interesting read, but I do not agree 100%. Some parts are a little too biased and arise from situations where Scrum is not applied correctly. It is also possible to “plan” refactorings, paying back technical debt, and so forth. But it is always good to read some criticism of things one does all the time, to re-think one's own doing.

Software Engineering and the 4 Causes of Aristotle

I stumbled upon the principle of the four causes of Aristotle some years ago when I did some reading about cause and effect, leadership, and management. The questions are: How do we get things done? When do we get things done? And, also very important: How do we get things done in the right way?

The Four Causes of Aristotle

Everything that happens has four causes. There are no more and no fewer. Exactly four. Only if all four causes are present can something come into existence. That is what Aristotle formulated in one of his most famous books: Physica. (Have also a look here: http://en.wikipedia.org/wiki/Four_causes)

In the next sections I describe the four causes. They appear in the order I present them. To give an accessible example, I use the picture of building a house. You will see how the four causes apply there.

Causa Finalis: The final cause

The causa finalis or final cause is the basic reason for anything to happen. It can also be translated as need, requirement, or wish. Only with something like that is there a trigger to start a development.

For our house building example, think of the need to move to a new place because of a lack of space. For instance, a couple lives in a small flat and they are happy, but a child is about to be born. They realize that with a small child the place is too small, there is no good chance to have a children's room, the bathroom is too narrow… They see that the current situation is changing and a need for more space is coming up.

This is the causa finalis: a need or requirement which is not detailed yet. There is just an issue to be solved, and there is no detailed plan yet. But the final result is formulated. In our case: more space.

Causa Formalis: The form

After the causa finalis is met, the second cause comes into play: the causa formalis or form. The final cause showed an issue and its final, abstract solution; now it is thought about, and a plan or form is formulated. The causa formalis brings the vision for how the causa finalis can be fulfilled.

In the house building example, our couple may have thought about renting the flat next to theirs and remodelling a wall (which is maybe not allowed by the owner), moving into a bigger flat (which might be too expensive), or building a house. Maybe, after deciding to build a house, they go to an architect and plan how large the house will be, what it will look like, and so forth. At the end of this process a clear vision exists of how the situation is to be solved. After the abstract final result, more space, the plan now has a real, detailed form, but it is still not physical yet.

Causa Materialis: The material

After a clear vision exists due to the causa formalis, this vision needs to lead to the final solution somehow. For that, the last two causes are needed. The next one is the causa materialis or material. For anything to happen, the material needs to be present, or in other words: the boundary conditions need to be met.

In the house building example, the material is the material to build the house, like stones, concrete, wood, and so forth, but also the knowledge of how to build it. The physical material is obviously needed, but so is the knowledge. Without these, there is no chance that a house can be built.

As boundary conditions, other things are needed as well, like some ground to put the house onto, time to build it, and so forth. This is also part of the causa materialis.

Causa Efficiens: The execution

After the need started the process, the vision was formed, and the material was organized, the last cause is the execution or causa efficiens. A trainer of mine once told me: a vision without execution is just hallucination. He is right. One can dream of the best things and have everything in place, but without execution nothing happens. That is what the last cause is about. In this meaning, it also links to the blog post I wrote some years ago about The Trinity of Action.

In the house building example, this is the actual building of the house. We now have everything in place to realize the actual solution. We have the need as drive and energy, we have a vision and plan, we have the material and knowledge, and the last step is to put the material together with the knowledge we have to get the actual house built.

The Dependencies of the Four Causes

The four causes are usually presented in a different order, but in my opinion that is not optimal. There is a strict order of appearance, which is the order I described above.

The first cause, in my opinion, is the causa finalis. It is a principle of nature that without energy nothing takes place. Without a need, requirement, or issue, there is no energy to change the current status quo. A causa finalis is always needed as a seed for any change. I cannot think of a situation where it is different.

The second cause after the seeding by the causa finalis is always some kind of plan. A lot of actions in our world seem to be performed without a plan, but a closer look reveals that even then there is a plan, maybe not a well thought through one, maybe just based on a pattern or experience, but there is one.

The causa materialis may also be the second cause, because the material might already be there for the solution, but it is not recognized as such without some kind of plan. On the other hand, a plan also reveals what material is missing and needs to be organized or waited for.

At the very end the action, the causa efficiens, can take place, but without a plan or material nothing can be done.

The principle of the four causes helps me a lot in my daily private and professional life, because on the one hand it helps me understand what happens, and on the other it provides me a guideline for how to work in certain situations, because it gives me an order for what to work on.

The 4 Causes in Software Engineering

There is not much left to write. In software engineering, these principles are at work as well.

At first, a customer has a need to be fulfilled, which leads to some requirements; these are the causa finalis. It is formulated what the final outcome has to be. After that, architectural and design papers are written and planning is done; these are the causa formalis. With the organization of hardware, software, developers, office space, and everything else, the boundary conditions for the actual development are met: the causa materialis. The final development is the causa efficiens.

The trick is to reflect on the current status of a software project from time to time and to check the four principles to find out whether all causae are there. If something is missing, the project cannot be finished successfully:

  1. If the causa finalis is not met, no customer will pay, due to a lack of need.
  2. If the causa formalis is not met, no customer will pay, because the product is not useful.
  3. If the causa materialis is not met, the product cannot be developed, shipped, or run. Again, nobody will pay.
  4. Finally, if the causa efficiens is missing, the actual product is not built and cannot be sold either.

That’s all the secret in here…

Testing: Eastern Philosophy and Testing can go Hand in Hand

During the preparation for a talk about practical testing and test coverage analysis, I found on the web a small booklet „The Way of Testivus“ (http://www.agitar.com/downloads/TheWayOfTestivus.pdf). This booklet is a humorous summary of testing principles and wisdom, packaged into a nice form for everybody who is interested in testing, to get inspired and to have a fun time.

This booklet clearly shows that testing is not a matter of good or bad or right or wrong; it is about thinking in the right way. There is no standard recipe for writing good tests, enough tests, or whatsoever.

For my talk mentioned above, I found an additional post (at www.artima.com/forums/flat.jsp?forum=106&thread=204677) with a nice story written in the same style:

Testivus On Test Coverage

Early one morning, a programmer asked the great master:

“I am ready to write some unit tests. What code coverage should I aim for?”

The great master replied:

“Don’t worry about coverage, just write some good tests.”

The programmer smiled, bowed, and left.

Later that day, a second programmer asked the same question.

The great master pointed at a pot of boiling water and said:

“How many grains of rice should I put in that pot?”

The programmer, looking puzzled, replied:

“How can I possibly tell you? It depends on how many people you need to feed, how hungry they are, what other food you are serving, how much rice you have available, and so on.”

“Exactly,” said the great master.

The second programmer smiled, bowed, and left.

Toward the end of the day, a third programmer came and asked the same question about code coverage.

“Eighty percent and no less!” replied the master in a stern voice, pounding his fist on the table.

The third programmer smiled, bowed, and left.

After this last reply, a young apprentice approached the great master:

“Great master, today I overheard you answer the same question about code coverage with three different answers. Why?”

The great master stood up from his chair:

“Come get some fresh tea with me and let’s talk about it.”

After they filled their cups with smoking hot green tea, the great master began to answer:

“The first programmer is new and just getting started with testing. Right now he has a lot of code and no tests. He has a long way to go; focusing on code coverage at this time would be depressing and quite useless. He’s better off just getting used to writing and running some tests. He can worry about coverage later.”

“The second programmer, on the other hand, is quite experienced both at programming and testing. When I replied by asking her how many grains of rice I should put in a pot, I helped her realize that the amount of testing necessary depends on a number of factors, and she knows those factors better than I do – it’s her code after all. There is no single, simple, answer, and she’s smart enough to handle the truth and work with that.”

“I see,” said the young apprentice, “but if there is no single simple answer, then why did you answer the third programmer ‘Eighty percent and no less’?”

The great master laughed so hard and loud that his belly, evidence that he drank more than just green tea, flopped up and down.

“The third programmer wants only simple answers – even when there are no simple answers … and then does not follow them anyway.”

The young apprentice and the grizzled great master finished drinking their tea in contemplative silence.

My experience is that test coverage percentages are totally meaningless. The numbers do not tell anything meaningful at all; they are only a rough measure of how much testing is done, but say nothing about test quality. The only really interesting information is the lines of code which are not tested at all. For these lines, one should think about adding some meaningful tests, or about removing the code, because it may be dead code that is not used at all.


JUnit test reporting

Some time ago, in 2011, I asked a question on Stack Overflow about test report enrichment for JUnit tests. Because I have been asked about this topic several times since, and again today, I write down the solution I developed and used at that time. I do not have a better idea yet.

The issue to solve

For a customer, detailed test reports are needed for integration tests, which not only show that everything passes, but also what the test did and to which requirement it is linked. These reports should be created automatically, to avoid repetitive tasks, to save money, and to not frustrate valuable developers and engineers.

Therefore, I thought about a way to document the more complex integration tests (which is also valuable for component tests). Documentation would be needed for test classes and test methods. For the colleagues responsible for QA, it is a great help to see which requirement, Jira ticket, or whatever the test is linked to and what the test actually tries to do, in order to double check it. If the report is good enough, it may also be sent out to customers to prove quality.

The big question is: How can the documentation for each test method and each test class be put into the JUnit reports? JUnit 4.9 was used at the beginning of this work with Maven 3.0, and now I am on JUnit 4.11 and Maven 3.1.0.

The solution

I did what was suggested on Stack Overflow, because I did not find a way to enhance the Surefire reports directly. The solution for me was the development of a JUnit run listener which collects additional information about the tests, which can later be used to aggregate the reports.

The JUnit listener was attached to the maven-surefire-plugin like this:
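The original snippet was lost here; a minimal configuration in the spirit of the text could look like the following, using the listener property supported by the maven-surefire-plugin (the listener class name is invented):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
            <properties>
                <property>
                    <!-- registers a custom JUnit RunListener for the test runs -->
                    <name>listener</name>
                    <value>com.mycompany.test.ReportEnrichmentListener</value>
                </property>
            </properties>
        </configuration>
    </plugin>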

The listener needs to extend org.junit.runner.notification.RunListener; there you can override the methods testRunStarted, testRunFinished, and so forth. The listener needs to be put into a separate Maven project, because a listener cannot be built and used in the same project.
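A skeleton of such a listener might look like this (the class name and the CSV idea match the description below; the rest is assumed):

    import org.junit.runner.Description;
    import org.junit.runner.Result;
    import org.junit.runner.notification.Failure;
    import org.junit.runner.notification.RunListener;

    public class ReportEnrichmentListener extends RunListener {

        @Override
        public void testRunStarted(Description description) throws Exception {
            // called before any test runs; open the CSV file here, for instance
        }

        @Override
        public void testStarted(Description description) throws Exception {
            // called when an individual test starts; record a start timestamp
        }

        @Override
        public void testFailure(Failure failure) throws Exception {
            // called when a test fails; record the result
        }

        @Override
        public void testFinished(Description description) throws Exception {
            // called when an individual test finishes; write the collected data away
        }

        @Override
        public void testRunFinished(Result result) throws Exception {
            // called after all tests have run; close the CSV file
        }
    }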
For the tests themselves, I also created an annotation where I put in the requirement id and other needed information:
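The original annotation is not shown here anymore; a sketch with invented field names could be:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Runtime retention is needed so the run listener can read it via reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.TYPE, ElementType.METHOD })
    public @interface TestDocumentation {

        String requirementId();

        String jiraTicket() default "";

        String description() default "";
    }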

A test class can then look like this:
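A hedged example using the annotation sketched above (requirement ids and names are invented):

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    @TestDocumentation(requirementId = "REQ-123",
            description = "Checks the basic price calculation.")
    public class PricingIntegrationTest {

        @Test
        @TestDocumentation(requirementId = "REQ-123", jiraTicket = "PROJ-456",
                description = "The net price must not contain VAT.")
        public void netPriceContainsNoVat() {
            assertEquals(100.0, 119.0 / 1.19, 1e-9);
        }
    }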
In the run listener, one gets a reference to the actual test, which can be scanned for annotations via reflection. The Description class is quite handy; one can retrieve the test class as easily as:
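For example, inside the listener sketched above (assuming JUnit 4.11, as mentioned in the text):

    @Override
    public void testStarted(Description description) throws Exception {
        Class<?> testClass = description.getTestClass();            // the actual test class
        TestDocumentation classDoc =
                testClass.getAnnotation(TestDocumentation.class);   // class-level annotation
        TestDocumentation methodDoc =
                description.getAnnotation(TestDocumentation.class); // method-level annotation
        // requirement ids, timestamps, and so forth can now be stored away
    }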

The information found there can be stored away, which I did in a simple CSV file. After the test run, this information can be gathered into a fancy report. The information which can be collected ranges from start timestamps, over the actual test class calls and the results of the tests (assertions and assumptions), to the finish timestamps. The classes can be checked via reflection at all times, and all information can be written into a file or even to a database or server.
The reporting was done offline later on, when I first used this solution. But if more effort can be spent, a Maven report plugin (or a normal plugin) can be written which collects the information, for example in the post-integration-test phase. Here, the only limitation is imagination. It depends on whether only a technical report is needed as a TXT file, or whether a fancy marketing campaign is to be built around it. In daily life it is something in between.

Sample Defect Cost Calculations: Why tests are worth it…

Sample Defect Cost Calculation

The cost calculation is not very simple, and a lot of assumptions and estimations are needed. The following definitions, terms, and formulas are a first step towards an understanding of the costs related to defects and testing. The results shall serve as an example and maybe as an argument for strong and tight testing.

Let us first define the Defect Detection Percentage (DDP):

DDP [1] = (bugs detected) / (bugs present) = (test bugs) / (test bugs + production bugs)

The defect detection percentage is a quite simple formula: it is just the ratio of the bugs found before delivery to the total number of bugs in the code, which is unknown. The real insight comes as soon as the ratio is rewritten as the ratio of the bugs found during testing to the sum of the found bugs and the bugs which are delivered into production. This formula shows in a very simple manner a simple truth: all bugs which are not found during testing go directly into production. The first ratio is too abstract to make this point with the right emphasis, but it is really important: bugs in production at the customer's site destroy reputation, decrease future revenue due to lost customers, and in the worst case bring in liability claims.

The relationship between the detection percentage for all bugs and the one for critical bugs should under all circumstances be:

DDP(all bugs) < DDP(critical bugs)

Or in other words: the final goal should be a much better detection percentage for critical bugs than for all bugs or minor bugs. In most cases it is not very critical to have a button in a UI at the wrong position, but a crashing application, or an application which cannot keep a database connection open, is critical.

The consequence is: we should put more time and effort into testing for critical and blocking defects than into minor issue testing. I have seen the opposite in the real world already: the focus and effort on the UI was remarkable, because it was obvious and everybody could see it (especially one's own management and the customer's management), but the performance was not sufficient and the stability was not as good as needed. Still, the product was shiny and looked beautiful…

Have a look at the post about quality characteristics: http://rickrainerludwig.wordpress.com/2014/02/10/software-quality-charateristics-in-iso-9126. What is critical and what is not may change from project to project. Deep thought is needed for the weighting, but the weighting should guide the testing priorities.

The possible costs for defects (see Beautiful Testing: http://astore.amazon.de/pureso-21/detail/B002TWIVOY) can be classified as:

  1. Cost of detection: All costs for testing and accompanying actions. Even if no bugs are found, these costs need to be taken into account. For example: performing a quality risk analysis, setting up the test environment, and creating test data are activities to be taken into account. They all incur costs of detection, and there are many more such tasks. These are the permanent costs on a time basis.
  2. Cost of internal failure: The testing and development costs which incur purely because bugs are found. For example, filing bug reports, adding new tests for the found bugs, fixing bugs, confirming bug fixes, and regression testing are activities that incur costs of internal failure. These are the actual costs per bug.
  3. Cost of external failure: The support, testing, development, and other costs that incur because no 100% bug-free software was delivered. For example, costs for technical support, help desk organization, and sustaining engineering teams are costs of external failure. These costs are much higher: they are at least as high as the internal failure costs, but there are additional costs for external communication, containment, and external support.

Let’s define the Average Cost of a Test Bug with the knowledge from above:

ACTB = (cost of detection + cost of internal failure) / (test bugs) [$/Bug]

There is also the definition of the Average Cost of a Production Bug:

ACPB = (cost of external failure) / (production bugs) [$/Bug]

With both definitions from above, the Test Return On Investment can be defined:

TestROI = ((ACPB – ACTB) x test bugs) / (cost of detection) [1]

Example Test ROI Calculation

Let us assume we have 5 developers in a test team who cost about 50 k$ per month (salary and all other costs). Let us further assume they find an average of 50 bugs per month, and customers find 5 bugs per month. The production bug cost is assumed to be 5 k$ per bug (5 people for a week for customer support, bug fixing, and negotiations) and the test bug cost to be 1 k$ per bug (1 developer for a week). We get the following numbers:

ACTB = (50 k$/month + 50 bug/month * 1k$/bug) / (50 bug/month) = (100 k$/month) / (50 bug/month) = 2k$/bug

ACPB = (5 bug/month * 5 k$/bug) / (5 bug/month) = 5 k$/bug (which is what we defined, obviously)

TestROI = ((5 k$/bug – 2 k$/bug) * 50 bug/month) / (50 k$/month) = 3
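The same arithmetic as a small sketch; all numbers are the assumptions from above:

    public class TestRoiExample {

        public static void main(String[] args) {
            double costOfDetection = 50.0;     // k$/month for the test team
            double testBugs = 50.0;            // bugs/month found during testing
            double costPerTestBug = 1.0;       // k$/bug internal failure cost
            double productionBugs = 5.0;       // bugs/month found by customers
            double costPerProductionBug = 5.0; // k$/bug external failure cost

            double actb = (costOfDetection + testBugs * costPerTestBug) / testBugs; // 2 k$/bug
            double acpb = (productionBugs * costPerProductionBug) / productionBugs; // 5 k$/bug
            double testRoi = ((acpb - actb) * testBugs) / costOfDetection;          // 3

            System.out.printf("ACTB = %.1f k$/bug, ACPB = %.1f k$/bug, TestROI = %.1f%n",
                    actb, acpb, testRoi);
        }
    }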

This simple example shows that testing might be economically meaningful…

Automated and Manual Regression Tests

Here are some simple formulas for automated testing. It is sometimes difficult to communicate the positive impact of automated testing, because the initial costs for hardware, the operational costs, and the costs for developing automated tests are significant. But the following formulas give an impression of the benefit.

Let us define the Regression Test Automation ratio:

RTA = (automated regression tests) / (manual tests + automated regression tests)

The Regression Risk Coverage is shown here to complete the set of formulas:

RRC = (regression risk covered) / (regression risks identified)

The RRC is used as a measure of confidence in the automated tests. The higher the RRC, the higher the confidence that no critical bugs are delivered.

To calculate the benefit of automated tests, the Acceleration of Regression Testing can be used:

ART = (manual regression test duration – automated regression test duration) / (manual regression test duration)

Automated tests bring twofold benefits:

  1. Development and test engineers are relieved from repetitive work. Engineers are in most cases not very good at this. They are better placed in development projects and in failure analysis, which brings more economic benefit. Every hour saved on manual testing can be better spent on functionality development or quality improvement.
  2. Automated tests can be run anytime, even at night. Continuous Integration (CI) and Continuous Delivery (CD) approaches use this by triggering tests automatically. Tests are run, for example, after each code change in the source code management tool. Bugs can be found and fixed very quickly.

The costs can be lower, too. If the testing environment costs 1 k$/month, that is about 1/5 of a developer, but it runs 24/7 and executes far more tests per time unit than any developer could. It is a real bargain.

Summary

Even though this is only a short post on some defect cost calculations, it shows that testing is economically meaningful. The costs for testing, like developers developing automated tests, dedicated testers doing manual tests, hardware, QA staff managing testing and assuring quality compliance, and so forth, seem very high. But taking into account that the costs for defects increase significantly the later they are found in the product life cycle, testing brings a huge benefit.

Continuous Improvement in Software Development

I already wrote about „The Killer Difference between Mechanical Engineering and Software Engineering“ and „Similarities in Quality Assurance between Cambodian Silk Painting and Software Development“. Related to this, here are some comments on continuous improvement in software development and refactoring.

As mentioned earlier, there is no classical production process in software development. In classic production, we have repetitive work and quality measures like Cp and Cpk, which are monitored over time; if they run outside certain limits (warning limits and out-of-specification limits), actions are taken. Over time this leads to a better production process through continuous monitoring and continuous improvement based on the metrics from the monitoring process. In software development we need to do something else. We need a new approach (see the post about the difference between mechanical engineering and software engineering).

During software development we need to use raw models to guide the development, like source conventions, design patterns, and standard designs for error handling and such. A clean, layered architecture also helps to guide development and leads to good code in green field projects (see the article about QA for silk painting). But as soon as the already existing code is changed, the code degrades slowly, but steadily. Here a totally new approach needs to come in: we need to improve the existing code as steadily as it degrades. As entropy in physics tells us: entropy increases steadily as long as no energy is used to decrease it. That is why we need to put a lot of effort into the permanent improvement of existing code.

The buzzword for this process is „refactoring“. A fixed amount of time needs to be planned for the continuous improvement of existing code by refactoring. I suggest an amount of 5% to 10% of the time as a minimum for each project. Depending on the actual state of the project, it may be necessary to plan more. Up to 50% is realistic if a new brown field project is started (a new development on existing code) to smooth the way for the new project. There are some points to consider:

  1. Is the refactoring done independently? Are some people planned to do only refactoring on the code, independently of the other developers? This can be done to change the architecture or some designs. Exchanging frameworks, cleaning up parts of the software, or separating some parts from one another are tasks which can be considered here, too.
  2. Is the refactoring done during development? Refactoring can be done during the development of new functionality. This approach needs a strong mindset among the developers. For each new functionality to be implemented, the code is evaluated first and refactored to provide clean ground for the new functionality. It is better to clean the affected parts first, before adding new functionality.

Both approaches have their advantages and disadvantages. I suggest doing both. It is best to train developers to do some refactoring before new functionality is implemented, to get a clean implementation of the new functionality. These refactorings should only be made after it has been clarified that enough tests back up the changes; if there are not enough, these tests need to be created first. In the daily standup, each developer can tell her colleagues which parts are to be refactored, to avoid conflicts due to simultaneous code changes. Additionally, a senior developer or architect may need to be asked for approval if the changes are more risky or affect too much code.

Independent refactorings should be added as tasks to the backlog as soon as they touch the overall design or architecture. These refactorings should be planned separately, but it must be assured that they are not forgotten. A good backlog is needed for that, and a good set of rules on how to determine the priority of each ticket.

Can Programs Be Made Faster?

Short answer: No. But they can be made more efficient.

A happy new year to all of you! This is the first post of 2014, and it is a (not so) short post about a topic which follows me all the time in discussions about high performance computing. In discussions and projects I get asked how programs can be written to run faster. The problem is that this mindset is misleading. It always takes me some minutes to explain the correct mindset: programs cannot run faster, but they can run more efficiently and thereby save time.

If we neglect that we can scale vertically by using faster CPUs, faster memory, and faster disks, the speed of a computer is constant (also neglecting CPUs which change their speed to save power). All programs always run at the same speed, and we cannot do anything to speed them up by just changing the programming. What we can do is use the hardware we have as efficiently as possible. The effect is that we get more done in less time. This reduces the program run time, and the software seems to run faster. That is what people mean, but looking at efficiency brings the mindset needed to find the correct levers for decreasing run time.

As soon as a program returns the correct results, it is effective; but efficiency also has to be looked at. Have a look at my post about effectiveness and efficiency for more details on the difference between the two. To gain efficiency, we can do the following:

Use all hardware available

All cores of a multi-core CPU can be utilized, and all CPUs of the system if there is more than one. GPUs or physics accelerator cards can be used for calculations if present.

Especially in brown field projects, where the original code comes from single core systems (before 2005 or so) or from systems which did not have appropriate GPUs (before 2009), developers did not pay attention to multi-threaded, heterogeneous programming. These programs have a lot of potential for performance gains.

Look out for:

CPU utilization

Introduce multi-threaded programming into your software. Check the CPU utilization during an actual run and look for CPU idle times. If there are any, check whether your software can do something useful at the times the idle times occur.
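A minimal sketch of the idea in Java (the workload is a placeholder): split independent work across all available cores instead of running it on one.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelCalculation {

        public static void main(String[] args) throws Exception {
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);

            // submit one independent chunk of work per task
            List<Future<Double>> results = new ArrayList<>();
            for (int chunk = 0; chunk < 100; chunk++) {
                final int current = chunk;
                results.add(pool.submit(() -> heavyCalculation(current)));
            }

            double sum = 0.0;
            for (Future<Double> result : results) {
                sum += result.get(); // collect results; the pool keeps all cores busy
            }
            pool.shutdown();
            System.out.println("sum = " + sum);
        }

        // placeholder for a real calculation
        private static double heavyCalculation(int chunk) {
            double x = 0.0;
            for (int i = 0; i < 1_000_000; i++) {
                x += Math.sqrt(chunk + i);
            }
            return x;
        }
    }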

GPU utilization

Introduce OpenCL or CUDA into your software to utilize the GPU board or physics accelerator cards if present. Check the utilization of the cards during calculation and look for optimizations.

Data partitioning for optimal hardware utilization

If a calculation does not need too much data, everything should be loaded into memory to have the data present for efficient access. Data can also be organized for access in different modes for the sake of efficiency. But if there are calculations whose amounts of data do not fit into memory, a good strategy is needed to avoid performing calculations on disk.

The data should be partitioned into smaller pieces. These pieces should fit into memory, and the calculations on them should run completely in memory. The bandwidth from CPU to memory is about 100 to 1000 times higher than from CPU to disk. Once this is done, check with tools for cache misses and see whether you can optimize further.
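A hedged sketch of the block-wise idea (file format and calculation are placeholders): read one memory-sized block at a time and do all work on it in RAM.

    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class BlockwiseProcessing {

        private static final int BLOCK_SIZE = 1 << 20; // 1M doubles = 8 MB; tune to fit into RAM

        public static void main(String[] args) throws IOException {
            double total = 0.0;
            try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
                double[] block = new double[BLOCK_SIZE];
                int count;
                while ((count = readBlock(in, block)) > 0) {
                    total += processInMemory(block, count); // all work on this block stays in RAM
                }
            }
            System.out.println("total = " + total);
        }

        private static int readBlock(DataInputStream in, double[] block) throws IOException {
            int i = 0;
            try {
                while (i < block.length) {
                    block[i] = in.readDouble();
                    i++;
                }
            } catch (EOFException e) {
                // end of file; return the partially filled block
            }
            return i;
        }

        private static double processInMemory(double[] block, int count) {
            double sum = 0.0;
            for (int i = 0; i < count; i++) {
                sum += block[i] * block[i]; // placeholder calculation
            }
            return sum;
        }
    }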

Intelligent, parallel data loading

The bottleneck for calculations is the CPU and/or GPU. They need to be utilized, because only they produce relevant results; all other hardware exists as facilities around them. So, do everything to keep the CPUs and/or GPUs busy. It is not a good idea to load all data into memory (letting CPU/GPU idle), then start a calculation (everything is busy), and store the results afterwards (CPU/GPU idle again). Develop your software with dynamic data loading: while calculations run, new data can be fetched from disk to prepare the next calculations, and the next calculations can run while the former results are written to disk. This may keep one CPU core busy with IO, but the other cores do meaningful work and the overall utilization increases.
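One possible sketch in Java (the data source is simulated): a loader thread prefetches blocks into a bounded queue while the main thread computes, so IO and CPU work overlap.

    import java.util.Arrays;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class PrefetchPipeline {

        private static final double[] POISON = new double[0]; // marks the end of the stream

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<double[]> queue = new ArrayBlockingQueue<>(4); // bounded prefetch buffer

            Thread loader = new Thread(() -> {
                try {
                    for (int i = 0; i < 100; i++) {
                        queue.put(loadNextBlock(i)); // IO runs while calculations are busy
                    }
                    queue.put(POISON);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            loader.start();

            double total = 0.0;
            for (double[] block = queue.take(); block != POISON; block = queue.take()) {
                total += calculate(block); // CPU work overlaps with the loader's IO
            }
            System.out.println("total = " + total);
        }

        // placeholder for a real disk read
        private static double[] loadNextBlock(int index) {
            double[] block = new double[1024];
            Arrays.fill(block, index);
            return block;
        }

        private static double calculate(double[] block) {
            double sum = 0.0;
            for (double value : block) {
                sum += Math.sqrt(value);
            }
            return sum;
        }
    }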

Do not do unnecessary things

Have a look at my post about the seven muda to get an impression of the kinds of waste. All these wastes can be found in software, and they lead to inefficiency. Everything which does not directly contribute to the expected results of the software needs to be questioned. Everything which uses CPU power, memory bandwidth, or disk bandwidth, but is not directly connected to the requested calculation, may be treated as potential waste.

As a starter, look for, check, and optimize the following:

Decide early

Decide early when to abort loops, which calculations to do, and how to proceed. Some decisions are made at a certain position in the code, but sometimes these checks can be done earlier in the code or before loops, because the information is already present. This is something to check. During refactorings there might be other, more efficient positions for these checks. Look out for them.
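A tiny, invented example of the pattern: a check that does not depend on the loop variable is hoisted out of the loop.

    public class DecideEarly {

        // Before: the loop-invariant mode check is evaluated on every iteration.
        static double slowSum(double[] data, boolean absoluteMode) {
            double sum = 0.0;
            for (double value : data) {
                if (absoluteMode) {
                    sum += Math.abs(value);
                } else {
                    sum += value;
                }
            }
            return sum;
        }

        // After: the decision is made once, before the loop; each branch is a tight loop.
        static double fastSum(double[] data, boolean absoluteMode) {
            double sum = 0.0;
            if (absoluteMode) {
                for (double value : data) {
                    sum += Math.abs(value);
                }
            } else {
                for (double value : data) {
                    sum += value;
                }
            }
            return sum;
        }
    }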

Validate economically

Do not check the validity of parameters in every function. Check the model parameters once at the beginning of the calculations, thoroughly. If these checks are sufficient, there should be no illegal state afterwards related to the input data, so it does not need to be checked permanently.

Let it crash

Check the input parameters of functions or methods only if their failure would be fatal (like returning wrong results). Let there be a NullPointerException, IllegalArgumentException, or whatsoever if something happens. This is OK; exceptions are meant for situations like that. The calculation can be aborted that way, and the exception can be caught in a higher function to abort the software or the calculation gracefully, but the cost of checking everything permanently is high. On the other side: what will you do when a negative value comes into a square root function with double output, or the matrix dimensions do not fit in a matrix multiplication? There is no meaningful way to proceed but to abort the calculation. Check the input model once and everything is fine.

Crash early

Include sanity checks in your calculations. As soon as the calculation is not gaining more precision, runs towards a wrong result, produces the first NaN or Inf values, or behaves strangely in any way, abort the calculation and let the computer compute something more meaningful. It is a total waste of resources to let a program run which does not do anything meaningful anymore. It is also quite social to let other people calculate their stuff in the meantime.

Organize data for efficient access

I have seen software which looks up data in arrays element-wise by scanning from the first element to the position where the data is found. This leads to linear time behavior, O(n), for the search. With binary search, for instance, this can be done with logarithmic time behavior, O(log n). Sometimes it is also possible to hold data in memory in a denormalized way to access it in different ways: sometimes a mapping is needed from index to data and sometimes the other way around. If memory is not an issue, think about keeping the data in memory twice for optimized access.
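In Java, for example, a sorted array turns the linear scan into a logarithmic lookup:

    import java.util.Arrays;

    public class LookupExample {

        public static void main(String[] args) {
            double[] values = { 0.5, 1.2, 3.3, 4.8, 7.1, 9.9 }; // must be sorted for binary search

            // O(n): scanning from the first element
            int linear = -1;
            for (int i = 0; i < values.length; i++) {
                if (values[i] == 4.8) {
                    linear = i;
                    break;
                }
            }

            // O(log n): binary search on the sorted array
            int binary = Arrays.binarySearch(values, 4.8);

            System.out.println(linear + " == " + binary); // both print index 3
        }
    }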

Conclusion

I hope I could show how the focus on efficiency brings the right insights on how to reduce software run times. The correct mindset helps to identify the weak points in software, and the points above should give some directions in which to look for inefficiencies. A starting point is presented, but the way to go is different for every project.