The Seven Muda (Wastes) and Software Engineering

I was introduced to the term Muda (waste) when I was working for a semiconductor fab. I learned that there are seven of them and that keeping a close eye on these wastes can reduce costs and increase quality dramatically. A short introduction can be found here:

I want to write about these wastes with a shifted focus. The seven Muda were formulated for classical production and the 'real world', but as we will see, these principles can be applied to software engineering, too. Looking out for these wastes can help to keep architecture, design and code lean and clean.


Transportation

Classic meaning:

The classic meaning is the reduction of transportation. One should look at everything that is transported and at how these transports can be minimized. In fabrication this means, for example, that the transport of goods can be optimized by ordering larger quantities, by using vendors which are closer by and so forth. The transportation costs are reduced and the margin is increased. Transportation does not add value to the product, but it adds risk: during any transport operation, the product may be broken, lost or delayed.

Software engineering:

Transportation can be translated here, for example, to I/O. Avoid transportation over the network, to disk and so on. If I/O is reduced, response times improve, bandwidth is saved, and other applications and systems are not affected negatively by one system exhausting the bandwidth with excessive I/O. To save I/O, good data models should be used which allow a high reuse of data. Double fetches can be avoided, and only the data which is actually needed should be fetched from a DB, for example, instead of reading the whole database just in case. Savings here lead directly to more responsive systems, reduced costs for I/O facilities and higher throughput.
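Avoiding double fetches can be sketched with a tiny cache in front of a slow data source. This is only an illustration under assumptions: `slow_fetch` is a hypothetical stand-in for a database or network read, and the fixed-size, direct-mapped cache is chosen for brevity, not as a recommendation.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 64

/* Counts how often the (hypothetical) slow backend was really hit. */
static int backend_fetches = 0;

/* Hypothetical stand-in for an expensive I/O operation. */
static int slow_fetch(unsigned key)
{
    backend_fetches++;
    return (int)key * 2; /* dummy payload */
}

typedef struct {
    int      valid[CACHE_SLOTS];
    unsigned key[CACHE_SLOTS];
    int      value[CACHE_SLOTS];
} fetch_cache;

/* Returns the value for key, going to the backend only on a cache miss. */
static int cached_fetch(fetch_cache *cache, unsigned key)
{
    assert(cache != NULL);
    size_t slot = key % CACHE_SLOTS;

    if (cache->valid[slot] && cache->key[slot] == key)
        return cache->value[slot];          /* hit: no I/O at all */

    cache->value[slot] = slow_fetch(key);   /* miss: one real fetch */
    cache->key[slot] = key;
    cache->valid[slot] = 1;
    return cache->value[slot];
}
```

With a zero-initialized `fetch_cache`, asking for the same key twice triggers only one call to `slow_fetch`; the second request is served from memory.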


Inventory

Classic meaning:

The meaning of Waste of Inventory is quite easy. It is the waste of having raw material, finished products and work in progress lying around without the prospect of monetizing it. It may or may not be sold. In this state it is a potential waste and should be avoided. A waste of inventory may lead to selling the products below value or even to dumping them.

Software engineering:

In software engineering inventory is twofold:

  1. Source Code: Writing source code which is not requested by the customer (directly or indirectly) may not lead to revenue. So it is a waste of time and resources to produce it. Only functionality which brings in revenue should be developed. Everything else is potential waste, like finished products lying in a warehouse without customer demand. A lot of money can be saved by letting developers produce software which adds value and brings in money. Trial and error development (or Programming by Coincidence, as described in The Pragmatic Programmer by Andrew Hunt et al.) is waste as well. All development versions which are dumped on the way to the final version are waste. Some thought in advance can save a lot of time, money and trouble.
  2. Data: With the 'Big Data' discussion, the Waste of Inventory has come into public view again. Every piece of data stored costs some money. Even if one pays only cents per gigabyte of storage, the sheer amount of data makes it expensive. Data should be selected carefully and stored only if needed. This reduces the costs for storage. Because the data also has to be transported to the storage facilities, transport costs are reduced as well.


Motion

Classic meaning:

While transport is about bringing the goods from one facility to another or from one machine to another, Motion is about the handling of products during the production process. The more handling is involved during production, the more time is needed and the higher the risk of damage and low quality. Also, motion is mechanical motion and leads to wear of the machines used. For people it is the same. Too much handling of products leads to illnesses and other issues which also cost money in the form of sick days. Avoiding motion reduces costs for maintenance, sick days and broken products.

Software engineering:

Motion in software engineering is not so easy to define. The closest equivalents in my opinion are:

  1. Motion = unnecessary things done in software. This may be one animation too many, which is not needed but might break the application by its presence and wastes CPU time. It may be one storage operation too many, done just to save some data temporarily in case of a power failure although the data could be recalculated if needed. There might be one watchdog too many. Defensive programming is fine, but too much of it is not needed. Things done unnecessarily lead to a waste of time and resources. The software runs longer and wastes CPU time. Sorry, I do not have a better explanation, but maybe you get the point.
  2. Motion = unnecessary things done during the software development process. This can mean unnecessary work caused by Programming by Coincidence because design sessions were skipped. It can also mean writing unnecessary documentation, design papers and the like. It can mean unnecessary meetings, conference calls and status presentations.


Waiting

Classic meaning:

Every product in work which waits for something does not add value and consumes space, and the delays may lead to a bad reputation. All wait times need to be reduced. This can be done by managing queues. For more information, have a look at the book The Principles of Product Development Flow: Second Generation Lean Product Development by Donald G. Reinertsen.

Software engineering:

In software engineering, the most obvious waste due to waiting would be a programmed delay or sleep in a program to wait for something. This is obviously not a good design. It is better to design for asynchronous execution and notification. A program should always do something meaningful if possible. Do not just wait for something to happen. Do something in the meantime and wait for a notification, for example, or have some processes run in parallel which use the CPU time of a sleeping thread. All waits are a waste of the customer's time. They should be avoided, otherwise they lead to frustration without adding value to anything.
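The difference between sleeping for a fixed time and waiting for a notification can be sketched with the POSIX `poll` call. This is a minimal sketch under assumptions: the notification arrives as readable data on a file descriptor (a pipe or socket, for example), and the function name `wait_for_notification` is made up for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Blocks until fd becomes readable or timeout_ms has elapsed.
 * Returns 1 if data is ready, 0 if nothing readable arrived in time,
 * and -1 on error. Unlike a fixed sleep(), the call returns as soon
 * as the notification arrives.
 */
static int wait_for_notification(int fd, int timeout_ms)
{
    assert(fd >= 0);

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A caller can pass a timeout of 0 to check for a notification between two chunks of useful work, or a longer timeout when there is really nothing else to do.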


Over-Processing

Classic meaning:

In classic fabrication this means: You do something better, more accurately or more beautifully than required. The customer is paying for a product with a negotiated specification. This specification needs to be met, but not exceeded. More work on the product leads to higher production costs, more time needed and more risk of damage without monetary compensation.

Pay attention: Over-processing can be part of a marketing strategy and a customer satisfaction program. By over-delivering, a customer may be positively surprised, which may lead to a returning customer, a larger order next time and so forth. This is not over-processing as it is meant above. It is part of a strategy which brings higher revenue in the future.

Software engineering:

Over-processing is much the same as in classic engineering. A software product calculates more accurately than needed. Performance tuning is done extensively to squeeze the last microseconds out of the calculations. And there are many more things like that. As soon as the product is good enough, we should stop working on features which are already done. More work does not bring more value to the customer.

Here too: Please pay attention to over-delivering. This is a magic tool if done right. See the classic engineering section above.


Over-Production

Classic meaning:

Over-production is simply the production of more pieces of a product than are needed at the time of production. There is a risk that not all products which were produced can be sold. Avoiding over-production reduces costs, reduces the amount of resources needed for production and is also good for the environment.

Software engineering:

Over-production has two meanings in software engineering, as far as I can see:

  1. Over-production of results: A software product which produces more results than needed wastes resources and time. This is not what customers want, and it is nothing they want to pay for. At the very least, provide configuration options.
  2. Over-production of features: In software engineering (as in all engineering disciplines), engineers tend to over-engineer. Full-blooded engineers want to make the product perfect, feature rich, shiny and so forth. This might lead to feature bloat. Every feature which is not requested by the customer does not add value. A customer will not pay more money for functions they do not want to use. That is why a lot of products come in different flavors like community, basic and enterprise versions. The customer chooses which features are needed and pays for exactly those.


Defects

Classic meaning:

Defective products need repairing or, if they cannot be repaired, need to be dumped. Both choices cost money. Additionally, the reputation suffers, which costs future money because customers do not want to pay again for a product from the same manufacturer. It becomes even worse as soon as the defect damages something at the customer's site and the customer asks (un-)politely for compensation. A good customer support division can compensate for a lot, but this is expensive, too. So: Defects should be avoided. They always waste a lot of money.

Software engineering:

For software defects, the same holds as for mechanical engineering. Defects cost money and reputation. So, the best is not to have any. Avoiding defects through extensive testing and quality control is cheaper than handling angry customers, doing failure analysis, fixing bugs, releasing patches and losing future customers.

An Additional 8th Muda: Latent Skill

There is an additional, unofficial 8th Muda: Latent skill. Officially, it is about utilizing the skills of employees. People who were hired to fill a certain position might be able to do much more, or more valuable, work than the position requires. These people should be given an opportunity to grow and do what they are capable of. Additionally, a lot of employees want to learn more and want to be trained. It is not only about getting a higher salary, but also about personal growth and satisfaction.

In my opinion there is another side of Latent Skill Waste: It is about machinery. Some high-tech machines are capable of doing more than they were bought to do. This capacity can be utilized where possible. In IT this is where cloud computing comes in. When servers are not utilized because there is too little work, it is partly waste by waiting and partly waste by latent skill. With cloud computing, the utilization of servers can be increased. This utilization comes in two flavors: doing more of the same work (reducing waste of waiting) or running other services in parallel (reducing waste due to latent skill). A higher utilization means more revenue and therefore more profit, because the depreciation costs stay the same.

A Final Thought

The Muda are not meant to be used for cost reduction in the first place. In my opinion, that is the wrong mindset. The Muda are about efficiency. To reduce costs, efficiency needs to be increased, that is correct. But thinking about cost reduction only leads to decisions which might hurt quality and effectiveness. I might write about the difference in mindset and practical approach in another post.

Thoughts on High Performance Computing

During my work as a consultant, I was asked about high performance computing (HPC) and how to implement it. As always, one of the strongest constraints is a tight budget.

In recent years, the techniques for HPC changed as the hardware changed. Several years ago, HPC was only possible on computers built from special HPC processors like NEC’s vector CPUs, or on large mainframes with thousands of standard CPUs working together to run at an astonishing speed. Sometimes combinations of both were installed. The complexity of programming such machines is massive, and special knowledge about the programming paradigms and the hardware is needed to get optimal results.

Today the situation is a little different due to several factors:

  1. Standard CPUs will not get significantly faster. The physical limits have been reached, and shrinking the chips further is not that easy anymore, or even impossible. Some production dimensions are on the scale of atoms. As long as we do not want to split atoms, we cannot reduce these dimensions.
  2. Due to the constraints in the point above, CPU architectures are changing. The most significant change is the multi-core processor. Moore’s law on speed is extended by multiplying the number of cores in a processor.
  3. The gaming industry and the graphics processing industry have led the computer industry into developing high performance graphics cards. As it turns out, with some minor constraints, these cards are very well suited for HPC. Even on my 'old' nVidia GeForce 8600 GTS, I found 4 multi-core processors with 8 cores per processor.

Possibilities for HPC

I do not want to write about special computer hardware and specially designed machines. The standard PC technologies are presented here for customers with small budgets, for whom the purchase of an HPC server with thousands of cores is not an option.

Therefore, the following possibilities for HPC are available today:

  1. Even if it is an older approach, cluster computing with PVM or MPI is still a valid possibility. In cluster computing, several PCs or servers are interconnected with a standard Ethernet network. The big drawbacks are the latency and the limited speed of the network. If large computations can be run in parallel, and the time lost to latency and bandwidth is much smaller than the computation time, the approach can and should be used. A very prominent example is movie rendering. The scenery information is sent to a client and the calculation is performed on the client. Hundreds of clients can share the work and speed up the whole process dramatically.
  2. Parallelization on multi-core and multi-processor machines is a common choice today. The number of cores in a standard PC currently ranges from 2 to 8. Processors with more cores can be expected within the next years, that is for sure. The total speed-up of a piece of software is therefore limited by the number of available cores. Even if no HPC is done, the parallelization of software should be a topic, because customers want their machines to run as fast as possible and the investment should be used efficiently. For HPC itself it is not a real option, because standard software should use it, too. There is nothing specifically high performance about it.
  3. Real HPC can be done with GPU programming. One constraint of GPUs is the limitation to single precision floating point operations. That is quite OK for the calculation of 3D graphics, but for some scientific calculations it is not good enough. nVidia has met this demand by creating the so-called Tesla cards. These cards contain up to 448 cores with 6GB RAM and operate in double precision mode. Programmed with nVidia’s CUDA framework or the OpenCL language, high speed-ups can be achieved. This is a real low budget HPC solution for many customers.
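The trade-off in the cluster case (point 1) can be made concrete with a deliberately simplistic cost model. It is an assumption for illustration only: the work splits evenly across the nodes, and all latency and transfer time is lumped into a single value `t_comm`. The function name is made up.

```c
#include <assert.h>

/*
 * Estimated speed-up when a job of t_compute seconds is split evenly
 * across n_nodes and distributing the data costs t_comm seconds in
 * total (latency plus transfer time). Simplistic model, see lead-in.
 */
static double cluster_speedup(double t_compute, int n_nodes, double t_comm)
{
    assert(t_compute > 0.0 && n_nodes > 0 && t_comm >= 0.0);

    double t_parallel = t_compute / n_nodes + t_comm;
    return t_compute / t_parallel;
}
```

In this model, a 100 second job on 10 nodes with 10 seconds of communication only reaches a speed-up of 5 instead of 10, and a 1 second job with the same communication cost becomes slower than running it locally.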


For a small test with OpenCL, I wrote a small C program which performs a simple matrix multiplication. In C, a classical sequential matrix multiplication looks like:
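A minimal sketch of such a classical triple loop, assuming row-major `float` arrays and passing the matrix size as a parameter `n` (MATRIX_SIZE in the text). The function name and the element type are illustrative choices:

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B for square n x n matrices stored row-major in flat arrays. */
static void matrix_multiply(const float *a, const float *b, float *c,
                            size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; i++) {          /* row of the result */
        for (size_t j = 0; j < n; j++) {      /* column of the result */
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)    /* dot product of row and column */
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

The three nested loops perform n * n * n multiply-add steps, which is why the run time grows so quickly with the matrix size.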

I assumed here that we have square matrices with a size of MATRIX_SIZE in each direction. For a size of 1024, this algorithm needs about 51.9 seconds on my AMD Opteron 2600.

The same algorithm was implemented in OpenCL. The kernel code looks like:
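The idea behind such a kernel can be sketched in plain C, as an emulation rather than actual OpenCL code: each GPU thread (work item) computes exactly one element of the result. In a real OpenCL kernel, the indices `i` and `j` would come from `get_global_id(0)` and `get_global_id(1)`. All function names here are made up for illustration, and the two host loops merely stand in for the parallel launch.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by one work item for the result element (i, j). */
static void matrix_mul_work_item(const float *a, const float *b, float *c,
                                 size_t n, size_t i, size_t j)
{
    float r = 0.0f;                 /* private accumulator */
    for (size_t k = 0; k < n; k++)  /* dependent index stays sequential */
        r += a[i * n + k] * b[k * n + j];
    c[i * n + j] = r;               /* one single write to "global" memory */
}

/* Host-side stand-in for the n x n launch; a GPU runs these in parallel. */
static void emulate_kernel_launch(const float *a, const float *b, float *c,
                                  size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            matrix_mul_work_item(a, b, c, n, i, j);
}
```

Because every work item writes to a different element of `c`, the calls are independent of each other and need no locking.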

The kernel is started on my nVidia GeForce 8600 GTS, after copying the needed matrix data into the graphics card RAM, with:

This starts 1,048,576 threads which run on 32 cores. The whole operation is finished in roughly 3.3 seconds. This is a total speed-up of about 15.7.
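The numbers above follow from simple arithmetic, sketched here with two small helper functions whose names are made up for illustration: one thread per result element, and the speed-up as the ratio of the two run times.

```c
#include <assert.h>

/* One thread per result element of a square matrix. */
static unsigned long thread_count(unsigned long matrix_size)
{
    return matrix_size * matrix_size;
}

/* Speed-up as the ratio of sequential to parallel run time. */
static double speed_up(double t_sequential, double t_parallel)
{
    assert(t_parallel > 0.0);
    return t_sequential / t_parallel;
}
```

For a matrix size of 1024, `thread_count` yields 1,048,576, and `speed_up(51.9, 3.3)` yields roughly 15.7. Spread over 32 cores, each core works through 32,768 of these threads.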

One of the specialties to be taken into account is that the GPU processors are not cached, so no cache coherence can be expected. All processors write directly into RAM. The host process has to take care of concurrent accesses and avoid conflicts. In the example above, the two index variables of the result matrix are independent of each other, and so is the calculation for each element. So we could create independent threads for these two variables. The third variable is dependent and cannot be parallelized without additional locking mechanisms.

The situation on graphics cards becomes much more interesting as soon as we take into account the different memories which exist on a graphics card. In the example above, I used the global memory, which can be read and written by all processors, and the private memory, which is private to each core. The private variable r was used because private memory is fast to read and write. It is faster to sum up the result in private memory first and to write it to global memory afterwards. We also have a read-only memory for constants on the graphics boards (read-only for the GPU processors, but writable by the host), texture memory and some more…


As shown above, massively parallel GPU programming with OpenCL is a great opportunity for HPC on a small budget. Taking into account that my graphics card is not state of the art anymore, and considering the performance of the nVidia Tesla cards, HPC is possible for science and research institutes and organizations with strong budget constraints.