Test Approaches for Green and Brown Field Projects

The picture on the right side shows the relationships between the different test levels. In green field projects, the tests form a kind of test stages. Usually, the stages are performed from bottom to top and by different parties or members of the team.

Green Field Projects (and also new code in brown field projects, if possible)

It should be pointed out that the usual order of testing (and code writing) in Green Field Projects follows this direction:

  1. Writing Code (developer)
  2. Checking Code (developer and CI system with plugins)
  3. Writing Unit Tests (performed as a block together with steps 1 and 2 in TDD)
  4. Writing Component Tests (developer and QA)
  5. Writing Integration Tests (developer and QA)
  6. Setting up automated GUI tests (QA)
  7. Manual Testing (QA and, if needed, developers guided by QA)
  8. Endurance Tests (QA)
  9. Performance Tests
  10. Load Tests (Stress Tests)
  11. Decoupled: Penetration Tests (continuously by all, but systematically by QA)

The first three steps go hand in hand; in TDD the numbering seems to be reordered to 3, 1 and 2, but the test code is code as well.

From one step to the next, the tests become less detailed, but cover more use cases and real world scenarios. The tests also become more and more realistic in configuration and setup with respect to the target environment. Even patch levels need to be controlled, if necessary.

Brown Field Projects

In Brown Field Projects it is the other way around. A legacy code base is available, but it is not under test. The approach here cannot be to write unit tests first, because it is too much work and in most cases it takes a long time to understand the old legacy code. For a good unit test suite, one calculates roughly at least one unit test per 100 lines of code. For a legacy code base of only 100,000 lines, this would mean writing at least 1,000 meaningful unit tests. This work would really suck…

The approach here is to write integration tests first to assure the basic functionality. The basic use cases need to work correctly, and the functionality is assured. With integration tests not all execution paths can be checked, but the basic behavior, the correct results for the most important use cases, and the error handling for the most common error scenarios can be checked. A minimal sketch of such a test is shown below.
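As an illustration only (class and method names are hypothetical, not taken from a real project), such an integration test in JUnit could look like this:

```java
// Hypothetical sketch: an integration test that secures an important use case
// of a legacy component through its public API before any refactoring.
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class InvoiceServiceIT {

    @Test
    public void totalIsCalculatedForStandardOrder() {
        InvoiceService service = new InvoiceService();
        Invoice invoice = service.createInvoice("ORDER-4711");

        // Assert the externally visible behavior, not implementation details.
        assertEquals(3, invoice.getLineItems().size());
        assertEquals(149.90, invoice.getTotal(), 0.001);
    }

    @Test(expected = UnknownOrderException.class)
    public void unknownOrderIsRejected() {
        new InvoiceService().createInvoice("DOES-NOT-EXIST");
    }
}
```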

As soon as new functionality is developed, or existing functionality is changed or extended, new integration tests are implemented first to assure that the still needed functionality stays unchanged and that the new functionality is tested as required. Unit tests and other tests are implemented as needed and practical.

As soon as an integration test fails, the failure analysis reveals the details of the issue. These details are then checked with additionally developed component and unit tests. As long as this procedure is followed, more and more meaningful unit tests are developed and more test coverage is reached.

The same holds for GUI tests. At first, one only has the chance to perform manual GUI tests. Later on, the most common use cases and the general GUI behavior can be tested automatically. This also leads to better test coverage.

In Brown Field Projects the tests in the different test groups are also developed in parallel. Integration tests are developed while manual testing is performed, which assures the software's current ability to be delivered. The situation at hand dictates what is to be done.

Hadoop Client in WildFly – A Difficult Marriage (Part 2)

After writing about my first impressions of Hadoop and its WildFly integration, I did some more research and work to bring Hadoop into JavaEE on WildFly in a more elegant way.

The follow-up idea after putting Hadoop into an EAR for isolation was to isolate the Hadoop and HBase clients into a WildFly module. Both clients work very well in a Java SE environment, so the idea was to use these modules, which are not container managed (just like in Java SE), to provide the functionality to be used in a container.

The resulting (so far) module.xml is:
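The original file is not reproduced here; as a hedged sketch only (module name, JAR list, and dependencies are assumptions), a WildFly module.xml for the Hadoop client could be structured like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of a WildFly module for the Hadoop client; the actual
     resource-root list has to contain all JARs shipped with hadoop-client. -->
<module xmlns="urn:jboss:module:1.3" name="org.apache.hadoop.client">
    <resources>
        <resource-root path="hadoop-common-2.7.1.jar"/>
        <resource-root path="hadoop-hdfs-2.7.1.jar"/>
        <resource-root path="hadoop-auth-2.7.1.jar"/>
        <!-- ... further Hadoop and third-party JARs ... -->
    </resources>
    <dependencies>
        <module name="javax.api"/>
        <module name="sun.jdk"/>
    </dependencies>
</module>
```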

For HBase client, the module.xml looks like this:
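Again only a sketch under the same assumptions; the HBase client module can reference the Hadoop module as a dependency:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of the HBase client module; JAR names are examples only. -->
<module xmlns="urn:jboss:module:1.3" name="org.apache.hbase.client">
    <resources>
        <resource-root path="hbase-client-1.1.1.jar"/>
        <resource-root path="hbase-common-1.1.1.jar"/>
        <!-- ... further HBase and third-party JARs ... -->
    </resources>
    <dependencies>
        <module name="javax.api"/>
        <module name="org.apache.hadoop.client"/>
    </dependencies>
</module>
```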

For the modules above, I cannot guarantee the completeness of all dependencies, but for the use cases I have at hand, these modules work.

Programming is both: A Craft and Science

There is a latent conflict in software development. I experienced it several times in different contexts. It is not always obvious, but nevertheless it is there… There are two different approaches to software development: an academic and a craftsman approach. One needs to be a good craftsman to write solid, maintainable, and efficient code, but one also needs to be an academic to understand IT systems and their specialties.

The academic approach is all about theory. Students in universities are taught that theory. These students become very good and very knowledgeable, but they lack the experience and practice to develop real world software. They also learn a lot of mathematics like calculus, numerical mathematics, statistics, and linear algebra. For academics it is often difficult to realize that real world software cannot be perfect (too expensive), needs to be maintainable (software needs to be maintained, bug-fixed, and developed further in the future) and cost efficient, and that robust solutions might not be modern, cool, or state-of-the-art. Academics also tend to fall in love with some problems and try to solve them in the most beautiful way possible. This is economically meaningless and too expensive.

The other approach is that software is a craft. Real craftsmanship needs a lot of time, practice, and guidance by a senior. It is like building stuff: the first trials will be messy, unusable, ugly… But the more practice a craftsman gets, the better the outcome becomes. However, good code alone does not solve hard and complex issues. These need an academic approach. For example, some problems need to be converted into problems of linear algebra to be solved efficiently. A pure programmer who only learned to program is not able to do that.

The problem is now: students are intellectually trained very well, but they are no craftsmen. Pure craftsmen can write good code, but may not be intellectually prepared for complex systems. As a consultant I see both sides. I train scientists and engineers who are experts in their specific domain to write good, maintainable code, but I also see people who write good code, yet need some input on how to tackle some hard problems.

What can be done!?

Academics: Science and engineering are ruled by laws and intellectual challenges. That is fine, but there is more to be done in software. For academics I recommend reading books like “The Pragmatic Programmer” by Andrew Hunt et al. (http://astore.amazon.de/pureso-21/detail/020161622X), thinking about initiatives like Software Craftsmanship (http://manifesto.softwarecraftsmanship.org), and observing their own daily doing. Academics in most cases are very good at self-reflection. It is easily understood what needs to be done to avoid hassling later on with bad code and unmaintainable software. (For questions, I can be booked, too… ;-)) The main concern should be: How can the job be done with a maximum of automation, a minimum of repetitive work, and as maintainably as possible?

Craftsmen: A craft is doing, evaluating, and redoing until perfection is reached. It is a good approach, but some problems cannot be solved so easily. The easiest way to proceed is to team up with a domain expert; the second best thing is to read about the missing domain of expertise to become an expert in this field to the level that is needed. The main concern should be: How can the problem be solved efficiently? Most problems are already solved to a certain level. These solutions can be studied and copied. Only the remaining parts need to be invented.

The best software developers combine the skills of academics and craftsmen. Both sides are challenging on their own, but the combination needs to be developed over time. The result is worth it: the software developed by these people is amazing: maintainable, understandable, simple, efficient…

SSL in Real Life

We live in the post-Snowden era and we all need to think about what shall happen with our privacy, with the internet, and the way we communicate. The situation is fundamentally different from what we had about 20 years ago…

All the laws which should protect us were made in times when surveillance needed physical access to place phone taps and microphones. Communication was bound to a distributed physical medium which needed direct access to break into it. Additionally, communication went from point to point on the shortest path.

On the internet everything is connected now, and one infrastructure based on TCP/IP serves everything. The internet is not as distributed as most people think, because a few very large internet nodes bundle a lot of communication. Their number is limited and some institutions are able to hook into them, as was revealed…

One of the recommendations is to encrypt everything: from simple web pages over email to all other forms of communication. As long as only some communication is encrypted, there is still too much information in the open, so everybody should encrypt their messages (even the unimportant ones) to increase the noise in the net and to make it more difficult for attackers to find out which communications are important and which are not. With this in mind, we decided at PureSol Technologies to provide our entire website via HTTPS (HTTP with SSL encryption). As soon as the decision was made, the issues started…

Trusted Communication only via Root CA

One of the current major issues is that only Root CAs are able to issue good certificates for SSL which bring browsers into the “green lock mode” that shows the site as trusted without any warning. Otherwise, browsers show a warning about untrusted certificates, because the authentication was not performed and nobody knows whether the certificate was provided by the correct organization or person. This is correct in so far as I could create certificates which claim to be issued by Microsoft Corporation. To avoid this, some so-called Root CAs are authorized to check certificates for authenticity. A good idea so far…

Problem 1: There are not so many Root CAs and the prices are too high for small companies like PureSol Technologies. The prices for certificates can go up to 1,000 EUR per year. I would rather spend this money on developers.

Problem 2: There have already been security breaches in Root CAs. The most prominent one was the erroneous issuing of certificates to an individual in the name of Microsoft. The issue: this individual was not in any way related to Microsoft. See the Microsoft knowledge base: https://support.microsoft.com/en-us/kb/293818

Why shouldn’t I provide my own certificates? I can create my keys and certificates on a completely disconnected computer and copy the stuff via Sneakernet. OK, there is still the issue that I only claim to be me, but nobody knows whether this is correct or not… But do I have to pay thousands of Euros for that kind of service, which should be a one-time event?

Unhandled Certificate Revocation Lists

It is still not standardized when, how, and by whom certificates are to be rechecked for validity. The above mentioned issue with the VeriSign-created Microsoft certificates showed that even Microsoft had to create a hot patch to contain that issue, because Windows itself could not deal with certificate revocation lists. It is still a construction site and a weak spot in SSL and the network of trust.

So, Root CAs provide an expensive service to provide security, but it is not really clear how the revocation lists are handled. The problem here is manifold: What happens if the service for revocation checks is down? How often is it to be checked?

Running an Own Certificate Authority

After checking prices, services and how much work it is to deal with SSL and certificates on our own, we decided to run our own Certificate Authority.

You can find the needed public information at http://ca.puresol-technologies.com.

There is only one issue: the PureSol Technologies Certificate Authority is not a pre-installed CA in browsers. So, if an HTTPS connection is opened to a PureSol Technologies site, the browser downloads the certificates and during the check it finds that no known Root CA has checked and signed them.

To overcome this, we provide the PureSol Technologies Certificate Authority Website for certificate download. To check the authenticity we provide the certificates to clients via Sneakernet for local installation.

Additionally, sites which are “read-only” for clients are provided via both HTTP and HTTPS. Sites which require user interaction, like logins, we provide via HTTPS only. All HTTP connections to these sites are automatically redirected to their HTTPS variants.

Hadoop Client in WildFly – A Difficult Marriage

(This article was triggered by a question “Hadoop Jersey conflicts with Wildfly resteasy” on StackOverflow, because I hit the same wall…)

For a current project, I am evaluating the usage of Hadoop 2.7.1 for handling data. The current idea is to use Hadoop’s HDFS, HBase, and Spark to handle bigger amounts of data (1 TB range, no real Big Data).

The current demonstrator implementation uses Cassandra and Titan as databases. Due to some developments around Cassandra and Titan (Aurelius was acquired by DataStax), the stack does not seem to be future-proof. An alternative is to be evaluated.

The first goal is to use the Hadoop client in WildFly 9.0.1 (the Maven dependency is sketched below). The content of this article should also be valid for WildFly >= 8.1.0. HDFS is to be used first to store and retrieve raw files.
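A minimal sketch of the Maven dependency for the version mentioned above:

```xml
<!-- Maven dependency for the Hadoop 2.7.1 client (aggregator artifact). -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
</dependency>
```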

Setting up Hadoop in pseudo-distributed mode as described in “Hadoop: The Definitive Guide” was a breeze. I was full of hope, added the dependency above to an EJB Maven module, and wanted to use the client to connect to HDFS to store and retrieve single files. This is where the problems started…

I cannot provide all stack traces and error messages anymore, but roughly, this is what happened (one after another; when I removed one obstacle, the next came up):

  • Duplicate providers for Weld were brought up as errors due to multiple providers in the Hadoop client. Several JARs are loaded as implicit bean archives, because JavaEE annotations are included. I did not expect that, and it seems strange to have it in a client library which is mainly used in a Java SE context.
  • The client dependency is not self-contained. At compile time an issue arose due to missing libraries.
  • The client libraries contain dependencies which provide web applications. These applications are also loaded and WildFly tries to initialize them, but fails due to missing libraries which are set to provided, but not included in WildFly (but maybe in Geronimo?). Again, I am puzzled why something like that is packaged in a client library.
  • Due to providers delivered in sub-dependencies of the Hadoop client, the JSON provider was switched from Jackson 2 (default since WildFly 8.1.0) back to Jackson 1, causing infinite recursions in trees I need to marshal into JSON, because the com.fasterxml.jackson.* annotations were not recognized anymore and the org.codehaus.jackson.* annotations were not provided.

The issues are manifold and are caused by a very strange, not to say messy, packaging of the Hadoop client.

Following are the solutions so far:

Broken implicit bean archives

Several JARs contain JavaEE annotations which lead to implicit bean archive loading (see: http://weld.cdi-spec.org/documentation/#4). The implicit bean archive support needs to be switched off.

For WildFly, this means changing the Weld settings in standalone.xml so that a bean descriptor is required; a sketch of the change follows below.
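A sketch of what this setting can look like in the Weld subsystem (standalone.xml and the namespace version are assumptions and depend on the WildFly release):

```xml
<!-- Sketch: requiring a bean descriptor disables implicit bean archives in Weld. -->
<subsystem xmlns="urn:jboss:domain:weld:2.0" require-bean-descriptor="true"/>
```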

This switches the implicit bean archive handling off. All libraries used for CDI need to have a /META-INF/beans.xml  file now. (After switching off the implicit archives, I found a lot of libraries with missing beans.xml  files.)

Missing dependencies

I added the following dependencies to fix the compile/linkage issues:
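The original list is not reproduced here; purely as an illustration (the exact artifacts depend on the compile errors at hand), additions of this kind could look like:

```xml
<!-- Illustrative only: additional Hadoop artifacts that typically resolve
     missing classes at compile time; the exact list is project specific. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.1</version>
</dependency>
```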

Services provided, but not working

After switching off the implicit bean archives and adding new dependencies to get the project compiled, I ran into issues during deployment. Mostly, the issues were missing runtime dependencies due to missing injection providers.

The first goal was to shut off all (hopefully) not needed stuff which was disturbing. I excluded the Hadoop MapReduce Client App and JobClient (no idea what these are for). Additionally, I excluded the Jackson dependencies, because they are already provided in the WildFly container. A sketch of such exclusions is shown below.
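A hedged sketch of what such exclusions can look like on the hadoop-client dependency (the artifact names exist in the Hadoop 2.7.x and Jackson 1.x distributions, but the exact set here is an assumption):

```xml
<!-- Sketch: excluding the MapReduce client parts and Jackson 1 from hadoop-client. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-app</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.codehaus.jackson</groupId>
            <artifactId>jackson-core-asl</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.codehaus.jackson</groupId>
            <artifactId>jackson-mapper-asl</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```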

Broken JSON marshalling in RestEasy

After all the fixes above, the project compiled and deployed successfully. During testing I found that the JSON marshalling was broken due to infinite recursions I got while marshalling my file trees. It drove me crazy to find the cause. I was almost sure that WildFly 9 switched the default Jackson implementation back to Jackson 1, but I did not find any release note for that. After a long while and some good luck, I found a YarnJacksonJaxbJsonProvider class which forces the container to use Jackson 1 instead of Jackson 2, messing up my application…

That was the final point to decide (maybe too late) that I need a kind of galvanic isolation. The Hadoop client and WildFly need to talk through a proxy of some kind, not sharing any dependencies except for one common interface.

Current Solution

I have now created one Hadoop connector EAR archive which contains the above mentioned and fixed Hadoop client dependencies. Additionally, I created a Remote EJB and added it to the EAR; it provides the proxy to use Hadoop. The proxy implements a remote interface which is also used by the client. The client performs a lookup on the remote interface of the EJB. That setup seems to work so far… A minimal sketch of such a proxy is shown below.
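A minimal sketch of such a proxy, assuming a hypothetical HdfsProxyRemote interface (these are not the actual project classes):

```java
// HdfsProxyRemote.java - remote interface shared with clients (hypothetical name).
import java.io.IOException;
import javax.ejb.Remote;

@Remote
public interface HdfsProxyRemote {
    void storeFile(String path, byte[] content) throws IOException;
    byte[] retrieveFile(String path) throws IOException;
}

// HdfsProxyBean.java - lives in the connector EAR next to the Hadoop client JARs.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.ejb.Stateless;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

@Stateless
public class HdfsProxyBean implements HdfsProxyRemote {

    @Override
    public void storeFile(String path, byte[] content) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path(path))) {
            out.write(content);
        }
    }

    @Override
    public byte[] retrieveFile(String path) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (FSDataInputStream in = fs.open(new Path(path))) {
            byte[] chunk = new byte[8192];
            int length;
            while ((length = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, length);
            }
        }
        return buffer.toByteArray();
    }
}
```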

The only drawback in this scenario at the moment is that I cannot use streams through the EJB, because streams cannot be serialized. I am thinking about creating a REST interface for Hadoop, but I have no idea about the performance. Additionally, will the integration of HBase be as difficult as this!?

For the next versions, maybe a fix will be in place. I found a Jira ticket HDFS-2261 “AOP unit tests are not getting compiled or run”.

“We are not a software company!”

“We are not a software company!” is a phrase I have heard a lot of times. Sometimes I heard it in exactly this wording and sometimes hidden behind some nice words, explanations, and epic speeches. The problem is, companies telling you something like that are software companies already. What people try to say is: “We do not sell software to customers.” But the real meaning is: “We do not need tools, processes, and QA like companies who sell software to customers.” In most cases it is only an excuse not to do something important and meaningful, but maybe expensive. Please, let me disagree here…

There are two major issues with that sentence. The first is the distinction between companies that sell software to customers, either as a boxed product or as a service, and oneself. This distinction is correct and in most cases obvious, but it does not justify the conclusion that something is not needed in software development because of it. The second issue is that the need to do QA or something else is a business requirement and not merely a need of the customer.

The big problem is distinguishing only between software products which are directly relevant for revenue and those which are not. For boxed software and services directly sold to customers, it is obvious that this software is a product, and for most people it is very important to assure quality for the customer. But when it comes to software used only internally, some people come to the conclusion that QA and certain tools and procedures are not needed.

Internally used software has one major motivation: It saves money! It is not making any revenue, but it increases profit by cutting losses. What a motivation!

The financial savings come from saving time, making better decisions, better monitoring to reduce defect costs and waste, and much more… But why do QA there? Because software that is not checked may increase defect rates, produce wrong data which leads to wrong decisions, or perform badly, which increases wait times and reduces throughput… And there are many more negative influences a software system may have. Think about testing some security functionality which should prevent the company's own intellectual property from being stolen.

Issues in in-house software may lead to serious trouble, to increased costs and reduced profits, without influencing the revenue in any direction. So why do a lot of managers only focus on revenue?

As soon as a company is using software which is created internally, there is a strong motivation to do proper QA. The business may depend on it more heavily than expected…

Reblog: Why “Agile” and especially Scrum are terrible

Have a look at the blog post: https://michaelochurch.wordpress.com/2015/06/06/why-agile-and-especially-scrum-are-terrible/

A very interesting read, but I do not agree 100%. Some parts are a little too biased and come out of situations where Scrum is not applied correctly. It is also possible to “plan” refactorings, paying back technical debt, and so forth. But it is always good to read some criticism about the things one does all the time, to re-think one's own doing.

Software Engineering and the 4 Causes of Aristotle

I stumbled on the principle of the 4 causes of Aristotle some years ago when I did some reading about cause and effect, leadership and management. The question is: How do we get things done? When do we get things done? And also very important: How do we get things done in the right way?

The Four Causes of Aristotle

Everything that happens has four causes. There are no more and no less. Exactly four. Only if all four causes are present can something come into existence. That is what Aristotle formulated in one of his most famous books: Physica. (Also have a look here: http://en.wikipedia.org/wiki/Four_causes)

In the next sections I describe the four causes. These causes appear in the order I present them. To give an accessible example, I use the picture of building a house. You will see how the four causes apply there.

Causa Finalis: The final cause

The causa finalis or final cause is the basic reason for anything to happen. You can also translate it as need, requirement, or wish. Only with something like that is a trigger present to start some development.

For our house building example, we can think of it as the need to move into a new place because of a lack of space. For instance, a couple lives in a small flat and they are happy, but a child is to be born. They find out that with a small child the place is too small, there is no good chance to have a children's room, the bathroom is too narrow… They find that the current situation is changing and a need for more space is coming up.

This is the causa finalis. A need or requirement which is not detailed, yet. There is just an issue to be solved, but there is not a detailed plan, yet. But, the final result is formulated. In our case: More space.

Causa Formalis: The form

After the causa finalis is met, the second cause happens: the causa formalis or form. The final cause showed an issue and its final abstract solution; now it is thought through and a plan or form is formulated. The causa formalis brings the vision for how the causa finalis can be solved.

In the house building example, our couple may have thought about renting the flat next to them and re-modelling a wall (which may not be allowed by the owner), moving into a bigger flat (which might be too expensive), or building a house. Maybe, after deciding to build a house, they go to an architect and plan how large the house will be, what it will look like, and so forth. At the end of this process a clear vision exists of how the situation is to be solved. After the abstract final result, more space, the plan now has a real, detailed form, but it is still not physical, yet.

Causa Materialis: The material

After a clear vision exists due to the causa formalis, this vision needs to lead to the final solution somehow. For that, the last two causes are needed. The very next is the causa materialis or the material. For anything to happen, the material needs to be present, or in other words: the boundary conditions need to be met.

In the house building example, the material is the material to build the house like stones, concrete, wood, and so forth, but also the knowledge of how to build it. The physical material is needed, which is obvious, but also the knowledge. Without these, there is no chance that a house can be built.

As boundary conditions other stuff is needed as well like some ground to put the house onto, time to build it and so forth. This is also part of the causa materialis.

Causa Efficiens: The execution

After the need started the process, the vision was formed, and the material was organized, the last cause is the execution or causa efficiens. A trainer of mine once told me: a vision without execution is just hallucination. He is right. One can dream of the best stuff, have everything in place and so forth, but without execution nothing happens. That is what the last cause is about. In this meaning, it also links to the blog post I wrote some years before about The Trinity of Action.

In the house building example, this is the actual building of the house. We have everything in place now to perform the actual solution. We have the need to give us drive and energy, we have a vision and a plan, we have the material and knowledge, and the last step is to put the material together with the knowledge we have to get the actual house built.

The Dependencies of the Four Causes

The four causes are usually presented in a different order, but in my opinion this is not optimal. There is a strict order of appearance, which is the order I described above.

The first cause in my opinion is the causa finalis. It is a principle of nature that without energy nothing takes place. Without the need, requirement, or issue, there is no energy to change the current status quo. There is always a causa finalis needed as a seed for any change. I cannot think of a situation where it is different.

The second cause after the seeding by the causa finalis is always a kind of plan. A lot of actions in our world seem to be performed without a plan, but a closer look reveals that even then there is a plan, maybe not a well thought through one, or one just based on a pattern or experience, but there is one.

The causa materialis may also be the second cause, because the material might already be there for the solution, but it is not seen as such without a kind of plan. On the other hand, a plan also reveals what material might be missing and needs to be organized or waited for.

At the very end, the action, the causa efficiens, can take place, but without a plan or material nothing can be done.

The principle of the Four Causes helps me a lot during my daily private and professional life, because it helps me to understand what happens on one side, but it also provides me a guideline on how to work in certain situations, because it gives me an order for what to work on.

The 4 Causes in Software Engineering

There is not so much to write anymore. In Software Engineering, these principles are at work as well.

At first a customer has a need to be solved, which leads to some requirements; these are the causa finalis. It is formulated what the final outcome has to be. After that, architectural and design papers are written and planning is done, which is the causa formalis. With the organization of hardware, software, developers, office space, and everything else, the boundary conditions for the actual development are met: the causa materialis. The final development is the causa efficiens.

The trick is to reflect on the current status of a software project from time to time and think about the four principles to find out whether all causae are there. If something is missing, the project cannot be finished successfully:

  1. If the causa finalis is not met, no customer will pay due to a lack of need.
  2. If the causa formalis is not met, no customer will pay, because the product is not useful.
  3. If the causa materialis is not met, the product cannot be developed, shipped, or run. Again, nobody will pay.
  4. Finally, if the causa efficiens is missing, the actual product is not built and cannot be sold either.

That’s all the secret in here…

JavaEE: Arquillian Tests support Multi-WAR-EARs

Before version 1.0.2, Arquillian did not support EAR deployments which contained multiple WAR files. The issue was the selection of the WAR into which the Arquillian artifacts are to be placed for testing. The trick until then was to remove all WAR files not needed and to leave only one WAR within the EAR.

Starting from version 1.0.2 Arquillian does support Multi-WAR-EAR-Deployments as described in http://arquillian.org/blog/2012/07/25/arquillian-core-1-0-2-Final.

If you have an EAR deployment which contains multiple WAR files, you can select the WAR to be tested with a few lines; a sketch is shown below:
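A minimal sketch, assuming a ShrinkWrap-built EAR in the @Deployment method; Testable.archiveToTest marks the WAR that Arquillian enriches (the archive names are placeholders):

```java
// Sketch: an EAR with two WARs; Testable.archiveToTest selects the WAR
// into which the Arquillian test artifacts are placed.
import org.jboss.arquillian.container.test.api.Deployment;
import org.jboss.arquillian.container.test.api.Testable;
import org.jboss.shrinkwrap.api.ShrinkWrap;
import org.jboss.shrinkwrap.api.spec.EnterpriseArchive;
import org.jboss.shrinkwrap.api.spec.WebArchive;

public class MultiWarDeployment {

    @Deployment
    public static EnterpriseArchive createDeployment() {
        WebArchive testedWar = ShrinkWrap.create(WebArchive.class, "ui.war");
        WebArchive otherWar = ShrinkWrap.create(WebArchive.class, "admin.war");

        return ShrinkWrap.create(EnterpriseArchive.class, "app.ear")
                .addAsModule(Testable.archiveToTest(testedWar))
                .addAsModule(otherWar);
    }
}
```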

That’s it. After this selection Arquillian stops complaining about multiple WAR files within the EAR and the selected WAR is enriched and tested.

JavaEE WebSockets and Periodic Message Delivery

For a project I had the need to implement a monitoring functionality based on HTML5 and WebSockets. It is quite trivial with JavaEE 7, as I will explain below.

Let us assume the easy requirement of a simple monitoring which sends periodic status information to web clients. The web client shall show the information on a web page (inside a <div>…</div> for instance). For that scenario, the technical details are shown below…

The JavaScript code is quite easy and can be taken from JavaScript WebSocket books and tutorials. (A good introduction is for instance Java WebSocket Programming by Oracle Press). A simple client might look like:
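The original snippet is not reproduced here; a minimal sketch (URL and element id are placeholders) could look like:

```javascript
// Sketch: connect to the monitoring endpoint and write each message into a <div>.
var socket = new WebSocket("ws://localhost:8080/monitoring/status");

socket.onmessage = function (event) {
    // event.data contains the status text sent by the server endpoint.
    document.getElementById("status").innerHTML = event.data;
};
```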

The functions for onopen, onclose, and onerror are omitted, because we want to focus on JavaEE. The important stuff is shown above: we connect with new WebSocket to the URL which shall provide the periodic updates, and with onmessage we put the data somewhere into our web page. That's it from the client side.

For JavaEE, there is a lot of documentation which shows how to create @ServerEndpoint classes. For instance:
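A plain endpoint as shown in most tutorials could look like this (a sketch; the path is a placeholder):

```java
// Sketch: a plain WebSocket server endpoint as found in most JavaEE 7 tutorials.
import javax.websocket.OnMessage;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/monitoring/status")
public class StatusEndpoint {

    @OnOpen
    public void onOpen(Session session) {
        // The session could be stored here for later use.
    }

    @OnMessage
    public String onMessage(String message) {
        return "echo: " + message;
    }
}
```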

But, how to make it send periodic messages easily? After some testing on WildFly 8.2, I came to this simple solution:
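The original code is not shown here; a hedged sketch of the described combination of @ServerEndpoint, @Singleton, and @Schedule could look like this (it keeps a set of sessions instead of the single session field discussed below, to support more than one client):

```java
// Sketch: the endpoint is also a singleton EJB, so the EJB scheduler runs on the
// very same instance that keeps the sessions opened via @OnOpen.
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;
import javax.ejb.Schedule;
import javax.ejb.Singleton;
import javax.websocket.OnClose;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@Singleton
@ServerEndpoint("/monitoring/status")
public class PeriodicStatusEndpoint {

    private final Set<Session> sessions = new CopyOnWriteArraySet<>();

    @OnOpen
    public void onOpen(Session session) {
        sessions.add(session);
    }

    @OnClose
    public void onClose(Session session) {
        sessions.remove(session);
    }

    @Schedule(hour = "*", minute = "*", second = "*/10", persistent = false)
    public void sendStatus() {
        for (Session session : sessions) {
            if (session.isOpen()) {
                try {
                    session.getBasicRemote().sendText("status: everything is fine");
                } catch (IOException e) {
                    sessions.remove(session);
                }
            }
        }
    }
}
```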

The trick is to make the @ServerEndpoint class also an EJB @Singleton. The @Singleton assures that only one instance is alive at a time, and this instance can also keep the session provided during @OnOpen. In other words: the actual server endpoint instance is exactly the one the scheduler is running on. If it were not a @Singleton, multiple instances would or might exist, the session field would not be set in @Schedule, and this might lead to a NullPointerException if not checked for.