I am working on a medium-size Java application with a test harness consisting of 180 tests grouped in 6 packages. Most of the automated tests read one or more input files and create multiple output files, which are compared with “control” files. This approach provides an easy way to add more tests without coding. Over a period of three years this led to the creation of around 350 input files and 600 control files, taking up 167 MB. But as tests change, not all of the test data is still used, which raises the question: how can I find the test data files that are still used in my regression testing suite? The search for an answer led me to explore Java 7, Spring AOP and AspectJ.
I tried a few ways to answer the question. In the end, the solution was to identify which files are read during the execution of the regression tests. I do the day-to-day development in a Windows environment, and on that OS a file has a “last accessed” attribute. This attribute is maintained by the OS only if the registry value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\NtfsDisableLastAccessUpdate is set to 0, that is, if last-access updates are not disabled. The current Java standard library provides access to some of the file attributes in the class java.io.File, but not to the Last-Accessed attribute. This was identified as an issue a long time ago and is scheduled to be addressed in Java 7. So I downloaded the beta version of JDK 7.0, installed it on my computer, and wrote a small program that traverses the files in the test/data/in and test/data/ctrl directories and prints the Last-Accessed attribute using the classes java.nio.file.Attributes and java.nio.file.BasicFileAttributes. With this small tool, a short procedure tells me which files are not accessed during the regression testing:
Save the current date/time in the variable junitStart
Run the regression testing
Traverse the test/data/in and test/data/ctrl directories with the FileLastAccessedVisitor
For each file whose Last-Accessed attribute is earlier than junitStart
Print the file name to the standard output
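The procedure above can be sketched as follows. This is not my original tool, which was written against the JDK 7 beta; it is a minimal sketch that assumes the API as it later shipped in Java 7 (java.nio.file.Files.walkFileTree and java.nio.file.attribute.BasicFileAttributes). The helper names isStale and getStaleFiles, and passing junitStart as a command-line argument in milliseconds, are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

// Collects every regular file whose last-access time is earlier than junitStart.
class FileLastAccessedVisitor extends SimpleFileVisitor<Path> {
    private final FileTime junitStart;
    private final List<Path> staleFiles = new ArrayList<>();

    FileLastAccessedVisitor(FileTime junitStart) {
        this.junitStart = junitStart;
    }

    // A file is "stale" if it was last accessed before the test run started.
    static boolean isStale(FileTime lastAccess, FileTime junitStart) {
        return lastAccess.compareTo(junitStart) < 0;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        if (isStale(attrs.lastAccessTime(), junitStart)) {
            staleFiles.add(file);
        }
        return FileVisitResult.CONTINUE;
    }

    List<Path> getStaleFiles() {
        return staleFiles;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: FileLastAccessedVisitor <junitStart in epoch millis>");
            return;
        }
        // junitStart must have been recorded before the regression run started.
        FileTime junitStart = FileTime.fromMillis(Long.parseLong(args[0]));
        FileLastAccessedVisitor visitor = new FileLastAccessedVisitor(junitStart);
        for (String dir : new String[] {"test/data/in", "test/data/ctrl"}) {
            Files.walkFileTree(Paths.get(dir), visitor);
        }
        for (Path p : visitor.getStaleFiles()) {
            System.out.println(p);
        }
    }
}
```

The result is only meaningful if the file system actually records access times, so it is worth checking the registry setting mentioned above before trusting the list.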
I got a pretty long list of files that are not used anymore: 20% of the input files and 40% of the control files. Most of these files belong either to tests of older versions of code that was later refactored without removing the test data, or to tests for specific bugs that were later folded into other test cases. I started to remove some of the files, but I had to be careful: some of the test data is used in a performance or a UI harness. One lesson learned is that it is better to separate the test data by testing package.
Checking which files are read was not my first attempt to analyze the coverage of the test data. One approach that I considered was to instrument the Java code to provide the answer. I just needed to find all the java.io.File objects created and print the name of the file after each object is initialized. One option was to take an open source implementation of java.io.File, modify the constructor and put it on the classpath in front of the Java standard library. I did not pursue that direction: it seemed too intrusive, and I did not see how I could turn the idea into a tool that I can run regularly, for example in a nightly build. Another way would be to implement a decorator around the java.io.File class which calls the real File constructor and then prints the name of the file. Of the 320 classes (including 83 test cases) that make up the application, 183 use File objects, so I used the Unix tool sed to replace all instances of “import java.io.File;” with “import com.intspc.io.File;”. That did not work well: the resulting code had many compilation errors wherever File objects are not created in the application code but returned by other methods, for example File.listFiles(). I had to give up on this direction. But I found some interesting things along the way. In many cases the code uses listFiles with an anonymous class to define the FileFilter, when most of that code could use WildcardFileFilter instead. Also, the fact that there are calls to File.listFiles in so many places indicates that there is not enough data encapsulation.
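To illustrate why the replacement File class ran into trouble with methods such as listFiles(), here is a minimal sketch. It is not the com.intspc.io.File class I actually wrote; LoggingFile, LoggingFileDemo and countLogged are hypothetical names, and the sketch uses a subclass rather than a full decorator to keep it short. The point it shows is that java.io.File creates the objects returned by listFiles() itself, so they never pass through the logging constructor and are plain java.io.File instances.

```java
import java.io.File;

// Hypothetical logging variant of java.io.File: prints the path on creation.
class LoggingFile extends File {
    LoggingFile(String pathname) {
        super(pathname);
        System.out.println("File created: " + getPath());
    }
}

class LoggingFileDemo {
    // Counts how many of the given objects went through the logging constructor.
    static int countLogged(File[] files) {
        int logged = 0;
        for (File f : files) {
            if (f instanceof LoggingFile) {
                logged++;
            }
        }
        return logged;
    }

    public static void main(String[] args) {
        // Creating the directory object is logged...
        File dir = new LoggingFile(System.getProperty("java.io.tmpdir"));
        // ...but the children are created inside java.io.File and are never logged.
        File[] children = dir.listFiles();
        if (children != null) {
            System.out.println(children.length + " children, "
                    + countLogged(children) + " of them LoggingFile instances");
        }
    }
}
```

With a same-named class in another package, an assignment such as File[] children = dir.listFiles(); no longer type-checks, because the array returned holds java.io.File objects, which is consistent with the compilation errors described above.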
I did not completely give up on the idea of finding the calls to the File constructor. Maybe the answer was in Aspect Oriented Programming (AOP), which basically intercepts the execution of a program at specific points (point-cuts) and inserts pieces of code (advice) to be executed there. In my case the point-cut would be the java.io.File constructor and the advice would print the file name after the constructor is executed. I started with Spring AOP, a simpler version of AOP. Reading the Spring Reference Manual, I realized that this direction would not work, because in Spring AOP point-cuts can only be public methods (not constructors) of container-managed objects, and java.io.File is not container-managed. But the same documentation indicated that AspectJ is more powerful and allows constructor point-cuts. So I installed AspectJ in my working area (thanks to Maven this was an easy task), brushed up my AspectJ skills and tried a few examples of how to intercept a constructor. I created a class (com.intspc.A) and got the code working. But when I tried to intercept the java.io.File constructor, my AspectJ advice was not executed. I realized that AspectJ would not work either, because the tool modifies the Java bytecode to insert the advice into the code. It worked for com.intspc.A because the tool had access to that class file, but not for java.io.File, which is part of the Java standard library. I could probably have continued by instrumenting the Java standard library, but I did not go further: modifying the Java standard library was beyond the options that I was willing to consider.
This work started with a simple question, and searching for the solution required dealing with Java 7, Spring AOP and AspectJ. I got my answer by looking at the file attributes, but there were a few other lessons learned. One is that in order to try new technologies, one needs tools that make it easy to create small new applications. Maven was very effective for trying Spring AOP and AspectJ. I had used Ivy before, which is good, but Maven seemed even better. Installing Java SE 7.0 worked without problems, and the ability to work with multiple JREs in Eclipse was very convenient. Management of the test data is a drag: the team needs tools and constant monitoring to keep it under control. Finally, I got a taste of how AspectJ could be used in an application, not to implement business functionality, but to instrument the code for non-functional requirements, in this case application monitoring.