Interested in JCR and looking for a new laptop? My employer has announced JCR Cup 2008, a JCR design and development competition. Too bad I can’t enter the competition; I’m sure we’ll come up with a whole load of cool ideas next week.
Archive for the ‘Java’ Category

Presenting Apache Tika
Yesterday, during the Fast Feather Track at the ApacheCon US, I presented the incubating Apache Tika project. See below for the slides:
I was pleasantly surprised by the level of attendance and by the interest in Tika during the Search Roundtable BOF later in the evening. Even though the project is still just starting, it’s already generating lots of interest and I really look forward to getting the first releases out.

Concurrency and River
If you’re interested in concurrency, distributed systems, and ways to best use the manycore processors we’re being promised, then check out the Concurrency and River thread on the development mailing list of the incubating Apache River project. The thread is about concurrency and ways the River project (a continuation of Jini from Sun) and related technologies like JavaSpaces could be used to parallelize many computing tasks. There are also some nice comparisons to Erlang and Scala, and how the actor model used by them is related to the Jini network model.

The cause of an IOException
I just had to follow a stack trace through a complex codebase with multiple layers. The exception chaining mechanism introduced in Java 1.4 made the task easy up to the point where the last exception in the chain was an IOException thrown by code like this:
try {
    ....
} catch (Exception e) {
    throw new IOException("...");
}
What a dead end! The problem is that the IOException constructors that allow exception chaining were only added in Java 6. Here’s a workaround that would have saved me a lot of extra effort:
try {
    ....
} catch (Exception e) {
    IOException ioe = new IOException("...");
    ioe.initCause(e);
    throw ioe;
}
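To make the difference concrete, here is a minimal sketch that puts the initCause() workaround next to the chaining constructor available from Java 6 onwards. The class and method names are made up for this illustration.

```java
import java.io.IOException;

final class ChainedIOExceptionDemo {

    /** Pre-Java 6 workaround: attach the cause with initCause(). */
    static IOException wrapWithInitCause(String message, Exception cause) {
        IOException ioe = new IOException(message);
        ioe.initCause(cause);
        return ioe;
    }

    /** Java 6 and later: pass the cause straight to the constructor. */
    static IOException wrapWithConstructor(String message, Exception cause) {
        return new IOException(message, cause);
    }

    public static void main(String[] args) {
        Exception root = new IllegalStateException("root problem");
        IOException viaInitCause = wrapWithInitCause("read failed", root);
        IOException viaConstructor = wrapWithConstructor("read failed", root);
        // Either way the original exception stays reachable through getCause(),
        // so it shows up as a "Caused by:" entry in the stack trace.
        System.out.println(viaInitCause.getCause() == root);
        System.out.println(viaConstructor.getCause() == root);
    }
}
```

Both variants leave the stack trace with a usable "Caused by:" section, so the choice mostly depends on which Java versions the code has to support.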

JUnit tests for known issues, part 3
My quest for a way to handle known issues as JUnit tests seemed already finished, when Marcel Reutegger, a committer of the Apache Jackrabbit project and a developer of the Technology Compatibility Kit (TCK) of JSR 170, displayed some serious JUnit-fu by pointing to the TestResult class in JUnit.
It turns out that JUnit creates a TestResult instance for each test being run and uses that instance to store the test results. It is possible to customize the TestResult being used by overriding the TestCase.createResult() method. You can then decide in TestResult.addFailure(Test, AssertionFailedError) and TestResult.addError(Test, Throwable) whether to skip some failure or error reports. This is what we ended up doing in Jackrabbit.
Digging deeper along these lines I found out that you could actually implement similar functionality also directly in a TestCase subclass, thus avoiding the need to override TestCase.createResult(). The best way to do this is to override the TestCase.runBare() method that gets invoked by TestResult to run the actual test sequence. The customized method can check whether to skip the test and just return without doing anything in such cases.
I implemented this solution as a generic JUnit 3.x ExcludableTestCase class that you are free to copy and use under the Apache License, version 2.0. The class uses system properties named junit.excludes and junit.includes to determine whether a test should be excluded from the test run. Normally all tests are included, but a test can be excluded by including an identifier of the test in the junit.excludes system property. An exclusion can also be cancelled by including a test identifier in the junit.includes system property. Both system properties can contain multiple whitespace-separated identifiers. See the ExcludableTestCase javadocs for more details.
You can use this class by subclassing your test cases from ExcludableTestCase instead of directly from TestCase:
package my.test.pkg;

public class MyTestCase extends ExcludableTestCase {
    public void testSomething() {
        // your test code
    }
}
You can then exclude the test case with -Djunit.excludes=my.test.pkg, -Djunit.excludes=MyTestCase, or -Djunit.excludes=testSomething, or with a combination of these identifiers. If, for example, you’ve excluded all tests in my.test.pkg, you can selectively enable this test class with -Djunit.includes=MyTestCase.
You can also add custom identifiers to your test cases. For example, if your test case was written for Bugzilla issue #123, you can identify the test in the constructor like this:
public MyTestCase(String name) {
    super(name);
    addIdentifier("#123");
}
Then you can exclude tests for this issue with -Djunit.excludes=#123.
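The include/exclude rule described above can be sketched without any JUnit dependency. The following is not the actual ExcludableTestCase source; it is a simplified illustration in which the ExclusionRule class and its method names are assumptions, and it only shows how whitespace-separated identifiers from the two properties might be combined.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ExclusionRule {

    /** Splits a whitespace-separated property value into identifiers. */
    static Set<String> parse(String value) {
        Set<String> ids = new HashSet<>();
        if (value != null) {
            for (String id : value.trim().split("\\s+")) {
                if (!id.isEmpty()) {
                    ids.add(id);
                }
            }
        }
        return ids;
    }

    /**
     * A test is excluded when one of its identifiers is listed in the
     * excludes value and none of them is listed in the includes value.
     */
    static boolean isExcluded(Set<String> testIds, String excludes, String includes) {
        Set<String> excluded = parse(excludes);
        Set<String> included = parse(includes);
        boolean hit = false;
        for (String id : testIds) {
            if (included.contains(id)) {
                return false; // an explicit include cancels any exclusion
            }
            if (excluded.contains(id)) {
                hit = true;
            }
        }
        return hit;
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>(Arrays.asList("MyTestCase", "testSomething", "#123"));
        System.out.println(isExcluded(ids,
                System.getProperty("junit.excludes"),
                System.getProperty("junit.includes")));
    }
}
```

Giving includes priority over excludes is what makes it possible to exclude a broad group of tests and then re-enable a single class or issue.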

JUnit tests for known issues, part 2
A few days ago I considered different options for including known issue test cases (ones that you expect to fail) in a JUnit test suite in a way that wouldn’t make the full test suite fail. I decided to adopt a solution that uses system properties to selectively enable such known issue test cases. Here’s how I implemented it for Apache Jackrabbit using Maven 1 (we’re currently working on migrating to Maven 2, so I’ll probably post Maven 2 instructions later on).
The first thing to do is to make the known issue tests check for a system property used to enable a test. The example class below illustrates two ways of doing this: either make the full known issue test code conditional, or add an early conditional return to skip the known issue. You can either use a single property like “test.known.issues” or different properties to allow fine-grained control over which tests are run and which are skipped. I like to use the known issue identifier from the issue tracker as the controlling system property, so I can selectively enable the known issue tests for a single reported issue.
import junit.framework.TestCase;

public class ExampleTest extends TestCase {
    public void testFoo() {
        if (Boolean.getBoolean("ISSUE-foo")) {
            // test code for "foo"
        }
    }
    public void testBar() {
        if (!Boolean.getBoolean("ISSUE-bar")) {
            return;
        }
        // test code for "bar"
    }
}
Once this instrumentation is in place, the build system needs to be configured to pass the identified system properties to the code when requested. In Maven 1 this happens through the maven.junit.sysproperties setting in project.properties:
maven.junit.sysproperties=ISSUE-foo ISSUE-bar
ISSUE-foo=false
ISSUE-bar=false
This way the known issue tests will be skipped when normally running “maven test”, but can be selectively enabled either on the command line (“maven -DISSUE-foo=true test”) or by modifying project.properties or build.properties.
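The switch relies on how Boolean.getBoolean treats system properties: it returns true only when the named property exists and equals "true" (ignoring case). A small sketch of that behaviour follows; the KnownIssueSwitch class and the ISSUE-demo property name are made up for the illustration.

```java
final class KnownIssueSwitch {

    /** Returns true only if the system property is set to "true" (any case). */
    static boolean isEnabled(String issueId) {
        return Boolean.getBoolean(issueId);
    }

    public static void main(String[] args) {
        // Unset property: the known issue test would be skipped.
        System.clearProperty("ISSUE-demo");
        System.out.println(isEnabled("ISSUE-demo"));

        // Explicitly enabled, as with -DISSUE-demo=true on the command line.
        System.setProperty("ISSUE-demo", "true");
        System.out.println(isEnabled("ISSUE-demo"));

        // Any other value, such as "yes", still counts as disabled.
        System.setProperty("ISSUE-demo", "yes");
        System.out.println(isEnabled("ISSUE-demo"));
    }
}
```

This is why a default value of false in the build properties, or no value at all, keeps the known issue tests out of a normal test run.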

Missing API details in Java: Null references
Even though Java APIs tend to be well documented thanks to the Javadoc, there are some details that are quite often missing, causing developers to program by coincidence. One of the main issues is the handling of null references.
Although there are guaranteed to be no dangling references on the Java platform, a reference can still be null and cause the infamous NullPointerException (aka NPE) when passed to an unwary piece of code. Null references are very convenient for expressing the absence of something, but these special cases are often not well documented. There are three main cases of null references that are commonly used but seldom documented: optional arguments, member variables, and return values.
Optional arguments
Instead of overloading a method name to cover the case where one or more of the arguments are unavailable, the method can allow some of its arguments to be null. This is especially common for constructors that allow optional configuration options. This practice is otherwise very convenient, but disturbingly often not documented, leaving the client developer to wonder whether it is OK to pass a null reference as a seemingly optional method argument. Often the solution is to just pass the null reference and rely on coincidence to keep it working.
A good example is the DocumentBuilder.parse(InputStream stream, String systemId) method in JAXP. It is explicitly documented that an IllegalArgumentException is thrown when the stream argument is null, but the systemId argument is just documented as “Provide a base for resolving relative URIs”. There is also an overloaded DocumentBuilder.parse(InputStream stream) method, and it so happens that calling the former method with a null systemId is equivalent to calling the latter method.
Now a JAXP client developer who has an InputStream and a system identifier string that might be null could either do the right thing and program defensively:
if (systemId == null) {
    builder.parse(stream);
} else {
    builder.parse(stream, systemId);
}
or rely on the coincidence that a null system identifier is actually allowed:
builder.parse(stream, systemId);
The latter is in my experience what most developers would do, and thus the JAXP implementation is in practice required to keep allowing null system identifiers. The systemId argument should therefore be documented as “Optional base for resolving relative URIs” or even more explicitly as “Base for resolving relative URIs, or null”.
Member variables
A good practice in Java is to keep all member variables private or at least protected. Unfortunately this allows the developer to be lazy in documenting the permitted states of the variable. After all, a private member is not a part of the public interface of a class, so why bother documenting it? A member variable can be null either by having explicitly been set so or by having been passed as null to a constructor or a setter. Often you need to explicitly search through the sources of a class to determine the possible states of a member variable. This is especially important when using the JavaBean conventions where a private member variable is often exposed through a getter method with a template javadoc that contains no mention of the valid states of the underlying variable.
The JavaBean case is actually especially troublesome as the common pattern for JavaBean properties is:
private Object something;

/** Returns something */
public Object getSomething() {
    return something;
}

/** Sets something */
public void setSomething(Object something) {
    this.something = something;
}
It is most often not documented whether null references are allowed in the setter or if the client is required to explicitly set the property before doing anything with a bean instance. This is in my experience the main cause of NullPointerExceptions in component-based systems.
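As a contrast to the template javadoc above, here is one possible way to spell out the null contract of each property. The DocumentedBean class and its two properties are hypothetical; the point is only that the Javadoc and the setter agree on whether null is allowed.

```java
import java.util.Objects;

/** Hypothetical bean whose null handling is stated in the Javadoc. */
final class DocumentedBean {

    private String title = "";  // never null
    private String comment;     // null until a comment is set

    /** Returns the title, never null. */
    String getTitle() {
        return title;
    }

    /**
     * Sets the title.
     *
     * @param title the new title, must not be null
     * @throws NullPointerException if title is null
     */
    void setTitle(String title) {
        this.title = Objects.requireNonNull(title, "title");
    }

    /** Returns the comment, or null if no comment has been set. */
    String getComment() {
        return comment;
    }

    /**
     * Sets the comment.
     *
     * @param comment the new comment, or null to clear it
     */
    void setComment(String comment) {
        this.comment = comment;
    }
}
```

Rejecting null in the setter moves the failure to the place where the bad value enters the bean, instead of some later call that happens to dereference it.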
Return values
Null references are commonly used to represent the absence of some value. For example the Map.get(Object key) method returns null when an entry for the given key is not found in the map. Such cases are usually well documented (the Map.get method returns “the value to which this map maps the specified key, or null if the map contains no mapping for this key”), but in some cases it is just implicitly assumed that a client developer will expect a null return value.
The most common causes of undocumented null return values are the JavaBean getters described above, but sometimes a genuine processing method forgets to mention that the return value might be null. A good example is the ZipInputStream.getNextEntry() method, which returns “the ZipEntry just read” but fails to mention that the “ZipEntry just read” is null if no more Zip entries are available. A clever developer will of course assume that this is the case, since the method doesn’t throw a NoSuchElementException like the Iterator.next() method does, but the only way to know for sure is to read the ZipInputStream sources, and even then you are left with the bad feeling that the implementation might well be changed in a future release.
The return value of the getNextEntry() method should therefore be documented as “the ZipEntry just read, or null if no more entries are available”.
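The usual client loop depends directly on that null return value. The sketch below lists the entry names of a Zip stream; the ZipEntryNames class and its sampleZip() helper, which builds a small archive in memory, are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipEntryNames {

    /** Collects entry names until getNextEntry() returns null. */
    static List<String> list(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a two-entry archive in memory for the demonstration. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            out.putNextEntry(new ZipEntry("a.txt"));
            out.write("hello".getBytes("UTF-8"));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.txt"));
            out.closeEntry();
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(list(new ByteArrayInputStream(sampleZip())));
    }
}
```

If getNextEntry() ever signalled the end of the archive in some other way, every loop of this shape would break, which is exactly why the contract deserves to be written down.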

UMLet 7
Ever since learning UML back in 1998 I’ve been looking for decent UML tools that best suit my rather ad-hoc diagramming style. Even though I’ve occasionally used them, I’ve never really enjoyed the heavyweight, round-tripping, IDE-integrated (even IDE-embedding!) modelling monoliths that most of the UML tools seem to evolve into sooner or later. My reasons for using UML are documenting existing code and discussing new ideas, almost never to actually implement anything. I usually also work in highly heterogeneous settings with co-developers using a wide variety of tools and development environments. Adapting to a do-all-be-all UML tool is in many cases simply impossible or at least quite difficult.
Thus I’ve actively stayed away from the high-end offerings and focused more on the low-end alternatives like Dia and the most popular UML tool in the world, MS PowerPoint. However they never felt really natural, being either too inflexible or requiring too much manual work especially when rearranging diagrams. Luckily a few years ago, while doing my yearly lookout for better development tools, I stumbled upon UMLet, a lightweight open source UML diagram editor that has a rather original but very flexible and convenient user interface. It even works as a drop-in plugin for Eclipse.
A few weeks ago after upgrading to Eclipse 3.2, I went looking for an UMLet upgrade and was happy to find version 7 available for download. The new version has nice new features like color and transparency support, new diagram types, and various user interface improvements like improved mouse selection support. Warmly recommended.
The attached class diagrams were quickly created using UMLet 7 to describe the structure of a mid-sized patch I sent for consideration as part of the Jackrabbit issue JCR-415.

JUnit tests for known issues
A few months ago I started working on the Jackrabbit improvement issue JCR-325. Following a good practice, the first thing I did was create a test case for the missing functionality. However, this breaks another good practice of always passing 100% of the test suite. This and a few other known issue tests are currently causing the Jackrabbit test suite to fail even without any real problems, making it more difficult to check whether a recent change or an experimental patch breaks things.
To fix the situation I started wondering if there was a JUnit version of the TODO blocks in Perl’s Test::More. The problem is that JUnit can only report tests as successful or failing (or erroneous if they throw an exception); there is no way to easily mark test failures as TODOs. Googling around and asking the Jackrabbit mailing list produced some workarounds:
- Use a system property to determine whether to perform or skip the known issue test cases.
- Put the known issues tests in separate test case classes and exclude them from the test suite.
- Use a JUnit addon to ignore marked test cases as explained in an article that discusses this same issue.
- Use an alternative test framework like TestNG, that has this functionality built-in.
I didn’t want to start changing the entire test suite or even tweaking the build environment, so the last two options were out. I also wanted to make the setup easily configurable so a developer can selectively enable testing for a known issue, thus the first alternative of using a system property looks like the best solution. It seems that the Apache OJB project has reached the same solution.

Analyzing the Jackrabbit architecture with Lattix LDM
Tim Bray pointed to David Berlind who pointed to the Lattix company. Lattix makes a tool called Lattix LDM that uses a Dependency Structure Matrix to work with software architecture. I watched the nice Lattix demo and decided to try the software out.
After receiving my Community license and struggling for a while to get the software running on Linux (need to include both the jars and the Lattix directory in the Java classpath!) I loaded the latest Jackrabbit jar file for analysis. The dependency matrix of the top-level packages after an initial partitioning is shown below:
The matrix contains all the package dependencies. A number in a cell of the matrix tells how many dependencies the package on the vertical column has on the package on the horizontal row. You can tell how widely a package is used by reading the cells on the package row. The package column identifies the other packages that the selected package uses or depends on. In general a healthy architecture only contains dependencies located below the diagonal.
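To make the reading rules concrete, here is a toy dependency matrix that follows the same convention: the cell at a given row and column counts the dependencies of the column package on the row package. This is only an illustration of the idea, not how Lattix LDM works internally; the DependencyMatrix class, the package names, and the dependencies between them are invented.

```java
import java.util.Arrays;
import java.util.List;

final class DependencyMatrix {

    private final List<String> packages;
    // counts[row][column]: dependencies of the column package on the row package
    private final int[][] counts;

    DependencyMatrix(List<String> packages) {
        this.packages = packages;
        this.counts = new int[packages.size()][packages.size()];
    }

    private int indexOf(String pkg) {
        int index = packages.indexOf(pkg);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown package: " + pkg);
        }
        return index;
    }

    /** Records one dependency of the user package on the provider package. */
    void addDependency(String user, String provider) {
        counts[indexOf(provider)][indexOf(user)]++;
    }

    /** Row sum: how many dependencies other packages have on this one. */
    int usedBy(String pkg) {
        int total = 0;
        for (int count : counts[indexOf(pkg)]) {
            total += count;
        }
        return total;
    }

    /** Dependencies above the diagonal, which point against the layering. */
    int aboveDiagonal() {
        int total = 0;
        for (int row = 0; row < counts.length; row++) {
            for (int column = row + 1; column < counts.length; column++) {
                total += counts[row][column];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Packages are listed from the highest layer to the lowest.
        DependencyMatrix matrix =
                new DependencyMatrix(Arrays.asList("ui", "service", "util"));
        matrix.addDependency("ui", "service");
        matrix.addDependency("service", "util");
        matrix.addDependency("util", "service"); // back reference, forms a cycle
        System.out.println(matrix.usedBy("service"));
        System.out.println(matrix.aboveDiagonal());
    }
}
```

With the packages ordered from the highest layer down, every dependency on a lower layer lands below the diagonal, so any non-zero cell above it marks a dependency worth a closer look.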
The packages 2-6 form the general purpose Jackrabbit commons module, while the more essential parts of the Jackrabbit architecture are found within the core module. I grouped the commons packages and expanded the core module to get a more complete view of the Jackrabbit internals:
There was no immediate structure appearing, so I used the automatic DSM partitioning tool on the core module to sort out the package dependencies:
The value, config, fs, and util packages form a lower utility layer and the jndi package a higher deployment tool layer. The most interesting part lies between those layers, in the large interdependent section in the middle. The key to the architecture seems to be the main core package that both uses and is used by other packages in this section. I opened a separate view for examining the contents of the main core package:
The partitioning suggests that it might make sense to split the package in two parts. Without concern for semantic grouping, I just grouped the classes in the upper half as core.A and the classes in the lower half as core.B. This seems useful as the core.B package seems to be a bit better in terms of external dependencies:
Running the package partitioning again, I got a bit more balanced results although the main section still is heavily interdependent:
Looking at the vertical columns it seems like the main culprits for the interdependencies are the nodetype, state, version, and the virtual core.A packages. Both the nodetype and state package contain subpackages so I wanted to see if the dependencies could be localized to just a part of those packages:
This is interesting: the interdependencies for the state package are for the main state package, while the nodetype interdependencies only affect the nodetype.virtual subpackage. I split both packages along those dependency relations, and partitioned the core module again:
The persistence managers in the state subpackages are now outside the main section just like the non-virtual nodetype classes. After a short while of further research on the dependencies I found that the partitioning of the main state package would suggest that the item state managers be split to a separate package:
After creating a new statemanager package for containing the item state managers, the partitioning of the core module starts to look better. The only remaining circular dependencies are for the virtual core.A and core.B packages:
Looking at the virtual core.B package we find that only the NodeId, PropertyId, and ItemId classes depend on the state package:
In fact it seems that it might make sense to move the classes there. After doing that the core module partitioning looks even better:
The only remaining source of cyclic dependencies is the virtual core.A package, into which I won’t be going any deeper at this moment. Even now the analysis seems to have provided a number of suggestions for reducing the amount of cyclic dependencies and thus improving the design quality of the Jackrabbit core:
- Split the main core package into subpackages
- Move the nodetype.virtual package to a higher level
- Move the state subpackages to a separate package
- Make a separate package for the item state managers
- Move the NodeId, PropertyId, and ItemId classes to the state package
Note that these suggestions are just initial ideas based on a quick walkthrough of the Jackrabbit architecture using a Dependency Structure Matrix as a tool. As such the approach only gives structural insight into the architecture, and for this short analysis I made little use of knowledge about the semantic roles and relationships of the Jackrabbit classes and packages.







