Posts Tagged ‘Java’

Articles

Age discrimination with Clojure

In General on 2010-08-26 by Jukka Zitting Tagged: ,

Michael Dürig, a colleague of mine and big fan of Scala, wrote a nice post about the relative complexity of Scala and Java.

Such comparisons are of course highly debatable, as seen in the comments that Michi’s post sparked, but for the fun of it I wanted to see what the equivalent code would look like in Clojure, my favourite post-Java language.

(use '[clojure.contrib.seq :only (separate)])

(defstruct person :name :age)

(def persons
  [(struct person "Boris" 40)
   (struct person "Betty" 32)
   (struct person "Bambi" 17)])

(let [[minors majors] (separate #(<= (% :age) 18) persons)]
  (println minors)
  (println majors))

The output is:

({:name Bambi, :age 17})
({:name Boris, :age 40} {:name Betty, :age 32})

I guess the consensus among post-Java languages is that features like JavaBean-style structures and functional collection algorithms should either be a built-in part of the language or at least trivially implementable in supporting libraries.

Advertisements

Articles

Forking a JVM

In ASF,Java on 2010-05-27 by Jukka Zitting Tagged: , , , ,

The thread model of Java is pretty good and works well for many use cases, but every now and then you need a separate process for better isolation of certain computations. For example in Apache Tika we’re looking for a way to avoid OutOfMemoryErrors or JVM crashes caused by faulty libraries or troublesome input data.

In C and many other programming languages the straightforward way to achieve this is to fork separate processes for such tasks. Unfortunately Java doesn’t support the concept of a fork (i.e. creating a copy of a running process). Instead, all you can do is to start up a completely new process. To create a mirror copy of your current process you’d need to start a new JVM instance with a recreated classpath and make sure that the new process reaches a state where you can get useful results from it. This is quite complicated and typically depends on predefined knowledge of what your classpath looks like. Certainly not something for a simple library to do when deployed somewhere inside a complex application server.

But there’s another way! The latest Tika trunk now contains an early version of a fork feature that allows you to start a new JVM for running computations with the classes and data that you have in your current JVM instance. This is achieved by copying a few supporting class files to a temporary directory and starting the “child JVM” with only those classes. Once started, the supporting code in the child JVM establishes a simple communication protocol with the parent JVM using the standard input and output streams. You can then send serialized data and processing agents to the child JVM, where they will be deserialized using a special class loader that uses the communication link to access classes and other resources from the parent JVM.

My code is still far from production-ready, but I believe I’ve already solved all the tricky parts and everything seems to work as expected. Perhaps this code should go into an Apache Commons component, since it seems like it would be useful also to other projects beyond Tika. Initial searching didn’t bring up other implementations of the same idea, but I wouldn’t be surprised if there are some out there. Pointers welcome.

Articles

“SIMPLE”.toLowerCase() is simple, right?

In General on 2010-04-12 by Jukka Zitting Tagged: , ,

It turns out that "SIMPLE".toLowerCase().equals("simple") is not true if your default locale is Turkish, but your code is written in English. Turkish has two “i” characters, one with a dot and one without, which throws the above code off balance. The fix is to write the expression either as "SIMPLE".toLowerCase(Locale.ENGLISH).equals("simple") or even better as "SIMPLE".equalsIgnoreCase("simple").

I just stumbled on this issue with Apache Tika (see TIKA-404), and it seems like I’m not the only one.