Posts Tagged ‘Jackrabbit’


Buzzword conference in June

In General on 2010-05-14 by Jukka Zitting Tagged: , , , , ,

Like the Lucene conference I mentioned earlier, Berlin Buzzwords 2010 is a new conference that fills in the space left by the decision not to organize an ApacheCon in Europe this year. Going beyond the Apache scope, Berlin Buzzwords is a conference for all things related to scalability, storage and search. Some of the key projects in this space are Hadoop, CouchDB and Lucene.

I’ll be there to make a case for hierarchical databases (including JCR and Jackrabbit) and to present Apache Tika project. The abstracts of my talks are:

The return of the hierarchical model

After its introduction the relational model quickly replaced the network and hierarchical models used by many early databases, but the hierarchical model has lived on in file systems, directory services, XML and many other domains. There are many cases where the features of the hierarchical model fit the needs of modern use cases and distributed deployments better than the relational model, so it’s a good time to reconsider the idea of a general-purpose hierarchical database.

The first part of this presentation explores the features that differentiate hierarchical databases from relational databases and NoSQL alternatives like document databases and distributed key-value stores. Existing hierarchical database products like XML databases, LDAP servers and advanced filesystems are reviewed and compared.

The second part of the presentation introduces the Content Repositories for the Java Technology (JCR) standard as a modern take on standardizing generic hierarchical databases. We also look at Apache Jackrabbit, the open source JCR reference implementation, and how it implements the hierarchical model.


Text and metadata extraction with Apache Tika

Apache Tika is a toolkit for extracting text and metadata from digital documents. It’s the perfect companion to search engines and any other applications where it’s useful to know more than just the name and size of a file. Powered by parser libraries like Apache POI and PDFBox, Tika offers a simple and unified way to access content in dozens of document formats.

This presentation introduces Apache Tika and shows how it’s being used in projects like Apache Solr and Apache Jackrabbit. You will learn how to integrate Tika with your application and how to configure and extend Tika to best suit your needs. The presentation also summarizes the key characteristics of the more widely used file formats and metadata standards, and shows how Tika can help deal with that complexity.

I hear there are still some early bird tickets available. See you in Berlin!



Apache JCR Commons

In ASF,Jackrabbit on 2009-01-23 by Jukka Zitting Tagged: , ,

In the Apache Jackrabbit project we’ve decided to create a new JCR Commons subproject for developing and managing the set of generic JCR tools that has grown over time around the core Jackrabbit content repository implementation.

The JCR Commons subproject will to some extent resemble the Apache Commons project, and I’m hoping to use some of the ideas put forward by Henri in his blog post about a “federated commons”.

I’m hoping to flesh out the details of this new subproject over the next month or two. It would be nice to have releases of all the new JCR Commons components ready to be used as dependencies for the upcoming Jackrabbit 1.6 release.


Apache Jackrabbit 1.5.0 released

In Jackrabbit on 2008-12-08 by Jukka Zitting Tagged: , ,

Apache Jackrabbit 1.5.0, the latest and greatest release of the best content repository I know, is now available! Get it from the Jackrabbit web site or through the central Maven repository while it’s hot!

The most notable changes since version 1.4 are:

  • The standalone Jackrabbit server component. The runnable
    jackrabbit-standalone jar makes it very easy to start and run
    Jackrabbit as a standalone server with WebDAV and RMI access.
  • Search performance improvements. The performance of certain kinds
    of hierarchical XPath queries has improved notably.
  • Simple Google-style query language. The new GQL query syntax
    makes it very easy to express simple full text queries.
  • Transaction-safe versioning. Mixing transactions and versioning
    operations has traditionally been troublesome in Jackrabbit.
    This release contains a number of improvements in this area and
    has specifically been reviewed against potential deadlock issues.
  • Clustered workspace creation. A new workspace created in one
    cluster node will now automatically appear also in the other
    nodes of the cluster.
  • SPI improvements. The SPI layer introduced in Jackrabbit 1.4
    has seen a lot of improvements and bug fixes, and is shaping
    up as a solid framework for implementing JCR connectors.
  • Development preview: JSR 283 features. We have implemented
    a number of new features defined in the public review draft of
    JCR 2.0, created in JSR 283. These new features are accessible
    through special “jsr283” interfaces in the Jackrabbit API. Note
    however that none of these features are ready for production use,
    and will be replaced with final JCR 2.0 versions in Jackrabbit 2.0.

See the release notes for all the details.