Two weeks ago, during the BarCamp at ApacheCon US, I chaired a short session titled “The RESTful Content Repository”. The idea of the session was to discuss the various ways in which existing content repositories support RESTful access over HTTP, and perhaps to find some common ground from which a generic content repository protocol could be formulated.
The REST architectural style was generally accepted as a useful set of constraints for the architecture of distributed content-based applications, but as an architectural style it doesn’t define what the bits on the wire should look like. That is what we set out to define, using HTTP as the baseline. We didn’t get too far, but see below for some collected thoughts and a useful set of “test cases” that I hope to use to further investigate this idea.
Many existing content repositories and related products already support one or more HTTP-based access patterns:
- Apache Jackrabbit exposes two slightly different WebDAV-based access points.
- Apache Sling adds the SlingPostServlet and default JSON and XML renderings of content.
- Apache CouchDB uses JSON over HTTP as the primary access protocol.
- Apache Solr uses XML over HTTP.
- Midgard doesn’t have a built-in HTTP binding for content, but makes it very easy to implement such bindings.
This list just scratches the surface…
There are even existing generic protocols that match at least parts of what we wanted to achieve. WebDAV has been around for ten years already, but the way it extends HTTP with extra methods makes it harder to use with existing HTTP clients and libraries. The AtomPub protocol solves that issue, but being based on the Atom format and leaving much of the server behaviour undefined, AtomPub may not be the best solution for generic content repositories.
Content repository operations over HTTP
To better understand the needs and capabilities of existing solutions, we should come up with a simple set of content operations and find out if and how different systems support those operations over HTTP. The most basic such set of operations is CRUD, i.e. how to create, read, update, and delete a document, so let’s start with that. I’m giving each operation a key (CRn, as in “Content Repository operation N”) and a brief description of what’s expected. In later posts I hope to explore how these operations can be implemented with curl or some other simple HTTP client accessing various kinds of content repositories. I’m also planning to extend the set of required operations to cover features like search, linking, versioning, transactions, etc.
CR1: Create a document
Documents with simple properties like strings and dates are basic building blocks of all content applications. How can I create a new document with the following properties?
- title = “Hello, World!” (string)
- date = 2009-11-17 (date)
At the end of this operation I should have a URL that I can use to access the created document.
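To make CR1 a bit more concrete, here is one way the request could look. This is only a sketch under my own assumptions, not a proposed protocol: the collection URL is hypothetical, the properties are sent as JSON, and the date travels as an ISO 8601 string because JSON has no date type. A server following common HTTP practice might answer with 201 Created and a Location header carrying the new document's URL, but nothing in this post requires that.

```python
import json
import urllib.request
from datetime import date


def build_create_request(collection_url, properties):
    """Build (but do not send) a POST request that would create a document."""
    # JSON has no date type, so this sketch encodes dates as ISO 8601 strings.
    payload = {
        name: value.isoformat() if isinstance(value, date) else value
        for name, value in properties.items()
    }
    return urllib.request.Request(
        collection_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    "http://localhost:8080/documents",  # hypothetical collection URL
    {"title": "Hello, World!", "date": date(2009, 11, 17)},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The request is only constructed here, so the snippet runs without any server. Passing it to urllib.request.urlopen would actually send it.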
CR2: Read a document
Given the URL of a document (see CR1), how do I read the properties of that document?
The retrieved property values should match the values given when the document was created.
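The "values should match" requirement is where property types start to matter. As a small illustration, assume the repository returns the document as JSON with dates as ISO 8601 strings (my assumption, as is the sample response body below). The client then needs to know which properties are dates in order to get back exactly what was stored:

```python
import json
from datetime import date


def decode_document(body, date_properties=("date",)):
    """Parse a JSON document body, restoring the listed properties as dates."""
    properties = json.loads(body)
    for name in date_properties:
        if name in properties:
            properties[name] = date.fromisoformat(properties[name])
    return properties


# Hypothetical response body for a GET on the document URL from CR1.
body = b'{"title": "Hello, World!", "date": "2009-11-17"}'
document = decode_document(body)
print(document["title"], document["date"].year)
```

A repository that carries type information on the wire would not need the date_properties hint. That is one of the differences I expect this test case to surface.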
CR3: Update a document
Given the URL of a document (see CR1), how do I update the properties of that document? For example, I want to update the existing date property and add a new string property:
- date = 2009-11-18 (date)
- history = “Document date updated” (string)
When the document is read (see CR2) after this update, the retrieved information should contain the original title and the above updated date and history values.
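Note that CR3 describes a partial update: the title is not sent, yet it must survive. The difference from a full replacement can be sketched in a few lines. The function name is made up for illustration, and how a given repository maps this to HTTP (a PATCH, a form POST, or a read-modify-write with PUT) is exactly what remains to be explored.

```python
def apply_partial_update(existing, changes):
    """Return a copy of existing with changes laid over it; other keys are kept."""
    merged = dict(existing)
    merged.update(changes)
    return merged


original = {"title": "Hello, World!", "date": "2009-11-17"}
changes = {"date": "2009-11-18", "history": "Document date updated"}

merged = apply_partial_update(original, changes)
replaced = dict(changes)  # what a full-replacement update would leave behind

print(merged)
print("title" in replaced)
```

With replacement semantics the client would have to resend the title along with the changed properties to pass this test case.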
CR4: Delete a document
Given the URL of a document (see CR1), how do I delete that document?
Once deleted, it should no longer be possible to read (see CR2) or update (see CR3) the document.
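To check that the four test cases hang together, here is a toy in-process server and client that walk through CR1 to CR4 using only the Python standard library. Everything about the wire format is an assumption made for this sketch: the /documents collection path, JSON bodies, PATCH for partial updates, and the status codes. It does not describe how any of the repositories mentioned above behave.

```python
import itertools
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

DOCUMENTS = {}  # path -> properties; stands in for a real repository
NEXT_ID = itertools.count(1)


class ToyRepository(BaseHTTPRequestHandler):
    def _reply(self, status, body=None, location=None):
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        if body is not None:
            data = json.dumps(body).encode("utf-8")
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        if body is not None:
            self.wfile.write(data)

    def _read_json(self):
        length = int(self.headers.get("Content-Length", 0))
        return json.loads(self.rfile.read(length))

    def do_POST(self):  # CR1: create, answer with the new document's URL
        if self.path != "/documents":
            return self._reply(404, {"error": "not found"})
        path = "/documents/%d" % next(NEXT_ID)
        DOCUMENTS[path] = self._read_json()
        self._reply(201, DOCUMENTS[path], location=path)

    def do_GET(self):  # CR2: read
        if self.path not in DOCUMENTS:
            return self._reply(404, {"error": "not found"})
        self._reply(200, DOCUMENTS[self.path])

    def do_PATCH(self):  # CR3: partial update, untouched properties survive
        if self.path not in DOCUMENTS:
            return self._reply(404, {"error": "not found"})
        DOCUMENTS[self.path].update(self._read_json())
        self._reply(200, DOCUMENTS[self.path])

    def do_DELETE(self):  # CR4: delete
        if self.path not in DOCUMENTS:
            return self._reply(404, {"error": "not found"})
        del DOCUMENTS[self.path]
        self._reply(204)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# An empty ProxyHandler makes the client ignore any proxy environment settings.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def call(method, url, properties=None):
    """Send one request; return (status, Location header, decoded JSON body)."""
    data = None if properties is None else json.dumps(properties).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    try:
        with opener.open(request) as response:
            body = response.read()
            parsed = json.loads(body) if body else None
            return response.status, response.headers.get("Location"), parsed
    except urllib.error.HTTPError as error:
        error.close()
        return error.code, None, None


server = HTTPServer(("127.0.0.1", 0), ToyRepository)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

create_status, location, _ = call(
    "POST", base + "/documents", {"title": "Hello, World!", "date": "2009-11-17"}
)
url = base + location
_, _, created = call("GET", url)
call("PATCH", url, {"date": "2009-11-18", "history": "Document date updated"})
_, _, updated = call("GET", url)
delete_status, _, _ = call("DELETE", url)
read_after_delete, _, _ = call("GET", url)
update_after_delete, _, _ = call("PATCH", url, {"title": "Too late"})

server.shutdown()
server.server_close()
print(create_status, delete_status, read_after_delete, update_after_delete)
print(updated)
```

The same client calls could in principle be pointed at a real repository, which is roughly what I have in mind for the follow-up posts, with the method and payload adapted to whatever each system actually expects.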