Articles

Implementing mRFC 0024

In Midgard on 2006-01-03 by Jukka Zitting

Today I wrote the mRFC 0024: Full text indexing in Midgard proposal for adding full text and content tree support to the Midgard Query Builder. Like Torben did for the MidCOM indexer, I’m planning to use Apache Lucene as the underlying full text engine. The search indexer process shall be based on the Lucene Java library, but I haven’t yet decided what I should use for the query part. On the surface the best option would seem to be either the Lucene4C or the CLucene library, but both options have drawbacks. The Lucene4C seems like the best match for the midgard-core environment, but it doesn’t seem to be too actively developed and there’s even been talk of abandoning it for a gcj-compiled version of Lucene Java. The CLucene library is more mature, but it’s written in C++ and might therefore cause some unexpected build issues for midgard-core. One option would of course be to actually try linking midgard-core with a gcj-compiled Lucene Java! I’ll prototype with all these options tomorrow while the mRFC 0024 vote takes place.

Another interesting issue in mRFC 0024 is the introduction of the parent cache, or actually a global content tree structure. Currently Midgard supports a sort of a tree model for all content, but it is mostly accessible only as limited views like for example the topic, page and snippet trees. Functions like is_in_tree or list_…_all have also required major scope limitations or other performance hacks to be useful. This is a bit troublesome for many use cases like searching and access controlling. The proposed parent cache would greatly simplify such content tree operations.

If the proposed content tree model catches on, then a natural migration path for Midgard 2.0 would be to make the proposed parent_guid field the official parent link in all Midgard records. This would both simplify the object model and allow for much flexibility in organizing the content tree. It would for example be possible to create an event calendar topic that has all the event objects as direct descendants instead of having to use an explicit link to a separate root event. The only problem with this approach is that it is a major backwards compatibility issue…

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: