Archive for the ‘Midgard’ Category


Midgard: Where it all began

In Midgard on 2009-05-10 by Jukka Zitting

On Friday we celebrated the tenth anniversary of the Midgard project. The celebration took the form of a very nice gala evening with good food and drinks, live music, a show and of course some speeches. I was asked to deliver a few words about how it all began for Midgard.

Here’s my speech, reconstructed from my draft notes and edited for the web audience:

We were a group of teenagers and young adults doing historical re-enactment and live action role playing games. One evening in early -97 we were sitting in a bus, returning from the woods with all our viking gear on. Bergie said to me: “Hey Yaro”, as I was known as Yaroslav at the time. “Hey Yaro”, he said, “you’re over 18 and you have a driver’s license. Would you like to take a dozen teenagers on a trip to Norway and back?” Even back then Bergie was the one with big dreams and the power to inspire people. I had the skills required to make those dreams happen but not yet enough experience to tell that we perhaps should think twice. So I just answered: “Sounds cool, let’s do it!” That’s pretty much what happened with Midgard as well.

The trip to Norway went well for us and was followed by a number of other adventures. One of them was our quest to build a better web site for our group. It was -97 and the web was booming. The de facto web publishing technology was FTP, which people used to push static HTML to a web server. GeoCities was a major cool thing as it allowed you to publish your static HTML for free. We however had bigger plans and our own server running in the closet of a friendly internet company. And we were publishing lots of stuff: news, photos, articles, etc. Quite a few people were actively contributing new content to the web site.

Our first serious attempt at better managing the site was based on technologies called SGML and DSSSL. For the technically minded: nowadays you’d use XML and XSLT for similar tasks. We used this system to “cook” our content into nicely formatted HTML that was then served to the world. It worked pretty well, but was hopelessly complex for almost all of our contributors. This was a time when people were only just discovering the Internet. Most of our contributors were teenagers who were using the net from libraries or schools. Internet connections with modems were only just finding their way into normal households. Even FTP was often out of the question, so there was little hope of making the heavy SGML tooling work as well as we’d like.

We wanted a system that could be managed entirely through the browser. Not just the content you saw on the web site, but the layout templates and even the functional code used to list pages or to handle the forms for adding or modifying content. The system should allow you to build an entire web site, including all the administration interfaces, without any other tooling than a web browser. Such systems simply didn’t exist at the time and in fact they’re pretty rare even today.

So we had to build our own system. We looked at a number of potential platforms for something like this, and the LAMP stack seemed like a good fit. Our server already ran Linux and, like pretty much everyone, we used the Apache web server. We hadn’t used PHP or MySQL before, but they were getting some good press and were easy enough to get started with. In fact we hadn’t done much of anything when we started: we hadn’t done Apache modules, we hadn’t extended (or even written!) PHP, and at the time I had only read about relational databases. As we used to say: “How hard can it be?” We didn’t know, and so we just did it.

The result of our efforts was called Midgard. We had used it to power our web site for about a year when Bergie was hired to build a new web site for a Finnish tech company. Midgard seemed like a good fit for that need, and we figured that other people might also find the system useful. Open source was cool and we wanted to join the movement, so we decided to publish Midgard as open source. After nights spent researching licensing options, writing press releases, creating the project web site and setting up mailing lists and public CVS access we were finally ready to publish Midgard 1.0 to the world. That happened exactly ten years ago.

The 1.0 release was like the Land Rover it was named for. The magnificent car from -62, which we used on many of our trips, was really cool and when it worked, it did so very well. However every now and then it required some “manual help” to get it started or to keep it going. This was also the case for Midgard 1.0. The first external installation that I know of was done on a Solaris platform and required a few days’ worth of help and patches delivered over the mailing list before it was up and running. Much of that early feedback and experience was reflected in Midgard 1.1, which was our first release that people actually managed to install and run without direct assistance. That started the growth of the Midgard community.

Meanwhile I had also been hired by the same company where Bergie worked, and much of our work there resulted in improvements to Midgard. Together with the feedback and early contributions we were getting from the mailing lists this made Midgard 1.2 already a pretty solid piece of software. It was fairly straightforward to install (by the standards of the time), it performed well and it had most of the functionality that you’d need to run a moderately complex web site.

And the results were showing. We were getting increasing traffic on the mailing lists, some companies started offering Midgard support and the number of Midgard-based sites around the world was growing. One of my earliest concrete rewards for doing open source was a bottle of quality whiskey that some Midgard user from Germany sent me with a note saying: “Thanks for Midgard!” The whiskey is long gone, but I still treasure the memory. A few years later Bergie and a few other friends and Midgard developers went on to start their own company based on Midgard. I was tempted to join them, but at the time my life was taking a different route and I gradually left Midgard to pursue other things.

Seeing the Midgard project take off and build a life of its own has been a very inspiring process for me. Having your first open source project become so successful is pretty amazing and also quite humbling. Looking at all the things Midgard is today fills me with pride, not in what I’ve done, but in what you, the Midgard community, have accomplished. Thank you for that. I’d especially like to thank my long time friend and co-conspirator in starting the Midgard project. Bergie, without your dreams and refusal to take “no” for an answer we wouldn’t be here today. Thank you.


Comparing Midgard and JCR

In JCR, Midgard on 2009-02-10 by Jukka Zitting

Midgard is the open source content management framework that we originally created with Henri Bergius more than ten years ago. In the past few years I have been more involved with Java content repositories like Apache Jackrabbit, but I’m still following what goes on in Midgard, and Henri’s recent comparison of Midgard and JCR prompted me to write up some of my thoughts on these two technologies. My experiences with Midgard and other content management systems that I’ve implemented go a long way toward explaining why I find the content repository concept so powerful.

In Midgard everything is content that is stored and managed inside a central content repository. The Midgard repository is an organized collection of MgdSchema objects stored in a specifically structured MySQL database. The repository contains site templates, user preferences, content hierarchies and much more. All these content objects are accessed and managed through the Midgard core API and the language bindings that have been built on top of the API.

As Henri mentions, the Midgard repository clearly resembles the JCR content repository model. The similarity is strong enough that I find it very interesting to look deeper at where the repository models differ and see which features I like better. Here’s a quick overview:

  • JCR typing is more flexible. The MgdSchema model makes it very easy to extend the repository with custom object types and the parameter feature allows even further runtime extensibility, but all objects are still clearly associated with a defined type. Midgard does not have unstructured nodes or mixin types that you find in JCR.
  • Midgard is less constrained by the hierarchy. In Midgard hierarchies are just a well supported special case of a more generic object linking mechanism. JCR references or even the shareable nodes in JCR 2.0 are not as powerful as the many-to-many relationships that you can easily handle in Midgard.
  • JCR is more addressable. As a downside of the above point, Midgard does not support as powerful path-based addressing of content objects as JCR does. The Midgard repository is only partially addressable by paths while in JCR everything has a path. On the other hand all Midgard objects are addressable by their identifiers, whereas only referenceable nodes in JCR can be accessed by identifier.
  • Midgard queries are more powerful. The JCR 1.0 query model restricts search criteria to only refer to properties of a single node. Repository implementations like Jackrabbit extend the query model somewhat, but the Query Builder feature in Midgard allows more flexible search criteria to be used.
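
The difference between path-based and identifier-based addressing in the list above can be made concrete with a toy model. This is only a minimal sketch in Python: the `Node` class and the `resolve` and `index_by_id` helpers are hypothetical names invented for illustration, and they correspond to neither the JCR API nor the Midgard API.

```python
import uuid


class Node:
    """Toy content object: every node has an identifier and a place in a tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.id = str(uuid.uuid4())  # identifier-based addressing
        if parent is not None:
            parent.children[name] = self

    def path(self):
        # Build the absolute path by walking up the parent links.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


def resolve(root, path):
    """Path-based addressing: follow one child name per path segment."""
    node = root
    for part in path.strip("/").split("/"):
        if part:
            node = node.children[part]
    return node


def index_by_id(root):
    """Identifier-based addressing: collect every node into an id -> node map."""
    found = {}
    stack = [root]
    while stack:
        node = stack.pop()
        found[node.id] = node
        stack.extend(node.children.values())
    return found


root = Node("")
news = Node("news", root)
article = Node("2009-launch", news)
by_id = index_by_id(root)
```

In this toy every node is reachable both ways: `resolve(root, "/news/2009-launch")` and `by_id[article.id]` return the same object. The comparison in the list is about which of the two lookups each repository guarantees for all content.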

As a summary I think both JCR and the Midgard repository are good examples of the kind of infrastructure that provides a strong base for building modern content management systems. And Midgard’s relationship with the desktop world is an interesting example of how content repository technology isn’t really limited to just traditional content management systems.


Flashback: SusiSGML, Midgard, and Cocoon

In Midgard on 2008-10-29 by Jukka Zitting

Heikki wrote a nice post about how the Midgard project got started some ten years ago. Bergie’s comment also briefly mentions the SusiSGML system that we used before Midgard.

SusiSGML was a custom SGML vocabulary with a set of DSSSL stylesheets that we used to cook the structured content into HTML with styling based on tables and all that stuff one used before the CSS revolution.

Architecturally SusiSGML was pretty much like early versions of Apache Cocoon, only with SGML and DSSSL instead of XML and XSL. Had we been at it a few years later, Midgard might have been based on Cocoon. :-)


The joy of troubleshooting

In Midgard on 2006-02-13 by Jukka Zitting

How do you troubleshoot a problem in a computer system? There are a number of standard troubleshooting methodologies like trial-and-error, bottom-up, top-down, divide-and-conquer, and even root cause analysis, but none of them is a silver bullet for all possible problems. Often the most effective way is to dynamically combine these approaches based on experience and understanding of the underlying technologies. This is a story of a particularly hard problem with an unexpected cause that I troubleshot recently.

The problem I was facing was related to a Midgard staging/live setup using the Exorcist tool. The staging site was working just fine but the live site was having weird problems on some pages. It appeared as if the MidCOM components of the troublesome pages were losing a part of their configuration parameters.

Step 1. Because everything was working fine on the staging site I figured that the problem is most likely related to the staging/live replication step performed using the Exorcist. Were the parameters not being replicated? None of the common causes for such situations (unapproved content, parameters in a different sitegroup than the main object, etc.) seemed to apply however so I had to inspect whether the replication dump actually contained the missing parameters. It did!

Step 2. So the parameters were being replicated, but was there some trouble in how they were being imported in the live database? There was nothing strange in the import log file, so I opened a MySQL prompt to inspect the live database. The troublesome parameters seemed to exist just where they should be, and they even had all the correct GUID mappings and other details that had previously caused problems with the Exorcist.

Step 3. It seemed that the live database was in perfect condition, so the problem must be caused by something else. The problem was affecting just some MidCOM components, so I figured that there might be some difference in the component versions used on the staging and live hosts. No, the MidCOM source trees on both the staging and live hosts were identical.

Step 4. If the problem wasn’t with MidCOM, then it must be a problem with the Midgard framework. Adding debug prints to one of the troublesome MidCOM components I was able to narrow down the problem to a $object->listparameters($component) call in the MidCOM configuration class. The configuration class seemed to be working just fine everywhere else, so the problem must be in the listparameters method! For some object/component combinations the method just returned nothing and didn’t log any errors even though I double- and triple-checked that the parameters existed in the database.

Step 5. I added a set of debug prints to the Midgard-PHP listparameters method, compiled and installed the modified module, and reloaded the troublesome pages. They still didn’t work properly, but now I got a detailed trace of what was happening under the hood. The method actually did execute the correct SQL query but received no results from the database. This was certainly weird, as I was very certain that the parameters in question actually did exist.

Step 6. Could there be something wrong with the MySQL client library used by the Midgard core to send queries to and receive results from the database? The normal MySQL command line tools use the same library so I started the MySQL prompt and entered the SQL statement copied from the Midgard log file. No results! The SQL statement was:

SELECT id,domain,name,value FROM record_extension WHERE (tablename='topic' AND oid=27 AND domain='de.linkm.sitemap' AND lang=0) AND (record_extension.sitegroup in (0,2)) ORDER BY name

Step 7. There are some autogeneration artifacts in the query, but no real reason why it shouldn’t work. I removed the domain='de.linkm.sitemap' constraint and got a list of parameters, including ones with the domain column set to de.linkm.sitemap! Even worse, the exact same query with domain='midcom' or domain='midcom.helper.nav' returned results. For some totally unknown reason MySQL seemed to not like the domain='de.linkm.sitemap' constraint. I also tried some other variations of the query and found out that even just removing the ORDER BY clause made the query return the correct rows. This should ring heavy warning bells for anyone who knows SQL, as the ORDER BY clause only orders the returned rows.
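
The invariant behind Step 7 is easy to turn into a check: a query must return the same set of rows with and without its ORDER BY clause. The following is a minimal sketch under stated assumptions. It uses Python’s sqlite3 module with an in-memory database as a stand-in for MySQL, the column types and sample rows are invented, and the helper name is hypothetical. Only the column names and the WHERE clause come from the statement above.

```python
import sqlite3

# SQLite stands in for MySQL; the schema types and rows below are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record_extension ("
    "id INTEGER PRIMARY KEY, tablename TEXT, oid INTEGER, "
    "domain TEXT, name TEXT, value TEXT, lang INTEGER, sitegroup INTEGER)"
)
conn.executemany(
    "INSERT INTO record_extension VALUES (?,?,?,?,?,?,?,?)",
    [
        (1, "topic", 27, "de.linkm.sitemap", "b_param", "2", 0, 2),
        (2, "topic", 27, "de.linkm.sitemap", "a_param", "1", 0, 2),
        (3, "topic", 27, "midcom", "component", "x", 0, 0),
        (4, "topic", 28, "de.linkm.sitemap", "a_param", "9", 0, 2),
    ],
)

# The statement from Step 6 without its ORDER BY clause.
BASE = (
    "SELECT id,domain,name,value FROM record_extension "
    "WHERE (tablename='topic' AND oid=27 AND domain='de.linkm.sitemap' AND lang=0) "
    "AND (record_extension.sitegroup in (0,2))"
)


def same_rows_with_and_without_order_by(connection, base_sql, order_by):
    """Compare the row sets of a query with and without ORDER BY (hypothetical helper)."""
    unordered = connection.execute(base_sql).fetchall()
    ordered = connection.execute(base_sql + " ORDER BY " + order_by).fetchall()
    # Sorting both lists removes the ordering, so only membership is compared.
    return sorted(unordered) == sorted(ordered)


consistent = same_rows_with_and_without_order_by(conn, BASE, "name")
```

On a correctly working database `consistent` is always true. On the live host described here the same comparison would have failed, which points at the database engine and not at the query.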

Step 8. Could this be some weird character set and collation issue like the ones we’ve been facing recently? No, it works the same way (rows are returned or not returned depending on whether the ORDER BY clause is used) regardless of the character encoding being used.

Step 9. Has the database been corrupted? I backed up and restored the entire database but the query was still misbehaving. I then copied the correctly behaving database from the staging host to the live host. The same error appeared in that database as well!

Step 10. The staging host was running MySQL version 4.1.12 while the live host had MySQL 4.1.7. Both were stable and tested releases installed from the standard RHEL packages. I was recalling ugly memories from the MySQL 3.x times, when weird database errors seemed to be much more common.

Step 11. OK, it seemed that the problem was some MySQL bug that got fixed somewhere between versions 4.1.7 and 4.1.12. I didn’t want to risk messing with the database installation so I figured I’d better find some workaround rather than trying to upgrade the database version.

Step 12. I used the EXPLAIN statement trying to find out what could be triggering the MySQL bug. The output suggested that the query planner was using some of the indexes on the record_extension table. The indexes actually look quite weird:

KEY record_extension_oid_idx(oid),
KEY record_extension_tablename_idx(tablename(10)),
KEY record_domain_tablename_idx(domain(10)),
KEY record_extension_sitegroup_idx(sitegroup),
KEY record_extension_name_idx(name(10))

The indexes don’t seem to be reasonable. Why the arbitrary ten character limit, and why use only a single column per index? This seems anomalous enough to trigger an obscure bug in MySQL, so I just dropped the record_domain_tablename_idx index. And it worked!
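
Dropping an index is a safe workaround in principle because an index should only change how rows are found, never which rows are returned. Here is a minimal sketch of that property, again with Python’s sqlite3 as a stand-in for MySQL and with invented sample rows. SQLite has no prefix indexes, so the whole domain column is indexed here, and `query_plan` is a hypothetical helper. The post does not show the exact statement that was used; in MySQL an index is typically dropped with a statement of the form ALTER TABLE record_extension DROP INDEX record_domain_tablename_idx.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record_extension ("
    "id INTEGER PRIMARY KEY, tablename TEXT, oid INTEGER, domain TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO record_extension VALUES (?,?,?,?,?)",
    [
        (1, "topic", 27, "de.linkm.sitemap", "b_param"),
        (2, "topic", 27, "de.linkm.sitemap", "a_param"),
        (3, "topic", 27, "midcom", "component"),
    ],
)
# Assumption: a full-column index replaces the domain(10) prefix index.
conn.execute("CREATE INDEX record_domain_tablename_idx ON record_extension(domain)")

QUERY = (
    "SELECT id,name FROM record_extension "
    "WHERE tablename='topic' AND oid=27 AND domain='de.linkm.sitemap' "
    "ORDER BY name"
)


def query_plan(connection, sql):
    """Return the plan steps as text; the wording varies between SQLite versions."""
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


plan_with_index = query_plan(conn, QUERY)
rows_with_index = conn.execute(QUERY).fetchall()

conn.execute("DROP INDEX record_domain_tablename_idx")

plan_without_index = query_plan(conn, QUERY)
rows_without_index = conn.execute(QUERY).fetchall()
```

Printing `plan_with_index` and `plan_without_index` shows how the planner changes its strategy, while `rows_with_index` and `rows_without_index` stay identical. The trade-off is performance only: without the index the query may need to scan more of the table.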

The entire troubleshooting session took about three hours with a couple of breaks in between.


Implementing mRFC 0024

In Midgard on 2006-01-03 by Jukka Zitting

Today I wrote the mRFC 0024: Full text indexing in Midgard proposal for adding full text and content tree support to the Midgard Query Builder. Like Torben did for the MidCOM indexer, I’m planning to use Apache Lucene as the underlying full text engine. The search indexer process shall be based on the Lucene Java library, but I haven’t yet decided what I should use for the query part. On the surface the best option would seem to be either the Lucene4C or the CLucene library, but both options have drawbacks. Lucene4C seems like the best match for the midgard-core environment, but it doesn’t seem to be very actively developed and there’s even been talk of abandoning it for a gcj-compiled version of Lucene Java. The CLucene library is more mature, but it’s written in C++ and might therefore cause some unexpected build issues for midgard-core. One option would of course be to actually try linking midgard-core with a gcj-compiled Lucene Java! I’ll prototype with all these options tomorrow while the mRFC 0024 vote takes place.

Another interesting issue in mRFC 0024 is the introduction of the parent cache, or actually a global content tree structure. Currently Midgard supports a sort of a tree model for all content, but it is mostly accessible only as limited views like for example the topic, page and snippet trees. Functions like is_in_tree or list_…_all have also required major scope limitations or other performance hacks to be useful. This is a bit troublesome for many use cases like searching and access control. The proposed parent cache would greatly simplify such content tree operations.
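
To illustrate why a single global parent link simplifies tree operations, here is a minimal sketch in Python. It is not the mRFC 0024 design or the Midgard API: the `PARENT` mapping, the GUID strings and the function signatures are all hypothetical, and treating an object as being inside its own tree is an assumption made for this example.

```python
# Hypothetical data: guid -> parent guid (None marks the root of the tree).
PARENT = {
    "root": None,
    "topic-news": "root",
    "article-1": "topic-news",
    "topic-events": "root",
    "event-1": "topic-events",
}


def ancestors(guid, parent=PARENT):
    """Yield the ancestors of guid, nearest first, by following parent links."""
    seen = set()
    current = parent[guid]
    while current is not None:
        if current in seen:
            # Guard against corrupted data that would loop forever.
            raise ValueError("cycle in parent links at " + current)
        seen.add(current)
        yield current
        current = parent[current]


def is_in_tree(root_guid, guid, parent=PARENT):
    """True if guid is root_guid itself or lies somewhere below it."""
    return guid == root_guid or root_guid in ancestors(guid, parent)
```

With one uniform parent link the check works the same way for every object type: `is_in_tree("topic-news", "article-1")` is true and `is_in_tree("topic-events", "article-1")` is false, without a separate code path for topics, pages or snippets.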

If the proposed content tree model catches on, then a natural migration path for Midgard 2.0 would be to make the proposed parent_guid field the official parent link in all Midgard records. This would both simplify the object model and allow for much flexibility in organizing the content tree. It would for example be possible to create an event calendar topic that has all the event objects as direct descendants instead of having to use an explicit link to a separate root event. The only problem with this approach is that it is a major backwards compatibility issue…