Notes on Storing Hierarchical Data in a Database

So i’ve been working on setting up trees (hierarchies) of classification information for ActiveDirectory. My solution needs to be fast for reading the information and display, but because people publish new organizations much much less frequently than they look at the existing entries inserts can be slow. Really slow actually, because we only update the categories when we are changing the directory structure, not simply adding or moving organizations around within it.

The best model i’ve found out there is Modified Preorder Tree Traversal. The basic idea is that you pre-walk the hierarchy so that every node can easily find it’s parent and child with simple non-recursive SQL. The best introduction article i’ve found is storing hierarchical data in a database which covers the basics and has a nice little graphic to show you what it means. There’s a code example in php, ugh, about how to use and set data from your program.

Other than that there is a wealth of articles by Joe Celko. Including Trees in SQL – Nested Sets, and with instructions on how to do MPTT when when the tree changes often [free email reg required]. Many other useful Celko articles can be found here, here, and here.

Then it’s also worth looking at Tree in SQL design pattern wiki page.

If you are looking for a good text explaining relational databases I found Handbook of Relational Database Design to be indispensable. I wish they would update it and release a new version as it’s 10 years old. Even then it’s 100 times better to learn from this book which talks about relational databases than from any introduction to SQL.


About this entry