So i've been working on setting up trees (hierarchies) of classification information for ActiveDirectory. My solution needs to be fast for reading the information and display, but because people publish new organizations much much less frequently than they look at the existing entries inserts can be slow. Really slow actually, because we only update the categories when we are changing the directory structure, not simply adding or moving organizations around within it.
The best model i've found out there is Modified Preorder Tree Traversal. The basic idea is that you pre-walk the hierarchy so that every node can easily find it's parent and child with simple non-recursive SQL. The best introduction article i've found is storing hierarchical data in a database which covers the basics and has a nice little graphic to show you what it means. There's a code example in php, ugh, about how to use and set data from your program.
Other than that there is a wealth of articles by Joe Celko. Including Trees in SQL - Nested Sets, and with instructions on how to do MPTT when when the tree changes often [free email reg required]. Many other useful Celko articles can be found here, here, and here.
Then it's also worth looking at Tree in SQL design pattern wiki page.
If you are looking for a good text explaining relational databases I found Handbook of Relational Database Design to be indispensable. I wish they would update it and release a new version as it's 10 years old. Even then it's 100 times better to learn from this book which talks about relational databases than from any introduction to SQL.
Posted by rabble at August 6, 2003 11:03 AM | TrackBack... I didn't sleep much last night, so this will be quick. I'd like to continue this conversation further, though.
Sometime in April, I wrote up the skeleton of a link management system. One conclusion I've drawn from that is that it's absolutely essential that "objects" (i.e., links, groups, recipes, etc.) be classified in a facetted, rather than an hierarchical way.
I looked at Reverse-Preorder, but you can't build nested trees from that. In fact, my biggest problem is building SQL queries that aren't horrendously recursive. Small trees work ok, but I'm sure that scalability will pose some major issues. This may be due in large part to my relative lack of experience with SQL, but also because no-one seems to have documented (very well) nested SQL structures.
One possibility would be to have two databases; one, created at intervals, that contains all the possible traversals, and the other with the actual faceted data. An implementation of this could blur the lines by caching tree data as it is retrieved into the database, and building a partial RPT.
Anyways, I have an example database schema that allows for an immense amount of flexibility. Once I've woken up, I'll email it over, rather than pasting it in here.
Posted by: blaine at August 6, 2003 01:11 PMRead the sitepoint article, it explains that with Modified Preorder Tree Traversal you can do nested trees and have a forest of seperate trees. Joe Celko also explains here how to make a MPTT which allows for lots of entries to be added before the tree needs to be re-ordered. (i've added the above link to the article too.)
One thing i've been playing with is is how to combine facets and trees. So in my schema it allows for alternate names. I could use the name animal rights, and jose can use derechos animales, and we can connect the two as really being the same category. I could also use an alternate name within my language, such as 'animal liberation' which could either be a synomym of animal rights or a subset category.
Anyway i've got the schema in the test database and lots of pages of diagrams and object models on paper, but no working code to show yet.
Posted by: rabble at August 7, 2003 11:13 PMI was fascinated by that article as well as I am designing a CMS/Portal system with an expected size in the 10,000+ node range (at least!)
Obviously the standard MPTT method will not be feasable for updates when, on average, an update has to alter n/2 nodes (where n is the number of nodes). And what is worse it has to do it in two sweeps using Gij's method for updates.
I have made some improvements to the method, which I have discribed here
http://www.sitepointforums.com/showthread.php?p=1046647
I welcome anyone to comment on my methods and help with refining this method.
Cheers
Dave
Posted by: Dave Cheney at December 23, 2003 09:41 PMIn your free time, check out some relevant pages about- Tons of interesdting stuff!!!
Posted by: at December 2, 2004 01:21 AMAccess ACDSee Active ActiveX Address Adobe Advanced Agent All Antivirus Assistant Audio Backup Beta Book Boson Browser Build Builder Burner Burning Business Calculator Calendar Capture Cleaner Client Clock Code Commander Control Converter Cool Copy Creator Data Delphi Deluxe Designer Desktop Digital Disk Dos Download DVD Easy Edit Edition Editor Email English Enterprise Expert Explorer Express Extractor Fast File Final Firewall Flash Folder French FTP Full Generator German Gif Gold Home HTML Icon Image Internet Java Linux Lite Loader Lock Mac Macromedia Magic Mail Maker Manager Master Media Menu Microsoft Monitor Movie Multi Music Nero Net Network Office Organizer Pack Password PDF Personal Photo Photoshop Picture Player Plus Popup Power Print Pro Professional Quick Recorder Recovery Registry Retail Ripper ROM Saver Screen Screensaver Search Security Server Smart Software Sound Standard Studio Suite Super System Text Time Tool Tools Tracker Trainer Ulead Ultra Utilities Version Video View Viewer Virtual Visual Wave Web Win Windows Winrar Wizard Workshop Zip
Posted by: software at March 15, 2005 12:00 AM