Jumping through hoops to represent trees in Database

29 12 2009

Recently I have been working on a project where we have to represent hierarchical data in Database. Unfortunately we do not have much choice with the database. We are using a relational database.

If you have done this, you will agree with me that it is not a very enjoyable experience.

Firstly we need to choose between several models to represent trees in database

a. Adjacency (self referential tables)

b. Materialized path (lineage)

Shortcomings of adjacency model

Tree traversal is costly in adjacency model. Finding out children and grandchildren of a parent may be quite complex

Shortcomings of materialized path

Materialized path requires you to build this information at some point in time. If you have a million records for which you need to build materialized path, then I suggest you start now, because no knows when it will end. If some one knows of an efficient way of doing this please let me know. If you get past this stage, then there is the issue of updating the data to handle moves and deletions.

Static and Dynamic Data

The choice we make is mostly driven by how many changes can we expect. If we are never going to modify the data, probably materialized path any other approach which stores the lineage information alongside each row is useful. But this is rarely the case.

Some vendor specific help

The guys at micrsoft and oracle seem to have seen this issue and suggest the use of below techniques for this issue.

Sql Server

1. Common table expression: Popularly known as CTE, this is a way to run recursive queries on a self-referential table.

2. HierarchyID: This is a datatype that is available in SqlServer 2008. It uses materialized path.


1. Start with and connect by: This is similar to the above method. It works on self-referential Table.

Object modeling trees

Imagine a scenario where you need to model a huge Family. I guess we start by having Person class. Each person has 0 or more children. Children is nothing but a collection of Persons. Mapping this to the data in database is a pain.

1. Lazy loading: Most probably you will have to lazy load the children as and when you need them. Else you may have to wait a generation to get the complete tree loaded.

2. If we want to implement things like Delete or reassignment, saving the data back to database will not be easy.

Better ways to store hierarchical data

Hierarchies are graphs. It is better to use a database like neo4j. Neo4j has been a very popular graph Db.




One response

15 07 2013
mold removal toronto

What are some additional books that talks about this? Can
you kindly share a couple of other book titles that can possibly be available in
the local library? Appreciate it!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: