Recently I have been working on a project where we have to represent hierarchical data in Database. Unfortunately we do not have much choice with the database. We are using a relational database.
If you have done this, you will agree with me that it is not a very enjoyable experience.
Firstly we need to choose between several models to represent trees in database
a. Adjacency (self referential tables)
b. Materialized path (lineage)
Shortcomings of adjacency model
Tree traversal is costly in adjacency model. Finding out children and grandchildren of a parent may be quite complex
Shortcomings of materialized path
Materialized path requires you to build this information at some point in time. If you have a million records for which you need to build materialized path, then I suggest you start now, because no knows when it will end. If some one knows of an efficient way of doing this please let me know. If you get past this stage, then there is the issue of updating the data to handle moves and deletions.
Static and Dynamic Data
The choice we make is mostly driven by how many changes can we expect. If we are never going to modify the data, probably materialized path any other approach which stores the lineage information alongside each row is useful. But this is rarely the case.
Some vendor specific help
The guys at micrsoft and oracle seem to have seen this issue and suggest the use of below techniques for this issue.
1. Common table expression: Popularly known as CTE, this is a way to run recursive queries on a self-referential table.
2. HierarchyID: This is a datatype that is available in SqlServer 2008. It uses materialized path.
1. Start with and connect by: This is similar to the above method. It works on self-referential Table.
Object modeling trees
Imagine a scenario where you need to model a huge Family. I guess we start by having Person class. Each person has 0 or more children. Children is nothing but a collection of Persons. Mapping this to the data in database is a pain.
1. Lazy loading: Most probably you will have to lazy load the children as and when you need them. Else you may have to wait a generation to get the complete tree loaded.
2. If we want to implement things like Delete or reassignment, saving the data back to database will not be easy.
Better ways to store hierarchical data
Hierarchies are graphs. It is better to use a database like neo4j. Neo4j has been a very popular graph Db.