Jumping through hoops to represent trees in Database

29 12 2009

Recently I have been working on a project where we have to represent hierarchical data in Database. Unfortunately we do not have much choice with the database. We are using a relational database.

If you have done this, you will agree with me that it is not a very enjoyable experience.

Firstly we need to choose between several models to represent trees in database

a. Adjacency (self referential tables)

b. Materialized path (lineage)

Shortcomings of adjacency model

Tree traversal is costly in adjacency model. Finding out children and grandchildren of a parent may be quite complex

Shortcomings of materialized path

Materialized path requires you to build this information at some point in time. If you have a million records for which you need to build materialized path, then I suggest you start now, because no knows when it will end. If some one knows of an efficient way of doing this please let me know. If you get past this stage, then there is the issue of updating the data to handle moves and deletions.

Static and Dynamic Data

The choice we make is mostly driven by how many changes can we expect. If we are never going to modify the data, probably materialized path any other approach which stores the lineage information alongside each row is useful. But this is rarely the case.

Some vendor specific help

The guys at micrsoft and oracle seem to have seen this issue and suggest the use of below techniques for this issue.

Sql Server

1. Common table expression: Popularly known as CTE, this is a way to run recursive queries on a self-referential table.

2. HierarchyID: This is a datatype that is available in SqlServer 2008. It uses materialized path.

Oracle

1. Start with and connect by: This is similar to the above method. It works on self-referential Table.

Object modeling trees

Imagine a scenario where you need to model a huge Family. I guess we start by having Person class. Each person has 0 or more children. Children is nothing but a collection of Persons. Mapping this to the data in database is a pain.

1. Lazy loading: Most probably you will have to lazy load the children as and when you need them. Else you may have to wait a generation to get the complete tree loaded.

2. If we want to implement things like Delete or reassignment, saving the data back to database will not be easy.

Better ways to store hierarchical data

Hierarchies are graphs. It is better to use a database like neo4j. Neo4j has been a very popular graph Db.

Advertisements




I hate ORM

9 12 2009

The title is not meant to start a war over the concept of ORM. I appreciate the effort that has gone into mappers. But lets take a look at why I hate ORMs. (Dont hate me because I hate ORM 🙂 )

Prelude

I am beginning to wonder how many applications that we build really need a relational database.

Some terms become synonymous with their usage. For instance in the Xerox has become synonymous with Copiers.

Relational databases have almost become synonymous with Databases. As a developer or anyone involved in system design it is very important to know the options that are available to store data. The choice of persistence technology governs application scaling and performance in a very big way.

Now, Why do I hate ORM

ORMs hide the inconvenience that comes with using RDBMS with object oriented code.

When I learned relational modeling, I really liked it. I still do like making relational models.  But how long have relational databases been in existence. They were in existence much before the widespread usage of object oriented programming. Back then code was procedural. The relationship between data had to exist somewhere and it made sense to have it in the persistent store. Querying became easier.

But it was rather hard to switch older persistent stores with other technologies when we moved to object oriented code. Reasons were many. For example: availability skilled database developers, strong trust in RDBMS, good vendor support etc. But the move towards newer languages like C++, java and C# was inevitable. ORMs was win win solution to this problem.

Before ORM, all of us were known to writing a mapping layer ourselves. ORM was such a relief when it hit the markets. It set us free after years of wrangling with ugly mappers. But in the revelry we seem to have forgotten that it was database that needed a second look and not the codebase.

Now we have Duplication of relationships in data as well as in code. It is surprising that duplication of relationships has not struck us as problem.

Even frameworks like rails give us an impression that the standard way to build a web application is to use an RDBMS as a backend.

I simply cannot grasp the amount of effort we put into mapping object to schema. Another annoying issue is to have completed Database Design before starting development. Using Hibernate or Active Record on top of an existing schema is nothing less than tying oneself up in knots.

There is no point in great Object Oriented code if the system design is not appropriate. It is my humble thought that ORM should not be used as an excuse to choose Relational Databases over other options. As in any case use with Discretion.

Let me know what you think.