Jumping through hoops to represent trees in a database

29 12 2009

Recently I have been working on a project where we have to represent hierarchical data in a database. Unfortunately we do not have much choice of database: we are using a relational one.

If you have done this, you will agree with me that it is not a very enjoyable experience.

First, we need to choose between several models for representing trees in a database:

a. Adjacency list (self-referential tables)

b. Materialized path (lineage)

Shortcomings of the adjacency model

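Tree traversal is costly in the adjacency model. Finding the children and grandchildren of a parent can be quite complex, because each row only knows its immediate parent, so every extra level of the hierarchy needs another self-join or another query.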
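
To make that cost concrete, here is a minimal Ruby sketch, assuming a hypothetical `nodes` table held in memory as an array of rows. The helper `children_of` stands in for one `SELECT ... WHERE parent_id IN (...)` round trip, and a counter records how many round trips a real database would need: one per level of the tree.

```ruby
# Hypothetical adjacency-list table: each row stores only its parent's id.
Row = Struct.new(:id, :parent_id, :name)

NODES = [
  Row.new(1, nil, "grandparent"),
  Row.new(2, 1,   "parent A"),
  Row.new(3, 1,   "parent B"),
  Row.new(4, 2,   "child A1"),
  Row.new(5, 2,   "child A2"),
  Row.new(6, 3,   "child B1")
].freeze

# Stands in for: SELECT * FROM nodes WHERE parent_id IN (...)
# The counter records how many round trips a real database would need.
def children_of(parent_ids, stats)
  stats[:queries] += 1
  NODES.select { |row| parent_ids.include?(row.parent_id) }
end

# Walk the tree one level at a time: one query per level of depth.
def descendants_of(id, stats)
  found = []
  frontier = [id]
  until frontier.empty?
    level = children_of(frontier, stats)
    found.concat(level)
    frontier = level.map(&:id)
  end
  found
end

stats = { queries: 0 }
names = descendants_of(1, stats).map(&:name)
p names           # => ["parent A", "parent B", "child A1", "child A2", "child B1"]
p stats[:queries] # => 3
```

A deeper tree simply means more round trips.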

Shortcomings of materialized path

Materialized path requires you to build this information at some point. If you have a million records for which you need to build materialized paths, then I suggest you start now, because no one knows when it will end. If someone knows of an efficient way of doing this, please let me know. Once you get past this stage, there is still the issue of updating the data to handle moves and deletions.
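
One way to build the paths in a single pass, assuming the whole `(id, parent_id)` table fits in memory, is to index rows by parent and walk the tree breadth-first, deriving each path from its parent's already-computed path. The `/1/2/4/` path format and the helper names below are hypothetical. The last method shows why moves are painful: every descendant's stored prefix has to be rewritten.

```ruby
# Build materialized paths such as "/1/2/4/" from adjacency rows given as
# { id => parent_id } (roots have a nil parent). Each row is visited once.
def build_paths(parent_of)
  children = Hash.new { |hash, key| hash[key] = [] }
  parent_of.each { |id, parent| children[parent] << id }

  paths = {}
  queue = children[nil].map { |root| [root, "/#{root}/"] }
  until queue.empty?
    id, path = queue.shift
    paths[id] = path
    children[id].each { |child| queue << [child, "#{path}#{child}/"] }
  end
  paths
end

# Descendants are found by prefix, like WHERE path LIKE '/1/2/%' in SQL.
# The trailing slash keeps "/1/2/" from matching "/1/22/".
def descendants(paths, id)
  prefix = paths.fetch(id)
  paths.keys.select { |key| key != id && paths[key].start_with?(prefix) }
end

# Moving a subtree rewrites the prefix of every path under it.
# (transform_values! needs Ruby 2.4 or later.)
def move_subtree!(paths, id, new_parent)
  old_prefix = paths.fetch(id)
  if paths.fetch(new_parent).start_with?(old_prefix)
    raise ArgumentError, "cannot move a node under itself"
  end

  new_prefix = "#{paths.fetch(new_parent)}#{id}/"
  paths.transform_values! do |path|
    path.start_with?(old_prefix) ? new_prefix + path[old_prefix.length..-1] : path
  end
end

paths = build_paths({ 1 => nil, 2 => 1, 3 => 1, 4 => 2, 5 => 4 })
descendants(paths, 2)      # => [4, 5]
move_subtree!(paths, 4, 3) # 4 and its child 5 now sit under 3
```

In a real table the move is an `UPDATE` touching every row in the subtree, which is exactly the maintenance cost described above.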

Static and Dynamic Data

The choice we make is mostly driven by how many changes we can expect. If we are never going to modify the data, materialized path, or any other approach that stores lineage information alongside each row, is probably a good fit. But this is rarely the case.

Some vendor-specific help

The guys at Microsoft and Oracle seem to have noticed this problem and suggest the techniques below.

SQL Server

1. Common table expression: Popularly known as a CTE. A recursive CTE lets you run recursive queries on a self-referential table.

2. HierarchyID: This is a data type introduced in SQL Server 2008. It uses a materialized-path approach.
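
A recursive CTE has an anchor member (the starting rows) and a recursive member that is joined against the rows produced by the previous step, repeating until a step yields nothing. The SQL in the comment follows SQL Server's syntax for a hypothetical `Nodes(id, parent_id)` table; the Ruby below only emulates that logical evaluation loop in memory to show the semantics. It is not how SQL Server executes the query internally, and SQL Server guarantees no row order without `ORDER BY`.

```ruby
# Emulates, in memory, the logical evaluation of a recursive CTE such as
# (SQL Server syntax, hypothetical Nodes(id, parent_id) table):
#
#   WITH Descendants (id, parent_id, depth) AS (
#     SELECT id, parent_id, 0 FROM Nodes WHERE id = 1          -- anchor
#     UNION ALL
#     SELECT n.id, n.parent_id, d.depth + 1                     -- recursive
#     FROM Nodes n JOIN Descendants d ON n.parent_id = d.id
#   )
#   SELECT * FROM Descendants;

Node = Struct.new(:id, :parent_id)

def recursive_cte(nodes, root_id)
  # Anchor member: the starting row(s), at depth 0.
  working = nodes.select { |n| n.id == root_id }.map { |n| [n.id, n.parent_id, 0] }
  result = working.dup
  # Recursive member: join the previous step's rows back to the table
  # until an iteration produces no new rows.
  until working.empty?
    working = working.flat_map do |id, _parent, depth|
      nodes.select { |n| n.parent_id == id }.map { |n| [n.id, n.parent_id, depth + 1] }
    end
    result.concat(working)
  end
  result
end

nodes = [Node.new(1, nil), Node.new(2, 1), Node.new(3, 1), Node.new(4, 2)]
recursive_cte(nodes, 1) # => [[1, nil, 0], [2, 1, 1], [3, 1, 1], [4, 2, 2]]
```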

Oracle

1. START WITH and CONNECT BY: Oracle's hierarchical query clauses, similar to the recursive CTE above. They also work on a self-referential table.
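
Oracle's query starts from the rows picked by START WITH and follows CONNECT BY PRIOR id = parent_id downwards, returning rows in hierarchical (depth-first) order with a LEVEL pseudocolumn that is 1 for the roots. The Ruby below is an in-memory sketch of those semantics over a hypothetical `nodes(id, parent_id)` table; sibling order here simply follows array order, whereas Oracle leaves it unspecified unless you add ORDER SIBLINGS BY.

```ruby
# In-memory sketch of an Oracle hierarchical query (hypothetical table):
#
#   SELECT id, LEVEL FROM nodes
#   START WITH id = 1
#   CONNECT BY PRIOR id = parent_id;

Item = Struct.new(:id, :parent_id)

# START WITH: pick the root row(s); each root is at LEVEL 1.
def connect_by(items, start_id)
  items.select { |item| item.id == start_id }.flat_map { |root| walk(items, root, 1) }
end

# CONNECT BY PRIOR id = parent_id: emit a row, then its children, depth-first.
def walk(items, item, level)
  children = items.select { |child| child.parent_id == item.id }
  [[item.id, level]] + children.flat_map { |child| walk(items, child, level + 1) }
end

items = [Item.new(1, nil), Item.new(2, 1), Item.new(3, 1), Item.new(4, 2)]
connect_by(items, 1) # => [[1, 1], [2, 2], [4, 3], [3, 2]]
```

Note the depth-first order: row 4 comes out before its uncle, row 3.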

Object modeling trees

Imagine a scenario where you need to model a huge family. We would probably start with a Person class. Each person has zero or more children, and children are nothing but a collection of Persons. Mapping this to the data in the database is a pain.

1. Lazy loading: Most likely you will have to lazy-load the children as and when you need them. Otherwise you may have to wait a generation for the complete tree to load.

2. If we want to implement operations like deletion or reassignment, saving the changes back to the database will not be easy.
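
Here is a minimal sketch of lazy loading in plain Ruby, assuming a hypothetical `loader` callable that stands in for whatever query your data access layer runs (for example `SELECT * FROM people WHERE parent_id = ?`). Children are fetched the first time they are asked for and then cached on the object. The `TABLE` and `LOADS` constants are made-up test data.

```ruby
# A Person whose children are loaded on first access and then memoized.
class Person
  attr_reader :id, :name

  # loader: hypothetical callable that returns the child Persons for an id.
  def initialize(id, name, loader)
    @id = id
    @name = name
    @loader = loader
  end

  def children
    # An empty array is truthy in Ruby, so even "no children" is cached.
    @children ||= @loader.call(@id)
  end
end

# Fake "table" keyed by parent id, plus a counter showing when loads happen.
TABLE = { 1 => [[2, "Alice"], [3, "Bob"]], 2 => [[4, "Carol"]] }.freeze
LOADS = Hash.new(0)

LOADER = lambda do |parent_id|
  LOADS[parent_id] += 1
  TABLE.fetch(parent_id, []).map { |id, name| Person.new(id, name, LOADER) }
end

root = Person.new(1, "Eve", LOADER)
root.children.map(&:name) # triggers one load for id 1
root.children.map(&:name) # served from the cached @children
```

Each level that has not been loaded yet still costs a round trip, and tracking deletions or reassignments for saving back has to be built on top of this.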

Better ways to store hierarchical data

Hierarchies are graphs, so it is better to use a graph database such as Neo4j, which has been a very popular graph DB.





Coroutines – back to basics

27 12 2009

Ruby 1.9 fibers have got me reading about coroutines.
I thought I should write down my understanding as I read about coroutines in more depth.

Most of the content in this post is just an aggregation of various sources.

Coroutines are program components that allow multiple entry points: they can suspend execution and later resume where they left off, any number of times. They are closely related to continuations and can be implemented with them.

All programming languages have some way of handling control flow, and control flow carries associated state, such as the values of local variables. The call stack is the most common place to keep this state: every method call gets its own stack frame, and that frame is discarded once the method returns, either normally or through an exception.

A coroutine is different: we can suspend and resume its execution without losing its stack.
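
A small illustration of that difference in Ruby (the names `count_once` and `counter` are just for this example): a plain method starts from scratch on every call, while a fiber keeps its local variables across suspensions.

```ruby
# An ordinary method: `n` lives in a stack frame that is thrown away on
# return, so every call starts from scratch.
def count_once
  n = 0
  n += 1
  n
end

# A fiber: `n` survives each suspension, so every resume continues the count.
counter = Fiber.new do
  n = 0
  loop do
    n += 1
    Fiber.yield n
  end
end

p [count_once, count_once]                         # => [1, 1]
p [counter.resume, counter.resume, counter.resume] # => [1, 2, 3]
```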

Types of Coroutines:

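1. Symmetric coroutines: A symmetric coroutine has a single control-transfer operation, which can pass control to any other coroutine. Example: Ruby's Fiber#transfer (available after require 'fiber' in Ruby 1.9).

2. Asymmetric coroutines: Also called semi-coroutines, these offer two operations, one to resume a coroutine and one to yield. The choice of where control goes is limited: an asymmetric coroutine can only yield control back to whoever resumed it. Examples: Lua coroutines, and Ruby 1.9 fibers used with resume and Fiber.yield.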

Examples:

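Producer-consumer

```ruby
#!/usr/bin/ruby1.9.1

# The producer yields an ever-increasing sequence of numbers, one per resume.
def producer
  Fiber.new do
    value = 0
    loop do
      Fiber.yield value
      value += 1
    end
  end
end

# The consumer pulls nine values from the producer and prints them (0 to 8).
def consumer(source)
  Fiber.new do
    9.times do
      value = source.resume
      puts value
    end
  end
end

consumer(producer).resume
```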

Fibonacci

```ruby
#!/usr/bin/ruby1.9.1

# Each resume returns the next Fibonacci number: 1, 1, 2, 3, 5, ...
fib = Fiber.new do
  x, y = 0, 1
  loop do
    Fiber.yield y
    x, y = y, x + y
  end
end

20.times { puts fib.resume }
```

Why are coroutines important?

The main reason coroutines are back in the limelight is concurrency. In my humble opinion, concurrency is reviving many well-known but forgotten programming concepts.

To take the example of Ruby, most of us are aware of the Global Interpreter Lock. Threads in Ruby 1.9 are native OS threads, but the GIL lets only one of them execute Ruby code at a time, which means there is no true parallelism. Fibers in Ruby are similar to threads but much lighter weight, and they are scheduled cooperatively: they are suspended and resumed as the programmer chooses.
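
To show cooperative scheduling in practice, here is a minimal round-robin scheduler in plain Ruby. The `task` and `run_all` helpers, the `:done` marker and the task names are illustrative assumptions, not part of any library: each fiber does one step of work, yields, and goes back on the queue until its block returns `:done`.

```ruby
# Each task is a fiber that does one step of work per resume, then yields.
# When its work is finished, the block returns :done.
def task(name, steps, log)
  Fiber.new do
    steps.times do |i|
      log << "#{name}#{i}"
      Fiber.yield
    end
    :done
  end
end

# Round-robin scheduler: resume each fiber in turn and requeue it until it
# reports :done. Nothing runs unless the scheduler resumes it.
def run_all(tasks)
  queue = tasks.dup
  until queue.empty?
    fiber = queue.shift
    queue << fiber unless fiber.resume == :done
  end
end

log = []
run_all([task("a", 2, log), task("b", 3, log)])
p log # => ["a0", "b0", "a1", "b1", "b2"]
```

All of this runs on one thread; the interleaving comes entirely from where the fibers yield.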

Coroutines can be used to build the actor model of concurrency, the same model used by Erlang. Revactor is a very nice implementation of the actor model in Ruby.
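
Revactor's own API is not shown here. As a toy illustration of the core actor idea (state private to one unit of execution, changed only by the messages it receives), here is a fiber that keeps a running total and handles one message per resume. The `spawn_counter` name and the `[:add, n]` / `[:get]` message format are hypothetical; a real actor system adds mailboxes and a scheduler on top of this.

```ruby
# A toy "actor": its running total is private to the fiber and can only be
# changed by sending it a message. Each resume delivers one message and
# returns the fiber's reply. Commands other than :add leave the total as is.
def spawn_counter
  Fiber.new do |message|
    total = 0
    loop do
      command, amount = message
      total += amount if command == :add
      # Reply with the current total, then wait for the next message.
      message = Fiber.yield(total)
    end
  end
end

counter = spawn_counter
counter.resume([:add, 5])
counter.resume([:add, 3])
p counter.resume([:get]) # => 8
```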

I will add code here when time permits.





Remote inception – An Experience Report on an inception over phone

9 12 2009

Before I start, I would like to state that this article does not advocate for or against running an agile inception over phone. It is more of an experience report. Please feel free to post your comments.

Introduction

Inception is at the heart of a successful agile engagement. In an agile project we work with the client and not just for the client. Inception starts the process where the team, client and consultants, start thinking alike and working together.
It is so much easier to work with a colleague once you have synchronized your frequencies/wavelengths. Okay, enough blabber about wave theory.

This article is about my experience, learning and rant about remote inceptions. I intend to keep it more like a free flowing conversation.

Inception

Projects are set to fail if the initial understanding, and the basis for further development, is flawed. Some key questions immediately come to mind:

Does the client know what he wants?

Does he really need to build it, or can he buy something that already exists (commercial off-the-shelf software, or COTS)?

The answers to these and many other questions would become apparent in an inception.

Ideally inception is where we start by setting a vision for the project, break it down into achievable milestones and further down into playable stories. But all this mandates that you have the client right in front of you.

Most inception exercises require face-to-face interaction to make communication as clear as possible. It is necessary to use tools (simple and sophisticated) to make mental models explicit, elicit the requirements and clear up any doubts. Inception is a fun and effective way to interact with the client and to bring everyone on the team on board with the project’s goals.

Most inceptions should have the activities below on the agenda.

1. Team introductions – May seem simple, but simple activities like a small team game act as the all-important icebreaker.

2. Collaborative modeling sessions – As many or as few as the project requires. A good inception would have several of these sessions on project-specific topics, as well as general discussions on non-functional requirements.

3. Prioritization – Lay out the options in front of the client as story cards, and let the client move the cards to prioritize them. In some cases this exercise leads to a rough release plan.

4. Inception showcase

The above activities are a small subset of an inception. But these are the ones that bring out the facts most necessary for the project’s success. They are also the ones that require the most face-to-face interaction and team effort.

At the end of an inception the team must be able to decide if they should go ahead with the project.

Context

Let me now explain a little about the scenario we faced. Our clients had a very limited budget and could not afford travel expenses, either for them to travel to our location (India) or for us to travel to theirs (Chicago). It may seem sensible not to start the project until sufficient budget is available. But the client could not get more budget unless something was built, and built soon. So we had to do an inception with them over the phone, with a 12-hour time difference (sadly, the video conference equipment on their side was broken).

All this forced us to be more resourceful and improvise with what we had. The only way ahead was to address all the risks as best we could.

Managing risk

A remote inception is very risky business. The probability of success is quite small. Always communicate this to the customer and try and push for a face to face inception. Remember this is not for your benefit, but it is in the best interest of the customer. It is a good idea to maintain a shared risks log with the customer.

Below are some risks that we faced.

Risk: Our understanding of some features may not be completely correct

Mitigation:

  • The client was made aware that there might be minor misunderstandings despite our best efforts.
  • In our case the application functionality was closely tied to the UI, so we came up with early mockups that were as close as possible to what the client wanted. We let them edit the mockups and kept them for future reference.
  • Rather than plainly documenting technical understanding, we built very crude prototypes. Most of the time, code is the best documentation and communication mechanism.
  • The client was made aware that our initial estimates would be bumped up by a certain risk factor to accommodate any issues with understanding. It is better to promise less and deliver more.

Risk: 12-hour time lag. It was evident from the beginning that we would have to work outside our usual hours to give the inception enough time.

Mitigation:

We scheduled calls of 3 to 4 hours every day. A face-to-face inception can have a day-long agenda, but it is better to keep remote inception sessions shorter. Short 15-minute breaks were built in.

Inception Agenda

We made sure that all stakeholders were in a position to dedicate time to the inception. Instant messaging proved very useful. We also sent out links to tools like WebEx (a desktop sharing tool). Initially we were confident in our own superhuman ability to stay late at the office for longer conference calls. But a senior member of our team rightly pointed out the flaw and cut it down to an optimal 3-hour call. This suited us well: after three hours on the phone, it is extremely tiring to do any other productive work.

We prepared the agenda to ensure that we had time to cover all topics that we considered necessary. But it was not something that was set in stone. Some sessions finish ahead of time while others may reveal unknown areas, which require fresh slots to be included. We revised the agenda from time to time.

A typical day would start with a recap of the previous day’s meeting notes. This quick 15-minute exercise warmed up the team for the long call ahead. We also used this time to follow up on each other’s progress.

Communication and meeting notes

As with any normal inception, never go without a good scribe. We took turns at this role and noted down all the key points. Though it may have sounded silly at times, we paraphrased the client’s sentences to validate our understanding. The client was told at the very beginning of the inception that we might repeat some lines to confirm our understanding. In our case, one member of the customer’s team volunteered to take notes as well. At the end of the day we would share notes, and any differences in understanding would be resolved in the following day’s meeting. Once all differences were cleared, we would put the notes in a place where everyone had access.

Try to learn how each person sounds, so that you can associate a voice and/or accent with a person. Also suggest that your customer do the same. This helps a lot in keeping the conversation easy.

Sometimes you will not know when the person on the other end of the phone has lost interest in what you are saying. It is better to speak slowly and clearly. Face to face, it is very easy to detect when someone is losing interest. On the phone, one way to check is to include small questions as you speak. That way you know the person on the other side is still listening.

Tools

We used low-tech tools to simulate a virtual card wall where clients could move cards. You could use an online card wall for this. Screen sharing tools like WebEx are extremely important, and there are quite a few free tools available.

Start using a project management tool early in the cycle, and start adding stories to it as early as possible. A spreadsheet may be an easy way to start. We used Mingle for project management.

In Retrospect

If I had to do this all over again, I would still consider it extremely risky business. A few things that I might do differently are listed below.

  1. Get the video conference equipment worked out early. In our case the team was small, so this did not become a big issue, but I would strongly recommend a video conference for bigger teams.
  2. If there is not enough budget for the entire team to travel, try to have at least one representative from your side at the customer’s location. He could facilitate the activities.
  3. Capture the client’s mood over the period of the inception using tools like a Niko-niko Calendar.

Summary

Remote inceptions are tough, if not impossible. Try to avoid them as much as you can. But if you have to do one, know that you are not alone. In the end our goal is to help the customer, no matter what the constraints. Fortunately, in our case the exercise was a success and the customer was happy.





I hate ORM

9 12 2009

The title is not meant to start a war over the concept of ORM. I appreciate the effort that has gone into mappers. But let’s take a look at why I hate ORMs. (Don’t hate me because I hate ORM :) )

Prelude

I am beginning to wonder how many applications that we build really need a relational database.

Some terms become synonymous with their usage. For instance, Xerox has become synonymous with copiers.

Relational databases have almost become synonymous with databases. As a developer, or anyone involved in system design, it is very important to know the options available for storing data. The choice of persistence technology governs application scaling and performance in a very big way.

Now, why do I hate ORMs?

ORMs hide the inconvenience that comes with using RDBMS with object oriented code.

When I learned relational modeling, I really liked it, and I still like making relational models. But consider how long relational databases have been around: they were in use well before object-oriented programming became widespread. Back then code was procedural. The relationships between data had to exist somewhere, and it made sense to keep them in the persistent store. Querying became easier.

But it was rather hard to replace those persistent stores with other technologies when we moved to object-oriented code. The reasons were many: for example, the availability of skilled database developers, strong trust in RDBMSs and good vendor support. But the move towards newer languages like C++, Java and C# was inevitable. ORMs were a win-win solution to this problem.

Before ORMs, most of us used to write a mapping layer ourselves. ORM was such a relief when it hit the market. It set us free after years of wrangling with ugly mappers. But in the revelry we seem to have forgotten that it was the database that needed a second look, and not the codebase.

Now we have duplication of relationships: in the data as well as in the code. It is surprising that this duplication has not struck us as a problem.
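
To illustrate the duplication, here is a hypothetical hand-written mapper in plain Ruby. The relationship between an order and its line items is stated once in the data, as the `order_id` foreign key on each row, and again in code, as the `Order#items` collection; the mapper exists only to translate one into the other. The table, field and class names are made up for the example.

```ruby
# Rows as they might come back from a relational store (hypothetical schema):
# the relationship lives in the line_items.order_id foreign key.
ORDER_ROWS = [{ id: 10, customer: "Acme" }].freeze
ITEM_ROWS  = [
  { id: 1, order_id: 10, sku: "bolt" },
  { id: 2, order_id: 10, sku: "nut" }
].freeze

# The same relationship, declared a second time in the object model.
LineItem = Struct.new(:id, :sku)
Order    = Struct.new(:id, :customer, :items)

# The mapping layer: rebuilds object references from foreign keys...
def load_orders(order_rows, item_rows)
  items_by_order = item_rows.group_by { |row| row[:order_id] }
  order_rows.map do |row|
    items = items_by_order.fetch(row[:id], []).map { |r| LineItem.new(r[:id], r[:sku]) }
    Order.new(row[:id], row[:customer], items)
  end
end

# ...and flattens them back into rows, re-deriving order_id from the graph.
def dump_items(orders)
  orders.flat_map do |order|
    order.items.map { |item| { id: item.id, order_id: order.id, sku: item.sku } }
  end
end

orders = load_orders(ORDER_ROWS, ITEM_ROWS)
dump_items(orders) == ITEM_ROWS # => true
```

Every change to the relationship has to be made in both places, plus the code that keeps them in sync.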

Even frameworks like Rails give us the impression that the standard way to build a web application is with an RDBMS as the backend.

I simply cannot grasp the amount of effort we put into mapping objects to a schema. Another annoying issue is having to complete the database design before starting development. Using Hibernate or Active Record on top of an existing schema is nothing less than tying oneself up in knots.

There is no point in great object-oriented code if the system design is not appropriate. It is my humble opinion that ORM should not be used as an excuse to choose relational databases over other options. As always, use with discretion.

Let me know what you think.







