Skip to page content or Skip to Accesskey List.

Work

Main Page Content

Working With Hierarchical Data In Sql Using Ancestor Tables

Rated 3.89 (Ratings: 0)

Want more?

 

christian cadieux

Member info

User since: 06 Jun 2007

Articles written: 1

This document describe an approach to SQL queries of hierarchical data using ancestor tables. Multiple inheritance are allowed but cycles are not allowed for this approach to work.

This approach is most appropriate when recursive SQL is not available and the hierarchies are not too deep.

With ancestor tables, inserts and deletes are slower but selects are very fast and there is no need to update the whole tree with each insert.

We use this hierarchy of objects as an example to illustrate the problem:

A common way to model this hierarchy is with a Parent table:

ObjectID

ParentID

5

3

6

3

6

4

7

4

3

2

4

2

2

1


This table can keep the relationships between the objects but does not allow to search for all the ancestors of an object or all the children of a parent using single SQL queries.

Using an ancestor table instead of a parent table has the advantage of flattening the relationships between object and simplifying the queries:

ObjectID

AncestorID

hops

5

3

1

5

2

2

5

1

3

6

3

1

6

2

2

6

1

3

6

4

1

6

2

2

6

1

3

7

4

1

7

2

2

7

1

3

3

2

1

3

1

2

4

2

1

4

1

2

2

1

1


Modeling the same hierarchy requires more records when ancestors are explicitly recorded but it simplifies the queries. For example, to get all the children of object-1:

select distinct object_id from ancestor_table where ancestor_id = 1;

object_id

=======

5

6

7

3

4

2

In order to preserve the hierarchical order, a third column called 'hops' can be added to the table. This is useful when query results need to be in order of proximity of the object to the ancestor. Saving the number of hops information in the table also allows to recontruct the table from scratch by first deleting all the records with hops > 1 and then re-inserting all the ancestors. The records with hops=1 are equivalent to the parent table, hops> 1 are the ancestor records.

It is also important to have the ability to add and delete associations from this graph without having to re-write the complete tree to the database. This is the general algorithm for inserting and deleting an association in the graph:

To add an association between child1 and parent1, find all the parents of parent1 and attach them to each children of child1

Similarly, for deleting an association:

To remove an association between child1 and parent1, find all the parents of parent1 and remove them from each child of child1

For this solution to work, redundant associations need to be recorded in the database. In our example, there is two associations between object-6 and object-2, one using object-3 and one using object-4. They both need to be present in the ancestor table. This allows the delete function to delete only one of the two associations. In our example, even after removing the object-3 to object-2 association, object-6 can still reach object-2.

The insert and delete algorithms need to properly maintain the 'hops' value. Updates need to be converted into a delete and insert combination.

Here are the two algorithms in Perl:

sub insert_association {

my($child, $parent) = @_;

# give all group ancestors to all children of group.

my $rc = sql("select ancestor_id, hops from ancestor_table where object_id = $parent");

my $rc2 = sql("select group_id,hops from ancestor_table where ancestor_id = $child ");

foreach my $obj (@$rc2) {

foreach my $anc (@$rc) {

my $hops = $obj->[1] + $anc->[1] + 1; # adjust the hops

sql("insert into ancestor_table (object_id, ancestor_id, hops)

values ($obj->[0], $anc->[0], $hops )");

}

}

}

sub delete_association {

my($child, $parent) = @_;

# remove parent and it's ancestors from group and its children

my $rc = sql("select ancestor_id,hops from ancestor_table where object_id = $parent ");

my $rc2 = sql("select object_id,hops from ancestor_table where ancestor_id = $child");

foreach my $c (@$rc2) {

foreach my $p (@$rc) {

my $hops = $c->[1] + $p->[1] + 1;

my $rc3 = sql("select oid,hops from ancestor_table

where object_id = $c->[0] and ancestor_id = $p->[0] and hops = $hops");

# delete only one association, with the right hops.

# oid is a postgres generated sequence_id used to identify each association.

foreach my $o (@$rc3) {

sql("delete from ancestor_table where oid = $o->[0]");

last;

}

}

}

The access keys for this page are: ALT (Control on a Mac) plus:

evolt.org Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author, please do not redistribute or use elsewhere without checking with the author.