Functional Dependencies

I wrote a fun little program for computing the closure of a set of functional dependencies this week. Functional dependency theory is the foundation for Third Normal Form and Boyce-Codd Normal Form, and a variety of other neat things you can do with relation schemas. An interesting and powerful set of ideas, but I think that most database designers don’t really use functional dependency theory so much. But it is fun to play with!

Anyway, a lot of the problems are very simple in that they work with abstracted schemas, such as R(A, B, C, D, E), and if you take a set F of functional dependencies against the relation R, you would like to compute things like “What is the closure of F?” or “What is a canonical cover of F?” Then you can answer questions like “What are the candidate keys of R, given F?”

One of the problems on the homework for this material was to compute the closure of a set of functional dependencies, and there were almost 850 dependencies in the result. So a lot of the students wrote little programs to compute the results, instead of writing them out by hand. That prompted me to write one too, partly to check their answers, and partly to flaunt the L33T h4X0r sK1LLz. I wrote my program in C++, and packed the attribute-sets into int values so that it would be easy to take the union of two attribute-sets, or the intersection or the difference, and so forth. Most of my students used languages like Haskell or OCaml, which make this kind of problem pretty easy to solve.

Of course, their programs all took several minutes to complete, on a relation with five attributes. Mine took about a fifth of a second. Go C++! :-) My program starts to crawl when it gets up to 20+ attributes in the schema; at that point you have 2^38 different functional dependencies to wade through, so the answer will be slow in coming, regardless of the language.

It’s clear that there are some problems that are really well-suited to particular languages. And this is the kind of problem where there is definitely a lot of symbolic manipulation - you want to implement the various inference rules as easily and correctly as possible - but there’s also a lot of room for massive optimizations just by using a representation that’s streamlined for set operations. Not a linked list or some kind of tree.

Anyway, it was a fun little program to hack together. I suppose I should post it on my website sometime soon…

2 Responses to “Functional Dependencies”

  1. Fred Says:

    OCaml has a set data structure too, so I wonder if the students’ programs were just algorithmically inefficient or if it really has something to do with the language?

  2. donnie Says:

    Well Fred, you are correct on both counts. Yes, OCaml does have a set data structure, which the students actually used in their OCaml programs. And yes, the students’ programs were also algorithmically inefficient. :-)

    This is a bit of an apples-to-oranges comparison, I admit. I would expect my C++ program to be much slower if it relied only on STL sets. Those also use a tree to represent the values in the set, and it would be very inefficient to compute unions, differences, etc. So, the main difference here is simply in how you represent the set, based on the characteristics of the set, and based on what you will be doing with the set. For example, these sets will have no more than 26 elements! And, there will be a lot of set unions and differences. So, most of the general-purpose set data-types wouldn’t be a good choice.

    Similarly, I am sure that one could cook up an OCaml data type exactly like the one I made in C++: sets are ints, and each element in the set is a bit in the int. I imagine you could probably make a very efficient program in any language, as long as you have bitwise logical operations at your disposal.

    The other part of the problem is how one actually computes the set of functional dependencies. There are fast ways and slow ways, and in the set of slow ways there are some really slow ways, and then some not so slow ways. It may also be that the students didn’t have their programs as well-tuned as I did.

    So, maybe I should have entitled my post “Old Age and Treachery.” :-) This problem doesn’t come down specifically to the language that’s used; it’s more about recognizing what operations are used 95% of the time in the solution, then making those operations really really fast.

    Oh, and also I am fond of C++. :-)

Leave a Reply