Database Implementation Languages

I have been thinking for a long time about which language would be best for implementing a relational database system. Keep in mind that this is not meant for production use; I want to build it as an educational platform for the students in my database systems course, so that some of the grungier details are already nailed down for them. So a clean, modular architecture is the number one priority, and performance is a distant concern.

(This is also why I have generally ruled out open-source database systems as a teaching aid. Although most open-source databases are quite impressive in their capabilities and performance, they just don’t have the modularity and the simplicity that an effective teaching tool would need.)

I have been looking at a number of languages. C and C++ are often used for this kind of thing, but they pose two big problems: memory management and portability. Although it is certainly valuable for students to learn to debug bizarre segfaults, I would rather they focus on the database part. So I am reluctant to use those languages.

I have also thought about using a language like OCaml, but I don’t think it would be easy to make it do what I need. OCaml would be fantastic for the higher-level tasks (parsing SQL, representing and transforming execution plans, and the like), but reading and writing tuples to disk pages would take a lot more effort.

So, surprisingly, I am settling on Java. Not only does it sidestep most of the memory management concerns, but it also provides a really interesting feature: you can set the maximum heap size of the virtual machine, so that the database has to cope with reduced resources. I think this could be very helpful, since students could explore the strengths and weaknesses of different execution plans without needing enormous data sets to outgrow the large amount of memory that most computers have nowadays. Plus, Java provides platform-independent file syncing as well as threading and networking, so a student can work on Windows, Linux, or Mac OS X equally easily.
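To make the Java argument a little more concrete, here is a minimal sketch of the kind of low-level storage code students would otherwise struggle with. It reads and writes tuples to fixed-size disk pages through java.nio's FileChannel, uses FileChannel.force() as the platform-independent sync, and prints Runtime.maxMemory(), which reflects any heap cap set with the JVM's -Xmx option. The class name PageFileDemo, the 4 KB page size, and the page layout (a tuple count followed by pairs of int fields) are hypothetical choices for illustration, not a settled design.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: storing fixed-size tuples in fixed-size disk pages.
class PageFileDemo {
    // Assumed page size; real systems usually match the OS or disk block size.
    static final int PAGE_SIZE = 4096;

    // Assumed page layout: an int tuple count, then (int id, int value) pairs.
    static ByteBuffer buildPage(int[][] tuples) {
        ByteBuffer page = ByteBuffer.allocate(PAGE_SIZE); // zero-filled
        page.putInt(tuples.length);
        for (int[] t : tuples) {
            page.putInt(t[0]).putInt(t[1]);
        }
        page.rewind(); // write the whole page, not just the bytes filled so far
        return page;
    }

    // Writes a page at offset pageNo * PAGE_SIZE, then forces it to storage.
    static void writePage(FileChannel ch, int pageNo, ByteBuffer page) throws IOException {
        long pos = (long) pageNo * PAGE_SIZE;
        while (page.hasRemaining()) {
            pos += ch.write(page, pos);
        }
        ch.force(false); // false: only file content must be synced, not metadata
    }

    // Reads the page at offset pageNo * PAGE_SIZE into a fresh buffer.
    static ByteBuffer readPage(FileChannel ch, int pageNo) throws IOException {
        ByteBuffer page = ByteBuffer.allocate(PAGE_SIZE);
        long pos = (long) pageNo * PAGE_SIZE;
        while (page.hasRemaining()) {
            int n = ch.read(page, pos);
            if (n < 0) {
                throw new IOException("page " + pageNo + " is past end of file");
            }
            pos += n;
        }
        page.flip(); // prepare the buffer for reading from the start
        return page;
    }

    public static void main(String[] args) throws IOException {
        // Reflects the -Xmx setting if one was given, else the JVM default.
        System.out.println("max heap bytes: " + Runtime.getRuntime().maxMemory());

        Path file = Files.createTempFile("pages", ".db");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            writePage(ch, 0, buildPage(new int[][] {{1, 100}, {2, 200}}));
            writePage(ch, 1, buildPage(new int[][] {{3, 300}}));

            // Decode the tuples stored on page 1.
            ByteBuffer in = readPage(ch, 1);
            int count = in.getInt();
            for (int i = 0; i < count; i++) {
                System.out.println("id=" + in.getInt() + " value=" + in.getInt());
            }
        } finally {
            Files.deleteIfExists(file); // the channel is already closed here
        }
    }
}
```

Running it with something like java -Xmx64m PageFileDemo.java (single-file launch needs Java 11 or later) prints the heap ceiling and then id=3 value=300. A real storage layer would need page headers, free-space tracking, and variable-length tuples, but even this toy version does its byte-level work without any manual memory management.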

Anyway, I will have to write a much longer note about this when I get the chance, but this is a summary of my thoughts so far!

3 Responses to “Database Implementation Languages”

  1. Ben Says:

    Do I understand correctly that you are going to write your own database to teach a database class? Wow, that sounds hardcore. How many units is the class?

  2. donnie Says:

    Yes, I want to implement a database to use in a course about implementing databases. The course on relational database theory and usage is nearly complete by now, and a natural second half to it would be relational database implementation.

    There are one or two such systems floating around out there; the most interesting one is Minibase. But I want to avoid C and C++ if I can.

    The idea is to model this course on CS134: although the students are supposed to write a compiler, if they happen to blow any of the assignments, they can complete subsequent assignments using code that Jason provides them. Same idea here. I don’t want students getting into trouble because they can’t figure out how to implement a query optimizer or a write-ahead log.

    I don’t quite know how many units yet. This will probably be offered next year, so I have some time to work on all the details.

  3. Ben Says:

    Very cool.
