NanoDB

The summer break is finally here, and I for one am very excited about this summer. This year I have a SURF student helping me out with the implementation of my “educational database system” in Java. The working name is currently NanoDB, although I am certainly open to other suggestions.

The database is going to be pretty simple, but even simple databases are pretty complex pieces of software. NanoDB will initially support only a handful of data types, and it will lack most of the sophisticated schema-management features that modern databases typically offer. Also, when we have an option to make something simple vs. making it fast, we’ll probably choose to make it simple, since that’s what will help students understand what’s going on.

Today I finished up the core file-storage functionality, which should allow us to start storing tuples in data files, traversing them (i.e. simple SELECT statements), and even deleting tuples! I am using a very simple version of the slotted-page file format, partly to keep it simple for students, and partly because I don’t want my implementation to be too good!

One of the things that students always seem to enjoy is a competition. So, at the end of the term I want to run a “Database Olympics,” where students can run their databases against a large set of queries and other operations. The fastest databases will win… something? (Probably not better chances with the ladies…) Because of this, the base implementation I give them should not be the fastest possible implementation.

When implementing a database storage format, it is easy to think only about SELECT performance, without thinking about the performance and cost of INSERT or DELETE operations. To improve INSERT operations, database files frequently store a linked-list of “non-full pages,” so that a page with enough free space can be found with only a few disk accesses. Correspondingly, when DELETE operations are performed, this list of non-full pages needs to be updated properly so that the DB can easily find the space that just opened up.

My implementation doesn’t do any of that. So, it will be slow for lots of INSERT operations, or a mix of INSERT and DELETE operations. Not to mention that it will probably waste quite a bit of space on the disk as well.

But, it will be a good opportunity for students who want to dig into file-storage optimizations. And a few relatively simple adjustments should produce a database with much faster INSERT and DELETE performance, not to mention a smaller data-file footprint.

I can’t believe there are people who think this stuff is boring!

Leave a Reply