Inspired by The Third Manifesto, a book by Chris Date and Hugh Darwen, we're putting forward an implementation of a truly relational language using Python. We will address two problems:
- The impedance mismatch between programming languages and databases
- The weakness and syntactic awkwardness of SQL
Mind The Gap
Most of today's programs handle data in one way or another and often this data is stored in some kind of relational database. To read and modify this data, a program must bridge the gap between its representation and the one used by the dialect of SQL that the database provides. This bridge typically comprises a database API that sends queries as text strings, often accompanied by some kind of table-to-object mapper that has to coerce data and relationships in both directions, usually with elaborate layers of abstraction in an effort to keep the two sides loosely coupled.
"Yet by obscuring the true data source these solutions end up throwing away the most compelling feature of relational databases; the ability for the data to be queried."
—Microsoft, DLinq .NET Language-Integrated Query for Relational Data, May 2006
This approach not only adds complexity and increases the need for data transformations but, most importantly, it destroys the significant advantages provided by the relational model of data. The relational model is built upon predicate logic which brings the power of formal reasoning to data: it is the only sound foundation available.
Enough of the Shenanigans!
A number of approaches and frameworks have been proposed to span the gap between the two systems; most never question why there are two systems in the first place. Microsoft's forthcoming LINQ to SQL (formerly DLinq) is a major attempt to bring SQL closer into the program than before, but will still keep the database sub-language and all that it entails.
"It is no wonder that applications expected to bridge this gap are difficult to build and maintain. It would certainly simplify the equation to get rid of one side or the other. Yet relational databases provide critical infrastructure for long-term storage and query processing, and modern programming languages are indispensable for agile development and rich computation."
—Microsoft, DLinq .NET Language-Integrated Query for Relational Data, May 2006
The solution to the problem is not to get rid of one side or the other, nor to have one side overlap the other, but to merge the two sides into one: supersede SQL (the COBOL of database languages) with a true relational programming language, one that is computationally complete, and then the gap disappears. Our solution uses one of the most effective, expressive and readable languages available, Python, and extends it with relations and a sound relational algebra.
"It was Codd's very great insight that a database could be thought of as a set of relations, that a relation in turn could be thought of as a set of propositions (assumed by convention to be true), and hence that all of the apparatus of formal logic could be directly applied to the problem of database access and related problems."
C. J. Date, The Database Relational Model, Addison-Wesley, April 2000