Dee

makes Python relational

Author: Greg Gaughan
Copyright: Copyright (C) 2007 Greg Gaughan
Licence: GPL (see Licence.txt for details)
Contact: feedback@quicksort.co.uk
Date: 22/05/2007

Introduction

Inspired by Date and Darwen's Databases, Types and the Relational Model (The Third Manifesto), we're putting forward an implementation of a truly relational language using Python. We will address two problems:

  1. The impedance mismatch between programming languages and databases
  2. The weakness and syntactic awkwardness of SQL

Mind the Gap

Most of today's programs handle data in one way or another and often this data is stored in some kind of relational database. To read and modify this data, a program must bridge the gap between its representation and the one used by the dialect of SQL that the database provides. This bridge typically comprises a database API that sends queries as text strings, often accompanied by some kind of table-to-object mapper that has to coerce data and relationships in both directions, usually with elaborate layers of abstraction in an effort to keep the two sides loosely coupled.

"Yet by obscuring the true data source these solutions end up throwing away the most compelling feature of relational databases; the ability for the data to be queried."

—Microsoft, DLinq .NET Language-Integrated Query for Relational Data, May 2006

This approach not only adds complexity and increases the need for data transformations but, most importantly, it destroys the significant advantages provided by the relational model of data. The relational model is built upon predicate logic which brings the power of formal reasoning to data: it is the only sound foundation available.

Enough of the Shenanigans!

A number of approaches and frameworks have been proposed to span the gap between the two systems; most never question why there are two systems in the first place.

Microsoft's forthcoming LINQ to SQL (formerly DLinq) is a major attempt to bring SQL closer into the program than before, but will still keep the database sub-language and all that it entails.

"It is no wonder that applications expected to bridge this gap are difficult to build and maintain. It would certainly simplify the equation to get rid of one side or the other. Yet relational databases provide critical infrastructure for long-term storage and query processing, and modern programming languages are indispensable for agile development and rich computation."

—Microsoft, DLinq .NET Language-Integrated Query for Relational Data, May 2006

The solution to the problem is not to get rid of one side or the other, nor to have one side overlap the other, but to merge the two sides into one: supersede SQL (the COBOL of database languages) with a true relational programming language, one that is computationally complete, and then the gap disappears. Our solution uses one of the most effective, expressive and readable languages available, Python, and extends it with relations and a sound relational algebra.

A Bit of History

Since its inception in 1969 by E. F. Codd, the relational model has been the foundation for nearly all databases. It replaced earlier network and hierarchical ad-hoc approaches to data storage by being as simple as it needed to be, but no simpler. It was so powerful it allowed users to ask for what they wanted to find, rather than specify how they might find it.

Over the decades, SQL has become the de facto language for relational databases, but SQL misses many of the benefits of relational technology. In recent years, partly due to SQL's weaknesses and partly due to minimalistic and stagnant implementations, the database has become merely a storage engine fronted by layers of drivers, mappers, hierarchical markups and frameworks which make flexible querying both complex and distant from the application code.

Where We're Coming From

Having implemented a comprehensive, standards-compliant SQL server, ThinkSQL, we did some further research into the history of SQL's dominance in the marketplace and its quirky syntax. We found a far superior alternative in the form of D [1], a generic name for any relational language that conforms to The Third Manifesto. We’ve implemented such a language, Dee, as an extension to Python.

The relational algebra and most of the ideas underlying Dee come from Date and Darwen's Databases, Types and the Relational Model (The Third Manifesto). An introduction to the ideas behind it can be found in Databases in Depth, and many related links and reference materials are on The Third Manifesto website.

The current version of Dee is an initial release to gain feedback regarding the approach. We chose Python because its interpreted style, dynamic typing and built-in sets and dictionaries make it ideal for interacting with data; plus any language that allows you to do the following sorts of things has got to be good:

>>> x, y = 45, 90
>>> print x, y
45 90
>>> x, y = y, x         #swapping values without the usual temporary variable!
>>> print x, y
90 45

>>> 70 < x < 120
True

See Why Use Python? for more information on the advantages of the language. A guide to the Python language can be found in An Introduction to Python. The rest of this document assumes you are familiar with Python.

Where We're Going

The current release is an initial proposal, intended to encourage feedback. We have many ideas for future versions to make it more deployable. See the Future Work section below for more details.

Basics

To start using Dee from within the Python interpreter or from a Python program, first import the module. (For demonstration purposes we import everything here, but we recommend importing only the features you need.)

>>> from Dee import *

Tuples

A Tuple is a set of attribute/value pairs. A Tuple can be represented by a Python dictionary, e.g.

>>> print {"StudentId":'S1', "Name":'Anne'}
{'StudentId': 'S1', 'Name': 'Anne'}

and the attributes and values can be extracted using the standard Python syntax, e.g.

>>> t1 = {"StudentId":'S1', "Name":'Anne'}
>>> t1["StudentId"]
'S1'

>>> "Name" in t1
True

>>> t1.keys()
['StudentId', 'Name']

A more powerful way is to use the Tuple class which allows a slightly simpler syntax for denoting attribute values. To specify a Tuple:

>>> t1 = Tuple(StudentId='S1', Name='Anne')

and then the attribute values can be extracted in the same way as with a Python dictionary, but also using dot notation without the quotes, e.g.

>>> t1["Name"]
'Anne'

>>> t1.Name
'Anne'

The Tuple class also provides a number of useful methods, such as project and remove, for manipulating relational tuples.

Attribute values are dynamically typed in the usual Python way, but they must be of the same type for every tuple in a given relation. Currently, the types can be anything that can be pickled.
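The combined dictionary and dot access described above can be pictured with a small stand-in class. This is only a sketch in plain Python and is not Dee's Tuple implementation: the class name AttrTuple is hypothetical, and its project and remove methods merely mimic the behaviour the text attributes to Tuple.

```python
class AttrTuple(dict):
    """Hypothetical stand-in for Dee's Tuple: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def project(self, names):
        # Keep only the named attributes.
        return AttrTuple((k, v) for k, v in self.items() if k in names)

    def remove(self, names):
        # Drop the named attributes and keep the rest.
        return AttrTuple((k, v) for k, v in self.items() if k not in names)


t1 = AttrTuple(StudentId='S1', Name='Anne')
assert t1["Name"] == t1.Name == 'Anne'
assert t1.project(['Name']) == {'Name': 'Anne'}
assert t1.remove(['Name']) == {'StudentId': 'S1'}
```

Subclassing dict keeps the standard dictionary syntax working unchanged, which is why both access styles can coexist.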

Relations

A Relation comprises a heading and a body. The heading is a set of attribute name/type pairs. The body is a set of tuples. Each tuple in the body comprises a value for every attribute in the heading. To specify a relation literal, pass the heading as a list of attribute names followed by the body as a list of tuple literals, e.g.:

>>> print Relation(["StudentId", "Name"],
...               [{"StudentId":'S1', "Name":'Anne'},
...                {"StudentId":'S2', "Name":'Boris'},
...                {"StudentId":'S3', "Name":'Cindy'},
...                {"StudentId":'S4', "Name":'Devinder'},
...                {"StudentId":'S5', "Name":'Boris'},
...               ])
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+

Note:

  • there is no order to the heading attributes (they are a set)
  • nor is there any order to the tuples in the body (they are a set)
  • there is no duplication in the heading attribute names (they are a set)
  • nor is there any duplication in the tuples in the body (they are a set)

Also, we will try to use the term relation variable when we mean a variable that refers to a Relation, and just relation (or relation value) to mean the value of the relation. This is an important distinction. The value of a relation never changes, just like the value 5 never changes.
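The set semantics in the notes above, and the difference between a relation value and a relation variable, can be illustrated with Python's own frozenset. This is a sketch in plain Python that does not use Dee; the helper name row and the variable names are hypothetical.

```python
def row(**attrs):
    # A hashable tuple: a frozenset of (attribute, value) pairs, so attribute order is irrelevant.
    return frozenset(attrs.items())


value = frozenset([
    row(StudentId='S1', Name='Anne'),
    row(Name='Anne', StudentId='S1'),    # the same tuple, attributes written in another order
    row(StudentId='S2', Name='Boris'),
])
assert len(value) == 2                   # the duplicate tuple collapsed

variable = value                         # a variable currently holding that value
variable = variable | {row(StudentId='S3', Name='Cindy')}   # rebinds the variable to a NEW value
assert len(value) == 2                   # the original value itself is unchanged
assert len(variable) == 3
```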

To assign a relation value to a relation variable, use the standard Python syntax, e.g.

>>> IS_CALLED = Relation(["StudentId", "Name"],
...                     [{"StudentId":'S1', "Name":'Anne'},
...                      {"StudentId":'S2', "Name":'Boris'},
...                      {"StudentId":'S3', "Name":'Cindy'},
...                      {"StudentId":'S4', "Name":'Devinder'},
...                      {"StudentId":'S5', "Name":'Boris'},
...                     ])

An alternative way to define a relation is to use the Tuple class to define the body:

>>> IS_CALLED = Relation(["StudentId", "Name"],
...                     [Tuple(StudentId='S1', Name='Anne'),
...                      Tuple(StudentId='S2', Name='Boris'),
...                      Tuple(StudentId='S3', Name='Cindy'),
...                      Tuple(StudentId='S4', Name='Devinder'),
...                      Tuple(StudentId='S5', Name='Boris'),
...                     ])

or alternatively, a more concise option is available which relies on the order of the body attributes matching the order of the heading:

>>> IS_CALLED = Relation(["StudentId", "Name"],
...                     [('S1', 'Anne'),
...                      ('S2', 'Boris'),
...                      ('S3', 'Cindy'),
...                      ('S4', 'Devinder'),
...                      ('S5', 'Boris'),
...                     ])

(Note that Python allows an additional comma after the last item in a list, which can simplify copy/paste operations. Also, a Python tuple with a single value must have a comma after the value to distinguish it from a value in parentheses, e.g. (7,) rather than (7).)

There are a number of ways to display a relation:

  1. Print it as a string (i.e. using its __str__ method), e.g.
>>> print IS_CALLED
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
  2. Print a literal representation (one of possibly many variations) (i.e. using its __repr__ method), e.g.
>>> print `IS_CALLED`           #or just: >>> IS_CALLED
Relation(('StudentId', 'Name'),
[Tuple(StudentId='S1', Name='Anne'), Tuple(StudentId='S2', Name='Boris'), Tuple(StudentId='S3', Name='Cindy'), Tuple(StudentId='S4', Name='Devinder'), Tuple(StudentId='S5', Name='Boris')],
{'PK':(Key, None)})

Note: this literal can itself be evaluated using Python's eval() function to retrieve the relation's value, e.g.

>>> print eval(`IS_CALLED`)
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
>>> r2=eval(`IS_CALLED`)
>>> print r2
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
  3. Print it rendered as an HTML table, e.g.
>>> print IS_CALLED.renderHTML()
<table><thead><th><em>StudentId</em></th><th><em>Name</em></th></thead><tbody><tr><td>S1</td><td>Anne</td></tr><tr><td>S2</td><td>Boris</td></tr><tr><td>S3</td><td>Cindy</td></tr><tr><td>S4</td><td>Devinder</td></tr><tr><td>S5</td><td>Boris</td></tr></tbody></table>

Which in a browser becomes:

StudentId  Name
S1         Anne
S2         Boris
S3         Cindy
S4         Devinder
S5         Boris

The heading of a relation can be retrieved via its heading method, which returns the attribute names as a Python set, e.g.

>>> print IS_CALLED.heading()
set(['StudentId', 'Name'])

The Interpretation of a Relation

Given a relation such as the one denoted by IS_CALLED above, we should take the meaning of it to be as follows:

  • The heading supplies the parameters for the predicate, e.g. StudentId and Name are the parameters for the IS_CALLED predicate.
  • The tuple Tuple(StudentId='S3', Name='Cindy') is an instantiation of that predicate. It is a proposition where the argument values 'S3' and 'Cindy' are substituted for the parameters. This states that student S3 is called Cindy.
  • Each tuple in the relation is a true instantiation.
  • Any tuple not in the relation is a false instantiation.
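This reading of a relation as the set of true instantiations of a predicate can be sketched in plain Python, without Dee. The names IS_CALLED_BODY and is_called below are hypothetical, and tuples are written positionally for brevity.

```python
# The body lists every true instantiation of "student StudentId is called Name".
IS_CALLED_BODY = {
    ('S1', 'Anne'),
    ('S2', 'Boris'),
    ('S3', 'Cindy'),
    ('S4', 'Devinder'),
    ('S5', 'Boris'),
}


def is_called(StudentId, Name):
    # The heading supplies the parameters; membership in the body decides truth.
    return (StudentId, Name) in IS_CALLED_BODY


assert is_called('S3', 'Cindy')       # in the relation, so a true proposition
assert not is_called('S3', 'Bob')     # not in the relation, so a false proposition
```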

Function-based Relations

Instead of fixing the value of a relation variable when it is assigned, we can have it refer to a function that provides the relation. The function can then return different values at different times. One important kind of function-based relation variable is a virtual (or derived) relation variable, which refers to a function that returns a relational expression. All other relation variables are base relation variables. To specify a virtual relation variable, we first define a function that provides the data by returning a relational expression. For example (ignore the relational expression syntax for now; we cover it in detail later):

>>> def vIS_CALLED_caps():
...     return IS_CALLED.extend(['NameCaps'], lambda t: {'NameCaps': t.Name.upper()}).remove(['Name'])

Then pass the heading as a list of attribute names followed by the body as a function reference, e.g.

>>> IS_CALLED_caps = Relation(["StudentId", "NameCaps"], vIS_CALLED_caps)
>>> print IS_CALLED_caps
+-----------+----------+
| StudentId | NameCaps |
+===========+==========+
| S1        | ANNE     |
| S2        | BORIS    |
| S3        | CINDY    |
| S4        | DEVINDER |
| S5        | BORIS    |
+-----------+----------+

Such virtual relation variables' values will then vary as the underlying base relation variables vary. These virtual relation variables are called views in SQL.
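The reason a virtual relation variable tracks its base variables is simply that the function is re-evaluated each time its value is needed. The following plain-Python sketch shows that mechanism only; it does not use Dee, and the names base and caps_view are hypothetical.

```python
base = {('S1', 'Anne'), ('S2', 'Boris')}     # stands in for a base relation variable


def caps_view():
    # Evaluated on every call, so it always reflects the current value of `base`.
    return {(student_id, name.upper()) for student_id, name in base}


before = caps_view()
base = base | {('S3', 'Cindy')}              # assign a new value to the base variable
after = caps_view()

assert before == {('S1', 'ANNE'), ('S2', 'BORIS')}
assert ('S3', 'CINDY') in after              # the derived value followed the base variable
```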

Relation-Valued Attributes

An attribute value can itself be a relation. Such attributes are known as relation-valued attributes or RVAs. A number of relational operators (actually macros) use such nested relations. For example, GROUP takes a relation, a set of attribute names and a new attribute name, and returns a relation in which the named attributes are gathered into a nested relation under the new name, with one tuple per distinct value of the non-grouped attributes:

>>> print GROUP(IS_CALLED, ['StudentId'], 'StudentIds')
+----------+---------------+
| Name     | StudentIds    |
+==========+===============+
| Anne     | +-----------+ |
|          | | StudentId | |
|          | +===========+ |
|          | | S1        | |
|          | +-----------+ |
| Boris    | +-----------+ |
|          | | StudentId | |
|          | +===========+ |
|          | | S2        | |
|          | | S5        | |
|          | +-----------+ |
| Cindy    | +-----------+ |
|          | | StudentId | |
|          | +===========+ |
|          | | S3        | |
|          | +-----------+ |
| Devinder | +-----------+ |
|          | | StudentId | |
|          | +===========+ |
|          | | S4        | |
|          | +-----------+ |
+----------+---------------+
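To make the grouping rule concrete, here is a sketch of the same operation in plain Python. It is not Dee's GROUP macro: the function name group is hypothetical, and a relation body is modelled as a set of frozensets of (attribute, value) pairs.

```python
def group(body, grouped, new_name):
    """Nest the `grouped` attributes into a relation-valued attribute `new_name`."""
    nested = {}
    for t in body:
        rest = frozenset((k, v) for k, v in t if k not in grouped)
        inner = frozenset((k, v) for k, v in t if k in grouped)
        nested.setdefault(rest, set()).add(inner)
    # One result tuple per distinct value of the non-grouped attributes.
    return {rest | {(new_name, frozenset(inners))} for rest, inners in nested.items()}


is_called = {frozenset([('StudentId', s), ('Name', n)])
             for s, n in [('S1', 'Anne'), ('S2', 'Boris'), ('S3', 'Cindy'),
                          ('S4', 'Devinder'), ('S5', 'Boris')]}

result = group(is_called, ['StudentId'], 'StudentIds')
by_name = {dict(t)['Name']: dict(t)['StudentIds'] for t in result}

assert len(result) == 4                       # Anne, Boris, Cindy, Devinder
assert by_name['Boris'] == {frozenset([('StudentId', 'S2')]),
                            frozenset([('StudentId', 'S5')])}
```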

Predefined Relations

There are two interesting relations that are useful for defining some fundamental relational operators in Dee. We introduce them here.

DUM

This is the relation that has no attributes and no tuples. It plays the role of False. It is difficult to display:

>>> print DUM
+
|
+
+

>>> print DUM.renderHTML()
<table><thead></thead><tbody></tbody></table>

It is also called TABLE_DUM and FALSE.

DEE

This is the relation that has no attributes and a single tuple. It plays the role of True. It is difficult to display:

>>> print DEE
+
|
+
|
+

>>> print DEE.renderHTML()
<table><thead></thead><tbody><tr></tr></tbody></table>

It is also called TABLE_DEE and TRUE.

Relation Constraints

A Relation (function-based or not) can also take an extra parameter in its constructor to specify a set of constraints. This takes the form of a Python dictionary where each key gives the constraint name and each value is a (constraint-function, parameters) pair. For example, to specify that the "StudentId" attribute is a candidate key for the above relation we could say:

>>> IS_CALLED = Relation(["StudentId", "Name"],
...                     [('S1', 'Anne'),
...                      ('S2', 'Boris'),
...                      ('S3', 'Cindy'),
...                      ('S4', 'Devinder'),
...                      ('S5', 'Boris'),
...                     ],
...                     {'PK':(Key, ["StudentId"])}
...                    )
>>> print IS_CALLED
+-----------+----------+
| StudentId | Name     |
+===========+----------+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+

Here, Key is a pre-defined constraint type (actually a function wrapper that creates a function) that takes a list of attributes to enforce the constraint. A constraint function can return True or False and is called whenever the relation is assigned a new value. If no candidate key is specified for a relation, one is assumed comprising all the attributes in the relation (this is displayed in representations as {'PK':(Key, None)}). As another example:

>>> COURSE = Relation(["CourseId", "Title"],
...                  [('C1', 'Database'),
...                   ('C2', 'HCI'),
...                   ('C3', 'Op Systems'),
...                   ('C4', 'Programming'),
...                  ],
...                  {'PK':(Key, ["CourseId"])}
...                 )

Another pre-defined constraint (function wrapper) is ForeignKey. It takes a relation name and a mapping of foreign key attributes to candidate key attributes as parameters, e.g.:

>>> IS_ENROLLED_ON = Relation(["StudentId", "CourseId"],
...                         [('S1', 'C1'),
...                          ('S1', 'C2'),
...                          ('S2', 'C1'),
...                          ('S3', 'C3'),
...                          ('S4', 'C1'),
...                         ],
...                         {'FKS':(ForeignKey, ('IS_CALLED', {"StudentId":"StudentId"})),
...                          'FKC':(ForeignKey, ('COURSE', {"CourseId":"CourseId"}))}
...                        )

Here, two foreign keys are declared to ensure referential integrity between this relation and the relations referred to by IS_CALLED and COURSE.
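What the two kinds of constraint check can be sketched in plain Python. This is an illustration of the rules only, not the implementation of Dee's Key or ForeignKey: the function names key_holds and foreign_key_holds are hypothetical, and relations are modelled as lists of dictionaries.

```python
def key_holds(rows, key_attrs):
    """True if no two rows agree on all of key_attrs."""
    seen = set()
    for r in rows:
        k = tuple(r[a] for a in key_attrs)
        if k in seen:
            return False
        seen.add(k)
    return True


def foreign_key_holds(rows, target_rows, mapping):
    """True if every row's foreign key value appears in target_rows.

    mapping is {foreign key attribute: candidate key attribute}.
    """
    fk_attrs = sorted(mapping)
    targets = {tuple(t[mapping[a]] for a in fk_attrs) for t in target_rows}
    return all(tuple(r[a] for a in fk_attrs) in targets for r in rows)


students = [{'StudentId': 'S1', 'Name': 'Anne'}, {'StudentId': 'S2', 'Name': 'Boris'}]
enrolled = [{'StudentId': 'S1', 'CourseId': 'C1'}, {'StudentId': 'S2', 'CourseId': 'C1'}]
orphan = [{'StudentId': 'S9', 'CourseId': 'C1'}]

assert key_holds(students, ['StudentId'])
assert not key_holds(enrolled, ['CourseId'])          # C1 appears twice
assert foreign_key_holds(enrolled, students, {'StudentId': 'StudentId'})
assert not foreign_key_holds(orphan, students, {'StudentId': 'StudentId'})
```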

Lambda

In a number of places we need to pass expressions, e.g. restrictions (where clauses). Python has a built-in way of defining such expressions with anonymous functions using the lambda keyword. So an example restriction for the above IS_CALLED relation could be:

>>> print IS_CALLED.where(lambda t: t.Name == 'Boris')
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S5        | Boris |
+-----------+-------+

In this example, the lambda expression is passed to the relation's where function and the expression introduces a range variable, t, which will stand for each Tuple in the relation. The expression itself, the part after the colon, tests whether the Name attribute of each tuple is equal to 'Boris': if it is then the tuple is included in the result. Any Python expression can be passed this way. So here, complex boolean expressions including boolean operators and function calls can be built, e.g.

>>> print IS_CALLED.where(lambda t: t.Name.startswith('B') and t.StudentId.endswith('5'))
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S5        | Boris |
+-----------+-------+

>>> print IS_CALLED.where(lambda t: 'A' < t.Name[0] < 'D')
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S3        | Cindy |
| S5        | Boris |
+-----------+-------+

>>> print IS_CALLED.where(lambda t: t["Name"].startswith('B'))
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S5        | Boris |
+-----------+-------+

Of course, simple boolean expressions can also be used, e.g.

>>> print IS_CALLED.where(lambda t: True)
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+

>>> print IS_CALLED.where(lambda t: False)
+-----------+------+
| StudentId | Name |
+===========+======+
+-----------+------+

It's perhaps worth noting that the where function is really just shorthand for a natural join. Take the first example:

>>> print IS_CALLED.where(lambda t: t.Name == 'Boris')
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S5        | Boris |
+-----------+-------+

This relational calculus based where clause can be rephrased using the relational algebra's AND operator (in this case acting as the natural join):

>>> print IS_CALLED & Relation(["Name"], [('Boris',)])
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S5        | Boris |
+-----------+-------+

Many of the relational methods provided are in fact macros implemented using only a few fundamental relational operators, such as AND.

Another place lambda expressions can be used is when defining virtual relation variables. For example, the earlier definition:

>>> def vIS_CALLED_caps():
...     return IS_CALLED.extend(['NameCaps'], lambda t: {'NameCaps': t.Name.upper()}).remove(['Name'])
>>> IS_CALLED_caps = Relation(["StudentId", "NameCaps"], vIS_CALLED_caps)
>>> print IS_CALLED_caps
+-----------+----------+
| StudentId | NameCaps |
+===========+==========+
| S1        | ANNE     |
| S2        | BORIS    |
| S3        | CINDY    |
| S4        | DEVINDER |
| S5        | BORIS    |
+-----------+----------+

could be re-coded more concisely using lambda as:

>>> IS_CALLED_caps = Relation(["StudentId", "NameCaps"],
...                            lambda: IS_CALLED.extend(["NameCaps"], lambda t: {
...                                                      "NameCaps": t.Name.upper()}).remove(["Name"]))
>>> print IS_CALLED_caps
+-----------+----------+
| StudentId | NameCaps |
+===========+==========+
| S1        | ANNE     |
| S2        | BORIS    |
| S3        | CINDY    |
| S4        | DEVINDER |
| S5        | BORIS    |
+-----------+----------+

Lambda expressions can also be used to express general constraints on relations. For this there is another pre-defined constraint, Constraint, which takes a function that must evaluate to True for the constraint to hold, e.g.:

>>> EXAM_MARK = Relation(["StudentId", "CourseId", "Mark"],
...                     [('S1', 'C1', 85),
...                      ('S1', 'C2', 49),
...                      ('S2', 'C1', 49),
...                      ('S3', 'C3', 66),
...                      ('S4', 'C1', 93),
...                     ],
...                     {'PK':(Key, ["StudentId", "CourseId"]),
...                      'MarkRange': (Constraint, lambda r: ALL(r, lambda t: 0 <= t.Mark <= 100))}
...                    )

Here, the 'MarkRange' Constraint uses the ALL relational operator (discussed below) to ensure that all Marks in this relation are between 0 and 100. Note the Constraint works at the relation level and its range variable is r in the example. Useful operators at this level are ALL, ANY, IS_EMPTY, and the relational comparison operators discussed below, because they all take relations and return a boolean result.
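The shape of such a relation-level check can be sketched with Python's built-in all and any. The helpers below are hypothetical plain-Python stand-ins (all_rows, any_rows, is_empty), not Dee's ALL, ANY and IS_EMPTY, and the relation is modelled as a list of dictionaries.

```python
def all_rows(rows, pred):
    # True if pred holds for every tuple (vacuously True for an empty relation).
    return all(pred(t) for t in rows)


def any_rows(rows, pred):
    # True if pred holds for at least one tuple.
    return any(pred(t) for t in rows)


def is_empty(rows):
    return len(rows) == 0


marks = [
    {'StudentId': 'S1', 'CourseId': 'C1', 'Mark': 85},
    {'StudentId': 'S1', 'CourseId': 'C2', 'Mark': 49},
    {'StudentId': 'S4', 'CourseId': 'C1', 'Mark': 93},
]

# The constraint takes the whole relation (r) and returns a boolean.
mark_range = lambda r: all_rows(r, lambda t: 0 <= t['Mark'] <= 100)

assert mark_range(marks)
assert not mark_range(marks + [{'StudentId': 'S5', 'CourseId': 'C1', 'Mark': 101}])
assert any_rows(marks, lambda t: t['Mark'] < 50)
assert is_empty([t for t in marks if t['Mark'] > 100])
```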

Relations to Tuples

Here are some conversion functions to map between relations and tuples:

fromTuple

This static method returns a relation from a tuple:

>>> r1 = Relation.fromTuple({'CourseId':'C1', 'Title':'Database'})
>>> print r1
+----------+----------+
| CourseId | Title    |
+==========+==========+
| C1       | Database |
+----------+----------+

It can also take an extra parameter to specify a set of constraints:

>>> r1 = Relation.fromTuple({'CourseId':'C1', 'Title':'Database'}, {'PK':(Key, ['CourseId'])})
>>> print r1
+----------+----------+
| CourseId | Title    |
+==========+----------+
| C1       | Database |
+----------+----------+

toTuple

This applies only to a relation containing exactly one tuple, and returns that tuple:

>>> t1 = r1.toTuple()
>>> print t1
Tuple(CourseId='C1', Title='Database')
>>> print t1.Title
Database

fromTupleList

This static method returns a relation from a list of tuples:

>>> r2 = Relation.fromTupleList([{'CourseId':'C1', 'Title':'Database'},
...                              {'CourseId':'C4', 'Title':'Programming'},
...                              {'CourseId':'C3', 'Title':'Op Systems'},
...                              {'CourseId':'C2', 'Title':'HCI'}])
>>> print r2
+----------+-------------+
| CourseId | Title       |
+==========+=============+
| C1       | Database    |
| C4       | Programming |
| C3       | Op Systems  |
| C2       | HCI         |
+----------+-------------+

It can also take an extra parameter to specify a set of constraints:

>>> r2 = Relation.fromTupleList([{'CourseId':'C1', 'Title':'Database'},
...                              {'CourseId':'C4', 'Title':'Programming'},
...                              {'CourseId':'C3', 'Title':'Op Systems'},
...                              {'CourseId':'C2', 'Title':'HCI'}],
...                             {'PK':(Key, ['CourseId'])})
>>> print r2
+----------+-------------+
| CourseId | Title       |
+==========+-------------+
| C1       | Database    |
| C4       | Programming |
| C3       | Op Systems  |
| C2       | HCI         |
+----------+-------------+

toTupleList

This returns a list of tuples from the relation. Since relations are sets they can have no order, so to iterate through all the tuples in a relation you must use this method to first extract a list of tuples from the relation.

>>> ts = r2.toTupleList()
>>> print ts
[Tuple(CourseId='C1', Title='Database'), Tuple(CourseId='C4', Title='Programming'), Tuple(CourseId='C3', Title='Op Systems'), Tuple(CourseId='C2', Title='HCI')]

This list can then be iterated over in the usual ways, e.g.

>>> for t in ts:
...     print t.Title
Database
Programming
Op Systems
HCI

>>> print [t.Title for t in ts if t.CourseId=='C4']
['Programming']

>>> for t in reversed(ts):
...     print t.Title
HCI
Op Systems
Programming
Database

>>> print len(ts)
4

>>> print ts[0]
Tuple(CourseId='C1', Title='Database')

>>> print ts[-1]
Tuple(CourseId='C2', Title='HCI')

This is also the way to access the tuples in a pre-defined order. The toTupleList method can take an extra parameter to define a sort order. The sort parameter is a pair (ascending, attribute-list) where ascending is a boolean flag to indicate whether to sort in ascending order or not, and the attribute-list specifies the attributes to sort on.

>>> tss = r2.toTupleList((True, ['Title']))
>>> print [t.Title for t in tss]
['Database', 'HCI', 'Op Systems', 'Programming']

>>> tss = r2.toTupleList((False, ['CourseId']))
>>> print [t.CourseId for t in tss]
['C4', 'C3', 'C2', 'C1']
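The (ascending, attribute-list) sort parameter maps naturally onto Python's own sorting. The sketch below shows one way such a parameter could be interpreted; it is an assumption for illustration, not Dee's code, and the function name to_sorted_list is hypothetical.

```python
def to_sorted_list(rows, sort=None):
    """Return rows as a list, optionally ordered by an (ascending, attribute-list) pair."""
    rows = list(rows)
    if sort is not None:
        ascending, attrs = sort
        # Sort on the listed attributes, in the order given.
        rows.sort(key=lambda t: tuple(t[a] for a in attrs), reverse=not ascending)
    return rows


courses = [
    {'CourseId': 'C1', 'Title': 'Database'},
    {'CourseId': 'C4', 'Title': 'Programming'},
    {'CourseId': 'C3', 'Title': 'Op Systems'},
    {'CourseId': 'C2', 'Title': 'HCI'},
]

by_title = [t['Title'] for t in to_sorted_list(courses, (True, ['Title']))]
by_id_desc = [t['CourseId'] for t in to_sorted_list(courses, (False, ['CourseId']))]

assert by_title == ['Database', 'HCI', 'Op Systems', 'Programming']
assert by_id_desc == ['C4', 'C3', 'C2', 'C1']
```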

The renderHTML method, mentioned earlier, is built upon the toTupleList method and also allows this sort parameter, e.g.

>>> print r2.renderHTML(sort=(True, ['Title']))
<table><thead><th><em>CourseId</em></th><th>Title</th></thead><tbody><tr><td>C1</td><td>Database</td></tr><tr><td>C2</td><td>HCI</td></tr><tr><td>C3</td><td>Op Systems</td></tr><tr><td>C4</td><td>Programming</td></tr></tbody></table>

Which in a browser becomes:

CourseId  Title
C1        Database
C2        HCI
C3        Op Systems
C4        Programming

Relational Comparisons

A number of boolean operators are available to compare the values of two relations. These are all implemented with the obvious overloaded Python comparisons.

Equality (==)

>>> print IS_CALLED == Relation(["StudentId", "Name"],
...                     [('S1', 'Anne'),
...                      ('S2', 'Boris'),
...                      ('S3', 'Cindy'),
...                      ('S4', 'Devinder'),
...                      ('S5', 'Boris'),
...                     ])
True

A useful shorthand for testing equality against an empty relation is to use the IS_EMPTY function:

>>> print IS_EMPTY(IS_CALLED.where(lambda t: t.StudentId=='S99'))
True

>>> print not IS_EMPTY(IS_CALLED)
True

Inequality (!=, not ... ==)

>>> print IS_CALLED != COURSE
True

>>> print not IS_CALLED == COURSE
True

Proper Subset (<)

>>> print IS_CALLED.where(lambda t: t.StudentId=='S3') < IS_CALLED
True
>>> print IS_CALLED.where(lambda t: t.StudentId.startswith('S')) < IS_CALLED
False

Subset (<=)

>>> print IS_CALLED.where(lambda t: t.StudentId=='S3') <= IS_CALLED
True
>>> print IS_CALLED.where(lambda t: t.StudentId.startswith('S')) <= IS_CALLED
True
>>> print IS_CALLED.where(lambda t: t.StudentId=='S3') <= IS_CALLED.where(lambda t: t.StudentId.startswith('S')) <= IS_CALLED
True

Proper Superset (>)

>>> print IS_CALLED > IS_CALLED.where(lambda t: t.StudentId=='S3')
True
>>> print IS_CALLED > IS_CALLED.where(lambda t: t.StudentId.startswith('S'))
False

Superset (>=)

>>> print IS_CALLED >= IS_CALLED.where(lambda t: t.StudentId=='S3')
True
>>> print IS_CALLED >= IS_CALLED.where(lambda t: t.StudentId.startswith('S'))
True

Membership (in)

This is effectively the same as the subset comparison:

>>> print IS_CALLED.where(lambda t: t.StudentId=='S3') in IS_CALLED
True

The membership operator can also be passed a tuple:

>>> print Tuple(StudentId='S3', Name='Cindy') in IS_CALLED
True

>>> print Tuple(StudentId='S3', Name='Bob') in IS_CALLED
False

>>> print Tuple(StudentId='S3', Name='Cindy') not in IS_CALLED
False

>>> print Tuple(StudentId='S3', Name='Bob') not in IS_CALLED
True
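These comparisons mirror the ones Python already defines on its built-in sets, which may help when reading them. The sketch below uses plain frozensets of positional tuples rather than Dee relations; the variable names are hypothetical. One difference to keep in mind is that Python's own in tests element membership only, whereas the text above shows Dee's in also accepting a relation.

```python
all_students = frozenset([
    ('S1', 'Anne'), ('S2', 'Boris'), ('S3', 'Cindy'),
    ('S4', 'Devinder'), ('S5', 'Boris'),
])
just_s3 = frozenset(t for t in all_students if t[0] == 'S3')

assert just_s3 < all_students                  # proper subset
assert not (all_students < all_students)       # a set is not a proper subset of itself
assert all_students <= all_students            # but it is a subset of itself
assert all_students > just_s3                  # proper superset
assert all_students >= just_s3                 # superset
assert ('S3', 'Cindy') in all_students         # tuple membership
assert ('S3', 'Bob') not in all_students
```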

Relational Operators

We use a small core of relational operators to deliver a large number of operations. For example, we use & (relational AND) to provide natural join, intersection and Cartesian product, and we use it as the basis for implementing restriction and extension. A number of other operators are defined as macros on top of the core ones, e.g. GROUP, and this number can easily be increased. The ideas behind this approach can be found in The Third Manifesto chapter 5.

One of the powerful uses of & is the natural join. This joins relations together on their commonly named attributes. To make the most of this, without having to rename attributes before each join, use the same name for the same attributes across relations, e.g. if a key on one relation is named "product_code" then use that same name in all other relations in case they need to be joined. Naming it "code" on the product relation and "product_code" on other relations would require the rename operator to be used before doing a natural join (not to mention making the two attributes appear to be different things).

The relational operators are defined as Python functions taking, and usually returning, relations. Many of the common ones are also defined as methods and operators on the Relation class.

Some basic operations on a relation are now presented.

Projection (project, remove)

This is so called because each tuple in a relation can be thought of as a point in n-dimensional space (where n is the number of attributes), and selecting just a few of the attributes is akin to projecting those points onto the chosen axes. Note once again that since a relation body is a set of tuples, there are no duplicate tuples.

>>> print IS_CALLED.project(['Name'])
+----------+
| Name     |
+==========+
| Anne     |
| Boris    |
| Cindy    |
| Devinder |
+----------+

>>> print IS_CALLED(['Name'])
+----------+
| Name     |
+==========+
| Anne     |
| Boris    |
| Cindy    |
| Devinder |
+----------+

>>> print IS_CALLED.remove(['Name'])
+-----------+
| StudentId |
+===========+
| S1        |
| S2        |
| S3        |
| S4        |
| S5        |
+-----------+

>>> print IS_CALLED.remove(['Name', 'StudentId']) == IS_CALLED.project([]) == IS_CALLED([]) == DEE
True
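The duplicate-free behaviour of projection falls out naturally if a relation is modeled as a Python set. The following is a rough sketch under that assumption, not Dee's own code: `relation` and `project` are hypothetical helpers, tuples are frozensets of (attribute, value) pairs, and the data is copied from the IS_CALLED tables shown in this section (Python 3).

```python
# Illustrative model only: a relation is a set of frozensets of (attribute, value) pairs.
def relation(*rows):
    return {frozenset(row.items()) for row in rows}

def project(r, names):
    """Keep only the named attributes; the set collapses any duplicates."""
    keep = set(names)
    return {frozenset((k, v) for k, v in t if k in keep) for t in r}

IS_CALLED = relation(
    {"StudentId": "S1", "Name": "Anne"},
    {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S3", "Name": "Cindy"},
    {"StudentId": "S4", "Name": "Devinder"},
    {"StudentId": "S5", "Name": "Boris"},
)

names = project(IS_CALLED, ["Name"])
print(sorted(dict(t)["Name"] for t in names))
# ['Anne', 'Boris', 'Cindy', 'Devinder']

# Projecting onto no attributes leaves a single empty tuple, which is how
# this model represents DEE (no attributes, one tuple).
DEE = {frozenset()}
print(project(IS_CALLED, []) == DEE)  # True
```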

Rename (rename)

This is crucial to our implementation since attributes with the same name are considered to represent the same thing. The mapping of old to new attribute name(s) is given as a Python dictionary (a Tuple would also do).

>>> print IS_CALLED.rename({'Name':'NewName'})
+-----------+----------+
| StudentId | NewName  |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+

>>> print IS_CALLED.rename({'StudentId':'NewId', 'Name':'NewName'})
+-------+----------+
| NewId | NewName  |
+=======+==========+
| S1    | Anne     |
| S2    | Boris    |
| S3    | Cindy    |
| S4    | Devinder |
| S5    | Boris    |
+-------+----------+
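As a rough picture of what a rename does, the sketch below applies an old-to-new name mapping to a set-based model of a relation. It is an assumption-laden illustration rather than Dee's implementation: `relation` and `rename` are hypothetical helpers and only two of the IS_CALLED tuples are used (Python 3).

```python
# Illustrative model only: a relation is a set of frozensets of (attribute, value) pairs.
def relation(*rows):
    return {frozenset(row.items()) for row in rows}

def rename(r, mapping):
    """Return a new relation; attributes not mentioned in mapping keep their names."""
    return {frozenset((mapping.get(k, k), v) for k, v in t) for t in r}

IS_CALLED = relation(
    {"StudentId": "S1", "Name": "Anne"},
    {"StudentId": "S2", "Name": "Boris"},
)

renamed = rename(IS_CALLED, {"Name": "NewName"})
print(sorted(sorted(dict(t).items()) for t in renamed))
# [[('NewName', 'Anne'), ('StudentId', 'S1')], [('NewName', 'Boris'), ('StudentId', 'S2')]]
```

The values are untouched; only the heading changes, and the original relation is left as it was.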

Restriction (where)

This is also known as relational selection, but that name can be confusing because SELECT in SQL actually specifies a projection.

>>> print IS_CALLED.where(lambda t: t.StudentId=='S4')
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S4        | Devinder |
+-----------+----------+
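Restriction can be pictured as filtering the set of tuples with a predicate. The sketch below assumes a set-based model of a relation and is not Dee's code: `relation` and `where` are hypothetical helpers, and `types.SimpleNamespace` is used only to imitate the `t.StudentId` attribute access seen above (Python 3).

```python
from types import SimpleNamespace

# Illustrative model only: a relation is a set of frozensets of (attribute, value) pairs.
def relation(*rows):
    return {frozenset(row.items()) for row in rows}

def where(r, predicate):
    """Keep the tuples for which predicate(t) is true; t exposes attributes by name."""
    return {t for t in r if predicate(SimpleNamespace(**dict(t)))}

IS_CALLED = relation(
    {"StudentId": "S1", "Name": "Anne"},
    {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S3", "Name": "Cindy"},
    {"StudentId": "S4", "Name": "Devinder"},
    {"StudentId": "S5", "Name": "Boris"},
)

result = where(IS_CALLED, lambda t: t.StudentId == "S4")
print([dict(t)["Name"] for t in result])  # ['Devinder']
```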

Natural Join, Times, Intersection (&)

If you think about it, these are all the same thing - it just depends on whether the relations have some, none, or all of their attributes in common. We implement them all with the AND relational operator, written using the Python & operator. Note that since a relation heading is a set of attributes, the result has no duplicate attributes.

Natural Join - Some attributes in common

>>> print IS_CALLED & IS_ENROLLED_ON
+----------+-----------+----------+
| CourseId | StudentId | Name     |
+==========+===========+==========+
| C1       | S1        | Anne     |
| C2       | S1        | Anne     |
| C1       | S2        | Boris    |
| C3       | S3        | Cindy    |
| C1       | S4        | Devinder |
+----------+-----------+----------+

Times (Cartesian Join) - No attributes in common

Beware: this kind of join can be very large and is almost always meaningless.

>>> print IS_CALLED & COURSE
+----------+-----------+----------+-------------+
| CourseId | StudentId | Name     | Title       |
+==========+===========+==========+=============+
| C1       | S1        | Anne     | Database    |
| C1       | S2        | Boris    | Database    |
| C1       | S3        | Cindy    | Database    |
| C1       | S4        | Devinder | Database    |
| C1       | S5        | Boris    | Database    |
| C2       | S1        | Anne     | HCI         |
| C2       | S2        | Boris    | HCI         |
| C2       | S3        | Cindy    | HCI         |
| C2       | S4        | Devinder | HCI         |
| C2       | S5        | Boris    | HCI         |
| C3       | S1        | Anne     | Op Systems  |
| C3       | S2        | Boris    | Op Systems  |
| C3       | S3        | Cindy    | Op Systems  |
| C3       | S4        | Devinder | Op Systems  |
| C3       | S5        | Boris    | Op Systems  |
| C4       | S1        | Anne     | Programming |
| C4       | S2        | Boris    | Programming |
| C4       | S3        | Cindy    | Programming |
| C4       | S4        | Devinder | Programming |
| C4       | S5        | Boris    | Programming |
+----------+-----------+----------+-------------+

Intersection - All attributes in common

>>> print IS_CALLED.where(lambda t: t.Name[0] < 'C') & IS_CALLED.where(lambda t:t.Name[0] > 'A')
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S5        | Boris |
+-----------+-------+

Note that this is equivalent to:

>>> print IS_CALLED.where(lambda t: t.Name[0] < 'C' and t.Name[0] > 'A')
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S5        | Boris |
+-----------+-------+
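The three cases above can be reproduced with one function in a small set-based model. This is a sketch under stated assumptions, not Dee's implementation: `relation` and `rel_and` are hypothetical helpers, tuples are frozensets of (attribute, value) pairs, and the data is copied from the tables in this section (Python 3).

```python
# Illustrative model only: a relation is a set of frozensets of (attribute, value) pairs.
def relation(*rows):
    return {frozenset(row.items()) for row in rows}

def rel_and(r, s):
    """Pair up tuples that agree on every attribute name they share."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            # With no shared names this test is vacuously true for every pair.
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

IS_CALLED = relation(
    {"StudentId": "S1", "Name": "Anne"}, {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S3", "Name": "Cindy"}, {"StudentId": "S4", "Name": "Devinder"},
    {"StudentId": "S5", "Name": "Boris"},
)
IS_ENROLLED_ON = relation(
    {"StudentId": "S1", "CourseId": "C1"}, {"StudentId": "S1", "CourseId": "C2"},
    {"StudentId": "S2", "CourseId": "C1"}, {"StudentId": "S3", "CourseId": "C3"},
    {"StudentId": "S4", "CourseId": "C1"},
)
COURSE = relation(
    {"CourseId": "C1", "Title": "Database"}, {"CourseId": "C2", "Title": "HCI"},
    {"CourseId": "C3", "Title": "Op Systems"}, {"CourseId": "C4", "Title": "Programming"},
)

joined = rel_and(IS_CALLED, IS_ENROLLED_ON)   # some attributes in common
product = rel_and(IS_CALLED, COURSE)          # no attributes in common
before_c = {t for t in IS_CALLED if dict(t)["Name"][0] < "C"}
after_a = {t for t in IS_CALLED if dict(t)["Name"][0] > "A"}
both = rel_and(before_c, after_a)             # all attributes in common

print(len(joined), len(product), len(both))   # 5 20 2
print(both == before_c & after_a)             # True: here & is Python's set intersection
```

The only thing that changes between the three results is how many attribute names the operands share; the function itself never needs to know which case it is in.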

Or, Union (|)

These are the same thing: Union is simply OR applied to relations that have all of their attributes in common. We implement it with the OR relational operator, written using the Python | operator. For pragmatic reasons, we only implement Union. The more general Or would have to supply every possible value for the attributes missing from each operand, which is an infinite (impossible) number of alternatives.

>>> print IS_CALLED.where(lambda t: t.Name[0] > 'C') | IS_CALLED.where(lambda t:t.Name[0] < 'B')
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S4        | Devinder |
| S1        | Anne     |
+-----------+----------+

Note that this is equivalent to:

>>> print IS_CALLED.where(lambda t: t.Name[0] > 'C' or t.Name[0] < 'B')
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S4        | Devinder |
+-----------+----------+
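A minimal sketch of the Union-only restriction, assuming a set-based model of a relation: the hypothetical helper `union` below refuses operands whose headings differ, which is one way to express "all attributes in common". It is not Dee's code, and the choice of `ValueError` is an assumption of the illustration (Python 3).

```python
# Illustrative model only: a relation is a set of frozensets of (attribute, value) pairs.
def relation(*rows):
    return {frozenset(row.items()) for row in rows}

def union(r, s):
    """Set union, allowed only when every tuple has the same attribute names."""
    headings = {frozenset(k for k, _ in t) for t in r | s}
    if len(headings) > 1:
        raise ValueError("union needs relations with identical headings")
    return r | s

IS_CALLED = relation(
    {"StudentId": "S1", "Name": "Anne"}, {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S3", "Name": "Cindy"}, {"StudentId": "S4", "Name": "Devinder"},
    {"StudentId": "S5", "Name": "Boris"},
)

after_c = {t for t in IS_CALLED if dict(t)["Name"][0] > "C"}
before_b = {t for t in IS_CALLED if dict(t)["Name"][0] < "B"}
result = union(after_c, before_b)
print(sorted(dict(t)["Name"] for t in result))  # ['Anne', 'Devinder']
```

Because the result is a set, the order in which the tuples are displayed carries no meaning.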

Difference (-)

Note that, unlike intersection and union, this is not commutative: which relation is mentioned first does make a difference (excuse the pun). It is implemented using the MINUS relational operator.

>>> print IS_CALLED - IS_CALLED.where(lambda t:t.Name[0] < 'B')
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
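The asymmetry of difference is easy to check in a set-based model, where it is just Python's set subtraction. This is only an illustration with a hypothetical `relation` helper, not Dee's MINUS operator (Python 3).

```python
# Illustrative model only: a relation is a set of frozensets of (attribute, value) pairs.
def relation(*rows):
    return {frozenset(row.items()) for row in rows}

IS_CALLED = relation(
    {"StudentId": "S1", "Name": "Anne"}, {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S3", "Name": "Cindy"}, {"StudentId": "S4", "Name": "Devinder"},
    {"StudentId": "S5", "Name": "Boris"},
)
before_b = {t for t in IS_CALLED if dict(t)["Name"][0] < "B"}

left = IS_CALLED - before_b    # everyone except Anne
right = before_b - IS_CALLED   # nothing: Anne is in IS_CALLED too

print(sorted(dict(t)["StudentId"] for t in left))  # ['S2', 'S3', 'S4', 'S5']
print(right == set())                               # True
```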

Extension (extend)

This is used to add new attributes to relations. First, the list of the names of the extra attributes is passed, followed by a lambda expression returning a dictionary containing the attribute values for each tuple. The values can refer to the range variable introduced by the lambda to access tuple values. It is implemented using the EXTEND relational operator which in turn is implemented using the AND relational operator (can you see how?).

>>> print IS_CALLED.extend(['Initial'], lambda t: {'Initial':t.Name[:1]})
+---------+-----------+----------+
| Initial | StudentId | Name     |
+=========+===========+==========+
| A       | S1        | Anne     |
| B       | S2        | Boris    |
| C       | S3        | Cindy    |
| D       | S4        | Devinder |
| B       | S5        | Boris    |
+---------+-----------+----------+

Note that this does not modify the original relation.
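One plausible reading of how extension can be built on AND is sketched below; it is a guess at the idea, not a description of Dee's code. The helper relation pairs each input value with its computed value, and joining it to the original relation adds the new attribute. The names `relation`, `rel_and` and `extend_via_and` are hypothetical, and the `depends_on` parameter is an assumption of this sketch that differs from Dee's `extend` signature (Python 3).

```python
# Illustrative model only: a relation is a set of frozensets of (attribute, value) pairs.
def relation(*rows):
    return {frozenset(row.items()) for row in rows}

def rel_and(r, s):
    """Pair up tuples that agree on every attribute name they share."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

def extend_via_and(r, depends_on, fn):
    """Build a relation mapping inputs to computed attributes, then AND it with r."""
    helper = set()
    for t in r:
        inputs = {k: v for k, v in t if k in depends_on}
        helper.add(frozenset({**inputs, **fn(inputs)}.items()))
    return rel_and(r, helper)

IS_CALLED = relation(
    {"StudentId": "S1", "Name": "Anne"}, {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S3", "Name": "Cindy"}, {"StudentId": "S4", "Name": "Devinder"},
    {"StudentId": "S5", "Name": "Boris"},
)

extended = extend_via_and(IS_CALLED, ["Name"], lambda t: {"Initial": t["Name"][:1]})
print(sorted((dict(t)["StudentId"], dict(t)["Initial"]) for t in extended))
# [('S1', 'A'), ('S2', 'B'), ('S3', 'C'), ('S4', 'D'), ('S5', 'B')]
```

Here the helper relation has the heading {Name, Initial} and four tuples, and the natural join on Name attaches the right initial to each of the five students while leaving IS_CALLED itself unchanged.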

Aggregate Operators

These operators take relations and return scalar values according to some lambda expression (except in the case of COUNT which simply counts the number of tuples). If the relation has a single attribute then the expression defaults to it.
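The general shape of these operators, including the single-attribute default, might be modeled as follows. This is a sketch only: `relation` and `aggregate` are hypothetical helpers rather than Dee's functions, and the MARKS data is invented for the illustration and is not the EXAM_MARK relation used in the examples (Python 3).

```python
# Illustrative model only: a relation is a set of frozensets of (attribute, value) pairs.
def relation(*rows):
    return {frozenset(row.items()) for row in rows}

def aggregate(fn, r, expr=None):
    """Apply fn to one value per tuple; expr defaults to the sole attribute."""
    rows = [dict(t) for t in r]
    if expr is None:
        if any(len(row) != 1 for row in rows):
            raise ValueError("an expression is required unless there is one attribute")
        values = [next(iter(row.values())) for row in rows]
    else:
        values = [expr(row) for row in rows]
    return fn(values)

# Hypothetical data, invented for this sketch.
MARKS = relation(
    {"StudentId": "S1", "Mark": 85},
    {"StudentId": "S2", "Mark": 49},
    {"StudentId": "S3", "Mark": 85},
)
ONLY_MARKS = relation({"Mark": 85}, {"Mark": 49})  # a single-attribute relation

print(len(MARKS))                                   # 3  (COUNT)
print(aggregate(sum, MARKS, lambda t: t["Mark"]))   # 219
print(aggregate(min, ONLY_MARKS))                   # 49
print(aggregate(all, MARKS, lambda t: t["Mark"] > 40))  # True
```

In this model a single-attribute relation has already lost its duplicate values (the two marks of 85 become one), which is harmless for a minimum or maximum but would change a sum or an average.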

COUNT (len)

The number of tuples.

>>> print COUNT(IS_ENROLLED_ON)
5

SUM

The total.

>>> print SUM(EXAM_MARK, lambda t: t.Mark)
342

AVG

The average.

>>> print AVG(EXAM_MARK, lambda t: t.Mark)
68.4

MIN

The minimum.

>>> print MIN(EXAM_MARK, lambda t: t.Mark)
49

>>> print MIN(EXAM_MARK(['Mark']))
49

MAX

The maximum.

>>> print MAX(EXAM_MARK, lambda t: t.Mark)
93

ALL

The expression in this case must return a boolean value. If all of these are True then ALL returns True, and False otherwise.

>>> print