In this tutorial I will explain how to build and manage a database with buzhug

The example

The example is a database managing information about a football championship. It will store data about the teams, the players in the teams and the matches played

The base will be structured into several tables :

teams

Field name Field type

name str

city str

players

Field name Field type

name unicode

firstname unicode

team teams¹

position str

born date

matches

Field name Field type

home teams¹

away teams¹

date date

goals_home int

goals_away int

(1) The type used for the team is a reference to a record in the first table, teams

Creating the base

Here is the code to create the tables :

from buzhug import Base
from datetime import date

teams = Base('teams')
teams.create(('name',str),('city',str))

players = Base('players').create(('name',unicode),
    ('firstname',unicode),('team',teams),
    ('position',str),('born',date))
matches = Base('matches').create(('home',teams),('away',teams),
    ('date',date),('goals_home',int),('goals_away',int))

Begin by importing the class Base and the class date from the datetime module

Then create a first instance of this class : teams = Base('teams'). The argument to Base is the name of the folder where the table will be stored. I use the same name for the instance and for the folder, but you could have a different name, for instance : teams = Base('../teams2006')

The next step is to define the structure of this table : the list of fields with their name and type, here : teams.create(('name',str),('city',str)). The field types are provided as Python classes, here the built-in class str

The same goes for the two other classes, except that I use a shortcut to create the instance of the class and define its structure in one instruction (create() returns a reference to the instance)

Save the code above in a script called example.py and run it ; you should have no specific message. If you take a look at the folder in which you saved the script, it should have 3 new folders called teams, players and matches, with a number of files in it, including files with the name of the fields

Now try to run the same script again. You will get this exception :
IOError: Base teams already exists

You are trying to create a base that already exists. To avoid this, the method create should be passed an extra argument, mode, taking the value open : it means that if the table already exists, it should just be opened as it is

from buzhug import Base
from datetime import date

teams = Base('teams').create(('name',str),('city',str),mode="open")
players = Base('players').create(('name',unicode),
            ('firstname',unicode),('team',teams),
            ('position',str),('born',date),
            mode="open")
matches = Base('matches').create(('home',teams),('away',teams),
            ('date',date),('goals_home',int),('goals_away',int),
            mode="open")

If you knew that the table already exists, you would get a reference to it by the function open() :

teams = Base('teams').open()
players = Base('players').open()
matches = Base('matches').open()

Inserting data

To insert new data in a table, you use the function insert() :

teams.insert(name="Arsenal",city="London")
teams.insert(name="Manchester United",city="Manchester")

You can also provide the data as a list of positional arguments :

teams.insert("Chelsea","London")
teams.insert("West Bromwich Albion","Birmingham")

In this case, you must provide the data in the same order as what you have declared in the function create() : the name of the team first, the city afterwards

Updating a record

Suppose you made a typo when you entered the next team :

teams.insert("Aston Vila","Birmingham")

The correct name is "Aston Villa" with two "l". If you want to fix the mistake, you have to follow these steps :

identify the record in the table
update the field name with the correct value

For the first step you have to get a way of finding the correct record ; for this you must provide information that will identify the record without ambiguity. You can't ask for the record where the city is Birmingham, because there are two of them in the table ; but you can ask for the record whose name is "Aston Vila"

Here is how you can find it :

aston = [ r for r in teams if r.name == "Aston Vila" ][0]

The syntax is the one you use for Python list comprehensions : technically, the tables support the iterator protocol. When you browse the table by for r in teams, the items yielded are objects which have attributes called as the fields in the table, and their value is the one entered by the function insert()

List comprehensions return a list ; here you know that there is only one record with this name, and you get a reference to it as the first and only item in the list

Once you have a reference to this record, you can update its field name like this :

teams.update(aston,name="Aston Villa")

More on selecting

The list comprenhension is one of the ways to select records in a base, and allows for queries as complex as you can imagine

In some cases you might want to use another syntax, using the select() method :

aston = teams.select(name = "Aston Vila")[0]

You pass the function select() a keyword argument whose key is a field of the table, and it returns a list of records for which the field has the specified value

Another way is to use the record identifier, an integer which is specific to the record. This identifier is returned when you insert the record. So if you had inserted the record about Aston Villa by :

aston_id = teams.insert("Aston Vila","Birmingham")

the quickest way to get a reference to the record would be :

aston = teams[aston_id]

In this case the table behaves like a list, indexed by record identifiers

Entering players information

Players belong to teams (for a moment...). Instead of copying the same information about the team for each record in the base players, the field team only holds a reference to a record in the table teams

Suppose you have to enter the Aston Villa players. First you must get a reference to the record for this team, and as we have seen above you can do it with select() :

aston_villa = teams.select(name="Aston Villa")[0]

Now you can use this reference when you insert the players data :

players.insert(unicode("Mellberg"),unicode("Olof"),
    aston_villa,"Defender",date(1977,9,3))
players.insert(unicode("Baros"),unicode("Milan"),
    aston_villa,"Striker",date(1981,10,28))

The records in players have an attribute team ; the value of this attribute is a record in the table teams. If you want to know in which city Milan Baros plays :

baros = players.select(name=unicode("Baros"))[0]
print baros.team.city

The attribute team points to a record in teams ; if you update fields in this record, the change will be visible in the attribute. For instance, if you had inserted information about Aston Villa players when there was a typo in the team name, baros.team.name would have returned "Aston Vila" ; after updating the team name, baros.team.name returns "Aston Villa"

Using record identifiers

All records have an attribute __id__ which is set internally by buzhug when the record is created, and which is never modified as long as the record exists. If you delete the record, its __id__ will not be reused for another record

The record identifier is returned by insert. Let's add another player and store his id :

chelsea = teams.select(name="Chelsea")[0]
rec_id = players.insert(unicode("Makelele"),unicode("Claude"),
    chelsea,"Midfield",date(1973,2,18))

The fastest way to find a record in the base is to use its identifier :

print "Makelele was born on ",players[rec_id].born

In this case, the table behaves exactly like a list of records, indexed by record identifiers

Who plays in that team ?

Now you should guess how to get a list of the name and first names of all Aston Villa players:

print [ (r.name,r.firstname) for r in players if r.team.name == "Aston Villa" ]

Straightforward, isn't it ?

Matches

Let's now concentrate on the mutual matches of Arsenal, Aston Villa and Chelsea and see how to produce a table of points based on these matches.

First we get a reference for Arsenal

arsenal = teams.select(name="Arsenal")[0]

and then we can insert the matches:

matches.insert(chelsea,aston_villa,date(2005,9,24),2,1)
matches.insert(aston_villa,chelsea,date(2006,2,1),1,1)
matches.insert(chelsea,arsenal,date(2005,8,21),1,0)
matches.insert(arsenal,chelsea,date(2005,12,18),0,2)
matches.insert(arsenal,aston_villa,date(2006,4,1),5,0)
matches.insert(aston_villa,arsenal,date(2005,12,31),0,0)

Next we create a group of these teams as a list of the records for these teams - actually, as a list of their identifiers :

group = [ t.__id__ for t in teams if t.name in ["Chelsea","Aston Villa","Arsenal"] ]

Once again you can see how simple and natural the list comprehension syntax is used for this query, using the keyword in

Then we have to see all the matches where these teams played against each other, and increment the points of each team. The result is stored in a dictionary points, mapping the team identifier to a number of points

points = {}
for team in group:
    points[team] = 0

for match in matches:
    if match.home.__id__ in group and match.away.__id__ in group:
        # both teams of this match are in the group
        home,away = match.home.__id__,match.away.__id__
        if match.goals_home > match.goals_away:
            points[home] += 3 # home win
        elif match.goals_home == match.goals_away:
            points[home] += 1 # draw
            points[away] += 1
        else:
            points[away] += 3 # away win

The last step is to build a table with team name and points, sort it by decreasing number of points, and finally print the table :

table = [ (teams[i].name,points[i]) for i in points.keys() ]
# sort by decreasing number of points
table.sort(lambda x,y : cmp(y[1],x[1]))
# pretty print
print '%-15s%4s' %('','Pts')
for t in table:
    print '%-15s%4s' %(t[0],t[1])