Installing

To install buzhug, download the package, unzip it into a directory, open a console window in that directory and run :

python setup.py install

On Windows you can use the package installer

The Python version must be 2.3 or above

Creating a database

from buzhug import Base
db = Base(path)
db.create((name1,type1[,default1])[,(name2,type2[,default2]),...])

A thread-safe version should be used in a multi-threaded environment :

from buzhug import TS_Base
db = TS_Base(path)
db.create((name1,type1[,default1])[,(name2,type2[,default2]),...])

path is the name of the database. Since the contents of the base are stored in a directory of the same name, the only constraint on path is that it must be a valid directory name

The fields of the base are defined by a field name and a field type. The field names must be valid Python names, beginning with a letter (not with an underscore)

The field type must be :

one of the built-in types str, unicode, int, float or bool
one of the classes date or datetime in the module datetime
another buzhug base, given as the base object itself (see the chapter on links between bases)

The optional default value must be of the type defined for the field. This value is used when a record is inserted and no value is given for the field. If no default value is specified, missing fields are set to None

Additionally, you can pass another keyword argument, mode, which can take one of two values :

"override" : if a base exists in the specified path, erase it and replace it with the new field definition
"open" : if a base already exists in the specified path, open it as it is

create() raises IOError if a base of the same name already exists and the keyword mode is not provided

Opening an existing database

from buzhug import Base
db = Base(path)
db.open()

or on one line :

from buzhug import Base
db = Base(path).open()

open() raises IOError if the base doesn't exist

Closing a database

db.close()

Closes all the files open for the database

Inserting a record in a database

1) insertion by keyword

record_id = db.insert(name1=val1[,name2=val2,...])

The keys of the keyword arguments must be the field names defined in the create() method. If some of the fields are missing, they are set to their default value, or to None if no default value was specified. The values of the keyword arguments must be of the type defined in the create() method, otherwise an exception is raised

insert() returns an integer, the record identifier. Each record in the base has an identifier, and two different records have different identifiers. The identifier is set internally by buzhug and cannot be modified by the programmer

2) insertion by list

db.insert(val1,val2,...)

The values must be provided in the order defined in the create() method. An exception is raised if the number of arguments is not exactly the same as the number of fields in the base, or if a value is not of the expected type
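To make these insertion rules concrete, here is a minimal sketch in plain Python, independent of buzhug, of how one record could be built from keyword or positional arguments. The field list FIELDS, the function build_record() and the exception types it raises are hypothetical choices for illustration; buzhug's own implementation and error classes may differ.

```python
# Hypothetical field definitions: (name, type, default).
# A default of None stands for "no default given".
FIELDS = [('name', str, None), ('age', int, 0)]


def build_record(*args, **kwargs):
    """Apply the insertion rules described above to one record."""
    names = [field[0] for field in FIELDS]
    if args:
        # insertion by list: exactly one value per field, in create() order
        if kwargs or len(args) != len(FIELDS):
            raise TypeError("expected exactly %d positional values" % len(FIELDS))
        kwargs = dict(zip(names, args))
    unknown = set(kwargs) - set(names)
    if unknown:
        raise NameError("unknown field(s): %s" % ", ".join(sorted(unknown)))
    record = {}
    for name, ftype, default in FIELDS:
        # missing fields take their default value (None if there is none)
        value = kwargs.get(name, default)
        if value is not None and not isinstance(value, ftype):
            raise TypeError("field %r expects %s" % (name, ftype.__name__))
        record[name] = value
    return record
```

For instance build_record(name='Ann') gives age its default 0, while build_record('Bob') is rejected because one value is missing.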

3) insertion as strings

If the database is managed through a network, the data provided by the user will be sent as strings. The management program could convert the strings back to their original type itself, but since this is a common task, buzhug provides functions for it

The first step is to define how the string must be converted into the original type

For the type str there is of course no conversion to make. For the type int the conversion uses the built-in int() and for float the built-in float()

For unicode strings, the programmer must define the encoding that was used to convert the original unicode string into a Python bytestring :

db.set_string_format(unicode,encoding)

encoding is a string defining the encoding, such as 'latin-1','utf-8', etc. If an invalid encoding is provided, an exception is raised

For date and datetime, the programmer must define a format with the syntax specified for the function strftime in the built-in time module :

from datetime import date, datetime
db.set_string_format(date,format)
db.set_string_format(datetime,format)

For instance if the date is provided in the form YYYY-MM-DD the format is '%Y-%m-%d'
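The conversion rules above can be sketched with the standard library alone. This is not buzhug's code: make_converters() and its type-name keys are hypothetical, and the sketch is written for Python 3, where the unicode type of this document corresponds to str and bytestrings correspond to bytes.

```python
import codecs
from datetime import datetime


def make_converters(encoding='utf-8', date_format='%Y-%m-%d',
                    datetime_format='%Y-%m-%d %H:%M:%S'):
    """Map hypothetical type names to functions parsing a received string."""
    codecs.lookup(encoding)  # raises LookupError for an unknown encoding
    return {
        'str': lambda s: s,                        # no conversion needed
        'int': int,                                # built-in int()
        'float': float,                            # built-in float()
        'unicode': lambda b: b.decode(encoding),   # bytes -> text
        'date': lambda s: datetime.strptime(s, date_format).date(),
        'datetime': lambda s: datetime.strptime(s, datetime_format),
    }
```

strptime() accepts the same format directives as strftime(), which is why a single format string can describe both directions.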

Once all the string formats have been specified, the insertion is made by keyword or by list, as with insert() :

db.insert_as_strings(name1=string1[,name2=string2,...])

db.insert_as_strings(string1[,string2,...])

Selecting a record

1. Selection by identifier

record = db[rec_id]

The database supports lookup by record identifier, as if it were a list. The returned value is an object with attributes matching the fields defined in create()

2. Selection by keywords

records = db(name1=value1[,name2=value2,...])

Returns the list of all the records where name1 == value1 and name2 == value2, etc.
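Keyword selection is equivalent to an equality filter on the records. The sketch below reproduces it on plain namedtuple records; Person and select_by_keywords() are hypothetical names, not part of buzhug.

```python
from collections import namedtuple

Person = namedtuple('Person', 'name age city')  # hypothetical record type


def select_by_keywords(records, **criteria):
    """Keep the records whose fields equal every given keyword value."""
    return [r for r in records
            if all(getattr(r, name) == value
                   for name, value in criteria.items())]


people = [Person('Ann', 30, 'Paris'), Person('Bob', 30, 'London'),
          Person('Eve', 41, 'Paris')]
```

With this data, select_by_keywords(people, age=30, city='Paris') keeps only Ann's record, the same result as db(age=30, city='Paris') would give on an equivalent base.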

3. Selection by list comprehension or generator expression

The database supports the iterator protocol. Its next() method returns records (objects with attributes matching the fields of the database)

Instead of SQL, the query language is the one used in Python list comprehensions or generator expressions :

result = [ record for record in db if condition ]

for record in (record for record in db if condition):
    (...do anything with record...)

where condition is a condition on the object "record" yielded during the iteration on the database

With this simple syntax you can create complex queries, for instance if you want to find all the records whose field 'name' matches a regular expression pattern :

import re
pattern = 'J'   # example pattern : names beginning with J
print [ r for r in db if re.match(pattern,r.name) ]

4. Selection by the select() method

The selection by list comprehension described above should preferably be used, for its clarity and its ability to express complex selection conditions

Its drawback is that it is not very efficient ; for much better performance you can use the select() method instead

4.1 Equality test

To select the records whose fields are equal to a value, the syntax is

result_set = db.select(field_list,name_i=val_i[,name_j=val_j,...])

field_list is a list of fields (a sublist of the fields defined in the base)

The result is a set of records whose field name_i has the value val_i, name_j has the value val_j, etc ; only the fields provided in field_list are set for these records

For instance if we have a base of persons and we want to know the name of those who are 30 years old :

result_set = db.select(['name'],age = 30)
# print the result
for record in result_set:
    print record.name

4.2 Tests with minimum and maximum values

To select the records where an integer, float, date or datetime field lies between a minimum value and a maximum value, pass a 2-item list as the argument value :

result_set = db.select(field_list,field=[vmin,vmax])

For instance, to get the persons aged between 30 and 35 :

result_set = db.select(['name'],age = [30,35])
# print the result
for record in result_set:
    print record.name

4.3 Complex tests

If the condition is more complex (field greater or less than a value, alternative conditions with OR, etc.) the syntax is

result_set = db.select(field_list,predicate_string,kw_arguments)

field_list is a list of fields (a sublist of the fields defined in the base)

predicate_string is a string with a Python expression returning True or False, expressed with variable names (not literal values) which can be fields of the database or the keys in the kw_arguments. Only the records for which the evaluation of the predicate string returns True are appended to result_set

kw_arguments are keyword arguments, where the keys are the names used in the predicate

Since this syntax is a little complex, I will give a number of examples

If we have a base of persons and we want to know the name of those between 30 and 35 years old:

result_set = db.select(['name'],"age_min <= age < age_max",age_min=30,age_max=36)

Note that you can't express this condition with literals inside the predicate string, like in :

result_set = db.select(['name'],"30 <= age < 36")

It would seem much more natural, but for implementation reasons it would not work. The values are stored as strings in the database files ; instead of converting these strings into Python types (integers, dates etc.), buzhug leaves them as they are, and it is the values of the keyword arguments which are converted into strings and compared to the strings found in the file. This is obviously much more efficient, because there is only one conversion to make (argument value to string) instead of as many conversions (string to argument type) as there are records in the base

In the example above, the expression compares the string representation of age to the string representations of 30 and 36. If the predicate were expressed as "30 <= age < 36", buzhug would have to parse the string and replace 30 and 36 by their string representations. Implementing a parser of this kind would be difficult, so the choice was made to have a less natural syntax, for the sake of the speed of selection operations
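Converting the argument once instead of every stored value only works if the string representation preserves the order of the values. The text does not describe buzhug's actual file format, so the sketch below assumes a hypothetical fixed-width encoding for integers (an offset makes negative values non-negative, then zero-padding fixes the width) under which string comparison gives the same result as numeric comparison. encode_int(), select_between(), OFFSET and WIDTH are illustrative names only.

```python
# Hypothetical order-preserving encoding, valid for -10**9 <= n < 9 * 10**9
OFFSET = 10 ** 9
WIDTH = 10


def encode_int(n):
    """Encode n so that lexicographic order matches numeric order."""
    return str(n + OFFSET).zfill(WIDTH)


values = (25, 30, 33, 36, -4, 120)
stored = [encode_int(v) for v in values]  # what a data file would hold


def select_between(encoded_values, vmin, vmax):
    """Return positions where vmin <= value < vmax, comparing strings only."""
    lo, hi = encode_int(vmin), encode_int(vmax)  # the only two conversions
    return [i for i, s in enumerate(encoded_values) if lo <= s < hi]
```

select_between(stored, 30, 36) never decodes a stored string: it encodes 30 and 36 once and returns the positions of 30 and 33.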

Some more examples :

To find the people in this age range who live in London or Paris :

result_set = db.select(['name'],
    "age_min <= age < age_max and city in city_list",
    age_min=30,age_max=36,city_list=('London','Paris'))

The same, but returning all the fields and not only the name :

result_set = db.select(None,
    "age_min <= age < age_max and city in city_list",
    age_min=30,age_max=36,city_list=('London','Paris'))

Returning the name of all the records :

result_set = db.select(['name'])

Returning the records with all fields set :

result_set = db.select()

4.4 Using regular expressions

For selection using regular expressions, use select() and pass it, as a keyword argument, a compiled regular expression object returned by re.compile() ; you can then use this object in the predicate string with its methods match() and search()

For instance to find all the people whose first name begins with a vowel :

import re
pattern = re.compile('^[AEIOUY]')
result_set = db.select(['name','age'],'p.match(name)',p=pattern)
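The way a predicate string is combined with keyword arguments can be sketched with Python's built-in compile() and eval(). In this hypothetical select() function, the record's fields and the keyword arguments are the only variables visible to the expression, and the expression is compiled once for the whole selection. Unlike buzhug, it compares Python values directly rather than string representations, and Person is a hypothetical record type.

```python
import re
from collections import namedtuple

Person = namedtuple('Person', 'name age city')  # hypothetical record type


def select(records, fields=None, predicate=None, **kw):
    """Keep records for which the predicate string evaluates to True."""
    # parsed once for the whole selection, not once per record
    code = None if predicate is None else compile(predicate, '<predicate>', 'eval')
    result = []
    for r in records:
        namespace = dict(kw)            # the keyword arguments...
        namespace.update(r._asdict())   # ...plus the record's fields
        # an empty __builtins__ hides built-in functions from the
        # expression; it is not a security sandbox
        if code is None or eval(code, {'__builtins__': {}}, namespace):
            result.append(r if fields is None
                          else dict((f, getattr(r, f)) for f in fields))
    return result


people = [Person('Ann', 30, 'Paris'), Person('Bob', 35, 'London'),
          Person('Eve', 36, 'Paris'), Person('Ian', 31, 'Rome')]
vowels = select(people, ['name', 'age'], 'p.match(name)',
                p=re.compile('^[AEIOUY]'))
```

Here vowels holds the name and age of Ann, Eve and Ian, the people whose first name begins with a vowel.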

5. Sorting the result

The value returned by select() is an object which supports a method sort_by()

results = result_set.sort_by(order_string)

order_string is a Python expression using the field names and the operators + and -

+ indicates that the field name that follows must be sorted by ascending order

- indicates that the field name that follows must be sorted by descending order

For instance if you want to sort a set of results first by decreasing age, then by ascending name of those who have the same age :

results = result_set.sort_by("-age+name")

You can combine the selection and sort operations in one line :

results = db.select(['name','age'],
	"age > age_min",age_min=30).sort_by("-age+name")
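A sort by an order string like "-age+name" can be sketched by parsing the string into (sign, field) terms and relying on the stability of Python's list.sort(). The sort_by() function below works on plain dicts and is hypothetical; treating a term without a sign as ascending is an assumption, since the text only shows signed terms.

```python
import re
from operator import itemgetter


def sort_by(records, order_string):
    """Sort dicts by an order string such as "-age+name"."""
    # each term is an optional sign followed by a field name; a missing
    # sign is treated as ascending (an assumption of this sketch)
    terms = re.findall(r'([+-]?)\s*(\w+)', order_string)
    result = list(records)
    # list.sort() is stable, so sorting by the last term first and the
    # first term last yields the combined multi-key ordering
    for sign, field in reversed(terms):
        result.sort(key=itemgetter(field), reverse=(sign == '-'))
    return result


people = [{'name': 'Bob', 'age': 30}, {'name': 'Ann', 'age': 30},
          {'name': 'Eve', 'age': 41}]
```

With "-age+name", Eve (41) comes first, then Ann and Bob, who share the age 30 and are ordered by name.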

6. Selection by the select_for_update() method

This method must be used instead of select() if you have to update the selected records (see next section about updating)

There are 2 methods because select() is faster than select_for_update()

Updating a record

1. Syntax

The values of fields of a record can be modified by the update() method

This method applies to the record itself :

record.update(name1=newval1[,name2=newval2,...])

record is a record that has been previously selected for update

The records selected by their identifier or by iteration (list comprehension or generator expression) are already selected for update. But for performance reasons, the records returned by select() are selected only for reading, not for update ; if you want to select records and update them, you must use the method select_for_update(), with exactly the same syntax as select()

An alternative syntax can be used :

db.update(record,name1=newval1[,name2=newval2,...])

updates a single record

db.update(records,name1=newval1[,name2=newval2,...])

updates a list or tuple of records, changing the values for each of them

2. Concurrency control

When many users access the same database at the same time and are allowed to update records (which is the case in a web application), conflicts can occur if two users want to update the same record

Buzhug uses a version number to detect the conflicts. Each record has a version number (an integer). When a record is selected for update, the version number at selection time is provided as the attribute __version__ ; when update() is called, the program first looks at the version number of the record with the same identifier in the database. If the version number has changed, this is because another user has updated the record in the meantime

In this case, an exception (buzhug.ConflictError) is raised. It is up to the programmer to decide how to manage the conflict
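The version-number check can be illustrated with a small in-memory store. VersionedStore, its methods and the local ConflictError class are hypothetical stand-ins that only mirror the behaviour described above (buzhug's own exception is buzhug.ConflictError); records are plain dicts carrying __id__ and __version__ keys.

```python
class ConflictError(Exception):
    """Raised when a record changed after it was selected for update."""


class VersionedStore(object):
    """Hypothetical in-memory store showing the version-number check."""

    def __init__(self):
        self._rows = {}        # record id -> (version, field dict)
        self._next_id = 0

    def insert(self, **fields):
        rec_id = self._next_id
        self._next_id += 1
        self._rows[rec_id] = (0, dict(fields))
        return rec_id

    def select_for_update(self, rec_id):
        version, fields = self._rows[rec_id]
        # the copy remembers the version number seen at selection time
        return dict(fields, __id__=rec_id, __version__=version)

    def update(self, record, **changes):
        rec_id = record['__id__']
        version, fields = self._rows[rec_id]
        if version != record['__version__']:
            raise ConflictError("record %d was updated in the meantime" % rec_id)
        # no locking here: a real multi-user store must make the check
        # and the write a single atomic step
        self._rows[rec_id] = (version + 1, dict(fields, **changes))
        record.update(changes)
        record['__version__'] = version + 1
```

If two users select the same record, both copies carry version 0; the first update raises the stored version to 1, so the second user's update raises ConflictError instead of silently overwriting the change.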

In a web environment, updates usually follow these steps :

a page is used to show a list of the records, or to enter data to find the record
on a second page, the record is shown and the fields that can be modified are inserted in INPUT tags
when data is submitted, control is transferred to a script which validates the input, then updates the records in the database

The script will call select_for_update() in step 2 and update() in step 3

To identify the record between steps 2 and 3, the record identifier can be passed as a hidden form field ; but if we only pass it and get the record at step 3 by db[record_id], the version number will be the one when step 3 is reached, and not when the record was selected (step 2)

To handle this, the version number must also be passed as a hidden form field ; on step 3, the program will have to compare db[record_id].__version__ to the value of the form field

For instance, in step 2 :

<input type="hidden" name="record_id" value="$record.__id__">
<input type="hidden" name="record_version" value="$record.__version__">
<input name="name" value="$record.name">

where $record.__id__ is pseudo-code, to be replaced by the way your web framework allows insertion of dynamic values

The code in step 3 will then look like this :

old_record = db[int($record_id)]
if old_record.__version__ != int($record_version):
    print "Error - the record was updated by someone else since you selected it"
else:
    db.update(old_record,name=$name)

Here, you replace $record_id by the syntax used by your framework to retrieve the form field record_id (don't forget to convert to integers)

Deleting records

db.delete(record)

removes an individual record

db.delete(records)

removes a list (or any other iterable) of records

If you know the identifier of a record, you can delete it by:

del db[record_id]

Number of records

len(db)

returns the number of records in the base. It is increased by 1 when a record is inserted and decreased by 1 when a record is deleted

Field names and types

db.field_names

is the list of the field names in the base, in the order specified upon creation

db.fields

is a dictionary mapping field names to their type

Cleanup

db.cleanup()

When a record is deleted or updated, the old version is marked as deleted, but physically it remains on disk. When you have made many deletions or updates, the old records take useless space on disk, and their presence can degrade the performance of selections

You can use the cleanup() method to physically remove these records
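The mark-then-compact behaviour of deletions, updates and cleanup() can be sketched with a list of physical rows. AppendOnlyTable is hypothetical and keeps everything in memory; it only illustrates why space is reclaimed by cleanup() and not by delete() or update().

```python
import itertools


class AppendOnlyTable(object):
    """Hypothetical table: deletions and updates only flag old rows."""

    def __init__(self):
        self.rows = []                  # physical rows: [rec_id, deleted, data]
        self._ids = itertools.count()

    def insert(self, data, rec_id=None):
        if rec_id is None:
            rec_id = next(self._ids)
        self.rows.append([rec_id, False, data])
        return rec_id

    def _mark_deleted(self, rec_id):
        for row in self.rows:
            if row[0] == rec_id and not row[1]:
                row[1] = True

    def delete(self, rec_id):
        self._mark_deleted(rec_id)

    def update(self, rec_id, data):
        # the old version stays in place, flagged as deleted,
        # and the new version is appended with the same identifier
        self._mark_deleted(rec_id)
        self.insert(data, rec_id)

    def live(self):
        return dict((rec_id, data) for rec_id, deleted, data in self.rows
                    if not deleted)

    def cleanup(self):
        # physically drop the flagged rows; live records are unaffected
        self.rows = [row for row in self.rows if not row[1]]
```

After one update and one deletion on three records, the table holds four physical rows but only two live records; cleanup() brings the physical count back to two.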

Links between bases

A reference to a database can be used as a type in another base

Suppose you have a table with cities and zip codes :

cities = Base('cities').create(('name',str),('zip',int))
cities.insert('Rennes',35000)
cities.insert('Bordeaux',33000)

If you want to create a base of persons living in these cities, you can use cities as the field type for the field 'city' :

persons = Base('persons')
persons.create(('name',str),('street',str),('city',cities))
persons.insert('Jean Dupont','12 rue Montaigne',cities[1])
persons.insert('Pierre Martin','2 avenue Voltaire',cities[0])

The value for the field 'city' must be a record of the base cities. If you want to get the zip code of Jean Dupont :

print persons[0].city.zip
>> 33000

If a record is deleted in the base cities, the value returned in the base persons will be set to None

del cities[0]
print persons[1].city
>> None
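One plausible way to obtain this behaviour, assumed here rather than taken from buzhug's source, is to store only the identifier of the linked record and resolve it on access, returning None when the identifier no longer exists. Table and city_of() are hypothetical names.

```python
class Table(object):
    """Hypothetical minimal table keyed by record identifier."""

    def __init__(self):
        self._rows = {}
        self._next_id = 0

    def insert(self, **fields):
        rec_id = self._next_id
        self._next_id += 1
        self._rows[rec_id] = fields
        return rec_id

    def get(self, rec_id):
        # returns None for unknown or deleted identifiers
        return self._rows.get(rec_id)

    def __delitem__(self, rec_id):
        del self._rows[rec_id]


cities = Table()
rennes = cities.insert(name='Rennes', zip=35000)
bordeaux = cities.insert(name='Bordeaux', zip=33000)

persons = Table()
# the link field stores only the identifier of the city record
jean = persons.insert(name='Jean Dupont', city_id=bordeaux)
pierre = persons.insert(name='Pierre Martin', city_id=rennes)


def city_of(person_id):
    """Resolve the linked city at access time."""
    return cities.get(persons.get(person_id)['city_id'])
```

Because the link is resolved when it is read, deleting Rennes makes city_of(pierre) return None without touching the persons table.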

Modifying the database structure

1. Adding a new field

db.add_field(field_name,field_type[,after[,default]])

Add a new field with the specified name and type after the specified field. Each existing record will have a field of this name, initialized to the value specified in default

after defaults to None, in which case the field becomes the first

default defaults to None
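The effect of add_field() on the field order and on existing records can be sketched on plain lists and dicts. The function below is hypothetical; rejecting a duplicate field name with ValueError is an assumption, since the text does not say what happens in that case.

```python
def add_field(field_names, records, name, after=None, default=None):
    """Return the new field order and set the default in every record."""
    if name in field_names:
        raise ValueError("field %r already exists" % name)
    # after=None puts the new field first, as described above
    position = 0 if after is None else field_names.index(after) + 1
    new_names = field_names[:position] + [name] + field_names[position:]
    for record in records:
        record[name] = default
    return new_names
```

For fields ['name', 'age'], adding 'city' after 'name' gives ['name', 'city', 'age'], while adding a field with no after argument puts it first.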

2. Removing a field

db.drop_field(field_name)

Removes the specified field from the database. The existing records will no longer have this field set

3. Changing the default value for a field

db.set_default(field_name, default_value)

Resets the default value for the specified field. This value must be of the type specified for the field in the create() method