python setup.py install
On Windows you can use the package installer
Python version must be 2.3 or above
from buzhug import Base
db = Base(path)
db.create((name1,type1[,default1])[,(name2,type2[,default2]),...])
A thread-safe version should be used in a multi-threaded environment :
from buzhug import TS_Base
db = TS_Base(path)
db.create((name1,type1[,default1])[,(name2,type2[,default2]),...])
path is the name of the database. Since the contents of the base are stored in a directory of the same name, the only constraint on path is that it must be a valid directory name
The fields of the base are defined by a field name and a field type. The field names must be valid Python names, beginning with a letter (not the underscore)
The field type must be :
The optional default value must be of the type defined for the field. This value is used when a record is inserted and no value is given for the field. If no default value is specified, missing fields are set to None
Additionally, you can add another keyword argument, mode, which can take one of two values :
create() raises IOError if a base of the same name already exists and the keyword mode is not provided
from buzhug import Base
db = Base(path)
db.open()
or on one line :
from buzhug import Base
db = Base(path).open()
Raises IOError if the base doesn't exist
db.close()
Closes all the files open for the database
record_id = db.insert(name1=val1[,name2=val2,...])
The keys of the keyword arguments must be the field names defined in the create() method. If some of the fields are missing, they are set to their default value, or to None if no default value was specified. The values of the keyword arguments must be of the type defined in the create() method, otherwise an exception is raised
insert() returns an integer, the record identifier. Each record in the base has an identifier, and two different records have different identifiers. The identifier is set internally by buzhug and cannot be modified by the programmer
db.insert(val1,val2,...)
The values must be provided in the order defined in the create() method. An exception is raised if the number of arguments is not exactly the same as the number of fields in the base, or if a value is not of the expected type
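The two checks described above can be pictured with a small stand-alone sketch. The helper check_positional is hypothetical and is not part of buzhug ; the use of TypeError is also an assumption, since the exception class actually raised is not specified here

```python
# Hypothetical helper, not part of buzhug: it only mimics the two checks
# described above for positional insertion.
def check_positional(fields, values):
    """fields is a list of (name, type) pairs, in creation order."""
    if len(values) != len(fields):
        # TypeError is an assumption; the real exception class may differ
        raise TypeError("expected %d values, got %d" % (len(fields), len(values)))
    for (name, ftype), value in zip(fields, values):
        if not isinstance(value, ftype):
            raise TypeError("field %r expects %s" % (name, ftype.__name__))
    # map each value to its field name, following the creation order
    return dict((name, value) for (name, _), value in zip(fields, values))

fields = [("name", str), ("age", int)]
record = check_positional(fields, ["Jean", 30])
```

Calling check_positional(fields, ["Jean"]) or check_positional(fields, ["Jean", "thirty"]) raises the exception in this sketch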
If the database is managed through a network, the data provided by the user will be sent as strings. The management program can convert the strings back to their original type, but since it is a common task buzhug provides functions for this case
The first step is to define how the string must be converted into the original type
For the type str there is of course no conversion to make. For the type int the conversion uses the built-in int() and for float the built-in float()
For unicode strings, the programmer must define the encoding that was used to convert the original unicode string into a Python bytestring :
db.set_string_format(unicode,encoding)
encoding is a string defining the encoding, such as 'latin-1','utf-8', etc. If an invalid encoding is provided, an exception is raised
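As a reminder of what the encoding means, here is a minimal sketch using only built-in string methods, independent of buzhug. The sample value is arbitrary, and the exception shown is the one raised by Python's codec lookup ; which exception buzhug itself raises is not stated here

```python
original = u"Fran\xe7ois"             # a unicode string
sent = original.encode("utf-8")       # bytestring, as it would travel over the network
restored = sent.decode("utf-8")       # converted back with the same encoding

# An unknown encoding name makes Python's codec lookup fail
try:
    sent.decode("no-such-encoding")
    error = None
except LookupError as exc:
    error = exc
```

The conversion only gives back the original string if the same encoding is used in both directions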
For date and datetime, the programmer must define a format with the syntax specified for the function strftime in the built-in time module :
db.set_string_format(date,format) db.set_string_format(datetime,format)
For instance if the date is provided in the form YYYY-MM-DD the format is '%Y-%m-%d'
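To check a format string before giving it to set_string_format(), you can try it with the standard datetime module. This is only a sketch of the conversion in both directions, with an arbitrary sample date ; it does not show buzhug's own code

```python
from datetime import datetime

date_format = "%Y-%m-%d"
as_string = "2008-02-29"

# string -> date : the direction needed when data arrives as strings
parsed = datetime.strptime(as_string, date_format).date()

# date -> string : the reverse direction, with the same format
round_trip = parsed.strftime(date_format)
```

If the string does not match the format, strptime raises ValueError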
Once all the string formats have been specified, the insertion is made by
db.insert_as_strings(name1=string1[,name2=string2,...])
or
db.insert_as_strings(string1[,string2,...])
record = db[rec_id]
The database supports lookup by record identifier, exactly as if it were a list. The returned value is an object with attributes matching the fields defined in create()
records = db(name1=value1[,name2=value2,...])
Returns the list of all the records where name1 == value1 and name2 == value2, etc.
The database supports the iterator protocol. Its next() method returns records (objects with attributes matching the fields of the database)
Instead of SQL, the query language is the one used in Python list comprehensions or generator expressions :
result = [ record for record in db if condition ]
or
for record in (record for record in db if condition): (...do anything with record...)
where condition is a condition on the object "record" yielded during the iteration on the database
With this simple syntax you can create complex queries, for instance if you want to find all the records whose field 'name' matches a regular expression pattern :
import re
print [ r for r in db if re.match(pattern,r.name) ]
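The same kind of query can be tried without a database. In the sketch below a plain list of namedtuple objects stands in for the base ; the Person type, the sample data and the pattern are made up for the illustration

```python
import re
from collections import namedtuple

# Stand-in for database records: plain objects with attributes
Person = namedtuple("Person", "name age")
people = [Person("Alice", 31), Person("Bob", 28), Person("Arthur", 45)]

# Names matching a regular expression
pattern = "A.*r"
matches = [r.name for r in people if re.match(pattern, r.name)]

# Any Python condition can be used, here a combination of two tests
older = [r.name for r in people if r.age > 30 and not r.name.startswith("B")]
```

Since the condition is ordinary Python code, it can call any function or combine any number of tests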
The selection by list comprehension described above is the preferred approach, for its clarity and its ability to express complex selection conditions
Its drawback is that it is not very efficient ; for much better performance you can use the select() method instead
To select the records whose fields are equal to a value, the syntax is
result_set = db.select(field_list,name_i=val_i[,name_j=val_j...])
field_list is a list of fields (a sublist of the fields defined in the base)
The result is a set of records whose field name_i has the value val_i, name_j has the value val_j, etc ; only the fields provided in field_list are set for these records
For instance if we have a base of persons and we want to know the name of those who are 30 years old :
result_set = db.select(['name'],age = 30)
# print the result
for record in result_set:
    print record.name
To test the records where an integer, float, date or datetime field must be between a minimum value and a maximum value, you pass a 2-item list as argument value :
result_set = db.select(field_list,field=[vmin,vmax])
For instance, to get the persons aged between 30 and 35 :
result_set = db.select(['name'],age = [30,35])
# print the result
for record in result_set:
    print record.name
If the condition is more complex (field greater or lesser than a value, alternative conditions with OR, etc) the syntax is
result_set = db.select(field_list,predicate_string,kw_arguments)
field_list is a list of fields (a sublist of the fields defined in the base)
predicate_string is a string with a Python expression returning True or False, expressed with variable names (not literal values) which can be fields of the database or the keys in the kw_arguments. Only the records for which the evaluation of the predicate string returns True are appended to result_set
kw_arguments are keyword arguments, where the keys are the names used in the predicate
Since this syntax is a little complex I will give a number of examples
If we have a base of persons and we want to know the name of those between 30 and 35 years old:
result_set = db.select(['name'],"age_min <= age < age_max",age_min=30,age_max=36)
Note that you can't express this condition with literals inside the predicate string, like in :
result_set = db.select(['name'],"30 <= age < 36")
It would seem much more natural, but for implementation reasons it would not work. The values are stored as strings in the database files ; instead of converting these strings into Python types (integers, dates etc) they are left as they are, and it is the values of the keyword arguments which are converted into strings and compared to the strings found in the file. This is obviously much more efficient, because there is only one conversion to make (argument value to string) instead of as many conversions (string to argument type) as there are records in the base
In the example above, the expression compares the string representation of age to the string representations of 30 and 36. If the predicate was expressed as "30 <= age < 36", buzhug would have to parse the string and replace 30 and 36 by their string representations. Implementing a parser of this kind would be difficult, so the choice is to have a less natural syntax, for the sake of the speed of selection operations
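The idea of comparing strings instead of converted values can be sketched as follows. The encoding used here (zero-padded, fixed-width decimal strings for non-negative integers) is purely hypothetical ; the format buzhug really uses on disk is not described in this document

```python
# Hypothetical encoding, for non-negative integers only:
# zero-padded, fixed-width decimal strings keep the numeric order.
def encode(n, width=10):
    return "%0*d" % (width, n)

# What the "file" would contain: one string per record
stored_ages = [encode(a) for a in (7, 30, 35, 36, 120)]

# The arguments are converted once...
age_min, age_max = encode(30), encode(36)

# ...then each record is tested with plain string comparisons
selected = [s for s in stored_ages if age_min <= s < age_max]

# Without a suitable encoding, string order and numeric order disagree
naive_order_ok = "7" < "30"   # False, although 7 < 30
```

The point of the sketch is the cost : two conversions for the arguments, whatever the number of records, instead of one conversion per record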
Some more examples :
To find the people who have this age and live in London or Paris :
result_set = db.select(['name'], "age_min <= age < age_max and city in city_list", age_min=30,age_max=36,city_list=('London','Paris'))
The same, but returning all the fields and not only the name :
result_set = db.select(None, "age_min <= age < age_max and city in city_list", age_min=30,age_max=36,city_list=('London','Paris'))
Returning the name of all the records :
result_set = db.select(['name'])
Returning all the records, with all fields set :
result_set = db.select()
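To make the role of the names in the predicate string concrete, here is a sketch of its meaning only. It uses eval() on dictionaries standing in for records ; the helper matches and the sample data are made up, and this is not how buzhug evaluates predicates (as explained above, it compares string representations)

```python
def matches(predicate, record, **kw):
    # Names in the predicate resolve either to a field of the record
    # or to one of the keyword arguments
    namespace = dict(record)
    namespace.update(kw)
    return bool(eval(predicate, {"__builtins__": {}}, namespace))

# Dictionaries standing in for the records of a base of persons
people = [
    {"name": "Ann", "age": 30, "city": "Paris"},
    {"name": "Bill", "age": 36, "city": "London"},
    {"name": "Carl", "age": 33, "city": "Rome"},
    {"name": "Dora", "age": 35, "city": "London"},
]

predicate = "age_min <= age < age_max and city in city_list"
names = [p["name"] for p in people
         if matches(predicate, p, age_min=30, age_max=36,
                    city_list=("London", "Paris"))]
```

Bill is rejected because 36 is not strictly lower than age_max, and Carl because Rome is not in city_list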
For selection using regular expressions, you use select and pass it a compiled regular expression object, returned by re.compile() ; you can use this object in the predicate string with its methods match() and search()
For instance to find all the people whose first name begins with a vowel :
import re
pattern = re.compile('^[AEIOUY]')
result_set = db.select(['name','age'],'p.match(name)',p=pattern)
The value returned by select() is an object which supports a method sort_by()
results = result_set.sort_by(order_string)
order_string is a Python expression using the field names and the operators + and -
+ indicates that the field name that follows must be sorted by ascending order
- indicates that the field name that follows must be sorted by decreasing order
For instance if you want to sort a set of results first by decreasing age, then by ascending name of those who have the same age :
results = result_set.sort_by("-age+name")
You can combine the selection and sort operations in one line :
results = db.select(['name','age'], "age > age_min",age_min=30).sort_by("-age+name")
The select_for_update() method must be used instead of select() if you have to update the selected records (see next section about updating)
There are two separate methods because select() is faster than select_for_update()
The values of fields of a record can be modified by the update() method
This method applies to the record itself :
record.update(name1=newval1[,name2=newval2,...])
record is a record that has been previously selected for update
The records selected by their identifier or by iteration (list comprehension or generator expression) are already selected for update. But for performance reasons, the records returned by select() are selected only for reading, not for update ; if you want to select records and update them, you must use the method select_for_update(), with exactly the same syntax as select()
An alternative syntax can be used :
db.update(record,name1=newval1[,name2=newval2,...]) updates a single record
db.update(records,name1=newval1[,name2=newval2,...]) updates a list or tuple of records, changing the values for each of them
When many users access the same database at the same time and are allowed to update records (which is the case in a web application), conflicts can occur if two users want to update the same record
Buzhug uses a version number to detect the conflicts. Each record has a version number (an integer). When a record is selected for update, the version number at selection time is provided as the attribute __version__ ; when update() is called, the program first looks at the version number of the record with the same identifier in the database. If the version number has changed, this is because another user has updated the record in the meantime
In this case, an exception (buzhug.ConflictError) is raised. It is up to the programmer to decide how to manage the conflict
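The mechanism can be pictured with a small in-memory model. The class VersionedStore and its ConflictError are written for this illustration only ; they are not buzhug's code, and the real implementation may differ in its details

```python
class ConflictError(Exception):
    """Stand-in for buzhug.ConflictError."""

class VersionedStore(object):
    """Hypothetical in-memory model of version-based conflict detection."""

    def __init__(self):
        self._rows = {}   # record id -> (version, fields)

    def insert(self, rec_id, **fields):
        self._rows[rec_id] = (0, dict(fields))

    def select_for_update(self, rec_id):
        # The snapshot remembers the version number at selection time
        version, fields = self._rows[rec_id]
        return {"id": rec_id, "version": version, "fields": dict(fields)}

    def update(self, snapshot, **changes):
        current_version, fields = self._rows[snapshot["id"]]
        if current_version != snapshot["version"]:
            raise ConflictError("record %s was updated in the meantime"
                                % snapshot["id"])
        fields = dict(fields)
        fields.update(changes)
        # A successful update increments the version number
        self._rows[snapshot["id"]] = (current_version + 1, fields)

store = VersionedStore()
store.insert(0, name="Jean")
first_user = store.select_for_update(0)
second_user = store.select_for_update(0)
store.update(first_user, name="Jean Dupont")       # succeeds
try:
    store.update(second_user, name="J. Dupont")    # stale version number
    conflict = False
except ConflictError:
    conflict = True
```

In this sketch the second update is refused : the first user's change is kept, and the second user has to select the record again before retrying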
In a web environment, updates usually follow these steps :
The script will call select_for_update() in step 2 and update() in step 3
To identify the record between steps 2 and 3, the record identifier can be passed as a hidden form field ; but if we only pass it and get the record at step 3 by db[record_id], the version number will be the one when step 3 is reached, and not when the record was selected (step 2)
To handle this, the version number must also be passed as a hidden form field ; on step 3, the program will have to compare db[record_id].__version__ to the value of the form field
For instance, in step 2 :
<input type="hidden" name="record_id" value="$record.__id__"> <input type="hidden" name="record_version" value="$record.__version__"> <input name="name" value="$record.name">
where $record.__id__ is pseudo-code, to be replaced by the way your web framework allows insertion of dynamic values
The code in step 3 will then look like this :
old_record = db[int($record_id)]
if old_record.__version__ != int($record_version):
    print "Error - the record was updated by someone else since you selected it"
else:
    db.update(old_record,name=$name)
Here, you replace $record_id by the syntax used by your framework to retrieve the form field record_id (don't forget to convert to integers)
db.delete(record)
removes an individual record
db.delete(records)
removes a list (or any other iterator) of records
If you know the identifier of a record, you can delete it by:
del db[record_id]
len(db) returns the number of items in the base. It is increased by 1 when a record is inserted and decreased by 1 when a record is deleted
db.field_names is the list of the field names in the base, in the order specified upon creation
db.fields is a dictionary mapping field names to their type
db.cleanup()
When a record is deleted or updated, the old version is marked as deleted, but physically it remains on disk. When you have made many deletions or updates, the old records take useless space on disk, and their presence can alter the performance of selections
You can use the cleanup() function to physically remove these records
A reference to a database can be used as a type in another base
Suppose you have a table with cities and zip codes :
cities = Base('cities').create(('name',str),('zip',int))
cities.insert('Rennes',35000)
cities.insert('Bordeaux',33000)
If you want to create a base of persons living in these cities, you can use cities as the field type for the field 'city' :
persons = Base('persons')
persons.create(('name',str),('street',str),('city',cities))
persons.insert('Jean Dupont','12 rue Montaigne',cities[1])
persons.insert('Pierre Martin','2 avenue Voltaire',cities[0])
The value for the field 'city' must be a record of the base cities. If you want to get the zip code of Jean Dupont :
print persons[0].city.zip
>> 33000
If a record is deleted in the base cities, the value returned in the base persons will be set to None
del cities[0]
print persons[1].city
>> None
db.add_field(field_name,field_type[,after[,default]])
Add a new field with the specified name and type after the specified field. Each existing record will have a field of this name, initialized to the value specified in default
after defaults to None, in which case the field becomes the first
default defaults to None
db.drop_field(field_name)
Removes the specified field from the database. The existing records will no longer have this field set
db.set_default(field_name, default_value)
Reset the default value for the specified field. This value must be of the type specified for the field in the create() method