Databases Overview

Databases provide a critical aspect of psiTurk, as they store data from experiments and help to prevent the same user from completing your experiment more than once. Databases provide an important function for web-based experiments – Because multiple people can complete your experiment at the same time, you need a system which can simultaneously write/read data. Databases are optimized for this type of environment and are thus very useful for experiments.

psiTurk can integrate with any database that is compatible with SQLAlchemy.

See also

database_url – For details on how to configure databases in config.txt

Using SQLite

Perhaps the simplest quickstart solution is to use SQLite. This database solution writes to a local file on the same computer as is running the psiTurk server.

To use a SQLite data base, simply set the database_url field in your local configuration file (config.txt):

database_url = sqlite:///FILENAME.db

where FILENAME is of your choosing. By default, psiTurk sets this like this:

database_url = sqlite:///participants.db

This will make a SQLite database file in the top-level folder of your project. If you change the database_url and restart psiTurk, a new database corresponding to the new filename will be created. If you set it to an existing file name, psiTurk will attempt to connect to this database.

It is best to do this while the server is not running (note in this example the “server” status says “off”). If you change this while the server is running you will need to type:

[psiTurk server:on mode:sdbx #HITs:0]$ server restart

While great for development and debugging, SQLite has a number of important downsides for deploying experiments. In particular, SQLite does not allow concurrent access to the database, so if the locks work properly, simultaneous access (say, from multiple users submitting their data at the same time) could destabilize your database. In the worst scenario, the database could become corrupted, resulting in data loss.

As a result, we recommend using a more robust database solution when actually running your experiment.

However, SQLite is a good solution particularly for initial testing. It is also possible to try to “throttle” the rate of signups on Mechanical Turk (by only posting one assignment slot at a time) so that database errors are less likely using SQLite.

Note

SQLite database are fine for local testing but more robust databases like MySQL are recommended especially if you plan to run many participants simultaneously. Again, any server compatible with SQLAlchemy can be used.

Using a postgresql database on Heroku

The Heroku free tier includes access to a postgresql database. See Running psiTurk on Heroku.

Using a SQL database server

A more robust solution is to set up a MySQL database. psiTurk’s reliance on SQLAlchemy for interfacing with database which means it is easy to switch between MySQL, PostgreSQL, or SQLite.

For example, to use an existing MySQL database:

database_url = mysql://USERNAME:PASSWORD@HOSTNAME:PORT/DATABASE

where USERNAME and PASSWORD are your access credentials for the database, HOSTNAME is the DNS entry or IP address for the database, PORT is the port number (default is 3306) and DATABASE is the name of the database on the server.

Use 127.0.0.1 as the HOSTNAME for a database running locally to the psiTurk server rather than ‘localhost’. Mysql treats the HOSTNAME ‘localhost’ as a special case in Unix-based systems and will cause the psiTurk server to fail to boot.

It is wise to test that you can connect to this url with a MySQL client prior to launching, such as MySQL Workbench Sequel Pro.

Here’s an example of setting up a minimal MySQL database for use with psiTurk:

$ mysql -uroot -p
mysql> CREATE USER 'your_username'@'localhost' IDENTIFIED BY 'your_password';
Query OK, 0 rows affected (0.03 sec)

mysql> CREATE DATABASE your_database;
Query OK, 1 row affected (0.01 sec)

mysql> GRANT ALL PRIVILEGES ON your_database.* TO 'your_username'@'localhost';
Query OK, 0 rows affected (0.00 sec)

where your_username, your_password and your_database match the USERNAME, PASSWORD and DATABASE specified in config.txt’s database_url variable.

The table specified in config.txt:

table_name = turkdemo

…will be created automatically when running the psiturk shell. MySQL is (fairly) easy to install and free. However, a variety of web hosting services offer managed MySQL databases. Some are even free.

Running a MySQL database on Amazon’s Web Services Cloud

While not terribly difficult, installing and managing a MySQL database can be an extra hassle. Interestingly, when you sign up with Amazon Mechanical Turk as a requester, you also are signing up for Amazon’s Web Services a very powerful cloud-based computing platform that is used by many large web companies. One of the services Amazon provides is a fully hosted relational database server (RDS).

According to Amazon, “Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business.”

Danger

If you use Amazon’s RDS to host your MySQL database you may incur additional charges. At the present time a small RDS instance is free if you have recently signed up for Amazon Web Services. However, older account have to pay according to the current rates. This does NOT use the pre-paid mechanism that is used on Amazon Mechanical Turk. Thus launching a database server on the cloud and leaving it running run up monthly charges. You are responsible for launching and shutting down your own database instances if you use this approach. PROCEED WITH CAUTION.