Running psiTurk on Heroku

Heroku is a cloud platform for running applications. You can run psiTurk on Heroku by preparing a git repository and pushing it to Heroku, which then deploys and automatically runs the code for you.

The benefits of Heroku include the following:

  • It’s somewhat easier to manage than Amazon Web Services EC2 for the tech-wary (no need for security groups, no need to ssh in).
  • You can set up a free PostgreSQL server, which is highly recommended over the default SQLite database that psiTurk uses. A database server is required on Heroku because files, including participants.db, are ephemeral: data stored in them would be lost every time the app spins down.
  • You get free SSL for hosting your own ad.
  • It’s scalable.
  • You get a Heroku buffering server in front of your psiTurk gunicorn instance, which helps with performance a little bit.

One downside with Heroku is that it can get expensive if you need any kind of horsepower beyond 512MB memory and one node.

What follows is a step-by-step tutorial for setting up a psiTurk example experiment on Heroku (both the experiment itself and ad) with a PostgreSQL database for collecting data.

All commands listed in this tutorial are meant to be typed into your terminal application.

  1. Go to the Heroku website and create a new account if you don’t already have one.

  2. Make sure that psiTurk, git, and the Heroku Command Line Interface are installed on your computer.

    • If you don’t already have a psiturk experiment:

      Create a psiTurk example at a desired location:

      psiturk-setup-example
      

      Navigate into your newly created psiTurk example folder:

      cd psiturk-example
      
    • If you are starting from an already-existing psiturk project:

      Navigate to your project root directory.

  3. If your experiment is not already in a git repository, initialize one in the root directory of your psiTurk project (your current working directory):

    git init
    
  4. Log in to Heroku, entering your heroku credentials when prompted for them:

    heroku login
    
  5. Create a new app on Heroku:

    heroku create
    

    Note

    Running this command adds a git remote to your .git/config file, so that any heroku commands run from your project folder are run against your newly created heroku app.

  6. Run the following psiturk shell command:

    psiturk-heroku-config
    

    Running this command copies all files from psiturk’s heroku_files folder into your experiment’s root directory. These are needed for your experiment to run on Heroku.

    This command also runs heroku config:set ON_CLOUD=1 in your shell on your behalf. This sets an environment variable called ON_CLOUD to the value 1 in your heroku app’s environment. Setting ON_CLOUD=1 in your environment tells psiturk to use some sensible defaults for several config settings. Specifically, it sets defaults for host, threads, errorlog, and accesslog.

    Warning

    Heads up! The sample config.txt file generated by psiturk 3 lists its defaults commented out (prepended with a ;). Cloud defaults override any settings that are commented out in your config.txt.

    However, if any of the settings that have cloud defaults (host, threads, errorlog, accesslog) are set, uncommented, in your config.txt, then your config.txt values override the cloud defaults. To remedy this, you will need to either:

    1. change them in your config.txt or re-comment them out, or
    2. set environment variables on heroku for the corresponding cloud defaults that take precedence over your config.txt values.

    For the latter, any of the config settings can be overridden in the heroku environment by setting PSITURK_{uppercase_config_name} via heroku config:set. For example, to override the threads setting from config.txt on heroku, one could run the following:

    heroku config:set PSITURK_THREADS=1
    
  7. Set a database that your heroku app will use.

    • To get a free heroku-hosted postgresql database:

      Create a Postgres database on the newly created Heroku app:

      heroku addons:create heroku-postgresql
      

      This will provision a psiturk-compatible postgresql database, and set an environment variable on your app called DATABASE_URL that points to your database.

      To see the DATABASE_URL given to you by heroku for this newly-provisioned postgresql database, you can run the following:

      heroku config
      

      Important

      This URL includes your username and password. Anyone who has access to the database_url can connect to your database and has access to the data stored in it!

    • If you already have a publicly-accessible database hosted elsewhere:

      Then you can do one of the following:

      1. list its url as your database_url in your config.txt and be sure that DATABASE_URL is not set in your heroku environment (check heroku config), or
      2. set its url in your heroku environment (heroku config:set DATABASE_URL=your-url)
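
Because the DATABASE_URL printed by heroku config embeds the database username and password, it can help to mask the password before pasting the URL into logs, chat, or bug reports. The following is a minimal sketch; the function name mask_db_url and the sample URL are hypothetical, and the sketch assumes the password contains no literal @ character.

```shell
# Hypothetical helper (not part of psiTurk or the Heroku CLI):
# replace the password in a URL of the form scheme://user:password@host/...
# with ***. URLs without credentials are printed unchanged.
mask_db_url() {
    printf '%s\n' "$1" | sed -E 's#^([^:]+://[^:/@]+):[^@]*@#\1:***@#'
}

# Example with a made-up URL:
mask_db_url "postgres://user:secret@host.example.com:5432/dbname"

# With the Heroku CLI installed and logged in, one could mask the real value:
#   mask_db_url "$(heroku config:get DATABASE_URL)"
```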

    Important

    psiTurk gives environment variables precedence over config file settings. Most environment variables must be named PSITURK_ followed by the corresponding config setting name, with the exception of two environment variables:

    1. DATABASE_URL
    2. PORT

    These two, if present in the environment, are respected even if not prepended by PSITURK_.

    This means that if DATABASE_URL is set in your heroku environment, it will override any setting you have in config.txt.
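
As a rough illustration of the precedence rule above, the following sketch resolves a setting the way this section describes: an exported environment variable wins, otherwise the config.txt value is used. The function name resolve_setting is hypothetical and is not part of psiTurk. The config.txt value is passed in by the caller rather than parsed from the file, and for database_url and port only the unprefixed variable names are checked.

```shell
# Illustrative only: mimic the lookup order described above.
resolve_setting() {
    name=$1            # config.txt setting name, e.g. "threads"
    config_value=$2    # value from config.txt, supplied by the caller
    upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
    case "$name" in
        database_url|port)
            # These two are honored without the PSITURK_ prefix.
            env_name=$upper ;;
        *)
            env_name="PSITURK_$upper" ;;
    esac
    # printenv prints the value and succeeds only if the variable is exported;
    # otherwise fall back to the config.txt value.
    printenv "$env_name" || printf '%s\n' "$config_value"
}
```

For example, after export PSITURK_THREADS=1, calling resolve_setting threads 4 prints 1; with the variable unset it prints 4.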

  8. Optional: if you want to use the psiturk dashboard from your heroku instance to run some AWS commands, or if you want your heroku instance to run any tasks created by the dashboard:

    • Set your AWS credentials as environment variables within your heroku app, replacing <XYZ> with your access and secret keys for Amazon Web Services:

      heroku config:set aws_access_key_id=<XYZ>
      heroku config:set aws_secret_access_key=<XYZ>
      
  9. Stage all the files in your psiTurk example to your Git repository:

    git add .
    
  10. Commit all the staged files to your Git repository:

    git commit -m "Initial commit"
    
  11. Push the code to your Heroku git remote. This triggers a build process on Heroku, which in turn runs the command specified in Procfile to automatically launch your psiTurk server on the Heroku platform:

    git push heroku master
    

    Note

    Any time you want to push changes to your heroku-hosted psiturk experiment, you will need to repeat the above flow of git add, git commit, git push.
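
The repeated add, commit, push flow from steps 9 to 11 can be wrapped in a small shell function. This is only a sketch: the names run and deploy and the DRY_RUN variable are hypothetical, and the branch name master is taken from the push command above. With DRY_RUN=1 the commands are printed rather than executed.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Stage, commit, and push in one go. The chain stops at the first
# failing command (for example, when there is nothing to commit).
deploy() {
    message=${1:-"Update experiment"}
    run git add . &&
    run git commit -m "$message" &&
    run git push heroku master
}

# Preview the commands without touching the repository:
DRY_RUN=1 deploy "Fix instructions page"
```

Calling deploy "some message" without DRY_RUN set runs the three git commands for real from the current directory.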

  12. You can run through your heroku-hosted experiment by visiting your heroku app’s url.

    To get your app’s url, run heroku domains from the root of your local psiturk app, and visit your app’s reported domain url in a browser. From that url, you can conveniently obtain a debugging url by clicking “Begin by viewing the ad.”

  13. To download data from your heroku app using a locally-run psiturk, set your local psiTurk app to use the same database that your experiment uses when it runs on heroku.

    To do so, get the DATABASE_URL of your heroku psiturk instance by running heroku config, and set the database url in any of the following local places:

    1. your config.txt file, or
    2. your own local environment.

    Warning

    If you opt to set your database url in your config.txt file, then be cautious about sharing your experiment code – the url contains your database username and password!

    Once your local psiturk app uses the same database as your heroku app, then you can run the following to download your experiment data, regardless of whether you have run through your experiment hosted locally or on Heroku:

    psiturk download_datafiles
    

    This should generate three datafiles for you in your local directory:

    • trialdata.csv,
    • questiondata.csv, and
    • eventdata.csv.

    Congratulations, you’ve now gathered data from an experiment running on Heroku!
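
As a quick sanity check after downloading, one might confirm that the three files exist and see how many lines each contains. The helper name check_datafiles is hypothetical; it only inspects the file names listed above and makes no assumption about their columns.

```shell
# Hypothetical helper: report the line count of each expected data file
# in the given directory (default: current directory).
check_datafiles() {
    dir=${1:-.}
    for f in trialdata.csv questiondata.csv eventdata.csv; do
        if [ -f "$dir/$f" ]; then
            # wc -l counts newline characters; tr strips padding spaces.
            printf '%s: %s lines\n' "$f" "$(wc -l < "$dir/$f" | tr -d ' ')"
        else
            printf '%s: missing\n' "$f"
        fi
    done
}
```

Running check_datafiles from the directory where psiturk download_datafiles was run prints one line per expected file.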

    Note

    psiTurk will look for a file called .env in the root of your psiturk app and read in any KEY=VALUE settings in there as environment variables for your psiturk app. Therefore, one could put the following content in a file called .env to set the database_url:

    DATABASE_URL=url-for-your-publicly-accessible-database
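
A .env file that holds DATABASE_URL contains the same credentials as the database url itself, so one may want to keep it out of the git repository. The sketch below assumes you do not need .env on Heroku (where DATABASE_URL is already set in the app environment); the function name ignore_env_file is hypothetical.

```shell
# Hypothetical helper: make sure ".env" is listed in the project's
# .gitignore so the file is not committed with the experiment code.
ignore_env_file() {
    gitignore="${1:-.}/.gitignore"
    touch "$gitignore"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF '.env' "$gitignore"; then
        # Add a newline first if the file does not already end with one.
        if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
            printf '\n' >> "$gitignore"
        fi
        printf '.env\n' >> "$gitignore"
    fi
}
```

Calling ignore_env_file from the project root is safe to repeat; the entry is added at most once.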
    
  14. To post a hit to MTurk that uses your heroku app, set your local psiTurk config.txt’s ad_url settings to point to your heroku app. The easiest way to do this is to set ad_url_domain in your config.txt’s [HIT Configuration] section to equal your heroku domain name.

    For example, if running heroku domains reported that your heroku domain was example-app.herokuapp.com, then you would simply set ad_url_domain = example-app.herokuapp.com in your config.txt’s [HIT Configuration] section. With that, HITs posted to mturk should correctly point to your heroku app.

    See also

    See the Hit Configuration – Ad Url for more information.
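
If you copied your app’s address from a browser rather than from heroku domains, it will include a scheme and possibly a path, whereas ad_url_domain expects only the domain name. The following sketch strips those parts and prints the line to place in the [HIT Configuration] section; the function name ad_url_domain_line is hypothetical.

```shell
# Hypothetical helper: turn an app URL or bare domain into the
# ad_url_domain line for config.txt.
ad_url_domain_line() {
    host=${1#*://}     # drop a leading scheme such as https://
    host=${host%%/*}   # drop any path after the host name
    printf 'ad_url_domain = %s\n' "$host"
}

ad_url_domain_line "https://example-app.herokuapp.com/"
# prints: ad_url_domain = example-app.herokuapp.com
```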

From your local psiTurk session, you can now create and modify HITs. When these are accessed by Amazon Mechanical Turk workers, the workers will be directed to the psiTurk session running on your Heroku app. This means that you never need to launch psiTurk and run server on anywhere yourself in order to run an experiment on Heroku. The server runs automatically and is accessible via your Heroku domain url. (Of course, if you want to debug locally, you can still run a local server.)

Note

If you stay on the “Free” Heroku tier, your app will go to “sleep” after a period of inactivity. If your app has gone to sleep, it will take a few seconds before it responds if you visit its url. It should respond quickly once it “awakens”. Consider upgrading to a “Hobby” heroku dyno to prevent your app from going to sleep.

Note

If you want to run commands against your postgresql db, you can run heroku pg:psql to connect, from where you can issue postgres commands. You can also connect directly to your heroku postgres db by installing and running postgresql on your local machine, and passing the DATABASE_URL that your heroku app uses as a command-line option.