Read a file in chunks in Python

This article is just to demonstrate how to read a file in chunks rather than all at once.

This is useful for a number of cases, such as chunked uploading or encryption purposes, or perhaps where the file you want to interact with is larger than your machine memory capacity.

# chunked file reading
from __future__ import division
import os

def get_chunks(file_size):
    chunk_start = 0
    chunk_size = 0x20000  # 131072 bytes, default max ssl buffer size
    while chunk_start + chunk_size < file_size:
        yield(chunk_start, chunk_size)
        chunk_start += chunk_size

    final_chunk_size = file_size - chunk_start
    yield(chunk_start, final_chunk_size)

def read_file_chunked(file_path):
    with open(file_path, 'rb') as file_:  # binary mode, so lengths are in bytes
        file_size = os.path.getsize(file_path)

        print('File size: {}'.format(file_size))

        progress = 0

        for chunk_start, chunk_size in get_chunks(file_size):

            file_chunk = file_.read(chunk_size)

            # do something with the chunk, encrypt it, write to another file...

            progress += len(file_chunk)
            print('{0} of {1} bytes read ({2}%)'.format(
                progress, file_size, int(progress / file_size * 100)))

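if __name__ == '__main__':
    import sys
    # Path of the file to read, taken from the command line (an assumption).
    read_file_chunked(sys.argv[1])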

Also available as a Gist.

The above will output:

File size: 698837
131072 of 698837 bytes read (18%)
262144 of 698837 bytes read (37%)
393216 of 698837 bytes read (56%)
524288 of 698837 bytes read (75%)
655360 of 698837 bytes read (93%)
698837 of 698837 bytes read (100%)

Hopefully handy to someone. This of course isn’t the only way; the standard library offers other ways to target chunks.
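
As one alternative, here is a minimal sketch that uses `iter()` with a sentinel together with `functools.partial`, both from the standard library, to read fixed-size chunks without computing offsets up front. The name `iter_chunks` and its default chunk size are my own illustrative choices, not part of the script above.

```python
from functools import partial


def iter_chunks(file_path, chunk_size=0x20000):
    """Yield successive chunks of at most chunk_size bytes from file_path."""
    with open(file_path, 'rb') as file_:
        # iter(callable, sentinel) keeps calling file_.read(chunk_size)
        # and stops as soon as it returns b'' (end of file).
        for chunk in iter(partial(file_.read, chunk_size), b''):
            yield chunk
```

Because it is a generator, only one chunk is held in memory at a time, so it should suit the same large-file cases as the script above.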

Getting console.log errors with Selenium, PhantomJS in Python

So I had some functional tests passing on my workstation, but when pushed to the CI environment they would fail with an “ElementNotVisibleException”, because the scripts which created the element weren’t doing their job.

I wanted to view the browser console.log to get some clues to what went wrong on the front-end.

The selenium docs state to use:


But in my case that returned an empty list, not very useful.

I’ve found that if you use a log type of “har”, which isn’t in the docs:


It will return a bunch of information, including, if you look carefully, some “NOT FOUND” errors for requests triggered by JavaScript code.

    'timestamp': 1436900661766,
    'message': '{"log":{"version":"1.2","creator":{"name":"PhantomJS","version":"2.0.0"},"pages":[{"startedDateTime":"2015-07-14T19:01:40.795Z","id":"","title":"John Smith - MyAppName","pageTimings":{"onLoad":2900}}],"entries":[{"startedDateTime":"2015-07-14T19:03:22.559Z","time":148,"request":{"method":"GET","url":"","httpVersion":"HTTP/1.1","cookies":[],"headers":[{"name":"Accept","value":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},{"name":"Cache-Control","value":"max-age=0"},{"name":"User-Agent","value":"Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.0 Safari/538.1"}],"queryString":[],"headersSize":-1,"bodySize":-1},"response":{"status":200,"statusText":"OK","httpVersion":"HTTP/1.1","cookies":[],"headers":[{"name":"Date","value":"Tue, 14 Jul 2015 19:03:22 GMT"},{"name":"Server","value":"WSGIServer/0.1 Python/2.7.6"},{"name":"Vary","value":"Cookie"},{"name":"X-Frame-Options","value":"SAMEORIGIN"},{"name":"Content-Type","value":"text/html; charset=utf-8"},{"name":"Set-Cookie","value":"csrftoken=wFzWPTm9aVkGLtPuOCcc1tIs6ve5KosW; expires=Tue, 12-Jul-2016 19:03:22 GMT; Max-Age=31449600; Path=/"}],"redirectURL":"","headersSize":-1,"bodySize":5776,"content":{"size":5776,"mimeType":"text/html; charset=utf-8"}},"cache":{},"timings":{"blocked":0,"dns":-1,"connect":-1,"send":0,"wait":140,"receive":8,"ssl":-1},"pageref":""},{"startedDateTime":"2015-07-14T19:03:22.705Z","time":7,"request":{"method":"GET","url":"","httpVersion":"HTTP/1.1","cookies":[],"headers":[{"name":"Accept","value":"*/*"},{"name":"Referer","value":""},{"name":"User-Agent","value":"Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.0 Safari/538.1"}],"queryString":[],"headersSize":-1,"bodySize":-1},"response":{"status":null,"statusText":"Error downloading - server replied: NOT FOUND","httpVersion":"HTTP/1.1","cookies":[],"headers":[{"name":"Date","value":"Tue, 14 Jul 2015 19:03:22 
GMT"},{"name":"Server","value":"WSGIServer/0.1 Python/2.7.6"},{"name":"X-Frame-Options","value":"SAMEORIGIN"},{"name":"Content-Type","value":"text/html"}],"redirectURL":"","headersSize":-1,"bodySize":125,"content":{"size":125,"mimeType":"text/html"}},"cache":{},"timings":{"blocked":0,"dns":-1,"connect":-1,"send":0,"wait":6,"receive":1,"ssl":-1},"pageref":""},{"startedDateTime":"2015-07-14T19:03:22.706Z","time":15,"request":{"method":"GET","url":"","httpVersion":"HTTP/1.1","cookies":[],"headers":[{"name":"Accept","value":"text/css,*/*;q=0.1"},{"name":"Referer","value":""},{"name":"User-Agent","value":"Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.0 Safari/538.1"}],"queryString":[],"headersSize":-1,"bodySize":-1},"response":{"status":null,"statusText":"Error downloading - server replied: NOT FOUND","httpVersion":"HTTP/1.1","cookies":[],"headers":[{"name":"Date","value":"Tue, 14 Jul 2015 19:03:22 GMT"},{"name":"Server","value":"WSGIServer/0.1 Python/2.7.6"},{"name":"X-Frame-Options","value":"SAMEORIGIN"},{"name":"Content-Type","value":"text/html"}],"redirectURL":"","headersSize":-1,"bodySize":154,"content":{"size":154,"mimeType":"text/html"}},"cache":{},"timings":{"blocked":0,"dns":-1,"connect":-1,"send":0,"wait":15,"receive":0,"ssl":-1},"pageref":""}]}}',
    'level': 'INFO'
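
Picking the failures out of that blob by eye gets tedious, so here is a rough sketch of a helper that does it. It assumes each log entry is a dict shaped like the one above, with the HAR document stored as a JSON string under 'message'. The name `failed_requests` is hypothetical and not part of Selenium.

```python
import json


def failed_requests(log_entries):
    """Return (url, status_text) pairs for requests that did not succeed.

    log_entries is assumed to be a list of dicts shaped like the entry
    above, where 'message' holds a HAR document encoded as a JSON string.
    """
    failures = []
    for log_entry in log_entries:
        har = json.loads(log_entry['message'])
        for entry in har['log']['entries']:
            response = entry['response']
            status = response.get('status')
            # In the output above, failed downloads carry a null status.
            if status is None or status >= 400:
                failures.append((entry['request']['url'],
                                 response.get('statusText', '')))
    return failures
```

With a live session this could be called as `failed_requests(driver.get_log('har'))`, where `driver` is assumed to be your PhantomJS WebDriver instance.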

So with that I found JS assets weren’t being compiled in my CI environment and was able to go ahead and fix it :)

Hopefully that’s useful to someone out there.

Why is Programming Fun?

An extract from Fred Brooks’ (Frederick P. Brooks, Jr.) book, The Mythical Man-Month.

Why is programming fun? What delights may its practitioner expect as his reward?

First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. I think this delight must be an image of God’s delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake.

Second is the pleasure of making things that are useful to other people. Deep within, we want others to use our work and to find it helpful. In this respect the programming system is not essentially different from the child’s first clay pencil holder “for Daddy’s office.”

Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning. The programmed computer has all the fascination of the pinball machine or the jukebox mechanism, carried to the ultimate.

Fourth is the joy of always learning, which springs from the nonrepeating nature of the task. In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.

Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures. (…)
Yet the program construct, unlike the poet’s words, is real in the sense that it moves and works, producing visible outputs separately from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities we have in common with all men.

Concurrent Jenkins builds of a Django application

If you try to run multiple concurrent builds of a single Django project on Jenkins out of the box, you might be met with a message similar to:

Got an error creating the test database: database "test_projectdb" already exists

Got an error recreating the test database: database "test_projectdb" is being accessed by other users
DETAIL:  There is 1 other session using the database.

To fix this you need to edit the ‘DATABASES’ dictionary within your Django project settings module, adding another key, ‘TEST_NAME’.

TEST_NAME is the name of the test database Django will create when running your tests.

We can make this name unique by adding the following function to our Django settings module:

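import hashlib
import os

def get_test_db_name():
    # Hash Jenkins' BUILD_TAG so each build gets its own test database name.
    build_tag = os.environ.get('BUILD_TAG', 'no-tag')
    return hashlib.md5(build_tag.encode('utf-8')).hexdigest()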

(This will take the unique BUILD_TAG environment variable set by Jenkins and md5 it)

And then calling it within the DATABASES dictionary:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'projectdb',
        'USER': 'django',
        'PASSWORD': 'django',
        'HOST': '',
        'PORT': '',
        'TEST_NAME': get_test_db_name(),
    }
}

That’s it, Jenkins should now work fine with concurrent builds of your application.
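
One caveat if you are on a newer Django: from 1.7 onwards the test database options live in a nested TEST dictionary, and the flat TEST_NAME key was later removed. Below is a sketch of the equivalent setting under that assumption. The connection values are trimmed to the ones that matter here, and the hashing helper is repeated so the snippet stands alone.

```python
import hashlib
import os


def get_test_db_name():
    # Hash Jenkins' BUILD_TAG so each build gets its own test database name.
    build_tag = os.environ.get('BUILD_TAG', 'no-tag')
    return hashlib.md5(build_tag.encode('utf-8')).hexdigest()


DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'projectdb',
        # Django 1.7+ reads the test database name from the TEST dictionary.
        'TEST': {
            'NAME': get_test_db_name(),
        },
    }
}
```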


Separation of logic in Django Projects

Currently I work mostly on a large Django code-base 4+ years old in which business logic is tangled throughout the three MVC components.

With such a large framework around them, developers often forget how to write the well-organized Python business logic they would be perfectly capable of writing without it.

I’ve worked on projects with these symptoms, as well as older code bases before my Django days, and the situation isn’t as bad as some very old PHP projects I’ve seen. Anyway, here are some tips which can be applied to Django applications to aid overall organization.

Keep only database code within the models module

You’ll often see lengthy ORM queries randomly plastered throughout an application. Keeping these within the model class makes everything more maintainable.

  • Place ORM queries within your models module
  • Wrap up the queries you need in model methods or custom managers
  • Keep business logic out of here

Create modules for business logic

As you would with a regular Python program, create your own modules outside of the Django component structure.
Make your logic functions responsible only for logic, as in not caring about the presentation or data layer (use dependency injection).
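
To make the dependency injection point concrete, here is a small sketch. Every name in it (`calculate_order_total`, `fetch_line_items`, the tax rate) is made up for illustration. The point is only that the logic function receives its data source as an argument instead of importing the ORM itself.

```python
def calculate_order_total(order_id, fetch_line_items, tax_rate=0.2):
    """Pure business logic: no ORM, request or template knowledge.

    fetch_line_items is injected by the caller; it takes an order id and
    returns an iterable of (unit_price, quantity) pairs.
    """
    subtotal = sum(price * qty for price, qty in fetch_line_items(order_id))
    return round(subtotal * (1 + tax_rate), 2)


# In production a view would pass in a function that wraps an ORM query;
# in a unit test a plain stub like this one is enough.
def fake_line_items(order_id):
    return [(10.0, 2), (5.0, 1)]


total = calculate_order_total(42, fake_line_items)
```

Because nothing here touches the database, the function can be tested without Django's test runner at all.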

Views should be light

Your views should only be used to glue things together, linking requests to forms, forms to your business logic and outputs to templates.
The view then becomes a simple description of how a feature is coupled.

Override forms for validation

  • Override the Django form methods to add any complex custom validation.

Keeping all of your validation code inside the forms module means errors can always be tied back to the individual inputs.

A result of the above rules is increased testability, easier adaptability and of course it’s in keeping with the separation of concerns design principle.

Fix for: character of encoding “UTF8” has no equivalent in “LATIN1”, Ubuntu & Vagrant

This is mostly a post for future me, in case I run into this problem again.

Related to: “DETAIL: The chosen LC_CTYPE setting requires encoding LATIN1.”

The solution I found is a bit of a hack. Really you want to find out why postgres created its databases with LATIN1 encoding and fix that before installing postgres.

This script however will recreate them correctly so you can get on with some work. Run it before creating your application database(s).

#!/usr/bin/env bash
# This script changes postgres from LATIN1 to UTF8
pg_dumpall > /tmp/postgres.sql
pg_dropcluster --stop 9.1 main
pg_createcluster --locale en_US.UTF-8 --start 9.1 main
psql -f /tmp/postgres.sql

Navigation active state in Django

Here’s a clean way to display a navigation menu item’s active state in Django.

Wherever the app you’re doing this for is located, you’ll have a urls.py. Ensure you have the name set within each url group.

from django.conf.urls import patterns, url
from apps.pages import views

urlpatterns = patterns('',
    url(r'^pages/$', views.pages.index, name='pages.index'),
    url(r'^pages/about$', views.pages.about, name='pages.about'),
)

Next create a directory called templatetags within your app folder.
Add to it a blank __init__.py and a module for the template tag, giving the latter the below content.

from django.core.urlresolvers import resolve
from django.template import Library

register = Library()

@register.simple_tag
def nav_active(request, url):
    """
    In template: {% nav_active request "url_name_here" %}
    """
    url_name = resolve(request.path).url_name
    if url_name == url:
        return "active"
    return ""

# nav_active() will check the web request url_name and compare it
# to the named url group within urls.py,
# setting the active class if they match.

Now to finish up, in your template .html file you need to load in the template tag and add it to each navigation item.

How to use Enums for Django Field.choices

In Django, when using the choices parameter on a model field, the format passed in must be as follows:

# within your models.Model class...
STUDENT_TYPE_CHOICES = (
    ('0', 'freshman'),
    ('1', 'sophomore'),
    ('2', 'junior'),
    ('3', 'senior'),
)
student_type = models.CharField(max_length=1, choices=STUDENT_TYPE_CHOICES)

This means elsewhere in your code, if you want to specify a choice field value, you’d have to enter the first element of the tuple, e.g.:

junior_students = Student.objects.filter(student_type='2')

This is pretty terrible since it’s hardcoded in our source, possibly over many files.

How to fix this mess:

First, install enum34 on the command line:

pip install enum34

In my project I added common/utils.py containing the following:

import inspect
from enum import Enum

class ChoiceEnum(Enum):

    @classmethod
    def choices(cls):
        # get all members of the class
        members = inspect.getmembers(cls, lambda m: not(inspect.isroutine(m)))
        # filter down to just properties
        props = [m for m in members if not(m[0][:2] == '__')]
        # format into django choice tuple
        choices = tuple([(str(p[1].value), p[0]) for p in props])
        return choices

That’s the hard work over.

Now when you create your choice field:

from common.utils import ChoiceEnum

class StudentTypes(ChoiceEnum):
    freshman = 0
    sophomore = 1
    junior = 2
    senior = 3

# within your models.Model class...
student_type = models.CharField(max_length=1, choices=StudentTypes.choices())

Now if we need to access StudentTypes from elsewhere in our source code, we can simply:

# obviously import StudentTypes
junior_students = Student.objects.filter(student_type=StudentTypes.junior.value)

That’s it. If anyone knows of a nicer way feel free to comment below.
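
One possibly simpler variant, offered as a sketch rather than a drop-in replacement: iterating over an Enum class yields its members directly, so the `inspect` filtering can be skipped. Note that this returns the choices in definition order, whereas `inspect.getmembers` sorts by name.

```python
from enum import Enum


class ChoiceEnum(Enum):

    @classmethod
    def choices(cls):
        # Iterating an Enum class yields its members in definition order.
        return tuple((str(member.value), member.name) for member in cls)


class StudentTypes(ChoiceEnum):
    freshman = 0
    sophomore = 1
    junior = 2
    senior = 3
```

`StudentTypes.choices()` can then be passed to the field's choices parameter exactly as before.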


Setting up NGINX + Django + uWSGI (a tutorial that actually works)

So after reading the various tutorials online for setting up NGINX + Django + uWSGI and all of them not working correctly, I decided to write my own.

This tutorial was tested on a blank install of Ubuntu Server 12.04 LTS 64-bit. If you follow the steps carefully in the correct order, all should be well :)

1. Add a new user, give them sudo privileges and switch to that user. Below I’ve named mine “user”.

sudo adduser user
sudo adduser user sudo
su user


2. Ensure your system hostname is set to localhost

echo "localhost" | sudo tee /etc/hostname
sudo hostname localhost


3. Since this is a new install, update the system.

sudo apt-get update
sudo apt-get upgrade


4. Install python, virtual environment builder and python dev

sudo apt-get install python
sudo apt-get install python-virtualenv
sudo apt-get install python2.7-dev


5. Install and start the NGINX web server

sudo apt-get install nginx
sudo service nginx start


6. Install uWSGI

sudo apt-get install uwsgi


7. Setup a Django project

sudo mkdir /var/www
sudo mkdir /var/www/
cd /var/www/
sudo mkdir venv conf src logs


This will give a folder structure containing the venv, conf, src and logs directories.


8. Set-up the virtual environment and activate it

sudo virtualenv /var/www/
source /var/www/


9. Install Django

sudo pip install django


10. Change to the “src” directory, then copy your Django project files into it

cd /var/www/


10. Create your uwsgi.ini config file, with the below content

sudo nano /var/www/
[uwsgi]
# variables
projectname = example_project
projectdomain =
base = /var/www/

# config
plugins = python
master = true
protocol = uwsgi
env = DJANGO_SETTINGS_MODULE=%(projectname).settings
pythonpath = %(base)/src/%(projectname)
module = %(projectname).wsgi
socket =
logto = %(base)/logs/uwsgi.log
#below line runs it as a daemon in background
daemonize = /var/log/uwsgi/example_project.log


11. Create an NGINX config file for this domain, with the below content

sudo nano /var/www/
server {
    listen 80;
    root /var/www/;
    access_log /var/www/;
    error_log /var/www/;

    location /static/ { # STATIC_URL
        alias /var/www/; # STATIC_ROOT
        expires 30d;
    }

    location /media/ { # MEDIA_URL
        alias /var/www/; # MEDIA_ROOT
        expires 30d;
    }

    location / {
        include uwsgi_params;
    }
}


12. Edit the main nginx.conf to import our domain conf file, see below content as a guide

sudo nano /etc/nginx/nginx.conf
user    www-data;
# ...
http {
    # ...
    include /var/www/*/conf/nginx.conf;
    # ...
}


13. Restart NGINX (to apply our config changes)

sudo service nginx restart


14. Install MySQL and secure it

sudo apt-get install mysql-server
sudo mysql_secure_installation


15. Install Python MySQL and uWSGI plugins

sudo apt-get install python-mysqldb
sudo apt-get install uwsgi-plugin-python


16. Install south (optional)

sudo pip install south


17. Test that uWSGI is working

sudo uwsgi --ini /var/www/

If you visit your site it should now show your Django project. If it doesn’t, common causes are:

  • ALLOWED_HOSTS isn’t set in your Django settings
  • DEBUG isn’t off in your Django settings
  • Database isn’t configured


18. Setup uWSGI to run on system boot

Create the following file, with the below content

sudo nano /etc/init/uwsgi.conf
# Emperor uWSGI script

description "uWSGI Emperor"
start on runlevel [2345]
stop on runlevel [06]

exec uwsgi --master --die-on-term --emperor /var/www/


19. Now reboot the server and navigate to your website

sudo reboot