Read a file in chunks in Python

This article demonstrates how to read a file in chunks rather than loading it into memory all at once.

This is useful in a number of cases, such as chunked uploading or encryption, or when the file you want to work with is larger than your machine's memory capacity.

```python
# chunked file reading
from __future__ import division
import os

def get_chunks(file_size):
    chunk_start = 0
    chunk_size = 0x20000  # 131072 bytes, default max ssl buffer size
    while chunk_start + chunk_size < file_size:
        yield chunk_start, chunk_size
        chunk_start += chunk_size

    final_chunk_size = file_size - chunk_start
    yield chunk_start, final_chunk_size

def read_file_chunked(file_path):
    with open(file_path, 'rb') as file_:  # binary mode, so read() returns bytes and counts match the file size
        file_size = os.path.getsize(file_path)

        print('File size: {}'.format(file_size))

        progress = 0

        for chunk_start, chunk_size in get_chunks(file_size):

            file_chunk = file_.read(chunk_size)

            # do something with the chunk, encrypt it, write to another file...

            progress += len(file_chunk)
            print('{0} of {1} bytes read ({2}%)'.format(
                progress, file_size, int(progress / file_size * 100))
            )

if __name__ == '__main__':
    read_file_chunked('some-file.gif')
```

Also available as a Gist (https://gist.github.com/richardasaurus/21d4b970a202d2fffa9c)

The above will output:

```
File size: 698837
131072 of 698837 bytes read (18%)
262144 of 698837 bytes read (37%)
393216 of 698837 bytes read (56%)
524288 of 698837 bytes read (75%)
655360 of 698837 bytes read (93%)
698837 of 698837 bytes read (100%)
```
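
To make the "do something with the chunk" comment concrete, here is a minimal sketch of plugging per-chunk work into the same pattern: feeding each chunk into a `hashlib` digest so a large file can be hashed without ever being fully loaded. The `hash_file_chunked` name is mine, not from the original post, and it assumes the `get_chunks` generator above is in scope.

```python
import hashlib
import os

def hash_file_chunked(file_path):
    """Hash a file chunk by chunk, reusing the get_chunks generator above."""
    sha256 = hashlib.sha256()
    file_size = os.path.getsize(file_path)
    with open(file_path, 'rb') as file_:
        for chunk_start, chunk_size in get_chunks(file_size):
            sha256.update(file_.read(chunk_size))  # process each chunk as it is read
    return sha256.hexdigest()

if __name__ == '__main__':
    print(hash_file_chunked('some-file.gif'))
```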

Hopefully this is handy to someone. This of course isn't the only way; you could also use `file.seek` to jump directly to a particular chunk, as in the sketch below.
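
Here is a rough sketch of that seek-based approach; `read_chunk` is a hypothetical helper, not part of the original gist. Because `file.seek` jumps straight to an offset, you can read any single chunk (say, to retry one failed upload part) without reading everything before it.

```python
import os

def read_chunk(file_path, chunk_start, chunk_size):
    """Read one chunk of a file by offset, without touching the rest."""
    with open(file_path, 'rb') as file_:
        file_.seek(chunk_start)        # jump straight to the chunk's start
        return file_.read(chunk_size)  # read at most chunk_size bytes

if __name__ == '__main__':
    path = 'some-file.gif'
    chunk_size = 0x20000
    # e.g. re-read just the third 128 KiB chunk
    data = read_chunk(path, 2 * chunk_size, chunk_size)
    print('{0} bytes read from offset {1}'.format(len(data), 2 * chunk_size))
```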
