Showing posts from June, 2015

Digging into python memory issues in ckan with heapy

We had a report of a memory leak in the ckan datastore extension: large queries to the datastore would leak large amounts of memory per request. It wasn't simple to get to the bottom of it. At first I couldn't recreate the leak at all. The test data I was using was the STAR experiment csv files, which I found when I googled 'Large example csv files'. The reporter, Alice Heaton, had kindly written a script that would recreate the leak. Even with this, I could not recreate the problem until I upped the number of rows fetched by a factor of ten. I suspect that Alice has more data per column, with perhaps large text fields instead of the mainly numeric data of the STAR experiment files I was using.

Once I could reliably recreate the problem, I poked around using heapy, which I've used previously to track down similar problems, and inserted some code to set up heapy and an ipdb breakpoint:

from guppy import hpy
hp = hpy()
heap