So, I was recently trying to insert 8 million rows into a table in my database (just another day in grad school) and noticed that the query was taking longer than usual. However, if I just selected the 8 million rows, it would execute pretty fast. The table I was inserting into had no triggers or indices, meaning something was blocking the insert.
That combination of being able to do fast selects and not being able to do inserts gave me the impression that the table was “locked” by a transaction or something. This had happened to me before, when I had an iPython notebook open and hadn’t closed my connection, and terminated a query midway by crashing the web page.
The last time, I knew what was causing the problem, so I simply closed the database connection in my iPython notebook and everything began working smoothly again. This time around, I had no clue what was causing the idle query. There are a gazillion things that connect to my database independently — a bunch of iPython notebooks to test models and ideas, scrapers that are constantly expanding the dataset and a web server through which I serve data for visualization to name a few.
I needed a way to find this hanging query in my database, and then close it in my database. And it turns out that this is actually super simple!
First, figure out the relation id for the table that is locked. One easy way to do this, is to run your query that seems to hang forever. Then run this query:
SELECT * FROM pg_locks WHERE NOT GRANTED;
Since your query is hanging, it hasn’t been “granted” a lock and should show up on this list. Now copy the relation id of your table. Next, figure out what other locks are present on the same table.
SELECT * FROM pg_locks WHERE relation = <relationID>;
Now, be sure that you’re not killing the wrong query, or stopping some important process. Because, here’s the coolest part, you can simply kill the offending query with a single SQL statement! Just take the PID from the query above that is running (granted = true) and cancel it.
SELECT pg_cancel_backend(<pid from previous query>);
Before you do this, if you want to check some details of the query you’re going to kill, you can run this to get some sense of what they query was and when you began executing it.
SELECT
xact_start, query_start, backend_start,
state_change, state
FROM pg_stat_activity
WHERE pid IN (<pid from previous query>);
Mostly wrote this for my reference, but maybe it’ll help someone else too :)