Postgres Query Debugging
Postgres EXPLAIN and ANALYZE explained and analyzed
EXPLAIN is a Postgres command that accepts a statement such as SELECT…, UPDATE…, or DELETE… and returns the plan Postgres would use to execute that query. With the ANALYZE keyword the statement is also executed, so the planner's estimates can be compared against the values actually observed during execution.
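Because ANALYZE really runs the statement, a common precaution (standard Postgres practice, not specific to the query below) is to wrap data-modifying statements in a transaction that is rolled back afterwards. A minimal sketch, reusing the tasks table from this article (the column values are illustrative):

```sql
-- Plan only: no rows are touched.
EXPLAIN UPDATE tasks SET status = 'done' WHERE assignee_id = 63;

-- Plan and execute; ROLLBACK undoes the UPDATE after timing it.
BEGIN;
EXPLAIN ANALYZE UPDATE tasks SET status = 'done' WHERE assignee_id = 63;
ROLLBACK;
```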
SELECT DISTINCT tasks.*
FROM persons client
LEFT JOIN cases ON cases.client_id = client.id
LEFT JOIN caseitem_task ON caseitem_task.caseitem_id = cases.id
LEFT JOIN person_task ON person_task.person_id = client.id
LEFT JOIN tasks ON person_task.task_id = tasks.id OR caseitem_task.task_id = tasks.id
WHERE
(client.supervisioncaseowner_id = 63 AND tasks.caseownertask = TRUE)
OR tasks.assignee_id = 63
ORDER BY tasks.duedate ASC;
Running EXPLAIN ANALYZE on the query above gives the following output:
Unique (cost=11.85..11.89 rows=1 width=631) (actual time=0.145..0.145 rows=0 loops=1)
-> Sort (cost=11.85..11.86 rows=1 width=631) (actual time=0.145..0.145 rows=0 loops=1)
Sort Key: tasks.duedate, tasks.id, tasks.assignee_id, tasks.type, tasks.systemtype, tasks.status, tasks.priority, tasks.activedate, tasks.name, tasks.description, tasks.completeddate, tasks.createdtime, tasks.caseownertask
Sort Method: quicksort Memory: 25kB
-> Nested Loop Left Join (cost=0.44..11.84 rows=1 width=631) (actual time=0.044..0.044 rows=0 loops=1)
Filter: ((person_task.task_id = tasks.id) OR (caseitem_task.task_id = tasks.id))
-> Nested Loop Left Join (cost=0.30..11.40 rows=1 width=639) (actual time=0.043..0.043 rows=0 loops=1)
-> Nested Loop Left Join (cost=0.14..10.67 rows=1 width=639) (actual time=0.043..0.043 rows=0 loops=1)
-> Nested Loop (cost=0.00..10.38 rows=1 width=635) (actual time=0.043..0.043 rows=0 loops=1)
Join Filter: (((client.supervisioncaseowner_id = 63) AND tasks.caseownertask) OR (tasks.assignee_id = 63))
-> Seq Scan on tasks (cost=0.00..3.40 rows=1 width=631) (actual time=0.042..0.042 rows=0 loops=1)
Filter: (caseownertask OR (assignee_id = 63))
Rows Removed by Filter: 112
-> Seq Scan on persons client (cost=0.00..4.59 rows=159 width=8) (never executed)
-> Index Scan using idx_1c1b038b19eb6921 on cases (cost=0.14..0.28 rows=2 width=8) (never executed)
Index Cond: (client_id = client.id)
-> Index Scan using idx_d8985f0e217bbb47 on person_task (cost=0.15..0.62 rows=11 width=8) (never executed)
Index Cond: (person_id = client.id)
-> Index Scan using idx_23f3a0c342b2f274 on caseitem_task (cost=0.14..0.41 rows=2 width=8) (never executed)
Index Cond: (caseitem_id = cases.id)
Planning time: 1.076 ms
Execution time: 0.269 ms
So what exactly does all this mean?
-> Seq Scan on tasks (cost=0.00..3.40 rows=1 width=631) (actual time=0.042..0.042 rows=0 loops=1)
The numbers that are quoted in parentheses are (left to right):
- Estimated start-up cost. This is the time expended before the output phase can begin, e.g., time to do the sorting in a sort node.
- Estimated total cost. This is stated on the assumption that the plan node is run to completion, i.e., all available rows are retrieved. In practice a node’s parent node might stop short of reading all available rows (for example, a query with a LIMIT may stop reading its child node early).
- Estimated number of rows output by this plan node. Again, the node is assumed to be run to completion.
- Estimated average width of rows output by this plan node (in bytes).
The costs are measured in arbitrary units determined by the planner’s cost parameters. Traditional practice is to measure the costs in units of disk page fetches; that is, seq_page_cost is conventionally set to 1.0 and the other cost parameters are set relative to that. The rows value is a little tricky because it is not the number of rows processed or scanned by the plan node, but rather the number emitted by the node. This is often less than the number scanned, as a result of filtering by any WHERE-clause conditions that are being applied at the node.
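Those default cost parameters make the Seq Scan estimate above reproducible by hand. Assuming the default settings (seq_page_cost = 1.0, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025) and that the tasks table occupies 2 disk pages and holds 112 rows, as the Rows Removed by Filter line suggests (these page/row figures are an assumption, checkable via pg_class):

```sql
-- Hypothetical reconstruction of the tasks Seq Scan estimate:
--   disk:   2 pages  * seq_page_cost     (1.0)    = 2.00
--   cpu:    112 rows * cpu_tuple_cost    (0.01)   = 1.12
--   filter: 112 rows * cpu_operator_cost (0.0025) = 0.28
--                                          total  = 3.40
-- The planner's own page and row statistics live in pg_class:
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tasks';
```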
Note that the actual time values are in milliseconds of real time, whereas the cost estimates are expressed in arbitrary units; so they are unlikely to match up. The thing that’s usually most important to look for is whether the estimated row counts are reasonably close to reality. In this example the estimates were all dead-on, but that’s quite unusual in practice.
In some query plans, it is possible for a subplan node to be executed more than once. For example, the inner index scan will be executed once per outer row in the above nested-loop plan. In such cases, the loops value reports the total number of executions of the node, and the actual time and rows values shown are averages per-execution. This is done to make the numbers comparable with the way that the cost estimates are shown. Multiply by the loops value to get the total time actually spent in the node.
Postgres builds a tree of plan nodes representing the different actions taken: the first line is the root, and each -> introduces a child node. For some node types, such as Sort (shown above) and Hash, EXPLAIN ANALYZE reports additional statistics beyond the execution times and row counts; any indented line other than the first that lacks an -> is such extra information. The structure of our query is therefore:
Unique
└── Sort
    └── Nested Loop Left Join
        ├── Nested Loop Left Join
        │   ├── Nested Loop Left Join
        │   │   ├── Nested Loop
        │   │   │   ├── Seq Scan (tasks)
        │   │   │   └── Seq Scan (persons)
        │   │   └── Index Scan (cases)
        │   └── Index Scan (person_task)
        └── Index Scan (caseitem_task)
Each branch represents a sub-action, and you work from the innermost (most deeply indented) nodes outwards to determine what happens first. In the plan above, the first actions taken are the two sequential scans:
│   │   │   ├── Seq Scan (tasks)
│   │   │   └── Seq Scan (persons)
These correspond to the WHERE clause in the query. Expanded out, they appear as follows:
-> Seq Scan on tasks (cost=0.00..3.40 rows=1 width=631) (actual time=0.042..0.042 rows=0 loops=1)
Filter: (caseownertask OR (assignee_id = 63))
Rows Removed by Filter: 112
-> Seq Scan on persons client (cost=0.00..4.59 rows=159 width=8) (never executed)
The first Seq Scan entry covers the portions of the WHERE clause dealing with the tasks table. Postgres does some clever optimisation here, handling WHERE conditions on a per-table basis: as you can see in the Filter section of the sequential scan, it has rearranged the conditions into (caseownertask OR (assignee_id = 63)) so it can evaluate everything touching tasks in a single pass.

The second Seq Scan entry deals with the client table and contains a very important phrase: (never executed). Postgres determined that this node could never need to run and skipped it: the outer scan of tasks produced zero rows, so there was nothing to join clients against.

Going back to the first Seq Scan entry, it hints at one of two things: either there is no relevant index for this portion of the query, or Postgres has decided a sequential scan is more efficient than an index scan. Why might Postgres do this? Indexes have a cost to create, to maintain (on writes), and to use (on reads). When an index is used in a SELECT query, the positions of the requested rows are first fetched from the index (instead of from the table directly), and only then are the rows themselves fetched from the table. It is a two-step process. Given a large enough number of requested rows (say, all rows with an id greater than 10), it is actually more efficient to fetch all those rows from the table directly than to go through the extra step of asking the index for their positions first. As a rough rule, if the requested rows make up less than 5-10% of the table, the information stored in the index outweighs the cost of the additional intermediate step. In this case the scan examined only 112 rows, of which the planner expected just 1 to match.

As such Postgres should arguably be using an Index Scan, and we can assume either that Postgres has a bug (unlikely) or that our table doesn’t have the correct fields indexed for this query (very likely).
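One way to test that assumption is to add indexes covering the two predicates on tasks and re-run the plan. A sketch, with hypothetical index names; note that on a table this small the planner may quite reasonably still prefer a sequential scan:

```sql
-- Hypothetical indexes for the two WHERE-clause predicates on tasks.
CREATE INDEX idx_tasks_assignee_id ON tasks (assignee_id);
-- A partial index only covering rows where caseownertask is true.
CREATE INDEX idx_tasks_caseownertask ON tasks (caseownertask) WHERE caseownertask;
-- Re-run EXPLAIN ANALYZE afterwards to see whether the plan changes.
```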
Red flags to be wary of
So as a quick rule of thumb what should you be scanning for in the execution plan when debugging performance issues?
- Anything not using an Index Scan! There might be a good reason for it, but usually not, and anything other than an Index Scan can be a real performance hog at scale.
- Large values in “rows”. If you’re joining hundreds of thousands of rows, the query plan will tell you, and you should do whatever pre-filtering you can to bring that number down.
- Large values in “width”. If you’re bringing back thousands of bytes of data per database row, question whether you need every column (the SELECT tasks.* above is a candidate for an explicit column list) and whether the tables you are querying need normalising.