Why RedShift is showing rows_pre_user_filter is zero
I was working for a script to figure out the Ghost rows in all the tables based on the
STL_SCAN. This system table has a detailed view of your query execution. The column
rows_pre_filter will tell you how many rows fetched from the disk including the rows that are marked for delete(Ghost rows). Then the next step is to filter that row offset by removing the Ghost rows. Then it’ll apply the user filters like where conditions. I found strange behavior in this
Let’s say you have a table with 10 rows. Then you delete the 5 rows.
Now get the query ID for the last executed select query.
Let’s see its execution from the
RedShift fetches 10 rows from the disk. After removing the deleted rows it returned 10 rows(since we didn’t delete any rows). Let’s delete some rows and run the select query again.
This time see the query execution from the
STL_SCAN. Get the last executed Query ID.
This time you can see the
rows_pre_user_filter is 5 which is the row count after eliminating the Ghost rows. This value will still remain the same even you apply any filters. select * from grows_test where id=1; If you check the
rows_pre_user_filter will be 5.
But in one of my production cluster, while checking the
STL_SCAN, I found that the
rows_pre_user_filter was 0. Then means after eliminating the dead rows nothing is there. That’s the meaning. But it’s not true, it is one of my largest tables in the cluster. It should not say zero rows. But when I see the rows column(the number of rows processed) in
STL_SCAN it has some number. How this is possible?
For all the slices it is showing zero only. Then I started looking at the other columns as well.
RedShift supports late materialization from 2017 onwards. This late materialization will help us to do the row-level filtering while fetching the data from the disk itself. So instead of reading the whole block, only filtered rows will be selected. This is the catch here. Redshift will not use late materialization for all the queries, but whenever it’s using the late materialization for a query then from the
STL_SCAN table it’ll mark the is_rlf_scan as true. Then while checking further I noticed that if the query uses late materialization then the
rows_pre_user_filter is zero.