The Data Guy

Sort key column compressed

RStoolKit Result: Sort key column compressed

Generally its best practice to compress all the columns in RedShift to get the better performance. But you should not compress the sort key columns. If you have multiple column in your sort key then don’t compress the first column of the sort key.

RedShift using both early materialization and late materialization. In early materialization the data will be filtered at the first step before a join or a filter. But in late materialization, the row numbers will be filtered instead of the data. For late materialization its ok to compress the sort key column, but RedShift will not use Late materialization for all the queries.

Pro Tip: A detail blog post with visual example of how sort key compression is affecting the performance.

Find tables with Sort key column compressed:

select
	"schema",
	"table"
FROM
	pg_catalog.svv_table_info
WHERE
	sortkey1 IS NOT NULL
	AND sortkey1_enc <> 'none'
	AND "schema" not like 'pg_temp%';

How to fix this problem?

We cannot change/disable/enable the compression on the existing columns in redshift. So create a new table and copy the data from the existing table.

  1. RedShift table design playbook
  2. Why you shouldn’t compress the sortykey column from SO?
  3. Another real world example of how compressed sort key is affecting performance by an AWS engineer
  4. How to choose the right sort key on existing table
  5. Analyze the table design
  6. Table inspector from RedShift utilities
· aws, redshift, rstoolkit

Advertisement


Loading Comments