RedShift Reconstructing SQL from STL_QUERYTEXT
If you are managing the RedShift clusters then STL_QUERY and STL_QUERYTEXT tables are not new to you. STL_Query can’t hold the complete SQL query instead we can use STL_QueryText to read the complete query. But there is a challenge, we can’t read that table as it is. Since your queries are saved in multiple rows. So we need to combine all these rows into a single row with LISTAGG function which is well documented here.
What is the real challenge? #
From the AWS documentation example, they are showing to reconstruct the SQL for a single query. But in a production system, you may have 50k+ queries where the LISTAGG function will through some errors related to its limitation. Let’s simulate this. I ran the below query on my production RedShift Cluster.
SELECT Listagg(
CASE
WHEN Len(Rtrim(text)) = 0 THEN text
ELSE Rtrim(text)
END, '') within GROUP (ORDER BY SEQUENCE) AS text
FROM stl_querytext limit 10;
ERROR: Result size exceeds LISTAGG limit
DETAIL:
-----------------------------------------------
error: Result size exceeds LISTAGG limit
code: 8001
context: LISTAGG limit: 65535
query: 180132
location: 0.cpp:228
process: query0_78_180132 [pid=23735]
-----------------------------------------------
Why this error? #
If you read the LISTAGG function details, If the result set is larger than the maximum VARCHAR size (64K – 1, or 65535), then LISTAGG returns the error. Here I have more than 50k+ rows. So the overall characters will not fit in the result set.
What is the solution?: #
If we run the reconstruct query one by one instead of all then the resultset will fit into the LISTAGG’s limitation. Lets say one of my queries split into 2 rows in the STL_QUERYTEXT table.
SELECT text
FROM stl_querytext
WHERE query = 97729
ORDER BY SEQUENCE;
First, process these two rows and then process another query and then the next one.
How? #
We have userid
,pid
,xid
,query
columns are common between these two rows. So partition your resultset by query wise. Then do the LISTAGG.
LISTAGG(
CASE
WHEN LEN(RTRIM(text)) = 0 THEN text
ELSE RTRIM(text)
END,
''
) within group (
order by
sequence
) over (PARTITION by userid, xid, pid, query)
Replace \n and \r characters: #
If you are going to run this query from psql client most of the queries starts with \r\nselect * from mytbl
or any GUI based tools will return \n
characters.
- \r actually a carriage return
- \n is not a new line its a sting.
So we have to replace twice.
replace(replace( LISTAGG( CASE WHEN LEN(RTRIM(text)) = 0 THEN text ELSE RTRIM(text) END, '' ) within group ( order by sequence ) over (PARTITION by userid, xid, pid, query), '\r','' ), '\\n', '')
Final View: #
For east management, Im going to create a view on top of the STL_QUERYTEXT table to return the reconstructed SQL for all the queries.
CREATE VIEW recon_stl_querytext
AS
(SELECT DISTINCT userid,
xid,
pid,
query,
Replace(Replace(Listagg(CASE
WHEN Len(Rtrim(text)) = 0 THEN text
ELSE Rtrim(text)
END, '')
within GROUP ( ORDER BY SEQUENCE ) over (
PARTITION BY userid, xid, pid, query),
'\r'
, ''), '\\n', '')
AS text
FROM stl_querytext
ORDER BY 1,
2,
3,
4);
Now you can view the proper data from this view.
SELECT *
FROM recon_stl_querytext
WHERE query = 97729;
I hope this helps you to get the overall consolidated view on the query history. If you are interested in how to Audit RedShift Historical Queries With pgbadger please click here.