How Low Cardinality Columns Cause InnoDB Temp File Overload in Amazon RDS
We were working on a MySQL issue where MySQL Server went down for 5 to 10mins, not able to access from the application or any client tools. When we checked further it went to a stage where the root cause was not having an index on a low cardinality column. A seemingly harmless column with low cardinality—values ranging from just 1 to 10. It doesn’t look like much, but under the hood, it’s wreaking havoc, forcing MySQL to churn through unnecessary temporary writes.
In this post, we’re diving deep into why low cardinality columns can trigger massive InnoDB temp file overloads, how it impacts Amazon RDS, and—most importantly—how you can fix it before it brings your database to its knees.
We can see the more connections got pilled, the server was getting stuck somewhere, so it was unable to process any request.
The number of queries(execution queries) got drastically reduced by 50%
we can see a huge write-ops at that time.
From the InnoDB metrics, we can see that the IO operation is pending(more than 1k operations) which stats that, the write ops were pilling.
On the wait events, we can see the innodb_tmp_file wait event is more than 1k sessions. Ideally, this wait event occurs,
innodb_temp_file
- This is specific to InnoDB engine operations
- Used for large sorts and large result set
Temp file latency is high(1 second)
Tables access during that time: The tablc ABC(masked) access 140+ for read and 140+ for insert. This is stating, some query on this table might caused the issue.
The top 1 query during the issue is pointing to the same table. 36+ queries per second
This table has index on column XXX but inning doesn’t have it. This is the main cause for the InnoDB did too much tmp write to match the records.
The high temp file usage was from InnoDB needing to handle large intermediate results due to the suboptimal index
Here is the visual representation of the query execution. Lets say the INDEX COLUMN filtered 25 rows, and then to filer NON INDEXED COLUMN it has to do full table scan. But this table has 380 million rows.