sureshanaparti commented on PR #8737: URL: https://github.com/apache/cloudstack/pull/8737#issuecomment-1991683719
> > > @JoaoJandre We can't run explain analyze with delete commands. Adding the index also helps reduce the load on database while cleaning up the entries. Otherwise the DB will have to go through each record to filter out all the entries which can cause spike in I/O as well. I understand that if the administrator reduces the retention period, it can cause issues but having the index will speed up deletion which runs every minute. > > > > > > @vishesh92 you are proposing to add an index to improve DELETE operations; however, you are presenting results for SELECT operations. Indeed, indexes can improve SELECT operations, which only list the data. However, the DELETE operation works different: it removes data, recalculates statistics, reorganizes indexes and more. Please, present some tests and results for DELETE operations; otherwise, it is not feasible to sustain the reason for adding those indexes. > > @JoaoJandre Here are the results for a delete operation where the filter doesn't matches any rows. > > ```sql > MySQL root@(none):test> DELETE FROM vm_stats WHERE timestamp < '1950-01-01'; > Query OK, 0 rows affected > Time: 0.001s > MySQL root@(none):test> DELETE FROM vm_stats WHERE timestamp < '1950-01-01'; > Query OK, 0 rows affected > Time: 0.001s > ``` > > query plan > > ```sql > +------------------------------------------------------------------------------------------------+ > | EXPLAIN | > |------------------------------------------------------------------------------------------------| > | { | > | "query_block": { | > | "select_id": 1, | > | "table": { | > | "delete": true, | > | "table_name": "vm_stats", | > | "access_type": "range", | > | "possible_keys": [ | > | "temp_idx" | > | ], | > | "key": "temp_idx", | > | "used_key_parts": [ | > | "timestamp" | > | ], | > | "key_length": "5", | > | "ref": [ | > | "const" | > | ], | > | "rows_examined_per_scan": 1, | > | "filtered": "100.00", | > | "attached_condition": "(`test`.`vm_stats`.`timestamp` < TIMESTAMP'1950-01-01 00:00:00')" | > | } | > | } | > | } | > +------------------------------------------------------------------------------------------------+ > ``` > > After dropping the index > > ```sql > MySQL root@(none):test> DELETE FROM vm_stats WHERE timestamp < '1950-01-01'; > Query OK, 0 rows affected > Time: 0.364s > MySQL root@(none):test> DELETE FROM vm_stats WHERE timestamp < '1950-01-01'; > Query OK, 0 rows affected > Time: 0.362s > ``` > > query plan > > ```sql > +------------------------------------------------------------------------------------------------+ > | EXPLAIN | > |------------------------------------------------------------------------------------------------| > | { | > | "query_block": { | > | "select_id": 1, | > | "table": { | > | "delete": true, | > | "table_name": "vm_stats", | > | "access_type": "ALL", | > | "rows_examined_per_scan": 660340, | > | "filtered": "100.00", | > | "attached_condition": "(`test`.`vm_stats`.`timestamp` < TIMESTAMP'1950-01-01 00:00:00')" | > | } | > | } | > | } | > +------------------------------------------------------------------------------------------------+ > ``` > > As you can see, without the index it takes around 0.36 seconds because it has to go through each row in the database. It takes 0.001 seconds with the index because it doesn't need to go through all the rows in the table. @vishesh92 _vm_stats_ table data, timestamp column cardinality in both cases are the same? please check/confirm results with the filter matching rows, and same delete count. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cloudstack.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org