Hello everyone, I have an increasingly popular web application running on top of mysql and due to its popularity, I'm running into performance issues. After carefully examining database indexes and tuning queries I was able to pin down the slowest part of the system.
The app's got a user segmentation tool that allows you to filter users based on a range of criteria from which the slowest is: "Select all users that did not receive the email of id 100" To answer this question we turn to the activities table, which is basically a denormalized log of actions taken by the user in this format: user_id | activity | email_id | ... 10 | email_sent | 100 | ... 10 | subscribed | NULL | ... 10 | email_open | 100 | ... Given this table and the question above, the usual way of finding out all users who did not receive this email is through the use of a left outer join, such as: select u.id from users u left outer join activities a on u.id = a.user_id and a.activity = 'email_sent' and a.email_id = 100 where a.user_id is null That's all fine for medium-ish tables. However our current activities table has over 13 million rows, slowing the hell out of this left outer join, taking about 52 seconds in my machine. What this query is trying to do is to get the relative complement of set A(users) to B(activities). As far as I know mysql doesn't support set subtraction, thus the reason for these queries being slow. Based on that I've setup a test database on Postgresql, which supports this very set operation and rewrote the query to look like this: select u.id from users u except select a.user_id from activities a where a.activity = 'email_sent' and a.email_id = 100; The fact that postgresql knows how to subtract sets brought this query down to only 4 seconds. My question then is: since this is a somewhat common query in our system, are there any workarounds I could use in mysql to improve things? I did find one myself, but it's a bit convoluted and might not perform well under load, but the following sql script gives me similar performance in mysql: create temporary table email_sent_100 select a.user_id from user_activity_events a where a.activity = 'email_sent' and a.email_id = 100; create index user_id_idx on email_sent_100(user_id); //this could potentially bring the runtime down in the case of a larg temp table. select count(u.id) from users u left outer join email_sent_100 s on u.id = s.user_id and s.user_id is null; A lot more lines and a lot more complex, but does the job in this example. I'd appreciate your thoughts. Cheers, Leonardo Borges www.leonardoborges.com