On Thu, Dec 11, 2014 at 6:52 PM, Robert DiFalco <robert.difa...@gmail.com> wrote:
> Thanks Arthur. I don't think there is as big a different between BIGINT > and INTEGER as you think there is. In fact with an extended filesystem you > might not see any difference at all. > > As I put in the first emal I am using a GIST index on user.name. > > I was really more interested in the LEFT OUTER JOINs vs EXISTS queries and > if there was a better alternative I had not considered. > > On Tue, Dec 9, 2014 at 11:44 AM, Arthur Silva <arthur...@gmail.com> wrote: > >> On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco <robert.difa...@gmail.com> >> wrote: >> >>> I'm sorry, I missed a JOIN on the second variation. It is: >>> >>> SELECT u.id, u.name, u.imageURL, u.bio, >>> CASE >>> WHEN f.friend_id IS NOT NULL THEN 'isFriend' >>> WHEN s.to_id IS NOT NULL THEN 'hasSentRequest' >>> WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest' >>> ELSE 'none' >>> END AS 'friendStatus', >>> (SELECT COUNT(1) AS d >>> FROM friends f1 >>> JOIN friends f2 ON f1.fiend_id = f2.friend_id >>> WHERE f1.user_id = 33 AND f2.user_id = u.id) >>> FROM users u >>> *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id >>> <http://u.id>* >>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id >>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33 >>> WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name; >>> >>> >>> On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco < >>> robert.difa...@gmail.com> wrote: >>> >>>> I have users, friends, and friend_requests. I need a query that >>>> essentially returns a summary containing: >>>> >>>> * user (name, imageURL, bio, ...) >>>> * Friend status (relative to an active user) >>>> * Is the user a friend of the active user? >>>> * Has the user sent a friend request to the active user? >>>> * Has the user received a friend request from the active user? >>>> * # of mutualFriends >>>> * Exclude the active user from the result set. >>>> >>>> So I have mocked this up two ways but both have complicated query plans >>>> that will be problematic with large data sets. I'm thinking that my lack of >>>> deep SQL knowledge is making me miss the obvious choice. >>>> >>>> Here's my two query examples: >>>> >>>> SELECT u.id, u.name, u.imageURL, u.bio, >>>> CASE >>>> WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND >>>> f.friend_id = u.id) THEN 'isFriend' >>>> WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33 >>>> AND s.from_id = u.id) THEN 'hasSentRequest' >>>> WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id >>>> AND r.from_id = 33) THEN 'hasReceivedRequest' >>>> ELSE 'none' >>>> END AS "friendStatus", >>>> (SELECT COUNT(1) >>>> FROM friends f1 >>>> JOIN friends f2 ON f1.friend_id = f2.friend_id >>>> WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends >>>> FROM users u >>>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name; >>>> >>>> SELECT u.id, u.name, u.imageURL, u.bio, >>>> CASE >>>> WHEN f.friend_id IS NOT NULL THEN 'isFriend' >>>> WHEN s.to_id IS NOT NULL THEN 'hasSentRequest' >>>> WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest' >>>> ELSE 'none' >>>> END AS 'friendStatus', >>>> (SELECT COUNT(1) AS d >>>> FROM friends f1 >>>> JOIN friends f2 ON f1.fiend_id = f2.friend_id >>>> WHERE f1.user_id = 33 AND f2.user_id = u.id) >>>> FROM users u >>>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id >>>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33 >>>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name; >>>> >>>> 33 is just the id of the active user I am using for testing. The WHERE >>>> clause could be anything. I'm just using "u.name" here but I'm more >>>> concerned about the construction of the result set than the WHERE clause. >>>> These have more or less similar query plans, nothing that would change >>>> things factorially. Is this the best I can do or am I missing the obvious? >>>> >>>> Here are the tables: >>>> >>>> >>>> CREATE TABLE users ( >>>> id BIGINT, >>>> name VARCHAR, >>>> imageURL VARCHAR >>>> created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, >>>> phone_natl BIGINT, /* National Phone Number */ >>>> country_e164 SMALLINT, /* E164 country code */ >>>> email VARCHAR(255), >>>> PRIMARY KEY (id), >>>> UNIQUE (email), >>>> UNIQUE (phone_natl, country_e164) >>>> ); >>>> >>>> >>>> CREATE TABLE friends ( >>>> user_id BIGINT, >>>> friend_id BIGINT, >>>> PRIMARY KEY (user_id, user_id), >>>> FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE, >>>> FOREIGN KEY (friend_id) REFERENCES users(id) ON DELETE CASCADE >>>> ); >>>> CREATE INDEX idx_friends_friend ON friends(friend_id); >>>> >>>> CREATE TABLE friend_requests ( >>>> from_id BIGINT, >>>> to_id BIGINT, >>>> created TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP, >>>> PRIMARY KEY (from_id, user_id), >>>> FOREIGN KEY (from_id) REFERENCES users(id) ON DELETE CASCADE, >>>> FOREIGN KEY (to_id) REFERENCES users(id) ON DELETE CASCADE >>>> ); >>>> CREATE INDEX idx_friend_requests_to ON friend_requests(to_id); >>>> >>>> Let me know if you guys need anything else. >>>> >>>> >>> >> >> Hello Robert, none of your schemas worked for me, here's a clean version >> >> CREATE TABLE users ( >> id BIGINT, >> name VARCHAR, >> imageURL VARCHAR, >> created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, >> phone_natl BIGINT, >> country_e164 SMALLINT, >> email VARCHAR(255), >> PRIMARY KEY (id), >> UNIQUE (email), >> UNIQUE (phone_natl, country_e164) >> ); >> >> >> CREATE TABLE friends ( >> user_id BIGINT, >> friend_id BIGINT, >> PRIMARY KEY (user_id, friend_id), >> FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE, >> FOREIGN KEY (friend_id) REFERENCES users(id) ON DELETE CASCADE >> ); >> CREATE INDEX idx_friends_friend ON friends(friend_id); >> >> CREATE TABLE friend_requests ( >> from_id BIGINT, >> to_id BIGINT, >> created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, >> PRIMARY KEY (from_id, to_id), >> FOREIGN KEY (from_id) REFERENCES users(id) ON DELETE CASCADE, >> FOREIGN KEY (to_id) REFERENCES users(id) ON DELETE CASCADE >> ); >> CREATE INDEX idx_friend_requests_to ON friend_requests(to_id); >> >> I may look into the query later but here's some thoughts >> >> * you want an index to speed up name searches on user.name, I suggest >> pg_trgm + ILIKE as a starting point. >> * you really expect more than a billion users? If not (or not in the near >> future) use INT instead. This will save you a significant amount of memory >> and eventually buy you time later. >> * as long as you don't hit the disk for the queries you'll be fine, so >> make sure you have enough memory or use read-slaves with smaller working >> sets. >> >> > Really? I double checked and there isn't any mention about GIST in your first email. As for the Int/BigInt, It's not a big deal (from 5% to 20% memory savings on indexes), but it was worth mentioning. You may try to materialize the author friend list (used in the common-friend count) with a CTE.