Thanks Arthur. I don't think there is as big a different between BIGINT and INTEGER as you think there is. In fact with an extended filesystem you might not see any difference at all.
As I put in the first emal I am using a GIST index on user.name. I was really more interested in the LEFT OUTER JOINs vs EXISTS queries and if there was a better alternative I had not considered. On Tue, Dec 9, 2014 at 11:44 AM, Arthur Silva <arthur...@gmail.com> wrote: > On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco <robert.difa...@gmail.com> > wrote: > >> I'm sorry, I missed a JOIN on the second variation. It is: >> >> SELECT u.id, u.name, u.imageURL, u.bio, >> CASE >> WHEN f.friend_id IS NOT NULL THEN 'isFriend' >> WHEN s.to_id IS NOT NULL THEN 'hasSentRequest' >> WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest' >> ELSE 'none' >> END AS 'friendStatus', >> (SELECT COUNT(1) AS d >> FROM friends f1 >> JOIN friends f2 ON f1.fiend_id = f2.friend_id >> WHERE f1.user_id = 33 AND f2.user_id = u.id) >> FROM users u >> *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id >> <http://u.id>* >> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id >> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33 >> WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name; >> >> >> On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco <robert.difa...@gmail.com >> > wrote: >> >>> I have users, friends, and friend_requests. I need a query that >>> essentially returns a summary containing: >>> >>> * user (name, imageURL, bio, ...) >>> * Friend status (relative to an active user) >>> * Is the user a friend of the active user? >>> * Has the user sent a friend request to the active user? >>> * Has the user received a friend request from the active user? >>> * # of mutualFriends >>> * Exclude the active user from the result set. >>> >>> So I have mocked this up two ways but both have complicated query plans >>> that will be problematic with large data sets. I'm thinking that my lack of >>> deep SQL knowledge is making me miss the obvious choice. >>> >>> Here's my two query examples: >>> >>> SELECT u.id, u.name, u.imageURL, u.bio, >>> CASE >>> WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND >>> f.friend_id = u.id) THEN 'isFriend' >>> WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33 >>> AND s.from_id = u.id) THEN 'hasSentRequest' >>> WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id >>> AND r.from_id = 33) THEN 'hasReceivedRequest' >>> ELSE 'none' >>> END AS "friendStatus", >>> (SELECT COUNT(1) >>> FROM friends f1 >>> JOIN friends f2 ON f1.friend_id = f2.friend_id >>> WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends >>> FROM users u >>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name; >>> >>> SELECT u.id, u.name, u.imageURL, u.bio, >>> CASE >>> WHEN f.friend_id IS NOT NULL THEN 'isFriend' >>> WHEN s.to_id IS NOT NULL THEN 'hasSentRequest' >>> WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest' >>> ELSE 'none' >>> END AS 'friendStatus', >>> (SELECT COUNT(1) AS d >>> FROM friends f1 >>> JOIN friends f2 ON f1.fiend_id = f2.friend_id >>> WHERE f1.user_id = 33 AND f2.user_id = u.id) >>> FROM users u >>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id >>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33 >>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name; >>> >>> 33 is just the id of the active user I am using for testing. The WHERE >>> clause could be anything. I'm just using "u.name" here but I'm more >>> concerned about the construction of the result set than the WHERE clause. >>> These have more or less similar query plans, nothing that would change >>> things factorially. Is this the best I can do or am I missing the obvious? >>> >>> Here are the tables: >>> >>> >>> CREATE TABLE users ( >>> id BIGINT, >>> name VARCHAR, >>> imageURL VARCHAR >>> created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, >>> phone_natl BIGINT, /* National Phone Number */ >>> country_e164 SMALLINT, /* E164 country code */ >>> email VARCHAR(255), >>> PRIMARY KEY (id), >>> UNIQUE (email), >>> UNIQUE (phone_natl, country_e164) >>> ); >>> >>> >>> CREATE TABLE friends ( >>> user_id BIGINT, >>> friend_id BIGINT, >>> PRIMARY KEY (user_id, user_id), >>> FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE, >>> FOREIGN KEY (friend_id) REFERENCES users(id) ON DELETE CASCADE >>> ); >>> CREATE INDEX idx_friends_friend ON friends(friend_id); >>> >>> CREATE TABLE friend_requests ( >>> from_id BIGINT, >>> to_id BIGINT, >>> created TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP, >>> PRIMARY KEY (from_id, user_id), >>> FOREIGN KEY (from_id) REFERENCES users(id) ON DELETE CASCADE, >>> FOREIGN KEY (to_id) REFERENCES users(id) ON DELETE CASCADE >>> ); >>> CREATE INDEX idx_friend_requests_to ON friend_requests(to_id); >>> >>> Let me know if you guys need anything else. >>> >>> >> > > Hello Robert, none of your schemas worked for me, here's a clean version > > CREATE TABLE users ( > id BIGINT, > name VARCHAR, > imageURL VARCHAR, > created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, > phone_natl BIGINT, > country_e164 SMALLINT, > email VARCHAR(255), > PRIMARY KEY (id), > UNIQUE (email), > UNIQUE (phone_natl, country_e164) > ); > > > CREATE TABLE friends ( > user_id BIGINT, > friend_id BIGINT, > PRIMARY KEY (user_id, friend_id), > FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE, > FOREIGN KEY (friend_id) REFERENCES users(id) ON DELETE CASCADE > ); > CREATE INDEX idx_friends_friend ON friends(friend_id); > > CREATE TABLE friend_requests ( > from_id BIGINT, > to_id BIGINT, > created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, > PRIMARY KEY (from_id, to_id), > FOREIGN KEY (from_id) REFERENCES users(id) ON DELETE CASCADE, > FOREIGN KEY (to_id) REFERENCES users(id) ON DELETE CASCADE > ); > CREATE INDEX idx_friend_requests_to ON friend_requests(to_id); > > I may look into the query later but here's some thoughts > > * you want an index to speed up name searches on user.name, I suggest > pg_trgm + ILIKE as a starting point. > * you really expect more than a billion users? If not (or not in the near > future) use INT instead. This will save you a significant amount of memory > and eventually buy you time later. > * as long as you don't hit the disk for the queries you'll be fine, so > make sure you have enough memory or use read-slaves with smaller working > sets. > >