On 29.4.2015 18:54, Jonathan Vanasco wrote:
> 
> On Apr 29, 2015, at 12:25 PM, Ladislav Lenart wrote:
> 
>> Could you please explain to me the error(s) in my reasoning?
> 
> Let me just flip your list in reverse... and add in some elements (marked 
> with a *):
> 
> posting  ts  context
> p60      60  friend
> p55      55 friend*
> p54      54 friend*
> p50      50  group
> p50      49  group*
> p50      49  group*
> p40      40  friend
> p30      30  group
> p20      20  friend
> p10      10  group
> p0        0  friend
> 
> With the 2 limited subqueries, the results would be:
>       60F, 55F, 50G, 49G
> 
> But the "most recent" data is
>       50F, 55F, 54F, 50G
> 
> So we end up showing 49 which is less relevant than 54.

I would expect the overall query to return only 60F nad 55F as the most recent
data. No? You expect it to return 4 items when the LIMIT is only 2. Remember
that the overall query should be also ordered by ts and limited to 2.

I thought you want most recent items across all contexts and not 2 most recent
items from friends plus two most recent items from groups...


Ladislav Lenart


> In some situations this isn't much of an issue, but in others it is 
> detrimental. 
> For example, one of my "feeds" contains a distribution of events 
> according-to-type that is very uneven.  While "friend" and "group" might be 
> relatively close in time to one another, "system" or other events may be 
> months old -- and that older content gets pulled in with this style of query. 
>  
> 
> If you need to paginate the data and select the next 10 overall items, it 
> gets even more complicated.
> 
> IIRC, the best mix of performance and "product" that I've found is do 
> something like this:
> 
>       SELECT * FROM (
>               SELECT a,b,c FROM table_a ORDER BY TIMESTAMP DESC LIMIT 10000;
>               UNION
>               SELECT a,b,c FROM table_b ORDER BY TIMESTAMP DESC LIMIT 10000;
>       ) as unioned
>       order by unioned TIMESTAMP DESC LIMIT 100 OFFSET 0; 
> 
> by creating an artificial limit on the inner queries, you can save postgres 
> from doing a lot of intensive i/o work and memory usage (like a lot)
> then, joining a few lists and sorting 20k (or even 100k) items is really 
> cheap.
> the downside is that you effectively limit the 'relevancy' of the query to 
> whatever the inner limit is (ie, 10000 -- not the combined total of 20000), 
> but that number can be arbitrarily high enough that it is irrelevant while 
> still showing the right amount of content for people.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to