HACKERS: see the end of this message about a possible optimisation for ORDER BY+LIMIT cases (the normal use of LIMIT?)
Adam wrote: > > I help run a job database and have a table of search records. I want > a query that will return the top 10 jobs by search frequency. I'm > familiar with ORDER BY and LIMIT, so I basically need this: > > Given a table search_records: > job_num > ------- > 1 > 2 > 2 > 3 > 4 > 4 > 4 > > I want a query that will return: > job_num | count > --------+------ > 1 |1 > 2 |2 > 3 |1 > 4 |3 > > I tried > > select distinct job_num, (select count(*) from search_records j where > j.job_num=k.job_num) from search_records k > > but it is horribly slow (it takes several minutes on a table of about > 25k rows!). I assume it scans the entire table for every job_num in > order to count the number of occurences of that job_num, taking order > n^2 time. Since I can easily use job_num as an index (being integers > from 0 to roughly 400 so far) I could just do a "select * from > search_records" and do the counting in PHP (our HTML pre-processor) in > order n time. However, I don't know how to do an order n*log(n) sort > in PHP, just n^2, so there would still be an efficiency problem. > I have Postgresql 7.0.3. > Help is of course greatly appreciated. I have not tried it but how about:- select job_num from (select job_num, count(*) as c from search_records group by job_num) order by c limit 10; I am not sure if count(*) would work in this context, if not try count() on some field that is in every record. If you can be sure that the top 10 will have at least a certain threshold of searches (perhaps >1!) then it MIGHT be faster, due to less data being sorted for the outer selects order by, (experiment) to do:- select job_num from (select job_num, count(*) as c from search_records group by job_num HAVING c>1) order by c limit 10; It would depend on how efficient the ORDER BY and LIMIT work together. (The ORDER BY could build a list of LIMIT n items and just replace items in that list...a lot more efficient both of memory and comparisons than building the full list and then keeping the top n) HACKERS: If it does not do this it might be a usefull optimisation. There would probably need to be a cutoff limit on whether to apply this method or sort and keep n. Also for LIMIT plus OFFSET it would need to build a list of the the total of the LIMIT and OFFSET figures. -- This is the identity that I use for NewsGroups. Email to this will just sit there. If you wish to email me replace the domain with knightpiesold . co . uk (no spaces). ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly