[SQL] select within aggregate?

2005-05-06 Thread Vortex
Hi!!

Of course the following is possible in various ways but i wonder
if there is an elegant and simple solution:

A part from apache log:
remote_hostvarchar(50)
request_urivarchar(50)
ts timestamp with time zone

Now i want to see which host has done how many requests. This
is not difficult:

SELECT remote_host, count(*) FROM apache_log GROUP BY remote_host;

But now i would like to add a column to the result which
shows the most recent requested uri for each group.
This sounds quite easy at first glance but i see no simple solution.

I suppose with max(ts) i could acquire the time of the
most recent request within the group, but how do i get
the corresponding request_uri without doing subquerys or
something like that?

Thank you very much!

Klaus

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] select within aggregate?

2005-05-06 Thread Vortex
On Fri, 6 May 2005 13:34:26 +0530
"Ramakrishnan Muralidharan" <[EMAIL PROTECTED]> wrote:

> SELECT abc.remote_host ,  c , abc.request_uri , a.t FROM abc , ( select 
> remote_host , count(*) as c , max( ts ) as t from abc group by remote_host ) 
> as a 
> where a.remote_host = abc.remote_host and abc.ts = a.t

I thought about this too. But what happens if there are accidentally two
sets with same timestamp in the table? I suppose the outer select would
match both of them. But a DISTINCT should help to avoid this.
Ok, if you think this is the propper way to meet the
requirement i will do so.

Thank you for your answer!

Klaus


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] select within aggregate?

2005-05-07 Thread Vortex
On Fri, 6 May 2005 07:49:23 -0500
Bruno Wolff III <[EMAIL PROTECTED]> wrote:

> You could use max(ts) as long as you can have two requests occur at the
> same time. This might be a reasonable assumption, but it might have been
> better to include a sequence so that each record would be guaranteed to
> have a unique key.
Yes, of course. So far it was more a theoretical question.

> Using DISTINCT ON to get the latest URIs is probably better. You can
> join that query to the one doing the count. This will probably be more
> efficient than executing a subquery for each remote host.
The DISTINCT ON construction looks quite interesting. It seems to allow
some sorting within an group (even althought it's no real aggregate) in
general. I can't say if it's really more eficient than the solution suggested
by R.Muralidharan. But i will do some tests!

Thank you very much!

Klaus

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings