Re: [GENERAL] histogram

2011-04-30 Thread Joel Reymont
I think this should do what I want: select trunc(distance * 10.)/10., count(*) from doc_ads group by 1 order by 1 Thanks, Joel -- - for hire: mac osx device driver ninja, kernel extensions and usb d

Re: [GENERAL] histogram

2011-04-30 Thread Joel Reymont
What is the meaning of group by 1 order by 2, i.e. what do the numbers 1 and 2 stand for? What would change if I do the following? group by 1 order by 1 On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote: > Hi, > > try something like this: > > select >trunc(random(
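For readers with the same question: in PostgreSQL, an integer in GROUP BY or ORDER BY refers to the select-list column at that position. The sketch below is not from the thread. It reuses the doc_ads table and distance column from the follow-up message, and the aliases bucket and n are added only for illustration.

```sql
-- Positional form: 1 = first output column, 2 = second output column.
SELECT trunc(distance * 10.) / 10. AS bucket,
       count(*)                    AS n
FROM doc_ads
GROUP BY 1      -- group by the bucket expression
ORDER BY 2;     -- sort by the row count per bucket

-- Equivalent form with the expressions written out.
SELECT trunc(distance * 10.) / 10. AS bucket,
       count(*)                    AS n
FROM doc_ads
GROUP BY trunc(distance * 10.) / 10.
ORDER BY count(*);
```

Changing ORDER BY 2 to ORDER BY 1 sorts the rows by bucket value instead of by count, which is usually the order wanted for a histogram.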

Re: [GENERAL] histogram

2011-04-30 Thread Joel Reymont
Thank you Thomas! Is there a way for the code below to determine the number of rows in the table and use it? Thanks, Joel On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote: > Hi, > > try something like this: > > select >trunc(random() * 10.)/10. >, count(*) > from >generat

[GENERAL] histogram

2011-04-30 Thread Joel Reymont
I have a column of 2 million float values from 0 to 1. I would like to figure out how many values fit into buckets spaced by 0.10, e.g. from 0 to 0.10, from 0.10 to 0.20, etc. What is the best way to do this? Thanks, Joel
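One way to sketch this, offered as an alternative to the trunc() approach in the replies and not something proposed in the thread, is PostgreSQL's built-in width_bucket() function. The table doc_ads and column distance are taken from the follow-up message; the aliases are illustrative.

```sql
-- Ten equal-width buckets over [0, 1): bucket 1 is [0, 0.1), bucket 2 is [0.1, 0.2), ...
-- A value of exactly 1.0 falls outside the upper bound and is reported as bucket 11.
SELECT width_bucket(distance, 0.0, 1.0, 10) AS bucket,
       count(*)                             AS n
FROM doc_ads
GROUP BY bucket
ORDER BY bucket;
```

Buckets that contain no rows do not appear in the output; a join against generate_series(1, 10) would be needed to list them with a zero count.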

Re: [GENERAL] optimizing a cpu-heavy query

2011-04-27 Thread Joel Reymont
Tom, On Apr 26, 2011, at 5:00 PM, Tom Lane wrote: > For another couple orders of magnitude, convert the sub-function to C code. > (I don't think you need > a whole data type, just a function that does the scalar product.) That's a 30x speedup, from 12 minutes down to 38s. Thanks Tom!

Re: [GENERAL] tuning on ec2

2011-04-26 Thread Joel Reymont
On Apr 26, 2011, at 4:31 PM, Scott Marlowe wrote: > It's a reasonable start. However, if you're consistently using less than > that in aggregate then lowering it is fine. Is there a way to tell if I consistently use less than that in aggregate? > What's your work_mem and max_connections set to?
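The current values of the settings mentioned in this thread can be read from the pg_settings system view. This is a minimal sketch for looking them up, not a reply from the thread, and it does not measure actual memory use.

```sql
-- Show the memory-related settings discussed here, with their units.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers',
               'effective_cache_size',
               'work_mem',
               'max_connections')
ORDER BY name;
```

A single setting can also be inspected with SHOW, for example SHOW work_mem.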

[GENERAL] tuning on ec2

2011-04-26 Thread Joel Reymont
I'm running pgsql on an m1.large EC2 instance with 7.5gb available memory. The free command shows 7gb of free+cached. My understanding from the docs is that I should dedicate 1.75gb to shared_buffers (25%) and set effective_cache_size to 7gb. Is this correct? I'm running 64-bit Ubuntu 10.10, e.g

[GENERAL] optimizing a cpu-heavy query

2011-04-26 Thread Joel Reymont
Folks, I'm trying to optimize the following query that performs KL Divergence [1]. As you can see the distance function operates on vectors of 150 floats. The query takes 12 minutes to run on an idle (apart from pgsql) EC2 m1 large instance with 2 million documents in the docs table. The CPU i
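The query itself is cut off in this listing, so the following is only a generic sketch of KL divergence, sum over i of p[i] * ln(p[i] / q[i]), written as a SQL function over two float arrays. The name kl_divergence and its signature are hypothetical and are not the author's function.

```sql
-- Hypothetical helper: KL divergence D(p || q) for two equal-length float8 arrays.
-- Terms with p[i] = 0 contribute nothing and are skipped.
-- Assumes q[i] > 0 wherever p[i] > 0; a zero q[i] there raises a division-by-zero error.
CREATE OR REPLACE FUNCTION kl_divergence(float8[], float8[])
RETURNS float8 AS $$
  SELECT sum($1[i] * ln($1[i] / $2[i]))
  FROM generate_subscripts($1, 1) AS i
  WHERE $1[i] > 0;
$$ LANGUAGE sql IMMUTABLE STRICT;

-- Example call with two small distributions.
SELECT kl_divergence(ARRAY[0.5, 0.5]::float8[],
                     ARRAY[0.9, 0.1]::float8[]);
```

A per-element SQL function like this is convenient for checking results on a few rows, but it is the kind of code the reply in this thread suggests moving to C when it has to run over millions of 150-element vectors.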