On 04/30/2011 09:00 PM, Samuel Gendler wrote:
Some kind of in-memory cache of doc/ad mappings which the ad server
interacts with will serve you in good stead and will be much easier to
scale horizontally than most relational db architectures lend
themselves to... Even something as simple as a pr
On Sat, Apr 30, 2011 at 5:12 PM, Jeff Janes wrote:
>
>
> gist indices are designed to make this type of thing fast, by using
> techniques to rule out most of those comparisons without actually
> performing them. I don't know enough about the
> guts of either your distance function or the gist in
On Sat, Apr 30, 2011 at 3:29 PM, Joel Reymont wrote:
>
> On Apr 30, 2011, at 11:11 PM, Jeff Janes wrote:
>
>> But what exactly are you inserting? The queries you reported below
>> are not the same as the ones you originally described.
>
> I posted the wrong query initially. The only difference is
On Apr 30, 2011, at 11:11 PM, Jeff Janes wrote:
> But what exactly are you inserting? The queries you reported below
> are not the same as the ones you originally described.
I posted the wrong query initially. The only difference is in the table that
holds the probability array.
I'm inserting
On Sat, Apr 30, 2011 at 2:15 PM, Joel Reymont wrote:
>
> On Apr 30, 2011, at 7:24 PM, Kevin Grittner wrote:
>
>> If this is where most of the time is, the next thing is to run it
>> with EXPLAIN ANALYZE, and post the output.
>
> I was absolutely wrong about the calculation taking < 1s, it actually
On Apr 30, 2011, at 7:36 PM, Kevin Grittner wrote:
> It may even be amenable to knnGiST indexing (a new feature coming in
> 9.1), which would let you do your select with an ORDER BY on the
> distance.
I don't think I can wait for 9.1, need to go live in a month, with PostgreSQL
or without.
> P
On Apr 30, 2011, at 7:24 PM, Kevin Grittner wrote:
> If this is where most of the time is, the next thing is to run it
> with EXPLAIN ANALYZE, and post the output.
I was absolutely wrong about the calculation taking < 1s, it actually takes
about 30s for 2 million rows.
Still, the difference be
Joel Reymont wrote:
> I'm calculating distance between probability vectors, e.g. topics
> that a document belongs to and the topics of an ad.
>
> The distance function is already a C function. Topics are
> float8[150].
>
> Distance is calculated against all documents in the database
There's
[rearranging to correct for top-posting]
Joel Reymont wrote:
> Kevin Grittner wrote:
>> Joel Reymont wrote:
>>
>>> We have 2 million documents now and linking an ad to all of them
>>> takes 5 minutes on my top-of-the-line SSD MacBook Pro.
>>
>> How long does it take to run just the SELECT part
I'm calculating distance between probability vectors, e.g. topics that
a document belongs to and the topics of an ad.
The distance function is already a C function. Topics are float8[150].
Distance is calculated against all documents in the database so it's
a table scan.
Sent from my comfortable
If you want to search by geographical coordinates, you could use a gist
index which can optimize that sort of thing (like retrieving all rows
which fit in a box).
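
The box retrieval mentioned above boils down to the predicate sketched here in Python. This is only the test that such an index would answer; the code is a plain linear filter, not a GiST index, and `in_box`, `rows`, and the sample coordinates are hypothetical names and data made up for illustration.

```python
def in_box(point, lower_left, upper_right):
    """Return True if a 2-D point lies inside an axis-aligned box (edges included)."""
    x, y = point
    return (lower_left[0] <= x <= upper_right[0]
            and lower_left[1] <= y <= upper_right[1])


# Hypothetical rows: name -> (x, y) coordinates.
rows = {
    "a": (1.0, 2.0),
    "b": (15.0, 3.0),
    "c": (10.0, 10.0),
}

# Linear filter: every row is tested. An index exists to avoid exactly this.
hits = sorted(name for name, p in rows.items()
              if in_box(p, (0.0, 0.0), (10.0, 10.0)))
```

The filter touches every row; the point of an index is to answer the same question while skipping most of them.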
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.
Calculating distance involves giving an array of 150 float8 to a pgsql
function, then calling a C function 2 million times (at the moment),
giving it two arrays of 150 float8.
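
For a sense of the work described above, here is a minimal Python sketch of the scan: one ad vector compared against every document vector. The thread does not say which distance the C function computes, so Euclidean distance is an assumption, and `euclidean_distance`, `scan_all`, and the sample data are hypothetical.

```python
import math

TOPICS = 150  # vector length stated in the thread (float8[150])


def euclidean_distance(a, b):
    """Distance between two equal-length topic vectors.

    Assumption: the thread never names the metric; Euclidean is a stand-in.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scan_all(ad_topics, documents):
    """Compare one ad against every document: a full scan, no index."""
    return [(doc_id, euclidean_distance(ad_topics, topics))
            for doc_id, topics in documents]


# Tiny illustrative data set; the real table has about 2 million rows.
ad = [1.0 / TOPICS] * TOPICS
docs = [
    (1, [1.0 / TOPICS] * TOPICS),        # identical to the ad
    (2, [1.0] + [0.0] * (TOPICS - 1)),   # all weight on one topic
]
results = scan_all(ad, docs)
```

With 2 million rows and 150 elements per vector, one ad costs 2,000,000 x 150 = 300 million subtract-square-add steps under this metric, before any per-row overhead.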
Just calculating distance for 2 million rows and extracting the
distance takes less than a second. I think that includes s
Joel Reymont wrote:
> We have 2 million documents now and linking an ad to all of them
> takes 5 minutes on my top-of-the-line SSD MacBook Pro.
How long does it take to run just the SELECT part of the INSERT by
itself?
-Kevin