Re: [HACKERS] [v9.5] Custom Plan API

Stephen Frost Thu, 08 May 2014 19:14:27 -0700

* Peter Geoghegan (p...@heroku.com) wrote:
> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
> 
> I haven't really paid any attention to PGStrom. Perhaps it's just that
> I missed it, but I would find it useful if you could direct me towards
> a benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling
> compared to traditional strategies already implemented in Postgres and
> other systems.


I agree that some concrete evidence would be really nice.  I
more-or-less took KaiGai's word on it, but having actual benchmarks
would certainly be better.

> If I wanted to make joins faster, personally, I would look at
> opportunities to optimize our existing hash joins to take better
> advantage of modern CPU characteristics.

Yeah, I'm pretty confident we're leaving a fair bit on the table right
there based on my previous investigation into this area.  There were
easily cases which showed a 3x improvement, as I recall (the trade-off
being increased memory usage for a larger, sparser hash table).  Sadly,
there were also cases which ended up being worse, and it seemed to be
very sensitive to the size of the hash table which ends up being built
and the size of the on-CPU cache.
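
To illustrate the memory-for-speed trade-off described above, here is a
minimal sketch in C. It is not PostgreSQL's hash join code and not the
experiment referred to above: the linear-probing table, the mix32() hash,
and the avg_probes() helper are all assumptions made up for this example.
It only counts how many slots a lookup inspects at a given table size; it
does not measure cache misses, which is where the size sensitivity
mentioned above would show up.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical integer mixer for this sketch; not a PostgreSQL hash function. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x45d9f3bU;
    x ^= x >> 16;
    x *= 0x45d9f3bU;
    x ^= x >> 16;
    return x;
}

/*
 * Insert keys 1..nkeys into a linear-probing table with nslots slots,
 * then return the average number of slots inspected per successful
 * lookup.  nslots must be a power of two and larger than nkeys.
 * Returns -1.0 on invalid arguments or allocation failure.
 */
double avg_probes(uint32_t nkeys, uint32_t nslots)
{
    if (nkeys == 0 || nslots == 0 || (nslots & (nslots - 1)) != 0 ||
        nkeys >= nslots)
        return -1.0;

    uint32_t *slots = calloc(nslots, sizeof(uint32_t));  /* 0 means empty */
    if (slots == NULL)
        return -1.0;

    uint32_t mask = nslots - 1;

    /* Build phase: place each key in the first free slot at or after its home slot. */
    for (uint32_t k = 1; k <= nkeys; k++)
    {
        uint32_t i = mix32(k) & mask;
        while (slots[i] != 0)
            i = (i + 1) & mask;
        slots[i] = k;
    }

    /* Probe phase: look every key up again and count the slots touched. */
    uint64_t probes = 0;
    for (uint32_t k = 1; k <= nkeys; k++)
    {
        uint32_t i = mix32(k) & mask;
        probes++;
        while (slots[i] != k)
        {
            i = (i + 1) & mask;
            probes++;
        }
    }

    free(slots);
    return (double) probes / nkeys;
}
```

Calling avg_probes(60000, 65536) and avg_probes(60000, 262144) compares a
nearly full table against one four times larger for the same keys. The
sparser table needs fewer probes per lookup but uses four times the
memory, and whether that wins in practice would depend on how the table
size compares to the CPU cache.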

> A lot of the research
> suggests that it may be useful to implement techniques that take
> better advantage of available memory bandwidth through techniques like
> prefetching and partitioning, perhaps even (counter-intuitively) at
> the expense of compute bandwidth.

While I agree with this, one of the big things about GPUs is that they
operate in a highly parallel fashion and across a different CPU/Memory
architecture than what we're used to (for starters, everything is much
"closer").  In a traditional memory system, there's a lot of back and
forth to memory, but a single memory dump over to the GPU's memory where
everything is processed in a highly parallel way and then shipped back
wholesale to main memory is at least conceivably faster.

Of course, things will change when we are able to parallelize joins
across multiple CPUs ourselves.  In a way, the PGStrom approach gets to
"cheat" us today, since it can parallelize the work where core can't and
that ends up not being an entirely fair comparison.

        Thanks,

                Stephen

