* Peter Geoghegan (p...@heroku.com) wrote:
> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
>
> I haven't really paid any attention to PGStrom. Perhaps it's just that
> I missed it, but I would find it useful if you could direct me towards
> a benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling
> compared to traditional strategies already implemented in Postgres and
> other systems.
I agree that some concrete evidence would be really nice. I more-or-less
took KaiGai's word on it, but having actual benchmarks would certainly
be better.

> If I wanted to make joins faster, personally, I would look at
> opportunities to optimize our existing hash joins to take better
> advantage of modern CPU characteristics.

Yeah, I'm pretty confident we're leaving a fair bit on the table right
there, based on my previous investigation into this area. There were
easily cases which showed a 3x improvement, as I recall (the trade-off
being increased memory usage for a larger, sparser hash table). Sadly,
there were also cases which ended up being worse, and the result seemed
to be very sensitive to the size of the hash table which ends up being
built and the size of the on-CPU cache.

> A lot of the research
> suggests that it may be useful to implement techniques that take
> better advantage of available memory bandwidth through techniques like
> prefetching and partitioning, perhaps even (counter-intuitively) at
> the expense of compute bandwidth.

While I agree with this, one of the big things about GPUs is that they
operate in a highly parallel fashion and across a different CPU/memory
architecture than what we're used to (for starters, everything is much
"closer"). In a traditional memory system, there's a lot of back and
forth to memory, but a single memory dump over to the GPU's memory,
where everything is processed in a highly parallel way and then shipped
back wholesale to main memory, is at least conceivably faster.

Of course, things will change when we are able to parallelize joins
across multiple CPUs ourselves. In a way, the PGStrom approach gets to
"cheat" today, since it can parallelize the work where core can't, and
that ends up not being an entirely fair comparison.

	Thanks,

		Stephen