A document from two weeks ago where they at least write *something*;
not bad from Nvidia, knowing they soon have to give lessons to
topcoders :)
It's not really a systematic approach though. We want a list of all
instructions with the latency and throughput that belong to each.
Also lookup tim
On Sun, Sep 13, 2009 at 10:48:12AM +0200, Vincent Diepeveen wrote:
>
> On Sep 13, 2009, at 10:19 AM, Petr Baudis wrote:
> >Just read the nVidia docs. Shifting has the same cost as addition.
> >
>
> Document number and url?
http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA
On Sep 10, 2009, at 12:55 AM, Michael Williams wrote:
Very interesting stuff. One glimmer of hope is that the memory
situations should improve over time since memory grows but Go
boards stay the same size.
Well, you first have to figure out how fast or slow shifting is on the
Nvidia GPUs.
On Sep 9, 2009, at 11:57 PM, Christian Nentwich wrote:
Mark,
let me try to add some more context to answer your questions. When
I say in my conclusion that "it's not worth it", I mean it's not
worth using the GPU to run playout algorithms of the sort that are
in use today. There may be m
Thanks for sharing this Christian,
my comments are inline.
On Sep 9, 2009, at 5:54 PM, Christian Nentwich wrote:
I did quite a bit of testing earlier this year on running playout
algorithms on GPUs. Unfortunately, I am too busy to write up a tech
report on it, but I finally brought myself to take the time to write
this e-mail at least.
Interesting stuff. I don't have the skills nor the time to make such
experiments myself, but here is a simple idea:
When using a bitmap representation of the board, it is quite possible to find
all eye-like points with a constant number of bit-shifting operations. That
should reduce the number of
Mark,
you are right, I meant to type "more than half the speed" in my mail.
That would be enough to rule it out for me, it's just not worth it.
Your idea on using this for RAVE values may be useful. But I think some
flexible thinking around using the GPU as a resource for tangential
tasks ra
Rene,
you're absolutely right, it's completely fishy! But don't worry, your
work is not in vain :) I noticed this morning, when I read your mail,
that I had included the 9x9 results in my original mail instead of
19x19! Indeed, for 19x19 the results are even worse. Here's a complete
rundown
Christian,
Would you care to provide some more detail on your implementation for the
playouts? Your results are very impressive. At 19x19 Go using bit-boards,
your implementation is roughly 7x as fast as the bitboard implementation I
presented just a few weeks back, and also outperforms libEgo by a
Thank you Christian, for taking the time to write an extensive reply.
I still don't understand how you come to conclude that the CPU, at 94K
playouts with two cores, is twice as fast as the GPU doing 170K
playouts per second. Sounds like the reverse to me. Or you meant to
say "more than half the speed".
Mark,
let me try to add some more context to answer your questions. When I say
in my conclusion that "it's not worth it", I mean it's not worth using
the GPU to run playout algorithms of the sort that are in use today.
There may be many other algorithms that form part of Go engines where
the
I'm trying to understand your conclusion. The GPU is more than 3 times
faster than the CPU, yet you don't think it's worth it. You also say
the card has only 9% occupancy.
I know next to nothing about GPU programming, so take my questions in
that stride.
>>Optimal speed was at 80 threads per block
...functions between
those few board moves by offloading the heavy lifting.
s.
2009/9/9:
Interesting stuff. Thanks for reporting your results.
- Dave Hillis
-----Original Message-----
From: Christian Nentwich
To: computer-go
Sent: Wed, Sep 9, 2009 11:54 am
Subject: [computer-go] CUDA and GPU P
Interesting stuff. Thanks for reporting your results.
- Dave Hillis
-----Original Message-----
From: Christian Nentwich
To: computer-go
Sent: Wed, Sep 9, 2009 11:54 am
Subject: [computer-go] CUDA and GPU Performance
I did quite a bit of testing earlier this year on running playout
algorithms on GPUs. Unfortunately, I am too busy to write up a tech
report on it, but I finally brought myself to take the time to write
this e-mail at least. See bottom for conclusions.
For performance testing, I used my CPU