Fuming,

Although I am far from an expert, I did find two papers that deal with topics similar to what you are working on. One I unfortunately do not have access to (but would be interested in reading); for the other I do not have an official reference (perhaps the author can elaborate).
http://www.computer.org/portal/web/csdl/doi/10.1109/ReConFig.2009.75
Describes an FPGA Go-algorithm implementation.

http://www.gggo.jp/publications/gpw08-private.pdf
Describes a simulation-server-based parallel tree-search implementation.

Perhaps those two give you a hint on where to publish your paper. A third option is the special issue of an IEEE journal that is in preparation (although the submission deadline seems to have passed, see http://ieee-cis.org/_files/SpecialIssueMonteCarloAndGoFinal.pdf).

René

On Thu, Jun 17, 2010 at 7:13 AM, Fuming Wang <[email protected]> wrote:

> Hi Rene,
>
> Your guesses about our FPGA implementation are quite to the point. The 167
> games are moving through the 167 pipelined stages of one module instead of
> 167 modules.
>
> As this material is a cross between digital circuit design and computer
> gaming, I am not quite sure which refereed journal is most suitable for it.
> Do you or any readers of this list have any suggestions?
>
> Thanks,
> Fuming
>
> On Thu, Jun 17, 2010 at 8:09 AM, René van de Veerdonk <[email protected]> wrote:
>
>> Hi Fuming,
>>
>> Thanks for your answer, it makes much more sense to me now.
>>
>> We are using pipelining in different ways. When I referred to it for a
>> CPU-based single-threaded application, I was thinking about speculative
>> execution. If I understand it correctly, that does not exist in FPGAs, as
>> these are advertised as deterministic in their execution and process flow.
>> In the FPGA case, I imagine that pipelining refers to "unrolling the
>> program", and having different boards physically move across the chip from
>> module to module, as if they are on a production line, all in various
>> states of simulation (board #...@module #101: black to move; board
>> #12@module #100: white to move; etc.).
>>
>> How you have designed your program in detail would be an interesting read;
>> there are a lot of high-level design trade-offs that you must have dealt
>> with.
>> These will be very different from how you would do it for a CPU-based
>> program. One difference that I imagine, for instance, is the length of the
>> simulation. A CPU-based program stops when the game ends (or you exceed
>> some limit, or you force an early decision, or ...), whereas for an FPGA
>> you may end up with a fixed game length (ready or not, i.e., no early-out
>> option) and you may have to simulate pass moves until you reach the end of
>> the "production line" in case the game ended early (is this what you do?).
>> In any case, your impressive numbers suggest that this can be done very
>> efficiently. How you harness all this raw simulation power in a tree
>> search is yet another research topic that is very interesting and almost
>> orthogonal. Do you think your approach could be mapped to a GPU as well?
>> In any case, I hope you will make a pre-print available to this list when
>> the time comes.
>>
>> In another response in this thread, you mention that you are simulating
>> 167 boards in parallel. Does that mean that you unrolled your program for
>> 167 moves, moving a board between 167 separate modules every "cycle" and
>> seeding/harvesting one complete board per "cycle"? Or do you have multiple
>> (shorter) production lines in parallel? Or something else entirely?
>>
>> As you may have noticed, I am looking forward to your paper,
>>
>> René
>>
>> On Tue, Jun 15, 2010 at 7:03 PM, Fuming Wang <[email protected]> wrote:
>>
>>> Hi Rene,
>>>
>>> Our design is fully pipelined, so we are able to simulate multiple games
>>> simultaneously. The way in which simulations are run on an FPGA and on a
>>> CPU is quite different, so direct comparison is not easy. If we want to
>>> simulate just one game, the FPGA implementation is not 10x faster;
>>> however, if we want thousands of games simulated for a single board
>>> position, then the FPGA is 10x faster. So, we are getting 1500k
>>> GAMES/sec, but only in the second sense.
>>> The clock rate of our FPGA board is only 125 MHz, so with a better
>>> board/chip, we will still have a 10-100 times improvement over the
>>> current result.
>>>
>>> best,
>>> Fuming
>>>
>>> On Wed, Jun 16, 2010 at 1:28 AM, René van de Veerdonk <[email protected]> wrote:
>>>
>>>> Fuming,
>>>>
>>>> Could you please explain your approach a little bit? From the numbers
>>>> you quote, this sounds extremely positive, but I have a hard time
>>>> understanding how you achieve them. Taking 100k playouts/sec for 9x9 on
>>>> my 2.4 GHz laptop for my single-threaded, bitmap-based light-playout
>>>> implementation as an example, with 110 moves/playout, this results in a
>>>> little less than 240 clock cycles/move. When I quickly looked up the
>>>> Cyclone III specification, I saw that the clock speed for this FPGA tops
>>>> out around 240 MHz, yet you achieve 15x the throughput, i.e., you are
>>>> 150x more efficient. This means 1.8 clock cycles/move. Without being
>>>> able to make use of pipelining inside the CPU (someone measured ~2
>>>> assembly instructions/clock cycle for my bitmap approach), this leads me
>>>> to questions. First, are you running a single-threaded application, or
>>>> playing on multiple boards at once? Second, are you just replaying
>>>> moves, or also generating them on the fly (about half of the time is
>>>> spent there in my implementation, more if you include updating the data
>>>> structure to make that possible)? Third, are we using the same
>>>> definitions?
>>>>
>>>> For instance, I would find it much easier to believe that you achieved
>>>> 1500k moves/sec instead of 1500k playouts/sec (with each playout being
>>>> ~110 moves). 200 clock cycles/move sounds doable if you can avoid
>>>> branching, memory lookups, or miscellaneous calculations by creating
>>>> fine-level parallelism in your FPGA code and specializing functions on a
>>>> per-grid-point basis.
>>>> In a CPU-based application, this results in code bloat that will become
>>>> counter-productive at some stage, may not be feasible in all instances,
>>>> and is more difficult to maintain. For an FPGA-based application,
>>>> however, this sounds entirely possible (not knowing anything about
>>>> FPGAs).
>>>>
>>>> Thanks,
>>>>
>>>> René van de Veerdonk
>>>>
>>>> On Sat, Jun 12, 2010 at 10:37 AM, Fuming Wang <[email protected]> wrote:
>>>>
>>>>> Cyclone III
>>>>> 120,000 logic elements
>>>>> Cycle time is linear in the number of moves needed to finish a game,
>>>>> which is approximately linear in the square of the board size.
>>>>>
>>>>> Fuming
>>>>>
>>>>>> - What FPGA? Virtex-6? Spartan-6?
>>>>>> - What size is the core in LUTs?
>>>>>> - Is your cycle time linear in the board size or in the number of
>>>>>>   squares (i.e. quadratic in board size)? Or something else?
>>>>>>
>>>>>> --
>>>>>> GCP
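
The cycles-per-move figures discussed in the quoted messages can be rechecked with a few lines of arithmetic. The following is only a rough sketch: the function name `cycles_per_move` is made up for illustration, and it assumes that the 110 moves/playout measured for the CPU program also applies to the FPGA simulations, which nobody in the thread has confirmed.

```python
def cycles_per_move(clock_hz, playouts_per_sec, moves_per_playout):
    """Average number of clock cycles available per simulated move."""
    return clock_hz / (playouts_per_sec * moves_per_playout)

# 9x9 estimate from the CPU program; ASSUMED here to hold for the FPGA too.
MOVES_PER_PLAYOUT = 110

# CPU: 2.4 GHz laptop, 100k playouts/sec.
cpu = cycles_per_move(2.4e9, 100e3, MOVES_PER_PLAYOUT)
# FPGA at the ~240 MHz ceiling looked up for the Cyclone III, 1500k playouts/sec.
fpga_at_chip_limit = cycles_per_move(240e6, 1500e3, MOVES_PER_PLAYOUT)
# FPGA at the 125 MHz board clock reported by Fuming, 1500k playouts/sec.
fpga_at_board_clock = cycles_per_move(125e6, 1500e3, MOVES_PER_PLAYOUT)

print(f"CPU  2.4 GHz, 100k playouts/s : {cpu:.1f} cycles/move")
print(f"FPGA 240 MHz, 1500k playouts/s: {fpga_at_chip_limit:.2f} cycles/move")
print(f"FPGA 125 MHz, 1500k playouts/s: {fpga_at_board_clock:.2f} cycles/move")
```

Under these assumptions the CPU figure comes out at roughly 218 cycles/move, and the FPGA figure is between one and two cycles/move at 240 MHz. At the 125 MHz board clock it drops below one cycle/move, which would mean that more than one move completes per clock cycle on average, or that the playouts are shorter than 110 moves.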
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
