Heikki,

This is very similar to what AnchorMan on CGOS does. At the end of each random simulation I keep the same statistics on each point of the board, and I use them to improve the move selection algorithm. I call this special board an "ownership map" for obvious reasons. I just divide each value by the number of simulations run.
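In sketch form the bookkeeping is nothing more than this (a simplified illustration, not AnchorMan's actual code - the names and the 9x9 size are just for the example):

    // Ownership map sketch: after each playout, credit every point
    // to the color that owns it at the end of the random game.
    const int AREA = 81;                  // 9x9 board
    double ownership[AREA] = {0};

    // final_owner[p] is +1 if black owns point p at the end of the
    // playout, -1 if white does.
    void accumulate(const int final_owner[AREA]) {
        for (int p = 0; p < AREA; p++)
            ownership[p] += final_owner[p];
    }

    // Call once after all playouts: turns the totals into per-point
    // ownership estimates in [-1, +1].
    void normalize(int num_playouts) {
        for (int p = 0; p < AREA; p++)
            ownership[p] /= num_playouts;
    }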
It's not clear from your email what you do with this information, however. AnchorMan uses the ownership map to give the final move choice a little bias. In my testing the ownership map improved the program significantly, but it was tricky to get right. For reference, AnchorMan does 5,000 simulations per move and is rated exactly 1500.0.

AnchorMan is basic MC too, but some things about it do not match the experiences of others:

1. Others have tried to use my idea to bias moves and report no improvement. I think this is because it's easy to get wrong. I did weeks of experimentation and found that if a bias was too strong it actually weakened the program. The secret, if you use the ownership map to bias the choices, is to be very conservative and to experiment a lot. I bias edge moves, self-atari, captures, and moves into heavily owned territory, but these biases are applied carefully based on the ownership map. For example, I do not bias ALL self-atari moves, just the ones that cross a certain threshold of enemy ownership. The bias is not a veto; it's a gentle encouragement or discouragement toward certain moves in certain situations. (There is a rough sketch of this after point 2.)

2. I use a hybrid form of all-moves-as-first, and others have reported no improvement from it. The behavior I get is that it plays much stronger at low simulation counts, and in extensive testing I could not find a level high enough for the version without all-as-first to play as strongly. So for me there is no reason whatsoever to stop using all-as-first, even at ridiculous levels. Intuitively, all-as-first should play weaker once you are doing enough simulations, but the combination of tricks I'm using somehow negates this. What I do should really be called some-as-first, because I stop tallying moves at about 5/8 of the simulation length.
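To make point 1 concrete, the selection bias has roughly this shape (an illustrative sketch only - the threshold and bias numbers are made up for the example, and the values AnchorMan actually uses took weeks of tuning):

    #include <cmath>

    // 'win_rate' is the move's plain Monte Carlo score,
    // 'ownership[]' is the averaged map from black's point of view,
    // and 'color' is +1 for black to move, -1 for white.
    double biased_score(double win_rate, int move, int color,
                        const double ownership[], bool is_self_atari) {
        double score = win_rate;
        double enemy_own = -color * ownership[move];

        // Bias self-atari only when the point is heavily
        // enemy-owned, not all self-atari moves.
        if (is_self_atari && enemy_own > 0.8)
            score -= 0.02;

        // Gently discourage playing inside settled territory.
        if (std::fabs(ownership[move]) > 0.9)
            score -= 0.01;

        return score;   // never a veto, just a nudge
    }

The final move is then just the legal move with the best biased score, and the edge and capture biases are more terms of the same kind.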
AnchorMan even beats Lazarus, which is about 1800 strength, at fast levels. For instance, Lazarus cannot even begin to compete with AnchorMan if it plays as fast as AnchorMan does on CGOS. Unfortunately, this type of algorithm is not scalable; AnchorMan will never get much better. (I think it gets close to 1600 if it's given something like 200,000 simulations.)

There are some interesting enhancements that can push this a little farther without getting into the more complex UCT stuff. I can't remember who did it, but someone produced something over 1600 doing basic Monte Carlo while simulating a 2-ply search. I can't quite remember the exact details, but it was real simple. It might have been Lukasz Lew, I'm not sure. (I think they kept a 2-dimensional array of moves and their responses, something like that.) The details are in my email somewhere.

I also have an older program (which I cannot find) that was quite interesting: it evolved a playing strategy using PBIL, similar to some of the original papers on using Monte Carlo with Go. CGOS wasn't around then, but I think that program was pretty strong, at least as strong as AnchorMan, maybe stronger.

- Don

On Tue, 2007-02-27 at 21:38 +0100, Heikki Levanto wrote:
> Hi,
>
> I thought I'd report a small change I have made to the plain MC
> algorithm. I have been unhappy with the fact that the result of the
> game gets pressed into one bit (win/lose), and all other information
> is discarded. This leads to silly endgame moves, since the algorithm
> sees no difference between a 1-point win and a 100-point win.
>
> After thinking a bit about this, I decided to keep the whole board
> image at the end. In other words, I have one board-sized array of
> ints, and at the end, when scoring the result of a simulation, I add
> +1 to every point that belongs to black, and -1 to every point that
> belongs to white. Finally I divide each value by the number of
> simulations I have done.
>
> When evaluating the simulations, I sum up the points on this board.
> A simple summing didn't work so well; I thought squaring the numbers
> before summing would be better, but in fact it was worse. Now I end
> up with the following:
>     sum += sign(v) * pow(fabs(v), exp);
> where exp is something like 0.5. I also ran one player with exp=0.8,
> and it seemed to perform about equally well.
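(If I read this right, the whole evaluation boils down to something like the following. This is my reconstruction of what you describe, not your actual code:

    #include <cmath>

    // 'own[p]' is the averaged ownership of point p in [-1, +1] from
    // black's point of view; 'exponent' is the exp in the formula
    // quoted above.
    double evaluate(const double own[], int area, double exponent) {
        double sum = 0.0;
        for (int p = 0; p < area; p++) {
            double v = own[p];
            double sgn = (v > 0) - (v < 0);           // -1, 0 or +1
            sum += sgn * std::pow(std::fabs(v), exponent);
        }
        return sum;   // positive favors black
    }

With exponent = 1 this is just the raw score margin, and as the exponent approaches 0 it degenerates into a pure count of owned points, the sign(v) case, so the pow versions sit in between - which fits the version list below, where the in-between exponents did best.)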
> As a side benefit, I get to know which points clearly belong to
> black or white, and can easily skip unnecessary endgame moves
> (filling own territory into one-point eyes, making silly invasions
> into enemy territory, etc). I believe this could be used for much
> more clever pruning, but I have not (yet?) done so.
>
> The best of these experiments, Halgo-1.5-500k, reached almost 1400
> elo on CGOS, which I think is pretty good for plain MC without any
> trickery. Here are my notes on the versions I had there:
>
> Halgo1          - 1250 - the all-moves-as-first one
> Halgo-1.0-500k  -  477 - Traditional MC. Something wrong with it?
> Halgo-1.1-500k  - 1280 - Simple MC with area evaluation
> Halgo-1.2-500k  - 1053 - Same as 1.1, but with a non-linear scoring
>                          function (s*v*v)
> Halgo-1.3-500k  - 1400 - Same as 1.1, but with pow(v,0.8) as the
>                          summing function
> Halgo-1.4-500k  -  492 - Same as 1.1, but with plain sign(v) as the
>                          summing function
> Halgo-1.5-500k  - 1390 - The pow function again, with 0.5. About the
>                          same as 1.3
>
> I am not sure if there was something wrong with Halgo-1.0; it was
> supposed to play 'traditional' MC, summing just the winner of the
> games.
>
> My code is heavily based on the Efficient Go Library by Lukasz Lew.
> Actually, my own code is only some 360 lines, warts and all. I have
> put a tarball with Lew's library and my code at
> http://www.lsd.dk/heikki/halgo-1.5.tgz
> if you want to look at it. Being derived from a GPL work, it is
> under the GPL itself too, of course.
>
> The 500k refers to the number of simulated random games from the
> given position, divided equally among all legal moves that don't
> fill one-point eyes. 500k sounds like a large number, but the games
> typically use less than 5 minutes of their clock, so they are quite
> OK to play on CGOS. This is on my dual-core AMD desktop (using only
> one core).
>
> I still suspect that 500k is nowhere near enough to get reliable
> results, at least from the opening position, as the program wants to
> start in all sorts of places. It has some preference for the center
> point (on 9x9) and some tendency to avoid first- and second-line
> moves, but nowhere near enough to be predictable.
>
> I even played a game on 19x19, and although the opening was kind of
> surreal, it shaped up quite well. I overlooked an atari, and it took
> advantage of that immediately, saving a group of maybe 20 stones. In
> the end I won clearly, but Halgo had three separate live groups on
> the board. It took 2.5 hours, and I was glad I had eliminated the
> silly endgame moves.
>
> In case you are interested in numbers, here are some. Cpuinfo says:
>
>     vendor_id  : AuthenticAMD
>     cpu family : 15
>     model      : 43
>     model name : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
>     stepping   : 1
>     cpu MHz    : 1000.000
>     cache size : 512 KB
>
> except that the MHz will be 2000 when there is load on the machine,
> as there is when playing. The system is Linux with Debian/testing,
> and the compiler is gcc 4.1.2 20061115 (prerelease)
> (Debian 4.1.1-21), targeting i486-linux-gnu with the posix thread
> model.
>
> Next I guess I will have a look at UCT; it seems to be the hottest
> topic at the moment.
>
> - Heikki

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/