Re: [computer-go] Super-duper computer

2008-09-09 Thread Claus Reinke

This beast goes online in 2011.  Better start lobbying now for some Mogo
time.


By coincidence I was looking at the Top 500 list yesterday and the top
machine already does petaflop (peak) performance [1]. I wonder how many
playouts/second Mogo would do on that :-).


If you're looking for spare processors, how about a "[EMAIL PROTECTED]"
program for Go?-) 

Local versions of the top programs could offer to connect to their 
main incarnation's games, explaining internal state ("it is sure it will 
win", "it thinks that group is dead", ..) in exchange for borrowing 
processing resources. Or, instead of doing this on a per-program 
basis, there could be a standard protocol for donating processing 
power from machines whose users view a game online.


That way, the more kibitzers a game attracts, the better the computer
player plays; and if the game cannot hold an audience, the computer
player might start to seem distracted, losing all those borrowed
processors;-)

Mogo might even find some related research at INRIA ([EMAIL PROTECTED]
style (desktop) grid computing, ..), so perhaps there's scope for
collaboration there?

Claus

Q: why do you search for extra-terrestrial intelligence?
A: we've exhausted the local search space.





Re: [computer-go] Super-duper computer

2008-09-28 Thread Claus Reinke
> If you're looking for spare processors, how about a "[EMAIL PROTECTED]"
> program for Go?-)

It appears that the Chess community has had such a project already:

ChessBrain: a Linux-Based Distributed Computing Experiment
http://www.linuxjournal.com/article/6929

ChessBrain II - A Hierarchical Infrastructure for Distributed
Inhomogeneous Speed-Critical Computation
IEEE CIG06, Reno NV, May 2006 (6 pages)
(IEEE Symposium on Computational Intelligence and Games)
http://chessbrain.net/docs/chessbrainII.pdf

old project site:
http://chessbrain.net/

From the ChessBrain II paper, it seems they considered Go, but
before the recent developments that made parallel processing
promising. The papers might also be interesting for their discussion
of parallel tree search and communication issues.

Claus






[computer-go] Using playouts for more than position evaluation?

2008-09-28 Thread Claus Reinke
From browsing Monte-Carlo Go papers (*), I get the impression that random
playouts are used mainly to approximate an evaluation function, determining
some value for board positions arising in more traditional tree search.

Is that correct? It seems somewhat wasteful to calculate all those possible
board positions and only take a single value before throwing them away.
Have there been any attempts to extract other information from the playouts?

For instance, if an intersection belongs to the same colour in all playouts,
chances are that it is fairly secure (that doesn't mean one shouldn't play
there; sacrifices there may still have an impact on other intersections).

Or, if an intersection is black in all playouts won by black, and white in
all playouts won by white, chances are that it is fairly important to play
there (since playouts are random, there is no guarantee, but emphasizing
such intersections, and their ordering, in the top-level tree search seems
profitable).
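
In code, the two statistics I have in mind might look like this minimal
sketch (Python; the names and the 0.9 threshold are purely illustrative),
assuming each playout is recorded as a final ownership map plus a win flag:

def playout_statistics(playouts):
    # playouts: list of (owner_map, black_won), where owner_map gives
    # 'B' or 'W' for every intersection of the final position
    n = len(playouts)
    points = playouts[0][0].keys()
    black_owns = {p: 0 for p in points}
    winner_owns = {p: 0 for p in points}
    for owner_map, black_won in playouts:
        for p in points:
            if owner_map[p] == 'B':
                black_owns[p] += 1
            if (owner_map[p] == 'B') == black_won:
                winner_owns[p] += 1
    # same colour in all playouts: probably fairly secure
    secure = {p for p in points if black_owns[p] in (0, n)}
    # owned by whoever wins: probably fairly important to play
    critical = {p for p in points if winner_owns[p] > 0.9 * n}
    return secure, critical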

Secondly, I have been surprised to see Go knowledge being applied to the
random playouts - doesn't that run the danger of blinding the evaluation
function to border cases? It would seem much safer to me to keep the
random playouts unbiased, but to extract information from them to guide
the top-level tree search. Even the playout termination criterion (not filling
eyes) has to be defined fairly carefully (and there have been variations),
to avoid blinding playouts against sacrifices.

Since most Go knowledge isn't absolute, but comes with caveats, it would
seem that any attempt to encode Go knowledge in the playouts is risky
(mind you, I'm a weak player, so I might be wrong;-). For instance, a
bamboo joint connects two strings, unless (insert various exceptions here),
so if you encode a bamboo joint as a firm connection, your playouts include
a systematic error. Shouldn't the same hold for nearly all Go "rules"?

Thirdly, I have been trying to understand why random playouts work
so well for evaluating a game in which there is sometimes a very narrow
path to victory. Naively, it would seem that if there was a position from
which exactly one sequence of moves led to a win, but starting on that
sequence would force the opponent to stay on it, then random playouts
would evaluate that position as lost, even if the forced sequence would
make it a win.

Is it the full search at the top of the tree that avoids this danger (every
starting move gets explored, and for the correct starting move, random
plays are even worse for the opponent than being forced, so the forcing
sequence will emerge, if slowly and not certainly)? If yes, that would
explain the "horizon effect", where Monte-Carlo programs with slightly
deeper non-random search fare better at judging positions and squash
their opponents even without other improvements.

It might also explain why bots like Leela sometimes seem overconfident
of their positions, abandoning local fights before they are entirely stable.
Such overplay has traditionally been useful in playing against other bots,
even though it can be punished severely by strong human players. If
the opponent bot can't see the winning sequence, it may not continue
the local fight, and if it does continue the local fight with anything but the
optimal move, Leela tends to come back with strong answers, as if it
could suddenly see the danger. Either way tends to justify Leela's playing
elsewhere, if only against a bot opponent.

Of course, the third and second issue above are somewhat related:
if incorporating Go knowledge in the playouts is the only way to
avoid missing narrow paths to certain evaluations, one might have
to risk adding such knowledge, even if it boomerangs in other situations
(are ladders one such case, or are they better left to random evaluation?).

Ok, way too many questions already;-) I hope someone has some
answers, even if partial or consisting of references to more papers.

Claus

(*) btw, Computer Go related papers seem to be widely distributed -
is there a central bibliography that keeps track of papers and urls?






Re: [computer-go] Using playouts for more than position evaluation?

2008-09-29 Thread Claus Reinke
>I agree with much of what you say (to the degree that anyone needs to "agree" 
>with questions).

Good of you not to respond as Kosh might have: "Yes" (warble sound effects;-)

> The discussions on this list dealing with "ownership maps", RAVE and AMAF have
> to do with using additional information from the playouts.

I've found the thread for "ownership maps":

http://computer-go.org/pipermail/computer-go/2007-February/008897.html

and I guess the explanations and origins of RAVE/AMAF are somewhere in
the >100 references each brings up in the archives? Perhaps the archives should
be published in printed form, to make "catching up" easier (I wonder whether
David Brin was a member of this list when he wrote his Uplift series)?-)

> Playouts can't be "unbiased." Picking a move with uniform probability is a
> bias too, and not a good one.

I was thinking of "unbiased" in the sense of not excluding valid moves (no
blind spots), but I guess after one accounts for trying to search an effectively
infinite space with finite resources, the difference between "not there" and
"not likely" isn't all that big. And since one is sure to miss some important
moves, one has to try and tune the heuristics so that the available resources
are spent in the most likely areas of the search space.

It all comes down to experiments, probably, but the experimentation tends
to be limited to self-play, play against a couple of freely available opponents,
and low-frequency tournaments. Not exactly ideal for exposing unfortunate
biases in experimental heuristics, no matter how much math one throws at it.

I assume everyone is aware of how computer go is self-similar, the
research/development following the same patterns as the go engines?

Where there used to be deep study followed by few deliberate moves or
engine releases, there is now a need for frequent tournaments as playouts
for testing many possible playout heuristics:-)

> Computer go papers here: http://www.citeulike.org/group/5884/library

Thanks! Together with the list archives, I guess I'll not run out of reading
material for a while. I've already found some papers relevant to my questions,
but having the right search terms and reference lists is just as important.

What I haven't yet seen is a FAQ for this list (search terms, topics,
terminology, links, and the like). Is there one?

Thanks for your comments, keywords, and library url,
Claus






[computer-go] symmetry optimizations vs statistics?

2008-09-30 Thread Claus Reinke
Since statistics play such a vital role in modern Go engines, is there a danger
that good old-fashioned low-level optimizations interfere with those statistics?

Things like not re-evaluating symmetric positions (board mirroring/rotation,
move transposition, etc.) multiple times could interact with statistics based
on the number of evaluations.

>From short notes in papers, it appears as if, on visiting a position that is
symmetric to an earlier one, programs simply link to the earlier evaluation.

But that wouldn't account for the fact that there are now several ways to
reach the same result. For instance, from an empty board, there is only
one way to put a stone in the center, four ways to put a stone on a diagonal
(or a center line) off center, and eight ways to put a stone off all the
symmetry lines.
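
A small way to check such counts (Python, purely illustrative): apply all
eight board symmetries to a point and count the distinct images.

def orbit_size(x, y, n=19):
    # four rotations, each with an optional mirror, give the 8 symmetries
    m = n - 1
    images = set()
    for px, py in [(x, y), (y, x)]:        # optional mirror first
        for _ in range(4):
            px, py = py, m - px            # rotate 90 degrees
            images.add((px, py))
    return len(images)

assert orbit_size(9, 9) == 1   # center (tengen)
assert orbit_size(3, 3) == 4   # on a diagonal, off center
assert orbit_size(2, 3) == 8   # off all symmetry lines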

So, if having a stone in any of these regions sounds equally important,
counting symmetries would make the rare move more urgent. On the
other hand, in longer move sequences, it is safer to rely on results that
can be reached in multiple ways (a generalization of miai?), so counting
symmetries might bias play towards those moves.

Perhaps this is also among the well-known material (or something
that experienced authors just get right without thinking or writing about
it), but a preliminary search of the archives hasn't brought up anything
relevant. How do current programs account for this? Has any of them
ever been caught unawares by such effects, positive or negative?-)

Claus






Re: [computer-go] symmetry optimizations vs statistics?

2008-09-30 Thread Claus Reinke
> Since statistics play such a vital role in modern Go engines, is there a
> danger that good old-fashioned low-level optimizations interfere with
> those statistics?
>
> Things like not re-evaluating symmetric positions (board mirroring/rotation,
> move transposition, etc.) multiple times could interact with statistics based
> on the number of evaluations.

It seems that the keyword "transpositions" gives more useful search results
in the list archives than just "symmetries". There seem to be several issues,
including:

- if one uses hashing into a transposition table, one will simply be able
to find and reuse results from earlier visits to similar positions via
different paths; that might interfere with results that only exist as sums
(eg, ownership maps/territory heuristic) stored higher up in the tree
(boards resulting from playouts are added into a single higher-up array,
then thrown away);

the solution here is probably to keep more local data, but that could
be expensive..

- if one uses transpositions to find all paths into the current position, one
will be able to sample multiple simulations in one go; that might interfere
with biases and confidence (giving more samples to positions that can
be reached in multiple ways);

the solution here is probably to be careful about the statistics, whatever
that may mean;-)
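
For concreteness, a minimal sketch of the first mechanism (Python; side
to move, ko state, etc. are omitted, so this is a toy, not any engine's
actual code): a Zobrist hash lets positions reached via different paths
share one statistics entry.

import random

random.seed(0)
ZOBRIST = {(c, p): random.getrandbits(64)
           for c in 'BW' for p in range(19 * 19)}

def position_hash(stones):
    # stones: dict point-index -> 'B'/'W'; XOR is order-independent,
    # so transpositions of the same position hash identically
    h = 0
    for p, c in stones.items():
        h ^= ZOBRIST[(c, p)]
    return h

table = {}                              # hash -> [wins, visits]

def record_playout(stones, won):
    entry = table.setdefault(position_hash(stones), [0, 0])
    entry[0] += won
    entry[1] += 1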

What I was wondering about was to what extent and with what results
such interactions between statistical sampling and optimizing transpositions
have been investigated. Is that any clearer?

Claus








[computer-go] [EMAIL PROTECTED]

2008-10-02 Thread Claus Reinke
> But for grids (instead of clusters), the communication will become much much
> bigger - I'd like to study that carefully one day, I have no clear idea of 
> what is possible.
>
> A trouble is that on grids (at least the ones I've seen) there are often
> faults. We'll have to be fault tolerant I guess.

I've been somewhat on the fence about grids, but that is mainly because
computer scientists are at least as intelligent as Monte-Carlo playouts:
if adding "grid" to a project increases its chances of getting funding,
then "grid" will be added to the project, no matter how; and for a while,
that was exactly what happened - "grid this" and "grid that" everywhere..

However, while browsing papers like

"Toward Third Generation Internet Desktop Grids"
http://hal.inria.fr/inria-00148923/en/

isn't quite as fascinating as reading about the newest Go programming
techniques, it doesn't sound entirely hopeless, either. And isn't INRIA
nearby for some of you? If not, I'm sure that some funding agency in
your neighbourhood decided to steer research toward grid research
long ago;-)

Perhaps some of you Go engine authors could give some of those
Grid researchers a challenge - they have the hardware, the software,
and the need to research grid issues to justify their funding.

And you have the application that might be able to use all that. Unless
they think that their tools can't handle your requirements, they should be
quite interested in participating in your publicity ("beyond blue gene",
"the final man-machine intelligence challenge", and all that;-), not to
mention demonstrating that their tools can handle well-known-to-
be-tough problems. The genuine researchers among them will even
want to figure out how to address the research issues you raise.

Since netlag has been an issue even for slow human games over
internet servers, I'm sure it is going to be an issue here. But that
doesn't mean it is going to be insurmountable (as long as the
whole grid doesn't sit behind the same microwave transmitter;-).

Claus







Re: [computer-go] [EMAIL PROTECTED]

2008-10-02 Thread Claus Reinke
>> Now, for the technical matter: Could somebody please point me to a quick
>> rundown of how modern Go engines exactly utilize multicore environments
>> and how the workload is segregated and distributed? I don't have any
>> significant knowledge on that, so any pointers would be much appreciated.

> Here's a start:
> http://hal.inria.fr/docs/00/28/78/67/PDF/icin08.pdf
> Gelly et al, The Parallelization of Monte-Carlo Planning

A couple more references here:

Group: computer-go - with tag parallelization
http://www.citeulike.org/group/5884/tag/parallelization

Claus






Re: [computer-go] Using playouts for more than position evaluation?

2008-10-03 Thread Claus Reinke
>>> For instance, if an intersection belongs to the same colour in all
>>> playouts, chances are that it is fairly secure (that doesn't mean one
>>> shouldn't play there; sacrifices there may still have an impact on
>>> other intersections).

Ok, that one was well known (ownership maps, territory heuristic).

>> Or, if an intersection is black in all playouts won by black, and white
>> in all playouts won by white, chances are that it is fairly important to
>> play there (since playouts are random, there is no guarantee, but
>> emphasizing such intersections, and their ordering, in the top-level
>> tree search seems profitable).

That one was related to some existing work, and though perhaps not
exactly the same, already under research:

> We (the Orego team) have done some work along these lines this summer.
> We're working on a paper.

There is also this paper, right on topic:

Combining tactical search and Monte-Carlo in the game of Go
Tristan Cazenave, Bernard Helmstetter
http://www.citeulike.org/group/5884/article/2990531

That still seems to leave some other possibilities, though.

For instance, what about intersections that are 50/50, independent
of who plays them first? If they exist, it would seem that their fate is
determined by plays elsewhere; in particular, it doesn't seem like a
good idea to play there, and if one already has a stone there,
one might have to do something about it. Also, in a balanced game,
one would like to see some balance of undecided intersections.

Or what about keeping the difference between statistical evaluations
for a few moves? Our own moves are driven by having positive effects
on those values, but if an opponent's move changes only one of the
values (without offering compensation in another value), one would
do well to answer it, and the particular values affected (does it unsettle
a stable group, or influence previously neutral territory, or ..) might give
an indication of the nature of the imbalance, and possible ways to
counter it (sufficiently to restore the balance).

Do these make sense? And are there other useful items one might be
able to extract from the playout statistics, online, during the game (as
opposed to offline learning, from self-play)?

Of course, some of this could be expected to emerge from playouts
without additional intervention, eg, if one always tries to optimize the
statistical results, then good answers to imbalances introduced by
opponent play should emerge. But given the size of the search space,
having more concrete handles on information to help directing the
search might still be useful.

Claus






[computer-go] (early) termination of random playouts?

2008-10-09 Thread Claus Reinke
Hi again, with yet another question:-)

Could someone please summarize the state of the art wrt the various ways
of limiting random playouts, and their consequences?

There are several variations on the "don't fill your own eyes" (dfyoe) approach,
but the way this approach is often described tends to confuse me (eg, when
someone seems to claim that there is exactly one correct way of implementing
this, or when someone seems to claim that this guarantees termination, or when
the resulting biases are not even mentioned, let alone discussed or compared
against biases in alternative implementations, etc.).

I'll try to summarize my current understanding, in the hope that others will
fill in the missing pieces or point out misunderstandings:

- the only thing that guarantees termination of random playouts (with or
without dfyoe) is the positional superko rule: no whole-board repetitions
allowed. Waiting for this to kick in without taking other measures is not
an option: it takes too long and the results don't tell us much about the
value of the initial position.

- dfyoe variations share some motivations:
- reduce the likelihood of unnaturally long playouts (but not to zero)
- try to avoid disastrous endgame-situation mistakes that random playouts
are otherwise prone to make (and that eliminate the value of playouts)
- try to be simple (efficiently implementable), with low (but non-zero)
probability of introducing errors (filling in when not filling in would be
better, or vice versa)
- aim to steer playouts to easily counted terminal positions that bear a
quantifiable relationship to the estimated value of the initial position
(perhaps a more suggestive name would be "random legal fill-ins"?)

- some variations of dfyoe have been reported (are there others?):

[Bruegmann; Monte Carlo Go] (Gobble):
The computer only passes if either no legal move is available or all
legal moves reduce the eye space of one of its groups from two to
one. The (over-) simplification lies in the definition of an eye, which
is defined to be a field whose direct neighbors are all of the same
color and whose diagonal neighbors contain no more than 1 stone
of the opposite color (0 for border and corner fields). The reader
may convince himself that this definition of an eye is correct for
most situations, excluding sekis.
.. the rule is necessary for technical reasons, and even though (if
passing was implemented) the program could figure out the rule by
itself, that would be very inefficient ..

[Bouzy,Helmstetter; Monte-Carlo Go Developments] (Olga/Oleg)
Without this rule, the random player would never make living groups
and the games would never end. There are different ways to define
"eyes" as precisely as possible with domain-dependent knowledge
such as Fotland (2002) and Chen and Chen (1999). Our definitions
are designed to be integrated into a random Go-playing program;
they are simple and fast but not correct in some cases.
In OLGA, an eye is an empty intersection surrounded by stones of
one colour with two liberties or more.
In OLEG, an eye is an empty intersection surrounded by stones
belonging to the same string.
The upside of both definitions is the speed of the programs.
OLEG's definition is simpler and faster than OLGA's. Both
approaches have the downside of being wrong in some cases.
OLEG's definition is very restrictive: OLEG's eyes are actual
true eyes but it may fill an actual eye surrounded by more than
one string. Besides, OLGA has a fuzzy and optimistic definition:
it never fills an actual eye but, to connect its stones surrounding
an OLGA's eye, OLGA always expects one adjacent stone to
be put into atari.
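
Transcribed into code, the Gobble definition might look like this sketch
(Python, purely illustrative, not taken from any actual engine):

def is_gobble_eye(board, x, y, colour, size=19):
    # board: dict (x, y) -> 'B'/'W'; empty points are simply absent
    on_board = lambda p: 0 <= p[0] < size and 0 <= p[1] < size
    neighbours = [p for p in [(x-1, y), (x+1, y), (x, y-1), (x, y+1)]
                  if on_board(p)]
    diagonals = [p for p in [(x-1, y-1), (x-1, y+1), (x+1, y-1), (x+1, y+1)]
                 if on_board(p)]
    # all direct neighbours must be own stones
    if any(board.get(p) != colour for p in neighbours):
        return False
    # at most one enemy diagonal stone; none on the edge or in the corner
    enemies = sum(1 for p in diagonals if board.get(p) not in (colour, None))
    return enemies <= (1 if len(diagonals) == 4 else 0)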

Gobble's definition seems to be the one used in recent discussions here.
Has anyone compared the different eye definitions, without other changes?
Or looked for systematic (exploitable) errors (erroneous fill-ins, erroneous
inability to fill in)? What about the remaining unrealistic and overlong games
(filling large parts of the board without making eyes, then killing all and
repeat with small variations)? What about that claim that "the program
could figure out the rule by itself"?

Comments/pointers appreciated (especially if there was a discussion
that limited the options to Gobble's definition, as recent threads seemed
to suggest).

Claus





Re: [computer-go] AMAF Scalability study + Responses to previous

2008-10-09 Thread Claus Reinke
>>What about that claim that "the program could figure out the rule by itself"?
>I made experiments with no-go-knowledge. (no eye knowledge).

Any random player that accurately avoids suicide seems to need an awful lot
of Go knowledge implicitly:

- liberties (or at least pseudo-liberties, since we only care about the step to zero)
- (immediately) connected strings of stones of the same colour
- strings with zero liberties are dead, and are removed
- first remove dead opponent strings, then check for suicide, which is ruled out

In particular, moves that can't be played because they would be suicide
are in opponent eye positions, so the bot "knows" about these in some form,
and that knowledge influences possible moves, even if the bot might not be
aware of that knowledge, nor of its own eyes;-)
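
A minimal sketch of that implicit machinery (Python; toy representation,
a dict from point to colour, with occupied-point and ko checks omitted):

def string_and_liberties(board, start, size=19):
    colour = board[start]
    string, liberties, frontier = {start}, set(), [start]
    while frontier:                      # flood-fill the string
        x, y = frontier.pop()
        for p in [(x-1, y), (x+1, y), (x, y-1), (x, y+1)]:
            if not (0 <= p[0] < size and 0 <= p[1] < size):
                continue
            if p not in board:
                liberties.add(p)
            elif board[p] == colour and p not in string:
                string.add(p)
                frontier.append(p)
    return string, liberties

def play_if_legal(board, point, colour):
    board[point] = colour
    enemy = 'W' if colour == 'B' else 'B'
    x, y = point
    for p in [(x-1, y), (x+1, y), (x, y-1), (x, y+1)]:
        if board.get(p) == enemy:        # dead opponent strings go first...
            string, libs = string_and_liberties(board, p)
            if not libs:
                for q in string:
                    del board[q]
    _, libs = string_and_liberties(board, point)
    if not libs:                         # ...then suicide is rejected
        del board[point]
        return False
    return True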

I've sometimes wondered whether that biases light playouts for white.
Since light playouts do no planning, black has little advantage from playing
first, and white does get slightly better use out of the eye knowledge implicit
in the no-suicide rule:

For any pair of consecutive black-then-white moves, if the white move
forms an eye, the black move may well be caught in it, whereas if the
black move forms an eye, the white move will never be caught in it. For
an equal number of moves, white strings seem safer than black ones.

>I think what i did, was just to play a random number of moves, and scoring like
>if the game had terminated. Using that i remember that i could get a bot that
>was able not to fill its own eyes :) That's about as strong as it got though.

So you interpreted "passing" as not continuing the game, introducing an
arbitrary termination criterion, and found that games without eye-filling
resulted in better scores. That makes sense.

But it raises another question, about evaluating a playout with arbitrary
length limit: Isn't the value fluctuating wildly with the length?

Consider a playout that stops just before black fills one of its two eyes
vs the same playout but stopped just after black filled one and white filled
the other.

And if the dead string was big enough (random moves cluster together
without any notion of bad shape), the same game can happen inside the
newly cleared area if enough further moves are played.

Or consider a playout where black randomly throws stones into a white
area (surrounded by white stones, but without explicit two-eye separation):
the black score for that is going down with every black move, until that
final move that kills the surrounding white string, after which the black
score suddenly goes up because those black invaders are no longer dead.

One could perhaps record upper and lower bounds on such fluctuating
evaluations, and stop the playout if its evaluation converges on a narrow
enough range? And one would like to avoid wins that rely on the
evaluation going downhill for a long time before suddenly going up.
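
Such a stopping rule might look like this sketch (Python; step_and_score
is a hypothetical helper that plays one random move and returns the
current whole-board score estimate):

from collections import deque

def play_until_stable(step_and_score, window=20, tolerance=2.0,
                      max_moves=600):
    recent = deque(maxlen=window)
    for _ in range(max_moves):
        recent.append(step_and_score())
        # stop once the evaluation has stayed in a narrow band
        if len(recent) == window and max(recent) - min(recent) <= tolerance:
            break
    return recent[-1]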

All of which of course assumes an external scoring function that doesn't
need the playouts to result in simple positions for scoring to work. But
isn't such a standalone scoring function exactly what the random playouts
are meant to replace?

Claus





Re: [computer-go] (early) termination of random playouts?

2008-10-09 Thread Claus Reinke
>> Anything can exist in a random game :-)
> Occur yes, repeat forever requires very special situations.

That makes long repeats unlikely in any individual run, and infinite
runs infinitely unlikely, unless we run into one of those very special
situations. But how special do these actually need to be?

One popular way of scaling up Monte-Carlo based engines seems to
add more processing resources to explore more of the search space.
Now, if you have a loop that requires n specific steps with at most m
loop-breakers at each step, it isn't one of those very special cases that
would force a bot into the loop. But if your engine explores more than
(m+1)^n alternative runs, one of its simulations is going to run into the
loop, even if that isn't likely for each individual run, right? And if the
class of dangerous situations isn't quite as narrow as one might think
at first, then random simulations overall are more likely to run into
one of them, even if we don't use such a "beast" to start the simulation.

But even short of infinite loops, very few of those finite repetitions are
likely to occur in real games - some of them have and do, but most are
a sign of unrealistic simulations (unsupported claim;-), which are likely
to mess with the quality of evaluation results. Since finite bounds can
take care of infinite loops, such degradations in quality are more worrying,
at least to me. Especially if their origins are hidden somewhere inside
statistics gathered from a huge invisible (unrecorded for efficiency
reasons) set of random simulation runs (tuned via experiments).

Perhaps one could record a histogram of simulation lengths at least, and
stop worrying unless there happen to be lots of entries in the >361 area.
Otherwise, results might be influenced by whether the artificial bound on
simulation length happens to cut off many or few overlength runs.
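
That bookkeeping is cheap; a sketch (Python; step is a hypothetical
helper that plays one random move and returns the new whole-board hash,
or None for a pass-out), with a positional-superko guard so that
repetition, rather than an arbitrary cap, ends a run:

from collections import Counter

length_histogram = Counter()

def run_playout(step, initial_hash):
    seen = {initial_hash}
    length = 0
    while True:
        h = step()
        if h is None or h in seen:   # game over, or whole-board repetition
            break
        seen.add(h)
        length += 1
    length_histogram[length] += 1

# after many runs, worry if a lot of mass sits beyond 361 moves:
# overlong = sum(c for l, c in length_histogram.items() if l > 361)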

Claus






[computer-go] pseudo eye variations for playouts

2008-10-10 Thread Claus Reinke
Thanks everyone for the answers regarding playout terminations. I still
have my suspicions regarding how artificial game length bounds affect
the position evaluation (I'd expect the values to fluctuate with length,
so arbitrary bounds would result in arbitrary differences).

For the moment, however, I'd like to focus on the variations wrt
pseudo-eyes (positions not to be filled during playouts). All of these
are known to be wrong in some cases, the main argument in favour of
pseudo-eyes being efficiency.

Since ruling out fill-ins introduces not only biases, but actual blind spots
into the playouts, something even heavy playouts otherwise try to avoid
(they prioritize "good" moves instead of eliminating "bad" moves) it is
important that the definition of pseudo-eyes errs on the safe side ("safe"
here meaning random play - any move is possible): they may permit
erroneous fill-ins, but must not rule out necessary fill-ins (only definitely
bad fill-ins may be ruled out).

Within that constraint, the goal is to rule out as many bad fill-ins as
possible without ruling out any good fill-ins. And all three variations
listed earlier (Gobble, Olga, Oleg)

http://computer-go.org/pipermail/computer-go/2008-October/016549.html

have their issues with that. For instance, consider the following position:

(
;
FF[1]
GM[1]
SZ[19]
AP[Jago:Version 5.0]
AB[ji][kj][jk][ij][jl][hj][jh][lj][jg][kf][ke][le][me][ne][of][og][oh][oi][ni][mj][nf][kg][nj][jm][jn][jo][jp][ip][hp][gp][fp][fo][fn][fm][fl][fk][fj][gj][gi][gh][hg][ig][ok][gf][pl][fe]
AW[ki][kh][li][lg][mh][lf][mf][ng][nh][kk][lk][kl][mi][mk][nk][ik][hk][gk][gl][gm][gn][il][im][in][io][go][hm][ho][jd][kd][ld][md][nd][od][pe][pf][ph][pg][pi][pj][ei][ej][ek][el][em][en][eo][ep][eq][fq][gq][hq][iq][jq][kq][ko][kp][kn][km][oj][ii][fi][fh][fg][gg][hf][if][jf][je][ih][oe]
C[black to move.

Gobble: 'a' is not an eye => fill it and die

Olga: 'a' is an eye, but would cease to be if 'b' was white

Oleg: 'a' is not an eye, but would be if 'b' was black]
L[jj][hh][nk][oj]
GN[playout-eyes]
)

If I set this up correctly, the black center group is "unconditionally" alive,
even if it was white's turn, while playouts with Gobble-style pseudo-eyes
will rate it as unconditionally dead (and hence the white strings 'c' and 'd'
as unconditionally alive, independent of conditions on the outside). Playouts
with Olga-style pseudo-eyes would fare better, but would be fooled into
considering fill-in at 'a' after white 'b'. Playouts with Oleg-style pseudo-eyes
have a chance of finding the black group alive, in case of black 'b' first.

Did I set this up right? And why does no-one else seem worried about it?-)

Claus

PS In case the sgf-labelling isn't standard, 'a' is K10/center, 'b' is H12,
'c' is O9, and 'd' is P10.






Re: [computer-go] pseudo eye variations for playouts

2008-10-11 Thread Claus Reinke
> It is important to know about potential blind spots introduced by
> pseudo-eye variations, or any other rules.
>
> Borrowing from Eric S Raymond, the more eyes inspecting the ideas, the
> shallower the bugs.

Thanks! Here goes another attempt, this time trying to construct an
example where pseudo-eyes prevent a necessary fill-in ('a' is J15,
'b' is L17, 'c' is F12, and 'd' is K16).

Claus

(
;
FF[4]
GM[1]
SZ[19]
AP[Jago:Version 5.0]
AB[hd][id][je][jf][if][hf][he][jb][ld][le][lf][lg][kh][jh][ih][hh][gh]
[fg][ff][fe][fd][fc][gb][hb][ib][kb][lc][fb][lh]
AW[gc][hc][ic][jc][kd][ke][kf][jg][ig][hg][gf][ge][gd][kg][gg][ga][ha]
[ia][ja][ka][la][lb][mc][mb][md][me][mf][mg][ki][ji][ii][hi][gi][fi]
[eh][eg][ef][ee][ed][ec][eb][ea][fa][mh][li]
C[black to move.

Gobble, Olga, and Oleg agree:
'a' is an eye, cannot be filled.

Black 'b' or 'c' would put black's outer group in atari, accelerating its
death.

A random opponent might fail to respond to black 'b' with white 'c',
permitting black 'd' to kill white, and black to live. If there are no
remaining outside moves, or the opponent is not quite random (heavy
playouts with capture preference, say), black 'b' or 'c' amounts to suicide.

The only other local move is black 'd', which allows and forces white 'a'
to live, killing black.

If filling 'a' was not ruled out, playing there would kill white, allowing
black to live.

The pseudo-eye definitions blind black to the winning shaped sacrifice
of what can never become part of a two-eye group, turning a likely live
(after selecting the best first move) black group into a likely dead one
(with survival depending on context and opponent weakness).]
LB[ie:a][kc:b][fh:c][jd:d]
GN[playout-eyes2]
)





Re: [computer-go] pseudo eye variations for playouts

2008-10-11 Thread Claus Reinke
> There is a de facto standard light playout policy (algorithm).

I have a rough idea of what that might be. And I suspect that keeping this
"de facto standard" implicit has been hiding some actual differences in what
different people think that standard is. Some of my questions arise from trying
to pin down where and why different authors have different ideas of what "the"
standard is. If there has been some explicit standardisation since those papers
were published, I'd be interested in a pointer to that standard and its rationale.

> If you want to adhere to this standard, then by definition, there are correct
> and incorrect ways to do so. If you just want to design a better playout
> policy, naturally anything is fair game.

Neither, actually!-) If there was a single standard, then implementing that
first would be useful before starting on variations. But only if said standard
was arrived at by community optimization, not by committee or by failure
to examine hypotheses for their validity ("it works in most cases" is no
proof that it works, let alone that there isn't a more complete way to do
the same thing). The more everyone always implements the same thing
because everyone does it that way, the less that way is guaranteed to
be tested for flaws (local extrema, exploration vs exploitation?-).

Mostly, I like to understand what I'm doing and why, instead of just
implementing and optimizing what has been handed down. So I take the
slow, scenic route to writing my own implementation, using coding only
to test and improve my understanding of what I've read while enjoying
the pleasure of finding things out. Usually, that either leads to requests for
clarification/suggestions for possible improvements or to being more
comfortable with "the" standard (because I've tried to shoot it down
and failed - "coming to grips with it";-).

After all that initial learning has settled down, and the basis is solid,
there'll come a time for testing new ideas.

> But how are you going to measure whether it's really better?

By trying to understand the issues in enough depth that I can design
concrete experiments that will test concrete hypotheses?-) And if the
theory is good enough, experiments might not even be necessary in
all cases (or rather, a single proof might cover an infinity of experiments),
though I expect experiments to be helpful in most cases.

> My advice is to keep to this standard for a little while longer, write
> the rest of your engine, run it on CGOS where you can compare it
> directly with programs like myCtestxxx (maintained for that purpose
> by generous individuals) and verify that the other parts work. Then
> start improving the playout policy.

I fully expect to try my code against others at some point (though I
don't have a permanent internet connection atm, so couldn't let anything
run via a server for long). But that isn't urgent yet. What is urgent is not
to build in limiting assumptions (in either code or understanding) that will
be difficult to back out of once there is more code.

Don't get me wrong: I have switched from pure hypotheses to working
on code, though only on the side, and I think the various community
resources (servers, wikis, software, list archives, papers, bibliographies,
..) are great. A FAQ with urls, archive pointers, search keywords, and
glossary would help newcomers (with pointers to it in the list headers
and mailman listinfo page). If the list had a wiki, newcomers like myself
could simply add what they find to a FAQ page on that wiki before we
forget how difficult it was to find in the first place.

Claus





Re: [computer-go] pseudo eye variations for playouts

2008-10-11 Thread Claus Reinke
Don's draft standard reminded me of the corner cases. So here's
an even simpler example, this time trying to show that dead invading
stones can poison playout analysis depending on which definition of
pseudo-eyes is used ('a' is A1, 'b' is C1).

That makes three attempted examples so far (are the examples
valid for what they are trying to show?), trying to show:

- the "standard" Gobble pseudo-eye can both allow real eyes
to be filled and prevent sacrifices involving filling false eyes
(in all cases so far, resulting in own groups counting as more
likely dead, and opponent groups counting as more likely live;
but since the playouts are symmetric, that implies miscounting
both ways)
- the local pattern approach of Gobble can be poisoned by
dead opponent neighbours

I understand that some engines use pseudo-liberties, or may
not have connection information easily accessible all the time,
but for those who do maintain proper liberty counts: are there
any examples where Olga-style pseudo-eyes are not preferable
to (at least as good as) Gobble-style ones?

Claus

(
;
FF[4]
GM[1]
SZ[19]
AP[Jago:Version 5.0]
AB[ar][bs][bq][cq][dr][ds][dq][aq]
AW[br][cr][ap][bp][cp][dp][eq][er][es]
LB[as:a][cs:b]
C[The black group is "unconditionally" alive.

Gobble: 'a' is not an eye, group is unsettled (50% chance: black 'a' dies, 
black 'b' lives).

Olga: 'a' is an eye, black 'b' the only possible local move: group is alive.

Oleg: 'a' is not an eye, group is unsettled (50% chance: black 'a' dies, black 
'b' lives).]
GN[playout-eyes3]
)






Re: [computer-go] pseudo eye variations for playouts

2008-10-11 Thread Claus Reinke
> Thanks! Here goes another attempt, this time trying to construct an
> example where pseudo-eyes prevent a necessary fill-in ('a' is J15,
> 'b' is L17, 'c' is F12, and 'd' is K16).
..
> GN[playout-eyes2]

Sorry about that one! I must have been thinking of some form of
Carpenter's square or square nine in the corner, but quite apart
from somehow managing to set up the whole thing away from the
corner, that shape is far too complex for such a simple argument
to succeed.

Please don't waste your time on playout-eyes2,-(
Claus





Re: [computer-go] Go/Games with modified scores

2008-10-14 Thread Claus Reinke
> .. I think people (me included) feel that replacing a whole swath
> of relevant information by a single number points to potentially some
> serious inefficiency and loss of information. The fact that nobody
> has found how to make use of the excess of information is no proof of
> course that it can't be done. I think it's a very valid question at
> the very least and a bit premature to call it a wrong premise. MCTS
> programs are still in their early days of development and it's quite
> possible some good improvements can be made by using more information
> than the simple win/loss ratio.

First, let me state that I agree: there should be ways to get more. In
fact, my impression is that there have been successes in making better
use of simulation results (ownership maps, territory heuristic). It is just
that they use even more detail (the final board rather than the final score),
which makes it easier to extract information that is safe only over a large
number of runs. Looking only at the summed score can hide the details that
lead to statistically relevant information.

One might look at the distribution of sum scores, though: for instance,
if scores are either drastic or close, with little in-between, one might
assume that there is a pivot move (combination) to be searched for
on which the game outcome hinges in the current position (eg, a large
group in danger).
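
Such a check might look like this sketch (Python; the thresholds are
invented for illustration):

def looks_bimodal(margins, close=10, drastic=40):
    # margins: final score differences from many playouts
    n = len(margins)
    near = sum(1 for m in margins if abs(m) <= close)
    far = sum(1 for m in margins if abs(m) >= drastic)
    # drastic or close, with little in between: suspect a pivot move
    return near > 0.2 * n and far > 0.2 * n and (n - near - far) < 0.1 * n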

Second, having now looked at some more random "light playouts"
(just instrument your engine to output sgf before starting the next run),
I feel that the name is highly misleading. These simulation runs have
very little in common with actual play, eg, in a 19x19 run from an
empty board, one might see an opening period, with 200 moves of
randomly placed stones, followed by a middle period, with 100 or
so moves occasionally making a firm connection or a true single-space
eye, followed by an endgame with 100-200 moves exploring possible
consequences of those little bits of random mess that cannot be ruined
by random play.

You can perhaps predict which side has the more-likely-connected
strings and the more-likely-unkillable strings at 250 or 300 moves,
and hence, which side is likely to win the "playout" in the end, but
again those properties have very little in common with good shape
in actual play.

Of course, the outcome is supposed to be nearly evenly random from
an empty board, but if you look at move 300 and try to compare the
strings and eyes that almost make it vs those that do, it drives home
the message that the individual run is nearly meaningless, and the
simulation is blind to many "obvious" features of a game position.
Upping the number of simulations escalates some of those "nearly"s
to significance, but care is needed to decide between significant result
and significant error, and how to interpret either.

So I now prefer to call these simulation runs "legal fill-ins" and I think
it would be worthwhile trying to improve our knowledge on what kind
of information can be safely extracted from (sets of) them.

Even the one-bit win/loss information appears to depend on (a) undecided
areas being filled in such a way that they are allocated evenly to both sides
and (b) decided areas and influence on both sides having an equal share
of ruinable and non-ruinable aspects. This way, a Murphy-style (whatever
can go wrong, will) random fill-in will neither see unrealistic advantages
emerge in undecided areas nor reduce one side's advantages more strongly
than the other one's, so even if the random scores bear little relation to the
realistic scores, the win/loss bit would still be usable, at least over large
enough samples.

Similarly, if very nearly all random fill-ins in a large enough sample agree
on the territorial status of an intersection, that might seem another safe bit
of information to extract.

But that may not be entirely accurate, and "it works most of the time"
is quite different from "statistical analysis shows an error rate of E% for
information X after N simulation runs" (*). Even the latter leaves enough
room both for interpretation and for significant surprises in actual play.

Below is another of those odd examples that you might want to run through
your Monte-Carlo evaluation engine (mentally or computer-based:-). I'd
be interested to see the results, and how they vary with number of simulations.

Assuming no komi, Chinese count is 40/40, so whoever fills the center
wins. In a simulation-based evaluation, random invasions are impossible
in the black side, but only highly unlikely in the white side. Perhaps someone
else can construct an example where the difference is actually significant?
If anything, white has been more efficient in building (four moves less, as
apparent in Japanese count), so one could say that simulation-based
scoring based on naive play is slightly biased toward inefficient play,
because it sees most clearly those features that cannot be ruined by
Murphy-style play.

Simulation-based evaluation does not see the same board po

Re: [computer-go] Go/Games with modified scores

2008-10-14 Thread Claus Reinke
> Just out of curiosity, what did you expect from the playouts?

Nothing in particular, really; at this point I'm just trying to build up an
intuition about what I can or cannot expect from them. At first, I thought
light playouts
would not fully explore, but at least randomly cover all possible games, so we
wouldn't get good move sequences directly, but lots of useful statistics to work
with; while heavy playouts would try to mimick more realistic games, so the
statistics would be biased, but there'd be a better chance of getting good moves
directly. That turned out to be somewhat naive on all counts, as the answers
to my earlier messages showed!-)

There is no chance of covering all possible games, so uniformly random
play (light playouts) definitely ignores some, and if realistic games are
unlikely, and good move sequences even more so, then they get the least
coverage.
So, in the presence of limited resources, "no bias" is a bias on its own. And
the way eyes and other rules knowledge are encoded for efficiency reasons
introduces other biases as well.

Heavy playouts never drop possible sequences, they just try to focus the
available resources away from sequences that appear less promising. That
can have unpredictable effects, so the "unpromising" sequences are kept
in the pool, at low priority, and the resulting focussed playouts need heavy
testing to get some measure of confidence in any improvements.
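
A sketch of that focus-instead-of-drop selection (Python; the weight
function is an invented stand-in for capture or pattern heuristics):

import random

def pick_move(legal_moves, weight):
    # probability proportional to weight: unpromising moves stay in
    # the pool at low priority instead of being excluded outright
    weights = [weight(m) for m in legal_moves]
    return random.choices(legal_moves, weights=weights, k=1)[0]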

In sum, there doesn't seem to be a good basis for understanding playouts
and their optimisation, other than by trial and error. Those who've been
through some cycles of trial-and-error probably have at least a vague
intuition of what works and what doesn't (or didn't when they last tried),
but there is depressingly little theory (and many of the references are
infuriatingly vague, in the style of "there was a lot of discussion about
that"; great! not! where, when, what keywords or references, please?-).

I'm trying to build some of that intuition in advance, both to be able to
focus my trial-and-error phase and to have some theory to complement
the experiments.

At the moment, the most puzzling aspect is that the uncertainties aren't
bounded in any direction: the pseudo-eyes variations aren't subsets of
each other (that can probably be fixed, and would result in a single best
variation, at least for the three ones I've mentioned);

the playout evaluation, while somewhat biased toward not seeing more
complex properties of board positions, cannot even be said to give a
lower bound on the score, because it will underestimate both the
strength of positions and the strength of possible attacks against them
(though knowing that much will be of some help at least).

Currently, it seems as if even many of the tricks that seem to work
wonderfully well don't really have a solid basis, so any tournament
could throw up a game that puts the whole thing into doubt, by
showing a hole in the test coverage big enough to drive a truck
through (once pros and others know what to look for). Or not.

But if a (very) occasional, weak player like myself can show up
some of the biases in concrete board positions, then a real player,
not to mention a professional, should be able to spot and exploit
them in real games.

Claus





Re: [computer-go] programming (languages) light-simulation contest

2008-10-14 Thread Claus Reinke
>As for me, i'm really NOT interested in knowing "what language is good for go
>programming". That's simply not a question i can ask myself, nor anyone else.
>This question doesn't make any sense for me. Still, if someone can get the
>"standard light playout" right in less than 10 code lines, and they are very
>understandable lines, I would be very happy to see it. But it would never
>mean for me that "this language is BEST", even if the performances are
>optimal there. I think 90% of the "this language is the best" debate gets its
>root in some affective part of the people engaged with it.

As someone interested in programming languages, I cannot resist commenting.
True, language advocacy tends to be boring at best, and, in its extreme forms,
tends to be pursued not by the language designers themselves. If they have any
experience at all, they are quite willing to let their language speak for itself,
and are quite open-minded, both toward other languages having good points and toward
their own language having weak points. Converts turned evangelists are often
as embarrassing to "their" community as they are to their target audience.

You might get a little more out of the question if you change your perspective
a little: ask "what language is good for Go programming?" not with a view on
what general- or specific-purpose languages are out there, but with a view on
how you would like to express your thoughts about Go.

In this particular case, representations seem to play a fundamental role: I
didn't want to think about details or optimizations too early, but I found it
impossible to think about the algorithms without choosing suitable data
representations (too many details to keep in mind if the representations
were unsuitable); once I got the "right" representations, the algorithms
were straightforward, and whenever added algorithms started to get
complicated, it indicated a need to switch to a better representation.

So, once you've got your representations (sets/bitmaps, arrays/hashtables,
etc.) implemented in libraries, the programming language itself may not
matter much any more (recursion simplifies things, but whether one
recursively collects sets of liberties in imperative, functional, logic, or
object-oriented code may not make much difference - even
assembler might be manageable, given such libraries, just that it might
doom your code when you move to a different machine in future).
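
One concrete instance of that point, as a sketch (Python): with a
union-find representation for strings, "merge strings on connection"
becomes nearly trivial.

class Strings:
    def __init__(self):
        self.parent = {}                 # point -> representative point

    def find(self, p):
        while self.parent[p] != p:
            self.parent[p] = self.parent[self.parent[p]]   # path halving
            p = self.parent[p]
        return p

    def add_stone(self, p, same_colour_neighbours):
        self.parent[p] = p
        for q in same_colour_neighbours:
            self.parent[self.find(q)] = self.find(p)       # merge strings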

But in thinking about a consistent set of representations suitable for
Go programming, you're effectively deciding on a language to use
when talking about Go programming issues. Being aware of this,
the question "what language is good for Go programming?" may
turn out to be a little more useful and interesting. An unsuitable
answer results in complicated code, which is hard to think about
and even harder to get correct; a suitable answer results in simple
code, which can then be embedded into many of the popular
general purpose languages.

Claus





Re: [computer-go] Go/Games with modified scores

2008-10-19 Thread Claus Reinke
> You'll probably enjoy an article I wrote last year on this theme:
>  http://dcook.org/compgo/article_the_problem_with_random_playouts.html

Hi Darren,

yes, I particularly liked that you explained what you were trying to
show and how you think your data supports this - one might agree or
disagree, but at least it is fairly clear what one agrees or disagrees with.

The concrete references also led me to earlier discussions that showed
that the biases and blind-spots I was trying to highlight were already
well known to this group. Unsurprisingly, really, but I hadn't seen these
points emphasized much elsewhere (be it because the standard techniques
mostly work, or because it isn't clear yet how they work and hence how
to improve them).

What I'm looking for is a kind of recent survey, summarizing all those
well-known ideas and problems that aren't much talked about here
anymore (unless a newcomer like myself re-encounters them for the
first time;-). The field is probably still too much in flux to hope for a
complete survey in the style of these two from 2001/2002

http://www.citeulike.org/group/5884/tag/survey

But if someone knowledgeable were to draft such a thing specializing
on Monte-Carlo and other developments since 2002, as a technical
report with yearly updates, that would be a great resource, would
avoid repeat questions in this group, and could ultimately evolve into
the next great survey paper..

Thanks,
Claus





Re: [computer-go] Go/Games with modified scores

2008-10-19 Thread Claus Reinke
> It's a new area and the systems are very complicated. What kind of
> theory are you after, and what would you like it to tell you?

Currently, what seems to happen is (no offense intended, and please
correct me if my incomplete view is wildly off!-):

- have an idea for a great improvement (one thinks..)
- implement&test
- evaluate result statistics
- be surprised by what happens (and happy or sad with the results)
- commit to or abandon the improvement
- try something else

What I'd like to see more of is what seems to work well in science
everywhere:

(a) Understanding emerging from that loop. In particular, one should
be able to see not just that the change was successful or not, but why.
And where to look for the next improvement.

(b) Understanding feeding into that loop. In particular, one should be
able to predict not just that the change is clever (and hence should
be useful by rights;-), but how it will interact with what is there and
why that interaction should result in an improvement.

Just the usual model vs reality/theory vs practice thing: having either is
nice, but having both and having them interact is what really gets the
ball rolling. Experiments result in hypotheses, hypotheses result in
experiments, after a while a coherent model emerges that guides
experimentation and is developed or changed to reflect experiments.

>> Currently, it seems as if even many of the tricks that seem to work
>> wonderfully well don't really have a solid basis, so any tournament
>> could throw up a game that puts the whole thing into doubt, by
>> showing a hole in the test coverage big enough to drive a truck
>> through (once pros and others know what to look for). Or not.
>
> Is testing based on a large number of games not solid enough? I don't
> see any alternative with such complicated systems.

A merely experiment-driven approach is likely to accumulate features
until their interaction makes further progress too slow or too fragile to
be practical (just as a merely theory-driven approach is likely to
accumulate pretty concepts that deviate further and further from
useful reality). Combining theory and practice helps to overcome
the limitations of both.

Having theoretic models of what is going on should make it easier
to design a consistent engine (hypotheses are more malleable than
fragile features accumulated over expensive experiments) or to guide
new developments (tests tell you "yes, it works" or "no, try again"; a
theory might tell you "given these known results, you could try looking
for something else about here").

Abstraction also helps to manage complexity: to take the standard
example for Monte-Carlo, a circle may be described by a large
number of experiments or by a simple formula (and once you suspect
that you're looking at a circle, you have a framework in which to
interpret the experiments; just as a few more directed experiments
will tell you whether trying to interpret the results as a square makes
any sense;-). Without a theory, you can add an arbitrary number of
tests, hoping for something useful to emerge, but how can you tell
whether your tests are even in a useful area?
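
For the circle example, that contrast is easy to make concrete. A minimal
sketch (Haskell, using the 'random' package; the seed is fixed only for
reproducibility):

    import System.Random (mkStdGen, randoms)

    -- the "large number of experiments": sample points in the unit square
    -- and count how many fall inside the quarter circle
    mcArea :: Int -> Double
    mcArea n = 4 * fromIntegral hits / fromIntegral n
      where
        xs = take (2 * n) (randoms (mkStdGen 42)) :: [Double]
        (as, bs) = splitAt n xs
        hits = length [() | (x, y) <- zip as bs, x * x + y * y <= 1]

    -- the "simple formula": pi * r^2 with r = 1
    exactArea :: Double
    exactArea = pi

A few thousand samples buy you roughly two decimal places; the formula is
exact, and also tells you where to sample if you need to experiment again.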

> Another point:
>
> We deliberately restrict the complexity of the generative model (the
> playout function) by keeping it simple, and show that it works on a
> large, representative number of positions. Because the generative
> model is so simple, we can expect the performance we see in private
> testing to be realized in real games.
>
> We need not live in constant fear, at least to the degree that I think
> you are implying.

Of course I was exaggerating!-) But even in the few weeks I've been
looking into this topic so far, I've seen bots lose tournament games
by violating superko. And in simple random playouts, after eliminating
ko candidates, I see about 20 superkos in every 10k 9x9 runs, ranging
from under 10 to over 30. I've heard about bots crashing occasionally
until their authors accounted for unexpectedly large game lengths.

Then there is the bias in light playouts, apparently already well known
here, but shrugged off because it seems to work well enough after
AMAF and especially tree search have been added. Or perhaps one
needs to move to heavy playouts? Or just add more computing power?

There are necessary moves that playouts with certain eye definitions
will never even consider (so they can also not emerge as best moves).
Again, that seems to be known, but adding tree search takes care of
that, doesn't it? Well, yes, starting with a tree that considers every
move, then using playouts only to evaluate those moves, will move
the playout blindness one move away, but it is still there, at the tree
horizon. What is the effect of that?

Perhaps it is a double disadvantage that the standard techniques
work so well at the moment, and scale up even better with more
resources to add depth and breadth. Think of ManyFaces: 15
years have passed between its first and its most recent wins

Re: [computer-go] From zero to playing on CGOS in 10 minutes

2008-10-23 Thread Claus Reinke
> Thanks again for more explanations. I think the AMAF is clear to me now.

For what it is worth: I read the AMAF section as indicating that the bots
are to play using AMAF heuristics - random playouts, followed by playing
the AMAF-scored winning move, rinse and repeat. Which is why I thought
I shouldn't try until I get round to making a performant, lightweight playout.
Is that right, or are these bots supposed to play random playouts only, but
provide scoring information as if those playouts were part of an AMAF bot?

In either case, a working example explanation would make a useful addition.
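
For what it's worth, here is the first reading as a minimal sketch (Haskell;
all names are made up and playout generation is left abstract - the sketch
does filter to the player's own moves, but ignores the other refinements
such as counting first occurrences only or weighting early moves higher):

    import qualified Data.Map as M
    import Data.List (maximumBy)
    import Data.Ord (comparing)

    type Point = (Int, Int)
    data Color = Black | White deriving (Eq, Show)
    data Playout = Playout { moves :: [(Color, Point)], winner :: Color }

    -- (wins, visits) per point: every move is credited with its playout's
    -- outcome as if it had been played first - hence "all moves as first"
    amafStats :: Color -> [Playout] -> M.Map Point (Int, Int)
    amafStats me = foldr tally M.empty
      where
        tally p acc = foldr credit acc [pt | (c, pt) <- moves p, c == me]
          where
            won = winner p == me
            credit pt = M.insertWith add pt (if won then (1, 1) else (0, 1))
        add (w1, v1) (w2, v2) = (w1 + w2, v1 + v2)

    -- genmove: run the playouts elsewhere, then play the best-rated point
    -- (assumes at least one playout and one legal point)
    amafMove :: Color -> [Playout] -> Point
    amafMove me ps =
      fst (maximumBy (comparing rate) (M.toList (amafStats me ps)))
      where rate (_, (w, v)) = fromIntegral w / fromIntegral v :: Double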

Btw, if there was a version of the bot specification on a wiki page, people
might be able to clarify text or add questions. The wiki version could point
to the current reference version (part of the reference implementation, I
believe? I didn't want to look at Don's code until I get round to ripping out
all the extra info and experimental code I keep playing with, which keeps
my code from matching that 20k playouts/sec figure).

> This is still something I don't understand. Are there others who
> implemented the same thing and got 111 moves per game on average?
> I tried to look through some posts on this list but didn't see any
> other numbers published.

I'm not yet implementing the reference spec, but if you're just asking
about random playouts: yes, I tend to get around 111 moves. The
most frequent game length is even shorter (81 positions + ~39 positions
freed by capture - ~13 positions occupied by eyes => ~107 moves),
but longer games outnumber shorter ones, which pulls the average up
(see histogram below).

Claus

$ ./Go.exe 0.5 10000
Size: 9
Komi: 0.5
Games: 10000 (1.3056 average black-white score difference)
Black: 5141 games (26.9716=6.6633+20.3083 average score)
White: 4859 games (25.1660=6.1724+18.9936+0.5 average score)
Draws: 0

Gobble-style eye: 442830
capture (w/o ko): 986684
ko candidate: 87978
positional superko: 34
suicide: 595044

game lengths (average: 111.1783):
[(80,3),(81,9),(82,9),(83,19),(84,16),(85,47),(86,44),(87,57),(88,79),(89,82),
 (90,124),(91,156),(92,158),(93,159),(94,223),(95,234),(96,236),(97,312),
 (98,278),(99,310),(100,282),(101,301),(102,305),(103,340),(104,282),(105,308),
 (106,311),(107,255),(108,300),(109,291),(110,252),(111,253),(112,246),
 (113,241),(114,195),(115,199),(116,200),(117,167),(118,134),(119,144),
 (120,130),(121,101),(122,88),(123,76),(124,79),(125,67),(126,74),(127,61),
 (128,58),(129,66),(130,55),(131,92),(132,44),(133,87),(134,70),(135,101),
 (136,79),(137,110),(138,93),(139,110),(140,73),(141,81),(142,65),(143,98),
 (144,61),(145,60),(146,58),(147,60),(148,30),(149,30),(150,29),(151,22),
 (152,23),(153,16),(154,18),(155,14),(156,14),(157,13),(158,11),(159,3),
 (160,6),(161,3),(162,1),(163,6),(166,2),(184,1)]



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] enhanced test

2008-10-28 Thread Claus Reinke
>It's pretty tricky doing anything very clever without slowing things
>down a lot.   But I am highly interested because I want to build a
>stronger Ogo type of program for handheld devices.

Here's a suggestion: some mobile devices have volume-bound
communication tariffs (internet or text messaging), and Go protocols
are low-volume. So you could run a strong engine on your home
machine, and add a mobile front-end for it. If your tariff is limited
to small numbers of free/inclusive text messages, you might want to
pre-compute several possible moves on the server and send tree
fragments (rather than individual moves) to the mobile, for local
execution/selection. Of course, even if you can stay within the
cheap communications window, you might want to check whether
communication or computation will run down the battery sooner..
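
For concreteness, the "tree fragment" could be as simple as this sketch
(Haskell; names and encoding are made up, not part of any existing protocol):

    -- a precomputed move plus prepared answers to likely opponent replies
    data Fragment = Fragment
      { ourMove :: String                -- GTP-style vertex, e.g. "D4"
      , replies :: [(String, Fragment)]  -- opponent reply -> prepared answer
      }

    -- the handheld walks the fragment locally; only on a miss does it
    -- have to spend another message asking the home machine
    nextFragment :: String -> Fragment -> Maybe Fragment
    nextFragment opponentMove f = lookup opponentMove (replies f)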

Claus

Ps. The suggestion is free. But if any of the commercial authors likes
the idea enough to add it to their program, perhaps they could
provide me with access to a free copy of their software in return?-)



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] computer-go textbook ?

2008-11-03 Thread Claus Reinke
> I am now seeking some good textbooks for computer-go course in my university
> for undergraduates. I hope I can get some advice from this mailing list,
> thank you all!

I'm not currently in teaching, but I'd be interested to hear about uses
of computer-go in this context, ie, not so much as a specialist course
but as a focus for programming and group projects. I looked into this
some years ago but, before the recent successes with Monte-Carlo
based approaches, it wasn't promising. Interesting, yes, open scope
for advanced students, yes, but too much rope for the average student,
and too far to go before reaching real achievements. Difficult to motivate,
both for students and non-Go colleagues, and highly risky for the majority
of students.

This has probably all changed now. The recent reference bot discussions
show that competent programmers can port an example implementation
easily, or re-implement it from scratch given a suitable specification.

With the usual allowance for experienced programmers vs learners,
"easy/quickly" becomes "reachable given sufficient time and support", but
that is still substantial progress: one can actually expect a majority of
student( team)s to create a working bot (that plays better than random
legal moves) by the end of a course. At which point motivation will
be high and contagious and, even if the student bots might not be the
fastest or bug-free in absolute terms, they could compete against
each other.

And there is no end of useful experience to draw from a computer-go
related project: algorithms and datastructures (with lots of datastructure
variants applicable within one bot), profiling, performance (switching
representations, eliminating structure to gain performance without losing
testability/debuggability, estimating complexity and seeing it make a real
impact), presentation/gui, communication, game servers, protocols, non-
sequential and distributed programming, ..

Btw, one can water down the topic slightly, aiming for computer assistance
rather than computer play, which lowers the barriers for success, and gives
more students a chance to find an interesting, achievable, and rewarding
topic to work on in the overall setting.

That also allows for small parcels of work (guis, influence/territory estimate
visualisation, move/move area prediction [which program comes closest
most often], game commentary generation [can your program tell what a
move is about, or turn estimates into commentary], statistics/knowledge
extraction from game databases, life&death problems, interactive tutorials,
..), some of which might be integrated into tool suites later (raising interface
and integration issues). In brief, all the computer-go research topics, but
presented in small enough, manageable, problem sets, not expecting
perfect results, but focussing on interesting and rewarding paths.

Or is that still too optimistic?-)

|I am not aware of a book that would cover modern go-programming
|techniques (ie Monte-Carlo).
|
|You'll find plenty of ressources in the Computer-go bibliography:
|http://www.citeulike.org/group/5884/library

Great resource, but not easy to use (too many tags, too little structure).
Is there/should there be a web page/url tag in the computer-go
bibliography, for resources like wikis, game servers, url collections,
and reference bots? The bibliography
could be the single entry point for computer-go.

The survey tag gives a useful starting point in 2001/2002

http://www.citeulike.org/group/5884/tag/survey

then working backwards from today trying to find the most influential
ideas/tags/search terms, cross-checking with the mailing list archives..

http://computer-go.org/pipermail/computer-go

A lot of work before you can get a course started..
Claus

PS. I notice that not all postings come with the list url footer, and
not all email clients know how to present the List-Archive header.
Also, the "About computer-go" section in the mailman page

http://www.computer-go.org/mailman/listinfo/computer-go/

ought to mention/link to the computer-go bibliography.




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Re: FW: [computer-go] Monte carlo play?

2008-11-16 Thread Claus Reinke
>> In a game of self play, if both parties are employing only monte
>> carlo, ... random simulations... wouldnt it be very weak...
>> ... and some playing around I am clearly mistaken because its works
>> quite well.
>Yes, it doesn't make sense but it does indeed seem to work :-).

Plain Monte-Carlo bots *are* rather weak. But they work better than
might be expected, and they are fairly simple, which is extremely
motivating for potential Go programmers!-) Instead of having to learn
all of Go and a lot of AI before being able to start building anything,
one can start quickly, and then improve together with the bot. And
the current top programs claim plain Monte-Carlo bots as their remote
ancestors, so it isn't a dead end, either (though complete rewrites may
be needed somewhere along the path;-).

Another way to think about playouts is by comparing them with a
human player progressing through the ranks:

1 only knowing the rules makes for very inefficient/slow/buggy play
2 playing lots of games quickly has often been recommended to get a
better feeling for the game; personally, I don't like fast games(*), but
at least the worst inadequacies tend to show up even if we only play
equally clueless opponents (probably because weaknesses aren't
evenly distributed, so that peaks in understanding in one player
happen to match weaknesses in understanding in the other)
3 human players learn over time, until all players in a group have
reached an equal level (or have gone as far as they can, with the
effort they can afford to put into the game)
4 now further progress depends on innovation in the existing pool
of players, on joining a different player pool, or on bringing in
other players, preferably players whose strengths expose
weaknesses in the current pool (similar effects can be achieved
by reading books or observing games)

If we allow the learning outcome in 2/3 to be represented as patterns
("wait a minute, I've fallen into that trap before, so I shouldn't do this",
"I usually do this, but that would be faster - I wonder whether it'll work"),
then random playouts can emulate a weak form of this process. Instead
of learning from lots of games over time, plain Monte-Carlo bots
extrapolate from lots of dumb games, every time they consider their
next move.
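
In code, the core of such a bot is very small. A minimal sketch (Haskell;
the random playout itself is left abstract, and 'pos'/'mv' stand for
whatever board and move representation the bot uses):

    import Data.List (maximumBy)
    import Data.Ord (comparing)
    import Control.Monad (replicateM)

    -- 'randomGame pos mv' would play mv, finish the game with uniformly
    -- random legal moves, and report whether we won
    plainMC :: Monad m => (pos -> mv -> m Bool) -> pos -> [mv] -> Int -> m mv
    plainMC randomGame pos candidates n = do
      wins <- mapM winCount candidates
      return (fst (maximumBy (comparing snd) (zip candidates wins)))
      where
        winCount mv = do
          results <- replicateM n (randomGame pos mv)
          return (length (filter id results))

All the "dumb games" hide in randomGame; the extrapolation on top is just
counting.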

Random "games" are really too dumb to be considered games, but bots
can play lots of them in a short time, so they can at least see trends (a
mixture of on-the-spot learning and very bad reading ahead). If we
assume that the quantity compensates somewhat for the quality, a
Monte-Carlo bot is a bit like beginners after lots of games who
always forget their lessons learned when the session ends.

They may have an idea of who is ahead and what is alive, and be able
to organise their games based on those ideas, but a stronger player (who
remembers his lessons) may be able to rip their ideas apart ("you might
both think that group is alive but what do you do if I play here? and who
is ahead if you lose that group?"). And then there is the scenario of "strong
player visits club and humiliates local champion" ("you might think that
playing there kills that group, but what if they reply here?").

Adding patterns or other biases to make playouts heavier (both slower
and more relevant for evaluation) is similar to improving reading skills,
but with differences: bots already read to the end, so eliminating useless
moves doesn't improve the playout depth, it improves at best the density
of information to be gained from playouts (so you don't have to play as
many of them, or can get more out of those you do play); but playouts
are also still used for on-the-spot-learning, and biases have limited scope
for success, so one has to be careful not to ruin this aspect. Using a tree
on top of the playouts is not just a way to represent the knowledge
extracted from the playouts, but also helps to direct playout effort towards
relevant situations, and gives a place where traditional Go knowledge
can be built in without affecting the playouts directly (so they become
more of an evaluation function than a candidate move generator).

One problem is that the top programs tend to be fairly close in strength.
There are variations in method, so there is some further progress, but
how much of the human going-through-the-ranks evolution has been
captured in current top programs, I don't know. I suspect that it is less
than their ranks would indicate. There is certainly reading going on,
discussion, and inviting stronger players or different bots.

Oh, and if any of this seems to make sense, remember that I just made
it all up!-) You might want to consult these slides for overviews from
folks who have been in the business:


"Old-fashioned Computer Go vs Monte-Carlo Go"; by Bruno Bouzy,
Invited tutorial, IEEE 2007 Symposium on Computational Intelligence
in Games, CIG 07, Hawaï, USA, 1st April 2007.
Abstract:

http://cswww.essex.ac.

[computer-go] Re: One-sided 2-inch Rules

2008-11-18 Thread Claus Reinke
> one of the basic problems of go newbies
> is their tendency to place the next stone
> near to the latest stone of the opponent.
> Sometimes this is called the "2-inch heuristic
> of beginners".

How about going the other way, forcing Monte-Carlo simulations
onto a coarser grid in the hope of quickly finding information that
might be lost in full-grid simulations (by analogy with multigrid
methods - there even exists something called multigrid Monte-Carlo;
perhaps some of the mathematically inclined can have a look?).

As a trivial idea, if both playout-parties are forced to play on all
odd intersections before being allowed onto the remaining ones
(staking broader claims before entering detailed fights), how would
that affect the statistics, and is there a benefit to be had from
combining coarse and fine simulations?
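
In terms of the playout's move generator, the trivial idea is just a
filter. A sketch (Haskell; the choice of subgrid is made up):

    type Point = (Int, Int)

    -- "coarse" points: both coordinates odd, say
    coarse :: Point -> Bool
    coarse (x, y) = odd x && odd y

    -- restrict the playout policy to the coarse grid until it is used up
    coarseFirst :: [Point] -> [Point]
    coarseFirst candidates =
      case filter coarse candidates of
        [] -> candidates  -- coarse grid exhausted: the full board opens up
        cs -> cs          -- stake broader claims first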

Claus



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Re: UCT RefBot

2008-11-20 Thread Claus Reinke
>My first monte carlo programs used ownership info very effectively,
>but it can be that by using AMAF this information is used already.

As a relative beginner in these matters, the more I look at AMAF,
the less I like it, and I think that has to do with AMAF ignoring
relative sequencing. By looking at all moves as if they were played
first, it ignores that some moves only make sense after certain other
moves. I haven't yet been able to pin this down with evidence,
because the "the game is lost, all moves are equally bad" effect
tends to obscure it (I guess I need to insert "resign" moves at the
right places to limit the game records to the interesting pieces, even
if that means my bot can't score the result).

If I ignore that the Go board is 2d (dimensions, not dans;-), and think
only of time (number of moves), space (board positions), and alternatives
(simulation runs), then ownership maps project this 3d spacetime multiverse
onto a 2d space histogram (likelihood of each position being black/white)
while AMAF tries to project these three dimensions onto a 2d time
histogram (likelihood of each move resulting in win/lose).

As the various tweaks (don't count moves by other player, don't count
moves already played, value early moves higher than later moves) and
remaining bot blunders demonstrate, that doesn't quite work right, in
spite of first appearances.

My (as yet unproven) hypothesis is that the issues stem from AMAF
trying to ignore the time dimension of its projection, hence you get
highly rated moves that place themselves into atari (because that would
have been the last, necessary, move to capture the opponent group,
*after* surrounding it), that would be suicide right now (but not at some
later point in many playouts), or even on an occupied position (no longer
occupied later in many playouts), that try to fill an opponent's eye (which
would make sense only if the second, nearly complete eye were prevented
first, but that doesn't always happen in the same way, hence lower rating
for each of the possible ways).

Giving priority to early moves alleviates this somewhat, as the experiments
show, but then you still don't get perfect loss or win results, even if the
board is effectively owned by one of the players, because the tweaks
have removed some data from the statistics to cover the worst holes.

There should be a variation of AMAF that records move win rates in
move sequences ("these moves are good, but this one always comes
after that one; never try this one unless you are sure you can make that
one as well"), preserving more of the structure in the time dimension
of its projection without just reproducing the whole 3d input data space.
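
As a cheap first step in that direction, one could weight a move's AMAF
credit by how early it occurred, and also remember where in the sequence
each point tends to get played. A sketch (Haskell; the weighting scheme
is entirely made up):

    import qualified Data.Map as M

    type Point = (Int, Int)
    -- (weighted wins, weighted visits, sum of move indices, games seen)
    type Stat = (Double, Double, Int, Int)

    creditGame :: Bool -> [Point] -> M.Map Point Stat -> M.Map Point Stat
    creditGame won myMoves acc = foldr credit acc (zip [0 ..] myMoves)
      where
        credit (i, pt) = M.insertWith add pt (if won then w else 0, w, i, 1)
          where w = 1 / fromIntegral (i + 1)  -- earlier moves weigh more
        add (a, b, c, d) (a', b', c', d') = (a + a', b + b', c + c', d + d')

The average index (sum of indices over games seen) then hints whether a
point is an opening claim or a follow-up that depends on earlier moves -
not yet sequences, but at least a trace of the time dimension.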

Given that ownership maps don't suffer from this kind of trouble (unless
one collapses their information to score, which has been said to perform
badly), it is surprising that they are hard to make good use of. It might
just be a matter of asking different questions: AMAF variants try to
answer which move is likely best to make first; ownership maps answer
which positions are likely to be mine in the end. The latter is not as directly
useable, at least not as long as we are only asking about the next move.

That continues with UCT: apart from offering scaffolding for introducing
more Go knowledge, it also introduces a way of interleaving use of
playout information with continued generation of it, but it tends to do so
in a way biased towards move sequences. Again, not the kind of
question that ownership maps can offer answers to, but that just means
that there are other questions we are not asking yet.

Perhaps there is a complement to UCT/AMAF that achieves such
interleaving based on territorial information? One could allocate the
first half of simulation runs to acquiring an ownership map, then use
that information to narrow down the second half of simulations to
the more interesting parts of the board (perhaps less than half the
runs are needed to weigh the rest successfully?).
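
As a sketch of that two-phase split (Haskell; representation and
thresholds made up):

    import qualified Data.Map as M

    type Point = (Int, Int)

    -- 'owner' maps each point to the fraction of first-phase playouts
    -- in which it ended up as ours
    contested :: M.Map Point Double -> Point -> Bool
    contested owner pt = case M.lookup pt owner of
      Nothing -> True
      Just p  -> p > 0.1 && p < 0.9  -- neither surely ours nor surely lost

    -- second-phase playouts only branch over the contested points
    narrowTo :: M.Map Point Double -> [Point] -> [Point]
    narrowTo owner = filter (contested owner)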

Claus

PS. Is there a location where one can find the current versions of
the plain AMAF reference bots? I seem to be reaching the stage
where issues I find are not always in my own code, but I don't
know whether, eg, JrefBot 081016-2022 is the latest version.



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Reference bot (AMAF, MC) spec and jrefbot

2008-11-22 Thread Claus Reinke
It seems unbelievable to me how many bugs I introduced while converting
my "fooling around with Go concepts" code to GTP and towards the refbot
spec. It is not that GTP or refbot are themselves problematic, it is that I had
lots of assumptions built in that needed revision - leaving GTP and refbot
spec to the end was a bad idea, as others have pointed out before (my
favourite language provides implementations with read-eval-print loops,
so I didn't need GTP early on..).

To save future bot authors from this, I suggest to revise the refbot spec to
emphasize a "GTP early" approach, and to make the nature of AMAF
statistics slightly more explicit (see attachment).

Btw, wasn't there talk of putting the spec on a web page somewhere
(so we can more easily refer others to it), not to mention test harness
(gogui-twogtp serves well, though) and links to latest versions of
available refbot implementations?-)

Thanks to all those who are making Go programming easier and fun,
be it via papers, postings, services, specs, or tools!

Claus

PS. I also attach two examples of JrefBot 081016-2022 violating
superko (it is nice that not all the issues are in my own code!-)


[attachment: refbot-spec.txt - uuencoded in the original posting, not
reproduced here]



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Re: Re: hex robot

2008-11-27 Thread Claus Reinke
>The use of C is almost the only choice, but I'm on the lookout for the
>next wave of languages that will be:
>   1.  Native code compiled - no external runtime required.
>   2.  High level language features, but not imposed on you.

Have a look at Haskell (www.haskell.org). I don't want to get into
language advocacy (you'll find enough of that elsewhere;-), so I'll just
address your points:

1. GHC compiles to native code (either via C or, increasingly, directly)
2. Higher-level language features are strongly suggested (and even higher-
level features are being worked on continuously), but not imposed - if
you want to write C-style code in Haskell, you can (just don't show it:-)

(and if you absolutely want to write the inner loop in C, Haskell has a
nice foreign function interface - it gets a lot of its libraries from
importing the C APIs and wrapping higher-level safer interfaces on top;
so you always have the fallback of writing a few core pieces in C for
speed while writing the framework and interface code in Haskell for
expressiveness)
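
For illustration, wrapping a made-up C playout core looks like this (the
C-side name and signature are invented, not any existing library):

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign.C.Types (CInt)

    -- C side (made up): int playout(int size, int to_move);
    -- returns 1 if to_move wins
    foreign import ccall unsafe "playout"
      c_playout :: CInt -> CInt -> IO CInt

    -- the rest of the engine only sees this high-level wrapper
    playout :: Int -> Int -> IO Bool
    playout size toMove = do
      r <- c_playout (fromIntegral size) (fromIntegral toMove)
      return (r /= 0)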

You also mentioned string handling, which harks back to your second
point above: a game engine involves many tasks, not all of which require
low-level bit-fiddling, so you actually need to be able to mix high- and
low-level code as appropriate. Again, Haskell allows this.

Disclaimer: Haskell is by no means perfect, and I have many, many
gripes with it and its implementations. But it is very good for increasingly
many things, and games programming might well be one of them (I'm not yet
sure about Go in particular - the performance-critical part of the code
doesn't look much nicer than it would in C at the moment). The thing I
like best is that it brings practitioners and researchers together in
a fairly nice community (it used to be more research-heavy, but that has
balanced over recent years).

Claus



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Dependency inversions in Monte-Carlo Go

2009-04-07 Thread Claus Reinke
Last time I looked more closely at what my MC bot (simple, no tree)
was doing, I noticed that it has a tendency to try the impossible moves
(suicides) first, discarding them as impossible for every genmove.

Looking at more statistics, and thinking about it, it finally dawned on
me that this is a consequence of the standard move evaluation approach
based on win rate:

- what one hopes for is the move with the best chance of winning
(move enables win)
- what one might get is a move that can only be played when winning
(win enables move)

For suicide moves, this is particularly obvious: they become possible
only in the small number of games in which the opponent group
surrounding the pseudo-eye has died (or is killed by playing in the
pseudo-eye after removing all other liberties). The larger the group,
the more likely that the game is going to be won if that group dies
(roughly), so the larger the opponent group, the more tempting its
pseudo-eyes seem for win-rate-based evaluation, however unlikely
it is that the group actually dies in non-random play (certainly not
by starting with the pseudo-eye).

Something similar might be happening at a less obvious scale,
such as playing a move into self-atari: there is one opponent
move that renders this useless, but it is only one move - any
game in which the stone is not captured might look rather
good in terms of winning.

Is there a known method of reducing the impact of these outliers
(other than building a real tree and seeing by trial-and-error
that many good-looking moves aren't really useful)? It seems
that one cannot just devalue moves with low hit counts - after
all, if there is only one sequence of moves that will rescue the
game, one doesn't want to discount it just because it is rare.

One thing that might help is move-number statistics: those
moves that tend to be played late in the playouts in which
they occur might depend on other moves to be played first,
so perhaps one should have lower bounds on when each
move can be considered for actual play?
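
As a sketch of such a lower bound (Haskell; the bookkeeping mirrors the
playout statistics, and the tolerance is made up):

    import qualified Data.Map as M

    type Point = (Int, Int)
    -- per point: (wins, visits, sum of move numbers at which it was played)
    type Stat = (Int, Int, Int)

    -- only consider a point now if the playouts don't suggest that it
    -- usually becomes playable much later than the current move number
    playableNow :: Int -> M.Map Point Stat -> Point -> Bool
    playableNow currentMove stats pt =
      case M.lookup pt stats of
        Nothing -> False  -- never seen in a playout: nothing to go on
        Just (_, v, msum) ->
          let avg = fromIntegral msum / fromIntegral v :: Double
          in avg <= fromIntegral (currentMove + slack)
      where slack = 10 :: Int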

Claus




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] bots and handicaps (Re: New CGOS)

2009-06-06 Thread Claus Reinke
>The purpose of a handicap games is to allow a 50% chance of either
>contestant winning. .. Programs do not care,

Are you sure?-) I haven't got round to moving beyond a plain MC bot yet,
where the effect is rather striking, but less naive bots also depend on win-rate
for their move evaluations.

The effect is that the bot is only "interested" in a narrow range of fairly
balanced games: if it gets too far behind, or too far ahead, its play becomes
fairly random. The reason being that small differences in score may not
sufficiently affect the win/lose ratio to be distinguishable (it would take
longer sequences of such better moves to change the win/lose outcome),
so good and bad moves are getting lumped together in the evaluation.

However, handicap games are rather different from non-handicap games,
and bots don't care how a balance is achieved, so one might simply use
numeric handicaps. Only that a large numeric handicap will have a bad
effect on win-rate-based bots' evaluation abilities - the weaker bot,
seeing a large numeric advantage, will play even more weakly. So one
might have to phase in awareness of handicap gradually, adjusting it
during the game (as the stronger bot gets ahead in the game, more of the
handicap gets added to keep the evaluation and gameplay interesting).

Which does sound rather complicated. But it reminds me of a suggestion
I made a long time ago for human games(*), which might be easier to adopt
in this context: when a player/bot is far enough ahead that even a double
move by its opponent will at best catch up, the stronger player can simply
pass. At the end of the game, any difference in the number of passes is
counted as handicap, and recorded together with the plain score/result.
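
The bookkeeping for this is tiny; a sketch (Haskell; the lead and
per-move-swing estimates would be whatever the bot already computes):

    -- pass when even two opponent moves in a row would at best catch up
    shouldPass :: Double -> Double -> Bool
    shouldPass myLead maxSwingPerMove = myLead > 2 * maxSwingPerMove

    -- at the end, record the pass difference next to the plain result
    data Result = Result { margin :: Double, passHandicap :: Int }

    finishGame :: Double -> Int -> Int -> Result
    finishGame scoreDiff myPasses theirPasses =
      Result scoreDiff (myPasses - theirPasses)

Nothing more than bookkeeping on top of the suggestion above.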

It would allow stronger bots to keep weaker bots in play for longer,
when the remainder of the game would otherwise be a continued
sigh "oh, if its programmer had only implemented early resign..".

It would also allow weaker bots and their authors to get more out of
playing stronger bots (provided that authors actually study game
records, not just result statistics) - instead of "that game was really
lost, but my bot didn't know, so it just kept playing random moves"
it would be "hmm, to make that game even close, our opponent
had to pass 5 times, lets look at the pass moves and what went
wrong to make them possible".

Just another idea,
Claus

(*) the context was "teaching" games: being rather a beginner myself,
and trying to interest newcomers to play the game, I never felt
confident in guessing useful handicaps, let alone explaining the
different play required to make use of handicap stones. Not to
mention that beginner strengths are rather variable.

So what I did instead was to start out evenly, then deliberately
throw in less efficient moves when I thought I was getting too far
ahead (an entirely new source of blundering opportunities, btw;-).
This was often accompanied by trying to explain to my opponent,
within the limited range of my game knowledge, why I thought
that their moves weren't as efficient as mine, and what they
might have tried instead to achieve their aims.

The discussions were really useful and interesting (the target
audience at the time being academics with incurable affinity
for discussing and reasoning), but the slow-downs kept the
games going for longer (also: realistic for long stretches, and
highlighting opportunities for discussions in between, at the
slow-down points). The approach also seemed to take the
emphasis away from winning, towards learning/thinking/
having fun with exploring the game.




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Re: bots and handicaps (Re: New CGOS)

2009-06-06 Thread Claus Reinke
>> .. plain MC bot ..
>> The effect is that the bot is only "interested" in a narrow range of fairly
>> balanced games: if it gets too far behind, or too far ahead, its play
>> becomes fairly random. The reason being that small differences in
>> score may not sufficiently affect the win/lose ratio to be distinguishable
>> (it would take longer sequences of such better moves to change the
>> win/lose outcome), so good and bad moves are getting lumped
>> together in the evaluation.
>
> I once made the argument that MCTS bots would not play handicap games
> well based on this same argument.   I got a fairly strong reaction from the
> group on this.   Many said it would make no difference and that the really
> strong bots would play just as hard even if they were started with a
> technically lost game (from the handicap.)

Well, MCTS bots have the advantage of reasoning over sequences of
moves, not just single moves, so I would expect the effect to be less
pronounced there (but still existent, and possibly affecting performance
even if not strength). If the win-rate evaluation is mixed with enough
other information from the playouts (total score, ownership, patterns,
..), that might be sufficient to reduce the effect to negligibility, but it
seems worth checking that this is the case.

>> .. it reminds me of a suggestion
>> I made a long time ago for human games(*), which might be easier to adopt
>> in this context: when a player/bot is far enough ahead that even a double
>> move by its opponent will at best catch up, the stronger player can simply
>> pass. At the end of the game, any difference in the number of passes is
>> counted as handicap, and recorded together with the plain score/result.

Note that this suggestion should be fairly easy to implement (if anyone
with two bots of different strength, or one bot stronger than publicly
available opponents, would try this, I'd be very interested in the results!)
and is not meant to change playing strength but to get better information
from games between bots of different strengths.

> The solution is to simply play for the best total territory score.
> ..
> I think some bots have enough knowledge imposed on them that they don't
> revert to random-like play, even when it's clear who will win.

I'm all for getting more information out of playouts but, since the
win-rate approach is so successful, there has to be something
special about it. Perhaps it is just that it abstracts over just the
right amount of information to balance out the unwanted artifacts
in the more detailed playout evaluations, perhaps it is something
else. But it seems likely that one needs to find a weighted mix
of win-rate and more details for evaluation (if win-rate is decisive,
use that, otherwise give more weight to alternative measures).

There was a recent (CIG'2008) paper suggesting sigmoid
functions for combining final score and winrate, btw.
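
I haven't checked the paper's exact formula; the general shape I have in
mind is something like this sketch (Haskell; weight and scale are made up):

    -- squash the average final score so small leads still register,
    -- then mix it with the win rate, which stays dominant
    eval :: Double -> Double -> Double
    eval winRate avgScore = (1 - k) * winRate + k * sigmoid avgScore
      where
        k = 0.1
        sigmoid x = 1 / (1 + exp (-x / 10))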

Claus





___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Re: Re: Dynamic komi in commercial programs

2009-07-12 Thread Claus Reinke
> I would like to know what exact experiments with "virtual komi" have
> been made and why thay failed. ..
> So why exactly shouldn't it work?

(warning: I have not tried this yet, just been thinking about the issue)

While virtual komi sounds like an attractively simple way of solving
the issue, I don't think it does so without fail - it is slightly too simple.

The issue, for me, is that MC's evaluation function is a mixture of
statistics and cut-off, so if all the statistics remain below or above
the cut-off, moves can no longer be distinguished and play becomes
random (the cut-off is important: just going for best average score
does not appear to perform as well as going for best potential to win).

Just adding virtual komi shifts the cut-off, which should help to make
evaluation more selective in the problematic cases. Unfortunately, it
does so at the price of making evaluation less selective in formerly
unproblematic cases (what previously was the clear best move now
has to "compete" with lesser moves that also make the lower cut-off).

What seems to be needed is a stratified approach:

1 try to make a clear choice of best move with unmodified MC
2 only if 1 fails, move the cut-off and try again
3 if nothing helps, resign (not just polite, but speeds up testing;-)

As usual, one could trade time vs space: instead of repeating the
statistics with different cut-offs, record all moves wrt different
cut-offs (a small range around the komi) in the first run, then use
the statistics for the cut-off that is closest to normal komi but still
able to distinguish better moves.
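
A sketch of that space-for-time variant (Haskell; the offsets and the
"still distinguishes" test are made up):

    -- candidate cut-offs: a small range around the real komi
    cutoffs :: Double -> [Double]
    cutoffs komi = [komi + d | d <- [0, 5, 10, 15, -5, -10, -15]]

    -- one playout's contribution: a win/loss verdict under each cut-off
    winsUnder :: Double -> Double -> [Bool]
    winsUnder komi score = [score > c | c <- cutoffs komi]

    -- a cut-off is still useful if the moves' win rates under it differ;
    -- use the useful cut-off closest to the real komi
    discriminates :: [Double] -> Bool
    discriminates rates = not (null rates) && minimum rates < maximum rates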

Claus




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/