Re: [Computer-go] replacing dynamic komi with a scoring function

Don Dailey Mon, 09 Jan 2012 11:02:55 -0800

I like that idea - I'm not sure I've fully digested it's implications but
it does deliver more information that could be put to good use.


Don


On Mon, Jan 9, 2012 at 1:53 PM, Zach Wegner <[email protected]> wrote:

> On Mon, Jan 9, 2012 at 7:26 AM, Vlad Dumitrescu <[email protected]>
> wrote:
> > Hi,
> >
> > On Mon, Jan 9, 2012 at 13:17, Don Dailey <[email protected]> wrote:
> >> Summary:
> >>
> >> I believe a more correct scoring function won't be based on how much
> you win
> >> by OR how often you win but will incorporate some other more relevant
> >> concept and it will be dynamic.    And it will not matter if the game is
> >> a handicap game or otherwise because the scoring function will always be
> >> relevant.   The goal will be to maximize your winning chances but it
> >> will incorporate something more sophisticated that just counting how
> often
> >> you win or how much you win by.
> >
> > I hope I may interfere with something that Don's nice description
> > revealed to me. It feels rather obvious, but since nobody stated it
> > explicitly, maybe it's news for at least some people here.
> >
> > MCTS is maximizing the chances of winning. These chances are largest
> > for a minimal score difference because this allows for making some
> > errors. Winning by the largest possible score has rather small chances
> > to happen because every move has to be perfect.
> >
> > The curve describing the probability of ending the game with a certain
> > score is bell-shaped and MCTS explores the area beneath it, looking
> > for winning moves. With handicap, the disadvantaged side is getting
> > less samples explored, making it less likely to discover the really
> > good moves. Dynamic komi shifts the bell left or right in order to
> > equalize the sampling on both sides, but as mentioned it isn't dynamic
> > enough (the curve changes after each move) and also is actually using
> > a different shape for the curve than the real "handicap curve".
> >
> > In theory, I think that the solution for keeping the same level of
> > play with handicap as without would be to make sure that the the
> > disadvantaged side gets just as many samples with or without handicap.
> > That is, use more playouts when playing with handicap. In practice,
> > this is probably prohibitive...
> >
> > I wonder if it might be possible to estimate the shape of this curve
> > after each move and use that estimate to dynamically adjust the number
> > of playouts. One might have to use higher precision calculations, too,
> > so that the noise doesn't get too loud.
> >
> > Does this make any sense? Has anyone tried something like this?
> >
> > best regards,
> > Vlad
> > _______________________________________________
> > Computer-go mailing list
> > [email protected]
> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
> This is similar to ideas I've had to solve this. Of course, I'm not a
> go programmer, so it's really just idle curiosity for me when reading
> this list. Perhaps an ideal scoring function is "highest median
> score". You can do this in a slow/memory intensive way by storing a
> histogram of final score in each node instead of wins/losses.
> Wins/losses can then be computed by summing the histogram buckets
> above/below the komi level. Computing the median is simply finding the
> bucket where the sum crosses 50%. Other percentages could be used here
> as well, though I'm not quite sure what effect this might have.
>
> This solves the problem of getting far less than one bit of
> information per playout when the win rate is very high or low, and
> also supersedes dynamic komi--if you have the full histogram at any
> node, you can compute the exact win rate for any arbitrary komi.
>
> Using this in practice might be fairly difficult, and would probably
> involve making the histogram "lossy" in some sense. I would be
> interested if somebody here could run tests with this as a limit study
> though--keep the number of nodes allocated and playouts per move
> constant, and see the effect on strength.
>
> Zach
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Re: [Computer-go] replacing dynamic komi with a scoring function

Reply via email to