Re: [computer-go] Way MC plays

2008-03-01 Thread Jacques Basaldúa

Don Dailey wrote:


I personally have serious doubts about knowledge extraction from human
games, but I hope you have success. I think you can get more from
computer games of strong players, even though the level is weaker. Here
is why I say that:



1.  A strong computer still plays a lot of good moves - so the delta
between human and computer games is not as high as you think.


2.  Computer games have a certain consistency that humans don't possess.



3.  You have access to the internals, such as a score that quantifies
moves.


My idea combines offline knowledge with MC methods. So there is a lot
of online computer-generated knowledge. I don't intend to make an
idiot-savant program that just plays the only move it finds in a joseki
database. But if the program considers the joseki stored as
(state, action[i-1]) -> action[i] records, it will immediately start
searching a tree whose first move at all depths is the joseki sequence.
If the joseki depends on a ladder, it will find that ladder much earlier,
and if it is not good it will hopefully play something else. At blitz time
settings, playing the wrong joseki is bad but not terrible. That would be
about the level of a human dan making a mistake in a blitz game, still
better than current programs.
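A minimal sketch of such a record store, with hypothetical names and a toy state encoding (the real engine would hash the board position):

```python
# Sketch: joseki stored as (state, last_move) -> candidate replies, so the
# tree search tries the stored continuation first at every depth. A joseki
# line that depends on a bad ladder then gets refuted early in the search
# instead of being played blindly from the database.

joseki_db = {}  # (state_hash, last_move) -> list of next moves, best first

def add_joseki_line(hashes, moves):
    """Index a joseki sequence so each position knows its continuation."""
    for i in range(1, len(moves)):
        key = (hashes[i - 1], moves[i - 1])
        joseki_db.setdefault(key, []).append(moves[i])

def ordered_moves(state_hash, last_move, legal_moves):
    """Try the stored joseki continuation first, then the other legal moves."""
    first = joseki_db.get((state_hash, last_move), [])
    rest = [m for m in legal_moves if m not in first]
    return [m for m in first if m in legal_moves] + rest
```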


In this context, human knowledge produces a priori values for UCT. (The
exact formula of the MC tree search is not yet decided.) There is enough
computer-generated knowledge in the search. When the end of the game
is still 200+ moves ahead, the search alone does not find enough
difference between good and not-so-good moves.
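As one illustration of how a-priori values can enter the tree search (this is a generic progressive-bias scheme, not Jacques' still-undecided formula; the weight w is a made-up tuning parameter):

```python
import math

def uct_value(child_wins, child_visits, parent_visits, prior, c=1.0, w=10.0):
    """UCB term plus a prior-knowledge bonus that fades as visits grow.

    prior is an offline estimate in [0, 1] (e.g. extracted from games);
    w controls roughly how many visits it takes before the prior fades.
    """
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    exploitation = child_wins / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    progressive_bias = w * prior / (child_visits + w)
    return exploitation + exploration + progressive_bias
```

Early on the prior dominates move ordering; with enough simulations the computer-generated statistics take over.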


The knowledge that, I think, makes sense to extract is:

1. (state, last move) -> (next moves list) records (joseki, or places
to play when the last move is somewhere else.)
2. urgency of shapes
3. distribution of statistics (e.g. the proportion of times tenuki is played
at move n) that helps make playouts more humanlike while still fast.
I finally abandoned full-board Bradley-Terry scored random playouts. I
draw a random number to decide if tenuki should be played; if true, I
invent a random move and use it as if it were the previous move. Else,
I play a Bradley-Terry scored random move among the 40 neighbors of the
last move. That is almost immediate in my hologram board system because
I store the mask. In other implementations it may not be so good.
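A sketch of that playout step, with hypothetical helper names (`bt_weight`, `neighbors40`); the real mask-based implementation is of course much faster than these list comprehensions:

```python
import random

def playout_move(board, last_move, tenuki_prob, bt_weight, neighbors40, legal):
    """One move of the playout policy described above.

    With probability tenuki_prob (taken, per move number, from game
    statistics), invent a random move and treat it as the previous move;
    otherwise play a Bradley-Terry-weighted move among the 40 neighbors
    of the last move.
    """
    if random.random() < tenuki_prob:
        last_move = random.choice(legal)  # invented move acts as last move
    candidates = [m for m in neighbors40(last_move) if m in legal]
    if not candidates:
        return random.choice(legal)  # fall back to a uniform random move
    weights = [bt_weight(board, m) for m in candidates]
    return random.choices(candidates, weights=weights)[0]
```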

About urgency of shapes:

The urgency of a BT-adjusted 40-neighbors shape hits 40% in a 10 M+ sample
with about 160 K patterns. (Overfitting should not be very important
given the small number of degrees of freedom.) It hits 66% within the first
5 moves:

Hits of move  1 =  0.402048 acc =  0.402048
Hits of move  2 =  0.117102 acc =  0.519151
Hits of move  3 =  0.065450 acc =  0.584600
Hits of move  4 =  0.045592 acc =  0.630193
Hits of move  5 =  0.035190 acc =  0.665382
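The acc column is simply the running sum of the per-rank hit rates:

```python
from itertools import accumulate

# Per-rank hit rates from the table above; acc is their cumulative sum.
hits = [0.402048, 0.117102, 0.065450, 0.045592, 0.035190]
acc = list(accumulate(hits))
for rank, (h, a) in enumerate(zip(hits, acc), start=1):
    print(f"Hits of move {rank:2d} = {h:.6f}  acc = {a:.6f}")
```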

That sounds like great news. But there is bad news, of course: it doesn't go
any further. To reach 80% it needs 14 moves:


Hits of move 14 =  0.006206 acc =  0.801095

And to reach 90% it needs 96 moves!

Hits of move 96 =  0.001323 acc =  0.900994

Shape is a factor that explains perhaps 2/3 of the moves (shape alone about
40%, but shape "in the right place" a little more than 66%). Not less and
_not more_ than that. It is naive to think that the 12th move in terms of
shape is better than the 50th. But if there are 250 legal moves, the
248th is a really bad move.

Can shape make a difference? I don't know. If it is slow, surely not. If it
is almost free in terms of performance, as in my board system, I hope it does.



Jacques.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] f(score) instead of sign(score)

2008-03-01 Thread jonas . kahn
> http://ewh.ieee.org/cmte/cis/mtsc/ieeecis/tutorial2007/Bruno_Bouzy_2007.pdf
> 
> Page 89, "which kind of outcome". This method is better than the above
> and similar to what Jonas seems to propose. The improvement is minor.

Looking at their proposal (45 * win + score), in contrast to mine,
there is no scaling with 1/n. Yet they get an improvement with
something in fact of the order 1/\sqrt{n}. But I think the latter does
not play a role.

Why?

First hypothesis: misleading data.

Second hypothesis: better evaluation function, if we had the ``exact''
expectation of the evaluation function under the MC runs.
Here is an interesting argument towards the following:
- Even with many simulations, it is likely that all these simulations
  are different by move 50.
  Suppose we had an oracle who, after 50 moves in the MC simulation,
  told us whether the game is a win or a loss under perfect play.
  Stopping the simulation there and taking this result would probably be
  a better evaluation function (discussion below). We could then wonder
  how best to approximate this evaluation function with what we know,
  that is, the score at the end of the MC runs.
  Then taking the same value for a half-point victory and a 60-point
  victory is ludicrous: MC is not perfect play; we have an uncertainty
  depending on how far we are in the game and on the position. But a
  60-point victory would probably mean that we had a win after the
  first 50 moves of the simulation.
  Under this hypothesis, the evaluation of the node should be
\int_{-score}^{\infty} Phi(x) dx, where Phi(x) is the probability density
of the MC score being off perfect play by (+x) points.

- Now this Phi could be evaluated by looking at typical results of many
  simulations from the same position (assuming the mean or the median is
  perfect play); alternatively, it could be practically evaluated from the
  simulations in the node before.

- BUT this evaluation function would not be efficient: I have no access
  to any data, but I think that the MC simulations must have a
  big variance. That would give too narrow a margin to victory, I guess.

- Why should this evaluation with an oracle be better? Well, it's the
  true minimax evaluation of the position from move 50, the same as
  counting at the end. What we have in between is noise. Making better
  playouts can result in worse play because it is more deterministic, so
  that the evaluation is less robust (it could always make the same
  mistake), but here it IS robust: we have said perfect play, no mistake.
  If we had that at the root of the tree, that would be God playing.
  However, as I said, we arrive at a hardly tenable conclusion for the
  practical evaluation function. Why?
  Probably, the RIGHT evaluation function is not victory or defeat,
  it's: how likely is the program to win from this position (say, against
  itself)? Indeed, since we have random players, this even has a precise
  meaning. Hence, non-integer values could make sense.
  Now, since programs with tree search play better than their MC
  simulations, we would have less variance in the outcome from this
  infamous move 50, and hence we would also have a convolution. We
  would then not use 0-1, but not anything too broad either. Something
  like (45 * win + score) is reasonable, around 0 in particular (it
  should saturate for high scores).

- In any case, this would also mean that the bonus for the win should be
  made higher in yose. Indeed, the nearer the end, the more accurate the
  MC scoring is (for one simulation), and hence the less we must
  convolve.
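As a purely illustrative instance of this: if Phi is assumed Gaussian with standard deviation sigma (an assumption of mine, not Jonas's), the integral \int_{-score}^{\infty} Phi(x) dx becomes an erf-shaped smoothing of sign(score), and shrinking sigma towards yose implements the "convolve less near the end" remark:

```python
import math

def smoothed_eval(score, sigma):
    """P(perfect-play result > 0) when the MC score is off by Gaussian
    noise of std sigma: the integral of the noise density above -score.
    Large sigma behaves like a scaled score; sigma -> 0 gives sign(score).
    """
    return 0.5 * (1.0 + math.erf(score / (sigma * math.sqrt(2.0))))
```

With sigma decreasing as the game approaches the end, a given margin of victory counts closer and closer to a full win, which is exactly the higher win bonus in yose.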


And this is not indulging a spoilt child. It's explicitly trying to
imitate a better evaluation function that we cannot know directly.

Jonas


Re: [computer-go] Re: computer-go Digest, Vol 43, Issue 8

2008-03-01 Thread Petr Baudis
On Mon, Feb 18, 2008 at 09:03:17PM +0100, Alain Baeckeroot wrote:
> Le lundi 18 février 2008, Michael Williams a écrit :
> > But as was pointed out before, these high levels of MoGo are probably still 
> > not pro level, right?
> > 
> 
> On 9x9 Big_slow_Mogo is near pro level, maybe more.
> 6 months ago or so it was able to regularly beat pros without komi on 9x9.

But no komi on 9x9 is quite a handicap.

It would be quite interesting to see how well the high level Mogos fare
against high dans unhandicapped.

-- 
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.-- J. W. von Goethe


Re: [computer-go] Re: Should 9x9 komi be 8.0 ?]

2008-03-01 Thread jonas . kahn
> delta_komi = 10^(K * (number_of_empty_points / 400 - 1)),
> where K is 1 if winning and is 2 if losing.  Also, if the expected
> winning rate is between 45% and 65%, komi is unmodified.

There's one thing I don't like at all, there: you could get positive
evaluation when losing, and hence play conservatively when you should
not. That could especially be true near the end, when the situation
drifted from bad to less bad. I think that if this happens you're
doomed... The  other way around would probably be painful too.

After all, the aim of tinkering with komi is to avoid the computer
playing nonsensical moves, but it should know whether it must fight or
calm down.
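For concreteness, the quoted rule is a one-liner; note that the step shrinks as the board fills, and shrinks faster (K = 2) when losing:

```python
def delta_komi(number_of_empty_points, winning):
    """Hideki's quoted rule: 10^(K * (empty/400 - 1)), with K = 1 when
    winning and K = 2 when losing. The exponent is negative whenever
    fewer than 400 points are empty, so the step stays below 1."""
    k = 1 if winning else 2
    return 10 ** (k * (number_of_empty_points / 400 - 1))
```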

Here is a suggestion: give a new komi at each move (not a drift, though
it would probably be a good idea that you *upper bound the komi change*,
maybe by ten times the delta_komi you use now).
For choosing it, punish being away from true komi
proportionally to | komi_aya - true_komi |. Call lambda the
proportionality constant.
Punish the winning rate through (winning_rate - .5)^2.
The total loss function is then:
loss = lambda * |komi_aya - true_komi| + (winning_rate - .5)^2


Suppose the change in winning rate is proportional to the change in komi,
that is, take as hypothesis:
winning_rate(komi_aya) = winning_rate(normal_komi) + C * (komi_aya -
normal_komi)

REMARK: C is negative!

Take the change in komi that minimizes the punishment (a real number), and
truncate it to get an integer (truncation also means you have a tendency
to have delta_komi near 0).

If you knew beforehand the winning rates for each komi, this would
ensure you generally have a winning rate on the right side of .5, while
it would not move komi if winning rate is somewhat near .5. 

Solving the degree-two equation, with C the change in winning rate per
point of komi change, yields:

new_komi_aya =
truncation[(.5 - old_winning_rate) +
C * old_komi_aya - lambda / (2 * C)]^{+}

if the estimated winning rate with true komi is more than .5, and

new_komi_aya =
truncation[(.5 - old_winning_rate) +
C * old_komi_aya + lambda / (2 * C)]^{-}
 ^^
if the estimated winning rate with true komi is less than .5.


The estimated winning rate with true komi is given by
winning_rate - C old_komi_aya.

You could take
lambda = K / (1 - number_of_empty_points / 400)
if you want to be nearer true komi during yose. Less human-like, but
less error-prone!
Now that's probably not necessary, since the constant C is probably much
bigger when entering yose.


The problem with this formula is finding a good C, since this
constant changes during the game.
Here is an idea using data from the game: use the last time the komi
was changed and take:
C =
(winning_rate_after_last_change - winning_rate_before_last_change) /
(komi_aya_after_last_change - komi_aya_before_last_change)
[* f(number_empty_points_last_change, number_empty_points_now) if you want to
refine]

Now this estimate might be unstable, especially since it cannot tell
whether it was the evolution of the situation that made the winning rate
change, or the change in komi. I'm sure you can find ways to stabilize
it. The easiest I can think of would be a formula like:
new_C = .3 * C_formula_above + .7 * old_C
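A deliberately brute-force sketch of the whole proposal, with hypothetical names; scanning the bounded range of integer komi values sidesteps the closed-form truncation formula above while minimizing the same loss:

```python
def choose_komi(true_komi, old_komi, winning_rate, C, lam, max_step=10):
    """Pick the integer komi minimizing
        loss = lam * |komi - true_komi| + (predicted_rate - .5)^2
    under the linear model rate(k) = winning_rate + C * (k - old_komi),
    with the per-move change bounded by max_step (C is negative)."""
    best_k, best_loss = old_komi, float("inf")
    for k in range(int(old_komi) - max_step, int(old_komi) + max_step + 1):
        rate = winning_rate + C * (k - old_komi)
        loss = lam * abs(k - true_komi) + (rate - 0.5) ** 2
        if loss < best_loss:
            best_k, best_loss = k, loss
    return best_k

def update_C(old_C, rate_after, rate_before, komi_after, komi_before):
    """Jonas's stabilized slope estimate: .3 * measured + .7 * old_C."""
    measured = (rate_after - rate_before) / (komi_after - komi_before)
    return 0.3 * measured + 0.7 * old_C
```

With lam = 0 the chosen komi simply drives the predicted winning rate to .5; a larger lam pulls the choice back towards true komi, as intended.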


END OF THE DELIRIUM 
Jonas


Re: [EMAIL PROTECTED]: Re: [computer-go] Should 9x9 komi be 8.0 ?]

2008-03-01 Thread Petr Baudis
On Thu, Feb 28, 2008 at 01:01:08PM -0500, Don Dailey wrote:
> It's naive to think some simplistic deception imposed upon the program
> is going to correct the error when you don't even know if the program is
> erring in the first place. How can you say, "the program thinks it
> is losing, but it MUST be in error, and I'm going to change the komi so
> that it will 'play better'?"  You must have some basis for knowing the
> program is in error before you can lower its goal.

I think there is a misunderstanding here; this was (I suppose) never meant
to correct an error in the program's perception of the board. (From
watching MoGo and such, it often overestimates its board position, but
rarely underestimates it. :-)

The point here is to prevent the program from playing the "MC-hamete"
moves that in most cases have no hope of working, and have it instead
still aim at a close game and wait for some opponent's yose mistake. This
closely matches the human approach to the game as well - if you are
confident in yose and see you are only a little behind, you frequently
don't start overplaying hopelessly, but instead still play the best you
can and hope for an overturn in yose.

-- 
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.-- J. W. von Goethe


Re: [computer-go] Re: Should 9x9 komi be 8.0 ?]

2008-03-01 Thread Hideki Kato

[EMAIL PROTECTED] (jonas . kahn) wrote:
>> delta_komi = 10^(K * (number_of_empty_points / 400 - 1)),
>> where K is 1 if winning and is 2 if losing.  Also, if the expected
>> winning rate is between 45% and 65%, komi is unmodified.
>
>There's one thing I don't like at all there: you could get a positive
>evaluation when losing, and hence play conservatively when you should
>not. That could especially be true near the end, when the situation
>has drifted from bad to less bad. I think that if this happens you're
>doomed... The other way around would probably be painful too.

From my observation, MC chooses good moves if and only if the winning
rate is near 50%.  Once it is losing, it plays bad moves.  Surely the
adjusted komi is an illusion, but it helps to prevent them.

>After all, the aim of tinkering with komi is to avoid the computer
>playing nonsensical moves, but it should know whether it must fight or
>calm down.

Agree.  So, it's important _when_ to adjust komi or apply my method.  My
objective is to keep the winning rate around 50%, which yields good moves.

> [Jonas's full komi-adjustment proposal quoted; see his message above.]
>END OF THE DELIRIUM

:) 

Thanks.  I will try it later, as I'm now radically rewriting my code,
which will take quite a while.

# One question: where does _aya_ come from, or what does it stand for?  If
my guess is correct, you are confusing Hiroshi, the author of Aya, with me,
Hideki, the author of GGMC :).  I'm sorry if I'm wrong.

-Hideki
--
[EMAIL PROTECTED] (Kato)