Re: [computer-go] Way MC plays
Don Dailey wrote:
> I personally have serious doubts about knowledge extraction from human games, but I hope you have success. I think you can get more from computer games of strong players even though the level is weaker. Here is why I say that:
> 1. A strong computer still plays a lot of good moves, so the delta between human and computer games is not as high as you think.
> 2. There is a certain consistency in computer games that humans don't possess.
> 3. You have access to the internals, such as a score that quantifies moves.

My idea combines offline knowledge with MC methods, so there is already a lot of online, computer-generated knowledge. I don't pretend to make a dumb savant program that just plays the only move it finds in a joseki database. But if the program considers the joseki stored as (state, action[i-1]) -> action[i] records, it will immediately start searching a tree whose first move at all depths is the joseki sequence. If the joseki depends on a ladder, it will find that ladder much earlier, and if the ladder is not good it will hopefully play something else. At blitz time settings, playing the wrong joseki is bad but not terrible; that would be about the level of a human dan making a mistake in a blitz game, still better than current programs.

In this context, human knowledge produces a priori values for UCT. (The exact formula of the MC tree search is not yet decided.) There is enough computer-generated knowledge in the search. When the end of the game is still 200+ moves ahead, the search alone does not find enough difference between good and not-so-good moves.

The knowledge I think it makes sense to extract is:
1. (state, last move) -> (next moves list) records (joseki, or places to play when the last move is somewhere else).
2. Urgency of shapes.
3. Distributions of statistics (e.g. the proportion of times tenuki is played at move n) that help make playouts more humanlike while still fast.

I finally abandoned full-board Bradley-Terry scored random playouts.
Instead, I draw a random number to decide whether tenuki should be played; if so, I invent a random move and use it as if it were the previous move. Otherwise, I play a Bradley-Terry scored random move among the 40 neighbors of the last move. That is almost immediate in my hologram board system because I store the mask; in other implementations it may not be so cheap.

About the urgency of shapes: the urgency of a BT-adjusted 40-neighbors shape hits 40% on a 10M+ sample with about 160K patterns. (Overlearning should not be very important, given the lack of degrees of freedom.) It hits 66% within the first 5 moves:

Hits of move 1 = 0.402048  acc = 0.402048
Hits of move 2 = 0.117102  acc = 0.519151
Hits of move 3 = 0.065450  acc = 0.584600
Hits of move 4 = 0.045592  acc = 0.630193
Hits of move 5 = 0.035190  acc = 0.665382

That sounds like great news. But of course there is bad news: it doesn't go much further. To reach 80% it needs 14 moves:

Hits of move 14 = 0.006206  acc = 0.801095

And to reach 90% it needs 96 moves!

Hits of move 96 = 0.001323  acc = 0.900994

Shape is a factor that explains perhaps 2/3 of the moves (shape alone about 40%, but shape "in the right place" a little more than 66%). Not less, and _not more_, than that. It is naive to think that the 12th move in terms of shape is better than the 50th. But imagine there are 250 legal moves and the 248th is a really bad move: can shape make a difference? I don't know. If it's slow, surely not. If it is almost free in terms of performance, as in my board system, I hope it does.

Jacques.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
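Jacques' playout policy can be sketched roughly as follows. This is a hedged reconstruction, not his code: the board object, the `legal` test, the `bt_weight` Bradley-Terry hook, the tenuki probability, and the reading of "40 neighbors" as the 40 points within Manhattan distance 4 of the last move are all my assumptions.

```python
import random

# Sketch of the tenuki-or-local-BT playout policy described above.
# bt_weight(board, move) and legal(board, move) are hypothetical hooks
# standing in for the real pattern machinery and board representation.

def neighbors40(move, size=19):
    """The 40 points at Manhattan distance 1..4 from `move` (fewer near
    an edge) -- one plausible reading of the 40-neighbor mask."""
    x, y = move
    pts = []
    for dx in range(-4, 5):
        for dy in range(-4, 5):
            if 1 <= abs(dx) + abs(dy) <= 4:
                nx, ny = x + dx, y + dy
                if 0 <= nx < size and 0 <= ny < size:
                    pts.append((nx, ny))
    return pts

def pick_playout_move(board, last_move, bt_weight, legal,
                      size=19, tenuki_prob=0.1, rng=random):
    """Draw a random number to decide on tenuki; if tenuki, invent a
    random point and treat it as the previous move.  Otherwise pick a
    Bradley-Terry weighted move among the neighbors of the last move."""
    if rng.random() < tenuki_prob:
        last_move = (rng.randrange(size), rng.randrange(size))
    cands = [m for m in neighbors40(last_move, size) if legal(board, m)]
    if not cands:
        return None  # caller falls back to a uniform random legal move
    weights = [bt_weight(board, m) for m in cands]
    return rng.choices(cands, weights=weights, k=1)[0]
```

With uniform weights and tenuki disabled, the move is simply a random point in the 40-neighbor diamond of the last move; the BT weights are what bias it toward urgent shapes.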
Re: [computer-go] f(score) instead of sign(score)
> http://ewh.ieee.org/cmte/cis/mtsc/ieeecis/tutorial2007/Bruno_Bouzy_2007.pdf
>
> Page 89, "which kind of outcome". This method is better than the above
> and similar to what Jonas seems to propose. The improvement is minor.

Looking at their proposal (45 * win + score), in contrast to mine there is no scaling with 1/n; yet they get an improvement with something that is in fact of order 1/\sqrt{n}. But I think the latter does not play a role. Why? First hypothesis: misleading data. Second hypothesis: a better evaluation function, as if we had the "exact" expectation of the evaluation function under the MC runs.

Here is an interesting argument towards the following:

- Even with many simulations, it is likely that all these simulations differ by move 50. Suppose we had an oracle who, after 50 moves of the MC simulation, told us whether the game is a win or a loss under perfect play. Stopping the simulation there and taking this result would probably be a better evaluation function (discussion below). We could then wonder how best to approximate this evaluation function with what we do know, namely the score at the end of the MC runs. Taking the same value for a half-point victory and a 60-point victory is then ludicrous: MC is not perfect play, and we have an uncertainty depending on how far we are in the game and on the position. But a 60-point victory would probably mean that we had a victory after the first 50 moves of the simulation. Under this hypothesis, the evaluation of the node should be

  \int_{-score}^{\infty} Phi(x) dx,

  where Phi(x) is the probability that the MC score is off perfect play by (+x) points.

- Now, this Phi could be evaluated by looking at typical results of many simulations from the same position (assuming the mean or the median is perfection); alternatively, it could be evaluated in practice from the simulations in the node before.
- BUT this evaluation function would not be efficient: I have no access to any data, but I think the MC simulations must have a big variance. That would give too narrow a margin to victory, I guess.

- Why should this evaluation with an oracle be better? Well, it is the true minimax evaluation of the position from move 50, the same as counting at the end; what we have in between is noise. Making better playouts can result in worse play because they are more deterministic, so that the evaluation is less robust (it could always make the same mistake). But here it IS robust: we said perfect play, no mistakes. If we had that at the root of the tree, that would be God playing.

However, as I said, we arrive at a hardly tenable conclusion for the practical evaluation function. Why? Probably because the RIGHT evaluation function is not victory or defeat; it is: how likely is the program to win from this position (say, against itself)? Indeed, since we have random players, this even has a precise meaning, so non-integer values could make sense. Now, since programs with tree search play better than their MC simulations, we would have less variance in the outcome from this infamous move 50, and hence we would still have a convolution. We would then not use 0-1, but not anything too broad either. Something like (45 * win + score) is reasonable, around 0 in particular (it should saturate for high scores).

- In any case, this would also mean that the bonus for the win should be made higher in yose. Indeed, the nearer the end, the more accurate the MC scoring is (for one simulation), and hence the less we must convolve.

And this is not having a spoilt child; it is explicitly trying to imitate a better evaluation function, one that we cannot know directly.

Jonas
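The shape Jonas ends up with, a win bonus plus a score term that is roughly linear near 0 and saturates for big scores, can be sketched like this. The use of tanh for the saturation and the width constant are my assumptions; only the win bonus of 45 comes from the (45 * win + score) proposal in the thread.

```python
import math

def eval_playout(score, win_bonus=45.0, width=60.0):
    """Evaluate one MC playout result: a fixed bonus for winning plus a
    score term that is near-linear around 0 and saturates for big wins.
    win_bonus=45 echoes the (45 * win + score) proposal; the tanh
    saturation and width=60 are illustrative guesses, not from the thread."""
    win = 1.0 if score > 0 else 0.0
    return win_bonus * win + width * math.tanh(score / width)
```

A node's value would then be the average of eval_playout over its simulations, instead of the raw proportion of wins.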
Re: [computer-go] Re: computer-go Digest, Vol 43, Issue 8
On Mon, Feb 18, 2008 at 09:03:17PM +0100, Alain Baeckeroot wrote:
> On Monday 18 February 2008, Michael Williams wrote:
> > But as was pointed out before, these high levels of MoGo are probably still
> > not pro level, right?
>
> On 9x9, Big_slow_Mogo is near pro level, maybe more.
> 6 months ago or so it was able to regularly beat pros without komi on 9x9.

But no komi on 9x9 is quite a handicap. It would be quite interesting to see how well the high-level MoGos fare against high dans unhandicapped.

--
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it. -- J. W. von Goethe
Re: [computer-go] Re: Should 9x9 komi be 8.0 ?]
> delta_komi = 10^(K * (number_of_empty_points / 400 - 1)),
> where K is 1 if winning and 2 if losing. Also, if the expected
> winning rate is between 45% and 65%, komi is unmodified.

There's one thing I don't like at all there: you could get a positive evaluation when losing, and hence play conservatively when you should not. That could especially be true near the end, when the situation has drifted from bad to less bad. I think that if this happens you're doomed... The other way around would probably be painful too.

After all, the aim of tinkering with komi is to avoid the computer playing nonsensical moves, but it should still know whether it must fight or calm down.

Here is a suggestion: give a new komi at each move (not a drift, though it would probably be a good idea to *upper-bound the komi change*, maybe by ten times the delta_komi you use now). For choosing it, punish being away from true komi proportionally to |komi_aya - true_komi|; call lambda the proportionality constant. Punish the winning rate through (winning_rate - .5)^2. The total loss function is then:

loss = lambda * |komi_aya - true_komi| + (winning_rate - .5)^2

Suppose the change in winning rate is proportional to the change in komi, that is, take as hypothesis:

winning_rate(komi_aya) = winning_rate(true_komi) + C * (komi_aya - true_komi)

REMARK: C is negative!

Take the change in komi that minimizes the loss (a real number), and truncate it to get an integer (truncation also means you have a tendency to keep delta_komi near 0).
If you knew beforehand the winning rates for each komi, this would ensure you generally have a winning rate on the right side of .5, while it would not move komi if the winning rate is already somewhat near .5.

Measuring komi_aya as an offset from true komi, solving the minimization yields:

new_komi_aya = truncation[((.5 - old_winning_rate) + C * old_komi_aya - lambda / (2 * C)) / C]^{+}

if the estimated winning rate with true komi is more than .5, and

new_komi_aya = truncation[((.5 - old_winning_rate) + C * old_komi_aya + lambda / (2 * C)) / C]^{-}

(note the sign change) if the estimated winning rate with true komi is less than .5. The estimated winning rate with true komi is given by winning_rate - C * old_komi_aya.

You could take

lambda = K / (1 - number_of_empty_points / 400)

if you want to be nearer true komi during yose. Less human-like, but less error-prone! Now, that is probably not necessary, since the constant C is probably much bigger when entering yose.

The problem with this formula is finding a good C, since this constant changes during the game. Here is an idea using data from the game: use the last time the komi was changed and take

C = (winning_rate_after_last_change - winning_rate_before_last_change) / (komi_aya_after_last_change - komi_aya_before_last_change)

[* f(number_empty_points_last_change, number_empty_points_now) if you want to refine].

Now, this estimate might be unstable, especially since it cannot tell whether it was the evolution of the situation that made the winning rate change, or the change in komi. I'm sure you can find ways to stabilize it. The easiest I can think of would be a formula like:

new_C = .3 * C_formula_above + .7 * old_C

END OF THE DELIRIUM

Jonas
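Jonas's komi-adjustment rule can be sketched as follows, assuming komi values are integer offsets from true komi. The function names, constants, and the explicit closed form (obtained by minimizing the loss under the linear winning-rate hypothesis) are illustrative, not taken from any real program.

```python
# Hedged sketch of the loss-minimizing komi adjustment described above.
# komi values are integer offsets from true komi; C (< 0) is the assumed
# slope of winning rate per point of komi; lam is the penalty weight for
# straying from true komi.

def new_komi(old_komi, old_rate, C, lam):
    """Minimize lam*|komi| + (rate(komi) - .5)**2 under the linear
    hypothesis rate(k) = old_rate + C*(k - old_komi), then truncate
    toward zero to get an integer komi offset."""
    assert C < 0
    rate_at_true = old_rate - C * old_komi  # estimated rate at true komi
    if rate_at_true > 0.5:
        k = ((0.5 - old_rate) + C * old_komi - lam / (2 * C)) / C
        k = max(k, 0.0)   # the ^{+} truncation: stay on the positive side
    else:
        k = ((0.5 - old_rate) + C * old_komi + lam / (2 * C)) / C
        k = min(k, 0.0)   # the ^{-} truncation
    return int(k)         # truncation biases delta_komi toward 0

def update_C(old_C, rate_after, rate_before, komi_after, komi_before):
    """Exponentially smoothed slope estimate from the last komi change,
    as suggested at the end (new_C = .3 * observed + .7 * old)."""
    if komi_after == komi_before:
        return old_C
    C_obs = (rate_after - rate_before) / (komi_after - komi_before)
    return 0.3 * C_obs + 0.7 * old_C
```

For example, winning comfortably (rate 0.8) with C = -0.02 and lam = 0.001 gives an offset of +13 points, pulling the winning rate back toward .5, while a rate already at .5 leaves the komi untouched.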
Re: [EMAIL PROTECTED]: Re: [computer-go] Should 9x9 komi be 8.0 ?]
On Thu, Feb 28, 2008 at 01:01:08PM -0500, Don Dailey wrote:
> It's naive to think some simplistic deception imposed upon the program
> is going to correct the error when you don't even know if the program is
> erring in the first place. How can you say, "the program thinks it
> is losing, but it MUST be in error and I'm going to change the komi so
> that it will 'play better'"? You must have some basis for knowing the
> program is in error before you can lower its goal.

I think there is a misunderstanding here; this was (I suppose) never meant to correct an error in the program's perception of the board. (From watching MoGo and such, it often overestimates its board position, but rarely underestimates it. :-)

The point here is to prevent the program from playing the "MC-hamete" moves that in most cases have no hope of working, and instead have it still aim at a close game and wait for some opponent's yose mistake. This closely matches the human approach to the game as well: if you are confident in yose and see you are only a little behind, you don't start overplaying hopelessly, but instead still play the best you can and hope for an overturn in yose.

--
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it. -- J. W. von Goethe
Re: [computer-go] Re: Should 9x9 komi be 8.0 ?]
[EMAIL PROTECTED]: <[EMAIL PROTECTED]>:
>> delta_komi = 10^(K * (number_of_empty_points / 400 - 1)),
>> where K is 1 if winning and 2 if losing. Also, if the expected
>> winning rate is between 45% and 65%, komi is unmodified.
>
> There's one thing I don't like at all there: you could get a positive
> evaluation when losing, and hence play conservatively when you should
> not. That could especially be true near the end, when the situation has
> drifted from bad to less bad. I think that if this happens you're
> doomed... The other way around would probably be painful too.

From my observation, MC chooses good moves if and only if the winning rate is near 50%. Once it is losing, it plays bad moves. Surely the adjusted komi is an illusion, but it helps to prevent them.

> After all, the aim of tinkering with komi is to avoid the computer
> playing nonsensical moves, but it should still know whether it must
> fight or calm down.

Agreed. So it's important _when_ to adjust komi or apply my method. My objective is to keep the winning rate around 50%, which yields good moves.

> Here is a suggestion: give a new komi at each move (not a drift, though
> it would probably be a good idea to *upper-bound the komi change*,
> maybe by ten times the delta_komi you use now).
> For choosing it, punish being away from true komi proportionally to
> |komi_aya - true_komi|; call lambda the proportionality constant.
> Punish the winning rate through (winning_rate - .5)^2.
> The total loss function is then:
> loss = lambda * |komi_aya - true_komi| + (winning_rate - .5)^2
>
> Suppose the change in winning rate is proportional to the change in
> komi, that is, take as hypothesis:
> winning_rate(komi_aya) = winning_rate(true_komi) + C * (komi_aya - true_komi)
>
> REMARK: C is negative!
>
> Take the change in komi that minimizes the loss (a real number), and
> truncate it to get an integer (truncation also means you have a
> tendency to keep delta_komi near 0).
> If you knew beforehand the winning rates for each komi, this would
> ensure you generally have a winning rate on the right side of .5, while
> it would not move komi if the winning rate is already somewhat near .5.
>
> Solving the minimization yields:
>
> new_komi_aya = truncation[((.5 - old_winning_rate) +
> C * old_komi_aya - lambda / (2 * C)) / C]^{+}
>
> if the estimated winning rate with true komi is more than .5, and
>
> new_komi_aya = truncation[((.5 - old_winning_rate) +
> C * old_komi_aya + lambda / (2 * C)) / C]^{-}
>
> if the estimated winning rate with true komi is less than .5.
>
> The estimated winning rate with true komi is given by
> winning_rate - C * old_komi_aya.
>
> You could take
> lambda = K / (1 - number_of_empty_points / 400)
> if you want to be nearer true komi during yose. Less human-like, but
> less error-prone!
> Now, that is probably not necessary, since the constant C is probably
> much bigger when entering yose.
>
> The problem with this formula is finding a good C, since this constant
> changes during the game.
> Here is an idea using data from the game: use the last time the komi
> was changed and take:
> C =
> (winning_rate_after_last_change - winning_rate_before_last_change) /
> (komi_aya_after_last_change - komi_aya_before_last_change)
> [* f(number_empty_points_last_change, number_empty_points_now) if you
> want to refine]
>
> Now, this estimate might be unstable, especially since it cannot tell
> whether it was the evolution of the situation that made the winning
> rate change, or the change in komi. I'm sure you can find ways to
> stabilize it. The easiest I can think of would be a formula like:
> new_C = .3 * C_formula_above + .7 * old_C
>
> END OF THE DELIRIUM

:) Thanks. I will try it later, as I'm now rewriting my code radically, which will take quite a while.

One question: where does _aya_ come from, or what does it stand for? If my guess is correct, you are confusing Hiroshi, the author of Aya, with me, Hideki, the author of GGMC :).
I'm sorry if I'm wrong.
-Hideki
--
[EMAIL PROTECTED] (Kato)