I think I was a bit unclear here. By UCT I meant the combination of 
UCT+simulation. I am just curious why simulation-based methods of evaluation 
have so far been found to be superior (are they on 19x19?) to traditional bots, 
which use a different form of evaluation function for a given position. It 
would seem that simulation-based methods of arriving at a score for the 
current position (playing out to the end) might be quite inefficient, 
especially on 19x19, where it seems to me the game lasts about 400-450 moves 
on average. From this viewpoint, heavy playouts are an attempt to improve the 
evaluation function over the uniformly random "light" playout.
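To make the light/heavy distinction concrete, here is a minimal sketch of Monte Carlo evaluation by playouts. It assumes tic-tac-toe as a toy stand-in for Go purely to keep it short, and the "heavy" bias (take an immediate win if one exists) is a hypothetical stand-in for the pattern knowledge real heavy playouts use. The function names are illustrative only.

```python
import random

# Toy stand-in for Go (an assumption for brevity): 3x3 tic-tac-toe,
# board is a 9-character string of 'X', 'O' or '.'.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def _wins_now(board, i, player):
    """True if `player` playing on empty square i completes a line."""
    board[i] = player
    won = winner(board) == player
    board[i] = '.'
    return won


def playout(board, to_move, rng, heavy=False):
    """Play to the end of the game; return the result for 'X'
    (1.0 win, 0.5 draw, 0.0 loss).

    light (heavy=False): every legal move is equally likely.
    heavy (heavy=True): take an immediate win when one exists, otherwise
    fall back to a uniformly random move (hypothetical bias rule).
    """
    board = list(board)
    while True:
        w = winner(board)
        if w is not None:
            return 1.0 if w == 'X' else 0.0
        empty = [i for i, s in enumerate(board) if s == '.']
        if not empty:
            return 0.5
        winning = [i for i in empty if _wins_now(board, i, to_move)] if heavy else []
        board[rng.choice(winning or empty)] = to_move
        to_move = 'O' if to_move == 'X' else 'X'


def mc_evaluate(board, to_move, n=1000, heavy=False, seed=0):
    """Monte Carlo evaluation: the average result of n playouts."""
    rng = random.Random(seed)
    return sum(playout(board, to_move, rng, heavy) for _ in range(n)) / n
```

For example, with X to move in `"XX.OO...."`, the heavy version scores the position as a certain win for X, while the light version only measures X's win rate when both sides play at random, which is exactly the kind of noise heavy playouts try to remove.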

As for UCT itself, perhaps I still don't completely understand the algorithm, 
but is a full episode or simulation required to use it? Or is there a related 
family of algorithms which operate on trees like UCT does but grow them 
asymmetrically, given some confidence bounds on the values returned by a 
heuristic positional evaluation function?
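One way to frame this question: in a generic UCT sketch the tree policy (UCB1 selection plus one-node-per-iteration expansion) and the leaf evaluation are separate pieces, so the leaf evaluator could in principle be a full playout or a static heuristic returning a value in [0, 1]. The sketch below is not MoGo's code; it assumes a hypothetical toy take-away game (remove 1 or 2 stones, last stone wins) in place of Go, and `random_playout` and `heuristic_eval` are illustrative names.

```python
import math
import random

# Hypothetical toy game standing in for Go: players alternately remove 1 or 2
# stones and whoever takes the last stone wins. State = (stones, player_to_move).
def legal_moves(state):
    return [m for m in (1, 2) if m <= state[0]]

def play(state, move):
    return (state[0] - move, 1 - state[1])

def random_playout(state, rng):
    """Full simulation: 1.0 if the player to move at `state` wins, else 0.0."""
    start = state[1]
    while legal_moves(state):
        state = play(state, rng.choice(legal_moves(state)))
    # With no stones left, the player NOT to move took the last stone and won.
    return 1.0 if 1 - state[1] == start else 0.0

def heuristic_eval(state):
    """Static evaluator, no simulation: the side to move loses iff stones % 3 == 0."""
    return 0.0 if state[0] % 3 == 0 else 1.0

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.untried = legal_moves(state)
        self.visits = 0
        self.value = 0.0  # summed results for the player who moved INTO this node

    def ucb1_child(self, c):
        # UCB1: mean value plus an exploration bonus that shrinks with visits.
        return max(self.children, key=lambda ch: ch.value / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def uct_search(root_state, evaluate, iterations=1000, c=1.4):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while every move of the node has been tried.
        while not node.untried and node.children:
            node = node.ucb1_child(c)
        # 2. Expansion: add one new child, so the tree grows asymmetrically.
        if node.untried:
            move = node.untried.pop()
            child = Node(play(node.state, move), node, move)
            node.children.append(child)
            node = child
        # 3. Evaluation: any estimate in [0, 1] for the side to move at `node`;
        #    a full playout and a static evaluator both fit here.
        v = evaluate(node.state)
        # 4. Backup: flip perspective at each level on the way to the root.
        result = 1.0 - v
        while node is not None:
            node.visits += 1
            node.value += result
            result = 1.0 - result
            node = node.parent
    # Play the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move

if __name__ == "__main__":
    rng = random.Random(1)
    print(uct_search((4, 0), lambda s: random_playout(s, rng), 3000))
    print(uct_search((4, 0), heuristic_eval))
```

Here the only thing that changes between "UCT+simulation" and UCT over a heuristic evaluator is the `evaluate` argument; whether a noisy static evaluator works as well as playouts in Go is the open question, not something this toy settles.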

Perhaps my understanding of MoGo from the thesis is incorrect. From a certain 
standpoint it makes very limited use of heuristics and seems to rely solely on 
the following techniques published in the thesis:

1) UCT+simulation

2) Learned pattern weights via self-play using TD(lambda).

3) Proximity heuristics (I do not understand at a deep level why these 
improve play).

4) RAVE knowledge recycling between trees.

5) The dragon heuristic.
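For item 4), a rough sketch of the two pieces of RAVE as I understand them: all-moves-as-first (AMAF) statistics gathered from every simulation, and a schedule that shifts weight from the RAVE estimate to the node's own UCT estimate as real visits accumulate. The schedule beta = sqrt(k / (3n + k)) is the hand-tuned one from Gelly and Silver's 2007 work as I recall it; the function names and the default k=1000 are illustrative assumptions, and MoGo's actual code may differ.

```python
import math

def rave_beta(n, k):
    """Weight on the RAVE estimate after n real visits of a move.
    Schedule beta = sqrt(k / (3n + k)); k is the 'equivalence parameter',
    the visit count at which both estimates get weight 1/2."""
    return math.sqrt(k / (3 * n + k))

def blended_value(q_uct, n, q_rave, k=1000):
    """Mix the move's own UCT mean with its AMAF (RAVE) mean.
    k=1000 is an arbitrary illustrative default, not a known MoGo setting."""
    beta = rave_beta(n, k)
    return (1.0 - beta) * q_uct + beta * q_rave

def amaf_update(rave, moves, player, result):
    """Credit every move `player` made later in one simulation as if it had
    been played first from this node (first occurrence of each move only).

    rave:   dict move -> [visits, total_result], the node's RAVE statistics
    moves:  list of (player, move) pairs played after this node, tree and playout
    result: outcome of the simulation from `player`'s point of view
    """
    seen = set()
    for p, m in moves:
        if p == player and m not in seen:
            seen.add(m)
            stats = rave.setdefault(m, [0, 0.0])
            stats[0] += 1
            stats[1] += result
```

With no real visits (n=0) the node trusts the RAVE value entirely, and as n grows well past k the plain UCT mean takes over. This is one reason RAVE looks fairly domain-independent: it only assumes a move's value is roughly similar whenever in the game it is played.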

The first two can be viewed as online and offline learning, respectively. To 
me, items 3) and 5) incorporate some domain knowledge, and 4) perhaps some as 
well, but 4) seems to work in games which have combinatorial properties and is 
perhaps more broadly applicable.

Obviously there are probably a lot of unpublished details, such as the use of 
a simple ladder search (as you indicated your program Valkyria does) and 
perhaps many other enhancements which we do not know about. But from Sylvain's 
description it seems that the amount of domain knowledge is limited and that 
the statistical learning procedures dominate. Is this a misinterpretation or 
misrepresentation?

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
