Thanks for the detailed explanation of the paper. Would it make sense to vary the number of moves generated by the classifier as you run more playouts? Have you tried this? It seems like the classifier would return garbage initially and slowly give better moves deeper down the sequence, analogous to descending the tree in MCTS.
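(For concreteness, the idea of growing the classifier's candidate list with playout count is basically progressive widening. A toy schedule might look like the sketch below; the function name and all constants are invented for illustration, not taken from the paper.)

```python
import math

def num_candidates(visits, k0=2, growth=1.4):
    """Hypothetical progressive-widening schedule: start with k0 moves
    from the classifier and unlock one more each time the node's visit
    count crosses another power of `growth`. All constants are made up."""
    return k0 + int(math.log(visits + 1, growth))
```

With a schedule like this, early playouts only see the classifier's top couple of moves, and deeper/hotter nodes gradually widen, which matches the "better moves deeper down the sequence" intuition.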
You mentioned that adding more than two previous moves as (linearly independent) input terms does worse. What happens when you start combining moves into a single feature? Have you tried just one feature with a 1 at each of the two previous move locations? Or a 1 and a c < 1? Or what about using the combination as a third term, i.e. y[i] = w1[i]*m1 + w2[i]*m2 + w12[i]*m12 + b[i]?

In the paper you say you only consider local moves, which is natural because your input vectors represent the last two moves, and we already know those are very important for predicting local moves. What steps could we take to learn from other features of the game? One way to add patterns to the classifier might be to have input vectors for 3x3 patterns: instead of a 1 at each stone location of the 3x3 pattern you could have some small value, and zero elsewhere. The output for a given square would then look like y[i] = w1[i]*m1 + w2[i]*m2 + w3[i]*p[i]. Or maybe you don't even need the m1 and m2 terms for non-local moves. You could add other types of features too (atari, capture, extend, etc.) by putting small values in the input vectors. And this is where offline learning from game records could come in handy, e.g. for initializing the p[i]'s.
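(To make the "one feature with a 1 and a c < 1" variant concrete, here is a minimal sketch. The weight matrix, its size, the discount c, and the random initialization are all placeholders I made up, not anything from the paper.)

```python
import numpy as np

N = 81   # flattened 9x9 board; size is illustrative
c = 0.5  # hypothetical discount for the older move ("a 1 and a c < 1")

rng = np.random.default_rng(0)
W = rng.normal(size=(N, N))  # W[i, j]: weight from input feature j to output point i
b = rng.normal(size=N)       # per-point bias b[i]

def predict(m1, m2):
    """Score every point with the last two moves folded into ONE feature vector."""
    x = np.zeros(N)
    x[m1] = 1.0    # most recent move gets full weight
    x[m2] = c      # older move gets the discounted value c < 1
    y = W @ x + b  # y[i] = W[i, m1] + c * W[i, m2] + b[i]
    return int(np.argmax(y))   # proposed move
```

The point of folding both moves into one input vector is that the classifier keeps a single weight matrix, and the 3x3-pattern or atari/capture/extend features could be added the same way: more small nonzero entries in x rather than new linearly independent terms.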
_______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
