Hello everyone,
For my master's thesis, I have built an AI that takes a strategic approach
to the game. It doesn't play itself; it simply describes the strategy behind
every possible move in a given position ("enclosing this group", "making life
for this group", "saving these stones", etc.). My main idea is that, once it
is combined with a playing AI, I will be able to generate comments on a
position (and eventually teach people). So, for my final experiment, I'm
trying to build a playing AI. I don't need it to be highly competitive, just
decent (1d or so), so I settled on a policy network, a value network and a
simple MCTS. The MCTS works fine and the policy network learns quickly and is
accurate, but the value network never seems to learn, not even slightly.
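
To make the setup concrete, here is a minimal sketch of the kind of value
network I mean (not my actual code; the layer sizes, feature-plane count and
names are placeholder assumptions): board feature planes in, a single
win-probability out, trained with binary cross-entropy against the final
game result.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

BOARD_SIZE = 19
N_PLANES = 8   # e.g. stone colours, liberties, side to move (placeholder count)

def build_value_net():
    x_in = keras.Input(shape=(BOARD_SIZE, BOARD_SIZE, N_PLANES))
    x = x_in
    for _ in range(6):  # small convolutional tower; depth is arbitrary here
        x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    win_prob = layers.Dense(1, activation="sigmoid")(x)  # P(side to move wins)
    model = keras.Model(x_in, win_prob)
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# planes:  (N, 19, 19, N_PLANES) float32 feature planes
# results: (N, 1) in {0, 1}, 1 if the side to move eventually won the game
# model = build_value_net()
# model.fit(planes, results, batch_size=128, epochs=10, validation_split=0.05)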
During my research I've trained a lot of different networks, first on 9x9 and
then on 19x19, and as far as I remember every net I've worked with learned
quickly (especially during the first batches), except the value net, which
has always been problematic (it diverges easily, learns slowly, ...). I have
been stuck on the 19x19 value network for a couple of months now. I've tried
countless inputs (feature planes) and lots of different models, even using
the exact same code as others. Yet, whatever I try, the loss doesn't move an
inch and accuracy stays at 50%, even after days of training. I've tried
changing the learning rate (both increasing and decreasing it); that doesn't
change anything. However, if I feed a trivial value as the target output (for
example, "black always wins"), the network has no trouble learning it.
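
That sanity check looks roughly like the following (again just a sketch,
reusing build_value_net from above; the loader name is hypothetical):

import numpy as np

# planes, results = load_training_set()   # hypothetical data loader
fake_results = np.ones_like(results)      # constant label: one side always wins
model = build_value_net()
model.fit(planes, fake_results, batch_size=128, epochs=1)  # loss drops almost immediately

# The same call with the real results leaves loss and accuracy at chance level:
# model.fit(planes, results, batch_size=128, epochs=1)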
It is all the more frustrating that training any other kind of network
(next-move prediction, territory estimation, ...) goes smoothly and quickly.
Has anyone experienced a similar problem with value networks, or does anyone
have an idea of what might be causing this?
Thank you