Hello everyone,
For my master's thesis, I have built an AI that takes a strategic approach
to the game. It doesn't play itself; it simply describes the strategy behind
every possible move in a given position ("enclosing this group", "making life
for this group", "saving these stones", etc.). My main idea is that, once it
is combined with a playing AI, I will be able to generate comments on a
position (and eventually teach people). So, for my final experiment, I'm
trying to build a playing AI. I don't need it to be highly competitive, just
decent (1d or so), so I settled on a policy network, a value network and a
simple MCTS. The MCTS works fine and the policy network learns quickly and is
accurate, but the value network never seems to learn, not even slightly.
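
To make the setup concrete, here is a minimal sketch of the kind of value
network I mean (not my actual code; the layer sizes, feature-plane count and
names are placeholder assumptions): board feature planes in, a single
win-probability out, trained with binary cross-entropy against the final
game result.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

BOARD_SIZE = 19
N_PLANES = 8   # e.g. stone colours, liberties, side to move (placeholder count)

def build_value_net():
    x_in = keras.Input(shape=(BOARD_SIZE, BOARD_SIZE, N_PLANES))
    x = x_in
    for _ in range(6):  # small convolutional tower; depth is arbitrary here
        x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    win_prob = layers.Dense(1, activation="sigmoid")(x)  # P(side to move wins)
    model = keras.Model(x_in, win_prob)
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# planes:  (N, 19, 19, N_PLANES) float32 feature planes
# results: (N, 1) in {0, 1}, 1 if the side to move eventually won the game
# model = build_value_net()
# model.fit(planes, results, batch_size=128, epochs=10, validation_split=0.05)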
During my research I've trained a lot of different networks, first on 9x9 and
then on 19x19, and as far as I remember every net I've worked with learned
quickly (especially during the first batches), except the value net, which
has always been problematic (it diverges easily, learns slowly, ...). I have
been stuck on the 19x19 value network for a couple of months now. I've tried
countless inputs (feature planes) and lots of different models, even using
the exact same code as others. Yet, whatever I try, the loss doesn't move an
inch and accuracy stays at 50%, even after days of training. I've tried
changing the learning rate (both increasing and decreasing it); that doesn't
change anything. However, if I feed a trivial value as the target output (for
example, "black always wins"), the network has no trouble learning it.
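
That sanity check looks roughly like the following (again just a sketch,
reusing build_value_net from above; the loader name is hypothetical):

import numpy as np

# planes, results = load_training_set()   # hypothetical data loader
fake_results = np.ones_like(results)      # constant label: one side always wins
model = build_value_net()
model.fit(planes, fake_results, batch_size=128, epochs=1)  # loss drops almost immediately

# The same call with the real results leaves loss and accuracy at chance level:
# model.fit(planes, results, batch_size=128, epochs=1)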
It is all the more frustrating that training any other kind of network
(next-move prediction, territory estimation, ...) goes smoothly and quickly.
Has anyone experienced a similar problem with value networks, or does anyone
have an idea of what might be causing this?
Thank you