Hi everyone. It occurs to me there might be a more efficient method to train the value network directly (without using the policy network).
You are welcome to check my method: http://withablink.com/GoValueFunction.pdf Let me know if there is any silly mistakes :)
_______________________________________________ Computer-go mailing list Computer-go@computer-go.org http://computer-go.org/mailman/listinfo/computer-go