I don't see where the AGZ paper explains what the mean action-value Q(s,a)
should be for a node that hasn't been expanded yet. The equation for Q(s,a)
contains the factor 1/N(s,a) because it averages over the N(s,a) visits.
For an unexpanded node, though, N(s,a)=0, so the average is undefined.
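
To make the problem concrete, here is a minimal Python sketch. It assumes
Q(s,a) = W(s,a)/N(s,a), where W(s,a) is the summed value over visits. The
names EdgeStats, visit_count, total_value and unvisited_default are made up
for illustration, and returning a fixed default when N(s,a)=0 is just one
possible convention, not something I can point to in the paper.

```python
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Search statistics for one (s, a) edge. All names are hypothetical."""

    visit_count: int = 0       # N(s,a)
    total_value: float = 0.0   # W(s,a), sum of backed-up values

    def update(self, value: float) -> None:
        """Record one backed-up value for this edge."""
        self.visit_count += 1
        self.total_value += value

    def mean_value(self, unvisited_default: float = 0.0) -> float:
        """Return Q(s,a) = W(s,a) / N(s,a).

        With N(s,a) == 0 the division is undefined, so fall back to
        unvisited_default. That fallback is an assumption of this sketch.
        """
        if self.visit_count == 0:
            return unvisited_default
        return self.total_value / self.visit_count


edge = EdgeStats()
q_before = edge.mean_value()   # no visits yet: falls back to the default
edge.update(1.0)
edge.update(0.0)
q_after = edge.mean_value()    # average of the two backed-up values
```

Whatever value gets plugged in for the unvisited case (0, the parent's
value, a pessimistic constant, ...) changes how eagerly the search tries
new moves, which is why I'd like to know what AGZ actually does.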

Does anyone know how this is supposed to work? Or is it another detail AGZ
didn't spell out?
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
