Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Andy
I made a pull request to Leela, and put some data in there. It shows that the details of how Q is initialized are actually important: https://github.com/gcp/leela-zero/pull/238

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Álvaro Begué
You are asking about the selection of the move that goes to a leaf. When the node before the move was expanded (in a previous playout), the value of Q(s,a) for that move was initialized to 0. The UCB-style formula they use in the tree part of the playout is such that the first few visits will foll
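A minimal sketch of the selection rule being discussed, assuming the usual PUCT form Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)), with Q taken as 0 for unvisited edges as the appendix's initialization implies. The dict layout and the value of `c_puct` are illustrative assumptions, not the paper's exact implementation:

```python
import math

def puct_select(edges, c_puct=1.5):
    """Pick the edge maximizing Q + U under an AGZ-style selection rule.

    Each edge is a dict with visit count N, total value W, and prior P.
    Q is W/N, taken as 0 for unvisited edges (the zero initialization
    described in the paper's appendix).
    """
    total_n = sum(e["N"] for e in edges)
    best, best_score = None, -float("inf")
    for e in edges:
        q = e["W"] / e["N"] if e["N"] > 0 else 0.0  # Q(s,a) = 0 when N(s,a) = 0
        u = c_puct * e["P"] * math.sqrt(total_n) / (1 + e["N"])
        if q + u > best_score:
            best, best_score = e, q + u
    return best
```

With Q = 0 at unvisited edges, the U term dominates early, so the first few visits follow the network's prior P, which is the behavior described above.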

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Andy
Álvaro, you are quoting from "Expand and evaluate (Figure 2b)". But my question is about the section before that "Select (Figure 2a)". So the node has not been expanded+initialized. As Brian Lee mentioned, his MuGo uses the parent's value, which assumes without further information the value should
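The two initializations under discussion can be contrasted in a small sketch. This is an illustrative helper, not code from MuGo or AGZ; "parent" models MuGo's reported choice of inheriting the parent's value, "zero" models the AGZ appendix:

```python
def q_for_selection(child_n, child_w, parent_q, init="parent"):
    """Q used during selection for an edge with child_n visits.

    'zero' follows the AGZ appendix (Q = 0 for unvisited edges);
    'parent' follows MuGo's choice of assuming, absent other
    information, that the child is worth what its parent is worth.
    """
    if child_n > 0:
        return child_w / child_n  # the edge has its own estimate
    return parent_q if init == "parent" else 0.0
```

The practical difference: with "zero", an unvisited move at a winning node looks artificially bad next to visited siblings; with "parent", it looks exactly as good as the position it comes from.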

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Álvaro Begué
n't see the AGZ paper explain what the mean action-value Q(s,a) should >> be for a node that hasn't been expanded yet. The equation for Q(s,a) has >> the term 1/N(s,a) in it because it's supposed to average over N(s,a) >> visits. Bu

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Brian Lee
> Does anyone know how this is supposed to work? Or is it another detail AGZ didn't spell out?

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Álvaro Begué
The text in the appendix has the answer, in a paragraph titled "Expand and evaluate (Fig. 2b)": "[...] The leaf node is expanded and each edge (s_t, a) is initialized to {N(s_t, a) = 0, W(s_t, a) = 0, Q(s_t, a) = 0, P(s_t, a) = p_a}; [...]" On Sun, Dec 3, 2017 at 11:27 AM, Andy wrote: >
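The quoted initialization can be written out directly. A sketch assuming edges are stored as a per-action dict keyed by move (the storage layout is an assumption; only the initial values come from the paper):

```python
def expand_leaf(node, priors):
    """Initialize edges as the AGZ appendix states:
    N = 0, W = 0, Q = 0, and P = p_a for each action a,
    where priors maps each legal action to its network prior p_a."""
    node["edges"] = {
        a: {"N": 0, "W": 0.0, "Q": 0.0, "P": p}
        for a, p in priors.items()
    }
    return node
```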

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Rémi Coulom
They have a Q(s,a) term in their node-selection formula, but they don't tell what value they give to an action that has not yet been visited. Maybe Aja can tell us.

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Andy
Figure 2a shows two bolded Q+U max values. The second one is going to a leaf that doesn't exist yet, i.e. not expanded yet. Where do they get that Q value from? The associated text doesn't clarify the situation: "Figure 2: Monte-Carlo tree search in AlphaGo Zero. a Each simulation traverses the tr

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Álvaro Begué
I am not sure where in the paper you think they use Q(s,a) for a node s that hasn't been expanded yet. Q(s,a) is a property of an edge of the graph. At a leaf they only use the `value' output of the neural network. If this doesn't match your understanding of the paper, please point to the specific

[Computer-go] action-value Q for unexpanded nodes

2017-12-03 Thread Andy
I don't see the AGZ paper explain what the mean action-value Q(s,a) should be for a node that hasn't been expanded yet. The equation for Q(s,a) has the term 1/N(s,a) in it because it's supposed to average over N(s,a) visits. But in this case N(s,a)=0 so that won't work. Does anyone know how this i
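The backup step makes the problem concrete: Q(s,a) = W(s,a) / N(s,a) is only defined once the edge has been visited. A minimal sketch of the update (the dict layout is an assumption; the W/N averaging is from the paper):

```python
def backup(edge, value):
    """One backup: W accumulates leaf evaluations, Q = W/N is their mean.

    For N = 0 the mean W/N is undefined, which is exactly the question
    raised here; the appendix sidesteps it by setting Q = 0 at expansion,
    before any visit, rather than ever dividing by zero."""
    edge["N"] += 1
    edge["W"] += value
    edge["Q"] = edge["W"] / edge["N"]
    return edge
```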

Re: [Computer-go] Significance of resignation in AGZ

2017-12-03 Thread Brian Sheppard via Computer-go
I have been interested in a different approach, and it had some elements in common with AGZ, so AGZ gave me the confidence to try it.

[Computer-go] What happens if you only feed the current board position to AGZ?

2017-12-03 Thread Imran Hendley
AlphaGo Zero's Neural Network takes a 19x19x17 input representing the current and 15 previous board positions, and the side to play. What if you were to only give it the current board position and side to play, and you handled all illegal ko moves only in the tree? So obviously the network cannot d
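The reduced input proposed in this thought experiment could be built as follows. This is a hypothetical sketch, not anything from the paper: three planes (own stones, opponent stones, side to play) instead of AGZ's 17, with `board` holding +1 for black, -1 for white, 0 for empty:

```python
import numpy as np

def single_position_input(board, to_play, size=19):
    """Reduced 19x19x3 input: the side to play's stones, the opponent's
    stones, and a constant plane encoding side to play. All ko history
    is deliberately omitted and left to the search tree, as proposed."""
    planes = np.zeros((size, size, 3), dtype=np.float32)
    planes[..., 0] = (board == to_play)   # stones of the side to play
    planes[..., 1] = (board == -to_play)  # opponent stones
    planes[..., 2] = 1.0 if to_play == 1 else 0.0  # side-to-play plane
    return planes
```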

Re: [Computer-go] Significance of resignation in AGZ

2017-12-03 Thread Chaz G.
Hi Brian, Thanks for sharing your genuinely interesting result. One question though: why would you train on a non-"zero" program? Do you think your program, as a result of your rules, would perform better than zero, or is imitating the best-known algorithm inconvenient for your purposes? Best,