On Dec 12, 2007 3:31 PM, Jason House <[EMAIL PROTECTED]> wrote:
> On Dec 12, 2007 3:09 PM, Álvaro Begué <[EMAIL PROTECTED]> wrote:
> > On Dec 12, 2007 3:05 PM, Jason House <[EMAIL PROTECTED]> wrote:
> > > On Dec 12, 2007 2:59 PM, Rémi Coulom <[EMAIL PROTECTED]> wrote:
> > > > > Do you mean a plot of the prediction rate with only the
> > > > > gamma of interest varying?
> > > >
> > > > Not the prediction rate, but the probability of the training data.
> > > > More precisely, the logarithm of that probability.
> > >
> > > I still don't know what you mean by this.
> >
> > He probably should use the word "likelihood" instead of "probability".
> > http://en.wikipedia.org/wiki/Likelihood_function
>
> Clearly I'm missing something, because I still don't understand. Let's
> take a simple example of a move that is on the 3rd line and has a gamma
> value of 1.75. What is the equation or sequence of discrete values that
> I can take the derivative of?
We start with a database of games, and we are trying to find a set of gamma values. For a given set of gamma values, we can compute the probability of all the moves happening exactly as they happened in the database. So if the first move is E4 and our model gave E4 a probability of 0.005, we start with that, then we take the next move and multiply 0.005 by the probability of the second move, etc. By the end of the database, we'll have some number like 3.523E-9308, which is the probability of all of the moves in the database happening. This is the probability of the database if it had been generated by a random process following the probability distributions modeled by the set of gamma values.

You can see this as a function of the gamma values. This function is usually called the "likelihood function". In order to pick the best gammas, we choose the ones with the maximum likelihood. Sometimes we use the logarithm of the likelihood instead, which has the interpretation of being "minus the amount of information in the database", plus it's not a number with a gazillion zeros after the decimal point.

Now, around the point where the maximum likelihood happens, you can try to move one of the gammas and see how much it hurts the likelihood. For some features it will hurt a lot, which means that the value has to be very close to the one you computed or you'll get a bad model. For some features it will hurt very little, which means that there are other settings of the value that are more or less equivalent. The second derivative of the likelihood (or of the log of the likelihood; I don't think it should matter much) will tell you how narrow a peak you are sitting on.

Does that make some sense?
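To make that concrete, here is a minimal Python sketch. The toy data, the feature names such as "3rd_line", and the gamma values are invented for illustration, and it assumes a Bradley-Terry-style model where a move's strength is the product of the gammas of the features it matches and its probability is its strength divided by the sum of the strengths of all candidate moves; it is a sketch of the idea, not any program's actual code.

import math

# Hypothetical toy "database": for each position, the legal candidate moves
# (each described by the features it matches) and the index of the move
# that was actually played. All names here are made up for illustration.
positions = [
    {"candidates": [["3rd_line", "atari"], ["3rd_line"], ["center"]], "played": 1},
    {"candidates": [["center"], ["3rd_line"]], "played": 0},
]

def log_likelihood(gammas, positions):
    """Log of the probability of every played move in the database.
    Assumes a Bradley-Terry-style model: a move's strength is the product
    of the gammas of its features, and its probability is its strength
    divided by the sum of the strengths of all candidate moves."""
    total = 0.0
    for pos in positions:
        strengths = [math.prod(gammas[f] for f in feats)
                     for feats in pos["candidates"]]
        p_played = strengths[pos["played"]] / sum(strengths)
        total += math.log(p_played)  # summing logs == log of the product
    return total

gammas = {"3rd_line": 1.75, "atari": 8.0, "center": 1.2}

# The "sequence of discrete values you can take the derivative of":
# vary one gamma around its value and watch the log-likelihood change.
L0 = log_likelihood(gammas, positions)
h = 0.01
for delta in (-h, 0.0, +h):
    g = dict(gammas, **{"3rd_line": gammas["3rd_line"] + delta})
    print(delta, log_likelihood(g, positions))

# Finite-difference estimate of the second derivative of the log-likelihood
# with respect to the "3rd_line" gamma.
g_plus = dict(gammas, **{"3rd_line": gammas["3rd_line"] + h})
g_minus = dict(gammas, **{"3rd_line": gammas["3rd_line"] - h})
d2 = (log_likelihood(g_plus, positions) - 2 * L0
      + log_likelihood(g_minus, positions)) / h**2
print("second derivative estimate:", d2)

Printing the log-likelihood at gamma - h, gamma, and gamma + h gives exactly the kind of discrete sequence Jason asked about. Near the maximum-likelihood fit the second-derivative estimate will be negative, and the larger its magnitude, the narrower the peak, i.e. the more precisely the data pins down that particular gamma.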