Rémi,

The idea of a multi-dimensional rating model is interesting.  If you
decide to pursue this, I can give you the CGOS data in a compact format,
one line per result.

I thought of this idea too, but I didn't try to produce a model.  It
would be easier to build and test such a model, however, if you
generated the data artificially.  You could deliberately build a
rock/scissors/paper style into a dozen or more different players.
Randomly give them different strengths (using straightforward Elo
ratings), but also give each of them one of 3 playing styles (rock,
scissors, paper), so that their actual performance against a given
opponent is bumped up or down by 100 Elo or so depending on whether they
have conflicting styles.  So a "paper" might still beat a "scissors", but
it would be more difficult than their base Elo ratings would suggest.

Then you can play hundreds of thousands of simulated games in just
seconds, generate data, and see if your model can predict the results
reliably.
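
To make the idea concrete, here is a rough sketch of such a generator in
Python.  Everything specific in it is my own assumption for illustration:
the names, the base-rating distribution (mean 1500, spread 200), the usual
logistic Elo formula with a 400-point scale, and a flat 100-point bonus for
the favoured style.  It is not how CGOS computes anything.

```python
import random

STYLES = ("rock", "paper", "scissors")
# BEATS[x] is the style that x has an edge against.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}
STYLE_BONUS = 100  # assumed Elo bump for a favourable style match-up


def style_adjustment(style_a, style_b):
    """Elo points added to A's rating when A plays B."""
    if BEATS[style_a] == style_b:
        return STYLE_BONUS
    if BEATS[style_b] == style_a:
        return -STYLE_BONUS
    return 0  # same style: no adjustment


def win_probability(a, b):
    """Standard logistic Elo expectation, with the style bump applied."""
    diff = a["elo"] - b["elo"] + style_adjustment(a["style"], b["style"])
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def make_players(n, rng):
    """Random base ratings (assumed mean 1500, spread 200) and styles."""
    return [
        {"name": f"p{i}", "elo": rng.gauss(1500, 200), "style": rng.choice(STYLES)}
        for i in range(n)
    ]


def simulate(players, n_games, rng):
    """Return one (first, second, winner) tuple of names per game."""
    results = []
    for _ in range(n_games):
        a, b = rng.sample(players, 2)
        winner = a if rng.random() < win_probability(a, b) else b
        results.append((a["name"], b["name"], winner["name"]))
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    players = make_players(12, rng)
    games = simulate(players, 200_000, rng)
    print(len(games), "games simulated; first result:", games[0])
```

Since the true ratings and styles are known, a candidate model can be
fitted to the results and checked against what was put in.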

Another approach I thought of is to take a very simple game (such as
tic-tac-toe) and create many players that play by simple rules but where
significant intransitivities might exist.  You would not want the rules
to be deterministic, or the games would all play out the same way, but
the rules could be probabilistic.

It would be remarkable if you could capture strength characteristics
with just 2 or 3 numbers instead of one.  I would guess that 2 numbers
might be far more accurate than 1, but with quickly diminishing returns
for additional parameters.  Of course, it might require a huge amount
of data to "zero in" on a player's characteristics statistically.

- Don




Rémi Coulom wrote:
> David Fotland wrote:
>> The styles of CS (CS-9-17-10k-1CPU), MFGO (mfgo12exp-15), and GNUGO
>> (gnugo3.7.10_10) are different, and it's generating some odd results.
>>
>> Many Faces beats GnuGo 70%.  There are not many games, but this is
>> consistent with over 100 test games I've run.
>> CS beats GnuGo 55%.  Over 100 games played.
>> CS beats Many Faces 90%.  Only 20 games, but consistent with earlier
>> results.
>>
>> If we look at results against GnuGo, Many Faces seems stronger than
>> CS, but in games against CS, Many Faces is much weaker.
>>
>> Many Faces plays a fighting style, and CS plays a territorial style,
>> but I'm still surprised at the difference.
>>
>> David
>>
>> _______________________________________________
>> computer-go mailing list
>> computer-go@computer-go.org
>> http://www.computer-go.org/mailman/listinfo/computer-go/
>>   
>
> I noticed that too. My feeling is that this is because MF is a classical
> program with a global search, GNU a classical program with no global
> search, and Crazy Stone a MC program. MF beats GNU thanks to global
> search. But MF's strength without the global search (whatever that
> would mean) is inferior to that of GNU. CS also has a global search,
> so MF's global-search advantage does not work against CS.
>
> I guess that KCC Igo had the same problem as MF against Crazy Stone.
>
> I thought about a model for multi-dimensional Elo ratings once (don't
> give only one value to each player, but two or three, with an
> appropriate formula for predicting game outcome). Maybe I'll try it on
> CGOS data when I have time. This would not rate players along a
> one-dimensional line. Here is a reference to a similar idea:
>
> http://dx.doi.org/10.1016/j.jspi.2004.05.008
>
>
>      Abstract
>
> The Bradley–Terry model is widely and often beneficially used to rank
> objects from paired comparisons. The underlying assumption that makes
> ranking possible is the existence of a latent linear scale of merit or
> equivalently of a kind of transitiveness of the preference. However,
> in some situations such as sensory comparisons of products, this
> assumption can be unrealistic. In these contexts, although the
> Bradley–Terry model appears to be significantly interesting, the
> linear ranking does not make sense. Our aim is to propose a
> 2-dimensional extension of the Bradley–Terry model that accounts for
> interactions between the compared objects. From a methodological point
> of view, this proposition can be seen as a multidimensional scaling
> approach in the context of a logistic model for binomial data. Maximum
> likelihood is investigated and asymptotic properties are derived in
> order to construct confidence ellipses on the diagram of the
> 2-dimensional scores. It is shown by an illustrative example based on
> real sensory data on how to use the 2-dimensional model to inspect the
> lack-of-fit of the Bradley–Terry model.
>
> Rémi
