Can you elaborate on the relation between the 'ELO curve' and the
improved evaluation functions? Is it that if you took this original
'brute force' program and ran it on current hardware, it would gain 250
rating points per ply, and the improved evaluation functions allow
modern programs to do even better than this?
Yes,  that's the basic idea.   Richard Lang, the programmer of chess genius,  once told me that it was amazing how much one of his programs improved over the years.   He was talking about a program he had not touched in many years. 

A lot of studies were done on the relationship between ELO strength and processing power.   For each doubling there appears to be a fixed increase in playing strength.   However, there has been a gradual decline - but it's very gradual.   Common sense tells you that there MUST be a gradual decline since the improvements cannot continue indefinitely.  At some point you will reach perfect play.   

In computer chess, programs are now clearly above the strength of the top players.   This is quite remarkable because just a couple of decades ago it was predicted that even master level chess by computers (several hundred ELO below the best players) would not happen in your lifetime.   

One way that people denied this,  was to claim that the increase only applied to computer/computer play.   You might gain 200 ELO against other computers, but against humans it was "almost nothing" is how this went.   Even though the ELO formula is an approximation of reality and makes some simplifying assumptions about how playing strength works (for instance it does not consider in-transitivities) it's not THAT broken.  Such nonsense implies that you can be hundreds of rating points stronger against one opponent than another.   Chess program are not that brittle - they play very human-like and when the are better they do everything better.   The final proof is that computers are clearly better than humans.  A large number of doubling's over the years translated to an enormous improvement in playing strength.   Humans no longer laugh in derision at the weak computer moves.  Now they cower in fear.  

There was at least one study that suggested that better evaluation functions didn't just lead to better play, but that it had a synergistic effect on the power of hardware - that a doubling is worth more in a "smart" program.  This study explored the ELO curve with 2 different version of the same program,  one with a dumbed down evaluation function and the other full strength.   However, I thought the study was very weak because there were not enough games played to draw significant conclusions.   As far as I know there has not been a more complete study to see if this is true.  

Over the years of course chess programs have improved, not just due to hardware advances.   The quality of the evaluation functions are very important - you cannot have a competitive chess program without a reasonably good evaluation function.  

One of the big advances in computer chess is the re-introduction of selectivity.   The very first chess programs were all about selectivity, but screwed it up pretty badly.    Then someone discovered that "brute force" search wasn't that bad after all!   What was called "brute force" back then was still somewhat selective,  a program still did several selective ply of capture search and check extensions were discovered very quickly.   If a move gave check a program was granted a free ply of lookahead for the rest of the sub-tree.    But still, these programs were called "brute force" because there was no specific pruning mechanism involved.

Strangely enough, when null move pruning came along, the term "brute force" started to become a dirty word.   You could insult a programmer by calling his program a brute force searcher.    The first null move programs were called "selective searchers"  but over a period of time the terminology changed and these same programs once again started to be called "brute force" searchers.   

The most selective program of all, for a while was a program called Fritz.  Fritz was known for it's blinding speed.  It reported enormous nodes per second and it's programmer was quite the engineer - a Lukasz Lew optimizer type.   So this program was not only incredibly selective, but it was blazing fast.   The primary method to achieve a lot of speed in chess is by the use of piece square tables - each piece has a fixed vaule depending on what square it's on - and this can be table driven.  This table can be pre-processed between moves but remains static during the search.   Fritz had this and the programmer often joked that the bottleneck in his program was those expensive memory references (referring to the piece square tables that made his program so fast.)    His program was so fast that you could almost believe this was true.   

The incredible thing about the Fritz story is that this program came to represent what "Brute Force" was all about - even though it was clearly more selective than most of the other programs - at least in it's early days.    It got this reputation because it was FAST, not because it was not selective.  Many people even found this offensive, because they felt it was not the right approach.   Even though Fritz was the highest rated program much of the time,  it took a lot of verbal abuse, probably because of it's status it was targeted.   One of the main complaints was that it played terrible chess but somehow always got away with it.   I was personally convinced that if the programmer (Franz) had understated the node count and level people would have believed it was a "smart" program, not a brute.   He could have subtracted 2 or 3 ply on the display and divided the node counts by 5 and people would have considered Fritz a triumph of AI.  At least that is my theory - I would lo
ve to have seen this conducted as an experiment in human psychology.   

Many programs get a reputation for being either "fast" or "smart" and the merits of both approaches were hotly debated, perhaps they still are.   In my opinion these discussions where very foolish, since ALL the best programs were both fast and smart.  Even though each program was different,  every good program had a fast search, excellent move ordering and good robust evaluation functions.   Some were a little smarter, some were a little faster.  The arguments were like a room-full of warthogs arguing about who is most beautiful.   

In recent years, programs have become even more selective.  Moves are pruned like crazy and not just based on the null move heuristic.  Many refinement have improved chess programs over the years and it's completely ridiculous to call any program today a "brute force" searcher.    I also think calling the computer chess approach the "brute force" approach shows your ignorance.   

Another myth that has been broken is that humans play close to perfect chess.  Not everyone believed this, but quite a few people did.  What is interesting now is that it's not interesting to play the top humans unless you give some kind of odds.   In GO terms, it would be like saying it's not interesting having the best computer play the top master unless the computer gives a couple of stones to the human player!

But even more incredible to me is that there does not seem to be a limit any time soon.   The ELO curve has not fallen off much so there is clearly a few hundred ELO to go.  This makes me think that whatever human GO experts estimate as the distance between them and perfect play - it's probably off by a few stones.  Of course the nature of GO may be such that humans actually play closer to perfect play than chess players do.   I don't know much about that.

- Don




Matthew Woodcraft wrote:
Don Dailey wrote:
  
Back then the ELO curve was not understood. The basic formula was
eventually found that 1 ply added 250 ELO rating points to a program.
    

[...]

  
At some point, some incredibly foolish programmers at Northwestern
University built a program and decided that it was too hard to do
selectivity. They kept noticing that when they threw out moves, they
would accidentally throw out good moves because a lot of terrible
looking moves are actually good moves. So in their ignorance they
build a program that didn't throw out moves and decided to live with
the consequences - a program that couldn't look very far ahead and
thus would be totally incapable of "long term planning." [...]
    

  
The big surprise was that this program dominated computer chess for a
period of time - until everyone got wise to their foolishness and
started copying them. Thus was born the age of "brute force" chess.
    

[...]

  
The best programs in the world HAD to have good evaluation functions
(which deflates another myth - that chess programs are nothing more
than a very simplistic evaluation function thrown together with a few
terms and that all the effort is in the search.) But the best programs
had to have very good searches too.
    


Can you elaborate on the relation between the 'ELO curve' and the
improved evaluation functions? Is it that if you took this original
'brute force' program and ran it on current hardware, it would gain 250
rating points per ply, and the improved evaluation functions allow
modern programs to do even better than this?

-M-

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

  
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to