> But those video games have a very simple optimal policy. Consider Super
> Mario: if you see an enemy, step on it; if you see a hole, jump over it;
> if you see a pipe sticking up, also jump over it; etc.

A bit like go? If you see an unsettled group, make it live. If you have a
ko, play a ko threat. If you have two one-eyed groups near each other, join
them together. :-)

Okay, those could be considered higher-level concepts, but I still thought
it was impressive to learn to play arcade games with no hints at all. A
couple of toy sketches below try to make both points concrete; the second
one sits after the quoted thread.
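To make the if-then flavour concrete, here is a minimal sketch in Python.
Everything in it (the Observation fields, the rule order, the action
strings) is invented purely for illustration; it only shows the shape of
such a hand-coded policy, not anything from a real Mario or go engine.

# Minimal sketch of a hand-coded "simple optimal policy" of the kind
# described above. All names and rules here are hypothetical.
from dataclasses import dataclass

@dataclass
class Observation:
    enemy_ahead: bool = False   # invented sensor flags for the example
    hole_ahead: bool = False
    pipe_ahead: bool = False

def simple_policy(obs: Observation) -> str:
    """Map what is on screen straight to an action with fixed rules."""
    if obs.enemy_ahead:
        return "stomp"      # if you see an enemy, step on it
    if obs.hole_ahead or obs.pipe_ahead:
        return "jump"       # if you see a hole or a pipe, jump over it
    return "run_right"      # otherwise keep going

print(simple_policy(Observation(hole_ahead=True)))   # -> jump

The go heuristics above have the same if-then shape; the catch is that each
predicate ("is this group unsettled?") is itself a hard problem.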
Darren

> On Sat, Feb 25, 2017 at 12:36 AM, Darren Cook <dar...@dcook.org> wrote:
>
>> ...if it is hard to have "the good starting point" such as a trained
>> policy from human expert game records, what is a way to devise one.
>
> My first thought was to look at the DeepMind research on learning to
> play video games (which I think either pre-dates the AlphaGo research,
> or was done in parallel with it): https://deepmind.com/research/dqn/
>
> It just learns from trial and error, no expert game records:
>
> http://www.theverge.com/2016/6/9/11893002/google-ai-deepmind-atari-montezumas-revenge
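To show what "learns from trial and error, no expert game records" looks
like in miniature, here is a toy tabular Q-learning loop in Python. The
corridor game, reward values, and hyperparameters are all invented for the
example, and it is of course far simpler than the deep network in the DQN
work linked above; the point is only that a policy emerges from reward
feedback, with no expert data anywhere.

import random

N = 6                    # corridor cells 0..5; cell 5 is the goal
ACTIONS = (-1, +1)       # step left or step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def greedy(s):
    """Best known action in state s, breaking ties at random."""
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for episode in range(500):
    s = 0
    for step in range(100):              # cap episode length
        # epsilon-greedy: mostly exploit, sometimes explore
        a = random.choice(ACTIONS) if random.random() < EPS else greedy(s)
        s2 = min(max(s + a, 0), N - 1)   # environment transition
        r = 1.0 if s2 == N - 1 else 0.0  # reward only at the goal
        # Q-learning update from the observed transition alone
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
        if s == N - 1:
            break

# The learned greedy policy should now walk right from every cell:
print([greedy(s) for s in range(N - 1)])   # expected: [1, 1, 1, 1, 1]

DQN replaces the Q table with a convolutional network over raw screen
pixels (plus experience replay and a few other tricks), but the update has
the same shape.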