Hi!
I have an idea how to improve RAVE, but it is still rough.
So I want to describe it here in the hope that it leads to some
interesting discussion. I hope my not-so-good English
allows me to describe the idea adequately.

If you don't want to read all the details, you can first scroll down
to "Use cases" to see when this improvement may be useful.


--- Current RAVE statistics implementation ---

Let's say we want to accumulate RAVE statistics, and
we have four arrays for this: black_rave_total[], black_rave_wins[],
white_rave_total[], white_rave_wins[].

We increment black_rave_total[intersection] += 1
if, during the last simulation, black was the first to play on this
intersection.

We increment black_rave_wins[intersection] += 1
if, during the last simulation, black was the first to play on this
intersection and the simulation result is "black wins".
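
The update rule above could be sketched roughly as follows. This is
only an illustration, not code from any real engine: the function
name update_black_rave and its two parameters are made up here, and
intersections are plain strings such as "B4".

```python
from collections import defaultdict

# Per-intersection RAVE counters for black (white would be symmetric).
black_rave_total = defaultdict(int)
black_rave_wins = defaultdict(int)

def update_black_rave(black_first_moves, black_won):
    """Update black's RAVE counters after one simulation.

    black_first_moves: intersections where black was the first to play
                       during the simulation (hypothetical input format).
    black_won:         True if the simulation result is "black wins".
    """
    for pos in black_first_moves:
        black_rave_total[pos] += 1
        if black_won:
            black_rave_wins[pos] += 1
```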

--- New proposal ---

What if we add two arrays: black_rave_win_move_sum[]
and white_rave_win_move_sum[]?
These arrays accumulate the sum of the move numbers at which black
(or white, respectively) was the first to play on the intersection,
counting only the simulations which that color won.

Concrete example:

Let's say we have already done ten simulations for the current node.
In six simulations, black was the first to play on the B4 intersection.
Black won three of these simulations.

In the first of these three simulations, the move number at which black
played on B4 was 20 (this move number is counted from the
start of the random simulation, not from the start of the game).

In the second of the three simulations, the move number for B4 was 25.
In the third simulation where black played on B4 and won,
the move number was 72.

In this case, black_rave_win_move_sum for B4 will be
20 + 25 + 72 = 117

This number allows us to calculate the average move number
for B4 over the simulations that black won:
117 / 3 = 39.
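
A small sketch of how this sum might be accumulated, replaying the
three winning simulations of the B4 example. The helper name
record_black_win and its input format (a dict from intersection to
move number) are assumptions made only for this illustration.

```python
from collections import defaultdict

black_rave_wins = defaultdict(int)
black_rave_win_move_sum = defaultdict(int)

def record_black_win(first_move_numbers):
    """Record one simulation that black won.

    first_move_numbers: maps each intersection where black was the first
    to play to the move number of that play, counted from the start of
    the random simulation (hypothetical input format).
    """
    for pos, move_num in first_move_numbers.items():
        black_rave_wins[pos] += 1
        black_rave_win_move_sum[pos] += move_num

# The three winning simulations from the example above.
for move_num in (20, 25, 72):
    record_black_win({"B4": move_num})

print(black_rave_win_move_sum["B4"])                          # 117
print(black_rave_win_move_sum["B4"] / black_rave_wins["B4"])  # 39.0
```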

I denote this as black_rave_avg_win_move_num:
black_rave_avg_win_move_num[pos] =
black_rave_win_move_sum[pos] / black_rave_wins[pos]

In the current RAVE, we use the winrate to determine the "best" move:
black_rave_winrate[pos] = black_rave_wins[pos] / black_rave_total[pos]

I propose to use a "weighted winrate" instead:
black_weighted_rave_winrate[pos] =
black_rave_winrate[pos] / black_rave_avg_win_move_num[pos]

In the current example, the winrate for B4 is 3/6 = 0.5, and
the weighted winrate is (3/6) / (117/3) = 0.0128205.
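
The same calculation as a short sketch. The function name
weighted_rave_winrate is made up for this illustration. Returning 0.0
when there are no samples or no wins is my assumption (the formula
itself is undefined there), and move numbers are assumed to start
from 1, so the sum is positive whenever wins > 0.

```python
def weighted_rave_winrate(total, wins, win_move_sum):
    """RAVE winrate divided by the average move number in won simulations."""
    if total == 0 or wins == 0:
        # Assumption: treat moves with no data or no wins as 0.0.
        return 0.0
    winrate = wins / total
    avg_win_move_num = win_move_sum / wins
    return winrate / avg_win_move_num

# B4 from the example: 6 simulations, 3 wins, move numbers 20, 25, 72.
b4 = weighted_rave_winrate(total=6, wins=3, win_move_sum=20 + 25 + 72)
print(round(b4, 7))  # 0.0128205
```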

The weighted winrate will be bigger for successful moves which must be
played earlier during the simulation. Good endgame moves will have a
low weighted winrate.


--- Use cases ---

1) Let's say we have two moves with a good RAVE winrate: E5 and A4.
A4 has the bigger winrate, because A4 is inside safe territory, and each
successful simulation contains A4. E5 is critical, and must be played
very early for the result to be successful. Each simulation with E5 also
contains A4, but some simulations without E5 were also successful
because of dumb opponent play during the simulation.

So, A4 has the bigger RAVE winrate. But E5 has the bigger
weighted winrate, because A4 can be played at any time during
the simulation, while E5 must be played early or it will be useless.

By using the weighted RAVE winrate we can determine that
E5 is more important than A4, despite the fact that A4 has the bigger
RAVE winrate.

2) Let's say black must make three moves during the simulation
in order to win: B2, B3, C3, in exactly this order. Without
these moves black cannot win the simulation.

All of these moves have the same winrate, because the
simulation is successful for black only if all three moves
are played during the simulation.

So, if we use the simple RAVE winrate, we can have problems
determining the correct move order.

But B2 has a bigger weighted winrate than B3 and C3
(and B3 has a bigger weighted winrate than C3), because
in all successful simulations B2 is played before B3,
and hence the average move number for B2 is strictly less
than the average move number for B3 and C3. So, when using
the weighted winrate, we can determine the correct move order.
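
To illustrate use case 2, here is a toy sketch with made-up counters
(not results from real simulations). All three moves get the same
plain winrate, but different move-number sums, so sorting by the
weighted winrate recovers the order. The helper name weighted and
the (total, wins, win_move_sum) tuples are assumptions for this
example only.

```python
# Made-up counters: (total, wins, win_move_sum) for each move.
# Same total and wins, so the plain RAVE winrate is 0.5 for all three.
stats = {
    "C3": (10, 5, 45),  # average winning move number 9
    "B2": (10, 5, 15),  # average winning move number 3
    "B3": (10, 5, 30),  # average winning move number 6
}

def weighted(total, wins, win_move_sum):
    # winrate divided by the average move number in won simulations
    return (wins / total) / (win_move_sum / wins)

# Sort moves from highest to lowest weighted winrate.
order = sorted(stats, key=lambda pos: weighted(*stats[pos]), reverse=True)
print(order)  # ['B2', 'B3', 'C3']
```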


What do you think, am I missing something?
_______________________________________________
Computer-go mailing list
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
