On 22/08/12 15:16, nicolas.o...@gmail.com wrote:
You should replace the function that computes the boards with one that returns the same board 30 times, and the evaluation function with something that returns a constant value. Then check the speed and the speed-up from folding. That way you will know for sure whether the slowness comes from the explore/search or from the evaluation and computation of moves.
Alternatively, use VisualVM (it comes with the JDK) or any other profiler to check where the cost is.
OK, so I followed your advice and replaced the scoring-fn with (rand-int 10) and the next-level-fn with (repeat 30 (Move->Board. blah blah...)), and I have some interesting results....
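For concreteness, the dummy stand-ins look roughly like this (a minimal sketch only - the Move->Board record below has made-up fields and is just a placeholder for the real one in my game namespace):

(defrecord Move->Board [move board]) ;stand-in with made-up fields

(defn dummy-scoring-fn
  "Leaf evaluation at practically zero cost."
  [_board]
  (rand-int 10))

(defn dummy-next-level
  "Returns the same 30 children every time, so only the tree-walking cost remains."
  [board _dir]
  (repeat 30 (->Move->Board :placeholder-move board)))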
First of all, experimentation showed that the best partitioning size is 1, unless the search only goes to level 2, in which case partitioning with 2 seems slightly better... anyway, I'm not interested in only going to level 2, so it doesn't matter.
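Just to show where that partitioning size goes - this is only a rough illustration, not my actual fold, and it assumes the boards sit in a vector: r/fold's first argument n is the chunk size below which it stops splitting the work, so n = 1 means one element per forked task.

(require '[clojure.core.reducers :as r])

(defn score-all
  "Illustrative only: folds a vector of boards with partition size n."
  [n boards scoring-fn]
  (r/fold n + (fn [acc b] (+ acc (scoring-fn b))) boards))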
As you reported 4-5 emails back, after some optimisations (mainly 'definline' and using reducers in my core ns as well) I managed to get to level 4 in roughly 8 sec, with the dummy fns of course! Now that I've got my baseline, the real experimentation starts...
Looking at the 2 fns that are obviously the culprits, it is pretty clear that the one producing the next boards is the more expensive of the two, so let's leave it for last. For the moment let's deal with the one that scores the leaves (the scoring-fn).
--with the dummy scoring-fn (rand-int 10): 8-9 sec
--with the scoring-fn that counts the pieces and subtracts: 63-64 sec
--with the scoring-fn that counts their relative value and subtracts: 83-84 sec
The good thing about the dummy scoring-fn is that at the end I can verify that it brought back the move with :value 9, so that is good news. However, no matter how much I tried to tune this, it seems that just counting the pieces is 7 times more expensive than generating random ints! In addition, accessing the :value key of the pieces (they are records) and subtracting their sums is an extra 20% more expensive on top of that! These are the best times I can report - mind you, I started with 127 and 168 sec respectively...
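For clarity, the two 'real' scoring fns are along these lines (a sketch from memory, not verbatim - it assumes the board is a vector of piece records and nils, with the :direction and :value keys used elsewhere in this thread):

(defn score-by-count
  "My piece count minus the opponent's."
  [board dir]
  (- (count (filter #(and % (= dir (:direction %))) board))
     (count (filter #(and % (not= dir (:direction %))) board))))

(defn score-by-value
  "Sum of my pieces' relative :value minus the opponent's."
  [board dir]
  (reduce (fn [acc p]
            (cond
              (nil? p) acc
              (= dir (:direction p)) (+ acc (:value p))
              :else (- acc (:value p))))
          0 board))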
Now that we've established how much cheaper it is to generate random ints, let's move on to the serious bit. For the next experiment the scoring-fn is locked down (rand-ints) but I'm using the real next-level fn, again after making some optimisations...
--with the dummy board generation (repeat blah blah), as we saw above, it takes 8-9 sec.
--with the real board generation it takes forever! I can't even measure how long because I can't wait that long.
Trying with the 2 real fns makes no sense at this point... it is pretty clear that 'next-level' is the culprit with regard to performance. So here it is:
------------------------------------------------------------------------------------------------------------------------------------------------------------------
(defn next-level [b dir]
  (r/map #(Move->Board. % (core/try-move %))
         (core/team-moves @curr-game b dir))) ;;curr-game is a promise

(definline team-moves
  [game b dir]
  `(let [team# (gather-team ~b ~dir)
         tmvs# (r/mapcat (fn [p#] (r/map #(dest->Move ~game p# %)
                                         (getMoves p#)))
                         team#)]
     (into [] tmvs#)))

(definline gather-team
  "Returns all the pieces with same direction dir on this board b."
  [b dir]
  `(into [] (r/filter #(= ~dir (:direction %)) ~b))) ;all the team-mates (with same direction)

(definline dest->Move
  "Helper fn for creating moves."
  [dm p dest]
  `(Move. ~p (partial move ~dm) ~dest))

(defn move
  "The function responsible for moving Pieces. Each piece knows how to
  move itself. Returns the resulting board without making any state changes."
  [game-map p coords]
  ;;{:pre [(satisfies? Piece p)]} ;safety comes first
  ;;(if (some #{coords} (:mappings game-map)) ;check that the position exists on the grid
  (let [newPiece (update-position p coords) ;the new piece as a result of moving
        old-pos  (getListPosition p)
        new-pos  (getListPosition newPiece)] ;;piece is a record
    (-> @(:board-atom game-map) ;deref the appropriate board atom
        (transient)
        (assoc! old-pos nil)
        (assoc! new-pos newPiece)
        (persistent!)
        #_(populate-board))) ;replace dead-pieces with nils
  #_(throw (IllegalStateException. (str coords " is NOT a valid position according to the mappings provided!"))))

(defn collides?
  "Returns true if the move from [sx sy] to [ex ey] collides with any friendly pieces.
  The move will be walked step by step by the walker fn."
  [[sx sy] [ex ey] walker b m dir]
  (loop [[imm-x imm-y] (if (nil? walker) [ex ey] (walker [sx sy]))] ;if walker is nil make one big step to the end (for the knight)
    (cond
      (= [ex ey] [imm-x imm-y]) ;if reached destination
      (if (not= dir (:direction (b (translate-position ex ey m)))) false true)
      (not (nil? (get b (translate-position imm-x imm-y m)))) true
      :else (recur (walker [imm-x imm-y])))))
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
The only thing not shown here (unless I messed up) is getMoves, which basically goes into a special namespace where the core.logic code lives. There is a separate fn for each piece that finds its available moves. However, the potential moves are calculated by the core.logic engine on an empty board: the rules contain only the logical restrictions of chess - they don't interfere with the actual game being played at any given time. For this reason each move has to be 'walked' manually to check whether it collides with any pieces on the actual board we're playing now. This is what 'collides?' does, and so I'm removing every move that succeeds in colliding...
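In other words, roughly something like the following - only a sketch; the shape of the raw destinations and the way I pass the piece's current position here are assumptions for illustration, not the real code:

(defn legal-destinations
  "Hypothetical pruning step: drop every raw core.logic destination whose
  path collides with a friendly piece on the current board."
  [piece-pos raw-dests walker b m dir]
  (remove #(collides? piece-pos % walker b m dir) raw-dests))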
Now, can you see any place in these fns that 'next-level' depends on that can be optimised any further? I don't think it is reasonable for it to take that long... calling (next-level (start-chess!) -1) once, where start-chess! returns the starting board, takes just over 150 ms, and this is only going to be called 4 times until level 4! It does not justify that much delay...
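For what it's worth, a single call can be timed along these lines (pouring the reducer into a vector first, since r/map does no work until it is reduced):

(time (into [] (next-level (start-chess!) -1))) ;realise the children, then report elapsed time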
I'm really confused, and VisualVM confused me even more... my app reaches 29,000 objects at its peak! The memory profiler says my memory is dominated by core.logic.Substitutions objects (close to 34%)... Thread peak is 12, if I remember correctly (quad-core CPU).
Have I hit the limit? I don't want to think that core.logic is to blame... after all, I went through rough times in order to encode the rules in core.logic, and I thought I would get performance benefits as well (apart from clarity)...
Jim