Re: Style, Efficiency, and updating nested structure

Michael Gardner Wed, 11 Nov 2015 15:13:56 -0800

- Worrying about the performance of a small, pure function like this is almost 
certainly premature optimization.


- Avoid concurrency constructs like atoms if you don't need them.

- Have you considered using group-by?

> On Nov 11, 2015, at 13:25, Dave Tenny <dave.te...@gmail.com> wrote:
> 
> A colleague and I are debating various things clojure as we were exploring 
> alternative ways to solve a problem.
> 
> Here's the description of the problem that a particular function is trying to 
> solve, and the first implementation of it.
> 
> (defn update-roles 
>   "Given a vector of maps of the form {:project_id N :project_name S 
> :project_roles [...roles...]}
>   if there is already a map for the indicated project id/name, add new-role 
> to it and returned
>   a copy the updated input vector, otherwise return a vector with a new map 
> entry for the newly found
>   project and initial role.  This function is basically aggregating tuples 
> from the database."
>   [projects project-id project-name new-role]
>   (let [updated? (atom nil)
> 
>         projects* (mapv (fn [m] 
>                           (if (= (:project_id m) project-id)
>                             (do (reset! updated? true)
>                                 (assoc m :project_roles (conj (:project_roles 
> m) new-role)))
>                             m))
>                         projects)]
>     (if @updated?
>       projects*
>       (conj projects {:project_id project-id :project_name project-name 
> :project_roles [new-role]}))))
> 
> 
> ;; (update-roles [{:project_id 1 :project_name "One" :project_roles [:own]}] 
> 2 "Two" :edit)
> ;; => [{:project_id 1, :project_name "One", :project_roles [:own]} 
> {:project_id 2, :project_name "Two", :project_roles [:edit]}]
> ;; (update-roles [{:project_id 1 :project_name "One" :project_roles [:own]}] 
> 1 "Two" :edit)
> ;; => [{:project_id 1, :project_name "One", :project_roles [:own :edit]}]
> 
> 
> 
> Now here's another implementation:
> 
> (defn update-or-insert-project-role
>   [prj-roles prj-role]
>   (let [to-insert-prj-id (:project_id prj-role)
>         by-pid           (group-by :project_id prj-roles)]
>     (case (get by-pid to-insert-prj-id)
>       nil (conj prj-roles prj-role)
>       (->> (update-in by-pid [to-insert-prj-id 0 :project_roles] #(apply conj 
> % (:project_roles prj-role)))
>            (mapcat second)
>            (into [])))))
> 
> ;; (def prj-roles [{:project_id 1, :project_name "One", :project_roles 
> [:own]} {:project_id 3 :project_name "Three" :project_roles [:edit]}])
> ;; (update-or-insert-project-role prj-roles {:project_id 2 :project_name 
> "Two" :project_roles [:edit]})
> ;; => [{:project_id 1, :project_name "One", :project_roles [:own]} 
> {:project_id 3, :project_name "Three", :project_roles [:edit]} {:project_id 
> 2, :project_name "Two", :project_roles [:edit]}]
> ;; (update-or-insert-project-role prj-roles {:project_id 1 :project_name 
> "One" :project_roles [:edit]})
> ;; => [{:project_id 1, :project_name "One", :project_roles [:own :edit]} 
> {:project_id 3, :project_name "Three", :project_roles [:edit]}]
> 
> 
> 
> The function is called in a loop to aggregate rows from a database, though it 
> isn't an overriding concern, we're not going millions of records in this case.
> 
> The first thing my colleague and I disagreed on was the calling sequence, 
> arguing over which is more readable.
> The second thing was whether efficiency in this context is really important, 
> or whether it's all moot in clojure.  
> 
> Finally, I'm sure there's a better way, probably with Zippers or something, 
> but neither of us have used them. Suggestions for the stylistic and 
> operational epitome of clojure expression on this routine are welcome!
> 
> Superficially, and probably incorrect in multiple ways, here is a poor 
> attempt at breaking down efficiency in terms of search/traversal and memory 
> allocations.  This was done by someone with no knowledge of clojure internals 
> (including the library implementations of the called functions).
> 
> ;; Comparing the two routines per function call, for existing project case 
> (i.e. where roles are updated)
> ;;
> ;; Assuming each routine allocates new vector for new-role placement in 
> existing project
> ;; and MapEntry for assoc of new vector on project_roles, so they aren't 
> included in allocations 
> ;; below since both routines have to do it.
> ;;
> ;; Note that x-element map allocates storage for map and map-entries or 
> clojure equivalent.
> ;; (and more expensive than an x-element vector, of course).
> ;;
> ;; n == length of input project list.
> ;; m == average length of input project list role vectors.
> ;; 
> ;; Object Allocations
> ;;   Function call:
> ;;     update-roles: 
> ;;       1 atom
> ;;       1 O(n) vector for mapv 
> ;;     update-or-insert-project-role: 
> ;;       1 3-entry map + 1 single-element vector for prj-role argument input.
> ;;       1 n-element map for group-by
> ;;       n vectors for group-by map values
> ;;       1 n-element map for update-in
> ;;       1 list/sequence for mapcat (+ n concat intermediaries?)
> ;;       1 vector for into
> ;;
> ;; If we discard the second 'into' and first 'mapv' allocations the 
> update-or-insert-project-role routine allocates
> ;; 3 additional maps (two of which are O(n)), n additional vectors, and 1 
> additional list/sequence.
> ;;   
> ;; Searches/traversals/copies
> ;;  update-roles: 
> ;;       O(n) - mapv
> ;;  update-or-insert-project-role: 
> ;;       O(n) - group-by
> ;;       O(n) - update-in
> ;;       O(n) - mapcat
> ;;       O(n) - into
> ;;
> ;; Here's what update-or-insert-project-role allocates (by way of assistance 
> in assessing the above)
> ;;{1 [{:project_id 1, :project_name "One", :project_roles [:own]}], 3 
> [{:project_id 3, :project_name "Three", :project_roles [:edit]}]}             
> -- group-by
> ;;{1 [{:project_id 1, :project_name "One", :project_roles [:own :edit]}], 3 
> [{:project_id 3, :project_name "Three", :project_roles [:edit]}]}       -- 
> update-in
> ;;({:project_id 1, :project_name "One", :project_roles [:own :edit]} 
> {:project_id 3, :project_name "Three", :project_roles [:edit]})               
>  -- mapcat
> ;;[{:project_id 1, :project_name "One", :project_roles [:own :edit]} 
> {:project_id 3, :project_name "Three", :project_roles [:edit]}]               
>  -- into
> 
> Thanks for your thoughts and suggestions. 
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> --- 
> You received this message because you are subscribed to the Google Groups 
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Style, Efficiency, and updating nested structure

Reply via email to