Re: Efficient sparse vector representation...

Andreas Kostler Thu, 26 May 2011 05:20:15 -0700

This is what I've come up with so far...it should give a bit of an idea of what 
I'm using it for:


(defprotocol PSparseVec
  (norm [this])
  (cnt [this])
  (ith-val [this i])
  (unit-vec [this])
  (assc [this i o]))

(defrecord SparseVec [indices vals]
  PSparseVec
  (cnt [this] (count indices))
  (norm [this] (Math/sqrt (reduce + (map #(* % %) vals))))
  (unit-vec [this] (let [l-inv (/ 1 (norm this))] (SparseVec. indices (vec (map 
#(* l-inv %) vals)))))
  (assc [this i o]
    (let [idx (bin-search indices i)
          val (get vals idx nil)]
      (if val (SparseVec. indices (assoc vals idx o))
          (let [new-indices (sort (conj indices i))
                new-idx (bin-search new-indices i)
                [chunk1 chunk2] (split-at new-idx vals)]
            (SparseVec. new-indices (into (conj (vec chunk1) o) chunk2))))))
  (ith-val [this i]
    (get vals (bin-search indices i) 0)))

Can you guys think of ways of making this more idiomatic and/or performant?

Cheers
Andreas

On 26/05/2011, at 1:12 PM, Ken Wesson wrote:

> On Wed, May 25, 2011 at 10:22 PM, Andreas Kostler
> <andreas.koestler.le...@gmail.com> wrote:
>> Yes and no. I need efficient two way lookup.
>> 
>> So, I need to do something like
>> For every key in {12 "a" 23 "aa" 234 "aaaa"}
>> Get the token in a map that is organised like this:
>> {token1 id1 token2 id2 token2 id3}
>> 
>> Naively, I could have two maps with the reverse lookups.
>> l1 = {token1 id1 token2 id2 token3 id3}
>> reverse = {id1 token1 id2 token2 ...}
>> 
>> Given that there will be many, many tokens this seems a bit too much 
>> overhead.
>> 
>> You could basically do a binarySearch on a sorted hash map (sorted by vals 
>> (e.g. ids)) but I thought
>> doing the search on a vec of vectors would be more efficient and the vec of 
>> vecs representation would
>> be more compact.
>> 
>> I guess that's what I'm trying to figure out.
>> 
>> Andreas
>> 
>> P.S. Lookup by token is more frequent than lookup by id, hence the bin 
>> search on id.
> 
> If the token and id sets are disjoint (no single object can ever
> appear as each -- only one of those at most) then you can use a single
> map with a function put defined as:
> 
> (defn put [m k v]
>  (assoc (assoc m k v) v k))
> 
> which adds mappings in both directions. Then just use (m k) or (m v)
> to perform lookups in either direction. The overhead may be less than
> with two separate hashmaps, and the code is certainly simpler.
> 
> -- 
> Protege: What is this seething mass of parentheses?!
> Master: Your father's Lisp REPL. This is the language of a true
> hacker. Not as clumsy or random as C++; a language for a more
> civilized age.
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Efficient sparse vector representation...

Reply via email to