Re: Efficient sparse vector representation...

Andreas Kostler Wed, 25 May 2011 21:19:40 -0700

On 26/05/2011, at 1:12 PM, Ken Wesson wrote:

> On Wed, May 25, 2011 at 10:22 PM, Andreas Kostler
> <andreas.koestler.le...@gmail.com> wrote:
>> Yes and no. I need efficient two way lookup.
>> 
>> So, I need to do something like
>> For every key in {12 "a" 23 "aa" 234 "aaaa"}
>> Get the token in a map that is organised like this:
>> {token1 id1 token2 id2 token2 id3}
>> 
>> Naively, I could have two maps with the reverse lookups.
>> l1 = {token1 id1 token2 id2 token3 id3}
>> reverse = {id1 token1 id2 token2 ...}
>> 
>> Given that there will be many, many tokens this seems a bit too much 
>> overhead.
>> 
>> You could basically do a binarySearch on a sorted hash map (sorted by vals 
>> (e.g. ids)) but I thought
>> doing the search on a vec of vectors would be more efficient and the vec of 
>> vecs representation would
>> be more compact.
>> 
>> I guess that's what I'm trying to figure out.
>> 
>> Andreas
>> 
>> P.S. Lookup by token is more frequent than lookup by id, hence the bin 
>> search on id.
> 
> If the token and id sets are disjoint (no single object can ever
> appear as each -- only one of those at most) then you can use a single
> map with a function put defined as:
> 
> (defn put [m k v]
>  (assoc (assoc m k v) v k))
> 
> which adds mappings in both directions. Then just use (m k) or (m v)
> to perform lookups in either direction. The overhead may be less than
> with two separate hashmaps, and the code is certainly simpler.
> 
Hi Ken,
That indeed could be useful. What are your thoughts on the proposed solution of 
representing
sparse vectors as maps (id-> val)? Having a vec of vector or a map with ids and 
vals vectors
{:ids [vec of ids] :vals [vec of val]} is just a lot easier to deal with for 
calculations. E.g normalisation


;; For a sparse vector as map
(defn normalise-m [m]
        (let [vals (vals m)
             keys (keys m)
             l (Math/sqrt (reduce + (map #(* % %) vals)))]
             (zipmap keys (map #(/ % l) vals))))

;; For a map of id/val vecs
(defn normalise [sv]
        (let [l (Math/sqrt (reduce + (map #(* % %) (:vals sv))))]
             {:i (:i sv) :vals (map #(/ % l) (:vals sv))}))

The latter outperforms the former by more than a factor of 10. 

Has anyone used sparse vectors or think of a reasonable implementation?




> -- 
> Protege: What is this seething mass of parentheses?!
> Master: Your father's Lisp REPL. This is the language of a true
> hacker. Not as clumsy or random as C++; a language for a more
> civilized age.
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Efficient sparse vector representation...

Reply via email to