in other words, I can lose the 'doall' and do this:
=> (def ntt-pairs (ngrams* tt-pairs 2))
#'hotel_nlp.algorithms.viterbi/ntt-pairs
=> (first ntt-pairs)
(#hotel_nlp.algorithms.viterbi.TokenTagPair{:token "The", :tag "DET"}
 #hotel_nlp.algorithms.viterbi.TokenTagPair{:token "Fulton", :tag "NP"})
=> (second ntt-pairs)
(#hotel_nlp.algorithms.viterbi.TokenTagPair{:token "Fulton", :tag "NP"}
 #hotel_nlp.algorithms.viterbi.TokenTagPair{:token "County", :tag "N"})
So this proves it works as expected. The weird part is that realizing
the whole seq takes forever, whereas with strings it finishes quickly!
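
One thing that could be checked here, offered as a guess rather than a diagnosis: ngrams* calls (count s) on every step, and count is only constant-time for collections that satisfy counted?. The sketch below assumes tt-pairs is a lazy seq (the thread does not show how it is built), while tags is known to be a vector because it comes from mapv. The names as-vector and as-lazy are stand-ins for illustration, not part of the original code.

```clojure
;; Stand-ins: a vector (like tags) and a lazy seq (what tt-pairs MIGHT be).
(def as-vector (mapv identity (range 5)))
(def as-lazy   (map identity (range 5)))

;; ngrams* recurses on (next s), so what matters is whether the seq
;; returned by next can report its length without being walked.
(def vector-rest-counted? (counted? (next as-vector)))
(def lazy-rest-counted?   (counted? (next as-lazy)))

(println "next of vector counted?  " vector-rest-counted?)
(println "next of lazy seq counted?" lazy-rest-counted?)
```

If the second check comes back false for the real tt-pairs, every (count s) has to traverse the remaining elements, so the total work grows with the square of the input size instead of linearly.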
Jim
On 24/03/13 13:35, Jim - FooBar(); wrote:
The operation is 'ngrams*', which doesn't care what objects it finds
in the seq. Typically you'd have character or word n-grams, but that
doesn't mean you can't have any other type of object; it simply
doesn't care.
(defn ngrams*
  "Create ngrams from a seq s.
  Pass a single string for character n-grams or a seq of strings for
  word n-grams."
  [s n]
  (when (>= (count s) n)
    (lazy-seq
      (cons (take n s) (ngrams* (next s) n)))))
I cannot get the ngrams in the second case, but yes, they should be
different (i.e. not=). Still, the final coll should be the same size in
both cases, and both calls should terminate in about the same time.
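
A sketch of one way to avoid the (count s) call that ngrams* makes on every step, under the assumption that the slow input is not a counted collection (the thread does not confirm this). ngrams-sketch is a hypothetical name; it relies on clojure.core/partition with a step of 1, which is lazy and drops any incomplete trailing group.

```clojure
(defn ngrams-sketch
  "Hypothetical alternative to ngrams*: lazily yields every run of n
  consecutive items of s, without calling count on the remainder."
  [s n]
  ;; (partition n 1 s) slides a window of size n forward one item at a time.
  (partition n 1 s))

;; Works on any seqable input, including strings and vectors.
(def word-bigrams (ngrams-sketch ["The" "Fulton" "County"] 2))
(def char-bigrams (ngrams-sketch "abc" 2))
```

One behavioural difference from ngrams*: when the input has fewer than n items, this returns an empty seq rather than nil.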
Jim
On 24/03/13 13:22, Marko Topolnik wrote:
What do you mean by "performing the same operation"? How can you
perform the same operation on completely different objects? Do you
mean that you don't have the exact same /ngrams*/ in the first and
second case?
On Sunday, March 24, 2013 1:45:37 PM UTC+1, Jim foo.bar wrote:
Hi everyone,
I'm experiencing some odd behaviour that I cannot justify so I
thought someone smarter can help here...
I'm reading in a file with 39,7226 lines, each one containing a
token-tag pair (e.g. The/DET). This gives me 39,7226
fully-realized TokenTagPair objects (records, e.g.
TokenTagPair{:token "The", :tag "DET"}).
Let's call them 'tt-pairs'. Now, I've got a function that takes
n-grams from a seq lazily. If I first extract all the tags and
pass them to ngrams then all is fine:
(def tags (mapv :tag tt-pairs)) ;;all the 39,7226 tags
(def ntags (doall (ngrams* tags 2))) ;;returns quickly
=> (last ntags)
("N" "P")
*HOWEVER*,
(def ntt-pairs (doall (ngrams* tt-pairs 2))) ;hangs forever!
one of my cpus is busy but nothing happens!!! How is this
justified? _both collections are of the same size and I'm
performing the same operation on them_...
the only difference is that 'tags' contains String objects
whereas 'tt-pairs' contains TokenTagPair objects... weird stuff, yes?
any ideas anyone?
Jim
--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient
with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.