Hi Sam,

Have you tried putting the incoming (hashtag, tweet) tuples into a queue and
having another thread pull them out and upload them to CouchDB?
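Here is a minimal sketch of that shape. It is written in Java because HBC is a Java library, and the same java.util.concurrent classes are usable from Clojure via interop. The `Item` class, the `POISON` sentinel and `upload` are hypothetical stand-ins, not HBC or Clutch APIs. The point is only that the stream side does nothing but enqueue, and one thread owns all the uploads:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueUploader {
    // Hypothetical (hashtag, tweet) pair; stands in for whatever preprocessing produces.
    static final class Item {
        final String hashtag;
        final String tweet;
        Item(String hashtag, String tweet) { this.hashtag = hashtag; this.tweet = tweet; }
    }

    // Illustrative sentinel that tells the uploader thread to stop.
    static final Item POISON = new Item("", "");

    // Hypothetical stand-in for the CouchDB upload; just records what it was given.
    static final List<String> uploaded = new ArrayList<>();
    static void upload(Item item) {
        uploaded.add(item.hashtag + ":" + item.tweet);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Item> queue = new LinkedBlockingQueue<>();

        // Single consumer: the only thread that talks to CouchDB,
        // so two uploads for the same hashtag never overlap.
        Thread uploader = new Thread(() -> {
            try {
                while (true) {
                    Item item = queue.take();   // blocks until something arrives
                    if (item == POISON) break;
                    upload(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        uploader.start();

        // The stream callback would only do this: enqueue and return immediately.
        queue.put(new Item("clojure", "first tweet"));
        queue.put(new Item("clojure", "second tweet"));
        queue.put(POISON);

        uploader.join();   // join() makes the uploader's writes to `uploaded` visible here
        System.out.println(uploaded);
    }
}
```

The callback returns as soon as the item is queued, so slow uploads no longer hold up the stream; they show up as queue growth instead. A bounded `LinkedBlockingQueue` would make that limit explicit.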

I'm unfamiliar with HBC, but I assume it has a callback-based API, so you
should be able to have multiple callbacks/connections/streams feed the same
queue and have a single thread do the upload (and maybe batch if necessary).
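Assuming HBC does hand you tweets through callbacks, the multi-producer/single-consumer version with batching might look roughly like the following. Several threads stand in for the callbacks. The consumer takes one item, drains whatever else is already waiting with `drainTo`, and groups the batch by hashtag. It then makes one `uploadBatch` call per batch, so each hashtag is handled once per batch instead of once per tweet. `MAX_BATCH`, `uploadBatch` and the `{hashtag, tweet}` array encoding are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchingUploader {
    // Illustrative upper bound on how many queued items one upload handles.
    static final int MAX_BATCH = 100;

    // Hypothetical stand-in for the CouchDB side: tallies tweets per hashtag
    // and counts batch uploads, so the result is easy to check.
    static final Map<String, Integer> written = new LinkedHashMap<>();
    static int uploadCalls = 0;

    static void uploadBatch(Map<String, List<String>> byHashtag) {
        uploadCalls++;
        byHashtag.forEach((tag, tweets) -> written.merge(tag, tweets.size(), Integer::sum));
    }

    public static void main(String[] args) throws InterruptedException {
        // Each element is a {hashtag, tweet} pair.
        BlockingQueue<String[]> queue = new LinkedBlockingQueue<>();
        int producers = 3;
        int perProducer = 50;

        // Several producer threads stand in for several HBC callbacks/streams;
        // all they do is enqueue.
        for (int p = 0; p < producers; p++) {
            final int id = p;
            new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    queue.add(new String[] {"tag" + (i % 5), "tweet " + id + "-" + i});
                }
            }).start();
        }

        // Single consumer (this thread): block for one item, drain whatever else
        // is already waiting, group the batch by hashtag, upload it in one call.
        int remaining = producers * perProducer;
        List<String[]> batch = new ArrayList<>();
        while (remaining > 0) {
            batch.add(queue.take());
            queue.drainTo(batch, MAX_BATCH - 1);
            Map<String, List<String>> byHashtag = new LinkedHashMap<>();
            for (String[] item : batch) {
                byHashtag.computeIfAbsent(item[0], k -> new ArrayList<>()).add(item[1]);
            }
            uploadBatch(byHashtag);
            remaining -= batch.size();
            batch.clear();
        }
        System.out.println(uploadCalls + " uploads: " + written);
    }
}
```

Here `uploadBatch` only tallies counts. A real version might do one read-modify-write per hashtag in the batch, or send the whole batch through CouchDB's `_bulk_docs` endpoint.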

I don't see refs being a particularly good fit for this problem, but I
could be wrong.


2014-12-11 16:18 GMT+00:00 Sam Raker <sam.ra...@gmail.com>:

> I've got some code that's using Twitter's HoseBirdClient to pull tweets
> from the public stream, which I then preprocess and store with CouchDB.
> Right now, my HBC client is being forced to reconnect more than I'd like,
> which occasionally causes my app to hang, for reasons I'm not entirely
> clear on. Regardless, some preliminary research on HBC suggests that the
> reconnections are being caused by my code failing to keep up with the
> endpoint, which in turn suggests that my processing+uploading is taking too
> long. I tried wrapping the processing+uploading part in futures, which
> definitely sped things up, but caused 409 errors when uploading to
> CouchDB. Briefly, Couch requires every update operation to include a
> git-style "rev" string, and if the rev you provide isn't the most recent
> one, it throws a 409 at you. I'm organizing things by hashtag, so tweets
> with multiple copies of the same hashtag, or series of tweets with the
> same hashtag, are the culprit: future A gets the current doc from Couch,
> processes it, and uses the rev it got from the existing doc, while
> future B does the same thing but finishes first, so now future A has an
> outdated rev, and that causes the 409.
>
> The vague solution I've come up with involves using a map to store the rev
> values, with the last step of the processing/uploading function being to
> store the rev number Clutch helpfully returns to you after a successful
> update. From what I can tell, refs are the way to go, since each future is
> effectively a separate thread. My questions are as follows:
> 1) Would I have to store the map-of-refs in a ref?
> 2) Is this even feasible? Would the timing work out?
> 3) With the addition of all this dereferencing and `dosync`+`alter`-ing,
> would this actually end up speeding things up all that much?
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
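To make the rev race in the quoted message concrete, here is a toy replay of it against an in-memory stand-in for CouchDB's rev check. `FakeCouch` is hypothetical and is not the CouchDB or Clutch API, and integer revs replace Couch's rev strings. The first part replays the interleaving described above: future A and future B both read before either writes. The second part runs the same read-modify-write through a single writer thread, which is the property the queue approach relies on:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RevConflictDemo {
    // Toy stand-in for CouchDB's rev check (NOT the real CouchDB or Clutch API):
    // an update must carry the document's current rev, otherwise it gets a 409.
    static final class FakeCouch {
        private final Map<String, Integer> revs = new HashMap<>();
        private final Map<String, Integer> counts = new HashMap<>();

        synchronized int rev(String id) { return revs.getOrDefault(id, 0); }
        synchronized int count(String id) { return counts.getOrDefault(id, 0); }

        // Returns 201 on success (and bumps the rev), or 409 if the caller's rev is stale.
        synchronized int put(String id, int rev, int newCount) {
            if (rev != rev(id)) return 409;
            revs.put(id, rev + 1);
            counts.put(id, newCount);
            return 201;
        }
    }

    // Replays the interleaving from the quoted message: A and B both read, B writes first.
    static int[] raceStatuses() {
        FakeCouch db = new FakeCouch();
        int revA = db.rev("clojure"), countA = db.count("clojure");
        int revB = db.rev("clojure"), countB = db.count("clojure");
        int statusB = db.put("clojure", revB, countB + 1); // B finishes first
        int statusA = db.put("clojure", revA, countA + 1); // A's rev is now out of date
        return new int[] {statusB, statusA};
    }

    // Same read-modify-write, but every update goes through one writer thread.
    static int singleWriterConflicts(int updates) throws Exception {
        FakeCouch db = new FakeCouch();
        ExecutorService writer = Executors.newSingleThreadExecutor();
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < updates; i++) {
            results.add(writer.submit(() -> {
                return db.put("clojure", db.rev("clojure"), db.count("clojure") + 1);
            }));
        }
        int conflicts = 0;
        for (Future<Integer> f : results) {
            if (f.get() == 409) conflicts++;
        }
        writer.shutdown();
        return conflicts;
    }

    public static void main(String[] args) throws Exception {
        int[] s = raceStatuses();
        System.out.println("B: " + s[0] + ", A: " + s[1]);
        System.out.println("conflicts with one writer: " + singleWriterConflicts(100));
    }
}
```

This prints `B: 201, A: 409` and then zero conflicts for the single writer. With one writer, each update reads the rev left by the previous one, so none of them is stale, and no map of refs is needed to track revs.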



-- 
László Török
--
Checkout http://www.lollyrewards.com/
