Re: Handling increasingly-intensive processes

Thomas Heller Mon, 15 Dec 2014 03:35:35 -0800

Hey,

without knowing much about your application/business needs it's hard to 
say what might be good for you. The root of your problem might be 
CouchDB, since it was never meant for "Big Data", and since we are talking 
about tweets I assume "a lot" of data. I'm not sure what your map value 
looks like, but I think you do something like

;; couch/get, couch/put and my-app/update are placeholders, not a real API
(let [obj     (couch/get hash-tag)
      updated (my-app/update obj new-tweet)]
  (couch/put hash-tag updated))

Which will always perform badly since you cannot do this concurrently, 
except with CRDTs, which CouchDB doesn't support since it does its own 
MVCC. I don't remember exactly how its conflict resolution works, but as 
far as I recall an update based on a stale revision is rejected as a 
conflict and has to be retried, rather than "last write wins". Caching 
will not save you for long, since writes will eventually become the 
bottleneck.
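
To make the contention concrete, here is a minimal sketch in JavaScript (the language CouchDB views are written in by default). It assumes the store rejects a write made against a stale revision, which is how I understand CouchDB's revision check to behave. RevStore and appendTweet are made-up names for illustration, not CouchDB's API.

```javascript
// In-memory stand-in for a document store with per-document revisions.
// This is NOT the CouchDB API; it only mimics "a stale revision is rejected".
class RevStore {
  constructor() {
    this.docs = new Map();
  }

  // Returns a copy of the stored document together with its revision.
  get(id) {
    const entry = this.docs.get(id) || { rev: 0, doc: { tweets: [] } };
    return { rev: entry.rev, doc: { tweets: [...entry.doc.tweets] } };
  }

  // Accepts the write only if `rev` matches the current revision.
  put(id, doc, rev) {
    const current = this.docs.has(id) ? this.docs.get(id).rev : 0;
    if (rev !== current) return { ok: false, rev: current }; // conflict
    this.docs.set(id, { rev: current + 1, doc });
    return { ok: true, rev: current + 1 };
  }
}

// Read-modify-write with retry. Returns the number of attempts needed.
// `staleRead` lets the caller simulate a writer that fetched earlier.
function appendTweet(store, tag, tweet, staleRead) {
  let attempts = 0;
  let read = staleRead || store.get(tag);
  for (;;) {
    attempts += 1;
    const doc = { tweets: [...read.doc.tweets, tweet] };
    if (store.put(tag, doc, read.rev).ok) return attempts;
    read = store.get(tag); // another writer won: re-fetch and try again
  }
}

// Two writers fetch the same revision, then both try to write.
const store = new RevStore();
const writerA = store.get("#clojure");
const writerB = store.get("#clojure");
const attemptsA = appendTweet(store, "#clojure", "tweet-1", writerA);
const attemptsB = appendTweet(store, "#clojure", "tweet-2", writerB);
console.log(attemptsA, attemptsB, store.get("#clojure").doc.tweets);
```

The second writer has to fetch the whole document again and resend it. For a popular hashtag both the retry rate and the document size grow together, which is exactly the correlation you describe.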

Why do you not use a CouchDB view to create the hash-tag map on the server 
and store each tweet as its own append-only document? The view's map 
function can then just emit each tweet under the hash-tag key (once for 
each tag) and the reduce function can build your map. That should perform 
a lot better up to a certain point, and you can control how up-to-date 
your view index has to be.
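
A rough sketch of what such a map function could look like. The document shape (a `hashtags` array and a `text` field) is my assumption about your data. Inside CouchDB the function takes only `doc` and calls the `emit` provided by the view server. Here `emit` is passed in as a parameter so the sketch runs outside CouchDB, and buildIndex is only a local stand-in for the view index, not something CouchDB provides.

```javascript
// Map step: emit one row per hashtag, keyed by the (lowercased) tag.
// Assumed tweet shape: { _id: "...", text: "...", hashtags: ["a", "b"] }.
function mapTweet(doc, emit) {
  if (!Array.isArray(doc.hashtags)) return; // skip non-tweet documents
  for (const tag of doc.hashtags) {
    emit(tag.toLowerCase(), { id: doc._id, text: doc.text });
  }
}

// Local stand-in for the view index: group emitted rows by key.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    mapTweet(doc, (key, value) => {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(value);
    });
  }
  return index;
}

// Each tweet is written once and never updated afterwards.
const sampleTweets = [
  { _id: "t1", text: "REPLs are nice", hashtags: ["Clojure", "lisp"] },
  { _id: "t2", text: "views everywhere", hashtags: ["couchdb", "clojure"] },
  { _id: "design-note", text: "not a tweet" },
];

const sampleIndex = buildIndex(sampleTweets);
console.log(sampleIndex.get("clojure").map((row) => row.id));
```

The point of the design is that writers only ever create new documents, so there is no single hot document to fight over. The per-hashtag grouping happens in the index instead.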

Anyway, it might be best to choose another database. Regardless of which 
database you are using, updating a single place concurrently is going to be 
a problem. An Atom in Clojure makes this look like a no-brainer, but under 
high load it can still blow up since it has no back-pressure of any kind.

"Big Data" and "Distributed Systems" are hard and cannot be covered in a 
short reply. Without exact knowledge of what your app/business needs look 
like it is impossible to make the "correct" recommendation.

HTH,
/thomas

On Monday, December 15, 2014 4:54:04 AM UTC+1, Sam Raker wrote:
>
> I'm (still) pulling tweets from twitter, processing them, and storing them 
> in CouchDB with hashtags as doc ids, such that if a tweet contains 3 
> hashtags, that tweet will be indexed under each of those 3 hashtags. My 
> application hits CouchDB for the relevant document and uses Cheshire to 
> convert the resulting string to a map. The map's values consist of a few 
> string values and an array that consists of all the tweets that contain 
> that hashtag. The problem is thus with common hashtags: the more tweets 
> contain a given hashtag, the longer that hashtag's "tweets" array will be, 
> and, additionally, the more often that document will be retrieved from 
> CouchDB. The likelihood and magnitude of performance hits on my app are 
> therefore correlated, which is Bad.
>
> I'm reaching out to you all for suggestions about how best to deal with 
> this situation. Some way of caching something, somehow? I'm at a loss, but 
> I want to believe there's a solution.
>
>
> Thanks,
> -sam
>
