Hey,

without knowing much about your application/business needs, it's hard to speculate about what might be good for you. The root of your problem might be CouchDB itself, since it was never meant for "Big Data", and since we are talking tweets I generally assume "a lot". I'm not sure what your map value looks like, but I think you do something like:

    ;; read-modify-write of one document per hash-tag
    (let [obj  (couch/get hash-tag)
          obj' (my-app/update obj new-tweet)]
      (couch/put hash-tag obj'))

This will always perform badly because you cannot do it concurrently, except with CRDTs, which CouchDB doesn't support since it does its own MVCC. I don't remember exactly how its conflict handling works, but as far as I know a write against a stale revision is rejected with a conflict rather than merged, so every writer has to re-fetch the document and retry. Caching will not save you for long, since writes will eventually become the bottleneck.

Why not use a CouchDB view to build the hash-tag map on the server and store the tweets append-only? The view's map function can then emit each tweet under its hash-tag key (once for each tag), and the reduce function can build your map. That should perform a lot better up to a certain point, and you can control how up-to-date your view index has to be.

Anyway, it might be best to choose another database. Regardless of which database you use, updating a single place concurrently is going to be a problem. An atom in Clojure makes this look like a no-brainer, but under high load it can still blow up since it has no back-pressure of any kind.

"Big Data" and "Distributed Systems" are hard and cannot be covered in a short reply. Without exact knowledge of what your app/business needs look like, it is impossible to make the "correct" recommendation.

HTH,
/thomas

On Monday, December 15, 2014 4:54:04 AM UTC+1, Sam Raker wrote:
> I'm (still) pulling tweets from twitter, processing them, and storing them
> in CouchDB with hashtags as doc ids, such that if a tweet contains 3
> hashtags, that tweet will be indexed under each of those 3 hashtags. My
> application hits CouchDB for the relevant document and uses Cheshire to
> convert the resulting string to a map. The map's values consist of a few
> string values and an array that consists of all the tweets that contain
> that hashtag.
> The problem is thus with common hashtags: the more tweets
> contain a given hashtag, the longer that hashtag's "tweets" array will be,
> and, additionally, the more often that document will be retrieved from
> CouchDB. The likelihood and magnitude of performance hits on my app are
> therefore correlated, which is Bad.
>
> I'm reaching out to you all for suggestions about how best to deal with
> this situation. Some way of caching something, somehow? I'm at a loss, but
> I want to believe there's a solution.
>
> Thanks,
> -sam
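
P.S. To make the view suggestion a bit more concrete, here is a rough in-memory Clojure sketch of the shape of that computation. It is not CouchDB code (real view functions live in a design document and are normally written in JavaScript), and the tweet shape as well as the names emit-rows and build-index are made up for illustration. The point is only that each tweet is stored once, emitted once per hash-tag, and grouped afterwards, so no writer ever has to update a shared per-hash-tag document.

```clojure
;; Assumed (hypothetical) tweet shape:
;;   {:id 1 :text "..." :hashtags ["clojure" "couchdb"]}

(defn emit-rows
  "Map step: one [hash-tag tweet] row per hash-tag in the tweet,
  analogous to emitting once per tag in a view's map function."
  [tweet]
  (for [tag (:hashtags tweet)]
    [tag tweet]))

(defn build-index
  "Group step: collect the emitted rows into {hash-tag [tweet ...]}."
  [tweets]
  (reduce (fn [index [tag tweet]]
            ;; fnil starts a fresh vector the first time a tag is seen
            (update-in index [tag] (fnil conj []) tweet))
          {}
          (mapcat emit-rows tweets)))

;; Sample data, purely illustrative.
(def tweets
  [{:id 1 :text "a" :hashtags ["clojure" "couchdb"]}
   {:id 2 :text "b" :hashtags ["clojure"]}])

(map :id (get (build-index tweets) "clojure"))
;; => (1 2)
```

In the real setup the grouping would happen inside the view index on the server, so the application only ever appends tweet documents and queries the view by hash-tag key.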