I’d wait for strong consistency in Riak 2.0 or try another solution. Your requirements probably need to be rethought if you intend to use any database on the AP side of the spectrum.
- Andy

--
Andy Gross <a...@basho.com>
Chief Architect
Basho Technologies, Inc.

On Dec 18, 2013, at 9:18 PM, Viable Nisei <vsni...@gmail.com> wrote:

> Hi
>
> On Thu, Dec 19, 2013 at 3:07 AM, Rune Skou Larsen <r...@trifork.com> wrote:
> Save the transaction list inside the customer object keyed by customerid. Index this object with 2i on storeids for each contained tx.
>
> Not such a good idea. Transactions may be running in parallel, and there are no atomic operations in Riak, no lock managers, and no UPDATE operation that understands the blob structure. The risk of a race condition is not that high, but it exists.
>
> If some customer objects grow too big, you can move old txs into archive objects keyed by customerid_seqno. For your low-latency customer reads, you probably only need the newest txs anyway.
>
> Yeah, we've considered approaches similar to this, but rejected them due to race conditions. We've also considered some kind of DLM (like ZooKeeper), but if we need a DLM, we'll just use Hadoop/Cassandra/HBase...
>
> That's just one idea. Trifork will be happy to help you find a suitable model for your use cases.
>
> OK, but that idea doesn't look like anything mind-blowing... we have considered it and many other approaches. Also, what would the answer be for the STORE-TRANSACTION binding? Just mapred?..
>
> We usually do this by stress-testing a simulation with realistic data sizes/shapes and access patterns.
>
> Same for us. We're using Tsung (the scripts are generated, and Tsung is slightly automated with some pieces of Erlang code) plus some custom multithreaded scenarios like the ones I mentioned in the opening message.
>
> It's fastest if we come onsite for a couple of days and work with you to set it up, but we can also help you offsite.
>
> Write me if you're interested, then we can do a call.
>
> I'm interested, but for now it looks like there is no perfect solution (the only untested approach left is custom indexing on the Riak side), so I'm not really sure we should pay just to confirm that there is no real solution...
>
>
> On Thu, Dec 19, 2013 at 3:07 AM, Rune Skou Larsen <r...@trifork.com> wrote:
> Save the transaction list inside the customer object keyed by customerid. Index this object with 2i on storeids for each contained tx.
>
> If some customer objects grow too big, you can move old txs into archive objects keyed by customerid_seqno. For your low-latency customer reads, you probably only need the newest txs anyway.
>
> That's just one idea. Trifork will be happy to help you find a suitable model for your use cases.
>
> We usually do this by stress-testing a simulation with realistic data sizes/shapes and access patterns. It's fastest if we come onsite for a couple of days and work with you to set it up, but we can also help you offsite.
>
> Write me if you're interested, then we can do a call.
>
> Rune Skou Larsen
> Trifork, Denmark
>
>
> ----- Reply message -----
> From: "Viable Nisei" <vsni...@gmail.com>
> To: "riak-users@lists.basho.com" <riak-users@lists.basho.com>
> Subject: May allow_mult cause DoS?
> Date: Wed, Dec 18, 2013 20:13
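For reference, a minimal sketch of the customer-document-plus-2i model suggested above, assuming the 1.4-era riak-java-client (annotation-based mapping and the Bucket API). The host, bucket, key and field names are illustrative, and multi-valued @RiakIndex support on a Set field plus the exact fetchIndex signature should be verified against the driver version in use. Note that the read-modify-write in main() still has the race discussed above; sibling resolution (sketched further down) is what makes it safe under concurrent writers.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import com.basho.riak.client.IRiakClient;
    import com.basho.riak.client.RiakFactory;
    import com.basho.riak.client.bucket.Bucket;
    import com.basho.riak.client.convert.RiakIndex;
    import com.basho.riak.client.convert.RiakKey;
    import com.basho.riak.client.query.indexes.BinIndex;

    // Customer document holding its bounded (10^2-10^3 entry) transaction list,
    // with one 2i entry per store so "customers who transacted in store X" is a
    // single index query. 2i requires the LevelDB backend.
    public class Customer {

        @RiakKey
        public String customerId;

        @RiakIndex(name = "storeid")          // exposed as the storeid_bin index
        public Set<String> storeIds = new HashSet<String>();

        public List<Tx> transactions = new ArrayList<Tx>();

        public static class Tx {
            public String txId;
            public String storeId;
            public String state;              // ACTIVE, COMPLETED or ROLLED_BACK
        }

        public static void main(String[] args) throws Exception {
            IRiakClient client = RiakFactory.pbcClient("127.0.0.1", 8087);
            Bucket customers = client.fetchBucket("customers").execute();

            // Plain read-modify-write append of one transaction.
            Customer c = customers.fetch("customer-42", Customer.class).execute();
            if (c == null) {
                c = new Customer();
                c.customerId = "customer-42";
            }
            Tx tx = new Tx();
            tx.txId = "tx-0001";
            tx.storeId = "s-7";
            tx.state = "ACTIVE";
            c.transactions.add(tx);
            c.storeIds.add(tx.storeId);
            customers.store(c).execute();

            // 2i query: keys of all customers with at least one tx in store "s-7".
            List<String> keys = customers.fetchIndex(BinIndex.named("storeid"))
                                         .withValue("s-7")
                                         .execute();
            System.out.println(keys);
        }
    }

Old transactions can be spilled into archive objects keyed customerid_seqno, as suggested above, without changing the index scheme.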
> ---------- Forwarded message ----------
> From: Viable Nisei <vsni...@gmail.com>
> Date: Thu, Dec 19, 2013 at 2:11 AM
> Subject: Re: May allow_mult cause DoS?
> To: Russell Brown <russell.br...@me.com>
>
> Hi.
>
> Thank you very much for your descriptive and very informative answer.
>
> On Wed, Dec 18, 2013 at 3:29 PM, Russell Brown <russell.br...@me.com> wrote:
> Hi,
>
> Can you describe your use case a little? Maybe it would be easier for us to help.
>
> Yeah, let me describe an abstract case equivalent to ours. Say we have a CUSTOMER object, a STORE object and a TRANSACTION object, and each TRANSACTION has one tribool attribute STATE={ACTIVE, COMPLETED, ROLLED_BACK}.
>
> We should be able to list all the TRANSACTIONs of a given CUSTOMER (so we need a 1-many relation; this list should not be long, 10^2-10^3 records, but we should be able to obtain it fast enough). We should also be able to list all the TRANSACTIONs in a given STATE made in a given STORE (these lists may be very long, up to 10^8 records), but they may be computed with some latency. Predictable latency is certainly preferred, but it is not a show-stopper. So, that's all.
>
> Another pain point is races and/or operation atomicity, but that's not so important at the moment.
>
>
> On 18 Dec 2013, at 04:32, Viable Nisei <vsni...@gmail.com> wrote:
>
> > On Wed, Dec 18, 2013 at 8:32 AM, Erik Søe Sørensen <e...@trifork.com> wrote:
> > It really is not a good idea to use siblings to represent 1-to-many relations. That's not what it's intended for, nor what it's optimized for...
> > Ok, understood.
> >
> > Can you tell us exactly why you need Bitcask rather than LevelDB? 2i would probably do it.
> > 1) According to http://docs.basho.com/riak/latest/ops/running/backups/#LevelDB-Backups , it's a real pain to implement backups with LevelDB.
> > 2) According to http://docs.basho.com/riak/latest/ops/advanced/backends/leveldb/ , reads may be slower compared to Bitcask, which is critical for us.
> >
> > Otherwise, storing a list of items under each key could be a solution, depending of course on the number of items per key. (But do perform conflict resolution.)
> > Why is any conflict resolution required? As far as I understood, with allow_mult=true Riak should just collect all the values written to a key without any additional work? What design decision leads to exponential slowdown and crashes when multiple values are allowed for a single key?.. So, what's the REAL purpose of allow_mult=true if it's a bad idea to use it for unlimited values per key?
>
> The real purpose of allow_mult=true is so that writes are never dropped. In the case where your application concurrently writes to the same key on two different nodes, or on two partitioned nodes, Riak keeps both values. Other data stores will lose one of the writes based on timestamp, serialise your writes (slow) or simply refuse to accept one or more of them.
>
> Ok, but the documentation doesn't make these points really clear.
>
> It is the job of the client to aggregate those multiple writes into a single value when it detects the conflict on read. Conflict resolution is required because your data is opaque to Riak. Riak doesn't know that you're storing lists of values, or JPEGs or JSON. It can't possibly know how to resolve two conflicting values unless it knows the semantics of the values. Riak _does_ collect all the values written to a key, but it does so as a temporary measure; it expects your application to resolve them to a single value. How many are you writing per key?
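For illustration, here is roughly what that client-side aggregation could look like with the 1.4-era Java client, reusing the Customer class from the earlier sketch. It assumes the driver's ConflictResolver and Mutation interfaces and the StoreObject fetch-resolve-mutate-store cycle; exact signatures should be checked against the driver javadoc. Siblings are merged by unioning their transaction lists:

    import java.util.Collection;
    import java.util.LinkedHashMap;
    import java.util.Map;

    import com.basho.riak.client.IRiakClient;
    import com.basho.riak.client.RiakFactory;
    import com.basho.riak.client.bucket.Bucket;
    import com.basho.riak.client.cap.ConflictResolver;
    import com.basho.riak.client.cap.Mutation;

    public class CustomerTxWriter {

        // Merge siblings by unioning their transaction lists (keyed by txId so
        // the merge is deterministic). Riak keeps the conflicting values; deciding
        // what they mean is the application's job, as described above.
        static class CustomerResolver implements ConflictResolver<Customer> {
            public Customer resolve(Collection<Customer> siblings) {
                if (siblings.isEmpty()) {
                    return null;                              // key not found
                }
                Customer merged = new Customer();
                Map<String, Customer.Tx> byId = new LinkedHashMap<String, Customer.Tx>();
                for (Customer s : siblings) {
                    merged.customerId = s.customerId;
                    merged.storeIds.addAll(s.storeIds);
                    for (Customer.Tx tx : s.transactions) {
                        byId.put(tx.txId, tx);
                    }
                }
                merged.transactions.addAll(byId.values());
                return merged;
            }
        }

        // The store call runs fetch -> resolve -> mutate -> write in one go, so
        // every write passes through the resolver and the sibling count stays
        // bounded by the number of truly concurrent writers, not by the number
        // of writes ever made to the key.
        static void appendTx(Bucket customers, String customerId,
                             final Customer.Tx newTx) throws Exception {
            final Customer seed = new Customer();
            seed.customerId = customerId;

            customers.store(seed)
                     .withResolver(new CustomerResolver())
                     .withMutator(new Mutation<Customer>() {
                         public Customer apply(Customer resolved) {
                             Customer c = (resolved != null) ? resolved : seed;
                             c.transactions.add(newTx);
                             c.storeIds.add(newTx.storeId);
                             return c;
                         }
                     })
                     .execute();
        }

        public static void main(String[] args) throws Exception {
            IRiakClient client = RiakFactory.pbcClient("127.0.0.1", 8087);
            Bucket customers = client.fetchBucket("customers").execute();

            Customer.Tx tx = new Customer.Tx();
            tx.txId = "tx-0002";
            tx.storeId = "s-7";
            tx.state = "ACTIVE";
            appendTx(customers, "customer-42", tx);
        }
    }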
> As I said before, we need a really large number of values in our 1-many sets - up to 10^8. Also, why not implement a separate bucket mode that just collects all the values written? Anyway, the current allow_mult implementation looks very dangerous. The documentation should also be clearer - the "sibling explosion" paragraph should state explicitly that it applies to allow_mult=true as well.
>
>
> Riak's sweet spot is highly write-available applications. If you have the time, read the Amazon Dynamo paper[1], as it explains the _problems_ Riak solves as well as the way in which it solves them. If you don't have these problems, maybe Riak is not the right datastore for you. Solving these problems comes with some developer complexity costs. You've run into one of them. We have many customers who think the trade-off is worth it: that the high availability and low latency make up for having eventual consistency.
>
> Yeah, OK, but what does Riak < 2.0 really allow? FTS looks unscalable (am I right? is there any way to speed it up?), listing all bucket keys is not for production, 2i is not implemented for Bitcask (anyway, we'll try it on LevelDB), and links are "implemented as hacks in the Java driver". So Riak < 2.0 with Bitcask is only a good distributed 1-1 hashmap with mapred support.
>
>
> > Ok, the documentation contains the following paragraph:
> >
> > > Sibling explosion occurs when an object rapidly collects siblings without being reconciled. This can lead to a myriad of issues. Having an enormous object in your node can cause reads of that object to crash the entire node. Other issues are increased cluster latency as the object is replicated and out of memory errors.
> >
> > But there is no indication whether it relates to allow_mult=true, to allow_mult=false, or to both.
>
> Sorry, but I don't understand what you mean by this statement. The point of allow_mult=true is so that writes are not arbitrarily dropped. It allows Riak nodes to continue to be available to take writes even if they can't communicate with each other. Have a look at Kyle Kingsbury's Jepsen[2] post on Riak.
>
> I'm just saying that this paragraph should contain something like "don't write multiple values into a single key in a bucket with allow_mult=true, this will cause dramatic slowdowns/crashes". It's not really obvious that sibling explosion is related to buckets with allow_mult=true.
>
>
> > So, the only solution is leveldb+2i?
>
> Maybe. Or maybe just use the client as it is intended: to resolve sibling values and send that value and a vector clock back to Riak.
>
> Not a solution for big sets of 10^8 elements.
>
> Or maybe roll your own indexes like in this blog post[3].
>
> Using custom "indexes" on the client side is not an option for long lists, so the only option is to write some piece of Erlang code?..
>
> With Riak 2.0 there are a few data types added to Riak that are not opaque. Maybe Riak's Sets would suit your purpose (depending on the size of your Set.)
>
> What do you mean by "depending on the size of your Set"? Will I be able to store 10^8 values and enumerate/add new values fast enough?
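For what those Riak 2.0 Sets look like in practice, here is a minimal sketch against the 2.x Java client API as it later shipped (as noted below, the 2.0 driver was not ready at the time of this thread; the bucket type, bucket and key names are illustrative). Concurrent adds merge on the server, so no client-side sibling resolution is needed; but the whole set is still stored and replicated as one Riak object, which is why it suits the bounded per-customer list far better than a 10^8-element collection:

    import com.basho.riak.client.api.RiakClient;
    import com.basho.riak.client.api.commands.datatypes.FetchSet;
    import com.basho.riak.client.api.commands.datatypes.SetUpdate;
    import com.basho.riak.client.api.commands.datatypes.UpdateSet;
    import com.basho.riak.client.core.query.Location;
    import com.basho.riak.client.core.query.Namespace;
    import com.basho.riak.client.core.query.crdt.types.RiakSet;
    import com.basho.riak.client.core.util.BinaryValue;

    public class CustomerTxSet {
        public static void main(String[] args) throws Exception {
            RiakClient client = RiakClient.newClient("127.0.0.1");

            // The "sets" bucket type must be created and activated with datatype=set.
            Location custTxs = new Location(new Namespace("sets", "customer_txs"), "customer-42");

            // Add a transaction id; concurrent adds from different clients converge
            // server-side without the application resolving siblings.
            SetUpdate add = new SetUpdate().add(BinaryValue.create("tx-20131218-0001"));
            client.execute(new UpdateSet.Builder(custTxs, add).build());

            // Read the whole set back (always fetched as one object).
            RiakSet set = client.execute(new FetchSet.Builder(custTxs).build()).getDatatype();
            for (BinaryValue member : set.view()) {
                System.out.println(member.toStringUtf8());
            }
            client.shutdown();
        }
    }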
> You're fighting the database at the moment, rather than working with it. The properties of Riak buy you some wonderful things (high availability, partition tolerance, low latency) but you have to want / need those properties, and then you have to accept that there is a data modelling / developer complexity price to pay. We don't think that price is too high. We have many customers who agree. We're always working to lower that price (see Strong Consistency, Yokozuna, Data Types etc in Riak 2.0[4].)
>
> We've built the 2.0 technical preview, but it tends to crash frequently and the 2.0 driver is still not ready; according to the docs, though, it looks significantly better. Still, the questions about maximum Set size and FTS scalability remain open.
>
>
> You seem to have had a very negative first experience of Riak (and Basho.) I think that is because you misunderstand what it is for and how it should be used. I'm very keen to fix that. If it turns out that Riak is just not for you, that is fine too.
>
> It's not a negative experience, it's just a WTFZOMG state. Everything looked good until the load/scalability tests...
>
>
> In response to your earlier mail, I think Basho's consulting costs sound incredibly low. I think you got that answer because you reached out to Basho through that channel, rather than ask the list. We're still trying to track down who you spoke to and when; if you could provide me details of that conversation directly (rather than to the list) I'd be very grateful.
>
> I don't think that's really important for now; I think we emphasized our questions/thoughts incorrectly. Anyway, for now it looks like there is no silver bullet priced at $5k - all the possible approaches to our problem were already listed in this thread. The only one I missed in the opening message was custom indexing on the server side (implemented as a precommit hook, am I right? like FTS?).
>
>
> I'm not sure if it is just a cultural / language thing, but you're very negative right now, and you sound like you're attacking Basho and Riak. I don't think that is warranted at this point as we're just trying to help you figure out if Riak is the datastore you want / need.
>
> As I said before, I'm not negative. This picture http://tinyurl.com/p5zntks describes the thoughts of our dev team after a series of load tests very well. We got 100 writes/sec on a single Core i3 host. OK, we got up to 500 writes/sec (but we need 10k+) on a single cc2.8xlarge host, but with 5 cc2.8xlarge nodes we got less, with significantly increased latency. We changed our approach to use allow_mult - and got only 100 for the first few seconds, then an exponential drop to zero, then a total crash of the whole cluster... Also, you are right - English is not my native language. As for the subject of our thread - take it as a yellow-press headline (but I still think it's not such a good idea to allow client code to do SUCH BAD THINGS TO THE WHOLE CLUSTER).
>
> Cheers
>
> Russell
>
> [1] http://dl.acm.org/citation.cfm?id=1294281
> [2] http://aphyr.com/posts/285-call-me-maybe-riak
> [3] http://basho.com/index-for-fun-and-for-profit/
> [4] http://basho.com/technical-preview-of-riak-2-0/
>
>
> > On Wed, Dec 18, 2013 at 8:32 AM, Erik Søe Sørensen <e...@trifork.com> wrote:
> > It really is not a good idea to use siblings to represent 1-to-many relations. That's not what it's intended for, nor what it's optimized for...
> > Can you tell us exactly why you need Bitcask rather than LevelDB? 2i would probably do it.
> > Otherwise, storing a list of items under each key could be a solution, depending of course on the number of items per key. (But do perform conflict resolution.)
> > /Erik
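An aside on the "roll your own indexes" approach from [3]: it does not have to live in a precommit hook; the index objects can be maintained from the client on every transaction write. Below is a hypothetical sketch with the 1.4-era Java client (the class, bucket layout and key scheme are made up for illustration), sharding each (store, state) index term by day so that no single index object has to hold anything near 10^8 entries, and reusing the sibling-union pattern from the earlier sketch:

    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Set;

    import com.basho.riak.client.bucket.Bucket;
    import com.basho.riak.client.cap.ConflictResolver;
    import com.basho.riak.client.cap.Mutation;
    import com.basho.riak.client.convert.RiakKey;

    // One index object per (store, state, day). Sharding by day keeps each index
    // object small even when a store accumulates ~10^8 transactions over time.
    public class TxIndexShard {

        @RiakKey
        public String key;                    // e.g. "store_s-7/COMPLETED/2013-12-18"
        public Set<String> txIds = new HashSet<String>();

        public static void addToIndex(Bucket indexBucket, String storeId, String state,
                                      String day, final String txId) throws Exception {
            final TxIndexShard seed = new TxIndexShard();
            seed.key = "store_" + storeId + "/" + state + "/" + day;

            indexBucket.store(seed)
                .withResolver(new ConflictResolver<TxIndexShard>() {
                    public TxIndexShard resolve(Collection<TxIndexShard> siblings) {
                        if (siblings.isEmpty()) {
                            return null;
                        }
                        TxIndexShard merged = new TxIndexShard();
                        for (TxIndexShard s : siblings) {
                            merged.key = s.key;
                            merged.txIds.addAll(s.txIds);   // union, never drop a write
                        }
                        return merged;
                    }
                })
                .withMutator(new Mutation<TxIndexShard>() {
                    public TxIndexShard apply(TxIndexShard original) {
                        TxIndexShard shard = (original != null) ? original : seed;
                        shard.txIds.add(txId);
                        return shard;
                    }
                })
                .execute();
        }
    }

Listing all COMPLETED transactions for a store then means reading the shards for the relevant days and concatenating their txId sets. Heavy write concurrency against a single shard still produces siblings, which is why the resolver is not optional here.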
> > -------- Original message --------
> > From: Viable Nisei <vsni...@gmail.com>
> > Date:
> > To: riak-users@lists.basho.com
> > Subject: May allow_mult cause DoS?
> >
> > Hi.
> >
> > Recently we've discovered that something unexpected is going on. We are using Riak 1.4.2 with some buckets with allow_mult=true. We've tried our app under load and found that... concurrent writes into a bucket with allow_mult turn Riak into an unresponsive slowpoke and can even crash it.
> >
> > A Core i3 with 4 GB RAM performs only 20 writes/sec with 5 client threads writing 20 short strings into 20 keys in a bucket with allow_mult=true, search=false. With 40 values per 40 keys it performs only 6 writes/sec. 60x60 causes a Riak crash. Throughput drops drastically. OK, we didn't change the concurrency factor (5) and only increased our data set 4x, so why does throughput drop? We increase our dataset linearly - 20 strings * 20 keys, 40 strings * 20 keys, 60 strings * 20 keys... - and the results are the same: an exponential throughput drop with a crash at the end.
> >
> > A cluster of five Amazon EC2 cc2.8xlarge nodes becomes unresponsive, with a throughput of 1-5 writes/sec, with only 80-100 values per 1-10 keys.
> >
> > So, we think this is very strange.
> >
> > Here you can check our code sample (in Java) reproducing this behavior: https://bitbucket.org/vsnisei/riak-allow_mult_wtf
> >
> > So, we asked Basho about this, but they said that "we think SQLish" and asked us for $5k for a 2-day consultation to resolve our problem. So I've decided to ask here whether we are really so stupid and unable to understand some simple things, or whether Basho just didn't understand us correctly?..
> >
> > Anyway, it looks like a DoS/DDoS attack exploiting this behavior could be devised. One would only need to know that some service/application/website is using Riak with allow_mult buckets and then provoke concurrent writes into them...
> >
> > Actually, our question to Basho was broader. Our application needs to implement 1-many bindings. Riak allows the following approaches to simulate such bindings, according to the documentation:
> >
> > 1. Riak Search - but we've found that it's VERY slow (a 20x performance drop when search is enabled, even for simple objects like {source_id: xxx, target_id: yyy}); also we've found that search is not really scalable - adding new nodes to the cluster does not increase throughput and even slows the cluster down...
> > 2. Secondary indexes. But, according to the docs, they only work on LevelDB, and we need Bitcask.
> > 3. Link walking. But, according to the docs, it's a "REST-only operation", and in the Java driver it's implemented as a hack.
> > 4. allow_mult. But we've found that it's just a nightmare. So we told Basho about this and gave them a link to our example, but they didn't give us any feedback.
> > 5. Bucket key enumeration. But, according to the docs, this operation causes a full key scan on each node and must not be used in production.
> > 6. Mapred queries. OK, we haven't tried them yet; maybe it really is the silver bullet. But according to the docs (and common sense), mapred causes a full scan (of the bucket at least - or of all keys?) and it's an operation with unpredictable latency.
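To make approach 6 concrete, a hedged sketch of a MapReduce job with the 1.4-era Java client (class and function names as recalled from that driver generation - verify against its javadoc; the bucket and JSON field names are illustrative). It also shows why the full-scan concern is real: a bucket-input job like this touches every object in the bucket, so latency grows with the bucket, not with the result.

    import java.util.Collection;

    import com.basho.riak.client.IRiakClient;
    import com.basho.riak.client.RiakFactory;
    import com.basho.riak.client.query.MapReduceResult;
    import com.basho.riak.client.query.functions.JSSourceFunction;

    public class CompletedTxsByStore {
        public static void main(String[] args) throws Exception {
            IRiakClient client = RiakFactory.pbcClient("127.0.0.1", 8087);

            // Map phase: keep only the keys of COMPLETED transactions for store "s-7".
            String mapJs =
                "function(v) {"
              + "  var o = JSON.parse(v.values[0].data);"
              + "  return (o.storeId === 's-7' && o.state === 'COMPLETED') ? [v.key] : [];"
              + "}";

            // Bucket input = full scan of the "transactions" bucket on every run.
            MapReduceResult result = client.mapReduce("transactions")
                .addMapPhase(new JSSourceFunction(mapJs), true)
                .execute();

            Collection<String> txKeys = result.getResult(String.class);
            System.out.println(txKeys.size() + " completed transactions in store s-7");
        }
    }

With the LevelDB backend, 2i results can be fed in as MapReduce inputs instead of the whole bucket, which bounds the scan to the matching keys.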
> > So, where are we wrong? Is everything OK with the behavior I've described? Have we misunderstood Riak completely and should we pay $5k for some mind-expansion, or is there no hidden mystical knowledge and they won't tell us anything beyond the approaches listed above?
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com