No problem, distributed systems are hard to reason about; I got caught many times in the past.
On Sun, Mar 5, 2017 at 9:23 AM, benjamin roth <brs...@gmail.com> wrote:

> Sorry. Answer was too fast. Maybe you are right.
>
> On 05.03.2017 09:21, "benjamin roth" <brs...@gmail.com> wrote:
>
>> No. You just change the partitioner. That's all.
>>
>> On 05.03.2017 09:15, "DuyHai Doan" <doanduy...@gmail.com> wrote:
>>
>>> "How can that be achieved? I haven't done "scientific research" yet but I
>>> guess an "MV partitioner" could do the trick. Instead of applying the
>>> regular partitioner, an MV partitioner would calculate the PK of the base
>>> table (which is always possible) and then apply the regular partitioner."
>>>
>>> The main purpose of MV is to avoid the drawbacks of the 2nd-index
>>> architecture, e.g. having to scan a lot of nodes to fetch the results.
>>>
>>> With MV, since you give the partition key, the guarantee is that you'll
>>> hit a single node.
>>>
>>> Now if you put MV data on the same node as the base table data, you're
>>> doing more-or-less the same thing as a 2nd index.
>>>
>>> Let's take a dead simple example:
>>>
>>> CREATE TABLE user (user_id uuid PRIMARY KEY, email text);
>>> CREATE MATERIALIZED VIEW user_by_email AS
>>>   SELECT * FROM user
>>>   WHERE user_id IS NOT NULL AND email IS NOT NULL
>>>   PRIMARY KEY ((email), user_id);
>>>
>>> SELECT * FROM user_by_email WHERE email = 'xxx';
>>>
>>> With this query, how can you find the user_id that corresponds to email
>>> 'xxx' so that your MV partitioner idea can work?
>>>
>>> On Sun, Mar 5, 2017 at 9:05 AM, benjamin roth <brs...@gmail.com> wrote:
>>>
>>>> While I was reading the MV paragraph in your post, an idea popped up:
>>>>
>>>> The problem with MV inconsistencies and inconsistent range movements is
>>>> that the "MV contract" is broken. This only happens because base data
>>>> and replica data reside on different hosts. If base data and view
>>>> replicas stayed on the same host, then a rebuild/remove would always
>>>> stream both matching parts of a base table + MV.
>>>>
>>>> So my idea:
>>>> Why not make a replica ALWAYS stay local, regardless of where the token
>>>> of an MV would point? That would solve these problems:
>>>> 1. Rebuild / remove node would not break the MV contract.
>>>> 2. A write always stays local:
>>>>
>>>> a) That means replication happens synchronously. A quorum write to the
>>>> base table then guarantees instant data availability with a quorum read
>>>> on a view.
>>>>
>>>> b) It saves network roundtrips + request/response handling and helps to
>>>> keep a cluster healthier during bulk operations (like repair or rebuild
>>>> streams). Write load stays local and is not spread across the whole
>>>> cluster. I think it makes the load in these situations more predictable.
>>>>
>>>> How can that be achieved? I haven't done "scientific research" yet, but
>>>> I guess an "MV partitioner" could do the trick. Instead of applying the
>>>> regular partitioner, an MV partitioner would calculate the PK of the
>>>> base table (which is always possible) and then apply the regular
>>>> partitioner.
>>>>
>>>> I'll create a proper Jira for it on Monday. Currently it's Sunday here
>>>> and my family wants me back, so just a few thoughts on this right now.
>>>>
>>>> Any feedback is appreciated!
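Benjamin's "MV partitioner" idea and DuyHai's objection to it can be sketched side by side. This is a toy model, not Cassandra code: the node list, the hash (md5 standing in for Murmur3), and all helper names are invented for illustration.

```python
# Toy sketch of the "MV partitioner" idea: place a view row on the node
# that owns its *base table* key, rather than the node that owns the
# view's own partition key. Nodes and hashing are simulated.
import hashlib

NODES = ["n1", "n2", "n3"]

def token(key: str) -> int:
    # stand-in for the regular partitioner (e.g. Murmur3)
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def owner(key: str) -> str:
    # regular placement: the node owning token(key)
    return NODES[token(key) % len(NODES)]

def mv_partitioner_owner(base_pk: str, view_pk: str) -> str:
    # Benjamin's idea: derive the base-table PK from the view row
    # (always possible on the write path) and place the view row
    # with the base replica, so writes never leave the node.
    return owner(base_pk)

def nodes_to_query_by_email(email: str) -> list:
    # DuyHai's objection: the read path only has the view key.
    # SELECT * FROM user_by_email WHERE email = 'xxx' knows 'xxx' but
    # not user_id, so the coordinator cannot recompute token(user_id)
    # and must fan out to every node -- like a 2nd index.
    return list(NODES)
```

So the scheme makes writes local but turns reads by view key into a scatter-gather, which is exactly the 2nd-index drawback MVs were meant to avoid.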
> >> > > >> > 2017-03-05 6:34 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>: > >> > > >> > > On Sat, Mar 4, 2017 at 10:26 AM, Jeff Jirsa <jji...@gmail.com> > wrote: > >> > > > >> > > > > >> > > > > >> > > > > >> > > > > On Mar 4, 2017, at 7:06 AM, Edward Capriolo < > >> edlinuxg...@gmail.com> > >> > > > wrote: > >> > > > > > >> > > > >> On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa <jji...@gmail.com> > >> > wrote: > >> > > > >> > >> > > > >> On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo < > >> > > edlinuxg...@gmail.com> > >> > > > >> wrote: > >> > > > >> > >> > > > >>> > >> > > > >>> I used them. I built do it yourself secondary indexes with > them. > >> > They > >> > > > >> have > >> > > > >>> there gotchas, but so do all the secondary index > >> implementations. > >> > > Just > >> > > > >>> because datastax does not write about something. Lets see > like 5 > >> > > years > >> > > > >> ago > >> > > > >>> there was this: https://github.com/hmsonline/ > cassandra-triggers > >> > > > >>> > >> > > > >>> > >> > > > >> Still in use? How'd it work? Production ready? Would you still > >> do it > >> > > > that > >> > > > >> way in 2017? > >> > > > >> > >> > > > >> > >> > > > >>> There is a fairly large divergence to what actual users do and > >> what > >> > > > other > >> > > > >>> groups 'say' actual users do in some cases. > >> > > > >>> > >> > > > >> > >> > > > >> A lot of people don't share what they're doing (for business > >> > reasons, > >> > > or > >> > > > >> because they don't think it's important, or because they don't > >> know > >> > > > >> how/where), and that's fine but it makes it hard for anyone to > >> know > >> > > what > >> > > > >> features are used, or how well they're really working in > >> production. 
> >> > > > >> > >> > > > >> I've seen a handful of "how do we use triggers" questions in > IRC, > >> > and > >> > > > they > >> > > > >> weren't unreasonable questions, but seemed like a lot of pain, > >> and > >> > > more > >> > > > >> than one of those people ultimately came back and said they > used > >> > some > >> > > > other > >> > > > >> mechanism (and of course, some of them silently disappear, so > we > >> > have > >> > > no > >> > > > >> idea if it worked or not). > >> > > > >> > >> > > > >> If anyone's actively using triggers, please don't keep it a > >> secret. > >> > > > Knowing > >> > > > >> that they're being used would be a great way to justify > >> continuing > >> > to > >> > > > >> maintain them. > >> > > > >> > >> > > > >> - Jeff > >> > > > >> > >> > > > > > >> > > > > "Still in use? How'd it work? Production ready? Would you still > >> do it > >> > > > that way in 2017?" > >> > > > > > >> > > > > I mean that is a loaded question. How long has cassandra had > >> > Secondary > >> > > > > Indexes? Did they work well? Would you use them? How many times > >> were > >> > > > they re-written? > >> > > > > >> > > > It wasn't really meant to be a loaded question; I was being > sincere > >> > > > > >> > > > But I'll answer: secondary indexes suck for many use cases, but > >> they're > >> > > > invaluable for their actual intended purpose, and I have no idea > how > >> > many > >> > > > times they've been rewritten but they're production ready for > their > >> > > narrow > >> > > > use case (defined by cardinality). > >> > > > > >> > > > Is there a real triggers use case still? Alternative to MVs? > >> > Alternative > >> > > > to CDC? I've never implemented triggers - since you have, what's > the > >> > > level > >> > > > of surprise for the developer? > >> > > > >> > > > >> > > :) You mention alternatives/: Lets break them down. > >> > > > >> > > MV: > >> > > They seem to have a lot pf promise. 
>>>>> I.e., you can use them for things other than equality searches, and
>>>>> I do think the CQL example with the top-N high scores is pretty
>>>>> useful. Then again, our buddy Mr Roth has a thread named "Rebuild /
>>>>> remove node with MV is inconsistent". I actually think a lot of the
>>>>> use cases for MV fall into the category of "something you should
>>>>> actually be doing with Storm". I can vibe with the concept of not
>>>>> needing a streaming platform, but I KNOW Storm would do this
>>>>> correctly. I don't want to land on something like secondary indexes
>>>>> v1/v2, where there were fundamental flaws at scale (not saying this
>>>>> is the case, but the rebuild thing seems a bit scary).
>>>>>
>>>>> CDC:
>>>>> I'm slightly afraid of this. Rationale: an extensible piece designed
>>>>> specifically for a closed-source implementation of hub-and-spoke
>>>>> replication. I have some experience trying to "play along" with
>>>>> extensible things:
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-12627
>>>>> "Thus, I'm -1 on {{PropertyOrEnvironmentSeedProvider}}."
>>>>>
>>>>> Not a rub, but I can't even get something committed using an
>>>>> existing extensible interface. Heaven forbid a use case I have would
>>>>> want to *change* the interface; I would probably get a -12. So I
>>>>> have no desire to try and maintain a CDC implementation. I see
>>>>> myself falling into the same old "why do you want to do this? -1"
>>>>> trap.
>>>>>
>>>>> Coordinator Triggers:
>>>>> To bring things back really old-school: the coordinator triggers
>>>>> everyone always wanted. In a nutshell, I DO believe they are easier
>>>>> to reason about than MV. It is pretty basic: it happens on the
>>>>> coordinator, there are no batchlogs or whatever; best effort,
>>>>> possibly involving more nodes since the keys might be on different
>>>>> servers. Actually I tend to like features like this. Once something
>>>>> comes down the downswing of the "software hype cycle" you know it is
>>>>> pretty stable, as everyone's all excited about other things.
>>>>>
>>>>> As I said, I know I can use Storm for top-N, so what is this feature
>>>>> for? Well, I want to optimize my network transfer, generally by
>>>>> building my batch mutations on the server. Seems reasonable. Maybe I
>>>>> want to have my own little "read before write" thing like CQL lists.
>>>>>
>>>>> The warts, having tried it: the first time I tried it, I found it
>>>>> did not work with non-batch mutations; patched in 3 hours. It took
>>>>> weeks before some CQL user had the same problem and it got fixed :)
>>>>> There was no dynamic stuff at the time, so it was BYO classloader.
>>>>> Going against the grain and saying:
>>>>>
>>>>> The thing you have to realize with best-effort coordinator triggers
>>>>> is that the "transaction" could be incomplete, and that sucks for
>>>>> some cases. But I actually felt the 2nd-index implementations force
>>>>> all problems into a type of "foreign key transactional integrity"
>>>>> that does not make sense for Cassandra.
>>>>>
>>>>> Have you ever used Elasticsearch? Their version of consistency is:
>>>>> write something, keep reading, and eventually you see it. Wildly
>>>>> popular :) It is a crazy world.
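Edward's closing point, "write something, keep reading and eventually you see it", can be sketched as a tiny simulation. Nothing here is Elasticsearch's actual API; the store, replica count, and delay are all invented to illustrate the read-until-visible pattern.

```python
# A minimal sketch of eventual consistency with a read-until-visible
# client loop: a write is acknowledged once one replica has it, and the
# remaining replicas catch up after a simulated delay.
import time

class EventuallyConsistentStore:
    def __init__(self, replicas=3, sync_delay=0.05):
        self.replicas = [dict() for _ in range(replicas)]
        self.pending = []  # (apply_at, replica_idx, key, value)
        self.sync_delay = sync_delay

    def write(self, key, value):
        # acknowledged after one replica has it; the rest catch up later
        self.replicas[0][key] = value
        now = time.monotonic()
        for i in range(1, len(self.replicas)):
            self.pending.append((now + self.sync_delay, i, key, value))

    def _apply_pending(self):
        # deliver any replication messages whose delay has elapsed
        now = time.monotonic()
        still_pending = []
        for apply_at, i, k, v in self.pending:
            if apply_at <= now:
                self.replicas[i][k] = v
            else:
                still_pending.append((apply_at, i, k, v))
        self.pending = still_pending

    def read(self, key, replica_idx):
        self._apply_pending()
        return self.replicas[replica_idx].get(key)

def read_until_visible(store, key, replica_idx, timeout=1.0):
    # the "keep reading and eventually you see it" client loop
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        value = store.read(key, replica_idx)
        if value is not None:
            return value
        time.sleep(0.01)
    raise TimeoutError(f"{key} never became visible")
```

A read against a lagging replica returns nothing at first, then the value appears; the best-effort coordinator triggers discussed above leave you in a similar place when a "transaction" lands only partially.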