Yes, Thrift does have async, though I haven't had to use it yet. Right now I'm working on adding CAS to Hector, followed by multi-slice.
On Fri, Jun 13, 2014 at 9:01 PM, graham sanderson <gra...@vast.com> wrote:

> Note as I mentioned mid post, thrift also supports async nowadays (there was a recent discussion on cassandra dev and the choice was not to move to it)
>
> I think the binary protocol is the way forward; CQL3 needs some new features, or there need to be some other types of requests you can make over the binary protocol
>
> On Jun 13, 2014, at 5:51 PM, Peter Lin <wool...@gmail.com> wrote:
>
> without a doubt there's nice features of CQL3 like notifications and async. I want to see CQL3 mature and handle all the use cases that Thrift handles easily today. It's to everyone's benefit to work together and improve CQL3.
>
> Other benefits of Thrift drivers today is being able to use object API with generics. For tool builders, this is especially useful. Not everyone wants to write tools, but I do so it matters to me.
>
> On Fri, Jun 13, 2014 at 6:39 PM, Laing, Michael <michael.la...@nytimes.com> wrote:
>
>> Just to add 2 more cents... :)
>>
>> The CQL3 protocol is asynchronous. This can provide a substantial throughput increase, according to my benchmarking, when one uses non-blocking techniques.
>>
>> It is also peer-to-peer. Hence the server can generate events to send to the client, e.g. schema changes - in general, 'triggers' become possible.
>>
>> ml
>>
>> On Fri, Jun 13, 2014 at 6:21 PM, graham sanderson <gra...@vast.com> wrote:
>>
>>> My 2 cents…
>>>
>>> A motivation for CQL3 AFAIK was to make Cassandra more familiar to SQL users. This is a valid goal, and works well in many cases.
>>> Equally there are use cases (that some might find ugly) where Cassandra is chosen explicitly because of the sorts of things you can do at the thrift level, which aren’t (currently) exposed via CQL3
>>>
>>> To Robert’s point earlier - "Rational people should presume that Thrift support must eventually disappear"… he is probably right (though frankly I’d rather the non-blocking thrift version was added instead). However if we do get rid of the thrift interface, then it needs to be at a time that CQLn is capable of expressing all the things you could do via the thrift API. Note, I need to go look and see if the non-blocking thrift version also requires materializing the entire thrift object in memory.
>>>
>>> On Jun 13, 2014, at 4:55 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>
>>> There are always pros and cons with a query language.
>>>
>>> But as far as I can see, the advantages of Thrift over CQL3 are:
>>>
>>> 1) Thrift requires a little bit less decoding server-side (a difference of around 10% in CPU usage).
>>>
>>> 2) Thrift uses more "compact" storage because CQL3 needs to add extra "marker" columns to guarantee the existence of the primary key. It gets worse when you use clustering columns, because each distinct clustering group has its own "marker" column.
>>>
>>> That being said, point 1) is not really an issue since most of the time nodes are more I/O bound than CPU bound. Only in extreme cases, where you have an incredible read rate with data that fits entirely in memory, might you notice the difference.
>>>
>>> For point 2) this is a small trade-off to have access to a query language and being able to do slice queries using the WHERE clause. Some like it, others hate it, it's just a question of taste. Please note that the "waste" in disk space is somewhat mitigated by compression.
>>>
>>> Long story short I think Thrift may have appropriate usage but only in very few use cases. Recently a lot of improvements and features have been added to CQL3, so it should be considered the first choice for most users; if they fall into those few use cases, then switch back to Thrift
>>>
>>> My 2 cents
>>>
>>> On Fri, Jun 13, 2014 at 11:43 PM, Peter Lin <wool...@gmail.com> wrote:
>>>
>>>> With a text based query approach like CQL, you lose the type with dynamic columns. Yes, we're storing it as bytes, but it is simpler and easier with Thrift to do these types of things.
>>>>
>>>> I like CQL3 and what it does, but text based query languages make certain dynamic schema use cases painful. Having used and built ORMs, they are poorly suited to dynamic schemas. If you've never had to write an ORM to handle dynamic user defined schemas at runtime, it's tough to see where the problems arise and how that makes life painful.
>>>>
>>>> Just to be clear, I'm not saying "don't use CQL3" or "CQL3 is bad". I'm saying CQL3 is good for certain kinds of use cases and Thrift is good at certain use cases. People need to look at what and how they're storing data and do what makes the most sense to them. Slavishly following CQL3 doesn't make any sense to me.
>>>>
>>>> On Fri, Jun 13, 2014 at 5:30 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>
>>>>> "the validation type is set to bytes, and my code is type safe, so it knows which serializers to use. Those dynamic columns are driven off the types in Java." --> Correct.
>>>>> However, you are still bound by the column comparator type, which should be fixed (unless again you set it to bytes, in which case you lose the ordering and sorting feature)
>>>>>
>>>>> Basically what you are doing is telling Cassandra to save data in the cells as raw bytes; the serialization is taken care of client side using the appropriate serializer. This is a perfectly valid strategy.
>>>>>
>>>>> But how is it different from using CQL3 and setting the value to "blob" (equivalent to bytes) and taking care of the serialization client-side also? You can even imagine saving the value in JSON format and setting the type to "text".
>>>>>
>>>>> Really, I don't see why CQL3 cannot achieve the scenario you describe.
>>>>>
>>>>> For the record, when you create a table in CQL3 as follows:
>>>>>
>>>>> CREATE TABLE user (
>>>>>     id bigint PRIMARY KEY,
>>>>>     firstname text,
>>>>>     lastname text,
>>>>>     last_connection timestamp,
>>>>>     ....);
>>>>>
>>>>> C* will create a column family with validation type = bytes to accommodate the timestamp and text types for the firstname, lastname and last_connection columns. Basically the CQL3 engine is doing the serialization server-side for you
>>>>>
>>>>> On Fri, Jun 13, 2014 at 11:19 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>
>>>>>> the validation type is set to bytes, and my code is type safe, so it knows which serializers to use. Those dynamic columns are driven off the types in Java.
>>>>>>
>>>>>> Having said that, CQL3 does have a new custom type feature, but the documentation is basically non-existent on how that actually works. One could also modify CQL such that insert statements give Cassandra hints about what type it is, but I'm not aware of anyone enhancing CQL3 to do that.
>>>>>>
>>>>>> I realize my kind of use case is a bit unique, but I do know of others that are doing similar kinds of things.
>>>>>>
>>>>>> On Fri, Jun 13, 2014 at 5:11 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>
>>>>>>> In thrift, when creating a column family, you need to define
>>>>>>>
>>>>>>> 1) the row/partition key type
>>>>>>> 2) the column comparator type
>>>>>>> 3) the validation type for the actual value (cell in CQL3 terminology)
>>>>>>>
>>>>>>> Unless you use "dynamic composites" feature, which does not exist (and probably won't) in CQL3, I don't see how you can have columns with "different types" on the same row/partition
>>>>>>>
>>>>>>> On Fri, Jun 13, 2014 at 11:06 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>>>
>>>>>>>> when I say dynamic column, I mean non-static columns of different types within the same row. Some could be an object or one of the defined datatypes.
>>>>>>>>
>>>>>>>> with thrift I use the appropriate serializer to handle these dynamic columns.
>>>>>>>>
>>>>>>>> On Fri, Jun 13, 2014 at 4:55 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Well, before talking and discussing about "dynamic columns", we should first define it clearly. What do people mean by "dynamic columns" exactly? Is it the ability to add many columns "of same type" to an existing physical row? If yes then CQL3 does support it with clustering columns.
>>>>>>>>>
>>>>>>>>> On Fri, Jun 13, 2014 at 10:36 PM, Mark Greene <green...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yeah I don't anticipate more than 1000 properties, well under in fact.
>>>>>>>>>> I guess the trade off of using the clustered columns is that I'd have a table that would be tall and skinny which also has its challenges w/r/t memory.
>>>>>>>>>>
>>>>>>>>>> I'll look into your suggestion a bit more and consider some others around a hybrid of CQL and Thrift (where necessary). But from a newb's perspective, I sense the community is unsettled around this concept of truly dynamic columns. Coming from an HBase background, it's a consideration I didn't anticipate having to evaluate.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> about.me <http://about.me/markgreene>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 13, 2014 at 4:19 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mark
>>>>>>>>>>>
>>>>>>>>>>> I believe that in your table you want to have some "common" fields that will be there whatever the customer is, and other fields that are entirely customer-dependent, isn't it?
>>>>>>>>>>>
>>>>>>>>>>> In this case, creating a table with static columns for the common fields and a clustering column representing all custom fields defined by a customer could be a solution (see here for static column: https://issues.apache.org/jira/browse/CASSANDRA-6561 )
>>>>>>>>>>>
>>>>>>>>>>> CREATE TABLE user_data (
>>>>>>>>>>>     user_id bigint,
>>>>>>>>>>>     user_firstname text static,
>>>>>>>>>>>     user_lastname text static,
>>>>>>>>>>>     ...
>>>>>>>>>>>     custom_property_name text,
>>>>>>>>>>>     custom_property_value text,
>>>>>>>>>>>     PRIMARY KEY(user_id, custom_property_name, custom_property_value));
>>>>>>>>>>>
>>>>>>>>>>> Please note that with this solution you need to have "at least one" custom property per customer to make it work
>>>>>>>>>>>
>>>>>>>>>>> The only thing to take care of is the type of custom_property_value.
>>>>>>>>>>> You need to define it once and for all. To accommodate dynamic types, you can either save the value as blob or text (as JSON) and take care of the serialization/deserialization yourself at the client side
>>>>>>>>>>>
>>>>>>>>>>> As an alternative you can save custom properties in a map, provided that their number is not too large. But considering the business case of CRM, I believe that it's quite rare that a user has more than 1000 custom properties, isn't it?
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 13, 2014 at 10:03 PM, Mark Greene <green...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> My use case requires the support of arbitrary columns much like a CRM. My users can define 'custom' fields within the application. Ideally I wouldn't have to change the schema at all, which is why I like the old thrift approach rather than the CQL approach.
>>>>>>>>>>>>
>>>>>>>>>>>> Having said all that, I'd be willing to adapt my API to make explicit schema changes to Cassandra whenever my user makes a change to their custom fields if that's an accepted practice.
>>>>>>>>>>>>
>>>>>>>>>>>> Ultimately, I'm trying to figure out if the Cassandra community intends to support true schemaless use cases in the future.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> about.me <http://about.me/markgreene>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This strikes me as bad practice in the world of multi tenant systems. I don't want to create a table per customer. So I'm wondering if dynamically modifying the table is an accepted practice?
>>>>>>>>>>>>> --> Can you give some details about your use case? How would you "alter" a table structure to adapt it to a new customer?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wouldn't it be better to model your table so that it supports addition/removal of customers?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jun 13, 2014 at 9:00 PM, Mark Greene <green...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks DuyHai,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a follow up question to #2. You mentioned ideally I would create a new table instead of mutating an existing one.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This strikes me as bad practice in the world of multi tenant systems. I don't want to create a table per customer. So I'm wondering if dynamically modifying the table is an accepted practice?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> about.me <http://about.me/markgreene>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jun 13, 2014 at 2:54 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello Mark
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dynamic columns, as you said, are perfectly supported by CQL3 via clustering columns. And no, using collections for storing dynamic data is a very bad idea if the cardinality is very high (>> 1000 elements)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Is using Thrift a valid approach in the era of CQL? --> Less and less. Unless you are looking for extreme performance, you'd be better off choosing CQL3.
>>>>>>>>>>>>>>> The ease of programming and querying with CQL3 is worth the small overhead in CPU
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) If CQL is the best practice, should I alter the schema at runtime when I detect I need to do a schema mutation? --> Ideally you should not alter schema but create a new table to adapt to your changing requirements.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3) If I utilize CQL collections, will Cassandra page the entire thing into the heap? --> Of course. All collections and maps in Cassandra are eagerly loaded entirely in memory on server side. That's why it is recommended to limit their cardinality to ~ 1000 elements
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jun 13, 2014 at 8:33 PM, Mark Greene <green...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm looking for some best practices w/r/t supporting arbitrary columns. It seems from the docs I've read around CQL that they are supported in some capacity via collections but you can't exceed 64K in size. For my requirements that would cause problems.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So my questions are:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Is using Thrift a valid approach in the era of CQL?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) If CQL is the best practice, should I alter the schema at runtime when I detect I need to do a schema mutation?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3) If I utilize CQL collections, will Cassandra page the entire thing into the heap?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My data model is akin to a CRM, arbitrary column definitions per customer.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Mark
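
For anyone landing on this thread later: the "store it as a blob and serialize client side" idea quoted above can be sketched roughly as follows. This is only an illustration in plain Python with no Cassandra driver involved. The one-byte type tags and the names `encode_value` / `decode_value` are made up for this example and are not part of Cassandra, Hector, or any CQL driver. It assumes integers fit in a signed 64-bit value.

```python
import json
import struct

# Hypothetical one-byte type tags (an assumption for this sketch only).
TAG_TEXT, TAG_INT, TAG_FLOAT, TAG_JSON = 0, 1, 2, 3


def encode_value(value):
    """Serialize a value into tagged bytes that could be stored in a blob cell."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; send it through JSON to keep its type.
        return bytes([TAG_JSON]) + json.dumps(value).encode("utf-8")
    if isinstance(value, str):
        return bytes([TAG_TEXT]) + value.encode("utf-8")
    if isinstance(value, int):
        # Big-endian signed 64-bit; struct.error is raised if the value does not fit.
        return bytes([TAG_INT]) + struct.pack(">q", value)
    if isinstance(value, float):
        return bytes([TAG_FLOAT]) + struct.pack(">d", value)
    # Anything else (lists, dicts, None) falls back to JSON.
    return bytes([TAG_JSON]) + json.dumps(value).encode("utf-8")


def decode_value(blob):
    """Inverse of encode_value: read the tag byte and pick the matching decoder."""
    if not blob:
        raise ValueError("empty blob has no type tag")
    tag, payload = blob[0], blob[1:]
    if tag == TAG_TEXT:
        return payload.decode("utf-8")
    if tag == TAG_INT:
        return struct.unpack(">q", payload)[0]
    if tag == TAG_FLOAT:
        return struct.unpack(">d", payload)[0]
    if tag == TAG_JSON:
        return json.loads(payload.decode("utf-8"))
    raise ValueError("unknown type tag: %d" % tag)
```

With something like this, `custom_property_value` could be declared once as a blob, and the client recovers the original type from the tag on read. The trade-off raised in the thread still applies: the server only sees opaque bytes, so it cannot order or filter on these values.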