Peter, I must meet you and shake your hand. About a week ago I was debating a number of people who claimed there was "no reason to mix static and dynamic" columns. We do it all the time. I am glad someone besides me "gets it" and that I am not totally mad.
Ed

On Thu, Feb 20, 2014 at 3:26 PM, Peter Lin <wool...@gmail.com> wrote:
>
> Hi Duyhai,
>
> yes, I am talking about mixing static and dynamic columns in a single column family. Let me give you an example from retail.
>
> Say you're Amazon and you sell over 10K different products. How do you store all those products with all their different properties, like color, size, dimensions, etc.? With relational databases people use EAV (entity attribute value) tables. This means that when querying for data, the system has to reconstruct the object by pivoting a bunch of rows and flattening them out to populate the Java object. Typically there are fields common to all products, like SKU, price, and category.
>
> Using both static and dynamic columns, data can be stored in one row and queried by one row. Anyone who has used the EAV approach to build product databases will tell you how much that sucks. Another example is from auto insurance. Typically a policy database will allow one or more types of items for property insurance. Property insurance is home/auto insurance.
>
> Each insurance carrier supports a different number of insurable items, coverages and endorsements. Many systems use the same EAV approach, but the problem is bigger. Typically a commercial auto policy may have hundreds of drivers and vehicles. Each policy may have dozens or hundreds of coverages and endorsements. It is common for an auto insurance model to have hundreds of coverages and endorsements with different properties. Using the old ORM approach, it's usually mapped table-per-class. The problem is that this results in a query explosion for polymorphic queries. This is a known problem with polymorphic queries using traditional techniques.
>
> Given that Cassandra + thrift gives developers the ability to store dynamic columns of different types, it solves the performance issues inherent in the EAV technique.
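The contrast Peter draws can be sketched in plain Java, with no Cassandra client involved. Under EAV each attribute is its own (entity, attribute, value) row, and the reader has to pivot those rows back into one object. In the single-row layout, the common fields (SKU, price, category) are fixed "static" fields that sit next to a sorted map of per-product "dynamic" columns. This is a minimal in-memory sketch: the class, record and field names are hypothetical, and it says nothing about how Cassandra physically stores such a row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical in-memory models for illustration only; not a Cassandra or ORM API.
final class ProductModels {
    private ProductModels() {}

    // EAV: one row per (entity, attribute, value) triple, every value stored as text.
    record EavRow(String entityId, String attribute, String value) {}

    // Single-row layout: fixed "static" fields plus arbitrary "dynamic" columns,
    // kept sorted by column name.
    record ProductRow(String sku, double price, String category,
                      SortedMap<String, Object> dynamicColumns) {}

    // Flatten one product into EAV rows: two for the common fields
    // (the SKU serves as the entity id) plus one per dynamic column.
    static List<EavRow> toEav(ProductRow p) {
        List<EavRow> rows = new ArrayList<>();
        rows.add(new EavRow(p.sku(), "price", Double.toString(p.price())));
        rows.add(new EavRow(p.sku(), "category", p.category()));
        for (Map.Entry<String, Object> col : p.dynamicColumns().entrySet()) {
            rows.add(new EavRow(p.sku(), col.getKey(), String.valueOf(col.getValue())));
        }
        return rows;
    }

    // Reading back from EAV means pivoting: group the rows by entity
    // and fold each attribute/value pair into that entity's map.
    static Map<String, Map<String, String>> pivot(List<EavRow> rows) {
        Map<String, Map<String, String>> byEntity = new LinkedHashMap<>();
        for (EavRow row : rows) {
            byEntity.computeIfAbsent(row.entityId(), id -> new LinkedHashMap<>())
                    .put(row.attribute(), row.value());
        }
        return byEntity;
    }
}
```

For example, a shirt with `color` and `size` as dynamic columns becomes four EAV rows that every read must pivot back together, while the `ProductRow` form is already one object. The EAV side also stores every value as text, so the original type is lost unless it is recorded somewhere else.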
>
> The point I was trying to make in my first response is that going with pure CQL makes it much harder to take advantage of the COOL features of Cassandra. It does require building a framework to make it "mostly" transparent to developers, but in my opinion it is worth it to learn and understand both thrift and CQL. I use annotations in my framework and delegates to handle the serialization. This way, the developer only needs to annotate the class and the framework handles serialization and deserialization.
>
> On Thu, Feb 20, 2014 at 3:05 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>> "Developers can use whatever type they want for the name or value in a dynamic column and the framework will handle it appropriately."
>>
>> What do you mean by "dynamic" column? If you want to be able to insert an arbitrary number of columns in one physical row, CQL3 clustering is there and does the job pretty well.
>>
>> If by "dynamic" you mean a column whose validation type can change at runtime (like the dynamic composite type: http://hector-client.github.io/hector/build/html/content/composite_with_templates.html), then why don't you just use the blob type and serialize it yourself on the client side?
>>
>> More practically, in your previous example:
>>
>> - insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int, dynamicColumn as string) values ('text1','text2',30.55 as double, 3500 as long)
>>
>> I can't see any really sensible use case where you need to mix static and dynamic columns in the same column family. If you need to save a domain model, use a skinny row with a fixed number of columns known beforehand. If you want to store a time series or a timeline of data, wide rows are there for that.
>>
>> On Thu, Feb 20, 2014 at 8:55 PM, Peter Lin <wool...@gmail.com> wrote:
>>>
>>> My apologies Sylvain, I didn't mean to misquote you. I still feel that even if someone is only going to use CQL, it is "worth it" to learn thrift.
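DuyHai's suggestion (store the value in a blob and serialize it on the client) and Peter's framework (annotations plus delegates that handle serialization) both depend on the client knowing how to turn a typed name or value into bytes and back again. The following is a minimal sketch of that idea. It assumes a made-up format: a one-byte type tag followed by the big-endian payload. This is not Hector's serializer format, not a Cassandra type, and not Peter's framework, whose code is not shown in the thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side codec: [1-byte type tag][payload], for illustration only.
final class TaggedBlobCodec {
    private static final byte INT = 1;
    private static final byte LONG = 2;
    private static final byte DOUBLE = 3;
    private static final byte TEXT = 4;

    private TaggedBlobCodec() {}

    // Serialize a supported value into a tagged blob, flipped so it is ready to read.
    static ByteBuffer encode(Object value) {
        if (value instanceof Integer i) {
            return ByteBuffer.allocate(1 + Integer.BYTES).put(INT).putInt(i).flip();
        }
        if (value instanceof Long l) {
            return ByteBuffer.allocate(1 + Long.BYTES).put(LONG).putLong(l).flip();
        }
        if (value instanceof Double d) {
            return ByteBuffer.allocate(1 + Double.BYTES).put(DOUBLE).putDouble(d).flip();
        }
        if (value instanceof String s) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(1 + utf8.length).put(TEXT).put(utf8).flip();
        }
        throw new IllegalArgumentException("unsupported type: " + value);
    }

    // Read the tag, then the payload. Works on a duplicate so the caller's
    // buffer position is left untouched.
    static Object decode(ByteBuffer blob) {
        ByteBuffer b = blob.duplicate();
        byte tag = b.get();
        switch (tag) {
            case INT:
                return b.getInt();
            case LONG:
                return b.getLong();
            case DOUBLE:
                return b.getDouble();
            case TEXT: {
                byte[] utf8 = new byte[b.remaining()];
                b.get(utf8);
                return new String(utf8, StandardCharsets.UTF_8);
            }
            default:
                throw new IllegalArgumentException("unknown type tag: " + tag);
        }
    }
}
```

With this sketch, the column name `20 as int` and the value `30.55 as double` from the proposed insert statement would each be stored as opaque bytes and decoded back to an `Integer` and a `Double`. The trade-off is that Cassandra only ever sees bytes, so server-side validation and type-aware sorting of such columns are lost.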
>>>
>>> In the interest of discussion, I looked at both JIRA tickets and I don't see how they make it possible for a developer to specify the name and value type for a dynamic column.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6561
>>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>>
>>> Am I missing something? If the grammar for insert statements doesn't give users the ability to declare the name and value type, it means the developer has to default name and value to bytes. In their code, they have to handle that manually or build their own framework. I built my own framework, which handles this for me. Developers can use whatever type they want for the name or value in a dynamic column and the framework will handle it appropriately.
>>>
>>> To me, developers should take time to learn both and use both. I realize it's more work to understand both and to take time to read the code. Not everyone is crazy enough to spend time reading the Cassandra code base or to spend hundreds of hours studying Hector and other Cassandra clients. I will say this: if I hadn't spent time studying Cassandra and reading the Hector code, I wouldn't have been able to help one of DataStax's customers port Hector to .Net. I also wouldn't have been able to port Hector to C# natively in 3 months.
>>>
>>> Rather than recommend that people be lazy, it would be more useful to list the pros/cons. To my knowledge, there isn't a good writeup on the pros/cons of thrift and CQL on cassandra.apache.org. I don't know if the DataStax docs have a detailed writeup of it, do they?
>>>
>>> On Thu, Feb 20, 2014 at 12:46 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>> On Thu, Feb 20, 2014 at 6:26 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>
>>>>> I disagree with the sentiment that "thrift is not worth the trouble".
>>>>
>>>> Way to quote only part of my sentence and get mental on it.
>>>> My full sentence was "it's probably not worth the trouble to start with thrift if you're gonna use CQL later".
>>>>
>>>>>
>>>>> CQL and all SQL-inspired dialects limit one's ability to use arbitrarily typed data in dynamic columns. With thrift it's easy and straightforward. With CQL there is no way to tell Cassandra the type of the name and value for a dynamic column. You can only set the default type. That means that with a "pure CQL" approach you can't deviate from the default type: Cassandra will throw an exception indicating the type is different from the default type.
>>>>>
>>>>> That is, until such time as CQL abandons the shackles of SQL and adds the ability to indicate the column and value type, with something like this:
>>>>>
>>>>> insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int, dynamicColumn as string) values ('text1','text2',30.55 as double, 3500 as long)
>>>>>
>>>>> This is one area where Thrift is superior to CQL. Having said that, it's valid to use Cassandra "as if" it were a relational database, but then you'd miss out on some of its unique features.
>>>>
>>>> Man, if I had a nickel for every time someone came on this mailing list claiming that something was possible with thrift and not CQL... I will claim this: with CASSANDRA-6561 and CASSANDRA-4851, which just got in, there is *nothing* that thrift can do that CQL cannot. But well, what do I know about Cassandra.
>>>>
>>>> --
>>>> Sylvain
>>>>
>>>>> On Thu, Feb 20, 2014 at 12:12 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>>> On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>>>
>>>>>>> For what it is worth, your schema is simple and uses compact storage. Thus you really don't need anything in Cassandra 2.0 as far as I can tell.
>>>>>>> You might be happier with a stable release like 1.2.something and just Hector or Astyanax. You are really dealing with many issues you should not have to deal with just to prototype a simple Cassandra app.
>>>>>>
>>>>>> Of course, if everyone used that reasoning, no one would ever test new features and report problems or suggest improvements. So thanks to anyone like Rüdiger who actually tries stuff and takes the time to report problems when they think they have encountered one. Keep at it, *you* are the one helping Cassandra get better every day.
>>>>>>
>>>>>> And you are also right, Rüdiger, that it's probably not worth the trouble to start with thrift if you're gonna use CQL later. And you definitely should use CQL, it is Cassandra's future.
>>>>>>
>>>>>> --
>>>>>> Sylvain
>>>>>>
>>>>>>> On Thursday, February 20, 2014, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn <rkla...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> I have cloned the cassandra repo, applied the patch, and built it. But when I want to run the benchmark I get an exception. See below. I tried with a non-managed dependency to cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I compiled from source because I read that that might help. But that did not make a difference.
>>>>>>>>>
>>>>>>>>> So currently I don't know how to give the patch a try. Any ideas?
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>>
>>>>>>>>> Rüdiger
>>>>>>>>>
>>>>>>>>> Exception in thread "main" java.lang.IllegalArgumentException: replicate_on_write is not a column defined in this metadata
>>>>>>>>>     at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>>>>>>>>     at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>>>>>>>>     at com.datastax.driver.core.Row.getBool(Row.java:117)
>>>>>>>>>     at com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>>>>>>>>     at com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>>>>>>>>     at com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>>>>>>>>     at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>>>>>>>>     at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>>>>>>>>     at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>>>>>>>>     at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>>>>>>>>     at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>>>>>>>>     at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>>>>>>>>     at com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>>>>>>>>     at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>>>>>>>>     at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>>>>>>>>     at cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>>>>>>>>     at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>>>>>>>>     at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>>>>>>>>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>>>>>>>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>>>>>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>>>>>>>     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>>>>>>>>>     at scala.App$class.main(App.scala:71)
>>>>>>>>>     at cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
>>>>>>>>>     at cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
>>>>>>>>
>>>>>>>> I believe you've tried the cassandra trunk branch? trunk is basically the future Cassandra 2.1 and the driver is currently unhappy because the replicate_on_write option has been removed in that version. I'm supposed to have fixed that on the driver 2.0 branch like 2 days ago so maybe you're also using a slightly old version of the driver sources in there? Or maybe I've screwed up my fix, I'll double check. But anyway, it would be overall simpler to test with the cassandra-2.0 branch of Cassandra, with which you shouldn't run into that.
>>>>>>>> --
>>>>>>>> Sylvain
>>>>>>>
>>>>>>> --
>>>>>>> Sorry this was sent from mobile. Will do less grammar and spell check than usual.
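The trace shows the driver's `TableMetadata$Options` calling `Row.getBool` for `replicate_on_write`, a column that the trunk (future 2.1) schema no longer has, so the lookup throws `IllegalArgumentException`. One general way client code can survive options being removed between server versions is to treat such columns as optional rather than required. The sketch below models a schema row as a plain `Map<String, Object>`. The class and method names are hypothetical, and this is not the DataStax driver's actual fix, which the thread does not show.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helpers over a schema row modeled as column name -> value;
// not the DataStax driver API.
final class SchemaOptions {
    private SchemaOptions() {}

    // Strict read: fails when the column is absent, similar to the exception in the trace.
    static boolean requiredBool(Map<String, Object> row, String column) {
        if (!row.containsKey(column)) {
            throw new IllegalArgumentException(column + " is not a column defined in this metadata");
        }
        return (Boolean) row.get(column);
    }

    // Tolerant read: an absent (or non-boolean) column yields Optional.empty(),
    // leaving the caller to pick a fallback.
    static Optional<Boolean> optionalBool(Map<String, Object> row, String column) {
        Object value = row.get(column);
        return value instanceof Boolean b ? Optional.of(b) : Optional.empty();
    }
}
```

A caller would write something like `optionalBool(row, "replicate_on_write").orElse(true)`. The fallback value here is an assumption made for illustration, and whether silently defaulting is acceptable depends on what the option means.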