Re: Performance problem with large wide row inserts using CQL

Peter Lin Thu, 20 Feb 2014 16:46:25 -0800

Yeah

Slowly, NoSQL products are adding schema :)


At least Cassandra is ahead of the curve

> On Feb 20, 2014, at 7:37 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> 
> Recommendations in Cassandra have a shelf life of about 1 to 2 years. If you 
> try to assert a recommendation from a year ago, you stand a solid chance of 
> someone telling you there is now a better way.
> 
> Cassandra once loved being a schemaless datastore. Imagine that?
> 
> 
> On Thursday, February 20, 2014, Peter Lin <wool...@gmail.com> wrote:
> >
> > Good example, Ed.
> >
> > I'm so happy to see other people doing things like this. Even though the 
> > official DataStax docs recommend not mixing static and dynamic columns, to 
> > me that recommendation is a huge disservice to Cassandra users.
> >
> > If someone really wants to stick to the relational model, then NewSQL is a 
> > better fit, and it gives users the full power of SQL with subqueries, LIKE, 
> > and joins. But NewSQL can't handle these kinds of use cases, due to the 
> > static nature of relational tables, row size limits and column limits.
> >
> >
> >
> > On Thu, Feb 20, 2014 at 6:18 PM, Edward Capriolo <edlinuxg...@gmail.com> 
> > wrote:
> >
> > CASSANDRA-6561 is interesting. Still, statically defined columns are not 
> > exactly a solution for doing everything you could do in "Thrift".
> >
> > http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/
> >
> > Before collections or CQL existed, I implemented some of these concepts myself.
> >
> > Say you have a column family named AllMyStuff
> >
> > Columns whose (string) names start with "friends_" would act as a "Map" of 
> > friends to age
> >
> > set AllMyStuff[edward][friends_bob]=34
> >
> > set AllMyStuff[edward][friends_sara]=33
> >
> > The column named password could hold a string
> >
> > set AllMyStuff[edward][password]='mother'
> >
> > Columns named phone[00] through phone[100] would act as an array of phone numbers
> >
> > set AllMyStuff[edward][phone[00]]='555-5555'
> >
> > It was quite easy for me to slice all the phone numbers with a column slice:
> >
> > start column: phone
> > end column: phone[100]
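A minimal Python sketch of why that slice was easy, assuming a comparator that orders column names as plain ASCII/UTF-8 strings. The WideRow class and its methods are hypothetical, not a Cassandra or Thrift API: within one row, columns are kept sorted by name, so every column sharing a prefix is contiguous and a single range scan returns them all.

```python
import bisect


class WideRow:
    """Hypothetical model of one Thrift-era wide row: columns kept sorted by name."""

    def __init__(self):
        self._names = []   # column names in comparator order (here: plain string order)
        self._values = {}  # column name -> value

    def set(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)  # keep names sorted on insert
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


row = WideRow()
row.set("friends_bob", "34")
row.set("friends_sara", "33")
row.set("password", "mother")
row.set("phone[00]", "555-5555")
row.set("phone[01]", "555-1234")
row.set("action_0001", "/index.html")

# All phone numbers, via one contiguous slice over the sorted names.
print(row.slice("phone", "phone[99]"))
# All "friends_" entries; "~" sorts after every ASCII letter used in these names.
print(row.slice("friends_", "friends_~"))
```

One caveat the sketch makes visible: the comparison is lexical, so "phone[100]" sorts before "phone[99]" and even before "phone[10]". Zero-padding the indexes to a fixed width keeps numeric order and slice order in step.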
> >
> > But then every column starting with "action_xxxx" could be a page hit, and I 
> > could have thousands or tens of thousands of these.
> >
> > In many cases CQL has nice (or nicer) abstractions for some of these things. 
> > But its biggest drawback for me is that I cannot take this existing column 
> > family AllMyStuff and 'explain' it to CQL. It's a perfectly valid way to 
> > design something, and it is probably more space-efficient than the 
> > composite-based layout CQL uses to pack things. I feel that as a data access 
> > language CQL dictates too much schema: not only what is in the row schema, 
> > but also the format of the data on disk. Also, schemas like mine above are 
> > perfectly valid, but selecting them into a table of fixed rows and columns 
> > does not map well.
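A rough sketch of the space comparison, under loudly stated assumptions: a hypothetical CQL3 table all_my_stuff (owner text, friend text, age int, PRIMARY KEY (owner, friend)) without COMPACT STORAGE, laid out roughly the way Cassandra 1.2/2.0 stores it as I understand it. Each cell name is a composite of the clustering value plus the CQL column name, each component is encoded as a 2-byte length, the bytes, and an end-of-component byte, and each CQL row also gets an empty-named "row marker" cell. Only cell-name bytes are counted; timestamps and values are ignored.

```python
def encode_composite(components):
    """CompositeType-style name: per component, 2-byte big-endian length, bytes, end-of-component byte."""
    out = b""
    for c in components:
        raw = c.encode("utf-8")
        out += len(raw).to_bytes(2, "big") + raw + b"\x00"
    return out


def cql_cell_names(clustering, columns):
    """Cell names for one CQL row: a row marker plus one cell per regular column."""
    names = [encode_composite([*clustering, ""])]  # row marker (empty column name)
    names += [encode_composite([*clustering, col]) for col in columns]
    return names


# Thrift-style: one column named "friends_bob" holding the age.
thrift_bytes = len("friends_bob".encode("utf-8"))
# CQL-style: the row (owner='edward', friend='bob', age=34) inside partition 'edward'.
cql_bytes = sum(len(n) for n in cql_cell_names(["bob"], ["age"]))
print(thrift_bytes, cql_bytes)
```

Under these assumptions the single Thrift column friends_bob=34 spends 11 bytes on its name, while the equivalent CQL row spends 21 bytes across two cell names.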
> >
> > Hive tackles this problem by having a SerDe interpret the metadata, so 
> > that the physical data and the logical definition are not coupled.
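The SerDe idea can be sketched in a few lines of Python. This is not Hive's actual SerDe interface (that is a Java API); the deserialize function and the logical record layout are hypothetical, chosen to read the AllMyStuff row described above without changing how it is stored on disk.

```python
import re

PHONE = re.compile(r"phone\[(\d+)\]$")  # hypothetical "array" naming: phone[NN]


def deserialize(row):
    """Interpret a physical wide row (column name -> value) as one logical record."""
    record = {"friends": {}, "password": None, "phones": [], "actions": 0}
    phones = []
    for name, value in row.items():
        m = PHONE.match(name)
        if name.startswith("friends_"):
            # "Map" columns: the suffix of the column name is the map key.
            record["friends"][name[len("friends_"):]] = int(value)
        elif name == "password":
            record["password"] = value
        elif m:
            # "Array" columns: order by the numeric index, not the lexical name.
            phones.append((int(m.group(1)), value))
        elif name.startswith("action_"):
            # Possibly tens of thousands of page hits; this sketch only counts them.
            record["actions"] += 1
    record["phones"] = [v for _, v in sorted(phones)]
    return record


row = {
    "friends_bob": "34",
    "friends_sara": "33",
    "password": "mother",
    "phone[100]": "555-0100",
    "phone[00]": "555-5555",
    "action_0001": "/index.html",
}
print(deserialize(row))
```

The physical row stays exactly as written; only the interpretation layer decides that friends_ means a map, phone[NN] means an ordered array, and action_ means a counted series.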
> >
> >
> >
> >
> > On Thu, Feb 20, 2014 at 5:23 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
> >
> > Rüdiger
> >
> > "SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>"
> >
> > When using RandomPartitioner or Murmur3Partitioner, the outer map is a 
> > simple Map, not a SortedMap (rows are ordered by the hash of the key, not 
> > by the key itself).
> >
> > The only case where you get a SortedMap on the row key is with 
> > OrderPreservingPartitioner, which is clearly not advised in most cases 
> > because it creates hot spots in the cluster.
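A small illustration of that difference, assuming RandomPartitioner's token is the MD5 digest of the key read as a signed 128-bit integer and made non-negative with abs(). Murmur3Partitioner instead uses a 64-bit MurmurHash3 variant, which the Python standard library does not provide, so it is left out. Ordering by token scatters sequential keys, while OrderPreservingPartitioner keeps raw key order, which is also why runs of sequential keys land on the same node.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Approximate RandomPartitioner token: abs() of the MD5 digest read as a signed integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


keys = [b"user:0001", b"user:0002", b"user:0003", b"user:0004"]

# OrderPreservingPartitioner-style: rows ordered by the raw key bytes.
opp_order = sorted(keys)
# RandomPartitioner-style: rows ordered by token, i.e. by hash, not by key.
rp_order = sorted(keys, key=random_partitioner_token)

for k in rp_order:
    print(k.decode(), random_partitioner_token(k))
print([k.decode() for k in opp_order])
```

Tokens fall in the range 0 to 2**127, so a range scan over tokens says nothing about which row keys come back, which is the practical sense in which the outer map is "not sorted" by key.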
> >
> >
> >
> > On Thu, Feb 2
> 
> -- 
> Sorry this was sent from mobile. Will do less grammar and spell check than 
> usual.
