Re: Wide row column slicing - row size shard limit

2012-02-16 Thread Data Craftsman
Hi Aaron Morton and R. Verlangen,

Thanks for the quick answers. It's good to know about Thrift's limit on the
amount of data it will accept / send.

I know the hard limit is 2 billion columns per row. My question is at what
size a row starts to slow down read/write performance and maintenance. The
blog I referenced said the row size should be less than 10MB.

It would be better if Cassandra could transparently shard/split a wide row
and distribute the shards across many nodes, to help with load balancing.
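
For reference, here is roughly what manual sharding looks like from the
application side today. This is only a sketch under my own assumptions
(pycassa client, daily buckets; the keyspace, column family, and field names
are all made up):

import pycassa

pool = pycassa.ConnectionPool('Metrics', ['localhost:9160'])
events = pycassa.ColumnFamily(pool, 'SensorEvents')

def bucket_key(sensor_id, ts):
    # One row per sensor per day instead of one ever-growing row.
    return '%s:%s' % (sensor_id, ts.strftime('%Y%m%d'))

def record(sensor_id, ts, value):
    # Column name is the event timestamp, so columns stay time-ordered.
    events.insert(bucket_key(sensor_id, ts), {ts.isoformat(): value})

def read_day(sensor_id, day):
    # The reader must know the bucketing scheme to find the right row.
    return events.get(bucket_key(sensor_id, day), column_count=1000)

The application has to compute the bucket on every read and write and stitch
results together across buckets; that bookkeeping is exactly what I would like
Cassandra to hide.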

Are there any other ways to model historical data (time-series data) besides
wide-row column slicing in Cassandra?

Thanks,
Charlie | Data Solution Architect Developer
http://mujiang.blogspot.com



On Thu, Feb 16, 2012 at 12:38 AM, aaron morton wrote:

> > Based on this blog of Basic Time Series with Cassandra data modeling,
> > http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
> I've not read that one but it sounds right. Matt Dennis knows his stuff
> http://www.slideshare.net/mattdennis/cassandra-nyc-2011-data-modeling
>
> > There is a limit on how big the row size can be before slowing down the
> update and query performance, that is 10MB or less.
> There is no hard limit. Wide rows won't upset writes too much. Some read
> queries can avoid problems but most will not.
>
> Wide rows are a pain when it comes to maintenance.  They take longer to
> compact and repair.
>
> > Is this still true in Cassandra latest version? or in what release
> Cassandra will remove this limit?
> There is a limit of 2 billion columns per row. There is not a limit of
> 10MB per row. I've seen some rows in the 100's of MB and they are always a
> pain.
>
> > Manually sharding the wide row will increase the application complexity,
> it would be better if Cassandra can handle it transparently.
> it's not that hard :)
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/02/2012, at 7:40 AM, Data Craftsman wrote:
>
> > Hello experts,
> >
> > Based on this blog of Basic Time Series with Cassandra data modeling,
> > http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
> >
> > "This (wide row column slicing) works well enough for a while, but over
> time, this row will get very large. If you are storing sensor data that
> updates hundreds of times per second, that row will quickly become gigantic
> and unusable. The answer to that is to shard the data up in some way"
> >
> > There is a limit on how big the row size can be before slowing down the
> update and query performance, that is 10MB or less.
> >
> > Is this still true in Cassandra latest version? or in what release
> Cassandra will remove this limit?
> >
> > Manually sharding the wide row will increase the application complexity,
> it would be better if Cassandra can handle it transparently.
> >
> > Thanks,
> > Charlie | DBA & Developer
> >
> >
> > p.s. Quora link,
> >
> http://www.quora.com/Cassandra-database/What-are-good-ways-to-design-data-model-in-Cassandra-for-historical-data
> >
> >
> >
>
>


Re: Is this the correct data model thinking?

2012-03-01 Thread Data Craftsman
Yes. Think in queries:
• Break your normalization habit
• Roughly one CF per query (a rough sketch follows below)
• Denormalize!
• Use in-column entity caching
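
A minimal sketch of what "roughly one CF per query" means in practice, under
my own assumptions (pycassa client; keyspace and CF names are invented):

import pycassa

pool = pycassa.ConnectionPool('App', ['localhost:9160'])
users_by_name = pycassa.ColumnFamily(pool, 'UsersByName')
users_by_email = pycassa.ColumnFamily(pool, 'UsersByEmail')

def save_user(username, email, state):
    user = {'username': username, 'email': email, 'state': state}
    # Denormalize: write the same entity to every CF that serves a query,
    # keyed the way that query will look it up.
    users_by_name.insert(username, user)
    users_by_email.insert(email, user)

def user_for_login(email):
    # One query, one CF, one row read -- no joins, no secondary lookups.
    return users_by_email.get(email)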


On Tue, Feb 28, 2012 at 12:12 AM, aaron morton wrote:

> A.) store ALL the data associated with the user onto a single users
> row-key. Some user keys may be small, others may get larger over time
> depending upon activity.
>
> I would go with this.
> The important thing is supporting the read queries.
>
> Cheers
> Aaron
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 28/02/2012, at 7:40 PM, Blake Starkenburg wrote:
>
> Using a user/member as an example I am curious which of the data models
> would be the best fit for performance and longevity of data in Cassandra?
>
> Consider the simple staples of user/member details like
> username,email,address,state,preferences,etc. Fairly simple, storing this
> data into a row key users->username[email], etc.
>
> Now as time goes on more data such as snapshot changes like
> users->username['change:123456] = 'changed email', etc. columns compound
> onto the users row-key. Perhaps more preferences are added onto the row-key
> or login information. I wouldn't expect the number of columns to grow
> hugely, but I've also learned to plan for the unexpected...
>
> Simplicity would tell me to:
>
> A.) store ALL the data associated with the user onto a single users
> row-key. Some user keys may be small, others may get larger over time
> depending upon activity.
>
> but would B be a better performance model
>
> B.) Split out user data into separate row-keys such as
> users->changes_username['change123456] = 'changed email' AND
> users->preferences_username['fav_color] = 'blue'. This would add a level of
> complexity and in some cases tiny row-keys along with multiple fetches for
> all user/member data?
>
> Curious what your opinions are?
>
> Thanks!
>
>
>
-- 
Thanks,

Charlie (Yi) Zhu (一个 木匠)
===
Data Solution Architect Developer
http://mujiang.blogspot.com


Composite primary key does not work on Cassandra 1.1.0-beta1

2012-03-01 Thread Data Craftsman
Howdy,

Here is the CQL and the error. Did I do something wrong?

/home/cassandra>cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 2.1.0 | Cassandra 1.1.0-beta1 | CQL spec 2.0.0 | Thrift protocol 19.28.0]
Use HELP for help.

cqlsh> use demo;

cqlsh:demo> CREATE TABLE timeline (
... user_id varchar,
... tweet_id uuid,
... author varchar,
... body varchar,
... PRIMARY KEY (user_id, tweet_id)
... );

Bad Request: line 7:0 mismatched input ')' expecting EOF

cqlsh:demo>

Reference: http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
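
My current guess: the banner above says "CQL spec 2.0.0", so the shell may
still be speaking CQL 2, and composite PRIMARY KEYs are a CQL 3 feature. Below
is a sketch of forcing CQL 3 from Python with cassandra-dbapi2; the
cql_version argument is my assumption about that driver, not something I have
verified here:

import cql

# Ask the driver for CQL 3 explicitly; under CQL 2 the composite key is rejected.
conn = cql.connect('localhost', 9160, 'demo', cql_version='3.0.0')
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE timeline (
        user_id varchar,
        tweet_id uuid,
        author varchar,
        body varchar,
        PRIMARY KEY (user_id, tweet_id)
    )""")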

Thanks,
Charlie | DBA


Re: Combining Cassandra with some SQL language

2012-03-01 Thread Data Craftsman
Agreed. That is essentially a transactional database API.

Orthogonality: with modular programming, implement the transactional database
API as the database-access interface.
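
A minimal sketch of what I mean by such an interface, under my own assumptions
(Python; the class and both backends are illustrative, not code from this
thread):

class UserStore(object):
    """Single entry point for user data; callers never touch the backends."""

    def __init__(self, mysql_conn, cassandra_cf):
        self._mysql = mysql_conn     # metadata / ad-hoc queries
        self._events = cassandra_cf  # large, append-heavy entities (a pycassa CF)

    def save_event(self, user_id, event_id, payload):
        self._events.insert(str(user_id), {event_id: payload})

    def find_user(self, email):
        cur = self._mysql.cursor()
        cur.execute("SELECT id, email FROM users WHERE email = %s", (email,))
        return cur.fetchone()

Keeping both stores behind one module is what lets a table migrate from MySQL
to Cassandra later without touching the callers.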

Thanks,
Charlie | DBA

On Sun, Feb 26, 2012 at 6:21 AM, Adam Haney  wrote:
> I've been using a combination of MySQL and Cassandra for about a year now on
> a project that now serves about 20k users. We use Cassandra for storing
> large entities and MySQL to store meta data that allows us to do better ad
> hoc querying. It's worked quite well for us. During this time we have also
> been able to migrate some of our tables in MySQL to Cassandra if MySQL
> performance / capacity became a problem. This may seem obvious but if you're
> planning on creating a data model that spans multiple databases make sure
> you encapsulate the logic to read/write/delete information in a good data
> model library and only use that library to access your data. This is good
> practice anyway but when you add the extra complication of multiple
> databases that may reference one another it's an absolute must.
>
>
> On Sun, Feb 26, 2012 at 8:06 AM, R. Verlangen  wrote:
>>
>> Hi there,
>>
>> I'm currently busy with the technical design of a new project. Of course
>> it will depend on your needs, but is it weird to combine Cassandra with a
>> SQL language like MySQL?
>>
>> In my usecase it would be nice because we have some tables/CF's with lots
>> and lots of data that does not really have to be consistent 100%, but also
>> have some data that should be always consistent.
>>
>> What do you think of this?
>>
>> With kind regards,
>> Robin Verlangen
>
>


Re: Performance overhead when using start and end columns

2012-03-26 Thread Data Craftsman
Hi Aaron,

Thanks for the benchmark. The results matrix is valuable.
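
For the archives, here is how I read the advice in your article, as a rough
pycassa sketch (the CF, row key, and column names are made up):

import pycassa

pool = pycassa.ConnectionPool('App', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'Timeline')

# Cheapest: read from the start of the row in natural comparator order.
first_columns = cf.get('row-key', column_count=30)

# Costlier: an explicit start/finish slice has to seek within the row
# (the indexed-slice path discussed in this thread) before streaming columns.
window = cf.get('row-key', column_start='2012-03-01', column_finish='2012-03-07')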

Thanks,

Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com


On Mon, Mar 26, 2012 at 10:53 AM, aaron morton  wrote:
> See the test's in the article.
>
> The code I used for profiling is also available.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/03/2012, at 6:21 AM, Mohit Anchlia wrote:
>
> Thanks, but if I do have to specify start and end columns, roughly how much
> overhead would that translate to, since reading the metadata should be
> constant overall?
>
> On Mon, Mar 26, 2012 at 10:18 AM, aaron morton 
> wrote:
>>
>> Some information on query plans
>> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>>
>> TL;DR: Select columns with no start, in the natural comparator order.
>>
>> Cheers
>>
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 25/03/2012, at 2:25 PM, Mohit Anchlia wrote:
>>
>> I have rows with around 2K-50K columns but when I do a query I only need
>> to fetch a few columns between start and end columns. I was wondering what
>> performance overhead is caused by using a slice query with start and end
>> columns?
>>
>> Looking at the code it looks like when you give start and end columns it
>> goes into the IndexSliceReader logic, but it's hard to tell how much
>> overhead on average one would see. Or is it even worth worrying about?
>>
>>
>
>


Server side scripting support in Cassandra - go Python !

2012-03-26 Thread Data Craftsman
Howdy,

Some polyglot-persistence (NoSQL) products have started to support server-side
scripting, similar to RDBMS stored procedures.
E.g. Redis Lua scripting.

I hope it is Python when Cassandra gets a server-side scripting feature.

FYI,

http://antirez.com/post/250

http://nosql.mypopescu.com/post/19949274021/alchemydb-an-integrated-graphdb-rdbms-kv-store

"server side scripting support is an extremely powerful tool. Having
processing close to data (i.e. data locality) is a well known
advantage, ..., it can open the doors to completely new features."

Thanks,

Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com


Re: Server side scripting support in Cassandra - go Python !

2012-04-05 Thread Data Craftsman
Just like an Oracle stored procedure.

2012/3/26 Data Craftsman :
> Howdy,
>
> Some polyglot-persistence (NoSQL) products have started to support server-side
> scripting, similar to RDBMS stored procedures.
> E.g. Redis Lua scripting.
>
> I hope it is Python when Cassandra gets a server-side scripting feature.
>
> FYI,
>
> http://antirez.com/post/250
>
> http://nosql.mypopescu.com/post/19949274021/alchemydb-an-integrated-graphdb-rdbms-kv-store
>
> "server side scripting support is an extremely powerful tool. Having
> processing close to data (i.e. data locality) is a well known
> advantage, ..., it can open the doors to completely new features."
>
> Thanks,
>
> Charlie (@mujiang) 一个 木匠
> ===
> Data Architect Developer
> http://mujiang.blogspot.com


Re: Materialized Views or Index CF - data model question

2012-04-10 Thread Data Craftsman
Hi Aaron,

Thanks for the quick answer, I'll build a prototype to benchmark each
approach next week.

Here are more questions based on your reply:

a) "These queries are not easily supported on standard Cassandra"
select * from book where price  < 992   order by price descending limit 30;

This is a typical timeline (time-series) query, which is well supported by
Cassandra, from my understanding.

b) "You do not need a different CF for each custom secondary index.
Try putting the name of the index in the row key. "

I couldn't quite follow this. Could you help by sketching a demo CF structure
with some sample data?
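
In the meantime, here is my guess at what you mean, just to check my
understanding (a rough pycassa sketch; the CF layout, zero-padding scheme, and
sample values are my own assumptions):

import pycassa

pool = pycassa.ConnectionPool('BookStore', ['localhost:9160'])
book_index = pycassa.ColumnFamily(pool, 'BookIndex')

# One CF holds every custom index; the index name is the row key.
# Column name = indexed value + book_id (so duplicate values don't collide),
# column value = the book_id to fetch from the base book CF.
# Numeric values are zero-padded so string order matches numeric order.
book_index.insert('price',  {'00992:book_123': 'book_123'})
book_index.insert('isbn',   {'XYZ:book_123': 'book_123'})
book_index.insert('author', {'Morton:book_123': 'book_123'})

# "price < 992, descending, limit 30" becomes a reversed slice on one row.
# (In practice this index row would itself need sharding as it grows.)
hits = book_index.get('price', column_start='00992', column_reversed=True,
                      column_count=30)

Is that the shape you had in mind?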

Thanks,
Charlie | DBA developer



On Sun, Apr 8, 2012 at 2:30 PM, aaron morton  wrote:
> We need to query data by each column, do pagination as below,
>
> select * from book where isbn   < "XYZ" order by ISBN   descending limit 30;
> select * from book where price  < 992   order by price  descending limit 30;
> select * from book where col_n1 < 789   order by col_n1 descending limit 30;
> select * from book where col_n2 < "MUJ" order by col_n2 descending limit 30;
> ...
> select * from book where col_nm < 978 order by col_nm descending limit 30;
>
> These queries are not easily supported on standard Cassandra. If you need
> this level of query complexity consider Data Stax Enterprise, Solr, or a
> RDBMS.
>
> If we choose Materialized Views approach, we have to update all
> 20 Materialized View column family(s), for each base row update.
> Will the Cassandra write performance acceptable?
>
> Yes, depending on the size of the cluster and the machine spec.
>
> It's often a good idea to design CF's to match the workloads. If you have
> some data that changes faster than other, consider splitting them into
> different CFs.
>
> Should we just normalize the data, create base book table with book_id
> as primary key, and then
> build 20 index column family(s), use wide row column slicing approach,
> with index column data value as column name and book_id as value?
>
> You do not need a different CF for each custom secondary index. Try putting
> the name of the index in the row key.
>
> What will you recommend?
>
> Take another look at the queries you *need* to support. Then build a small
> proof of concept to see if Cassandra will work for you.
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/04/2012, at 6:46 AM, Data Craftsman wrote:
>
> Howdy,
>
> Can I ask a data model question here?
>
> We have a book table with 20 columns, 300 million rows, average row
> size is 1500 bytes.
>
> create table book(
> book_id,
> isbn,
> price,
> author,
> title,
> ...
> col_n1,
> col_n2,
> ...
> col_nm
> );
>
> Data usage:
>
> We need to query data by each column, do pagination as below,
>
> select * from book where isbn   < "XYZ" order by ISBN   descending limit 30;
> select * from book where price  < 992   order by price  descending limit 30;
> select * from book where col_n1 < 789   order by col_n1 descending limit 30;
> select * from book where col_n2 < "MUJ" order by col_n2 descending limit 30;
> ...
> select * from book where col_nm < 978 order by col_nm descending limit 30;
>
> Write: 100 million updates a day.
> Read : 16  million queries a day. 200 queries per second, one query
> returns 30 rows.
>
> ***
> Materialized Views approach
>
> {"ISBN_01",book_object1},{"ISBN_02",book_object2},...,{"ISBN_N",book_objectN}
> ...
> We will end up with 20 timelines.
>
>
> ***
> Index approach - create 2nd Column Family as Index
>
> 'ISBN_01': 'book_id_a01','book_id_a02',...,'book_id_aN'
> 'ISBN_02': 'book_id_b01','book_id_b02',...,'book_id_bN'
> ...
> 'ISBN_0m': 'book_id_m01','book_id_m02',...,'book_id_mN'
>
> This way, we will create 20 index Column Family(s).
>
> ---
>
> If we choose Materialized Views approach, we have to update all
> 20 Materialized View column family(s), for each base row update.
> Will the Cassandra write performance acceptable?
>
> Redis recommend building an index for the query on each column, that
> is your 1st strategy - create 2nd index CF:
> http://redis.io/topics/data-types-intro
> (see the section [ Pushing IDs instead of the actual data in Redis lists ])
>
> Should we just normalize the data, create base book table with book_id
> as primary key, and then
> build 20 index column family(s), use wide row column slicing approach,

Re: Is this possible.

2012-04-26 Thread Data Craftsman
Data model:

REM CQL 3.0


$> cqlsh --cql3

drop COLUMNFAMILY user_score_v3;
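
-- In the table below, name is the row (partition) key and highscore is the
-- clustering column, so columns within a row stay sorted by highscore.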

CREATE COLUMNFAMILY user_score_v3
(name varchar,
 highscore float,
 x int,
 y varchar,
 z varchar,
 PRIMARY KEY (name, highscore)
);

DML is the usual, common RDBMS-style SQL.

Query:

Top 3,

SELECT name, highscore, x,y,z FROM user_score_v3 where name='abc'
ORDER BY highscore desc
LIMIT 3;

You may try Reversed Comparators, see
http://thelastpickle.com/2011/10/03/Reverse-Comparators/

Hope this is helpful.

Thanks,
Charlie | DBA


On Thu, Apr 26, 2012 at 12:34 PM, Ed Jone  wrote:
> Hello,
>
> I am new to cassandra and was hoping if someone can tell me if the following
> is possible.
>
>
> Given I have a columnfamily with a list of users in each Row.
>
> Each user has the properties: name, highscore, x, y, z.
>
> I want to use name as the column key, but I want the columns to be sorted by
> highscore (always).
>
> The only reads would be to get the top N users by highscore in a given row.
> I thought about adding the weight to the name as the key (eg:
> 299.76-johnsmith) but then I would not be able to update a given user.
>
> This was not possible in the past, but I am not familiar, with the newer
> cassandra versions.


--
Thanks,

Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com


Re: Is this possible.

2012-04-26 Thread Data Craftsman
DML example,

insert into user_score_v3(name, highscore, x,y,z)
values ('abc', 299.76, 1001, '*', '*');
...


2012/4/26 Data Craftsman :
> Data model:
>
> REM CQL 3.0
> 
>
> $> cqlsh --cql3
>
> drop COLUMNFAMILY user_score_v3;
>
> CREATE COLUMNFAMILY user_score_v3
> (name varchar,
>  highscore float,
>  x int,
>  y varchar,
>  z varchar,
>  PRIMARY KEY (name, highscore)
> );
>
> DML is the usual, common RDBMS-style SQL.
>
> Query:
>
> Top 3,
>
> SELECT name, highscore, x,y,z FROM user_score_v3 where name='abc'
> ORDER BY highscore desc
> LIMIT 3;
>
> You may try Reversed Comparators, see
> http://thelastpickle.com/2011/10/03/Reverse-Comparators/
>
> Hope this is helpful.
>
> Thanks,
> Charlie | DBA
>
>
> On Thu, Apr 26, 2012 at 12:34 PM, Ed Jone  wrote:
>> Hello,
>>
>> I am new to cassandra and was hoping if someone can tell me if the following
>> is possible.
>>
>>
>> Given I have a columnfamily with a list of users in each Row.
>>
>> Each user has the properties: name, highscore, x, y, z.
>>
>> I want to use name as the column key, but I want the columns to be sorted by
>> highscore (always).
>>
>> The only reads would be to get the top N users by highscore in a given row.
>> I thought about adding the weight to the name as the key (eg:
>> 299.76-johnsmith) but then I would not be able to update a given user.
>>
>> This was not possible in the past, but I am not familiar, with the newer
>> cassandra versions.
>
>
> --
> Thanks,
>
> Charlie (@mujiang) 一个 木匠
> ===
> Data Architect Developer
> http://mujiang.blogspot.com



-- 
--
Thanks,

Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com


Re: Server Side Logic/Script - Triggers / StoreProc

2012-04-27 Thread Data Craftsman
Howdy,

Some polyglot-persistence (NoSQL) products have started to support server-side
scripting, similar to RDBMS stored procedures.
E.g. Redis Lua scripting.

I hope it is Python when Cassandra gets a server-side scripting feature.

FYI,

http://antirez.com/post/250

http://nosql.mypopescu.com/post/19949274021/alchemydb-an-integrated-graphdb-rdbms-kv-store

"server side scripting support is an extremely powerful tool. Having
processing close to data (i.e. data locality) is a well known
advantage, ..., it can open the doors to completely new features."

Thanks,

Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com

On Sun, Apr 22, 2012 at 9:35 AM, Brian O'Neill  wrote:
> Praveen,
>
> We are certainly interested. To get things moving we implemented an add-on
> for Cassandra to demonstrate the viability (using AOP):
> https://github.com/hmsonline/cassandra-triggers
>
> Right now the implementation executes triggers asynchronously, allowing you
> to implement a java interface and plugin your own java class that will get
> called for every insert.
>
> Per the discussion on 1311, we intend to extend our proof of concept to be
> able to invoke scripts as well.  (minimally we'll enable javascript, but
> we'll probably allow for ruby and groovy as well)
>
> -brian
>
> On Apr 22, 2012, at 12:23 PM, Praveen Baratam wrote:
>
> I found that Triggers are coming in Cassandra 1.2
> (https://issues.apache.org/jira/browse/CASSANDRA-1311) but no mention of any
> StoreProc like pattern.
>
> I know this has been discussed so many times but never met with
> any initiative. Even Groovy was staged out of the trunk.
>
> Cassandra is great for logging and as such will be infinitely more useful if
> some logic can be pushed into the Cassandra cluster nearer to the location
> of Data to generate a materialized view useful for applications.
>
> Server Side Scripts/Routines in Distributed Databases could soon prove to be
> the differentiating factor.
>
> Let me reiterate things with a use case.
>
> In our application we store time series data in wide rows with TTL set on
> each point to prevent data from growing beyond acceptable limits. Still the
> data size can be a limiting factor to move all of it from the cluster node
> to the querying node and then to the application via thrift for processing
> and presentation.
>
> Ideally we should process the data on the residing node and pass only the
> materialized view of the data upstream. This should be trivial if Cassandra
> implements some sort of server side scripting and CQL semantics to call it.
>
> Is anybody else interested in a similar feature? Is it being worked on? Are
> there any alternative strategies to this problem?
>
> Praveen
>
>
>
> --
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://weblogs.java.net/blog/boneill42/
> blog: http://brianoneill.blogspot.com/
>



-- 
--
Thanks,

Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com


Re: Error deleting column families with 1.1

2012-05-09 Thread Data Craftsman
We have a similar issue. I'll try to reproduce it and upload logs soon.

On Wed, May 9, 2012 at 3:30 AM, André Cruz  wrote:
> Here it is: https://issues.apache.org/jira/browse/CASSANDRA-4230
>
> Please let me know if you need further info.
>
> Best regards,
> André
>
> On May 8, 2012, at 23:55 , aaron morton wrote:
>
> Could you please create a ticket for this
> on https://issues.apache.org/jira/browse/CASSANDRA
>
> Please include:
> * operating system
> * keyspace / column family definition
> * output of of "ls -lah" for the "/var/lib/cassandra/data/Disco/Client/"
> directory after the error occurs.
>
> Thanks
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8/05/2012, at 9:00 AM, André Cruz wrote:
>
> Hello.
>
> Since I upgraded to Cassandra 1.1, I get the following error when trying to
> delete a CF. After this happens the CF is not accessible anymore, but I
> cannot create another one with the same name until I restart the server.
>
> INFO [MigrationStage:1] 2012-05-07 18:10:12,682 ColumnFamilyStore.java (line
> 634) Enqueuing flush of Memtable-schema_columnfamilies@1128094887(978/1222
> serialized/live bytes, 21 ops)
> INFO [FlushWriter:2] 2012-05-07 18:10:12,682 Memtable.java (line 266)
> Writing Memtable-schema_columnfamilies@1128094887(978/1222 serialized/live
> bytes, 21 ops)
> INFO [FlushWriter:2] 2012-05-07 18:10:12,720 Memtable.java (line 307)
> Completed flushing
> /var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-28-Data.db
> (1041 bytes)
> INFO [MigrationStage:1] 2012-05-07 18:10:12,721 ColumnFamilyStore.java (line
> 634) Enqueuing flush of Memtable-schema_columns@1599271050(392/490
> serialized/live bytes, 8 ops)
> INFO [FlushWriter:2] 2012-05-07 18:10:12,722 Memtable.java (line 266)
> Writing Memtable-schema_columns@1599271050(392/490 serialized/live bytes, 8
> ops)
> INFO [CompactionExecutor:8] 2012-05-07 18:10:12,722 CompactionTask.java
> (line 114) Compacting
> [SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-26-Data.db'),
> SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-28-Data.db'),
> SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfam
> ilies-hc-27-Data.db'),
> SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-25-Data.db')]
> INFO [FlushWriter:2] 2012-05-07 18:10:12,806 Memtable.java (line 307)
> Completed flushing
> /var/lib/cassandra/data/system/schema_columns/system-schema_columns-hc-23-Data.db
> (447 bytes)
> INFO [CompactionExecutor:8] 2012-05-07 18:10:12,811 CompactionTask.java
> (line 225) Compacted to
> [/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-29-Data.db,].
>  24,797 to 21,431
> (~86% of original) bytes for 2 keys at 0.232252MB/s.  Time: 88ms.
> ERROR [MigrationStage:1] 2012-05-07 18:10:12,895 CLibrary.java (line 158)
> Unable to create hard link
> com.sun.jna.LastErrorException: errno was 17
> at org.apache.cassandra.utils.CLibrary.link(Native Method)
> at org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:150)
> at
> org.apache.cassandra.db.Directories.snapshotLeveledManifest(Directories.java:343)
> at
> org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1450)
> at
> org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1483)
> at org.apache.cassandra.db.DefsTable.dropColumnFamily(DefsTable.java:512)
> at org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:403)
> at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:270)
> at
> org.apache.cassandra.service.MigrationManager$1.call(MigrationManager.java:214)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> ERROR [Thrift:17] 2012-05-07 18:10:12,898 CustomTThreadPoolServer.java (line
> 204) Error occurred during processing of message.
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.io.IOError: java.io.IOException: Unable to create hard link from
> /var/lib/cassandra/data/Disco/Client/Client.json to /var/lib/cassandra/data/
> Disco/Client/snapshots/1336410612893-Client/Client.json (errno 17)
> at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:372)
> at
> org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:191)
> at
> org.apache.cassandra.service.MigrationManager.announceColumnFamilyDrop(MigrationManager.java:182)
> at
> org.apache.cassandra.thrift.CassandraServer.system_drop_column_family(CassandraServer.jav

When will CQL BATCH support binding variable (Query substitution use named parameters)?

2012-06-20 Thread Data Craftsman
Hello,

CQL BATCH is good for INSERT/UPDATE performance.

But it cannot use bind variables, which leaves it exposed to SQL injection.

Is there a plan to make CQL BATCH support bind variables in the near future?

e.g.
http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/

Query substitution
Use named parameters and a dictionary of names and values.

>> cursor.execute("SELECT column FROM CF WHERE name=:name", dict(name="Foo"))


CQL Batch:
http://www.datastax.com/docs/1.1/references/cql/BATCH

BEGIN BATCH
INSERT INTO demo.product(id, price, description) VALUES (123, 5.98, 'AA''BB')
INSERT INTO demo.product(id, price, description) VALUES (124, 9.78, 'BB''CC')
...
INSERT INTO demo.product(id, price, description) VALUES (567, 2.34, 'EF')
APPLY BATCH;
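
To make the gap concrete, here is a sketch with cassandra-dbapi2 reflecting my
current understanding (table, columns, and values are invented):

import cql

conn = cql.connect('localhost', 9160, 'demo')
cursor = conn.cursor()

# A single statement can use query substitution, so values are escaped for us:
cursor.execute("INSERT INTO product (id, price, description) "
               "VALUES (:id, :price, :descr)",
               dict(id=123, price=5.98, descr="AA'BB"))

# A BATCH has to be assembled by string concatenation, which is where the
# injection exposure comes from:
rows = [(124, 9.78, "BB'CC"), (567, 2.34, 'EF')]
stmts = ["INSERT INTO product (id, price, description) VALUES (%d, %.2f, '%s')"
         % (i, p, d.replace("'", "''")) for i, p, d in rows]
cursor.execute("BEGIN BATCH\n" + "\n".join(stmts) + "\nAPPLY BATCH")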

Thanks,

Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com


Re: When will CQL BATCH support binding variable (Query substitution use named parameters)?

2012-06-21 Thread Data Craftsman
Hi Sylvain,

Thanks for the quick response.

Yes. I don't know the difference between a bind variable and "query
substitution"; I'm a little confused, so I was just trying to use your
terminology. :)  In the Oracle world, we call them bind variables.

Can you give me a concise example of bound variables supported by BATCH?

E.g.

ls_a1 = 'AA''BB';
ls_a2 = 'CC''BB';

cursor_cassandra.execute(
"
BEGIN BATCH
  INSERT INTO demo.product(id, description) VALUES (123, ?)
  INSERT INTO demo.product(id, description) VALUES (124, ?)
APPLY BATCH
",
ls_a1, ls_a2
)

Thanks,
Charlie | DBA developer


On Wed, Jun 20, 2012 at 11:56 PM, Sylvain Lebresne  wrote:
> On Thu, Jun 21, 2012 at 12:06 AM, Data Craftsman
>  wrote:
>> Hello,
>>
>> CQL BATCH is good for INSERT/UPDATE performance.
>>
>> But it cannot use bind variables, which leaves it exposed to SQL injection.
>>
>> Is there a plan to make CQL BATCH support bind variables in the near future?
>>
>> e.g.
>> http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/
>>
>> Query substitution
>> Use named parameters and a dictionary of names and values.
>>
>>>> cursor.execute("SELECT column FROM CF WHERE name=:name", dict(name="Foo"))
>
> That may be a problem with the python driver (cassandra-dbapi2) and
> you'd want to open an issue there.
>
> But note that the "query substitution" of the python driver is *not*
> related to CQL prepared statements (that use question marks for bound
> variables). Those support BATCH all right.
>
> --
> Sylvain


How to create a COLUMNFAMILY with Leveled Compaction?

2012-07-31 Thread Data Craftsman 木匠
Sorry for my stupid simple question. How to create a COLUMNFAMILY with
Leveled Compaction?

There is no example in documentation:
http://www.datastax.com/docs/1.1/configuration/storage_configuration#compaction-strategy

I tried it on Cassandra 1.1.0 and 1.1.2; both failed. The COLUMNFAMILY
is still 'SizeTieredCompactionStrategy'.  :(

Here is my test and output:

@host01:/usr/share/cassandra>cqlsh host01 --cql3
Connected to BookCluster at host01:9160.
[cqlsh 2.2.0 | Cassandra 1.1.0 | CQL spec 3.0.0 | Thrift protocol 19.30.0]
Use HELP for help.
cqlsh>
cqlsh> use demo;

cqlsh:demo> CREATE COLUMNFAMILY book
... (isbn varchar,
...  book_id bigint,
...  price int,
...  obj varchar,
...  PRIMARY KEY (isbn, book_id)
... )
... WITH compaction_strategy_class='LeveledCompactionStrategy';
cqlsh:demo>
cqlsh:demo> describe COLUMNFAMILY book;

CREATE COLUMNFAMILY book (
  isbn text PRIMARY KEY
) WITH
  comment='' AND
  
comparator='CompositeType(org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.UTF8Type)'
AND
  read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  default_validation=text AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write=True AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  
compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';

cqlsh:demo>

Thanks,
Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com


Re: How to create a COLUMNFAMILY with Leveled Compaction?

2012-08-03 Thread Data Craftsman 木匠
Does nobody use Leveled Compaction with CQL 3.0?

-Z

On Tue, Jul 31, 2012 at 11:17 AM, Data Craftsman 木匠
 wrote:
> Sorry for my stupid simple question. How to create a COLUMNFAMILY with
> Leveled Compaction?
>
> There is no example in documentation:
> http://www.datastax.com/docs/1.1/configuration/storage_configuration#compaction-strategy
>
> I tried it on Cassandra 1.1.0 and 1.1.2; both failed. The COLUMNFAMILY
> is still 'SizeTieredCompactionStrategy'.  :(
>
> Here is my test and output:
>
> @host01:/usr/share/cassandra>cqlsh host01 --cql3
> Connected to BookCluster at host01:9160.
> [cqlsh 2.2.0 | Cassandra 1.1.0 | CQL spec 3.0.0 | Thrift protocol 19.30.0]
> Use HELP for help.
> cqlsh>
> cqlsh> use demo;
>
> cqlsh:demo> CREATE COLUMNFAMILY book
> ... (isbn varchar,
> ...  book_id bigint,
> ...  price int,
> ...  obj varchar,
> ...  PRIMARY KEY (isbn, book_id)
> ... )
> ... WITH compaction_strategy_class='LeveledCompactionStrategy';
> cqlsh:demo>
> cqlsh:demo> describe COLUMNFAMILY book;
>
> CREATE COLUMNFAMILY book (
>   isbn text PRIMARY KEY
> ) WITH
>   comment='' AND
>   
> comparator='CompositeType(org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.UTF8Type)'
> AND
>   read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   default_validation=text AND
>   min_compaction_threshold=4 AND
>   max_compaction_threshold=32 AND
>   replicate_on_write=True AND
>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>   
> compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';
>
> cqlsh:demo>
>
> Thanks,
> Charlie (@mujiang) 一个 木匠
> ===
> Data Architect Developer
> http://mujiang.blogspot.com


Re: CQL connections

2012-08-10 Thread Data Craftsman 木匠
I want to know it too.

http://www.datastax.com/support-forums/topic/when-will-pycassa-support-cql

Connection pooling and load balancing are necessary features for a multi-user
production application.

Thanks,
Charlie | DBA

On Fri, Aug 10, 2012 at 6:47 AM, David McNelis  wrote:
> In using CQL (the python library, at least), I didn't see a way to pass in
> multiple nodes as hosts.  With other libraries (like Hector and Pycassa) I
> can set multiple hosts and my app will work with anyone on that list.  Is
> there something similar going on in the background with CQL?
>
> If not, then is anyone aware of plans to do so?


Re: How to set LeveledCompactionStrategy for an existing table

2012-09-02 Thread Data Craftsman 木匠
We have the same problem.

On Friday, August 31, 2012, Jean-Armel Luce  wrote:
> Hello Aaron.
>
> Thanks for your answer
>
> Jira ticket 4597 created :
https://issues.apache.org/jira/browse/CASSANDRA-4597
>
> Jean-Armel
>
> 2012/8/31 aaron morton 
>
> Looks like a bug.
> Can you please create a ticket on
https://issues.apache.org/jira/browse/CASSANDRA and update the email thread
?
> Can you include this: CFPropDefs.applyToCFMetadata() does not set the
compaction class on CFM
> Thanks
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 31/08/2012, at 7:05 AM, Jean-Armel Luce  wrote:
>
> I tried as you said with cassandra-cli, and it was still unsuccessful
>
> [default@unknown] use test1;
> Authenticated to keyspace: test1
> [default@test1] UPDATE COLUMN FAMILY pns_credentials with
compaction_strategy='LeveledCompactionStrategy';
> 8ed12919-ef2b-327f-8f57-4c2de26c9d51
> Waiting for schema agreement...
> ... schemas agree across the cluster
>
> And then, when I check the compaction strategy, it is still
SizeTieredCompactionStrategy
> [default@test1] describe pns_credentials;
> ColumnFamily: pns_credentials
>   Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>   Default column value validator:
org.apache.cassandra.db.marshal.UTF8Type
>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 0.1
>   DC Local Read repair chance: 0.0
>   Replicate on write: true
>   Caching: KEYS_ONLY
>   Bloom Filter FP chance: default
>   Built indexes: []
>   Column Metadata:
> Column Name: isnew
>   Validation Class: org.apache.cassandra.db.marshal.Int32Type
> Column Name: ts
>   Validation Class: org.apache.cassandra.db.marshal.DateType
> Column Name: mergestatus
>   Validation Class: org.apache.cassandra.db.marshal.Int32Type
> Column Name: infranetaccount
>   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
> Column Name: user_level
>   Validation Class: org.apache.cassandra.db.marshal.Int32Type
> Column Name: msisdn
>   Validation Class: org.apache.cassandra.db.marshal.LongType
> Column Name: mergeusertype
>   Validation Class: org.apache.cassandra.db.marshal.Int32Type
>   Compaction Strategy:
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>   Compression Options:
> sstable_compression:
org.apache.cassandra.io.compress.SnappyCompressor
>
>
>
> I tried also to create a new table with LeveledCompactionStrategy (using
cqlsh), and when I check the compaction strategy, the
SizeTieredCompactionStrategy is set for this table.
>
> cqlsh:test1> CREATE TABLE pns_credentials3 (
>  ...   ise text PRIMARY KEY,
>  ...   isnew int,
>  ...   ts timestamp,
>  ...   mergestatus int,
>  ...   infranetaccount text,
>  ...   user_level int,
>  ...   msisdn bigint,
>  ...   mergeusertype int
>  ... ) WITH
>  ...   comment='' AND
>

-- 
Thanks,

Charlie (@mujiang) 木匠
===
Data Architect Developer
http://mujiang.blogspot.com


Number of columns per row for Composite Primary Key CQL 3.0

2012-09-07 Thread Data Craftsman 木匠
Hello experts.

Should I limit the number of rows per value of a composite primary key's
leading column?

I think it falls under the same wide-row good practice as the number of
columns per row in CQL 2.0, e.g. 10M or less.

Any comments will be appreciated.

-- 
Thanks,

Charlie (@mujiang) 木匠
===
Data Architect Developer 汉唐 田园牧歌DBA
http://mujiang.blogspot.com


Re: Number of columns per row for Composite Primary Key CQL 3.0

2012-09-11 Thread Data Craftsman 木匠
Hi Aaron,

Thanks for the suggestion, as always.  :)   I'll read your slides soon.

What does "MM" stand for? Million?

Thanks,
Charlie

On Mon, Sep 10, 2012 at 6:37 PM, aaron morton  wrote:
> In general wider rows take a bit longer to read, however different access
> patterns have different performance. I did some tests here
> http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance
> and http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>
> I would suggest 1MM cols is fine, if you get to 10MM cols per row you
> probably have gone too far. Remember the byte size of the row is also
> important; larger rows churn memory more and take longer to compact /
> repair.
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8/09/2012, at 11:05 AM, Data Craftsman 木匠 
> wrote:
>
> Hello experts.
>
> Should I limit the number of rows per value of a composite primary key's
> leading column?
>
> I think it falls under the same wide-row good practice as the number of
> columns per row in CQL 2.0, e.g. 10M or less.
>
> Any comments will be appreciated.
>
> --
> Thanks,
>
> Charlie (@mujiang) 木匠
> ===
> Data Architect Developer 汉唐 田园牧歌DBA
> http://mujiang.blogspot.com


Re: Using the commit log for external synchronization

2012-09-20 Thread Data Craftsman 木匠
This would be a good new feature. I guess the development team doesn't have
time for this yet.  ;)


On Thu, Sep 20, 2012 at 1:29 PM, Ben Hood <0x6e6...@gmail.com> wrote:
> Hi,
>
> I'd like to incrementally synchronize data written to Cassandra into
> an external store without having to maintain an index to do this, so I
> was wondering whether anybody is using the commit log to establish
> what updates have taken place since a given point in time?
>
> Cheers,
>
> Ben



-- 
Thanks,

Charlie (@mujiang) 木匠
===
Data Architect Developer 汉唐 田园牧歌DBA
http://mujiang.blogspot.com