Running Hadoop jobs over compressed column families with DataStax

2014-04-23 Thread marlon hendred
Hi, I'm attempting to dump a Pig relation of a compressed column family. It's a single column whose value is a JSON blob. It's compressed via Snappy compression and the value validator is BytesType. After I create the relation and dump it, I get garbage. Here is the describe: ColumnFamily: CF Ke

Re: Cassandra hopefully durable writes (commitlog_sync)

2014-04-23 Thread Robert Coli
On Wed, Apr 23, 2014 at 8:08 AM, Mohica Jasha wrote: > I wonder which is correct? > Does Cassandra (default configuration) wait till it persists the update to > its commit log before it acks back the write? or doesn't it? > > I wonder what happens if due to power outage all replicas die at the sam

Re: Creating a CQL3 storage layout with row key, column key and column value

2014-04-23 Thread DuyHai Doan
"And now, when I create an index on column_value. Will the column_value still be stored with the column_key or will Cassandra create an extra column?" Cassandra will create a column family whose name is "index_name" to index the column_value for you. You can't have access to this column family

Cassandra hopefully durable writes (commitlog_sync)

2014-04-23 Thread Mohica Jasha
There is a discrepancy between the two pieces of documentation: http://wiki.apache.org/cassandra/Durability Cassandra's default configuration sets the commitlog_sync mode to periodic, > causing the commitlog to be synced every commitlog_sync_period_in_ms > milliseconds, > so you can potentially lose up to
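The periodic behaviour described in the quoted wiki text can be illustrated with a toy model. This is only a sketch in Python, not Cassandra's implementation: the class `PeriodicCommitLog`, its methods, and the 1000 ms period are all hypothetical, and the sync is checked lazily on each write instead of on a real timer. It shows how writes that were acknowledged after the last sync are the ones at risk if the process dies.

```python
class PeriodicCommitLog:
    """Toy model (hypothetical): acks writes at once, syncs only once per period."""

    def __init__(self, sync_period_ms):
        self.sync_period_ms = sync_period_ms
        self.last_sync_ms = 0
        self.durable = []   # writes that have been synced to disk
        self.pending = []   # acknowledged writes still only in memory

    def _maybe_sync(self, now_ms):
        # Simplification: the period is checked on each write, not by a timer.
        if now_ms - self.last_sync_ms >= self.sync_period_ms:
            self.durable.extend(self.pending)
            self.pending.clear()
            self.last_sync_ms = now_ms

    def write(self, now_ms, value):
        self._maybe_sync(now_ms)
        self.pending.append(value)
        return "ack"  # acknowledged before the value is durable

    def crash(self):
        """Simulate a power loss: everything not yet synced is gone."""
        lost = list(self.pending)
        self.pending.clear()
        return lost


log = PeriodicCommitLog(sync_period_ms=1000)  # illustrative period only
log.write(0, "a")
log.write(400, "b")
log.write(1000, "c")   # period elapsed: "a" and "b" are synced first
log.write(1500, "d")
lost = log.crash()     # "c" and "d" were acked but never synced
```

In this model a mode that syncs before acknowledging would simply never leave anything in `pending` at the moment of the ack; whether that matches what the two documents describe is exactly the question raised in the thread.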

Re: Creating a CQL3 storage layout with row key, column key and column value

2014-04-23 Thread Sebastian Schmidt
Yes, I mean a secondary index. For example: CREATE INDEX index_name ON table_name(column_value); On 23.04.2014 17:01, DuyHai Doan wrote: What do you mean by "index on column_value"? Do you mean a secondary index? On Wed, Apr 23, 2014 at 4:52 PM, Sebastian Schmidt

Re: Creating a CQL3 storage layout with row key, column key and column value

2014-04-23 Thread DuyHai Doan
What do you mean by "index on column_value"? Do you mean a secondary index? On Wed, Apr 23, 2014 at 4:52 PM, Sebastian Schmidt wrote: > And now, when I create an index on column_value. Will the column_value > still be stored with the column_key or will Cassandra create an extra > column? > > Am

Re: Creating a CQL3 storage layout with row key, column key and column value

2014-04-23 Thread Sebastian Schmidt
And now, when I create an index on column_value, will the column_value still be stored with the column_key, or will Cassandra create an extra column? On 23.04.2014 16:47, DuyHai Doan wrote: The schema you just showed allows, for one row key (partition key), to have several distinct pairs of c

Re: Creating a CQL3 storage layout with row key, column key and column value

2014-04-23 Thread Sebastian Schmidt
Thank you very much for your help :) On 23.04.2014 16:47, DuyHai Doan wrote: The schema you just showed allows, for one row key (partition key), to have several distinct pairs of column_key/column_value. And that's exactly what you want ... On Wed, Apr 23, 2014 at 4:44 PM, Sebastian Schm

Re: Creating a CQL3 storage layout with row key, column key and column value

2014-04-23 Thread DuyHai Doan
The schema you just showed allows, for one row key (partition key), to have several distinct pairs of column_key/column_value. And that's exactly what you want ... On Wed, Apr 23, 2014 at 4:44 PM, Sebastian Schmidt wrote: > Hi, > > I want to create a storage layout with CQL3 like this: > > ro
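The layout described in this reply, one row key holding several distinct column_key/column_value pairs, can be sketched as a small in-memory model. This is an illustration in Python, not how Cassandra stores data: `WideRowTable` and its methods are hypothetical names, and it assumes the primary key is (row_key, column_key), which the truncated CREATE TABLE statement in the thread does not show.

```python
from collections import defaultdict


class WideRowTable:
    """Toy model (hypothetical): one partition per row_key, many columns each."""

    def __init__(self):
        # row_key -> {column_key: column_value}
        self._partitions = defaultdict(dict)

    def insert(self, row_key, column_key, column_value):
        # Assumption: (row_key, column_key) identifies a cell, so a second
        # insert with the same pair overwrites the earlier value.
        self._partitions[row_key][column_key] = column_value

    def columns(self, row_key):
        """Return the (column_key, column_value) pairs of one row, sorted by key."""
        return sorted(self._partitions.get(row_key, {}).items())


table = WideRowTable()
table.insert(b"row1", b"colB", b"2")
table.insert(b"row1", b"colA", b"1")
table.insert(b"row1", b"colA", b"3")   # same pair of keys: overwrites b"1"
row1 = table.columns(b"row1")
missing = table.columns(b"nope")
```

Byte strings are used for the keys and values only to mirror the BLOB columns mentioned in the original question.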

Re: Deleting column names

2014-04-23 Thread Andreas Wagner
Thanks :) This works ... Kind regards Andreas On 04/22/2014 06:05 PM, Laing, Michael wrote: Your understanding is incorrect - the easiest way to see that is to try it. On Tue, Apr 22, 2014 at 12:00 PM, Sebastian Schmidt wrote: From my understanding, this wo

Creating a CQL3 storage layout with row key, column key and column value

2014-04-23 Thread Sebastian Schmidt
Hi, I want to create a storage layout with CQL3 like this: row_key should be my row key column_key should be my column key column_value should be the value saved for the column key How can I achieve this? I figured that doing this: CREATE TABLE table_name (row_key BLOB, column_key BLOB, colum

Re: Cassandra JVM GC defaults question

2014-04-23 Thread Ruchir Jha
Lowering CMSInitiatingOccupancyFraction to less than 0.75 will lead to more GC interference and will impact write performance. If you're not sensitive to this impact, your expectation is correct; however, make sure your flush_largest_memtables_at is always set to less than or equal to the occupancy

Cassandra JVM GC defaults question

2014-04-23 Thread Ken Hancock
I'm in the process of trying to tune the GC and I'm far from an expert in this area, so I'm hoping someone can tell me whether I'm out in left field or on track. Cassandra's default GC settings are (abbreviated): +UseConcMarkSweepGC +CMSInitiatingOccupancyFraction=75 +UseCMSInitiatingOccupancyOnly Al