Re: java exception on starting cqlsh

2013-09-17 Thread Tim Dunphy
ok great, and thank you!

[root@beta:~] #cqlsh --cqlversion="3.0.0"
Connected to mycluster Cluster at beta.mydomain.com:9160.
[cqlsh 4.0.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
cqlsh>
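
(The same fix with the host and port from the original command spelled out explicitly:)

[root@beta:~] #cqlsh beta.mydomain.com 9160 --cqlversion="3.0.0"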



On Tue, Sep 17, 2013 at 2:41 AM, Sylvain Lebresne wrote:

> Short answer: you'll need to pass something like --cqlversion="3.0.0" to
> cqlsh.
>
> Longer answer: when a CQL client connects (and cqlsh is one), it asks to
> use a specific version of CQL. If it asked for a version that is newer than
> what the server knows, you get the error message you have above. So what
> the above means is that you've used cqlsh from C* 2.0 (that asks for CQL
> 3.1.0 which is the version of CQL as of C* 2.0.0) against a C* 1.2 node. In
> other words, what that error mean is that you use cqlsh from C* 2.0 so
> maybe you mean to use C* 2.0 features, like compare-and-swap for instance,
> but the server you're contacting does not know that so it'll refuse.
> Passing --cqlversion="3.0.0" is a simple way to skip the check (note that
> it has *no* impact on the server outside of making the check happy).
>
> --
> Sylvain
>
>
>
> On Tue, Sep 17, 2013 at 7:39 AM, Tim Dunphy  wrote:
>
>> hey guys,
>>
>>  I'm getting this exception when I try to run cqlsh.
>>
>> [root@beta:/var/www/admin] #cqlsh beta.mydomain.com 9160
>> Traceback (most recent call last):
>>   File "/etc/alternatives/cassandrahome/bin/cqlsh", line 2027, in <module>
>> main(*read_options(sys.argv[1:], os.environ))
>>   File "/etc/alternatives/cassandrahome/bin/cqlsh", line 2013, in main
>> display_float_precision=options.float_precision)
>>   File "/etc/alternatives/cassandrahome/bin/cqlsh", line 477, in __init__
>> cql_version=cqlver, transport=transport)
>>   File
>> "/etc/alternatives/cassandrahome/bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/connection.py",
>> line 143, in connect
>>   File
>> "/etc/alternatives/cassandrahome/bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/connection.py",
>> line 59, in __init__
>>   File
>> "/etc/alternatives/cassandrahome/bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/thrifteries.py",
>> line 162, in establish_connection
>>   File
>> "/etc/alternatives/cassandrahome/bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/thrifteries.py",
>> line 165, in set_cql_version
>>   File
>> "/etc/alternatives/cassandrahome/bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cassandra/Cassandra.py",
>> line 1983, in set_cql_version
>>   File
>> "/etc/alternatives/cassandrahome/bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cassandra/Cassandra.py",
>> line 2004, in recv_set_cql_version
>> cql.cassandra.ttypes.InvalidRequestException:
>> InvalidRequestException(why='Provided version 3.1.0 is not supported by
>> this server (supported: 2.0.0, 3.0.1)')
>>
>>
>> How do I correct the problem?
>>
>> thanks
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: questions related to the SSTable file

2013-09-17 Thread Hiller, Dean
You may want to be careful: when column 1 has been changed, it can be stored in 
both files until compaction, and Cassandra returns the latest version of column 1 
even though two sstables contain it.  (At least that is the way I understand it).

Later,
Dean

From: "Takenori Sato (Cloudian)" mailto:ts...@cloudian.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Monday, September 16, 2013 8:12 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: questions related to the SSTable file

Hi,

> 1) I will expect same row key could show up in both sstable2json output, as 
> this one row exists in both SSTable files, right?

Yes.

> 2) If so, what is the boundary? Will Cassandra guarantee the column level as 
> the boundary? What I mean is that for one column's data, it will be 
> guaranteed to be either in the first file, or 2nd file, right? There is no 
> chance that Cassandra will cut the data of one column into 2 part, and one 
> part stored in first SSTable file, and the other part stored in second 
> SSTable file. Is my understanding correct?

No.

> 3) If what we are talking about are only the SSTable files in snapshot, 
> incremental backup SSTable files, exclude the runtime SSTable files, will 
> anything change? For snapshot or incremental backup SSTable files, first can 
> one row data still may exist in more than one SSTable file? And any boundary 
> change in this case?
> 4) If I want to use incremental backup SSTable files as the way to catch data 
> being changed, is it a good way to do what I try to archive? In this case, 
> what happen in the following example:

I don't fully understand, but snapshot will do. It will create hard links to 
all the SSTable files present at snapshot.
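
(For reference, taking such a snapshot is a one-liner; the keyspace and tag names 
below are made up for illustration:)

nodetool snapshot MyKeyspace -t before_backup
# the hard links show up under <data_dir>/MyKeyspace/<ColumnFamily>/snapshots/before_backup/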


Let me explain how SSTables and compaction work.

Suppose we have 4 files being compacted (the last one has just been flushed, 
which then triggered compaction). Note that file names are simplified.

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
- Color-3-Data.db: [{Aqua: {hex: #00}}, {Green: {hex2: #32CD32}}, {Blue: 
{}}]
- Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]

They are created by the following operations.

- Add a row of (key, column, column_value = Blue, hex, #FF)
- Add a row of (key, column, column_value = Lavender, hex, #E6E6FA)
 memtable is flushed => Color-1-Data.db 
- Add a row of (key, column, column_value = Green, hex, #008000)
- Add a column of (key, column, column_value = Blue, hex2, #2c86ff)
 memtable is flushed => Color-2-Data.db 
- Add a column of (key, column, column_value = Green, hex2, #32CD32)
- Add a row of (key, column, column_value = Aqua, hex, #00)
- Delete a row of (key = Blue)
 memtable is flushed => Color-3-Data.db 
- Add a row of (key, column, column_value = Magenta, hex, #FF00FF)
- Add a row of (key, column, column_value = Gold, hex, #FFD700)
 memtable is flushed => Color-4-Data.db 

Then, a compaction will merge all those fragments together into the latest ones 
as follows.

- Color-5-Data.db: [{Lavender: {hex: #E6E6FA}, {Aqua: {hex: #00}, {Green: 
{hex: #008000, hex2: #32CD32}}, {Magenta: {hex: #FF00FF}}, {Gold: {hex: 
#FFD700}}]
* assuming RandomPartitioner is used
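
(To see these fragments for yourself, each file can be dumped with sstable2json; 
the path below uses the simplified file name from this example, and real names 
also include the keyspace and a version string:)

sstable2json /var/lib/cassandra/data/MyKeyspace/Color/Color-5-Data.db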

Hope they would help.

- Takenori

(2013/09/17 10:51), java8964 java8964 wrote:
Hi, I have some questions related to the SSTable in the Cassandra, as I am 
doing a project to use it and hope someone in this list can share some thoughts.

My understanding is that an SSTable is per column family, but each column family 
could have multiple SSTable files. During runtime, one row COULD be split across 
more than one SSTable file; even though this is not good for performance, it does 
happen, and Cassandra will try to merge and store one row's data into one SSTable 
file during compaction.

The question is: when one row is split across multiple SSTable files, what is the 
boundary? Or let me ask this way: if one row exists in 2 SSTable files and I run 
the sstable2json tool on both SSTable files individually:

1) I will expect same row key could show up in both sstable2json output, as 
this one row exists in both SSTable files, right?
2) If so, what is the boundary? Will Cassandra guarantee the column level as 
the boundary? What I mean is that for one column's data, it will be guaranteed 
to be either in the first file, or 2nd file, right? There is no chance that 
Cassandra will cut the data of one column into 2 part, and one part stored in 
first SSTable file, and the other part stored in second SSTable file. Is my 
understanding correct?
3) If what we are talking about are only the SSTable files in snapshot, 
incremental backup SSTable files, exclude the runtime SSTable files, will 
anything change? For snapshot or incremental

Cassandra nodetool could not resolve '127.0.0.1': unknown host

2013-09-17 Thread pradeep kumar
I am very new to cassandra. Just started exploring.

I am running a single node cassandra server & facing a problem seeing the
status of Cassandra using the nodetool command.

i have hostname configured on my VM as myMachineIP cass1 in /etc/hosts

and

i configured my cassandra_instal_path/conf/cassandra.yaml file with
listen_address, rpc_address as localhost and clustername as casscluster

(also tried with my hostname which is cass1 as listen_address/rpc_address)

But I am not sure why I am not able to get status using the
nodetool command.

$ nodetool

Cannot resolve '127.0.0.1': unknown host

$ nodetool -host 127.0.0.1

Cannot resolve '127.0.0.1': unknown host

$ nodetool -host cass1

Cannot resolve 'cass1': unknown host

But i am able to connect to cassandra-cli

console output:

Connected to: "casscluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.8

Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.

my /etc/hosts looks like:

127.0.0.1 localhost.localdomain localhost.localdomain localhost4
localhost4.localdomain4 localhost cass1

::1 localhost.localdomain localhost.localdomain localhost6
localhost6.localdomain6 localhost cass1

[myMachineIP] cass1

what could be the reason why i am not able to run nodetool?

Please help.
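
(A few command-line checks that might narrow this down; the IP, host name and 
port below are just the values from this thread plus the default JMX port, not 
a confirmed fix:)

# can the local hosts database resolve these at all?
getent hosts 127.0.0.1
getent hosts cass1
# nodetool talks to JMX, which listens on 7199 by default
nodetool -h 127.0.0.1 -p 7199 status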


Re: Cassandra nodetool could not resolve '127.0.0.1': unknown host

2013-09-17 Thread Shahab Yunus
Have you tried specifying your hostname (not localhost) in cassandra.yaml
and starting it?

Regards,
Shahab


On Tue, Sep 17, 2013 at 8:39 AM, pradeep kumar wrote:

> I am very new to cassandra. Just started exploring.
>
> I am running a single node cassandra server & facing a problem in seeing
> status of the cassandra using nodetool command.
>
> i have hostname configured on my VM as myMachineIP cass1 in /etc/hosts
>
> and
>
> i configured my cassandra_instal_path/conf/cassandra.yaml file with
> listen_address, rpc_address as localhost and clustername as casscluster
>
> (also tried with my hostname which is cass1 as listen_address/rpc_address)
>
> Nut not sure what is the reason why i am not able to get statususing
> nodetool command.
>
> $ nodetool
>
> Cannot resolve '127.0.0.1': unknown host
>
> $ nodetool -host 127.0.0.1
>
> Cannot resolve '127.0.0.1': unknown host
>
> $ nodetool -host cass1
>
> Cannot resolve 'cass1': unknown host
>
> But i am able to connect to cassandra-cli
>
> console output:
>
> Connected to: "casscluster" on 127.0.0.1/9160 Welcome to Cassandra CLI
> version 1.2.8
>
> Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
>
> my /etc/hosts looks like:
>
> 127.0.0.1 localhost.localdomain localhost.localdomain localhost4
> localhost4.localdomain4 localhost cass1
>
> ::1 localhost.localdomain localhost.localdomain localhost6
> localhost6.localdomain6 localhost cass1
>
> [myMachineIP] cass1
>
> what could be the reason why i am not able to run nodetool?
>
> Please help.
>


RE: questions related to the SSTable file

2013-09-17 Thread java8964 java8964
Hi, Takenori:
Thanks for your quick reply. Your explanation makes it clear to me what 
compaction means, and I also understand now that the same row key can exist in 
multiple SSTable files.

But beyond that, I want to know what happens if one row's data is too large to 
fit in one SSTable file. In your example, the same row exists in multiple 
SSTable files because it keeps changing and being flushed to disk at runtime. 
That's fine; in this case no single file of the 4 contains the whole data of 
that row, but each one does contain the full picture of some individual unit 
(I don't know what I should call this unit, but it will be larger than one 
column, right?). Just using your example, there is no way we could ever have 
SSTable files like the following, right:

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #}}]
- Color-1-Data_1.db:  [{Blue: {hex:FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
- Color-3-Data.db: [{Aqua: {hex: #00}}, {Green: {hex2: #32CD32}}, {Blue: {}}]
- Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]

I don't see any reason Cassandra would ever do that, but I just want to confirm, 
as your 'no' answer to my 2nd question is confusing.

Another question from my original email; I may already have the answer from 
your example, but I just want to confirm it. Using your example, let's say that 
after the first 2 steps:

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]

there is an incremental backup. After that, the following changes come in:

- Add a column of (key, column, column_value = Green, hex2, #32CD32)
- Add a row of (key, column, column_value = Aqua, hex, #00)
- Delete a row of (key = Blue)
 memtable is flushed => Color-3-Data.db 

and another incremental backup runs right now.

Now in this case, my assumption is that only Color-3-Data.db will be in this 
backup, right? Even though Color-1-Data.db and Color-2-Data.db contain data for 
the same row key as Color-3-Data.db, from an incremental backup point of view 
only Color-3-Data.db will be stored.

The reason I asked those questions is that I am thinking of using MapReduce jobs 
to parse the incremental backup files and rebuild the snapshot on the Hadoop 
side. Of course, the column families I am dealing with are pure fact data, so 
there is no delete/update in Cassandra for this kind of data, just appending. 
But it is still important for me to understand the SSTable files' content.
Thanks
Yong

Date: Tue, 17 Sep 2013 11:12:01 +0900
From: ts...@cloudian.com
To: user@cassandra.apache.org
Subject: Re: questions related to the SSTable file


  

  
  
Hi,

> 1) I will expect same row key could show up in both sstable2json output, as 
> this one row exists in both SSTable files, right?

Yes.

> 2) If so, what is the boundary? Will Cassandra guarantee the column level as 
> the boundary? What I mean is that for one column's data, it will be 
> guaranteed to be either in the first file, or 2nd file, right? There is no 
> chance that Cassandra will cut the data of one column into 2 part, and one 
> part stored in first SSTable file, and the other part stored in second 
> SSTable file. Is my understanding correct?

No.

> 3) If what we are talking about are only the SSTable files in snapshot, 
> incremental backup SSTable files, exclude the runtime SSTable files, will 
> anything change? For snapshot or incremental backup SSTable files, first can 
> one row data still may exist in more than one SSTable file? And any boundary 
> change in this case?
> 4) If I want to use incremental backup SSTable files as the way to catch data 
> being changed, is it a good way to do what I try to archive? In this case, 
> what happen in the following example:

I don't fully understand, but snapshot will do. It will create hard links to 
all the SSTable files present at snapshot.


Let me explain how SSTables and compaction work.

Suppose we have 4 files being compacted (the last one has just been flushed, 
which then triggered compaction). Note that file names are simplified.

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
- Color-3-Data.db: [{Aqua: {hex: #00}}, {Green: {hex2: #32CD32}}, {Blue: {}}]
- Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]

They are created by the following operations.

- Add a row of (key, column, column_value = Blue, hex, #FF)
- Add a row of (key, column, column_value = Lavender, hex, #E6E6FA)
 memtable is flushed => Color-1-Data.db 
- Add a row of (key, column, colum

Re: questions related to the SSTable file

2013-09-17 Thread Hiller, Dean
You have to first understand the rules of

 1.  Sstables are immutable so Color-1-Data.db will not be modified and only 
deleted once compacted
 2.  Memtables are flushed when reaching a limit so if Blue:{hex} is modified, 
it is done in the in-memory memtable that is eventually flushed
 3.  Once flushed, it is an SSTable on disk and you have two values for "hex", 
each with its own timestamp, so we know which one is the current value

When it finally compacts, the old value can go away.

Dean

From: java8964 java8964 <java8...@hotmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, September 17, 2013 7:32 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: questions related to the SSTable file

Hi, Takenori:

Thanks for your quick reply. Your explain is clear for me understanding what 
compaction mean, and I also can understand now same row key will exist in multi 
SSTable file.

But beyond that, I want to know what happen if one row data is too large to put 
in one SSTable file. In your example, the same row exist in multi SSTable files 
as it is keeping changing and flushing into the disk at runtime. That's fine, 
in this case, in every SSTable file of the 4, there is no single file contains 
whole data of that row, but each one does contain full picture of individual 
unit ( I don't know what I should call this unit, but it will be larger than 
one column, right?). Just in your example, there is no way in any time, we 
could have SSTable files like following, right:

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #}}]
- Color-1-Data_1.db:  [{Blue: {hex:FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
- Color-3-Data.db: [{Aqua: {hex: #00}}, {Green: {hex2: #32CD32}}, {Blue: 
{}}]
- Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]

I don't see any reason Cassandra will ever do that, but just want to confirm, 
as your 'no' answer to my 2 question is confusion.

Another question from my originally email, even though I may get the answer 
already from your example, but just want to confirm it.
Just use your example, let's say after the first 2 steps:

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
There is a incremental backup. After that, there is following changes coming:

- Add a column of (key, column, column_value = Green, hex2, #32CD32)
- Add a row of (key, column, column_value = Aqua, hex, #00)
- Delete a row of (key = Blue)
 memtable is flushed => Color-3-Data.db 
Another incremental backup right now.

Now in this case, my assumption is only Color-3-Data.db will be in this backup, 
right? Even though Color-1-Data.db and Color-2-Data.db contains the data of the 
same row key as Color-3-Data.db, but from a incremental backup point of view, 
only Color-3-Data.db will be stored.

The reason I asked those question is that I am thinking to use MapReduce jobs 
to parse the incremental backup files, and rebuild the snapshot in Hadoop side. 
Of course, the column families I am doing is pure Fact data. So there is 
delete/update in Cassandra for these kind of data, just appending. But it is 
still important for me to understand the SSTable file's content.

Thanks

Yong



Date: Tue, 17 Sep 2013 11:12:01 +0900
From: ts...@cloudian.com
To: user@cassandra.apache.org
Subject: Re: questions related to the SSTable file

Hi,

> 1) I will expect same row key could show up in both sstable2json output, as 
> this one row exists in both SSTable files, right?

Yes.

> 2) If so, what is the boundary? Will Cassandra guarantee the column level as 
> the boundary? What I mean is that for one column's data, it will be 
> guaranteed to be either in the first file, or 2nd file, right? There is no 
> chance that Cassandra will cut the data of one column into 2 part, and one 
> part stored in first SSTable file, and the other part stored in second 
> SSTable file. Is my understanding correct?

No.

> 3) If what we are talking about are only the SSTable files in snapshot, 
> incremental backup SSTable files, exclude the runtime SSTable files, will 
> anything change? For snapshot or incremental backup SSTable files, first can 
> one row data still may exist in more than one SSTable file? And any boundary 
> change in this case?
> 4) If I want to use incremental backup SSTable files as the way to catch data 
> being changed, is it a good way to do what I try to archive? In this case, 
> what happen in the following example:

I don't fully understand, but snapshot will do. It will create hard links to 
all the SSTable files present at snapshot.


Let me explain how SSTable and compaction works.

Suppose we have 4 files being

RE: questions related to the SSTable file

2013-09-17 Thread java8964 java8964
Hi, Dean:
Can you explain a little more about what you mean?

If I change the example a little bit:

Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #FF}}]

Now if we add a new Green column and update the Blue column, but the data is 
flushed to another SSTable file:

Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex: #2c86ff}}]

So you mean that at this time I could get 2 SSTable files, both containing 
column "Blue" for the same row key, right? In this case, I should be fine, as 
the value of the "Blue" column carries a timestamp that lets me find out which 
is the last change, right? In the MR world, each file COULD be processed by a 
different Mapper, but both will be sent to the same reducer since they share 
the same key.
Yong

> From: dean.hil...@nrel.gov
> To: user@cassandra.apache.org
> Date: Tue, 17 Sep 2013 06:32:03 -0600
> Subject: Re: questions related to the SSTable file
> 
> You may want to be careful as column 1 could be stored in both files until 
> compaction as well when column 1 has encountered changes and cassandra 
> returns the latest column 1 version but two sstables contain column 1.  (At 
> least that is the way I understand it).
> 
> Later,
> Dean
> 
> From: "Takenori Sato (Cloudian)" 
> mailto:ts...@cloudian.com>>
> Reply-To: "user@cassandra.apache.org" 
> mailto:user@cassandra.apache.org>>
> Date: Monday, September 16, 2013 8:12 PM
> To: "user@cassandra.apache.org" 
> mailto:user@cassandra.apache.org>>
> Subject: Re: questions related to the SSTable file
> 
> Hi,
> 
> > 1) I will expect same row key could show up in both sstable2json output, as 
> > this one row exists in both SSTable files, right?
> 
> Yes.
> 
> > 2) If so, what is the boundary? Will Cassandra guarantee the column level 
> > as the boundary? What I mean is that for one column's data, it will be 
> > guaranteed to be either in the first file, or 2nd file, right? There is no 
> > chance that Cassandra will cut the data of one column into 2 part, and one 
> > part stored in first SSTable file, and the other part stored in second 
> > SSTable file. Is my understanding correct?
> 
> No.
> 
> > 3) If what we are talking about are only the SSTable files in snapshot, 
> > incremental backup SSTable files, exclude the runtime SSTable files, will 
> > anything change? For snapshot or incremental backup SSTable files, first 
> > can one row data still may exist in more than one SSTable file? And any 
> > boundary change in this case?
> > 4) If I want to use incremental backup SSTable files as the way to catch 
> > data being changed, is it a good way to do what I try to archive? In this 
> > case, what happen in the following example:
> 
> I don't fully understand, but snapshot will do. It will create hard links to 
> all the SSTable files present at snapshot.
> 
> 
> Let me explain how SSTable and compaction works.
> 
> Suppose we have 4 files being compacted(the last one has bee just flushed, 
> then which triggered compaction). Note that file names are simplified.
> 
> - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #FF}}]
> - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> - Color-3-Data.db: [{Aqua: {hex: #00}}, {Green: {hex2: #32CD32}}, {Blue: 
> {}}]
> - Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
> 
> They are created by the following operations.
> 
> - Add a row of (key, column, column_value = Blue, hex, #FF)
> - Add a row of (key, column, column_value = Lavender, hex, #E6E6FA)
>  memtable is flushed => Color-1-Data.db 
> - Add a row of (key, column, column_value = Green, hex, #008000)
> - Add a column of (key, column, column_value = Blue, hex2, #2c86ff)
>  memtable is flushed => Color-2-Data.db 
> - Add a column of (key, column, column_value = Green, hex2, #32CD32)
> - Add a row of (key, column, column_value = Aqua, hex, #00)
> - Delete a row of (key = Blue)
>  memtable is flushed => Color-3-Data.db 
> - Add a row of (key, column, column_value = Magenta, hex, #FF00FF)
> - Add a row of (key, column, column_value = Gold, hex, #FFD700)
>  memtable is flushed => Color-4-Data.db 
> 
> Then, a compaction will merge all those fragments together into the latest 
> ones as follows.
> 
> - Color-5-Data.db: [{Lavender: {hex: #E6E6FA}, {Aqua: {hex: #00}, {Green: 
> {hex: #008000, hex2: #32CD32}}, {Magenta: {hex: #FF00FF}}, {Gold: {hex: 
> #FFD700}}]
> * assuming RandomPartitioner is used
> 
> Hope they would help.
> 
> - Takenori
> 
> (2013/09/17 10:51), java8964 java8964 wrote:
> Hi, I have some questions related to the SSTable in the Cassandra, as I am 
> doing a project to use it and hope someone in this list can share some 
> thoughts.
> 
> My understand is the SSTable is per column family. But each column family 
> could have multi SSTable files. During the runtime, one row COULD split into 
> more than one SSTable file, even this is not good 

RE: questions related to the SSTable file

2013-09-17 Thread java8964 java8964
Thanks Dean for the clarification.

But if I put hundreds of megabytes of data for one row through one put, you 
mean Cassandra will put all of it into one SSTable, even though the data is 
very big, right? Let's assume in this case the memtable in memory reaches its 
limit because of this change. What I want to know is whether there is any 
possibility that 2 SSTables are generated in the above case, and if so, what 
the boundary is.

I understand that if subsequent changes apply to the same row key as in the 
above example, an additional SSTable file could be generated. That is clear to me.
Yong

> From: dean.hil...@nrel.gov
> To: user@cassandra.apache.org
> Date: Tue, 17 Sep 2013 07:39:48 -0600
> Subject: Re: questions related to the SSTable file
> 
> You have to first understand the rules of
> 
>  1.  Sstables are immutable so Color-1-Data.db will not be modified and only 
> deleted once compacted
>  2.  Memtables are flushed when reaching a limit so if Blue:{hex} is 
> modified, it is done in the in-memory memtable that is eventually flushed
>  3.  Once flushed, it is an SSTable on disk and you have two values for "hex" 
> both with two timestamps so we know which one is the current value
> 
> When it finally compacts, the old value can go away.
> 
> Dean
> 
> From: java8964 java8964 mailto:java8...@hotmail.com>>
> Reply-To: "user@cassandra.apache.org" 
> mailto:user@cassandra.apache.org>>
> Date: Tuesday, September 17, 2013 7:32 AM
> To: "user@cassandra.apache.org" 
> mailto:user@cassandra.apache.org>>
> Subject: RE: questions related to the SSTable file
> 
> Hi, Takenori:
> 
> Thanks for your quick reply. Your explain is clear for me understanding what 
> compaction mean, and I also can understand now same row key will exist in 
> multi SSTable file.
> 
> But beyond that, I want to know what happen if one row data is too large to 
> put in one SSTable file. In your example, the same row exist in multi SSTable 
> files as it is keeping changing and flushing into the disk at runtime. That's 
> fine, in this case, in every SSTable file of the 4, there is no single file 
> contains whole data of that row, but each one does contain full picture of 
> individual unit ( I don't know what I should call this unit, but it will be 
> larger than one column, right?). Just in your example, there is no way in any 
> time, we could have SSTable files like following, right:
> 
> - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #}}]
> - Color-1-Data_1.db:  [{Blue: {hex:FF}}]
> - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> - Color-3-Data.db: [{Aqua: {hex: #00}}, {Green: {hex2: #32CD32}}, {Blue: 
> {}}]
> - Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
> 
> I don't see any reason Cassandra will ever do that, but just want to confirm, 
> as your 'no' answer to my 2 question is confusion.
> 
> Another question from my originally email, even though I may get the answer 
> already from your example, but just want to confirm it.
> Just use your example, let's say after the first 2 steps:
> 
> - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #FF}}]
> - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> There is a incremental backup. After that, there is following changes coming:
> 
> - Add a column of (key, column, column_value = Green, hex2, #32CD32)
> - Add a row of (key, column, column_value = Aqua, hex, #00)
> - Delete a row of (key = Blue)
>  memtable is flushed => Color-3-Data.db 
> Another incremental backup right now.
> 
> Now in this case, my assumption is only Color-3-Data.db will be in this 
> backup, right? Even though Color-1-Data.db and Color-2-Data.db contains the 
> data of the same row key as Color-3-Data.db, but from a incremental backup 
> point of view, only Color-3-Data.db will be stored.
> 
> The reason I asked those question is that I am thinking to use MapReduce jobs 
> to parse the incremental backup files, and rebuild the snapshot in Hadoop 
> side. Of course, the column families I am doing is pure Fact data. So there 
> is delete/update in Cassandra for these kind of data, just appending. But it 
> is still important for me to understand the SSTable file's content.
> 
> Thanks
> 
> Yong
> 
> 
> 
> Date: Tue, 17 Sep 2013 11:12:01 +0900
> From: ts...@cloudian.com
> To: user@cassandra.apache.org
> Subject: Re: questions related to the SSTable file
> 
> Hi,
> 
> > 1) I will expect same row key could show up in both sstable2json output, as 
> > this one row exists in both SSTable files, right?
> 
> Yes.
> 
> > 2) If so, what is the boundary? Will Cassandra guarantee the column level 
> > as the boundary? What I mean is that for one column's data, it will be 
> > guaranteed to be either in the first file, or 2nd file, right? There is no 
> > chance that Cassandra will cut the data of one colu

Re: questions related to the SSTable file

2013-09-17 Thread Shahab Yunus
java8964, basically are you asking what will happen if we put a large amount of
data in one column of one row at once? Will this blob of data representing one
column of one row (i.e. one cell) be split into multiple SSTables? Or in such
cases will it always be one extra-large SSTable? I am also interested in
knowing the answer.

Regards,
Shahab


On Tue, Sep 17, 2013 at 9:50 AM, java8964 java8964 wrote:

> Thanks Dean for clarification.
>
> But if I put hundreds of megabyte data of one row through one put, what
> you mean is Cassandra will put all of them into one SSTable, even the data
> is very big, right? Let's assume in this case the Memtables in memory
> reaches its limit by this change.
> What I want to know is if there is possibility 2 SSTables be generated in
> above case, what is the boundary.
>
> I understand if following changes apply to the same row key as above
> example, additional SSTable file could be generated. That is clear for me.
>
> Yong
>
> > From: dean.hil...@nrel.gov
> > To: user@cassandra.apache.org
> > Date: Tue, 17 Sep 2013 07:39:48 -0600
> > Subject: Re: questions related to the SSTable file
> >
> > You have to first understand the rules of
> >
> > 1. Sstables are immutable so Color-1-Data.db will not be modified and
> only deleted once compacted
> > 2. Memtables are flushed when reaching a limit so if Blue:{hex} is
> modified, it is done in the in-memory memtable that is eventually flushed
> > 3. Once flushed, it is an SSTable on disk and you have two values for
> "hex" both with two timestamps so we know which one is the current value
> >
> > When it finally compacts, the old value can go away.
> >
> > Dean
> >
> > From: java8964 java8964  java8...@hotmail.com>>
> > Reply-To: "user@cassandra.apache.org"
> mailto:user@cassandra.apache.org>>
> > Date: Tuesday, September 17, 2013 7:32 AM
> > To: "user@cassandra.apache.org" <
> user@cassandra.apache.org>
> > Subject: RE: questions related to the SSTable file
> >
> > Hi, Takenori:
> >
> > Thanks for your quick reply. Your explain is clear for me understanding
> what compaction mean, and I also can understand now same row key will exist
> in multi SSTable file.
> >
> > But beyond that, I want to know what happen if one row data is too large
> to put in one SSTable file. In your example, the same row exist in multi
> SSTable files as it is keeping changing and flushing into the disk at
> runtime. That's fine, in this case, in every SSTable file of the 4, there
> is no single file contains whole data of that row, but each one does
> contain full picture of individual unit ( I don't know what I should call
> this unit, but it will be larger than one column, right?). Just in your
> example, there is no way in any time, we could have SSTable files like
> following, right:
> >
> > - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #}}]
> > - Color-1-Data_1.db: [{Blue: {hex:FF}}]
> > - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> > - Color-3-Data.db: [{Aqua: {hex: #00}}, {Green: {hex2: #32CD32}},
> {Blue: {}}]
> > - Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
> >
> > I don't see any reason Cassandra will ever do that, but just want to
> confirm, as your 'no' answer to my 2 question is confusion.
> >
> > Another question from my originally email, even though I may get the
> answer already from your example, but just want to confirm it.
> > Just use your example, let's say after the first 2 steps:
> >
> > - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #FF}}]
> > - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> > There is a incremental backup. After that, there is following changes
> coming:
> >
> > - Add a column of (key, column, column_value = Green, hex2, #32CD32)
> > - Add a row of (key, column, column_value = Aqua, hex, #00)
> > - Delete a row of (key = Blue)
> >  memtable is flushed => Color-3-Data.db 
> > Another incremental backup right now.
> >
> > Now in this case, my assumption is only Color-3-Data.db will be in this
> backup, right? Even though Color-1-Data.db and Color-2-Data.db contains the
> data of the same row key as Color-3-Data.db, but from a incremental backup
> point of view, only Color-3-Data.db will be stored.
> >
> > The reason I asked those question is that I am thinking to use MapReduce
> jobs to parse the incremental backup files, and rebuild the snapshot in
> Hadoop side. Of course, the column families I am doing is pure Fact data.
> So there is delete/update in Cassandra for these kind of data, just
> appending. But it is still important for me to understand the SSTable
> file's content.
> >
> > Thanks
> >
> > Yong
> >
> >
> > 
> > Date: Tue, 17 Sep 2013 11:12:01 +0900
> > From: ts...@cloudian.com
> > To: user@cassandra

Re: questions related to the SSTable file

2013-09-17 Thread Hiller, Dean
Netflix created file streaming into Cassandra in Astyanax specifically because 
writing too big a column cell is a bad thing.  The limit really depends on the 
use case: do you have servers writing 1000's of 200 MB files at the same time? 
If so, Astyanax streaming may be a better way to go, since it divides the file 
up amongst cells and rows.

I know the limit on row size is really your hard disk space, and the column 
count, if I remember correctly, goes into the billions, though realistically I 
think beyond 10 million it might slow down a bit. All I know is we tested up to 
10 million columns with no issues in our use case.

So you mean at this time, I could get 2 SSTable files, both contain column 
"Blue" for the same row key, right?

Yes

In this case, I should be fine as value of the "Blue" column contain the 
timestamp to help me to find out which is the last change, right?

Yes

In MR world, each file COULD be processed by different Mapper, but will be sent 
to the same reducer as both data will be shared same key.

If that is the way you are writing it, then yes

Dean

From: Shahab Yunus <shahab.yu...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, September 17, 2013 7:54 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: questions related to the SSTable file

derstand if following changes apply to the same row key as above example, 
additional SSTable file could be generated. That is


RE: questions related to the SSTable file

2013-09-17 Thread java8964 java8964
Another question: the SSTable files generated in the incremental backup are not 
really ONLY the incremental delta, right? They will include more than the delta.

I will use an example to show my question.

First, we have this data in SSTable file 1:

rowkey(1), columns (maker=honda)

Later, we add one column to the same key:

rowkey(1), columns (maker=honda, color=blue)

The data above is flushed to another SSTable file 2. In this case, it will be 
part of the incremental backup at this time. But in fact, it will contain both 
the old data (maker=honda) plus the new change (color=blue).

So in fact, the incremental backup of Cassandra just hard-links all the new 
SSTable files generated during the incremental backup period. It could contain 
any data, not just the data being updated/inserted/deleted in this period, 
correct?
Thanks
Yong

> From: dean.hil...@nrel.gov
> To: user@cassandra.apache.org
> Date: Tue, 17 Sep 2013 08:11:36 -0600
> Subject: Re: questions related to the SSTable file
> 
> Netflix created file streaming in astyanax into cassandra specifically 
> because writing too big a column cell is a bad thing.  The limit is really 
> dependent on use case….do you have servers writing 1000's of 200Meg files at 
> the same time….if so, astyanax streaming may be a better way to go there 
> where it divides up the file amongst cells and rows.
> 
> I know the limit of a row size is really your hard disk space and the column 
> count if I remember goes into billions though realistically, I think beyond 
> 10 million might slow down a bit….all I know is we tested up to 10 million 
> columns with no issues in our use-case.
> 
> So you mean at this time, I could get 2 SSTable files, both contain column 
> "Blue" for the same row key, right?
> 
> Yes
> 
> In this case, I should be fine as value of the "Blue" column contain the 
> timestamp to help me to find out which is the last change, right?
> 
> Yes
> 
> In MR world, each file COULD be processed by different Mapper, but will be 
> sent to the same reducer as both data will be shared same key.
> 
> If that is the way you are writing it, then yes
> 
> Dean
> 
> From: Shahab Yunus mailto:shahab.yu...@gmail.com>>
> Reply-To: "user@cassandra.apache.org" 
> mailto:user@cassandra.apache.org>>
> Date: Tuesday, September 17, 2013 7:54 AM
> To: "user@cassandra.apache.org" 
> mailto:user@cassandra.apache.org>>
> Subject: Re: questions related to the SSTable file
> 
> derstand if following changes apply to the same row key as above example, 
> additional SSTable file could be generated. That is
  

Re: questions related to the SSTable file

2013-09-17 Thread Robert Coli
On Tue, Sep 17, 2013 at 6:54 AM, Shahab Yunus wrote:

> java8964, basically are you asking that what will happen if we put large
> amount of data in one column of one row at once? How will this blob of data
> representing one column and one row i.e. cell will be split into multiple
> SSTable? Or in such particular cases it will always be one extra large
> SSTable? I am also interesting in knowing the answer.
>

A memtable is flushed to a single SSTable at whatever size it is as a
memtable. You cannot have a memtable larger than (a portion of) your JVM
heap.

=Rob
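
(The cassandra.yaml knob behind this is memtable_total_space_in_mb; by default 
it is left blank and Cassandra sizes it to roughly a third of the heap. The 
value below is only an example:)

memtable_total_space_in_mb: 2048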


I don't understand shuffle progress

2013-09-17 Thread Juan Manuel Formoso
I am running shuffle on a cluster after upgrading to 1.2.X, and I don't
understand how to check progress.

I'm counting the lines of cassandra-shuffle ls, and it decreases VERY
slowly. Sometimes not at all after 24 hours of processing.

Is that value accurate? Does the shuffle operation support
disabling/re-enabling (or restarting the cluster) and resuming from the
last position? Or does it start over?

-- 
*Juan Manuel Formoso
*Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: I don't understand shuffle progress

2013-09-17 Thread Robert Coli
On Tue, Sep 17, 2013 at 12:13 PM, Juan Manuel Formoso wrote:

> I am running shuffle on a cluster after upgrading to 1.2.X, and I don't
> understand how to check progress.
>

If your shuffle succeeds, you will be the first reported case of shuffle
succeeding on a non-test cluster. Until I hear a report of someone having
real world success, I recommend against using shuffle.

If you want to enable vnodes on a cluster with existing data, IMO you
should fork writes and bulk load a replacement cluster.


> I'm counting the lines of cassandra-shuffle ls, and it decreases VERY
> slowly. Sometimes not at all after 24 hours of processing.
>

I have heard reports of shuffle taking an insanely long amount of time,
such as this, as well.


> Is that value accurate?
>

Probably.


> Does the shuffle operation supports disabling/re-enabling (or restarting
> the cluster) and resuming from the last position? Or does it start over?
>

Yes, via the arguments "enable" and "disable". "clear" is what you use if
you want to clear the queue and start over.
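
(For reference, those are sub-commands of the same cassandra-shuffle tool already
being used above, e.g.:)

cassandra-shuffle ls        # list pending relocations (the count being watched)
cassandra-shuffle disable   # pause the shuffle
cassandra-shuffle enable    # resume it
cassandra-shuffle clear     # drop the queue and start over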

Note that once you have started shuffle, you don't want to add/remove a
node until the shuffle is complete.

https://issues.apache.org/jira/browse/CASSANDRA-5525

=Rob


Re: Multi-dc restart impact

2013-09-17 Thread Robert Coli
On Thu, Sep 5, 2013 at 6:14 AM, Chris Burroughs
wrote:

> We have a 2 DC cluster running cassandra 1.2.9.  They are in actual
> physically separate DCs on opposite coasts of the US, not just logical
> ones.  The primary use of this cluster is CL.ONE reads out of a single
> column family.  My expectation was that in such a scenario restarts would
> have minimal impact in the DC where the restart occurred, and no impact in
> the remote DC.
>
> We are seeing instead that restarts in one DC have a dramatic impact on
> performance in the other (let's call them DCs "A" and "B").
>

Did you end up filing a JIRA on this, or some other outcome?

=Rob


Re: questions related to the SSTable file

2013-09-17 Thread Shahab Yunus
Thanks Robert for the answer. It makes sense. If that happens then it means
that your design or use case needs some rework ;)

Regards,
Shahab


On Tue, Sep 17, 2013 at 2:37 PM, java8964 java8964 wrote:

> Another question related to the SSTable files generated in the incremental
> backup is not really ONLY incremental delta, right? It will include more
> than delta in the SSTable files.
>
> I will use the example to show my question:
>
> first, we have this data in the SSTable file 1:
>
> rowkey(1), columns (maker=honda).
>
> later, if we add one column in the same key:
>
> rowkey(1), columns (maker=honda, color=blue)
>
> The data above being flushed to another SSTable file 2. In this case, it
> will be part of the incremental backup at this time. But in fact, it will
> contain both old data (make=honda), plus new changes (color=blue).
>
> So in fact, incremental backup of Cassandra is just hard link all the new
> SSTable files being generated during the incremental backup period. It
> could contain any data, not just the data being update/insert/delete in
> this period, correct?
>
> Thanks
>
> Yong
>
> > From: dean.hil...@nrel.gov
> > To: user@cassandra.apache.org
> > Date: Tue, 17 Sep 2013 08:11:36 -0600
> > Subject: Re: questions related to the SSTable file
> >
> > Netflix created file streaming in astyanax into cassandra specifically
> because writing too big a column cell is a bad thing. The limit is really
> dependent on use case….do you have servers writing 1000's of 200Meg files
> at the same time….if so, astyanax streaming may be a better way to go there
> where it divides up the file amongst cells and rows.
> >
> > I know the limit of a row size is really your hard disk space and the
> column count if I remember goes into billions though realistically, I think
> beyond 10 million might slow down a bit….all I know is we tested up to 10
> million columns with no issues in our use-case.
> >
> > So you mean at this time, I could get 2 SSTable files, both contain
> column "Blue" for the same row key, right?
> >
> > Yes
> >
> > In this case, I should be fine as value of the "Blue" column contain the
> timestamp to help me to find out which is the last change, right?
> >
> > Yes
> >
> > In MR world, each file COULD be processed by different Mapper, but will
> be sent to the same reducer as both data will be shared same key.
> >
> > If that is the way you are writing it, then yes
> >
> > Dean
> >
> > From: Shahab Yunus mailto:shahab.yu...@gmail.com
> >>
> > Reply-To: "user@cassandra.apache.org"
> mailto:user@cassandra.apache.org>>
> > Date: Tuesday, September 17, 2013 7:54 AM
> > To: "user@cassandra.apache.org" <
> user@cassandra.apache.org>
> > Subject: Re: questions related to the SSTable file
> >
> > derstand if following changes apply to the same row key as above
> example, additional SSTable file could be generated. That is
>


Bad Request: line no viable alternative at input '-'

2013-09-17 Thread Grga Pitich
cassandra 2.0, cqlsh 3.0.

I was trying to import csv file (fields delimited by |) however import
chokes after certain number of lines with the following error:

Bad Request: line 1:300 no viable alternative at input '-'
Aborting import at record #514 (line 515). Previously-inserted values still
present.

the field in question is a float number (longitude) and all fields defined
in the cassandra table (cf) match the values in the csv file.

what may be wrong?


Re: How can I switch from multiple disks to a single disk?

2013-09-17 Thread Juan Manuel Formoso
Anyone who knows for sure if this would work?

Thanks!

On Monday, September 16, 2013, sankalp kohli wrote:

> I think you can do by moving all the sstables under one drive. I am not
> sure though. The sstables names should be unique across drives.
>
>
> On Mon, Sep 16, 2013 at 10:14 AM, Juan Manuel Formoso 
> 
> > wrote:
>
>> Because I ran out of space when shuffling, I was forced to add multiple
>> disks on my Cassandra nodes.
>>
>> When I finish compacting, cleaning up, and repairing, I'd like to remove
>> them and return to one disk per node.
>>
>> What is the procedure to make the switch?
>> Can I just kill cassandra, move the data from one disk to the other,
>> remove the configuration for the second disk, and re-start cassandra?
>>
>> I assume files will not have the same name and thus not be overwritten,
>> is this the case? Does it pick it up just like that?
>>
>> Thanks
>>
>> --
>> *Juan Manuel Formoso
>> *Senior Geek
>> http://twitter.com/juanformoso
>> http://seniorgeek.com.ar
>> LLAP
>>
>
>

-- 
*Juan Manuel Formoso
*Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: I don't understand shuffle progress

2013-09-17 Thread Juan Manuel Formoso
> If your shuffle succeeds, you will be the first reported case of
shuffle succeeding on a non-test cluster.

Awesome! :O

I'll try to migrate to a new cluster then.

Any better alternatives than creating a small application that reads from
one cluster and inserts in the new one that anybody can suggest?

On Tuesday, September 17, 2013, Robert Coli wrote:

> On Tue, Sep 17, 2013 at 12:13 PM, Juan Manuel Formoso 
> 
> >wrote:
>
> > I am running shuffle on a cluster after upgrading to 1.2.X, and I don't
> > understand how to check progress.
> >
>
> If your shuffle succeeds, you will be the first reported case of shuffle
> succeeding on a non-test cluster. Until I hear a report of someone having
> real world success, I recommend against using shuffle.
>
> If you want to enable vnodes on a cluster with existing data, IMO you
> should fork writes and bulk load a replacement cluster.
>
>
> > I'm counting the lines of cassandra-shuffle ls, and it decreases VERY
> > slowly. Sometimes not at all after 24 hours of processing.
> >
>
> I have heard reports of shuffle taking an insanely long amount of time,
> such as this, as well.
>
>
> > Is that value accurate?
> >
>
> Probably.
>
>
> > Does the shuffle operation supports disabling/re-enabling (or restarting
> > the cluster) and resuming from the last position? Or does it start over?
> >
>
> Yes, via the arguments "enable" and "disable". "clear" is what you use if
> you want to clear the queue and start over.
>
> Note that once you have started shuffle, you don't want to add/remove a
> node until the shuffle is complete.
>
> https://issues.apache.org/jira/browse/CASSANDRA-5525
>
> =Rob
>


-- 
*Juan Manuel Formoso
*Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Cassandra column family using Composite Columns

2013-09-17 Thread Raihan Jamal
I am designing the Column Family for our use case in Cassandra. I am
planning to go with Dynamic Column Structure.

Below is my requirement per our use case-

user-id   column1123  (Column1-Value  Column1-SchemaName  LMD)

 For each user-id, we will be storing column1 and its value and that value
will store these three things always-

(Column1-Value   Column1-SchemaName LMD)

 In my above example, I have shown only one column, but there might be more
columns, and those columns will also follow the same concept.

Now I am not sure how to always store these three things at the column value
level. Should I use composite columns at the column level? If yes, then I am
not sure how to create a column family like this in Cassandra.

Column1-value will be in binary, Column1-SchemaName will be String,
LMD will be DateType.

 This is what I have so far-

create column family USER_DATA
with key_validation_class = 'UTF8Type'
and comparator = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and gc_grace = 86400
and column_metadata = [ {column_name : 'lmd', validation_class : DateType}];

 Can anyone help me in designing the column family for this? Thanks.
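
(One possible shape, purely as a sketch in cassandra-cli syntax: fold the three 
parts into composite column names; the types and names here are assumptions, not 
something settled in this thread:)

create column family USER_DATA
with key_validation_class = 'UTF8Type'
and comparator = 'CompositeType(UTF8Type, UTF8Type)'
and default_validation_class = 'BytesType'
and gc_grace = 86400;

Each logical column then becomes three composite columns under the same row key, 
e.g. ('column1' : 'value'), ('column1' : 'schemaName') and ('column1' : 'lmd'), 
with the value bytes interpreted per part by the application.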


Re: I don't understand shuffle progress

2013-09-17 Thread David McNelis
sstableloader is the way to go to load up the new cluster.
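
(A minimal sketch of that, with the host and paths made up for illustration; the 
directory handed to sstableloader has to end in <keyspace>/<columnfamily>:)

sstableloader -d new-cluster-node1 /path/to/snapshot/MyKeyspace/MyColumnFamily/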

On Tuesday, September 17, 2013, Juan Manuel Formoso wrote:

> > If your shuffle succeeds, you will be the first reported case of
> shuffle succeeding on a non-test cluster.
>
> Awesome! :O
>
> I'll try to migrate to a new cluster then.
>
> Any better alternatives than creating a small application that reads from
> one cluster and inserts in the new one that anybody can suggest?
>
> On Tuesday, September 17, 2013, Robert Coli wrote:
>
>> On Tue, Sep 17, 2013 at 12:13 PM, Juan Manuel Formoso > >wrote:
>>
>> > I am running shuffle on a cluster after upgrading to 1.2.X, and I don't
>> > understand how to check progress.
>> >
>>
>> If your shuffle succeeds, you will be the first reported case of shuffle
>> succeeding on a non-test cluster. Until I hear a report of someone having
>> real world success, I recommend against using shuffle.
>>
>> If you want to enable vnodes on a cluster with existing data, IMO you
>> should fork writes and bulk load a replacement cluster.
>>
>>
>> > I'm counting the lines of cassandra-shuffle ls, and it decreases VERY
>> > slowly. Sometimes not at all after 24 hours of processing.
>> >
>>
>> I have heard reports of shuffle taking an insanely long amount of time,
>> such as this, as well.
>>
>>
>> > Is that value accurate?
>> >
>>
>> Probably.
>>
>>
>> > Does the shuffle operation supports disabling/re-enabling (or restarting
>> > the cluster) and resuming from the last position? Or does it start over?
>> >
>>
>> Yes, via the arguments "enable" and "disable". "clear" is what you use if
>> you want to clear the queue and start over.
>>
>> Note that once you have started shuffle, you don't want to add/remove a
>> node until the shuffle is complete.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-5525
>>
>> =Rob
>>
>
>
> --
> *Juan Manuel Formoso
> *Senior Geek
> http://twitter.com/juanformoso
> http://seniorgeek.com.ar
> LLAP
>


Re: questions related to the SSTable file

2013-09-17 Thread Takenori Sato
> So in fact, incremental backup of Cassandra is just hard link all the new
SSTable files being generated during the incremental backup period. It
could contain any data, not just the data being update/insert/delete in
this period, correct?

Correct.

But over time, some old enough SSTable files are usually shared across
multiple snapshots.
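
(For context, this is the cassandra.yaml switch involved; the directory shown is 
only illustrative of the usual layout:)

# cassandra.yaml
incremental_backups: true
# newly flushed SSTables are then hard-linked into a backups/ directory next to
# the live data files, e.g. <data_dir>/<Keyspace>/<ColumnFamily>/backups/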


On Wed, Sep 18, 2013 at 3:37 AM, java8964 java8964 wrote:

> Another question related to the SSTable files generated in the incremental
> backup is not really ONLY incremental delta, right? It will include more
> than delta in the SSTable files.
>
> I will use the example to show my question:
>
> first, we have this data in the SSTable file 1:
>
> rowkey(1), columns (maker=honda).
>
> later, if we add one column in the same key:
>
> rowkey(1), columns (maker=honda, color=blue)
>
> The data above being flushed to another SSTable file 2. In this case, it
> will be part of the incremental backup at this time. But in fact, it will
> contain both old data (make=honda), plus new changes (color=blue).
>
> So in fact, incremental backup of Cassandra is just hard link all the new
> SSTable files being generated during the incremental backup period. It
> could contain any data, not just the data being update/insert/delete in
> this period, correct?
>
> Thanks
>
> Yong
>
> > From: dean.hil...@nrel.gov
> > To: user@cassandra.apache.org
> > Date: Tue, 17 Sep 2013 08:11:36 -0600
>
> > Subject: Re: questions related to the SSTable file
> >
> > Netflix created file streaming in astyanax into cassandra specifically
> because writing too big a column cell is a bad thing. The limit is really
> dependent on use case….do you have servers writing 1000's of 200Meg files
> at the same time….if so, astyanax streaming may be a better way to go there
> where it divides up the file amongst cells and rows.
> >
> > I know the limit of a row size is really your hard disk space and the
> column count if I remember goes into billions though realistically, I think
> beyond 10 million might slow down a bit….all I know is we tested up to 10
> million columns with no issues in our use-case.
> >
> > So you mean at this time, I could get 2 SSTable files, both contain
> column "Blue" for the same row key, right?
> >
> > Yes
> >
> > In this case, I should be fine as value of the "Blue" column contain the
> timestamp to help me to find out which is the last change, right?
> >
> > Yes
> >
> > In MR world, each file COULD be processed by different Mapper, but will
> be sent to the same reducer as both data will be shared same key.
> >
> > If that is the way you are writing it, then yes
> >
> > Dean
> >
> > From: Shahab Yunus mailto:shahab.yu...@gmail.com
> >>
> > Reply-To: "user@cassandra.apache.org"
> mailto:user@cassandra.apache.org>>
> > Date: Tuesday, September 17, 2013 7:54 AM
> > To: "user@cassandra.apache.org" <
> user@cassandra.apache.org>
> > Subject: Re: questions related to the SSTable file
> >
> > derstand if following changes apply to the same row key as above
> example, additional SSTable file could be generated. That is
>


Re: How can I switch from multiple disks to a single disk?

2013-09-17 Thread Robert Coli
On Tue, Sep 17, 2013 at 4:01 PM, Juan Manuel Formoso wrote:

> Anyone who knows for sure if this would work?


Sankalp Kohli (whose last name is phonetically awesome!) has pointed you in
the correct direction.

To be a bit more explicit :

1) determine if sstable names are unique across drives (they should be)
2) pre-copy all sstables from all source drives to target single drive
3) drain and stop cassandra
4) re-copy all sstables from all source drives to target single drive, with
--delete or equivalent option to rsync such that you delete any files
missing from source drives due to compaction in the interim
5) start cassandra with new conf file with single drive
6) if it doesn't work for some unforseen reason, you still have all your
sstables in the old dirs, so just revert the conf file and fail back

=Rob


Re: How can I switch from multiple disks to a single disk?

2013-09-17 Thread Juan Manuel Formoso
Thanks! But, shouldn't I be able to just stop Cassandra, copy the files,
change the config and restart? Why should I drain?

My RF+consistency level can handle one replica down (I forgot to mention
that in my OP, apologies)

Would it work in theory?

On Tuesday, September 17, 2013, Robert Coli wrote:

> On Tue, Sep 17, 2013 at 4:01 PM, Juan Manuel Formoso 
> 
> > wrote:
>
>> Anyone who knows for sure if this would work?
>
>
> Sankalp Kohli (whose last name is phonetically awesome!) has pointed you
> in the correct direction.
>
> To be a bit more explicit :
>
> 1) determine if sstable names are unique across drives (they should be)
> 2) pre-copy all sstables from all source drives to target single drive
> 3) drain and stop cassandra
> 4) re-copy all sstables from all source drives to target single drive,
> with --delete or equivalent option to rsync such that you delete any files
> missing from source drives due to compaction in the interim
> 5) start cassandra with new conf file with single drive
> 6) if it doesn't work for some unforseen reason, you still have all your
> sstables in the old dirs, so just revert the conf file and fail back
>
> =Rob
>
>

-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: I don't understand shuffle progress

2013-09-17 Thread Juan Manuel Formoso
Will the new cluster be evenly balanced? Remember that the old one was pre
1.2.X, so I had no vnodes

I haven't used that tool, will look it up.

Thanks for the suggestion!

On Tuesday, September 17, 2013, David McNelis wrote:

> sstableloader is the way to go to load up the new cluster.
>
> On Tuesday, September 17, 2013, Juan Manuel Formoso wrote:
>
>> > If your shuffle succeeds, you will be the first reported case of
>> shuffle succeeding on a non-test cluster.
>>
>> Awesome! :O
>>
>> I'll try to migrate to a new cluster then.
>>
>> Any better alternatives than creating a small application that reads from
>> one cluster and inserts in the new one that anybody can suggest?
>>
>> On Tuesday, September 17, 2013, Robert Coli wrote:
>>
>>> On Tue, Sep 17, 2013 at 12:13 PM, Juan Manuel Formoso <
>>> jform...@gmail.com>wrote:
>>>
>>> > I am running shuffle on a cluster after upgrading to 1.2.X, and I don't
>>> > understand how to check progress.
>>> >
>>>
>>> If your shuffle succeeds, you will be the first reported case of shuffle
>>> succeeding on a non-test cluster. Until I hear a report of someone having
>>> real world success, I recommend against using shuffle.
>>>
>>> If you want to enable vnodes on a cluster with existing data, IMO you
>>> should fork writes and bulk load a replacement cluster.
>>>
>>>
>>> > I'm counting the lines of cassandra-shuffle ls, and it decreases VERY
>>> > slowly. Sometimes not at all after 24 hours of processing.
>>> >
>>>
>>> I have heard reports of shuffle taking an insanely long amount of time,
>>> such as this, as well.
>>>
>>>
>>> > Is that value accurate?
>>> >
>>>
>>> Probably.
>>>
>>>
>>> > Does the shuffle operation supports disabling/re-enabling (or
>>> restarting
>>> > the cluster) and resuming from the last position? Or does it start
>>> over?
>>> >
>>>
>>> Yes, via the arguments "enable" and "disable". "clear" is what you use if
>>> you want to clear the queue and start over.
>>>
>>> Note that once you have started shuffle, you don't want to add/remove a
>>> node until the shuffle is complete.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-5525
>>>
>>> =Rob
>>>
>>
>>
>> --
>> *Juan Manuel Formoso
>> *Senior Geek
>> http://twitter.com/juanformoso
>> http://seniorgeek.com.ar
>> LLAP
>>
>

-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: questions related to the SSTable file

2013-09-17 Thread Robert Coli
On Tue, Sep 17, 2013 at 5:46 PM, Takenori Sato  wrote:

> > So in fact, incremental backup of Cassandra is just hard link all the
> new SSTable files being generated during the incremental backup period. It
> could contain any data, not just the data being update/insert/delete in
> this period, correct?
>
> Correct.
>
> But over time, some old enough SSTable files are usually shared across
> multiple snapshots.
>

To be clear, "incremental backup" feature backs up the data being modified
in that period, because it writes only those files to the incremental
backup dir as hard links, between full snapshots.

http://www.datastax.com/docs/1.0/operations/backup_restore
"
When incremental backups are enabled (disabled by default), Cassandra
hard-links each flushed SSTable to a backups directory under the keyspace
data directory. This allows you to store backups offsite without
transferring entire snapshots. Also, incremental backups combine with
snapshots to provide a dependable, up-to-date backup mechanism.
"

What Takenori is referring to is that a full snapshot is in some ways an
"incremental backup" because it shares hard linked SSTables with other
snapshots.
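
As a concrete sketch (the keyspace name and paths are hypothetical, and the
exact sstable file names and backups directory location vary a little by
version; in 1.x the backups directory sits under the keyspace, or
keyspace/table, data directory as the doc above describes):

  # cassandra.yaml
  incremental_backups: true

  # after the next memtable flush, the freshly written sstable components
  # show up as hard links under the backups directory, e.g.
  ls /var/lib/cassandra/data/mykeyspace/backups/
  #   mycf-ic-42-Data.db  mycf-ic-42-Index.db  ...

  # a hard link shares its inode with the live sstable, so it costs no extra
  # space until the live copy is compacted away and deleted
  ls -li /var/lib/cassandra/data/mykeyspace/backups/mycf-ic-42-Data.db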

=Rob


Re: I don't understand shuffle progress

2013-09-17 Thread Robert Coli
On Tue, Sep 17, 2013 at 4:00 PM, Juan Manuel Formoso wrote:

> Any better alternatives than creating a small application that reads from
> one cluster and inserts in the new one that anybody can suggest?
>
>
http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

In theory if you wanted to do the "copy-the-files" method while enabling
vnodes on the target cluster, you could :

1) create new target cluster with vnodes enabled
2) fork writes so they go to both source and target cluster
3) copy 100% of sstables from all source nodes to all target nodes (being
sure to ensure non-collision of sstable names, probably by adding a few
hundreds/thousands to the generation numbers of various nodes in a
predictable fashion; see the renaming sketch after this list)
4) be certain that you did not accidentally resurrect data from purged
source sstables in 3)
5) run cleanup compaction on all nodes in target cluster
6) turn off writes to old source cluster

=Rob
* notes that this process would make a good blog post.. :D


Re: How can I switch from multiple disks to a single disk?

2013-09-17 Thread Robert Coli
On Tue, Sep 17, 2013 at 5:57 PM, Juan Manuel Formoso wrote:

> Thanks! But, shouldn't I be able to just stop Cassandra, copy the files,
> change the config and restart? Why should I drain?


If you drain, you reduce to zero the chance of having some problem with the
SSTables flushed as a result of the restart.

However you are correct that you probably do not "need" to do so... :D
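
For reference, the drain-then-stop sequence is just two commands (assuming
nodetool is on the PATH and the usual init script is in place):

  nodetool drain          # flushes memtables and stops accepting new writes
  service cassandra stop  # or stop the process however your install runs it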

=Rob


Re: questions related to the SSTable file

2013-09-17 Thread Takenori Sato(Cloudian)

Thanks, Rob, for clarifying!

- Takenori

(2013/09/18 10:01), Robert Coli wrote:
On Tue, Sep 17, 2013 at 5:46 PM, Takenori Sato > wrote:


> So in fact, incremental backup of Cassandra is just hard link
all the new SSTable files being generated during the incremental
backup period. It could contain any data, not just the data being
update/insert/delete in this period, correct?

Correct.

But over time, some old enough SSTable files are usually shared
across multiple snapshots.


To be clear, "incremental backup" feature backs up the data being 
modified in that period, because it writes only those files to the 
incremental backup dir as hard links, between full snapshots.


http://www.datastax.com/docs/1.0/operations/backup_restore
"
When incremental backups are enabled (disabled by default), Cassandra 
hard-links each flushed SSTable to a backups directory under the 
keyspace data directory. This allows you to store backups offsite 
without transferring entire snapshots. Also, incremental backups 
combine with snapshots to provide a dependable, up-to-date backup 
mechanism.

"

What Takenori is referring to is that a full snapshot is in some ways 
an "incremental backup" because it shares hard linked SSTables with 
other snapshots.


=Rob




Re: I don't understand shuffle progress

2013-09-17 Thread Paulo Motta
That is very disappointing to hear. Vnodes support is one of the main
reasons we're upgrading from 1.1.X to 1.2.X.

So you're saying the only feasible way of enabling VNodes on an upgraded C*
1.2 is by doing fork writes to a brand new cluster + bulk load of sstables
from the old cluster? Or is it possible to succeed on shuffling, even if
that means waiting some weeks for the shuffle to complete?


2013/9/17 Robert Coli 

> On Tue, Sep 17, 2013 at 4:00 PM, Juan Manuel Formoso  >wrote:
>
> > Any better alternatives than creating a small application that reads from
> > one cluster and inserts in the new one that anybody can suggest?
> >
> >
> http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
>
> In theory if you wanted to do the "copy-the-files" method while enabling
> vnodes on the target cluster, you could :
>
> 1) create new target cluster with vnodes enabled
> 2) fork writes so they go to both source and target cluster
> 3) copy 100% of sstables from all source nodes to all target nodes (being
> sure to ensure non-collision of sstables of names, probably by adding a few
> hundreds/thousands to the sequence of various nodes in a predictable
> fashion)
> 4) be certain that you did not accidentally resurrect data from purged
> source sstables in 3)
> 5) run cleanup compaction on all nodes in target cluster
> 6) turn off writes to old source cluster
>
> =Rob
> * notes that this process would make a good blog post.. :D
>



-- 
Paulo Ricardo

-- 
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com


RE: questions related to the SSTable file

2013-09-17 Thread java8964 java8964
Quote:
"
To be clear, "incremental backup" feature backs up the data being modified
in that period, because it writes only those files to the incremental backup
dir as hard links, between full snapshots.
"

I thought I was clear, but your clarification confused me again. My
understanding, from all the answers I have gotten so far, is that a more
accurate statement would be: the "incremental backup" feature backs up the
SSTable files being generated in that period.

But there is no way we can be sure that these SSTable files will ONLY contain
modified data. So the statement quoted above is not exactly right. I agree
that all the data modified in that period will be in the incremental SSTable
files, but a lot of other unmodified data will be in them too.

If we have 2 rows with different row keys in the same memtable, and only the
2nd row is modified, then when the memtable is flushed to an SSTable file it
will contain both rows, and both will be in the incremental backup files. So
for the first row, nothing changed, but it will still be in the incremental
backup.

If I have one row with one column, and now a new column is added, the whole
row in the memtable is flushed to an SSTable file, which is also part of this
incremental backup. For the first column, nothing changed, but it will still
be in the incremental backup file.

The point I am trying to make is that this matters if I design an ETL to
consume the incremental backup SSTable files. As in the examples above, I have
to realize that the incremental backup SSTable files could, and most likely
will, contain old data that was already processed. That will require
additional logic and responsibility in the ETL, or in any outside SSTable
consumer, to handle it.

Yong
Date: Tue, 17 Sep 2013 18:01:45 -0700
Subject: Re: questions related to the SSTable file
From: rc...@eventbrite.com
To: user@cassandra.apache.org

On Tue, Sep 17, 2013 at 5:46 PM, Takenori Sato  wrote:

> So in fact, incremental backup of Cassandra is just hard link all the new 
> SSTable files being generated during the incremental backup period. It could 
> contain any data, not just the data being update/insert/delete in this 
> period, correct?


Correct.
But over time, some old enough SSTable files are usually shared across multiple 
snapshots. 

To be clear, "incremental backup" feature backs up the data being modified in 
that period, because it writes only those files to the incremental backup dir 
as hard links, between full snapshots.

http://www.datastax.com/docs/1.0/operations/backup_restore
"When incremental backups are enabled (disabled by default), Cassandra 
hard-links each flushed SSTable to a backups directory under the keyspace data 
directory. This allows you to store backups offsite without transferring entire 
snapshots. Also, incremental backups combine with snapshots to provide a 
dependable, up-to-date backup mechanism.
"

What Takenori is referring to is that a full snapshot is in some ways an 
"incremental backup" because it shares hard linked SSTables with other 
snapshots.

=Rob  

Re: I don't understand shuffle progress

2013-09-17 Thread David McNelis
As Rob mentioned, no one (myself included) has successfully used shuffle in
the wild (that I've heard of).

Shuffle is *supposed* to be a transparent background process... and is
designed, in theory, to take a long time to run (weeks is the right way to
think of it).

Be sure to keep an eye on your drive space if you are going to wait it out.
Unless you have < 1/2 of your drives in use, you are going to need to run
cleanup periodically to avoid running out of disk space, because shuffle
NEVER removes data, it only makes copies of the data on the new destination
nodes.

I think that is the area where people tend to see the most failures: because
the newer versions of Cassandra can survive OK with more than 1/2 of the disk
in use, more and more people are using > 50% of their disks.
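
A small sketch of the kind of periodic check this implies, assuming nodetool
is on the PATH and the data volume is mounted at a hypothetical
/var/lib/cassandra:

  df -h /var/lib/cassandra   # watch free space on the data volume
  nodetool cleanup           # drop data this node no longer owns
  nodetool compactionstats   # confirm the cleanup compactions have finished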


On Tue, Sep 17, 2013 at 9:41 PM, Paulo Motta wrote:

> That is very disappointing to hear. Vnodes support is one of the main
> reasons we're upgrading from 1.1.X to 1.2.X.
>
> So you're saying the only feasible way of enabling VNodes on an upgraded
> C* 1.2 is by doing fork writes to a brand new cluster + bulk load of
> sstables from the old cluster? Or is it possible to succeed on shuffling,
> even if that means waiting some weeks for the shuffle to complete?
>
>
> 2013/9/17 Robert Coli 
>
>> On Tue, Sep 17, 2013 at 4:00 PM, Juan Manuel Formoso > >wrote:
>>
>> > Any better alternatives than creating a small application that reads
>> from
>> > one cluster and inserts in the new one that anybody can suggest?
>> >
>> >
>> http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
>>
>> In theory if you wanted to do the "copy-the-files" method while enabling
>> vnodes on the target cluster, you could :
>>
>> 1) create new target cluster with vnodes enabled
>> 2) fork writes so they go to both source and target cluster
>> 3) copy 100% of sstables from all source nodes to all target nodes (being
>> sure to ensure non-collision of sstables of names, probably by adding a
>> few
>> hundreds/thousands to the sequence of various nodes in a predictable
>> fashion)
>> 4) be certain that you did not accidentally resurrect data from purged
>> source sstables in 3)
>> 5) run cleanup compaction on all nodes in target cluster
>> 6) turn off writes to old source cluster
>>
>> =Rob
>> * notes that this process would make a good blog post.. :D
>>
>
>
>
> --
> Paulo Ricardo
>
> --
> European Master in Distributed Computing***
> Royal Institute of Technology - KTH
> *
> *Instituto Superior Técnico - IST*
> *http://paulormg.com*
>


Re: I don't understand shuffle progress

2013-09-17 Thread Juan Manuel Formoso
I have been trying to make it work non-stop since Friday afternoon. I
officially gave up today and I'm going to go the sstableloader route.

I wrote a little of what I tried here:
http://seniorgeek.com.ar/blog/2013/09/16/tips-for-running-cassandra-shuffle/
(I have yet to update it with the fact that I had to give up)

I would strongly recommend you don't use shuffle unless you have very
little data to move around.
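
For reference, a rough sketch of the sstableloader route (the target hosts
and path are hypothetical, and the directory layout sstableloader expects,
ending in <keyspace>/<table>, varies slightly by version): point -d at one or
more nodes of the new cluster and feed it a snapshot of each table:

  bin/sstableloader -d target-node1,target-node2 /path/to/snapshot/mykeyspace/mycf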


On Tue, Sep 17, 2013 at 10:41 PM, Paulo Motta wrote:

> That is very disappointing to hear. Vnodes support is one of the main
> reasons we're upgrading from 1.1.X to 1.2.X.
>
> So you're saying the only feasible way of enabling VNodes on an upgraded C*
> 1.2 is by doing fork writes to a brand new cluster + bulk load of sstables
> from the old cluster? Or is it possible to succeed on shuffling, even if
> that means waiting some weeks for the shuffle to complete?
>
>
> 2013/9/17 Robert Coli 
>
> > On Tue, Sep 17, 2013 at 4:00 PM, Juan Manuel Formoso  > >wrote:
> >
> > > Any better alternatives than creating a small application that reads
> from
> > > one cluster and inserts in the new one that anybody can suggest?
> > >
> > >
> > http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
> >
> > In theory if you wanted to do the "copy-the-files" method while enabling
> > vnodes on the target cluster, you could :
> >
> > 1) create new target cluster with vnodes enabled
> > 2) fork writes so they go to both source and target cluster
> > 3) copy 100% of sstables from all source nodes to all target nodes (being
> > sure to ensure non-collision of sstables of names, probably by adding a
> few
> > hundreds/thousands to the sequence of various nodes in a predictable
> > fashion)
> > 4) be certain that you did not accidentally resurrect data from purged
> > source sstables in 3)
> > 5) run cleanup compaction on all nodes in target cluster
> > 6) turn off writes to old source cluster
> >
> > =Rob
> > * notes that this process would make a good blog post.. :D
> >
>
>
>
> --
> Paulo Ricardo
>
> --
> European Master in Distributed Computing***
> Royal Institute of Technology - KTH
> *
> *Instituto Superior Técnico - IST*
> *http://paulormg.com*
>



-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Unsupported major.minor version 51.0

2013-09-17 Thread Gary Zhao
Hello

I just saw this error. Anyone knows how to fix it?

[root@gary-vm1 apache-cassandra-2.0.0]# bin/cassandra -f
xss =  -ea -javaagent:bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms4014M -Xmx4014M -Xmn400M
-XX:+HeapDumpOnOutOfMemoryError -Xss180k
Exception in thread "main" java.lang.UnsupportedClassVersionError:
org/apache/cassandra/service/CassandraDaemon : Unsupported major.minor
version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class:
org.apache.cassandra.service.CassandraDaemon.  Program will exit.
[root@gary-vm1 apache-cassandra-2.0.0]# java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Thanks
Gary


Re: Unsupported major.minor version 51.0

2013-09-17 Thread Jason Wee
For Cassandra 2.0, use Oracle JDK or OpenJDK version 7.

Jason


On Wed, Sep 18, 2013 at 11:21 AM, Gary Zhao  wrote:

> Hello
>
> I just saw this error. Anyone knows how to fix it?
>
> [root@gary-vm1 apache-cassandra-2.0.0]# bin/cassandra -f
> xss =  -ea -javaagent:bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -Xms4014M -Xmx4014M -Xmn400M
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k
> Exception in thread "main" java.lang.UnsupportedClassVersionError:
> org/apache/cassandra/service/CassandraDaemon : Unsupported major.minor
> version 51.0
> at java.lang.ClassLoader.defineClass1(Native Method)
>  at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
>  at
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>  at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>  at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> Could not find the main class:
> org.apache.cassandra.service.CassandraDaemon.  Program will exit.
> [root@gary-vm1 apache-cassandra-2.0.0]# java -version
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> Thanks
> Gary
>


Re: Unsupported major.minor version 51.0

2013-09-17 Thread Gary Zhao
Thanks Jason. Does Node.js work with 2.0? I'm wondering which version
should I run. Thanks.


On Tue, Sep 17, 2013 at 8:24 PM, Jason Wee  wrote:

> cassandra 2.0, then use oracle or open jdk version 7.
>
> Jason
>
>
> On Wed, Sep 18, 2013 at 11:21 AM, Gary Zhao  wrote:
>
>> Hello
>>
>> I just saw this error. Anyone knows how to fix it?
>>
>> [root@gary-vm1 apache-cassandra-2.0.0]# bin/cassandra -f
>> xss =  -ea -javaagent:bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
>> -XX:ThreadPriorityPolicy=42 -Xms4014M -Xmx4014M -Xmn400M
>> -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>> Exception in thread "main" java.lang.UnsupportedClassVersionError:
>> org/apache/cassandra/service/CassandraDaemon : Unsupported major.minor
>> version 51.0
>> at java.lang.ClassLoader.defineClass1(Native Method)
>>  at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
>> at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
>>  at
>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>> at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>  at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>> Could not find the main class:
>> org.apache.cassandra.service.CassandraDaemon.  Program will exit.
>> [root@gary-vm1 apache-cassandra-2.0.0]# java -version
>> java version "1.6.0_24"
>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>
>> Thanks
>> Gary
>>
>
>


Re: questions related to the SSTable file

2013-09-17 Thread Takenori Sato
Yong,

It seems there is still a misunderstanding.

> But there is no way we can be sure that these SSTable files will ONLY
contain modified data. So the statement being quoted above is not exactly
right. I agree that all the modified data in that period will be in the
incremental sstable files, but a lot of other unmodified data will be in
them too.

A memtable (which becomes a new SSTable when flushed) contains only modified
data, as I explained with the example.

> If we have 2 rows data with different row key in the same memtable, and
if only 2nd row being modified. When the memtable is flushed to SSTable
file, it will contain both rows, and both will be in the incremental backup
files. So for first row, nothing change, but it will be in the incremental
backup.

Unless the first row is modified, it does not exist in the memtable at all.

> If I have one row with one column, now a new column is added, and whole
row in one memtable being flushed to SSTable file, as also in this
incremental backup. For first column, nothing change, but it will still be
in incremental backup file.

For example, if it worked the way you understand it, then Color-2 should also
contain the unmodified row Lavender, and the Blue row's existing column hex,
like the following. But it does not.

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]

--> your understanding
- Color-2-Data.db: [{Lavender: {hex: #E6E6FA}}, {Green: {hex: #008000}},
{Blue: {hex: #FF, hex2: #2c86ff}}]
* The row Lavender and the Blue row's column hex have no changes


> The point I tried to make is this is important if I design an ETL to
consume the incremental backup SSTable files. As above example, I have to
realize that in the incremental backup sstable files, they could or most
likely contain old data which was previous being processed already. That
will require additional logic and responsibility in the ETL to handle it,
or any outsider SSTable consumer to pay attention to it.

I suggest trying org.apache.cassandra.tools.SSTableExport; then you will see
what's going on under the hood.
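
For example, the sstable2json script in bin/ wraps SSTableExport; run it
against the data component of a given sstable (the path and file name here
are hypothetical) and you will see only the rows and columns that file
actually holds:

  bin/sstable2json /var/lib/cassandra/data/mykeyspace/mycf/mykeyspace-mycf-ic-2-Data.db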

- Takenori








On Wed, Sep 18, 2013 at 10:51 AM, java8964 java8964 wrote:

> Quote:
>
> "
> To be clear, "incremental backup" feature backs up the data being modified
> in that period, because it writes only those files to the incremental
> backup dir as hard links, between full snapshots.
> "
>
> I thought I was clearer, but your clarification confused me again.
> My understanding so far from all the answer I got so far, I believe, the
> more accurate statement of "incremental backup" should be "incremental
> backup" feature backs up the SSTable files being generated in that period.
>
> But there is no way we can be sure that these SSTable files will ONLY
> contain modified data. So the statement being quoted above is not exactly
> right. I agree that all the modified data in that period will be in the
> incremental sstable files, but a lot of other unmodified data will be in
> them too.
>
> If we have 2 rows data with different row key in the same memtable, and if
> only 2nd row being modified. When the memtable is flushed to SSTable file,
> it will contain both rows, and both will be in the incremental backup
> files. So for first row, nothing change, but it will be in the incremental
> backup.
>
> If I have one row with one column, now a new column is added, and whole
> row in one memtable being flushed to SSTable file, as also in this
> incremental backup. For first column, nothing change, but it will still be
> in incremental backup file.
>
> The point I tried to make is this is important if I design an ETL to
> consume the incremental backup SSTable files. As above example, I have to
> realize that in the incremental backup sstable files, they could or most
> likely contain old data which was previous being processed already. That
> will require additional logic and responsibility in the ETL to handle it,
> or any outsider SSTable consumer to pay attention to it.
>
> Yong
>
> --
> Date: Tue, 17 Sep 2013 18:01:45 -0700
>
> Subject: Re: questions related to the SSTable file
> From: rc...@eventbrite.com
> To: user@cassandra.apache.org
>
>
> On Tue, Sep 17, 2013 at 5:46 PM, Takenori Sato  wrote:
>
> > So in fact, incremental backup of Cassandra is just hard link all the
> new SSTable files being generated during the incremental backup period. It
> could contain any data, not just the data being update/insert/delete in
> this period, correct?
>
> Correct.
>
> But over time, some old enough SSTable files are usually shared across
> multiple snapshots.
>
>
> To be clear, "incremental backup" feature backs up the data being modified
> in that period, because it writes only those files to the incremental
> backup dir as hard links, between full snapshots.
>
> http://www.datastax.com/docs/1.0/operations/backup_restore
> "
> When incremental backups are enabled (disabled by default), Cassandra
> hard-links each flus

Re: Unsupported major.minor version 51.0

2013-09-17 Thread Dave Brosius

Cassandra-2.0 needs to run on jdk7
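
Class file version 51.0 is the Java 7 format, so the JVM shown in the trace
(1.6.0_24) is simply too old. A quick way to check and switch on a
RHEL/CentOS style host like the one in the trace (package names are
illustrative):

  java -version                       # 51.0 class files need a Java 7 runtime
  yum install java-1.7.0-openjdk      # or install an Oracle JDK 7 instead
  alternatives --config java          # select the 1.7 binary
  java -version                       # confirm before restarting Cassandra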




On 09/17/2013 11:21 PM, Gary Zhao wrote:

Hello

I just saw this error. Anyone knows how to fix it?

[root@gary-vm1 apache-cassandra-2.0.0]# bin/cassandra -f
xss =  -ea -javaagent:bin/../lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4014M 
-Xmx4014M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
Exception in thread "main" java.lang.UnsupportedClassVersionError: 
org/apache/cassandra/service/CassandraDaemon : Unsupported major.minor 
version 51.0

at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class: 
org.apache.cassandra.service.CassandraDaemon.  Program will exit.

[root@gary-vm1 apache-cassandra-2.0.0]# java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Thanks
Gary




Re: Unsupported major.minor version 51.0

2013-09-17 Thread Jason Wee
Sorry, I have no knowledge on Node.js, probably someone else might know.

Jason


On Wed, Sep 18, 2013 at 11:29 AM, Gary Zhao  wrote:

> Thanks Jason. Does Node.js work with 2.0? I'm wondering which version
> should I run. Thanks.
>
>
> On Tue, Sep 17, 2013 at 8:24 PM, Jason Wee  wrote:
>
>> cassandra 2.0, then use oracle or open jdk version 7.
>>
>> Jason
>>
>>
>> On Wed, Sep 18, 2013 at 11:21 AM, Gary Zhao  wrote:
>>
>>> Hello
>>>
>>> I just saw this error. Anyone knows how to fix it?
>>>
>>> [root@gary-vm1 apache-cassandra-2.0.0]# bin/cassandra -f
>>> xss =  -ea -javaagent:bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
>>> -XX:ThreadPriorityPolicy=42 -Xms4014M -Xmx4014M -Xmn400M
>>> -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>>> Exception in thread "main" java.lang.UnsupportedClassVersionError:
>>> org/apache/cassandra/service/CassandraDaemon : Unsupported major.minor
>>> version 51.0
>>> at java.lang.ClassLoader.defineClass1(Native Method)
>>>  at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
>>> at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
>>>  at
>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>> at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>>  at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>> Could not find the main class:
>>> org.apache.cassandra.service.CassandraDaemon.  Program will exit.
>>> [root@gary-vm1 apache-cassandra-2.0.0]# java -version
>>> java version "1.6.0_24"
>>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>>
>>> Thanks
>>> Gary
>>>
>>
>>
>


Re: Unsupported major.minor version 51.0

2013-09-17 Thread Skye Book
It depends on which driver you're using, and drivers for 1.2 may indeed mostly 
work with 2.0.  I've been using Astyanax (for Java) with 2.0 even though it 
doesn't specifically support the new release.

On Sep 17, 2013, at 11:34 PM, Jason Wee  wrote:

> Sorry, I have no knowledge on Node.js, probably someone else might know.
> 
> Jason
> 
> 
> On Wed, Sep 18, 2013 at 11:29 AM, Gary Zhao  wrote:
> Thanks Jason. Does Node.js work with 2.0? I'm wondering which version should 
> I run. Thanks.
> 
> 
> On Tue, Sep 17, 2013 at 8:24 PM, Jason Wee  wrote:
> cassandra 2.0, then use oracle or open jdk version 7.
> 
> Jason
> 
> 
> On Wed, Sep 18, 2013 at 11:21 AM, Gary Zhao  wrote:
> Hello
> 
> I just saw this error. Anyone knows how to fix it?
> 
> [root@gary-vm1 apache-cassandra-2.0.0]# bin/cassandra -f 
> xss =  -ea -javaagent:bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities 
> -XX:ThreadPriorityPolicy=42 -Xms4014M -Xmx4014M -Xmn400M 
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k
> Exception in thread "main" java.lang.UnsupportedClassVersionError: 
> org/apache/cassandra/service/CassandraDaemon : Unsupported major.minor 
> version 51.0
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> Could not find the main class: org.apache.cassandra.service.CassandraDaemon.  
> Program will exit.
> [root@gary-vm1 apache-cassandra-2.0.0]# java -version
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> 
> Thanks
> Gary
> 
> 
>