Re: SSTables not opened on new cluster

2013-05-04 Thread Philippe
After trying every possible combination of parameters, config and the rest,
I ended up downgrading the new node from 1.1.11 to 1.1.2 to match the
existing 3 nodes. And that solved the issue immediately: the schema was
propagated and the node started handling reads & writes.


2013/5/3 Philippe 

> Unfortunately not, I've moved on to trying to add the new nodes to the
> current cluster and then decommission the "old" ones.
>
> But even that is not working. This is the strangest of things: while
> trying to add a new node, I
>  - set its token to an existing value+1
>  - ensured the yaml settings (cluster name, partitioner, etc.) are the same
>  - verified I can connect to the :7000 ports across machines
>  - cleared the data and commitlog directories
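The "existing value+1" token choice can be sketched as follows. This is a hypothetical helper, assuming RandomPartitioner-style integer tokens in the range 0 to 2**127 - 1; the logs below actually show byte tokens, so this is illustrative only, with the same +1 idea applying to the byte string:

```python
# Sketch of picking a new node's initial_token as an existing node's
# token plus one, wrapping around RandomPartitioner's token range.
# (Assumption: integer tokens; this cluster's logs show byte tokens.)
RANDOM_PARTITIONER_RANGE = 2 ** 127

def bump_token(existing_token: int) -> int:
    """Return the token immediately after an existing node's token."""
    return (existing_token + 1) % RANDOM_PARTITIONER_RANGE
```

The new value goes into `initial_token` in cassandra.yaml before the node is started.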
>
> when I start the node, it goes through the bootstrap process but never
> "imports" the schema from the cluster (the data/ directory stays empty,
> permissions are correct), and I get errors when some reads come in after the
> bootstrap completes. I've tried restarting the node with -Dcassandra.replace_token.
> The log is below. I've been at it all day, adding the node &
> decommissioning it, assuming that would clear any state in the cluster.
>
>  INFO [main] 2013-05-03 21:10:22,126 StorageService.java (line 788)
> JOINING: waiting for ring information
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,421 Gossiper.java (line 838)
> InetAddress /172.16.0.42 is now dead.
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,427 StorageService.java (line
> 1303) Removing token Token(bytes[03abaa3001]) for /172.16.0.42
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,429 ColumnFamilyStore.java (line
> 674) Enqueuing flush of Memtable-LocationInfo@62602(47/58
> serialized/live bytes, 2 ops)
>
>  INFO [FlushWriter:1] 2013-05-03 21:10:26,430 Memtable.java (line 264)
> Writing Memtable-LocationInfo@62602(47/58 serialized/live bytes, 2
> ops)
>
>  INFO [FlushWriter:1] 2013-05-03 21:10:26,542 Memtable.java (line 305)
> Completed flushing
> /var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-hf-2-Data.db
> (160 bytes) for commitlog position ReplayPosition(segmentId=1367608221854,
> position=769)
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,544 Gossiper.java (line 858)
> Node /{other_node_A} is now part of the cluster
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,545 Gossiper.java (line 824)
> InetAddress /{other_node_A} is now UP
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,547 ColumnFamilyStore.java (line
> 674) Enqueuing flush of Memtable-LocationInfo@1290155526(30/37
> serialized/live bytes, 1 ops)
>
>  INFO [FlushWriter:1] 2013-05-03 21:10:26,548 Memtable.java (line 264)
> Writing Memtable-LocationInfo@1290155526(30/37 serialized/live bytes, 1
> ops)
>
>  INFO [FlushWriter:1] 2013-05-03 21:10:26,662 Memtable.java (line 305)
> Completed flushing
> /var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-hf-3-Data.db
> (84 bytes) for commitlog position ReplayPosition(segmentId=1367608221854,
> position=862)
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,664 Gossiper.java (line 858)
> Node /{other_node_B} is now part of the cluster
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,665 Gossiper.java (line 824)
> InetAddress /{other_node_B} is now UP
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,666 ColumnFamilyStore.java (line
> 674) Enqueuing flush of Memtable-LocationInfo@899933215(30/37
> serialized/live bytes, 1 ops)
>
>  INFO [FlushWriter:1] 2013-05-03 21:10:26,667 Memtable.java (line 264)
> Writing Memtable-LocationInfo@899933215(30/37 serialized/live bytes, 1
> ops)
>
>  INFO [FlushWriter:1] 2013-05-03 21:10:26,779 Memtable.java (line 305)
> Completed flushing
> /var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-hf-4-Data.db
> (84 bytes) for commitlog position ReplayPosition(segmentId=1367608221854,
> position=955)
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,781 Gossiper.java (line 858)
> Node /{other_node_C} is now part of the cluster
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,782 Gossiper.java (line 824)
> InetAddress /{other_node_C} is now UP
>
>  INFO [GossipStage:1] 2013-05-03 21:10:26,784 ColumnFamilyStore.java (line
> 674) Enqueuing flush of Memtable-LocationInfo@1542631196(30/37
> serialized/live bytes, 1 ops)
>
>  INFO [FlushWriter:1] 2013-05-03 21:10:26,785 Memtable.java (line 264)
> Writing Memtable-LocationInfo@1542631196(30/37 serialized/live bytes, 1
> ops)
>
>  INFO [CompactionExecutor:4] 2013-05-03 21:10:26,789 CompactionTask.java
> (line 107) Compacting
> [SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-hf-3-Data.db'),
> SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-hf-4-Data.db'),
> SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-hf-2-Data.db'),
> SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-hf-1-Data.db')]
>
>  INFO [FlushWriter:1] 2013-05-03 21:10:26,939 Memtable.java (li

Re: cql query

2013-05-04 Thread Sri Ramya
Thanks for your reply.

On Fri, May 3, 2013 at 11:45 PM, Jabbar Azam  wrote:

> Sorry Sri, I've never used hector. However, it's straightforward in
> astyanax. There are examples on the github page.
> On 3 May 2013 18:50, "Sri Ramya"  wrote:
>
>> Can you tell me how to do this in hector. Can you give me some example.
>>
>> On Fri, May 3, 2013 at 10:29 AM, Sri Ramya  wrote:
>>
>>> Thank you very much. I will try and let you know whether it's working or
>>> not.
>>>
>>>
>>>
>>> On Thu, May 2, 2013 at 7:04 PM, Jabbar Azam  wrote:
>>>
 Hello Sri,

 As far as I know you can, if name and age are part of your partition key
 and tstamp is the clustering key, e.g.

 create table columnfamily (
 name varchar,
 age varchar,
 tstamp timestamp,
 primary key((name, age), tstamp)
 );
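With (name, age) as the partition key and tstamp as the clustering column, the multi-condition query from the original question becomes valid. A sketch using the names above (note the clustering column is tstamp rather than timestamp, and the range condition must be on the clustering column):

```sql
select * from columnfamily
where name = 'foo' and age = '21' and tstamp >= '2013-05-01 00:00:00';
```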




 Thanks

 Jabbar Azam


 On 2 May 2013 11:45, Sri Ramya  wrote:

> hi
>
> Can somebody tell me, is it possible to do a multi-condition query on
> cassandra,
> like "Select * from columnfamily where name='foo' and age ='21' and
> timestamp >= 'unixtimestamp' ";
>
> Please give me some guidance for these kinds of queries
>
>   Thank you
>


>>>
>>


Lookup table structuring advice

2013-05-04 Thread Jabbar Azam
Hello,

I want to create a simple table holding user roles e.g.

create table roles (
   name text,
   primary key(name)
);

If I want to get a list of roles for some admin tool I can use the
following CQL3

select * from roles;

When a new name is added it will be stored on a different host, and doing a
select * is going to be inefficient because the table will be spread across
the cluster and each node will have to respond. The number of roles may be
less than or just over a dozen. I'm not sure if I'm storing the roles
correctly.


The other thing I'm thinking about is that when I've read the roles once
then I can cache them.
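That read-once-then-cache idea could look something like the following. This is a hypothetical helper, independent of any particular client library, and the TTL value is an arbitrary choice:

```python
import time

class RolesCache:
    """Caches the role list after the first read; refreshes after a TTL."""

    def __init__(self, fetch_roles, ttl_seconds: float = 300.0):
        self._fetch = fetch_roles      # callable that runs "select * from roles"
        self._ttl = ttl_seconds
        self._roles = None
        self._loaded_at = 0.0

    def get(self):
        # Fetch from the cluster only on first use or after the TTL expires.
        now = time.monotonic()
        if self._roles is None or now - self._loaded_at > self._ttl:
            self._roles = list(self._fetch())
            self._loaded_at = now
        return self._roles
```

With roughly a dozen roles this keeps the full-cluster scan down to one query per TTL window.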

Thanks

Jabbar Azam


Re: Lookup table structuring advice

2013-05-04 Thread Dave Brosius

If you want to store all the roles in one row, you can do

create table roles (synthetic_key int, name text, primary 
key(synthetic_key, name)) with compact storage;


when inserting roles, just use the same key

insert into roles (synthetic_key, name) values (0, 'Programmer');
insert into roles (synthetic_key, name) values (0, 'Tester');

and use

select * from roles where synthetic_key = 0;


(or some arbitrary key value you decide to use)

That way the data is stored on one node (and its replicas).

Of course, if the number of roles grows to be large, you lose most of the 
value in having a cluster.








Re: Lookup table structuring advice

2013-05-04 Thread Jabbar Azam
I never thought about using a synthetic key, but in this instance, with
about a dozen rows, it's probably OK. Thanks for your great idea.

Where did you read about the synthetic key idea? I've not come across it
before.

Thanks

Jabbar Azam




Re: Lookup table structuring advice

2013-05-04 Thread Dave Brosius
I just used 'synthetic key' because it's a term used with standard RDBMSs to 
mean a key that means nothing in the model, and is often a sequence or such.

There's nothing Cassandra-specific about the term; I just thought it would 
be familiar to someone who understands RDBMSs.









hector or astyanax

2013-05-04 Thread 李 晗
Hello,
I want to know which Cassandra client is better,
and what are their advantages and disadvantages?

thanks

Cassandra running High Load with no one using the cluster

2013-05-04 Thread Aiman Parvaiz
Since last night I have been seeing CPU load spikes on our Cassandra
boxes (occasionally load goes up to 20; it's an Amazon EC2 c1.xlarge with 300
IOPS EBS). After digging around a little I believe it's related to heap
memory and flushing memtables.

From the logs:
WARN 03:22:03,414 Heap is 0.7786981388910019 full.  You may need to reduce
memtable and/or cache sizes.  Cassandra will now flush up to the two
largest memtables to free up memory.  Adjust flush_largest_memtables_at
threshold in cassandra.yaml if you don't want Cassandra to do this
automatically

WARN 03:22:03,415 Flushing CFS(Keyspace='XXX', ColumnFamily='') to
relieve memory pressure
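The 0.7786... figure is the fraction of the JVM heap in use; once it crosses the flush_largest_memtables_at threshold (0.75 by default in the 1.1 line, which is an assumption worth checking against your cassandra.yaml), Cassandra flushes its largest memtables. A minimal sketch of that check:

```python
def heap_pressure_flush(used_bytes: int, max_bytes: int,
                        flush_largest_memtables_at: float = 0.75) -> bool:
    """Mirrors the heap-fullness check behind the WARN line above:
    flush the largest memtables once heap usage crosses the threshold."""
    return used_bytes / max_bytes > flush_largest_memtables_at
```

So at 0.7786 of heap used the warning fires even with no client traffic, e.g. if compaction or accumulated memtables keep the heap near the threshold.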

I have three nodes and only 2 of them are hitting this high load; moreover,
the cluster is under extremely light load. No one has been using it since
yesterday and I still see this load.

I also observed that `top -H` showed many threads in the Sleep state and only a
handful in the R state. `nodetool cfstats` showed the following for the
ColumnFamily in the Cassandra log lines above:

  Column Family: 
SSTable count: 8
Space used (live): 1479005837
Space used (total): 1479005837
Number of Keys (estimate): 2923008
Memtable Columns Count: 35375
Memtable Data Size: 7088479
Memtable Switch Count: 2393
Read Count: 2339668632
Read Latency: 3.042 ms.
Write Count: 360448535
Write Latency: 0.079 ms.
Pending Tasks: 0
Bloom Filter False Positives: 143197
Bloom Filter False Ratio: 0.73004
Bloom Filter Space Used: 7142048
Compacted row minimum size: 73
Compacted row maximum size: **785939**
Compacted row mean size: 1957

`Compacted Row maximum size` for other ColumFamily is significantly less
than this number.

When starting this cluster we set
> JVM_OPTS="$JVM_OPTS -Xss1000k"

We are using Cassandra 1.1.0 and OpenJDK 6.

Can anyone please help me understand why, with no load on the system, I am
still seeing such high load on my machines?

Thanks


Re: Hadoop jobs and data locality

2013-05-04 Thread Shamim
Hello,
  We also came across this issue in our dev environment when we upgraded 
Cassandra from 1.1.5 to 1.2.1. I have mentioned this issue a few times 
in this forum but haven't got any answer yet. As a quick workaround you can set 
pig.splitCombination to false in your Pig script to avoid this issue, but it will 
leave one of your tasks with a very large amount of data. I can't figure out why 
this happens in the newer version of Cassandra; my strong guess is that something 
goes wrong in Cassandra's implementation of LoadFunc or in Murmur3Partitioner.
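The workaround above, as it would look in a Pig script (the property name is taken from this post; check the set-command syntax of your Pig version):

```
set pig.splitCombination false;
```

This keeps Pig from combining input splits, at the cost of uneven task sizes as noted above.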
Here are my earlier posts:
http://www.mail-archive.com/user@cassandra.apache.org/msg28016.html
http://www.mail-archive.com/user@cassandra.apache.org/msg29425.html

Any comment from the authors will be highly appreciated.
P.S. Please keep me posted on any solution or hints.

-- 
Best regards
  Shamim A.



03.05.2013, 19:25, "cscetbon@orange.com" :
> Hi,
> I'm using Pig to calculate the sum of a column from a column family (scan of 
> all rows) and I've read that input data locality is supported at 
> http://wiki.apache.org/cassandra/HadoopSupport
> However, when I execute my Pig script, Hadoop assigns only one mapper to the 
> task and not one mapper on each node (replication factor = 1). FYI, I have 8 
> mappers available (2 per node).
> Is there anything that can disable the data locality feature?
>
> Thanks
> --
> Cyril SCETBON
>