Re: Cassandra 1.0 hangs during GC

2012-07-23 Thread Nikolay Kоvshov
On the 21st I migrated to Cassandra 1.1.2 but see no improvement.

cat /var/log/cassandra/Earth1.log | grep "GC for"
INFO [ScheduledTasks:1] 2012-05-22 17:42:48,445 GCInspector.java (line 123) GC 
for ParNew: 345 ms for 1 collections, 82451888 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-05-23 02:47:13,911 GCInspector.java (line 123) GC 
for ParNew: 312 ms for 1 collections, 110617416 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-05-23 11:57:54,317 GCInspector.java (line 123) GC 
for ParNew: 298 ms for 1 collections, 98161920 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-02 08:52:37,019 GCInspector.java (line 123) GC 
for ParNew: 196886 ms for 1 collections, 2310058496 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-16 17:41:25,940 GCInspector.java (line 123) GC 
for ParNew: 200146 ms for 1 collections, 2345987088 used; max is 8464105472
=== Migrated from 1.0.0 to 1.1.2
INFO [ScheduledTasks:1] 2012-07-21 09:05:08,280 GCInspector.java (line 122) GC 
for ParNew: 282 ms for 1 collections, 466406864 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-21 12:38:43,132 GCInspector.java (line 122) GC 
for ParNew: 233 ms for 1 collections, 405269504 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-22 02:29:09,596 GCInspector.java (line 122) GC 
for ParNew: 253 ms for 1 collections, 389700768 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-22 17:45:46,357 GCInspector.java (line 122) GC 
for ParNew: 57391 ms for 1 collections, 400083984 used; max is 8464105472
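
In case it is useful, here is a minimal Python sketch (my own, not part of any Cassandra tooling) that pulls only the long pauses out of GCInspector lines like the ones above; the 1000 ms threshold is an arbitrary choice:

import re
import sys

# Matches GCInspector lines such as:
#   GC for ParNew: 196886 ms for 1 collections, 2310058496 used; max is 8464105472
PAUSE = re.compile(r"GC for (\w+): (\d+) ms for \d+ collections, (\d+) used; max is (\d+)")

THRESHOLD_MS = 1000  # arbitrary cut-off; anything above this is worth a closer look

with open(sys.argv[1]) as log:  # e.g. /var/log/cassandra/Earth1.log
    for line in log:
        m = PAUSE.search(line)
        if m and int(m.group(2)) >= THRESHOLD_MS:
            print(line.rstrip())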

Memory and yaml memory-related settings are default 
I do not do deletes
I have 2 CF's and no secondary indexes

LiveRatio's:
 INFO [pool-1-thread-1] 2012-06-09 02:36:07,759 Memtable.java (line 177) 
CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted 
was 1.0).  calculation took 85ms for 6236 columns
 INFO [MemoryMeter:1] 2012-07-21 09:04:47,614 Memtable.java (line 213) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 
(just-counted was 1.0).  calculation took 8ms for 1 columns
 INFO [MemoryMeter:1] 2012-07-21 09:04:51,012 Memtable.java (line 213) 
CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted 
was 1.0).  calculation took 99ms for 1094 columns
 INFO [MemoryMeter:1] 2012-07-21 09:04:51,331 Memtable.java (line 213) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 
(just-counted was 1.0).  calculation took 80ms for 242 columns
 INFO [MemoryMeter:1] 2012-07-21 09:04:51,856 Memtable.java (line 213) 
CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted 
was 1.0).  calculation took 505ms for 2678 columns
 INFO [MemoryMeter:1] 2012-07-21 09:04:52,881 Memtable.java (line 213) 
CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted 
was 1.0).  calculation took 776ms for 5236 columns
 INFO [MemoryMeter:1] 2012-07-21 09:04:52,945 Memtable.java (line 213) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 
(just-counted was 1.0).  calculation took 64ms for 389 columns
 INFO [MemoryMeter:1] 2012-07-21 09:04:55,162 Memtable.java (line 213) 
CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted 
was 1.0).  calculation took 1378ms for 8948 columns
 INFO [MemoryMeter:1] 2012-07-21 09:04:55,304 Memtable.java (line 213) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 
(just-counted was 1.0).  calculation took 140ms for 1082 columns
 INFO [MemoryMeter:1] 2012-07-21 09:05:08,439 Memtable.java (line 213) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
2.5763038186160894 (just-counted was 2.5763038186160894).  calculation took 
8796ms for 102193 columns

18.07.2012, 07:51, "aaron morton" :
> Assuming all the memory and yaml settings default that does not sound 
> right. The first thought would be the memory meter not counting correctly...
> Do you do a lot of deletes ?
> Do you have a lot of CF's and/or secondary indexes ?
> Can you see log lines about the "liveRatio" for your cf's ?
> I would upgrade to 1.0.10 before getting too carried away though.
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/07/2012, at 8:14 PM, Nikolay Kоvshov wrote:
>
>> This is a cluster of 2 nodes, each having 8G of RAM, with a replication
>> factor of 2.
>> Write/read pressure is quite low and almost never exceeds 10 ops/second.
>>
>> From time to time (2-3 times a month) I see GC activity in the logs, and
>> during this time Cassandra stops responding to requests, which results in a
>> timeout in the upper-layer application. The total time of unavailability can
>> be over 5 minutes (as in the following case).
>>
>> What can I do about that? Will it become much worse as my cluster grows?
>>
>> INFO [GossipTasks:1] 2012-07-16 13:10:50,055 Gossiper.java (line 736) 
>> InetAddress /10.220.50.9 is now dead.
>>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,056 GCInspector.j

Re: Unreachable node, not in nodetool ring

2012-07-23 Thread Alain RODRIGUEZ
Does anyone know how to completely remove a dead node that only appears
when doing a "describe cluster" from the cli?

I still have this issue in my production cluster.

Alain

2012/7/20 Alain RODRIGUEZ :
> Hi Aaron,
>
> I have already repaired and cleaned up both nodes, and I did it after every
> change on my ring (it took me a while, btw :)).
>
> The node *.211 is actually out of the ring and out of my control,
> because I don't have the server anymore (the EC2 instance was terminated a
> few days ago).
>
> Alain
>
> 2012/7/20 aaron morton :
>> I would:
>>
>> * run repair on 10.58.83.109
>> * run cleanup on 10.59.21.241 (I assume this was the first node).
>>
>> It looks like 10.56.62.211 is out of the cluster.
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 19/07/2012, at 9:37 PM, Alain RODRIGUEZ wrote:
>>
>> Not sure if this may help :
>>
>> nodetool -h localhost gossipinfo
>> /10.58.83.109
>>  RELEASE_VERSION:1.1.2
>>  RACK:1b
>>  LOAD:5.9384978406E10
>>  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>>  DC:eu-west
>>  STATUS:NORMAL,85070591730234615865843651857942052864
>>  RPC_ADDRESS:0.0.0.0
>> /10.248.10.94
>>  RELEASE_VERSION:1.1.2
>>  LOAD:3.0128207422E10
>>  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>>  STATUS:LEFT,0,1342866804032
>>  RPC_ADDRESS:0.0.0.0
>> /10.56.62.211
>>  RELEASE_VERSION:1.1.2
>>  LOAD:11594.0
>>  RACK:1b
>>  SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
>>  DC:eu-west
>>  REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
>>  STATUS:removed,170141183460469231731687303715884105727,1342453967415
>>  RPC_ADDRESS:0.0.0.0
>> /10.59.21.241
>>  RELEASE_VERSION:1.1.2
>>  RACK:1b
>>  LOAD:1.08667047094E11
>>  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>>  DC:eu-west
>>  STATUS:NORMAL,0
>>  RPC_ADDRESS:0.0.0.0
>>
>> Story :
>>
>> I had a 2-node cluster:
>>
>> 10.248.10.94 Token 0
>> 10.59.21.241 Token 85070591730234615865843651857942052864
>>
>> I had to replace node 10.248.10.94, so I added 10.56.62.211 at token 0 - 1
>> (170141183460469231731687303715884105727). This failed, and I removed the
>> token.
>>
>> I repeated the previous operation with the node 10.59.21.241 and it went
>> fine. Next I decommissioned the node 10.248.10.94 and moved
>> 10.59.21.241 to token 0.
>>
>> Now I am in the situation described before.
>>
>> Alain
>>
>>
>> 2012/7/19 Alain RODRIGUEZ :
>>
>> Hi, I wasn't able to see the token currently used by 10.56.62.211
>> (the ghost node).
>>
>> I already removed the token 6 days ago:
>>
>> -> "Removing token 170141183460469231731687303715884105727 for
>> /10.56.62.211"
>>
>> "- check in cassandra log. It is possible you see a log line telling
>> you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same
>> token"
>>
>> Nothing like that in the logs.
>>
>> I tried the following without success:
>>
>> $ nodetool -h localhost removetoken 170141183460469231731687303715884105727
>> Exception in thread "main" java.lang.UnsupportedOperationException:
>> Token not found.
>> ...
>>
>> I really thought this was going to work :-).
>>
>> Any other ideas?
>>
>> Alain
>>
>> PS: I heard that Octo is a nice company and you use Cassandra, so I
>> guess you're fine in there :-). I wish you the best, and thanks for your
>> help.
>>
>>
>> 2012/7/19 Olivier Mallassi :
>>
>> I got that a couple of times (due to DNS issues in our infra).
>>
>> What you could try:
>> - check the cassandra log. It is possible you will see a log line telling you
>>   10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same token
>> - if 10.56.62.211 is up, try decommission (via nodetool)
>> - if not, move 10.59.21.241 or 10.58.83.109 to current token + 1
>> - use removetoken (via nodetool) to remove the token associated with
>>   10.56.62.211. In case of failure, you can use removetoken -f instead.
>>
>> Then, the unreachable IP should have disappeared.
>>
>> HTH
>>
>>
>> On Thu, Jul 19, 2012 at 10:38 AM, Alain RODRIGUEZ  wrote:
>>
>>
>> Hi,
>>
>> I tried to add a node a few days ago and it failed. I finally made it
>> work with another node, but now when I run "describe cluster" in the cli I
>> get this:
>>
>> Cluster Information:
>>   Snitch: org.apache.cassandra.locator.Ec2Snitch
>>   Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>   Schema versions:
>>  UNREACHABLE: [10.56.62.211]
>>  e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]
>>
>> And nodetool ring gives me:
>>
>> Address       DC       Rack  Status  State   Load       Owns    Token
>>                                                                 85070591730234615865843651857942052864
>> 10.59.21.241  eu-west  1b    Up      Normal  101.17 GB  50.00%  0
>> 10.58.83.109  eu-west  1b    Up      Normal  55.27 GB   50.00%  85070591730234615865843651857942052864
>>
>>
>> The point, as you can see,

R: Re: Counters values are less than expected [1.0.6 - Java/Pelops]

2012-07-23 Thread cbert...@libero.it


Cannot reproduce ... written at CL QUORUM, RF = 3, on a cluster of 5 nodes ... I
suppose it's an issue with the client, since it's not the first "strange
behaviour" with CounterColumns ...



Original message
From: aa...@thelastpickle.com
Date: 20/07/2012 11.12
To: 
Subject: Re: Counters values are less than expected [1.0.6 - Java/Pelops]



Nothing jumps out; can you reproduce the problem?
If you can repro it, let us know, along with the RF / CL.
Good luck.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



On 20/07/2012, at 1:07 AM, cbert...@libero.it wrote:

Hi all, I have a problem with counters I'd like to solve before going into
production.
When a user writes a comment in my platform, I increase a counter (there is a
counter for each user) and I write a new column in the user-specific row.
Everything worked fine, but yesterday I noticed that the column count of the
row was different from the counter's value ...

In my test environment the user had 7 comments, so 7 columns and 7 as the value
of his counter column.
I wrote 3 comments in a few minutes; the counter value was still 7, but the
number of columns was 10!
Counters and columns are written in the same operation. I've checked my
application log, but all was normal.
I wrote one more comment today to check, and now the counter is 8 and the
column count is 11.

I'm trying to get permission to read the cassandra log (no comment), but in
the meanwhile I'd like to know if anyone has faced a problem like this one ...
I've read that sometimes people have had counters bigger than expected due to
client retries of successful operations that were marked as failed ...

I will post log results ... thanks for any help

Regards,
Carlo









Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread rohit bhatia
You should probably try to break the single-row scheme into a
2 * number_of_nodes rows scheme. This should ensure proper distribution
of the rows and still allow you to query a small, fixed number of rows.
How you do it depends on how you are going to choose your 200-500 columns
during reading (try having them in the same row).

Even if you are forced to put them in separate rows, you can make the row
key "some modulus of a hash of the column name", ensuring symmetry and
easy access of the columns.
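
A minimal sketch of that bucketing (the node count, key format and hash choice here are my own illustrative assumptions, not from this thread):

import hashlib

NUM_NODES = 4                # assumption: cluster size, purely illustrative
NUM_BUCKETS = 2 * NUM_NODES  # the "2 * number_of_nodes" rows suggested above

def bucket_row_key(column_name):
    # Stable hash of the column name, reduced modulo the bucket count, so the
    # same column always lands in the same row out of a small fixed set.
    digest = int(hashlib.md5(column_name.encode("utf-8")).hexdigest(), 16)
    return "bucket_%d" % (digest % NUM_BUCKETS)

# A read of 300-500 columns then touches at most NUM_BUCKETS rows:
wanted = ["user_%d" % i for i in range(500)]
by_row = {}
for name in wanted:
    by_row.setdefault(bucket_row_key(name), []).append(name)
# by_row now maps each bucket row key to the column names to slice from it.
print({key: len(cols) for key, cols in by_row.items()})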

On Mon, Jul 23, 2012 at 6:02 PM, Ertio Lew  wrote:
> Any ideas/suggestions please?


nodetool move causes summary load to grow

2012-07-23 Thread Nikolay Kоvshov
I have a testing cluster cassandra 1.1.2 with default memory and cache 
settings, 1 CF, 1 KS, RF = 2

This is an empty cluster

10.111.1.141  datacenter1  rack1  Up  Normal  43.04 KB   100.00%  0
10.111.1.142  datacenter1  rack1  Up  Normal  26.95 KB   100.00%  85070591730234615865843651857942052864

Then I fill it with 200 MB of random data

10.111.1.141  datacenter1  rack1  Up  Normal  153.34 MB  100.00%  0
10.111.1.142  datacenter1  rack1  Up  Normal  76.94 MB   100.00%  85070591730234615865843651857942052864

Though the second token is something that your python script calculates for
equal token distribution, and the keys were random, this still looks reasonable.

Now I start moving nodes. I want them to change places.

nodetool -h d-st-n2 move 1
nodetool -h d-st-n2 cleanup
Here I expect the second node to have a load of almost 0, but this does not
happen:

10.111.1.141  datacenter1  rack1  Up  Normal  195.53 MB  100.00%  0
10.111.1.142  datacenter1  rack1  Up  Normal  249.82 MB  100.00%  1

nodetool -h d-st-n1 move 85070591730234615865843651857942052864
nodetool -h d-st-n1 cleanup

The nodes did not change places, but the load keeps growing:

10.111.1.142  datacenter1  rack1  Up  Normal  271.2 MB   100.00%  1
10.111.1.141  datacenter1  rack1  Up  Normal  195.53 MB  100.00%  85070591730234615865843651857942052864

nodetool -h d-st-n1 move z
nodetool -h d-st-n2 move a
nodetool -h d-st-n1 move 0
nodetool -h d-st-n2 move 85070591730234615865843651857942052864
nodetool -h d-st-n1 cleanup && nodetool -h d-st-n2 cleanup

Here I come back to the original cluster layout, but now the load is much bigger:

10.111.1.141  datacenter1  rack1  Up  Normal  353.09 MB  100.00%  0
10.111.1.142  datacenter1  rack1  Up  Normal  195.58 MB  100.00%  85070591730234615865843651857942052864

Could you help me understand the effect of nodetool move? I see no logic in
these results.


Re: Cassandra 1.0 hangs during GC

2012-07-23 Thread Joost van de Wijgerd
How much memory do you have on the machine? It seems you have 8G
reserved for the Cassandra java process; if this is all the memory on
the machine, you might be swapping. Also, which JVM do you use?

kind regards

Joost

On Mon, Jul 23, 2012 at 10:07 AM, Nikolay Kоvshov  wrote:
> On the 21st I migrated to Cassandra 1.1.2 but see no improvement.
>
> cat /var/log/cassandra/Earth1.log | grep "GC for"
> INFO [ScheduledTasks:1] 2012-05-22 17:42:48,445 GCInspector.java (line 123) 
> GC for ParNew: 345 ms for 1 collections, 82451888 used; max is 8464105472
> INFO [ScheduledTasks:1] 2012-05-23 02:47:13,911 GCInspector.java (line 123) 
> GC for ParNew: 312 ms for 1 collections, 110617416 used; max is 8464105472
> INFO [ScheduledTasks:1] 2012-05-23 11:57:54,317 GCInspector.java (line 123) 
> GC for ParNew: 298 ms for 1 collections, 98161920 used; max is 8464105472
> INFO [ScheduledTasks:1] 2012-07-02 08:52:37,019 GCInspector.java (line 123) 
> GC for ParNew: 196886 ms for 1 collections, 2310058496 used; max is 8464105472
> INFO [ScheduledTasks:1] 2012-07-16 17:41:25,940 GCInspector.java (line 123) 
> GC for ParNew: 200146 ms for 1 collections, 2345987088 used; max is 8464105472
> === Migrated from 1.0.0 to 1.1.2
> INFO [ScheduledTasks:1] 2012-07-21 09:05:08,280 GCInspector.java (line 122) 
> GC for ParNew: 282 ms for 1 collections, 466406864 used; max is 8464105472
> INFO [ScheduledTasks:1] 2012-07-21 12:38:43,132 GCInspector.java (line 122) 
> GC for ParNew: 233 ms for 1 collections, 405269504 used; max is 8464105472
> INFO [ScheduledTasks:1] 2012-07-22 02:29:09,596 GCInspector.java (line 122) 
> GC for ParNew: 253 ms for 1 collections, 389700768 used; max is 8464105472
> INFO [ScheduledTasks:1] 2012-07-22 17:45:46,357 GCInspector.java (line 122) 
> GC for ParNew: 57391 ms for 1 collections, 400083984 used; max is 8464105472
>
> Memory and yaml memory-related settings are default
> I do not do deletes
> I have 2 CF's and no secondary indexes
>
> LiveRatio's:
>  INFO [pool-1-thread-1] 2012-06-09 02:36:07,759 Memtable.java (line 177) 
> CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted 
> was 1.0).  calculation took 85ms for 6236 columns
>  INFO [MemoryMeter:1] 2012-07-21 09:04:47,614 Memtable.java (line 213) 
> CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 
> (just-counted was 1.0).  calculation took 8ms for 1 columns
>  INFO [MemoryMeter:1] 2012-07-21 09:04:51,012 Memtable.java (line 213) 
> CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted 
> was 1.0).  calculation took 99ms for 1094 columns
>  INFO [MemoryMeter:1] 2012-07-21 09:04:51,331 Memtable.java (line 213) 
> CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 
> (just-counted was 1.0).  calculation took 80ms for 242 columns
>  INFO [MemoryMeter:1] 2012-07-21 09:04:51,856 Memtable.java (line 213) 
> CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted 
> was 1.0).  calculation took 505ms for 2678 columns
>  INFO [MemoryMeter:1] 2012-07-21 09:04:52,881 Memtable.java (line 213) 
> CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted 
> was 1.0).  calculation took 776ms for 5236 columns
>  INFO [MemoryMeter:1] 2012-07-21 09:04:52,945 Memtable.java (line 213) 
> CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 
> (just-counted was 1.0).  calculation took 64ms for 389 columns
>  INFO [MemoryMeter:1] 2012-07-21 09:04:55,162 Memtable.java (line 213) 
> CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted 
> was 1.0).  calculation took 1378ms for 8948 columns
>  INFO [MemoryMeter:1] 2012-07-21 09:04:55,304 Memtable.java (line 213) 
> CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 
> (just-counted was 1.0).  calculation took 140ms for 1082 columns
>  INFO [MemoryMeter:1] 2012-07-21 09:05:08,439 Memtable.java (line 213) 
> CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
> 2.5763038186160894 (just-counted was 2.5763038186160894).  calculation took 
> 8796ms for 102193 columns
>
> 18.07.2012, 07:51, "aaron morton" :
>> Assuming all the memory and yaml settings default that does not sound right. 
>> The first thought would be the memory meter not counting correctly...
>> Do you do a lot of deletes ?
>> Do you have a lot of CF's and/or secondary indexes ?
>> Can you see log lines about the "liveRatio" for your cf's ?
>> I would upgrade to 1.0.10 before getting too carried away though.
>> Cheers
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 17/07/2012, at 8:14 PM, Nikolay Kоvshov wrote:
>>
>>> This is a cluster of 2 nodes, each having 8G of RAM, with a replication
>>> factor of 2.
>>> Write/read pressure is quite low and almost never exceeds 10 ops/second.
>>>
>>> From time to time (2-3 times in a month) I see GC activity in logs and for 
>>> this time cassandra stops responding to r

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Eldad Yamin
In addition, if you don't know how many rows will be needed: in each row,
you can store the key of the next one, just like in a linked list.

OR

Have one row that holds the keys of all your other rows.
First select the main row (with the keys), then select the other rows.
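
A rough sketch of that second ("index row") layout, with plain Python dicts standing in for rows (row key -> {column name: value}); all names here are made up for illustration:

# Hypothetical rows: one "index" row whose column names are the keys of the
# data rows, plus the data rows themselves.
rows = {
    "index":  {"data_1": "", "data_2": ""},
    "data_1": {"user_1": "v1", "user_2": "v2"},
    "data_2": {"user_3": "v3"},
}

# Read pattern: fetch the index row first, then multiget the rows it names.
wanted_rows = list(rows["index"].keys())
data = {key: rows[key] for key in wanted_rows}
print(data)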



On Mon, Jul 23, 2012 at 3:40 PM, rohit bhatia  wrote:

> You should probably try to break the single-row scheme into a
> 2 * number_of_nodes rows scheme. This should ensure proper distribution
> of the rows and still allow you to query a small, fixed number of rows.
> How you do it depends on how you are going to choose your 200-500 columns
> during reading (try having them in the same row).
>
> Even if you are forced to put them in separate rows, you can make the row
> key "some modulus of a hash of the column name", ensuring symmetry and
> easy access of the columns.
>
> On Mon, Jul 23, 2012 at 6:02 PM, Ertio Lew  wrote:
> > Any ideas/suggestions please?
>


Migrating data from a 0.8.8 -> 1.1.2 ring

2012-07-23 Thread Mike Heffner
Hi,

We are migrating from a 0.8.8 ring to a 1.1.2 ring and we are noticing
missing data post-migration. We use pre-built/configured AMIs so our
preferred route is to leave our existing production 0.8.8 untouched and
bring up a parallel 1.1.2 ring and migrate data into it. Data is written to
the rings via batch processes, so we can easily ensure that both the
existing and new rings will have the same data post migration.

The ring we are migrating from is:

  * 12 nodes
  * single data-center, 3 AZs
  * 0.8.8

The ring we are migrating to is the same except 1.1.2.

The steps we are taking are:

1. Bring up a 1.1.2 ring in the same AZ/data center configuration with
tokens matching the corresponding nodes in the 0.8.8 ring.
2. Create the same keyspace on 1.1.2.
3. Create each CF in the keyspace on 1.1.2.
4. Flush each node of the 0.8.8 ring.
5. Rsync each non-compacted sstable from 0.8.8 to the corresponding node in
1.1.2.
6. Move each 0.8.8 sstable into the 1.1.2 directory structure by renaming the
file to the /cassandra/data/<keyspace>/<cf>/<keyspace>-<cf>-... format (a rough
rename sketch follows step 8 below).
For example, for the keyspace "Metrics" and CF "epochs_60" we get:
"cassandra/data/Metrics/epochs_60/Metrics-epochs_60-g-941-Data.db".
7. On each 1.1.2 node run `nodetool -h localhost refresh Metrics <cf>` for
each CF in the keyspace. We notice that storage load jumps accordingly.
8. On each 1.1.2 node run `nodetool -h localhost upgradesstables`. This
takes a while but appears to correctly rewrite each sstable in the new 1.1.x
format. Storage load drops as sstables are compressed.
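
For what it's worth, here is a minimal Python sketch of the rename in step 6. The layout and filename pattern are assumptions based only on the example above (0.8.8 sstables rsync'd in as <keyspace>/<cf>-g-<gen>-<component>.db, target <keyspace>/<cf>/<keyspace>-<cf>-g-<gen>-<component>.db); adjust to the actual paths:

import glob
import os
import shutil

DATA_ROOT = "/var/lib/cassandra/data"  # assumption: default data directory
KEYSPACE = "Metrics"
COLUMN_FAMILIES = ["epochs_60"]        # list every CF being migrated

for cf in COLUMN_FAMILIES:
    cf_dir = os.path.join(DATA_ROOT, KEYSPACE, cf)
    os.makedirs(cf_dir, exist_ok=True)
    for src in glob.glob(os.path.join(DATA_ROOT, KEYSPACE, "%s-g-*" % cf)):
        # e.g. .../Metrics/epochs_60-g-941-Data.db
        #  ->  .../Metrics/epochs_60/Metrics-epochs_60-g-941-Data.db
        dst = os.path.join(cf_dir, "%s-%s" % (KEYSPACE, os.path.basename(src)))
        print("%s -> %s" % (src, dst))
        shutil.move(src, dst)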

After these steps we run a script that validates data on the new ring. What
we've noticed is that large portions of the data that were on the 0.8.8 ring
are not available on the 1.1.2 ring. We've tried reading at both QUORUM and
ONE, but the data appears to be missing in both cases.

We have fewer than 143 million row keys in the CFs we're testing and none
of the *-Filter.db files are > 10MB, so I don't believe this is our
problem: https://issues.apache.org/jira/browse/CASSANDRA-3820

Anything else to test or verify? Are the steps above correct for this type of
upgrade? Is this type of upgrade/migration supported?

We have also tried running a repair across the cluster after step #8. While
it took a few retries due to
https://issues.apache.org/jira/browse/CASSANDRA-4456, we still had missing
data afterwards.

Any assistance would be appreciated.


Thanks!

Mike

-- 

  Mike Heffner 
  Librato, Inc.


Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Mohit Anchlia
On Mon, Jul 23, 2012 at 10:07 AM, Ertio Lew  wrote:

> My major concern is: is it so bad to retrieve 300-500 rows (each for a
> single column) in a single read query that I should instead store all of
> these (around a hundred million) columns in a single row?


You could create multiple rows, each with around 1k-2k columns, depending on
how big your columns are. I definitely suggest running some tests. Is this
time-series data or event-type data?


Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Ertio Lew
Actually, there is one of these columns for each entity in my application, and
I need to be able to query, at any time, the columns for a list of 300-500
entities in one go.


Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Mohit Anchlia
On Mon, Jul 23, 2012 at 10:53 AM, Ertio Lew  wrote:

> Actually, there is one of these columns for each entity in my application,
> and I need to be able to query, at any time, the columns for a list of
> 300-500 entities in one go.


Can you describe your situation with a small example?


Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Ertio Lew
For each user in my application, I want to store a *value* that is queried
by using the userId. So there is going to be one column for each user
(userId as the column name and the *value* as the column value). Now I want to
store these columns such that I can efficiently read the columns for at least
300-500 users in a single read query.


Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Mohit Anchlia
On Mon, Jul 23, 2012 at 11:00 AM, Ertio Lew  wrote:

> For each user in my application, I want to store a *value* that is queried
> by using the userId. So there is going to be one column for each user
> (userId as the column name and the *value* as the column value). Now I want
> to store these columns such that I can efficiently read the columns for at
> least 300-500 users in a single read query.


Is the query time-based or userid-based? How do you determine which users to
read first? Do you read all of them or only a few of them? What are the query
criteria?

It would be helpful to understand exactly how your query works. In NoSQL
there are no B-tree indexes, which means you need to store data that is
materialized based on your query pattern.


Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Ertio Lew
I want to read columns for a randomly selected list of userIds (completely
random). I fetch the data using userIds (which would be used as column names
in the single-row case, or as row keys in the one-row-per-user case) for a
selected list of users. Assume that the application knows the list of
userIds it has to request from the DB.


Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Mohit Anchlia
On Mon, Jul 23, 2012 at 11:16 AM, Ertio Lew  wrote:

> I want to read columns for a randomly selected list of userIds (completely
> random). I fetch the data using userIds (which would be used as column names
> in the single-row case, or as row keys in the one-row-per-user case) for a
> selected list of users. Assume that the application knows the list of
> userIds it has to request from the DB.


Since the lookup is based on user keys, it's best to store the userid as the
row key: one row per userid. You could also take a different approach if your
userid is a number: use (userid - (userid % noofusers)) as the row key. This
will let you store multiple users in one row and helps with lookups as well.
Since you know your data better, you can choose either one. There are other,
more involved ways, but I think either of the above should work for you.
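
A small sketch of that second approach (the bucket size is an arbitrary assumption; "noofusers" above is what I call USERS_PER_ROW here):

USERS_PER_ROW = 1000  # assumption: bucket size; tune to your data

def row_key_for(userid):
    # Groups consecutive userids into buckets of USERS_PER_ROW:
    # 0-999 -> row 0, 1000-1999 -> row 1000, and so on.
    return userid - (userid % USERS_PER_ROW)

def group_by_row(userids):
    rows = {}
    for uid in userids:
        rows.setdefault(row_key_for(uid), []).append(uid)
    return rows

# A lookup of 300-500 users then becomes one multiget over the bucket row keys,
# with a column slice (the wanted userids) inside each row.
print(group_by_row([42, 999, 1000, 123456]))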


CQL3 and column slices

2012-07-23 Thread Josep Blanquer
Hi,

I am confused about how to specify column slices for composite-type CFs
using CQL3.

I first thought that the way to do so was to use the very ugly and
unintuitive syntax of constraining the PK prefix with equalities on everything
except the last part of the composite type. But now, after seeing
https://issues.apache.org/jira/browse/CASSANDRA-4372 and realizing that the
ugly/unintuitive way to specify that has been taken away (i.e., "fixed"),
I don't know how to express it anymore.

In particular, and following the example of 4372: if you have this table
with 6 columns, 5 of them forming the composite key:

CREATE TABLE bug_test (a int, b int, c int, d int, e int, f text, PRIMARY
KEY (a, b, c, d, e) );
with some data in it:

SELECT * FROM bug_test;

Results:

 a | b | c | d | e | f
---+---+---+---+---+---
 1 | 1 | 1 | 1 | 1 | 1
 1 | 1 | 1 | 1 | 2 | 2
 1 | 1 | 1 | 1 | 3 | 3
 1 | 1 | 1 | 1 | 5 | 5
 1 | 1 | 1 | 2 | 1 | 1

How can I do a slice starting after 1:1:1:1:2 and running to the end?

I thought that the (very ugly) way was:

SELECT a, b, c, d, e, f FROM bug_test WHERE a = 1 AND b = 1 AND c = 1 AND d
= 1 AND e > 2;

(despite the fact that it felt completely wrong, since these conditions need
to be considered together, not as 5 independent ones... otherwise the result
will contain rows that don't match, for example rows that contain d=2 in this
case)

Is there some way to express that in CQL3? Something logically equivalent
to:

SELECT * FROM bug_test WHERE a:b:c:d:e > 1:1:1:1:2 ?
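
(To illustrate the intended semantics only: in plain Python, the slice I want is just a lexicographic tuple comparison over the example rows above. This is not meant as a CQL answer.)

# The five rows from the result set above, as (a, b, c, d, e, f) tuples.
rows = [
    (1, 1, 1, 1, 1, "1"),
    (1, 1, 1, 1, 2, "2"),
    (1, 1, 1, 1, 3, "3"),
    (1, 1, 1, 1, 5, "5"),
    (1, 1, 1, 2, 1, "1"),
]

start = (1, 1, 1, 1, 2)
# Python tuples compare lexicographically, which is exactly the composite
# column ordering I want to slice on.
after_start = [r for r in rows if r[:5] > start]
print(after_start)  # the 1:1:1:1:3, 1:1:1:1:5 and 1:1:1:2:1 rows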

Cheers,

Josep M.


Re: Is it possible to design queries to retrieve columns which match acronyms of col names ?

2012-07-23 Thread Philip O'Toole
On Tue, Jul 24, 2012 at 03:13:40AM +0530, Ertio Lew wrote:
> Hi all,
> 
> I wanted to know if it is somehow possible to design queries which could
> fetch all columns from a row whose acronyms (first letter from each word)
> would match a particular string?

Do you know the "particular" strings ahead of time? Or only at runtime?

-- 
Philip O'Toole

Senior Developer
Loggly, Inc.
San Francisco, Calif.
www.loggly.com



Bringing a dead node back up after fixing hardware issues

2012-07-23 Thread Eran Chinthaka Withana
Hi,

In my cluster, one of the nodes went down (due to a hardware failure). We
managed to get it fixed in a couple of days. But it seems it's harder to bring
this same node back into the cluster without creating read misses. Here is what
I did.

Method 1: I copied the data from all the nodes in that data center, into
the repaired node, and brought it back up. But because of the rate of
updates happening, the read misses started going up.

Method 2: I issued a removetoken command for that node's token and let the
cluster stream the data to the relevant nodes. At the end of this process,
the dead node was not showing up in the ring output. Then I brought the
node back up. I was expecting Cassandra to first stream data into the new
node (which happens to be the dead node that was in the cluster earlier)
and, once that was done, to make it serve reads. But in the server log I can
see that as soon as the node comes up, it starts serving reads, creating a
large number of read misses.

So the question is: what is the best way to bring back a dead node (once
its hardware issues are fixed) without causing read misses?

Thanks,
Eran Chinthaka Withana


Re: Bringing a dead node back up after fixing hardware issues

2012-07-23 Thread Brandon Williams
On Mon, Jul 23, 2012 at 6:26 PM, Eran Chinthaka Withana
 wrote:
> Method 1: I copied the data from all the nodes in that data center, into the
> repaired node, and brought it back up. But because of the rate of updates
> happening, the read misses started going up.

That's not really a good method when you scale up and the amount of
data in the cluster won't fit on a single machine.

> Method 2: I issued a removetoken command for that node's token and let the
> cluster stream the data into relevant nodes. At the end of this process, the
> dead node was not showing up in the ring output. Then I brought the node
> back up. I was expecting, Cassandra to first stream data into the new node
> (which happens to be the dead node which was in the cluster earlier) and
> once its done then make it serve reads. But, in the server log, I can see as
> soon the node comes up, it started serving reads, creating a large number of
> read misses.

Removetoken is for dead nodes, so the node has no way of locally
knowing it shouldn't be a cluster member any longer when it starts up.
 Instead if you had decommissioned, it would have saved a flag to
indicate it should bootstrap at the next startup.

> So the question is, what is the best way to bring back a dead node (once its
> hardware issues are fixed) without impacting read misses?

Increase your consistency level.  Run a repair on the node once it's
back up, unless the repair time took longer than gc_grace, in which
case you need to removetoken it, delete all the data, and bootstrap it
back in if you don't want anything deleted to resurrect.

-Brandon


Re: Bringing a dead node back up after fixing hardware issues

2012-07-23 Thread Eran Chinthaka Withana
Thanks Brandon for the answer (and I didn't know driftx = Brandon Williams.
Thanks for your awesome support in Cassandra IRC)

Increasing the CL is tricky for us right now, as our RF in that datacenter is 2
and the CL is set to ONE. If we make the CL LOCAL_QUORUM, then if a node
goes down we will have trouble. I will try to increase the RF to 3 in that
data center and set the CL to LOCAL_QUORUM if nothing else works out.

About decommissioning: if the node goes down, there is no way of running that
command on that node, right? IIUC, decommissioning should be run on the node
that needs to be decommissioned.

Coming back to the original question: without touching the CL, can we bring
back a dead node (after fixing it) and somehow tell Cassandra that the node
is back up and should not serve read requests until it has all the data?

Thanks,
Eran Chinthaka Withana


On Mon, Jul 23, 2012 at 6:48 PM, Brandon Williams  wrote:

> On Mon, Jul 23, 2012 at 6:26 PM, Eran Chinthaka Withana
>  wrote:
> > Method 1: I copied the data from all the nodes in that data center, into
> the
> > repaired node, and brought it back up. But because of the rate of updates
> > happening, the read misses started going up.
>
> That's not really a good method when you scale up and the amount of
> data in the cluster won't fit on a single machine.
>
> > Method 2: I issued a removetoken command for that node's token and let
> the
> > cluster stream the data into relevant nodes. At the end of this process,
> the
> > dead node was not showing up in the ring output. Then I brought the node
> > back up. I was expecting, Cassandra to first stream data into the new
> node
> > (which happens to be the dead node which was in the cluster earlier) and
> > once its done then make it serve reads. But, in the server log, I can
> see as
> > soon the node comes up, it started serving reads, creating a large
> number of
> > read misses.
>
> Removetoken is for dead nodes, so the node has no way of locally
> knowing it shouldn't be a cluster member any longer when it starts up.
>  Instead if you had decommissioned, it would have saved a flag to
> indicate it should bootstrap at the next startup.
>
> > So the question is, what is the best way to bring back a dead node (once
> its
> > hardware issues are fixed) without impacting read misses?
>
> Increase your consistency level.  Run a repair on the node once it's
> back up, unless the repair time took longer than gc_grace, in which
> case you need to removetoken it, delete all the data, and bootstrap it
> back in if you don't want anything deleted to resurrect.
>
> -Brandon
>


Re: Cassandra 1.0 hangs during GC

2012-07-23 Thread Wojciech Meler
Can you provide output from the sar command for the time period when the long
GC pause occurred?

Regards,
Wojciech Meler