Secondary index query + 2 Datacenters + Row Cache + Restart = 0 rows

2013-02-01 Thread Alexei Bakanov
Hello,

I've found a combination that doesn't work:
a column family that has a secondary index and caching='ALL', with
data in two datacenters. When I restart the nodes, my
secondary index queries start returning 0 rows.
It happens when the amount of data goes over a certain threshold, so I
suspect that compactions are involved as well.
Taking out any one of the ingredients fixes the problem and my queries
return rows from the secondary index again.
I suspect that this guy is struggling with the same thing:
https://issues.apache.org/jira/browse/CASSANDRA-4785

Here is a sequence of actions that reproduces it with help of CCM:

$ ccm create --cassandra-version 1.2.1 --nodes 2 -p RandomPartitioner testRowCacheDC
$ ccm updateconf 'endpoint_snitch: PropertyFileSnitch'
$ ccm updateconf 'row_cache_size_in_mb: 200'
$ cp ~/Downloads/cassandra-topology.properties ~/.ccm/testRowCacheDC/node1/conf/  (please find .properties file below)
$ cp ~/Downloads/cassandra-topology.properties ~/.ccm/testRowCacheDC/node2/conf/
$ ccm start
$ ccm cli
 ->create keyspace and column family(please find schema below)
$ python populate_rowcache.py
$ ccm stop  (I tried flush first, doesn't help)
$ ccm start
$ ccm cli
Connected to: "testRowCacheDC" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.1-SNAPSHOT

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use testks;
Authenticated to keyspace: testks
[default@testks] get cf1 where 'indexedColumn'='userId_75';

0 Row Returned.
Elapsed time: 68 msec(s).

My cassandra instances run with -Xms1927M -Xmx1927M -Xmn400M
Thanks for help.

Best regards,
Alexei


-- START cassandra-topology.properties --
127.0.0.1=DC1:RAC1
127.0.0.2=DC2:RAC1
default=DC1:r1
-- FINISH cassandra-topology.properties --
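
For readers unfamiliar with the file: each line maps a node address to a datacenter and rack, and `default` covers any node that is not listed. A minimal Python sketch of that lookup, assuming the simple address=DC:rack format shown above (the helper names `parse_topology` and `locate` are made up for illustration and are not part of Cassandra):

```python
def parse_topology(text):
    """Parse address=DC:rack lines into ({address: (dc, rack)}, default)."""
    mapping, default = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and comments
        key, _, value = line.partition('=')
        dc, _, rack = value.partition(':')
        entry = (dc.strip(), rack.strip())
        if key.strip() == 'default':
            default = entry
        else:
            mapping[key.strip()] = entry
    return mapping, default


def locate(address, mapping, default):
    """Return (dc, rack) for a node, falling back to the default entry."""
    return mapping.get(address, default)


TOPOLOGY = """\
127.0.0.1=DC1:RAC1
127.0.0.2=DC2:RAC1
default=DC1:r1
"""

mapping, default = parse_topology(TOPOLOGY)
print(locate('127.0.0.1', mapping, default))  # ('DC1', 'RAC1')
print(locate('127.0.0.9', mapping, default))  # ('DC1', 'r1')
```

With this file the two ccm nodes land in different datacenters, which is what makes the keyspace's {DC1 : 1, DC2 : 1} placement span both nodes.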

-- START cassandra-cli schema ---
create keyspace testks
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC2 : 1, DC1 : 1}
  and durable_writes = true;

use testks;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'org.apache.cassandra.db.marshal.AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'ALL'
  and column_metadata = [
{column_name : 'indexedColumn',
validation_class : UTF8Type,
index_name : 'INDEX1',
index_type : 0}]
  and compression_options = {'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};
---FINISH cassandra-cli schema ---

-- START populate_rowcache.py ---
from pycassa.batch import Mutator

import pycassa

pool = pycassa.ConnectionPool('testks', timeout=5)
cf = pycassa.ColumnFamily(pool, 'cf1')

# 1000 users x 20 items = 20000 rows; all rows of a user share one indexed value
for userId in xrange(0, 1000):
    print userId
    b = Mutator(pool, queue_size=200)
    for itemId in xrange(20):
        rowKey = 'userId_%s:itemId_%s'%(userId, itemId)
        for message_number in xrange(10):
            b.insert(cf, rowKey, {'indexedColumn': 'userId_%s'%userId,
                                  str(message_number): str(message_number)})
    b.send()  # flush anything still queued for this user

pool.dispose()
-- FINISH populate_rowcache.py ---
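
To make the expected result concrete: the script writes 20 rows per user, all sharing the same 'indexedColumn' value, so a healthy index query for 'userId_75' should return 20 rows rather than 0. The sketch below rebuilds the same key layout in plain Python 3 dictionaries, with no Cassandra or pycassa involved; `build_rows` and `rows_per_index_value` are illustrative names only.

```python
from collections import defaultdict


def build_rows(users=1000, items=20, messages=10):
    """Recreate the row keys and columns that populate_rowcache.py inserts."""
    rows = {}
    for user_id in range(users):
        for item_id in range(items):
            row_key = 'userId_%s:itemId_%s' % (user_id, item_id)
            columns = {'indexedColumn': 'userId_%s' % user_id}
            for n in range(messages):
                columns[str(n)] = str(n)
            rows[row_key] = columns
    return rows


def rows_per_index_value(rows):
    """Group row keys by indexed value, which is what the index should answer."""
    index = defaultdict(list)
    for key, columns in rows.items():
        index[columns['indexedColumn']].append(key)
    return index


rows = build_rows()
index = rows_per_index_value(rows)
print(len(rows))                # 20000
print(len(index['userId_75']))  # 20
```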


Re: Start token sorts after end token

2013-02-01 Thread Jeremy Hanna
See https://issues.apache.org/jira/browse/CASSANDRA-5168 - should be fixed in 
1.1.10 and 1.2.2.

On Jan 30, 2013, at 9:18 AM, Tejas Patil  wrote:

> While reading data from Cassandra in map-reduce, I am getting 
> "InvalidRequestException(why:Start token sorts after end token)"
> 
> Below is the code snippet that I used and the entire stack trace.
> (I am using Cassandra 1.2.0 and hadoop 0.20.2)
> Can you point out the issue here ?
> 
> Code snippet:
> SlicePredicate predicate = new SlicePredicate();
> 
> SliceRange sliceRange = new SliceRange();
> sliceRange.start = ByteBuffer.wrap(("1".getBytes()));
> sliceRange.finish = ByteBuffer.wrap(("100".getBytes()));
> sliceRange.reversed = false;
> //predicate.slice_range = sliceRange;
> 
> List<ByteBuffer> colNames = new ArrayList<ByteBuffer>();
> colNames.add(ByteBuffer.wrap("url".getBytes()));
> colNames.add(ByteBuffer.wrap("Parent".getBytes()));
> predicate.column_names = colNames;
> 
> ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
> 
> Full stack trace:
> java.lang.RuntimeException: InvalidRequestException(why:Start token sorts after end token)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
>   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:184)
>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
>   at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:
> 



Re: too many warnings of Heap is full

2013-02-01 Thread Guillermo Barbero
> What is the cardinality like on these indexes? Can you provide the
> schema creation for these two column families?

This is the schema of the CFs:

create column family CF_users
with comparator = UTF8Type
and column_metadata =
[
    {column_name: userSBCode, validation_class: UTF8Type, index_type: KEYS},
    {column_name: userEmail, validation_class: UTF8Type, index_type: KEYS},
    {column_name: userName, validation_class: UTF8Type},
    {column_name: userLastName, validation_class: UTF8Type},
    {column_name: userOwnPhoneKey, validation_class: UTF8Type, index_type: KEYS},
    {column_name: userOwnPhone, validation_class: UTF8Type, index_type: KEYS},
    {column_name: userPasswordMD5, validation_class: UTF8Type},
    {column_name: userDOB, validation_class: UTF8Type},
    {column_name: userGender, validation_class: UTF8Type},
    {column_name: userProfilePicMD5, validation_class: UTF8Type},
    {column_name: userAbout, validation_class: UTF8Type},
    {column_name: userLastSession, validation_class: UTF8Type},
    {column_name: userMasterKey, validation_class: UTF8Type}
];

create column family CF_SBMessages
with comparator = UTF8Type
and column_metadata =
[
    {column_name: SBMessageId, validation_class: UTF8Type, index_type: KEYS},
    {column_name: fromSBCode, validation_class: UTF8Type, index_type: KEYS},
    {column_name: SBMessageDate, validation_class: UTF8Type, index_type: KEYS},
    {column_name: SBMessageType, validation_class: UTF8Type},
    {column_name: SBMessageText, validation_class: UTF8Type},
    {column_name: SBMessageAttachments, validation_class: UTF8Type}
];


I've read about the importance of keeping the cardinality of the
secondary indexes low (great article at
http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/),
and I'm afraid that we did exactly the opposite (we treated
the secondary indexes as alternate indexes).
I guess there is some work to do here to create other CFs to hold these
secondary indexes.
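
One way to picture the "other CFs" approach is a lookup column family keyed by the high-cardinality value (for example userEmail) whose entry points back at the user row. The sketch below models that with plain Python dictionaries. It is only an illustration under that assumption: the names `UserStore`, `by_email` and `put_user` are hypothetical, and a real implementation would have to cope with the two writes not being atomic.

```python
class UserStore(object):
    """In-memory stand-in for CF_users plus a hand-maintained lookup CF."""

    def __init__(self):
        self.users = {}     # row key -> columns (plays the role of CF_users)
        self.by_email = {}  # userEmail -> row key (hypothetical lookup CF)

    def put_user(self, key, columns):
        """Write the user row and keep the email lookup entry in step."""
        old = self.users.get(key, {})
        old_email = old.get('userEmail')
        new_email = columns.get('userEmail', old_email)
        if old_email is not None and old_email != new_email:
            self.by_email.pop(old_email, None)  # drop the stale entry
        if new_email is not None:
            self.by_email[new_email] = key
        merged = dict(old)
        merged.update(columns)
        self.users[key] = merged

    def get_by_email(self, email):
        """One lookup by key, then one read of the user row."""
        key = self.by_email.get(email)
        return None if key is None else self.users[key]


store = UserStore()
store.put_user('user1', {'userEmail': 'a@example.com', 'userName': 'Ann'})
store.put_user('user1', {'userEmail': 'ann@example.com'})
print(store.get_by_email('ann@example.com')['userName'])  # Ann
print(store.get_by_email('a@example.com'))                # None
```

The point of the design is that a lookup by a unique value becomes a single read by key instead of an index query.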

Anyway, I still don't understand why these peaks appeared (by the
way, there weren't any last night).


neither 'nodetool repair' nor 'hinted handoff/read repair' work for secondary indexes

2013-02-01 Thread Alexei Bakanov
Hi again,

Once you start playing with CCM it's hard to stop; it's such a great tool.
My issue with secondary indexes is the following: neither an explicit
'nodetool repair' nor the implicit 'hinted handoffs/read repairs' resolve
inconsistencies in the data I get from secondary indexes.
I observe this for both one- and two-datacenter deployments, independent
of caching settings. Rebuilding the index, dropping and re-creating it, or
restarting nodes doesn't help.

In the following scenario I start up 2 nodes and insert some rows with
CL.ONE. During this process I deliberately stop and start the nodes in
order to trigger inconsistencies.
I then query all data by its index with read CL.ONE and stop if I see
that data is missing. I see that none of the C* repair mechanisms work for
secondary indexes.

$ ccm create --cassandra-version 1.2.1 --nodes 2 -p RandomPartitioner test2ndIndexRepair
$ ccm start
$ ccm node1 cli
-> create keyspace and column family  (please find schemas attached)
$ python populate_repair.py (in first terminal)
$ ccm node1 stop; sleep 10; ccm node1 start   (in second terminal, while populate_repair.py runs)
$ ccm node2 stop; sleep 10; ccm node2 start   (in second terminal, while populate_repair.py runs. Hinted Handoffs do the work but unfortunately not on Secondary Indexes)

$ python fetcher_repair.py

254
255
256
Traceback (most recent call last):
  File "fetcher_repair.py", line 19, in <module>
    raise Exception('missing rows for userId %s, data length is %d'%(userId, len(data)))
Exception: missing rows for userId 256, data length is 0

$ ccm cli
[default@unknown] use testks;
Authenticated to keyspace: testks
[default@testks] get cf1 where 'indexedColumn'='userId_256';

0 Row Returned.
Elapsed time: 47 msec(s).

$ python fetcher_repair.py  (running one more time in the hope that 'read repair' kicked in after the last query, but unfortunately no)

254
255
256
Traceback (most recent call last):
  File "fetcher_repair.py", line 19, in <module>
    raise Exception('missing rows for userId %s, data length is %d'%(userId, len(data)))
Exception: missing rows for userId 256, data length is 0

$ ccm node1 repair
$ ccm node2 repair
$ ccm cli

[default@unknown] use testks;
Authenticated to keyspace: testks
[default@testks] get cf1 where 'indexedColumn'='userId_256';

0 Row Returned.


Both cassandra instances run with -Xms1927M -Xmx1927M -Xmn400M

Thanks for help.

Best regards,
Alexei

--START cassandra-cli schemas 
create keyspace testks
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {datacenter1 : 2}
  and durable_writes = true;

use testks;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and dclocal_read_repair_chance = 1.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and column_metadata = [
{column_name : 'indexedColumn',
validation_class : UTF8Type,
index_name : 'INDEX1',
index_type : 0}]
  and compression_options = {'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};
--FINISH cassandra-cli schemas 

--START populate_repair.py --
import datetime
from pycassa.batch import Mutator

import pycassa

pool = pycassa.ConnectionPool('testks', timeout=5,
                              server_list=['127.0.0.1:9160', '127.0.0.2:9160'])
cf = pycassa.ColumnFamily(pool, 'cf1')

# 2000 users x 20 items; all rows of a user share one indexed value
for userId in xrange(0, 2000):
    print userId
    b = Mutator(pool, queue_size=200)
    for itemId in xrange(20):
        rowKey = 'userId_%s:itemId_%s'%(userId, itemId)
        for message_number in xrange(10):
            b.insert(cf, rowKey, {'indexedColumn': 'userId_%s'%userId,
                                  str(message_number): str(message_number)})
    b.send()  # flush anything still queued for this user

pool.dispose()
--FINISH populate_repair.py --

--START fetcher_repair.py --
import pycassa
from pycassa.index import create_index_clause, create_index_expression

pool = pycassa.ConnectionPool('testks', server_list=['127.0.0.1:9160',
                                                     '127.0.0.2:9160'])
cf = pycassa.ColumnFamily(pool, 'cf1')

# every userId was written with 20 rows, so anything else means missing data
for userId in xrange(2000):
    print userId
    index_expr = create_index_expression('indexedColumn', 'userId_%s'%userId)
    index_clause = create_index_clause([index_expr], count=1000)
    data = list(cf.get_indexed_slices(index_clause=index_clause))
    if len(data) != 20:
        raise Exception('missing rows for userId %s, data length is %d'%(userId, len(data)))
pool.dispose()

--FINISH fetcher_repair.py --


Re: Inserting via thrift interface to column family created with Compound Key via cql3

2013-02-01 Thread aaron morton
What's the full error stack on the client?

Are you using a pre-built Thrift client or your own? If the latter, try using a
pre-built client first, like Hector or pycassa. If it works there, look into how
that code works and go from there.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 5:24 AM, Oleksandr Petrov  wrote:

> BTW, thanks for chiming in!
> 
> No-no, I'm using Thrift client, not inserting via cql.
> I'm serializing via CompositeType, actually. 
> CompositeType.getInstance(UTF8Type, UTF8Type).decompose(["firstkeypart", 
> "secondkeypart"]);
> 
> Hm... From what you say I understand that it's technically possible :/ 
> So I must be wrong somewhere,
> 



Re: cluster issues

2013-02-01 Thread aaron morton
For DataStax Enterprise specific questions, try the support forums
http://www.datastax.com/support-forums/

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 8:27 AM, S C  wrote:

> I am using DseDelegateSnitch
> 
> Thanks,
> SC
> From: aa...@thelastpickle.com
> Subject: Re: cluster issues
> Date: Tue, 29 Jan 2013 20:15:45 +1300
> To: user@cassandra.apache.org
> 
>   • We can always be proactive in keeping the time sync. But, Is there 
> any way to recover from a time drift (in a reactive manner)? Since it was a 
> lab environment, I dropped the KS (deleted data directory)
> There is a way to remove future-dated columns, but it is not for the
> faint-hearted.
> 
> Basically:
> 1) Drop the gc_grace_seconds to 0
> 2) Delete the column with a timestamp way in the future, so it is guaranteed 
> to be higher than the value you want to delete. 
> 3) Flush the CF
> 4) Compact all the SSTables that contain the row. The easiest way to do that 
> is a major compaction, but we normally advise not to do that because it 
> creates one big file. You can also do a user defined compaction. 
> 
>   • Are there any other scenarios that would lead a cluster to look like 
> below? Note: Actual topology of the cluster - ONE Cassandra node and TWO 
> Analytic nodes.
> What snitch are you using?
> If you have the property file snitch, do all nodes have the same configuration?
> 
> There is a lot of sickness there. If possible I would scrub and start again. 
> 
> Cheers
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29/01/2013, at 6:29 AM, S C  wrote:
> 
> One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I 
> figured this out pretty quickly, I had a few questions that I am looking for 
> answers to.
> 
>   • We can always be proactive in keeping the time in sync. But is there 
> any way to recover from a time drift (in a reactive manner)? Since it was a 
> lab environment, I dropped the KS (deleted data directory).
>   • Are there any other scenarios that would lead a cluster to look like 
> below? Note: Actual topology of the cluster - ONE Cassandra node and TWO 
> Analytic nodes.
> 
> 
> On 192.168.2.100
> Address        DC         Rack   Status  State   Load       Owns    Token
>                                                                     113427455640312821154458202477256070485
> 192.168.2.100  Cassandra  rack1  Up      Normal  601.34 MB  33.33%  0
> 192.168.2.101  Analytics  rack1  Down    Normal  149.75 MB  33.33%  56713727820156410577229101238628035242
> 192.168.2.102  Analytics  rack1  Down    Normal  ?          33.33%  113427455640312821154458202477256070485
> 
> On 192.168.2.101
> Address        DC         Rack   Status  State   Load       Owns    Token
>                                                                     113427455640312821154458202477256070485
> 192.168.2.100  Analytics  rack1  Down    Normal  ?          33.33%  0
> 192.168.2.101  Analytics  rack1  Up      Normal  158.59 MB  33.33%  56713727820156410577229101238628035242
> 192.168.2.102  Analytics  rack1  Down    Normal  ?          33.33%  113427455640312821154458202477256070485
> 
> On 192.168.2.102
> Address        DC         Rack   Status  State   Load       Owns    Token
>                                                                     113427455640312821154458202477256070485
> 192.168.2.100  Analytics  rack1  Down    Normal  ?          33.33%  0
> 192.168.2.101  Analytics  rack1  Down    Normal  ?          33.33%  56713727820156410577229101238628035242
> 192.168.2.102  Analytics  rack1  Up      Normal  117.02 MB  33.33%  113427455640312821154458202477256070485
> 
> 
> Appreciate your valuable inputs.
> 
> Thanks,
> SC
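
Why the deletion in step 2 above needs a timestamp "way in the future" follows from how conflicting writes are resolved: the cell with the highest timestamp wins, whichever arrived last. A minimal sketch of that rule, assuming microsecond timestamps and ignoring tie-breaking (the `Cell` tuple and the `reconcile` and `visible_value` functions are illustrative, not Cassandra classes):

```python
from collections import namedtuple

Cell = namedtuple('Cell', ['value', 'timestamp', 'deleted'])


def reconcile(a, b):
    """Return the cell that survives: the one with the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


def visible_value(cells):
    """Fold all versions of a column into what a reader would see."""
    winner = cells[0]
    for cell in cells[1:]:
        winner = reconcile(winner, cell)
    return None if winner.deleted else winner.value


MICROS = 10 ** 6
now = 1359400000 * MICROS  # arbitrary "correct" clock reading

# Written by the node whose clock ran about 25 seconds ahead.
drifted_write = Cell('stale value', now + 25 * MICROS, False)
# Later operations from nodes with correct clocks.
later_write = Cell('fresh value', now + 5 * MICROS, False)
plain_delete = Cell(None, now + 6 * MICROS, True)
# A delete stamped beyond the drifted write, as in step 2.
forced_delete = Cell(None, drifted_write.timestamp + 1, True)

print(visible_value([drifted_write, later_write]))                # stale value
print(visible_value([drifted_write, later_write, plain_delete]))  # stale value
print(visible_value([drifted_write, forced_delete]))              # None
```

Until the real clock passes the drifted timestamp, ordinary writes and deletes lose to the future-dated cell, which is why the tombstone has to be stamped beyond it before being flushed and compacted away.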



Re: CASSANDRA-5152

2013-02-01 Thread aaron morton
Can you update the ticket with your experiences ? 
https://issues.apache.org/jira/browse/CASSANDRA-5152

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 11:13 AM, yen-fen_...@mcafee.com wrote:

> I had the same problem with 1.2.0.  The problem went away after readline was 
> easy-installed.
>  
> Regards,
> Yen-Fen Hsu



Re: why set replica placement strategy at keyspace level ?

2013-02-01 Thread aaron morton
Many of my mental models bother people :)

This particular one came from my understanding of Bigtable and the code.

For me this works: I think of (internal) rows as roughly "containing" the CFs.

In the CQL world it works for me as well: the partition key (first part of the
primary key) is important and identifies the storage "container" that has the
columns.

Your mileage may vary.
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 4:43 PM, Edward Capriolo  wrote:

> That should not bother you.
> 
> For example, if you're doing an HBase scan that crosses two column families,
> that could end up being two (disk) seeks.
> 
> Having an API that hides the seeks from you does not give you better
> performance; it only helps you when you're debating with people who do not
> understand the fundamentals.



Re: Cassandra pending compaction tasks keeps increasing

2013-02-01 Thread aaron morton
> Will that cause  the symptom of no data streamed from other nodes? Other 
> nodes still think the node had all the data?
AFAIK they will not make assumptions like that.

>  Can I just change it in yaml and restart C* and it will correct itself?
It's a schema config change, check the help for the CLI or the CQL docs. 

> Any side effect? Since we are using SSD, a bit bigger SSD won't slow down the 
> read too much, I suppose that is the main concern for bigger size of SSTable?
Do some experiments to see how it works, and let others know :) 
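
A back-of-the-envelope check of the numbers quoted below, as a Python sketch. The figures (150G, about 30K SSTables, RF = 3, a 200G node) come from the thread; the 10 MB target size is only an example value and the helper names are made up.

```python
import math


def average_sstable_mb(total_gb, sstable_count):
    """Average SSTable size implied by a data volume and a file count."""
    return total_gb * 1024.0 / sstable_count


def sstables_for(total_gb, sstable_size_mb):
    """How many SSTables the same data needs at a given target size."""
    return int(math.ceil(total_gb * 1024.0 / sstable_size_mb))


def cluster_disk_gb(data_gb, replication_factor, compaction_headroom):
    """Raw disk needed across the cluster for one copy's worth of data."""
    return data_gb * replication_factor * compaction_headroom


print(average_sstable_mb(150, 30000))  # about 5.12 MB per SSTable
print(sstables_for(150, 10))           # 15360
print(cluster_disk_gb(200, 3, 2))      # 1200
```

The roughly 5 MB average is what one would expect if sstable_size_in_mb is set to 5, though the thread does not state the configured value. Doubling the target size halves the number of files to level, and the last line reproduces the "6 times the real data size" estimate for size-tiered compaction with RF = 3.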

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 5:30 PM, Wei Zhu  wrote:

> Some updates:
> Since we still have not fully turned on the system, we did something crazy 
> today. We tried to treat the node as a dead one (my boss wants us to practice 
> replacing a dead node before going to full production) and bootstrap it. 
> Here is what we did:
> 
>   • drain the node
>   • check nodetool on other nodes, and this node is marked down (the 
> token for this node is 100)
>   • clear the data, commit log, saved cache
>   • change initial_token from 100 to 99 in the yaml file
>   • start the node
>   • check nodetool, the down node of 100 disappeared by itself (!!) and 
> new node with token 99 showed up
>   • checked log, see the message saying bootstrap completed. But only a 
> couple of MB streamed. 
>   • nodetool movetoken 98
>   • nodetool, see the node with token 98 comes up. 
>   • check log, see the message saying bootstrap completed. But still only 
> a couple of MB streamed.
> The only reason I can think of is that the new node has the same IP as the 
> "dead" node we tried to replace? Will that cause  the symptom of no data 
> streamed from other nodes? Other nodes still think the node had all the data?
> 
> We had to do nodetool repair -pr to bring in the data. After 3 hours, 150G 
> had been transferred. And no surprise, pending compaction tasks are now at 30K. 
> About 30K SSTables were transferred and I guess all of them need to be 
> compacted since we use LCS.
> 
> My concern is that, even if we did nothing wrong, replacing a dead node will 
> cause such a huge backlog of pending compactions. It might take a week to clear 
> that off. And we have RF = 3, so we still need to bring in the data for the 
> other two replicas since we use "pr" for nodetool repair. Will it take 
> about 3 weeks to fully replace a 200G node using LCS? We tried everything we 
> can to speed up the compaction, with no luck. The only thing I can think of is 
> to increase the default size of SSTable, so fewer compactions will be 
> needed. Can I just change it in yaml and restart C* and it will correct 
> itself? Any side effect? Since we are using SSD, a bit bigger SSD won't slow 
> down the read too much, I suppose that is the main concern for bigger size of 
> SSTable?
>  
> I think 1.2 comes with parallel LC which should help the situation. But we 
> are not going to upgrade for a little while.
> 
> Did I miss anything? Might it not be practical to use LCS for a 200G node? But 
> if we use size-tiered compaction, we need to have at least 400G for the 
> HD... Although SSD is cheap now, it is still hard to convince the management. 
> Three replicas + double the disk for compaction? That is 6 times the real data 
> size!
> 
> Sorry for the long email. Any suggestion or advice?
> 
> Thanks.
> -Wei 
> 
> From: "aaron morton" 
> To: "Cassandra User" 
> Sent: Tuesday, January 29, 2013 12:59:42 PM
> Subject: Re: Cassandra pending compaction tasks keeps increasing
> 
> * Will try it tomorrow. Do I need to restart server to change the log level?
> You can set it via JMX, and supposedly log4j is configured to watch the 
> config file. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29/01/2013, at 9:36 PM, Wei Zhu  wrote:
> 
> Thanks for the reply. Here is some information:
> 
> Do you have wide rows ? Are you seeing logging about "Compacting wide rows" ? 
> 
> * I don't see any log about "wide rows"
> 
> Are you seeing GC activity logged or seeing CPU steal on a VM ? 
> 
> * There is some GC, but CPU in general is under 20%. We have a heap size of 8G, 
> RAM is at 72G.
> 
> Have you tried disabling multithreaded_compaction ? 
> 
> * By default, it's disabled. We enabled it, but don't see much difference. 
> It is even a little slower with it enabled. Is it bad to enable it? We have SSD; 
> according to the comment in yaml, it should help when using SSD.
> 
> Are you using Key Caches ? Have you tried disabling 
> compaction_preheat_key_cache? 
> 
> * We have fairly big Key caches, we set as 10% of Heap which is 800M. Yes, 
> compaction_preheat_key_cache is disabled. 
> 
> Can you enable DEBUG level logging and make the logs available ? 
> 
> * Will try it tomorrow. Do I need

Re: CPU hotspot at BloomFilterSerializer#deserialize

2013-02-01 Thread aaron morton
> 5. the problematic Data file contains only 5 to 10 keys data but large(2.4G)
So very large rows ? 
What does nodetool cfstats or cfhistograms say about the row sizes ? 


> 1. what is happening?

I think this is partially large rows and partially the query pattern. This is
only roughly correct, but see
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ and my talk here 
http://www.datastax.com/events/cassandrasummit2012/presentations
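
To get a feel for why very large rows could make filter deserialisation expensive, here is a rough sizing sketch. It assumes the textbook Bloom filter formula m = -n ln(p) / (ln 2)^2 and a filter stored as 64-bit words. Cassandra's own sizing differs in detail, and the column count and false-positive rate used below are made-up inputs, not figures from the thread.

```python
import math


def bloom_bits(entries, false_positive_rate):
    """Textbook bit count for a Bloom filter holding `entries` items."""
    bits = -entries * math.log(false_positive_rate) / (math.log(2) ** 2)
    return int(math.ceil(bits))


def words_to_read(bits):
    """Number of 64-bit words a reader has to pull in to load the filter."""
    return (bits + 63) // 64


# Hypothetical wide row: 50 million columns, 1% false-positive target.
bits = bloom_bits(50 * 1000 * 1000, 0.01)
print(bits / 8.0 / 1024 / 1024)  # filter size in MB
print(words_to_read(bits))       # 64-bit reads per deserialisation
```

If a filter of that size has to be loaded one word at a time for each read that touches the row, time spent in DataInputStream.readLong would be consistent with the profile quoted below.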

> 3. any more info required to proceed?

Do some tests with different query techniques…

Get a single named column. 
Get the first 10 columns using the natural column order.
Get the last 10 columns using the reversed order. 

Hope that helps. 
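
The three probes could be scripted along these lines. The keyword arguments (`columns`, `column_count`, `column_reversed`) follow pycassa's `ColumnFamily.get`; the `time_probes` helper and the `FakeColumnFamily` stand-in, which lets the sketch run without a cluster, are illustrative only.

```python
import time


def time_probes(cf, key, named_column, count=10):
    """Run the three query styles against one row and time each of them."""
    probes = [
        ('named column', dict(columns=[named_column])),
        ('first N, natural order', dict(column_count=count)),
        ('last N, reversed order', dict(column_reversed=True,
                                        column_count=count)),
    ]
    timings = {}
    for label, kwargs in probes:
        started = time.time()
        cf.get(key, **kwargs)
        timings[label] = time.time() - started
    return timings


class FakeColumnFamily(object):
    """Stand-in so the sketch runs without Cassandra; records each call."""

    def __init__(self):
        self.calls = []

    def get(self, key, **kwargs):
        self.calls.append((key, kwargs))
        return {}


fake = FakeColumnFamily()
for label, seconds in sorted(time_probes(fake, 'some_row', 'some_col').items()):
    print('%s: %.6f s' % (label, seconds))
```

Against a real cluster, `fake` would be replaced by a `pycassa.ColumnFamily` and the key by one of the rows that live in the problematic sstables.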

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 7:20 PM, Takenori Sato  wrote:

> Hi all,
> 
> We have a situation where the CPU load on some of the nodes in our cluster has 
> spiked occasionally since last November, triggered by requests 
> for rows that reside on two specific sstables.
> 
> We confirmed the following (when spiked):
> 
> version: 1.0.7(current) <- 0.8.6 <- 0.8.5 <- 0.7.8
> jdk: Oracle 1.6.0
> 
> 1. a profiling showed that BloomFilterSerializer#deserialize was the 
> hotspot(70% of the total load by running threads)
> 
> * the stack trace looked like this(simplified)
> 90.4% - org.apache.cassandra.db.ReadVerbHandler.doVerb
> 90.4% - org.apache.cassandra.db.SliceByNamesReadCommand.getRow
> ...
> 90.4% - org.apache.cassandra.db.CollationController.collectTimeOrderedData
> ...
> 89.5% - org.apache.cassandra.db.columniterator.SSTableNamesIterator.read
> ...
> 79.9% - org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter
> 68.9% - org.apache.cassandra.io.sstable.BloomFilterSerializer.deserialize
> 66.7% - java.io.DataInputStream.readLong
> 
> 2. Usually, 1 should be so fast that profiling by sampling cannot detect it
> 
> 3. no pressure on Cassandra's VM heap nor on the machine overall
> 
> 4. a little I/O traffic for our 8 disks/node(up to 100tps/disk by "iostat 1 
> 1000")
> 
> 5. the problematic Data file contains only 5 to 10 keys data but large(2.4G)
> 
> 6. the problematic Filter file size is only 256B(could be normal)
> 
> 
> So now I am trying to read the Filter file in the same way 
> BloomFilterSerializer#deserialize does, as closely as I can, in order to see 
> whether something is wrong with the file.
> 
> Could you give me some advice on:
> 
> 1. what is happening?
> 2. the best way to simulate the BloomFilterSerializer#deserialize
> 3. any more info required to proceed?
> 
> Thanks,
> Takenori



Re: initial_token

2013-02-01 Thread Víctor Hugo Oliveira Molinar
Do not set initial_token when using Murmur3Partitioner.
Instead, set num_tokens.

For example, if you have 3 hosts with the same hardware setup, set the same
num_tokens on each of them.
But now consider adding another, better host: this time I'd suggest you
set the previous num_tokens * 2.

num_tokens: 128 (worse machines)
num_tokens: 256 (a machine twice as good)

This is the setup for virtual nodes.
Check the current DataStax docs for it.
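
A rough way to see the effect of the two settings: with randomly placed tokens, a host's share of the ring is on average proportional to its num_tokens. The sketch below computes the expected shares and runs a small simulation; the host names, the seed and the 2**64 ring size are arbitrary choices for illustration.

```python
import random


def expected_share(num_tokens_by_host):
    """Average ring ownership if tokens are placed uniformly at random."""
    total = float(sum(num_tokens_by_host.values()))
    return dict((host, n / total) for host, n in num_tokens_by_host.items())


def simulate_ownership(num_tokens_by_host, seed=0, ring_size=2 ** 64):
    """Place random tokens and measure the range each host ends up owning."""
    rng = random.Random(seed)
    tokens = []
    for host, n in sorted(num_tokens_by_host.items()):
        tokens.extend((rng.randrange(ring_size), host) for _ in range(n))
    tokens.sort()
    owned = dict.fromkeys(num_tokens_by_host, 0)
    previous = tokens[-1][0] - ring_size  # the first range wraps around
    for token, host in tokens:
        owned[host] += token - previous   # a token owns (previous, token]
        previous = token
    return dict((host, owned[host] / float(ring_size)) for host in owned)


hosts = {'old1': 128, 'old2': 128, 'old3': 128, 'new': 256}
print(expected_share(hosts)['new'])  # 0.4
print(simulate_ownership(hosts))
```

With three hosts at 128 and one at 256, the larger host is expected to own about 40% of the data rather than a quarter.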


On Thu, Jan 31, 2013 at 8:43 PM, Edward Capriolo wrote:

> This is the bad side of changing a default. There are going to be a few
> unfortunate groups.
>
> The first group, who simply cannot set up their cluster, and eventually
> figure out their tokens. (this thread)
> The second group, who assume their tokens were correct and run around
> with an unbalanced cluster thinking the performance sucks. (the
> threads for the next few months)
> The third group, who will google "how to balance my ring" and find a
> page with random partitioner instructions. (the occasional thread for
> the next N years)
> The fourth group, because as of now map reduce is highly confused by this.
>
> On Thu, Jan 31, 2013 at 4:52 PM, Rob Coli  wrote:
> > On Thu, Jan 31, 2013 at 12:17 PM, Edward Capriolo 
> wrote:
> >> Now by default a new partitioner is chosen: Murmur3.
> >
> > "Now" = as of 1.2, to be unambiguous.
> >
> > =Rob
> >
> > --
> > =Robert Coli
> > AIM>ALK - rc...@palominodb.com
> > YAHOO - rcoli.palominob
> > SKYPE - rcoli_palominodb
>


Re: Understanding Virtual Nodes on Cassandra 1.2

2013-02-01 Thread aaron morton
> Are there tickets/documents explain how data be replicated on Virtual Nodes?
This  http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
Check the changes.txt file, they link to tickets. 

Not many people use BOP, so you may be exploring new'ish territory. Try asking
someone on the IRC channel.

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 11:47 PM, Manu Zhang  wrote:

> On Thu 31 Jan 2013 03:43:32 AM CST, Zhong Li wrote:
>> Are there tickets/documents explain how data be replicated on Virtual
>> Nodes? If there are multiple tokens on one physical host, may a chance
>> two or more tokens chosen by replication strategy located on same
>> host? If move/remove/add a token
>> manually, does Cassandra Engine validate the case?
>> 
>> Thanks.
>> 
>> 
>> On Jan 30, 2013, at 12:46 PM, Zhong Li wrote:
>> 
>>>> You add a physical node and that in turn adds num_token tokens to
>>>> the ring.
>>> 
>>> No, I am talking about Virtual Nodes with order preserving
>>> partitioner. For an existing host with multiple tokens setting list
>>> on cassandra.inital_token. After initial bootstrapping, the host will
>>> not aware changes of cassandra.inital_token. If I want add a new
>>> token( virtual node), I have to rebuild the host with new token list.
>>> 
>>> My question is if there is way to add a virtual nodes without rebuild it?
>>> 
>>> Thanks,
>>> 
>>> On Jan 30, 2013, at 10:21 AM, Manu Zhang wrote:
>>> 
>>>> On Wed 30 Jan 2013 02:29:27 AM CST, Zhong Li wrote:
>>>>> One more question, can I add a virtual node manually without reboot
>>>>> and rebuild a host data?
>>>>>
>>>>> I checked nodetool command, there is no option to add a node.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Zhong
>>>>>
>>>>>
>>>>> On Jan 29, 2013, at 11:09 AM, Zhong Li wrote:
>>>>>
>>>>>> I was misunderstood this
>>>>>> http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 ,
>>>>>> especially
>>>>>> "If you want to get started with vnodes on a fresh cluster, however,
>>>>>> that is fairly straightforward. Just don’t set the
>>>>>> |initial_token| parameter in your|conf/cassandra.yaml| and instead
>>>>>> enable the |num_tokens| parameter. A good default value for this
>>>>>> is 256"
>>>>>>
>>>>>> Also I couldn't find document about set multiple tokens
>>>>>> for cassandra.inital_token
>>>>>>
>>>>>> Anyway, I just tested, it does work to set  comma separated list of
>>>>>> tokens.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Zhong
>>>>>>
>>>>>>
>>>>>> On Jan 29, 2013, at 3:06 AM, aaron morton wrote:
>>>>>>
>>>>>>>> After I searched some document on Datastax website and some old
>>>>>>>> ticket, seems that it works for random partitioner only, and leaves
>>>>>>>> order preserved partitioner out of the luck.
>>>>>>> Links ?
>>>>>>>
>>>>>>>> or allow add Virtual Nodes manually?
>>>>>>> If not looked into it but there is a cassandra.inital_token startup
>>>>>>> param that takes a comma separated list of tokens for the node.
>>>>>>>
>>>>>>> There also appears to be support for the ordered partitions to
>>>>>>> generate random tokens.
>>>>>>>
>>>>>>> But you would still have the problem of having to balance your row
>>>>>>> keys around the token space.
>>>>>>>
>>>>>>> Cheers
>>>>>>> -
>>>>>>> Aaron Morton
>>>>>>> Freelance Cassandra Developer
>>>>>>> New Zealand
>>>>>>>
>>>>>>> @aaronmorton
>>>>>>> http://www.thelastpickle.com
>>>>>>>
>>>>>>>
>>>>>>> On 29/01/2013, at 10:31 AM, Zhong Li wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Virtual Nodes is great feature. After I searched some document on
>>>>>>>> Datastax website and some old ticket, seems that it works for
>>>>>>>> random partitioner only, and leaves order preserved partitioner out
>>>>>>>> of the luck. I may misunderstand, please correct me. if it doesn't
>>>>>>>> love order preserved partitioner, would be possible to add support
>>>>>>>> multiple initial_token(s) for  order preserved partitioner  or
>>>>>>>> allow add Virtual Nodes manually?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Zhong
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> You add a physical node and that in turn adds num_token tokens to
>>>> the ring.
>>>
>>
>
> no, those tokens will be skipped
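
The skipping behaviour can be pictured as walking the ring clockwise and collecting distinct hosts until the replication factor is reached. The sketch below does that for a SimpleStrategy-like placement. It is an illustration only (NetworkTopologyStrategy also takes datacenters and racks into account), and the ring contents are invented.

```python
import bisect


def replicas_for(key_token, ring, replication_factor):
    """Pick replica hosts for a token, skipping vnodes of hosts already chosen.

    `ring` is a sorted list of (token, host) pairs, one pair per vnode.
    """
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    chosen = []
    for step in range(len(ring)):
        host = ring[(start + step) % len(ring)][1]
        if host not in chosen:  # a second vnode of the same host is skipped
            chosen.append(host)
        if len(chosen) == replication_factor:
            break
    return chosen


# Invented ring: host A holds three vnodes, B two, C one.
ring = sorted([(10, 'A'), (20, 'A'), (30, 'B'), (40, 'A'), (50, 'C'), (60, 'B')])
print(replicas_for(5, ring, 3))   # ['A', 'B', 'C']
print(replicas_for(35, ring, 2))  # ['A', 'C']
print(replicas_for(65, ring, 2))  # ['A', 'B']
```

In the first call the vnodes at tokens 20 and 40 both belong to A, which already holds a replica, so the walk moves on until it reaches B and then C.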



Re: JDBC : CreateresultSet fails with null column in CqlResultSet

2013-02-01 Thread aaron morton
I think http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/issues/list 
is the place to raise the issue. 

Can you update the mail thread with the ticket as well?

Thanks

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/02/2013, at 3:25 AM, Andy Cobley  wrote:

> As you may be aware, I've been trying to track down a problem using JDBC 1.1.2 
> with Cassandra 1.2.0. I was getting a null pointer exception in the result 
> set. I've done some digging into the JDBC driver and found the following.
> 
> In CassandraResultSet.java the new result set is Instantiated in 
> 
> CassandraResultSet(Statement statement, CqlResult resultSet, String keyspace)
> 
> I decided to trace the result set with the following code:
> 
> rowsIterator = resultSet.getRowsIterator();
> System.out.println("---");
> while (rowsIterator.hasNext()) {
>     CqlRow row = rowsIterator.next();
>     curRowKey = row.getKey();
>     System.out.println("Row Key " + curRowKey);
>     List cols = row.getColumns();
>     Iterator iterator;
>     iterator = cols.iterator();
>     while (iterator.hasNext()) {
>         Column col = (Column) iterator.next();
>         String Name = new String(col.getName());
>         String Value = new String(col.getValue());
>         System.out.println("Col " + Name + " : " + Value);
>     }
> }
> 
> This produced the following output:
> 
> ---
> Row Key [B@617e53c9
> Col key : jsmith
> Col : 
> Col password : ch@ngem3a
> Row Key [B@2caee320
> Col key : jbrown
> Col : 
> Col gender : male
> ---
> 
> As you can see there is a blank column at position 2 in each of the rows. As 
> this result set has come from the Cassandra Thrift client (I believe), the 
> problem may lie there. There is no blank column defined by my SQL create 
> statements, I believe. 
> 
> If I'm correct here, should I raise a ticket with JDBC or Cassandra ? (for 
> now I've patched my local JDBC driver so it doesn't create a TypedColumn if 
> the result set produces a null column)
> 
> Andy
> 
> 
> The University of Dundee is a Scottish Registered Charity, No. SC015096.
> 
> 



Re: JDBC : CreateresultSet fails with null column in CqlResultSet

2013-02-01 Thread Andy Cobley
Aaron,

Ticket is at 

http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/issues/detail?id=61

Andy





conditional update or insert

2013-02-01 Thread Jay Svc
Hi All,

On each row I have a column which maintains a timestamp, like
"lastUpdated".

While inserting such a row I want to make sure that the row is only
updated if the stored lastUpdated is older than the one I am inserting.

One way to do this is:

Read the record first and check the timestamp; if the new one is later, then update.

Since I already have a high volume of read and write load, this additional read
will add to it.

Any alternative to achieve this?

Thanks,
Jay
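
For reference, the read-then-write approach described above can be sketched as follows. This is only an illustration: a plain Python dict stands in for the column family, and the name upsert_if_newer is hypothetical, not part of any Cassandra client. The extra read on every write is the cost the post is trying to avoid, and against a real cluster the check and the write would not be atomic.

```python
def upsert_if_newer(store, key, row):
    """Write `row` only if its lastUpdated is newer than the stored one.

    `store` is a dict standing in for the column family (an assumption
    made for illustration; a real client would issue a read and a write).
    """
    current = store.get(key)  # the additional read
    if current is not None and current["lastUpdated"] >= row["lastUpdated"]:
        return False  # stored copy is the same age or newer: skip the write
    store[key] = row
    return True


store = {}
print(upsert_if_newer(store, "user1", {"lastUpdated": 200, "name": "a"}))
print(upsert_if_newer(store, "user1", {"lastUpdated": 100, "name": "stale"}))
print(upsert_if_newer(store, "user1", {"lastUpdated": 300, "name": "b"}))
```

Cassandra itself resolves conflicting writes to a column by write timestamp (the newest one wins), so supplying lastUpdated as the write timestamp may make the read unnecessary; whether that fits depends on the client and data model, so treat it as something to verify rather than a recommendation.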


Re: initial_token

2013-02-01 Thread Edward Capriolo
You do not want to just switch to vnodes without being sure. Some queries are
not optimized for vnodes and issue 128 slices to solve some
secondary index queries.


On Fri, Feb 1, 2013 at 12:55 PM, Víctor Hugo Oliveira Molinar
 wrote:
> Do not set initial_token when using Murmur3Partitioner.
> Instead, set num_tokens.
>
> For example, if you have 3 hosts with the same hardware setup, set the same
> num_tokens on each one.
> But now consider adding another, better host; this time I'd suggest you
> set it to the previous num_tokens * 2.
>
> num_tokens: 128 (worse machines)
> num_tokens: 256 (machine twice as good)
>
> This is the setup of virtual nodes.
> Check the current DataStax docs for it.
>
>
> On Thu, Jan 31, 2013 at 8:43 PM, Edward Capriolo 
> wrote:
>>
>> This is the bad side of changing a default. There are going to be a few
>> unfortunate groups.
>>
>> The first group, who simply cannot set up their cluster, and eventually
>> figure out their tokens. (this thread)
>> The second group, who assume their tokens were correct and run around
>> with an unbalanced cluster thinking the performance sucks. (the
>> threads for the next few months)
>> The third group, who will google "how to balance my ring" and find a
>> page with random partitioner instructions. (the occasional thread for
>> the next N years)
>> The fourth group, MapReduce users, because as of now MapReduce is highly
>> confused by this.
>>
>> On Thu, Jan 31, 2013 at 4:52 PM, Rob Coli  wrote:
>> > On Thu, Jan 31, 2013 at 12:17 PM, Edward Capriolo
>> >  wrote:
>> >> Now by default a new partitioner is chosen: Murmur3.
>> >
>> > "Now" = as of 1.2, to be unambiguous.
>> >
>> > =Rob
>> >
>> > --
>> > =Robert Coli
>> > AIM&GTALK - rc...@palominodb.com
>> > YAHOO - rcoli.palominob
>> > SKYPE - rcoli_palominodb
>
>
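
The num_tokens weighting suggested in the quoted message (128 for the existing hosts, 256 for a host that is twice as capable) can be checked with a little arithmetic. The sketch below assumes tokens are spread evenly at random, so each host's expected share of the ring is simply its num_tokens divided by the total; real ownership will only approximate this. The host names are made up for illustration.

```python
def expected_ownership(num_tokens_by_host):
    """Expected fraction of the ring per host, assuming evenly random tokens."""
    total = sum(num_tokens_by_host.values())
    return {host: n / total for host, n in num_tokens_by_host.items()}


# Three equal hosts plus one host configured with twice the tokens.
hosts = {"host1": 128, "host2": 128, "host3": 128, "host4": 256}
for host, share in expected_ownership(hosts).items():
    print(f"{host}: {share:.0%}")
```

With these values the three original hosts are each expected to own about 20% of the data and the larger host about 40%.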


Re: Cassandra pending compaction tasks keeps increasing

2013-02-01 Thread Derek Williams
Did the node list itself as a seed node in cassandra.yaml? Unless something
has changed, a node that considers itself a seed will not auto bootstrap.
Although I haven't tried it, I think running 'nodetool rebuild' will cause
it to stream in the data it needs without doing a repair.


On Wed, Jan 30, 2013 at 9:30 PM, Wei Zhu  wrote:

> Some updates:
> Since we have still not fully turned on the system, we did something crazy
> today. We tried to treat the node as a dead one (my boss wants us to
> practice replacing a dead node before going to full production) and
> bootstrap it. Here is what we did:
>
>
> - drain the node
> - check nodetool on other nodes; this node is marked down (the
>   token for this node is 100)
> - clear the data, commit log, saved cache
> - change initial_token from 100 to 99 in the yaml file
> - start the node
> - check nodetool; the down node of 100 disappeared by itself (!!) and
>   a new node with token 99 showed up
> - check the log; see the message saying bootstrap completed, but only a
>   couple of MB streamed
> - nodetool move 98
> - check nodetool; see the node with token 98 come up
> - check the log; see the message saying bootstrap completed, but still
>   only a couple of MB streamed
>
> The only reason I can think of is that the new node has the same IP as the
> "dead" node we tried to replace. Will that cause the symptom of no data
> streamed from other nodes? Do other nodes still think the node has all the
> data?
>
> We had to do nodetool repair -pr to bring in the data. After 3 hours,
> 150G had been transferred. And no surprise, pending compaction tasks are now
> at 30K. About 30K SSTables were transferred and I guess all of them need
> to be compacted since we use LCS.
>
> My concern is that, if we did nothing wrong, replacing a dead node will
> cause such a huge backlog of pending compactions. It might take a week to
> clear that off. And since we have RF = 3, we still need to bring in the data
> for the other two replicas, because we used "-pr" for nodetool repair. Will
> it take about 3 weeks to fully replace a 200G node using LCS? We tried
> everything we can to speed up the compaction, with no luck. The only thing I
> can think of is to increase the default SSTable size, so fewer
> compactions will be needed. Can I just change it in the yaml and restart C*
> and it will correct itself? Any side effects? Since we are using SSD, a
> slightly bigger SSTable won't slow down reads too much; I suppose that is
> the main concern with a bigger SSTable size?
>
> I think 1.2 comes with parallel leveled compaction, which should help the
> situation. But we are not going to upgrade for a little while.
>
> Did I miss anything? Might it not be practical to use LCS for a 200G node?
> But if we use size-tiered compaction, we need to have at least 400G of
> disk... Although SSD is cheap now, it is still hard to convince the
> management. Three replicas + double the disk for compaction? That is 6 times
> the real data size!
>
> Sorry for the long email. Any suggestion or advice?
>
> Thanks.
> -Wei
>
> --
> *From: *"aaron morton" 
> *To: *"Cassandra User" 
> *Sent: *Tuesday, January 29, 2013 12:59:42 PM
>
> *Subject: *Re: Cassandra pending compaction tasks keeps increasing
>
> * Will try it tomorrow. Do I need to restart server to change the log
> level?
>
> You can set it via JMX, and supposedly log4j is configured to watch the
> config file.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/01/2013, at 9:36 PM, Wei Zhu  wrote:
>
> Thanks for the reply. Here is some information:
>
> Do you have wide rows ? Are you seeing logging about "Compacting wide
> rows" ?
>
> * I don't see any log about "wide rows"
>
> Are you seeing GC activity logged or seeing CPU steal on a VM ?
>
> * There is some GC, but CPU general is under 20%. We have heap size of 8G,
> RAM is at 72G.
>
> Have you tried disabling multithreaded_compaction ?
>
> * By default, it's disabled. We enabled it, but didn't see much
> difference; it was even a little slower with it enabled. Is it bad to enable
> it? We have SSDs and, according to the comment in the yaml, it should help
> when using SSD.
>
> Are you using Key Caches ? Have you tried disabling
> compaction_preheat_key_cache?
>
> * We have fairly big Key caches, we set as 10% of Heap which is 800M. Yes,
> compaction_preheat_key_cache is disabled.
>
> Can you enable DEBUG level logging and make the logs available ?
>
> * Will try it tomorrow. Do I need to restart server to change the log
> level?
>
>
> -Wei
>
> --
>
> From: "aaron morton" 
> To: user@cassandra.apache.org
> Sent: Monday, January 28, 2013 11:31:42 PM
> Subject: Re: Cassandra pending compaction tasks keeps increasing
>
>
>
>
>
>
>
> * Why nodetool repair increases the data size that much? It's not likely
> that much data needs to be repaired. Will that happen for all t

Not enough replicas???

2013-02-01 Thread Stephen.M.Thompson
I need to offer my profound thanks to this community which has been so helpful 
in trying to figure this system out.

I've set up a simple ring with two nodes and I'm trying to insert data into them.  
100% of the inserts fail with this error:

me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough 
replicas present to handle consistency level.

I'm not doing anything fancy - this is just from setting up the cluster 
following the basic instructions from datastax for a simple one data center 
cluster.  My config is basically the default except for the changes they 
discuss (except that I have configured for my IP addresses... my two boxes are 
.126 and .127)

cluster_name: 'MyDemoCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.28.205.126"
listen_address: 10.28.205.126
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

Nodetool shows both nodes active in the ring, status = up, state = normal.

For the CF:

    ColumnFamily: SystemEvent
      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Replicate on write: true
      Caching: KEYS_ONLY
      Bloom Filter FP chance: default
      Built indexes: [SystemEvent.IdxName]
      Column Metadata:
        Column Name: eventTimeStamp
          Validation Class: org.apache.cassandra.db.marshal.DateType
        Column Name: name
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
          Index Name: IdxName
          Index Type: KEYS
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

Any ideas?


Re: Not enough replicas???

2013-02-01 Thread Edward Capriolo
Please include the information on how your keyspace was created. This
may indicate you set the replication factor to 3, when you only have 1
node, or some similar condition.
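
To make the replication factor point concrete, here is a rough sketch of the availability check, not Cassandra's actual code. The function names are hypothetical, only three consistency levels are modelled, and "live_nodes" means nodes that the keyspace's placement strategy can actually place replicas on.

```python
def required_replicas(consistency_level, replication_factor):
    """Replicas that must respond for the given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: " + consistency_level)


def can_satisfy(consistency_level, replication_factor, live_nodes):
    """True if enough live replicas can exist to meet the consistency level."""
    # A key can never have more live replicas than there are live nodes.
    live_replicas = min(replication_factor, live_nodes)
    return live_replicas >= required_replicas(consistency_level, replication_factor)


# RF = 3 on a single node: QUORUM needs 2 replicas, only 1 can exist.
print(can_satisfy("QUORUM", 3, 1))
# RF = 3 on two nodes: QUORUM needs 2 replicas, 2 can exist.
print(can_satisfy("QUORUM", 3, 2))
```

If the strategy options name a data center that the snitch does not report for any node, the number of usable replicas is effectively zero and every consistency level fails in the same way, which is why the keyspace definition is the first thing to check.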



Re: Cassandra pending compaction tasks keeps increasing

2013-02-01 Thread Wei Zhu
That must be it.
Yes, it happens to be the seed. I should have tried "rebuild". Instead I did 
repair and now I am sitting here waiting for the compaction to finish...

Thanks.
-Wei




Re: Cassandra behavior on single node

2013-02-01 Thread Edward Capriolo
You are likely hitting the point where compaction is running all the time
and consuming all the weak cloud I/O. EBS is not recommended for performance;
you should use the ephemeral drives.

On Friday, February 1, 2013, Marcelo Elias Del Valle wrote:

> Hello,
>
>  I am trying to figure out why the following behavior happened. Any
> help would be highly appreciated.
>  This graph shows the server resource usage of my single
> Cassandra machine (running at Amazon EC2):
> http://mvalle.com/downloads/cassandra_host1.png
>  I ran a Hadoop process that reads a CSV file and writes data to
> Cassandra. For about 1 h the process ran fine, but used about 100% of
> CPU. After 1 h, my Hadoop process started to have its connection attempts
> refused by Cassandra, as shown below.
>  Since then, Cassandra has been using 100% of the machine's I/O. It has
> already been 2 h with I/O at 100% on the machine running Cassandra.
>  I am running Cassandra on Amazon EBS, which is slow, but I didn't
> think it would be that slow. Just wondering, is it normal for Cassandra to
> use a high amount of CPU? I am guessing all the writes were going to the
> memtables and, when it was time to flush, the server went down.
>  Does that make sense? I am still learning Cassandra as it's the first time
> I use it in production, so I am not sure if I am missing something really
> basic here.
>
> 2013-02-01 16:44:43,741 ERROR com.s1mbi0se.dmp.input.service.InputService (Thread-18): EXCEPTION:PoolTimeoutException: [host=(10.84.65.108):9160, latency=5005(5005), attempts=1] Timed out waiting for connection
> com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=nosql1.s1mbi0se.com.br(10.84.65.108):9160, latency=5005(5005), attempts=1] Timed out waiting for connection
>   at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:201)
>   at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:158)
>   at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:60)
>   at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:50)
>   at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:229)
>   at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$1.execute(ThriftColumnFamilyQueryImpl.java:186)
>   at com.s1mbi0se.dmp.input.service.InputService.searchUserByKey(InputService.java:700)
>
> ...
>   at com.s1mbi0se.dmp.importer.map.ImporterMapper.map(ImporterMapper.java:20)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>   at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)
> 2013-02-01 16:44:43,743 ERROR com.s1mbi0se.dmp.input.service.InputService (Thread-15): EXCEPTION:PoolTimeoutException:
>
>
> Best regards,
>
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
>


Re: CQL binary protocol

2013-02-01 Thread aaron morton
The spec for the protocol is here 
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol.spec;hb=refs/heads/cassandra-1.2

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/02/2013, at 6:42 AM, Gabriel Ciuloaica  wrote:

> Hi,
> 
> You may take a look at the java-driver project. It has a connection pool 
> implementation.
> 
> Cheers,
> Gabi
> 
> On 1/31/13 6:48 PM, Vivek Mishra wrote:
>> Hi,
>> Any connection pool API available for cassandra transport 
>> Client(org.apache.cassandra.transport.Client)? 
>> 
>> -Vivek
> 



Re: rangeQuery to traverse keys backward?

2013-02-01 Thread aaron morton
There is no facility to do a get_range in reverse. 

Rows are ordered by their token, and using the Random or Murmur3 partitioner 
this means they are randomly ordered. So there is not much need to go 
backwards, or get 10 rows from either side of a particular row. 

Can you change your data model to not require precise range scans ? 
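
To illustrate why neighbouring keys are not neighbours on the ring, here is a small sketch. It uses the MD5 digest of the key, read as a signed integer with the absolute value taken, as a simplified stand-in for how RandomPartitioner derives tokens; treat the exact token computation as an assumption. The key names are made up.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Simplified stand-in for a RandomPartitioner token (MD5-based)."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def token_order(keys):
    """Order keys the way a range scan would visit them: by token."""
    return sorted(keys, key=random_partitioner_token)


keys = [b"user_%02d" % i for i in range(10)]
print("key order:  ", sorted(keys))
print("token order:", token_order(keys))
```

Because the scan order follows the hash rather than the key, "the rows just before first_key" has no useful meaning under these partitioners, in either direction.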

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/02/2013, at 1:36 PM, Yuhan Zhang  wrote:

> Hi all,
> 
> I'm trying to use get_range to traverse the rows page by page by providing a 
> :start_key and a :finish_key.
> 
> This works fine when I traverse forward with :start_key=>last_key, 
> :finish_key=>"".
> However, when I tried to traverse backward with :start_key=>"", 
> :finish_key=>first_key, this always gave me the first few rows in the column 
> family.
> (My goal is to get the rows adjacent to my "first_key".)
> 
> It looks like :start_key always takes priority over :finish_key.
> 
> As for column ranges, there is an option to reverse the order, but there is 
> no such option for traversing rows.
> So I'm wondering whether Cassandra is capable of doing this task with the 
> current API.
> 
> I tried both the Twitter Cassandra client and the Hector client, but couldn't 
> find a way to perform it.
> Has anyone been able to do this?
> 
> 
> Thank you
> 
> Yuhan 