Re: Row cache off-heap ?

2013-03-12 Thread Alain RODRIGUEZ
I am using C*1.1.6.

"Did you restart the node after changing the row_cache_size_in_mb ?"

No, I didn't. I used the nodetool setcachecapacity and didn't restart the
node.
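
For reference, applying it online looked roughly like this (the sizes and CF name
here are illustrative, from memory rather than the exact statements):

# global cache capacities in MB (key cache, then row cache), applied without a restart
nodetool -h <node> setcachecapacity 100 200

# per-CF caching, via cassandra-cli
update column family <cf> with caching = 'ALL';

# the equivalent permanent settings in cassandra.yaml
row_cache_size_in_mb: 200
row_cache_provider: SerializingCacheProvider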

"The changes in GC activity are not huge and may not be due to cache
activity"

I find them huge, and they happened only on the node on which I had enabled the row
cache. I enabled it on the .164 node from 10:45 to 10:48, and the heap size
doubled from 3.5 GB to 7 GB (out of 8, which induced memory pressure). As for
GC, all the collections increased a lot compared to the other nodes with row
caching disabled.

"What is the output from nodetool info?"

I can give it to you, but the row cache is now disabled.

Token: 85070591730234615865843651857942052864
Gossip active: true
Thrift active: true
Load : 201.61 GB
Generation No: 1362749056
Uptime (seconds) : 328675
Heap Memory (MB) : 5157.58 / 8152.00
Data Center  : eu-west
Rack : 1b
Exceptions   : 24
Key Cache: size 104857584 (bytes), capacity 104857584 (bytes),
106814132 hits, 120131310 requests, 0.858 recent hit rate, 14400 save
period in seconds
Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
NaN recent hit rate, 0 save period in seconds

I think it won't help, but I can't try things now unless we are quite sure
it will work smoothly; we are under heavy load.

Anyway, thanks for trying to help once again.




2013/3/12 aaron morton 

> What version are you using?
>
> Sounds like you have configured it correctly. Did you restart the node
> after changing the row_cache_size_in_mb ?
> The changes in GC activity are not huge and may not be due to cache
> activity. Have they continued after you enabled the row cache?
>
> What is the output from nodetool info?
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/03/2013, at 5:30 AM, Sávio Teles 
> wrote:
>
> I have the same problem!
>
> 2013/3/11 Alain RODRIGUEZ 
>
>> I can add that I have JNA correctly loaded, from the logs: "JNA mlockall
>> successful"
>>
>>
>> 2013/3/11 Alain RODRIGUEZ 
>>
>>> Any clue on this ?
>>>
>>> A well-configured row cache could save us a lot of disk reads, and IO
>>> is definitely our bottleneck... If someone could explain why the row cache
>>> has so much impact on my JVM and how to avoid it, it would be appreciated
>>> :).
>>>
>>>
>>> 2013/3/8 Alain RODRIGUEZ 
>>>
 Hi,

 We have some issues sustaining a high read throughput. I wanted to alleviate
 things by turning the row cache ON.

 I set the row cache to 200 (MB) on one node and enabled caching 'ALL' on the
 3 most-read CFs. Here is the effect this operation had on my JVM:
 http://img692.imageshack.us/img692/4171/datastaxopscenterr.png

 It looks like the row cache was somehow stored in-heap. I looked at my
 cassandra.yaml and I have the following configuration: row_cache_provider:
 SerializingCacheProvider (which should be enough to store the row cache
 off-heap, as described above that setting in the file: "SerializingCacheProvider
 serialises the contents of the row and stores it in native memory, i.e.,
 off the JVM Heap")

 What's wrong ?

>>>
>>>
>>
>
>
> --
> Sincerely,
> Sávio S. Teles de Oliveira
> voice: +55 62 9136 6996
> http://br.linkedin.com/in/savioteles
>  Master's student in Computer Science - UFG
> Software Architect
> Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
>
>
>


Re: Quorum read after quorum write guarantee

2013-03-12 Thread André Cruz
On Mar 12, 2013, at 6:04 AM, aaron morton  wrote:

>> by a multiget will not find the just inserted data.
> Can you explain how the data is not found. 
> Does it not find new columns or does it return stale columns ? 

It does not find new columns, I don't overwrite data.

> If the read is run again does it return the expected value? 

Yes, when I go and check manually after this happens, the data is already 
there. However, I think I saw an instance where the multiget query was executed 
twice in quick succession: the first time it returned all the results, the 
second time it didn't. This only happened once, so I filed it under X-Files.

> if you are getting stale data, double check that the nodes / clients have their 
> clocks synchronised. 

That's a good tip, I'll check.

> If you are doing reads and writes using QUORUM, double check that your code is 
> correct. If it is, provide some more info on what you are seeing. 

It seems correct. What more info could be of value?
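
For reference, the guarantee I am relying on is the quorum overlap (assuming RF=3,
which I haven't stated in this thread):

W = QUORUM = floor(3/2) + 1 = 2 replicas must acknowledge each write
R = QUORUM = floor(3/2) + 1 = 2 replicas must answer each read
R + W = 4 > RF = 3, so every quorum read set intersects every completed quorum write set

so a read at QUORUM issued after a write at QUORUM has returned should always see that write.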

Thanks,
André



Re: Pig / Map Reduce on Cassandra

2013-03-12 Thread cscetbon.ext
I'm already using Cassandra 1.2.2, with only one line to test the Cassandra 
access:

rows = LOAD 'cassandra://twissandra/users' USING 
org.apache.cassandra.hadoop.pig.CassandraStorage();

extracted from the sample script provided in the sources
--
Cyril SCETBON

On Mar 12, 2013, at 6:57 AM, aaron morton 
<aa...@thelastpickle.com> wrote:

any idea why the function loadFunc does not work correctly ?
No sorry.
Not sure why you are linking to the CQL info or what Pig script / config you 
are running.
Did you follow the example in the examples/pig in the source distribution ?
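(If I remember the README in examples/pig correctly, the example wrapper reads its
connection details from environment variables, roughly like the sketch below; the
names and values should be double-checked against that README and your cluster:

export PIG_INITIAL_ADDRESS=localhost    # a Cassandra node to connect to
export PIG_RPC_PORT=9160                # Thrift port
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner   # must match cassandra.yaml
examples/pig/bin/pig_cassandra -x local yourscript.pig
)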

Also please use at least cassandra 1.1.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/03/2013, at 9:39 AM, 
cscetbon@orange.com wrote:

You said all versions. However, when I try to access 
cassandra://twissandra/users based on 
http://www.datastax.com/docs/1.0/dml/using_cql I get :

2013-03-11 17:35:48,444 [main] INFO  org.apache.pig.Main - Apache Pig version 
0.11.0 (r1446324) compiled Feb 14 2013, 16:40:57
2013-03-11 17:35:48,445 [main] INFO  org.apache.pig.Main - Logging error 
messages to: /Users/cyril/pig_1363019748442.log
2013-03-11 17:35:48.583 java[13809:1203] Unable to load realm info from 
SCDynamicStore
2013-03-11 17:35:48,750 [main] INFO  org.apache.pig.impl.util.Utils - Default 
bootup file /Users/cyril/.pigbootup not found
2013-03-11 17:35:48,831 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: file:///
2013-03-11 17:35:49,235 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
2245: Cannot get schema from loadFunc 
org.apache.cassandra.hadoop.pig.CassandraStorage

with pig 0.11.0

any idea why the function loadFunc does not work correctly ?

thanks
--
Cyril SCETBON

On Jan 18, 2013, at 7:00 PM, aaron morton 
mailto:aa...@thelastpickle.com>> wrote:

Silly question -- but does hive/pig hadoop etc work with cassandra
1.1.8?  Or only with 1.2?
all versions.

We are using astyanax library, which seems
to fail horribly on 1.2,
How does it fail ?
If you think you have a bug post it at https://github.com/Netflix/astyanax

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/01/2013, at 7:48 AM, James Lyons 
<james.ly...@gmail.com> wrote:

Silly question -- but does hive/pig hadoop etc work with cassandra
1.1.8?  Or only with 1.2?  We are using astyanax library, which seems
to fail horribly on 1.2, so we're still on 1.1.8.  But we're just
starting out with this and i'm still debating between cassandra and
hbase.  So I just want to know if there is a limitation here or not,
as I have no idea when 1.2 support will exist in astyanax.

That said, are there other java (scala) libraries that people use to
connect to cassandra that support 1.2?

-James-

On Thu, Jan 17, 2013 at 8:30 AM, <cscetbon@orange.com> wrote:
Ok, I understand that I need to manage both cassandra and hadoop components
and that pig will use hadoop components to launch its tasks which will use
Cassandra as the Storage engine.

Thanks
--
Cyril SCETBON

On Jan 17, 2013, at 4:03 PM, James Schappet 
<jschap...@gmail.com> wrote:

This really depends on how you design your Hadoop Cluster.  The testing I
have done, had Hadoop and Cassandra Nodes collocated on the same hosts.
Remember that Pig code runs inside of your hadoop cluster, and connects to
Cassandra as the Database engine.


I have not done any testing with Hive, so someone else will have to answer
that question.


From: <cscetbon@orange.com>
Reply-To: <user@cassandra.apache.org>
Date: Thursday, January 17, 2013 8:58 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Pig / Map Reduce on Cassandra

Jimmy,

I understand that CFS can replace HDFS for those who use Hadoop. I just want
to use pig and hive on cassandra. I know that pig samples are provided and
work now with cassandra natively (they are part of the core). However, does
it mean that the process will be spread over nodes with
number_of_mapper=number_of_nodes or something like that ?
Can Hive connect to Cassandra 1.2 easily too ?

--
Cyril Scetbon

On Jan 17, 2013, at 2:42 PM, James Schappet 
<jschap...@gmail.com> wrote:

CFS is Cassandra File System:
http://www.datastax.com/dev/blog/cassandra-file-system-design


But you don't need CFS to connect from PIG to Cassandra.  The latest
versions of Cassandra Source ship with examples of connecting from pig to
cassandra.


apache-cassandra-1.2.0-src/examples/pig   --
http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-src.tar.gz

--Jimmy


From: <cscetbon@orange.com>
Reply-To: <user@cassandra.apache.org>
Date: Thursd

word of caution to those switching to LCS from SizeTieredStrategy

2013-03-12 Thread Hiller, Dean
We tested it in QA, but in production it brought our cluster to a halt; even though we 
set the compaction throughput to 1 (nodetool setcompactionthroughput), we were severely 
limited.  nodetool stop COMPACTION did not seem to have any impact either.  We ended up 
increasing memory on one node to help alleviate some of the issues (cranked it up to 
16 GB and we hope to put it back down to 8 GB later).
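
For context, the switch itself is just a per-CF schema change plus the throttling we
attempted, roughly of this shape (a sketch, not our exact statements; the CF name is a
placeholder):

# cassandra-cli: change the compaction strategy for one column family
update column family <cf>
  with compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy';

# what we tried in order to slow the resulting re-leveling down
nodetool setcompactionthroughput 1
nodetool stop COMPACTION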

Just be careful if you are trying to make the switch.  Also, I filed this jira 
ticket, since it would be extremely nice to be able to turn on LCS for a CF on a single 
node and do a node-by-node switch…

https://issues.apache.org/jira/browse/CASSANDRA-5335


Dean


Re: commitlog -deleted keyspaces.

2013-03-12 Thread Hiller, Dean
Here is our cluster, which has 10 billion rows on 6 nodes and about 1.2 TB:
[root@sdi-ci ~]# clush -g datanodes du -sh /opt/datastore/commitlog
a5: 1.1G /opt/datastore/commitlog
a3: 1.1G /opt/datastore/commitlog
a1: 1.1G /opt/datastore/commitlog
a2: 1006M /opt/datastore/commitlog
a4: 1.1G /opt/datastore/commitlog
a6: 1.1G /opt/datastore/commitlog

If you run nodetool drain on a node, you can wipe the commit logs after that (we 
QA tested this, but don't take my word for it).  We also found out that drain was 
moving data to the sstables but did not seem to delete from the commit log at 
all, as the commit log space used remained the same after a drain.  We did not fully 
test removing the commit log files, so you should try that yourself with a 
test in QA.
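
In other words, the sequence being discussed is roughly this (paths as in our layout
above; again, test it in QA first):

nodetool -h <node> drain          # flush memtables and stop accepting writes
# stop the cassandra process on that node
rm /opt/datastore/commitlog/*     # only after a clean drain and shutdown
# start cassandra again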

Later,
Dean

From: a k <kumaramit.ex...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, March 12, 2013 10:46 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: commitlog -deleted keyspaces.

We are running a 4-node cluster on version 1.1.0 and our commit logs seem to be 
ever-growing. We have a total of about 250 GB per node in the keyspaces/column 
families and the commit logs are at about 30 GB. There have been several 
deletions of keyspaces in our setup and I am concerned about a few things.

First is the size of the commit logs: we have not modified the 
"commitlog_total_space_in_mb" setting in the yaml, so I assume it is the default 
4092 MB. We don't have a "memtable_flush_after_mins" setting either.

Is this sane? Why would the size of the commit logs be so large (I am worried 
about the startup time, when it has to replay the commit logs), and why would 
the default size of 4092 MB not be enforced?

Would it cause us trouble when we upgrade to 1.2?

Another thing I have noticed is that upon restarts, the old keyspaces that were 
deleted re-appear, although with less data; I would imagine that has nothing to 
do with the commit logs.

Can I safely delete the commitlogs after the nodetool flush?



Many thanks
Amit



Re: word of caution to those switching to LCS from SizeTieredStrategy

2013-03-12 Thread Edward Capriolo
Yes, LCS has its own compaction scheme. It does not honor min_compaction_threshold or
max_compaction_threshold, and major compaction is a no-op. The issue is that, at the
moment you change strategies, the system moves all your size-tiered data to L0 and
then starts a huge set of compactions to re-level it.

It would be great to be able to make this change on just one server and watch the
read/write behaviour.

Check out the write survey mode.
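
If I remember right, that is enabled with a JVM system property, something like this
in conf/cassandra-env.sh (double-check the property name for your version):

# the node takes writes but serves no reads, so a configuration change can be
# trialled against live write traffic without affecting read latency
JVM_OPTS="$JVM_OPTS -Dcassandra.write_survey=true"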

Edward

On Tue, Mar 12, 2013 at 9:08 AM, Hiller, Dean  wrote:

> We tested it in QA, but in production it brought our cluster to a halt
> even though we setcompactionthroughput to 1, we were severely limited.
>  Nodetool stop compaction did not seem to have any impact either.  We ended
> up increasing memory on one node to help alleviate some issue(cranked it up
> to 16G and we hope to put it back down to 8 G later).
>
> Just be careful if you are trying to make the switch.  Also, I added this
> jira ticket which would be extremely nice to turn on LCS for a CF/node and
> do a node by node switch…
>
> https://issues.apache.org/jira/browse/CASSANDRA-5335
>
>
> Dean
>


Re: commitlog -deleted keyspaces.

2013-03-12 Thread a k
Thanks Dean. I will try the node drain next. However, do you know if this is
a known issue/bug with 1.1? I scanned through some 200-odd jira entries
that have commit log in the text for some clues, but no luck.

Amit


On Tue, Mar 12, 2013 at 12:17 PM, Hiller, Dean  wrote:

> Here is our cluster which has 10 billion rows on 6 nodes and about 1.2TB
> [root@sdi-ci ~]# clush -g datanodes du -sh /opt/datastore/commitlog
> a5: 1.1G /opt/datastore/commitlog
> a3: 1.1G /opt/datastore/commitlog
> a1: 1.1G /opt/datastore/commitlog
> a2: 1006M /opt/datastore/commitlog
> a4: 1.1G /opt/datastore/commitlog
> a6: 1.1G /opt/datastore/commitlog
>
> If you run nodetool drain on a node, you can wipe the commit logs after
> that(we QA tested this but don't take my word for it).  We also found out
> drain was moving data to the sstables but did not seem to delete from the
> commit log at all as commit log space used remained the same after a drain.
>  We did not fully test removing the commit log files so you should try to
> do that yourself with a test in QA.
>
> Later,
> Dean
>
> From: a k <kumaramit.ex...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tuesday, March 12, 2013 10:46 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: commitlog -deleted keyspaces.
>
> We are running a 4 node cluster version 1.1.0 and our commit logs seem to
> be ever growing. We have a total about 250 GB per node in the
> keyspaces/column families and the commit logs are at about 30 GB.There have
> been several deletions of keyspaces in our setup and I am concerned about a
> few things.
>
> First being the size of the commit logs, we have not modified the
>  "commitlog_total_space_in_mb" in yaml so I assume it is the default
> 4092MB, We don't have "memtable_flush_after_mins" setting either.
>
> Is this sane? Why would the size of the commit logs be so large (I am
> worried about the startup time, when it has to replay the commit logs) and
> why would the default size of 4092 MB not be enforced.
>
> Would it cause us trouble when we upgrade to 1.2?
>
> Another thing I have noticed is that upon restarts, the old keyspaces that
> were deleted re-appear although with less data, I would imagine that has
> nothing to do with the commit logs.
>
> Can I safely delete the commitlogs after the nodetool flush?
>
>
>
> Many thanks
> Amit
>
>


Re: commitlog -deleted keyspaces.

2013-03-12 Thread Raman
Can someone refer me to a C* tutorial on how to define a dynamic schema 
and populate data?

I am trying to map an inheritance hierarchy of objects into C*.

I want to handle all Base/Derived class objects as dynamic schemas, each 
with its own set of attributes...

Thanks
Raman


Re: Cassandra OOM, many deletedColumn

2013-03-12 Thread 金剑
Thanks for your reply. We will try both of your recommendations. The OS
memory is 8 GB; for the JVM heap it is 2 GB. DeletedColumn objects used 1.4 GB,
rooted from the readStage threads. Do you think we need to increase the size of
the JVM heap?
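
(If we do, my understanding is that the heap is set in conf/cassandra-env.sh, roughly
like this, with illustrative values in the 4 GB to 8 GB range you suggested:

MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="400M"    # often sized at roughly 100 MB per CPU core

though with only 8 GB of OS memory we would have to be careful.)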

The configuration for the index column family is:

create column family purge
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and gc_grace = 1800
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';


Best Regards!

Jian Jin


2013/3/9 aaron morton 

> You need to provide some details of the machine and the JVM configuration.
> But let's say you need to have 4 GB to 8 GB for the JVM heap.
>
> If you have many deleted columns I would say you have a *lot* of garbage
> in each row. Consider reducing the gc_grace seconds so the columns are
> purged more frequently; note however that columns are only purged when all
> fragments of the row are part of the same minor compaction.
>
> If you have a mixed write / delete work load consider using the Levelled
> compaction strategy
> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/03/2013, at 10:37 PM, Jason Wee  wrote:
>
> hmm.. did you manage to take a look using nodetool tpstats? That may give
> you a further indication..
>
> Jason
>
>
> On Thu, Mar 7, 2013 at 1:56 PM, 金剑  wrote:
>
>> Hi,
>>
>> My version is  1.1.7
>>
>> Our use case is: we have an index column family to record how many
>> resources are stored for a user. The number might vary from tens to millions.
>>
>> We provide a feature to let users delete resources by prefix.
>>
>>
>> We found some Cassandra nodes will OOM after some period. The cluster is a
>> kind of cross-datacenter ring.
>>
>> 1. Exception in cassandra log:
>>
>> ERROR [Thread-5810] 2013-02-04 05:38:13,882 AbstractCassandraDaemon.java
>> (line 135) Exception in thread Thread[Thread-5810,5,main]
>> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
>> shut down
>> at
>> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>> at
>> java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(ThreadPoolExecutor.java:758)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
>>
>> at
>> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
>>
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
>>
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
>>
>> ERROR [Thread-5819] 2013-02-04 05:38:13,888 AbstractCassandraDaemon.java
>> (line 135) Exception in thread Thread[Thread-5819,5,main]
>> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
>> shut down
>> at
>> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>> at
>> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>>
>> at
>> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
>>
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
>>
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
>>
>> ERROR [Thread-36] 2013-02-04 05:38:13,898 AbstractCassandraDaemon.java
>> (line 135) Exception in thread Thread[Thread-36,5,main]
>> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
>> shut down
>> at
>> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>> at
>> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>>
>> at
>> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
>>
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
>>
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
>>
>> ERROR [Thread-3990] 2013-02-04 05:38:13,902 AbstractCassandraDaemon.java
>> (line 135) Exception in thread Thread[Thread-3990,5,main]
>> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
>> shut down
>> at
>> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(Debugga