Re: Data not replicating to all datacenters

2012-12-04 Thread aaron morton
> A few days after writing the data, we tried this on cassandra-cli
The default consistency level in the CLI is ONE; did you change it to LOCAL 
QUORUM ? 

(I'm assuming your example is for two reads from the same CF)

It looks like the first read was done at a lower CL, and the value returned is 
valid in the sense that one replica did not have any data. Behind the scenes 
Read Repair was active on the request and it repaired the one replica the first 
read was from. So the next time round the value was there. 

If you want strongly consistent behaviour, use QUORUM or LOCAL QUORUM for both 
reads and writes. 
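A rough Python sketch (not from the thread) of the R + W > N arithmetic behind this advice, assuming the usual quorum size of N/2 + 1 and treating LOCAL_QUORUM as a quorum of the local DC's replicas:

```python
# Illustrates why a write at LOCAL_QUORUM followed by a CLI read at ONE
# is not strongly consistent, while quorum reads and writes are.
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond at a given consistency level (simplified)."""
    if level == "ONE":
        return 1
    if level in ("QUORUM", "LOCAL_QUORUM"):  # quorum = majority of replicas
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unknown level: {level}")

def strongly_consistent(write_level: str, read_level: str, rf: int) -> bool:
    # Strong consistency requires read and write replica sets to overlap.
    return replicas_required(write_level, rf) + replicas_required(read_level, rf) > rf

rf = 3  # dc1:3 in the thread's keyspace
print(strongly_consistent("LOCAL_QUORUM", "ONE", rf))           # prints: False
print(strongly_consistent("LOCAL_QUORUM", "LOCAL_QUORUM", rf))  # prints: True
```

With rf = 3, a quorum write touches 2 replicas, so a read at ONE (2 + 1 = 3, not > 3) can land on the one replica without the data, which is exactly the first CLI read above.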

Cheers  

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com


On 4/12/2012, at 9:33 AM, Owen Davies  wrote:

> We have written a large amount of data to Cassandra from another
> database. When writing the client was set to write local quorum.
> 
> A few days after writing the data, we tried this on cassandra-cli
> 
> get example['key'][123];
> Value was not found
> Elapsed time: 50 msec(s).
> 
> Then a bit later
> 
> get datapoints['key'][123];
> => (column=123, value=456, timestamp=1354095697384001)
> Elapsed time: 77 msec(s).
> 
> We assume this is to do with replication, with the first read causing
> a repair, but there doesn't seem to be any way of seeing what data is
> on which servers to validate.
> 
> I have not had a chance yet of trying the previous suggestion.
> 
> Owen
> 
> On 3 December 2012 20:18, aaron morton  wrote:
>> When reading, sometimes the data is there,
>> sometimes it is not, which we think is a replication issue, even
>> though we have left it plenty of time after the writes.
>> 
>> Can you provide some more information on this ?
>> Are you talking about writes to one DC and reads from another ?
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 4/12/2012, at 6:45 AM, Шамим  wrote:
>> 
>> Yes, it should be. dc1:3 means you have 3 copies of every row; with
>> local quorum you always get good consistency from those 3 nodes.
>> First you have to calculate the tokens for data center dc1, then add an offset of 100
>> to each token in the second data center, which should resolve your problem. After
>> creating the keyspace you can run nodetool ring with the keyspace
>> name, which should show the load for every node as 33%.
>> Hope it helps,
>> Shamim
>> 
>> Hi Shamim
>> 
>> I have read a bit about the Tokens. I understand how that could affect
>> the data distribution at first, but I don't understand if we have
>> specified Options: [dc1:3, dc2:3], surely after a while all the data
>> will be on every server?
>> 
>> Thanks,
>> 
>> Owen
>> 
>> On 3 December 2012 14:06, Шамим  wrote:
>> 
>> Hello Owen,
>> Seems you did not configure token for all nodes correctly. See the section
>> Calculating Tokens for multiple data centers here
>> http://www.datastax.com/docs/0.8/install/cluster_init
>> 
>> Best regards
>> Shamim
>> ---
>> On Mon, Dec 3, 2012 at 4:42 PM, Owen Davies  wrote:
>> 
>> We have a 2 data center test cassandra setup running, and are writing
>> to it using LOCAL_QUORUM. When reading, sometimes the data is there,
>> sometimes it is not, which we think is a replication issue, even
>> though we have left it plenty of time after the writes.
>> 
>> We have the following setup:
>> 
>> cassandra -v: 1.1.6
>> 
>> cassandra.yaml
>> ---
>> 
>> cluster_name: something
>> 
>> endpoint_snitch: PropertyFileSnitch
>> 
>> cassandra-topology.properties
>> 
>> 192.168.1.1=dc1:rack1
>> 192.168.1.2=dc1:rack1
>> 192.168.1.3=dc1:rack1
>> 
>> 192.168.2.1=dc2:rack1
>> 192.168.2.2=dc2:rack1
>> 192.168.2.3=dc3:rack1
>> 
>> default=nodc:norack
>> 
>> cassandra-cli
>> 
>> Keyspace: example:
>> Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>> Durable Writes: true
>> Options: [dc1:3, dc2:3]
>> 
>> nodetool ring
>> ---
>> 
>> Address DC Rack Status State Load
>> Effective-Ownership Token
>> 
>> 159447687142037741049740936276011715300
>> 192.168.1.2 dc1 rack1 Up Normal 111.17 GB
>> 100.00% 67165620003619490909052924699950283577
>> 192.168.1.1 dc1 rack1 Up Normal 204.57 GB
>> 100.00% 71045951808949151217931264995073558408
>> 192.168.2.1 dc2 rack1 Up Normal 209.92 GB
>> 100.00% 107165019770579893816561717940612111506
>> 192.168.1.3 dc1 rack1 Up Normal 209.92 GB
>> 100.00% 11416536395796636002672965495595953
>> 192.168.2.3 dc2 rack1 Up Normal 198.22 GB
>> 100.00% 147717787092318068320268200174271353451
>> 192.168.2.2 dc2 rack1 Up Normal 179.31 GB
>> 100.00% 159447687142037741049740936276011715300
>> 
>> Does anyone have any ideas why every server does not have the same
>> amount of data 

Re: Data not replicating to all datacenters

2012-12-04 Thread Owen Davies
In our main application we are using local quorum to read. I realise
the default for cli is one, the point is that we want all our data on
all the servers, hence specifying [dc1:3, dc2:3] as the replication
strategy. After a couple of days we would expect it to have
replicated.

As I said, we will try the first suggestion to see if that helps.

Again, thanks to both of you.

Owen

On 4 December 2012 09:06, aaron morton  wrote:
> A few days after writing the data, we tried this on cassandra-cli
>
> The default consistency level in the CLI is ONE; did you change it to LOCAL
> QUORUM ?
>
> (I'm assuming your example is for two reads from the same CF)
>
> It looks like the first read was done at a lower CL, and the value returned
> is valid in the sense that one replica did not have any data. Behind the
> scenes Read Repair was active on the request and it repaired the one replica
> the first read was from. So the next time round the value was there.
>
> If you want strongly consistent behaviour, use QUORUM or LOCAL QUORUM for
> both reads and writes.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/12/2012, at 9:33 AM, Owen Davies  wrote:
>
> We have written a large amount of data to Cassandra from another
> database. When writing the client was set to write local quorum.
>
> A few days after writing the data, we tried this on cassandra-cli
>
> get example['key'][123];
> Value was not found
> Elapsed time: 50 msec(s).
>
> Then a bit later
>
> get datapoints['key'][123];
> => (column=123, value=456, timestamp=1354095697384001)
> Elapsed time: 77 msec(s).
>
> We assume this is to do with replication, with the first read causing
> a repair, but there doesn't seem to be any way of seeing what data is
> on which servers to validate.
>
> I have not had a chance yet of trying the previous suggestion.
>
> Owen
>
> On 3 December 2012 20:18, aaron morton  wrote:
>
> When reading, sometimes the data is there,
> sometimes it is not, which we think is a replication issue, even
> though we have left it plenty of time after the writes.
>
> Can you provide some more information on this ?
> Are you talking about writes to one DC and reads from another ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/12/2012, at 6:45 AM, Шамим  wrote:
>
> Yes, it should be. dc1:3 means you have 3 copies of every row; with
> local quorum you always get good consistency from those 3 nodes.
> First you have to calculate the tokens for data center dc1, then add an offset of 100
> to each token in the second data center, which should resolve your problem. After
> creating the keyspace you can run nodetool ring with the keyspace
> name, which should show the load for every node as 33%.
> Hope it helps,
> Shamim
>
> Hi Shamim
>
> I have read a bit about the Tokens. I understand how that could affect
> the data distribution at first, but I don't understand if we have
> specified Options: [dc1:3, dc2:3], surely after a while all the data
> will be on every server?
>
> Thanks,
>
> Owen
>
> On 3 December 2012 14:06, Шамим  wrote:
>
> Hello Owen,
> Seems you did not configure token for all nodes correctly. See the section
> Calculating Tokens for multiple data centers here
> http://www.datastax.com/docs/0.8/install/cluster_init
>
> Best regards
> Shamim
> ---
> On Mon, Dec 3, 2012 at 4:42 PM, Owen Davies  wrote:
>
> We have a 2 data center test cassandra setup running, and are writing
> to it using LOCAL_QUORUM. When reading, sometimes the data is there,
> sometimes it is not, which we think is a replication issue, even
> though we have left it plenty of time after the writes.
>
> We have the following setup:
>
> cassandra -v: 1.1.6
>
> cassandra.yaml
> ---
>
> cluster_name: something
>
> endpoint_snitch: PropertyFileSnitch
>
> cassandra-topology.properties
> 
> 192.168.1.1=dc1:rack1
> 192.168.1.2=dc1:rack1
> 192.168.1.3=dc1:rack1
>
> 192.168.2.1=dc2:rack1
> 192.168.2.2=dc2:rack1
> 192.168.2.3=dc3:rack1
>
> default=nodc:norack
>
> cassandra-cli
> 
> Keyspace: example:
> Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
> Durable Writes: true
> Options: [dc1:3, dc2:3]
>
> nodetool ring
> ---
>
> Address DC Rack Status State Load
> Effective-Ownership Token
>
> 159447687142037741049740936276011715300
> 192.168.1.2 dc1 rack1 Up Normal 111.17 GB
> 100.00% 67165620003619490909052924699950283577
> 192.168.1.1 dc1 rack1 Up Normal 204.57 GB
> 100.00% 71045951808949151217931264995073558408
> 192.168.2.1 dc2 rack1 Up Normal 209.92 GB
> 100.00% 107165019770579893816561717940612111506
> 192.168.1.3 dc1 rack1 Up Norma

Re: strange row cache behavior

2012-12-04 Thread aaron morton
> Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0 
> hits, 2576 requests, NaN recent hit rate, 0 save period in seconds

So the cache is pretty much full, there is only 1 MB free. 

There were 2,576 read requests that tried to get a row from the cache. Zero of 
those had a hit. If you have 6 nodes and RF 2, each node has  one third of the 
data in the cluster (from the effective ownership info). So depending on the 
read workload the number of read requests on each node may be different. 

What I think is happening is reads are populating the row cache, then 
subsequent reads are evicting items from the row cache before you get back to 
reading the original rows. So if you read rows 1 to 5, they are put in the 
cache, when you read rows 6 to 10 they are put in and evict rows 1 to 5. Then 
you read rows 1 to 5 again they are not in the cache. 

Try testing with a lower number of hot rows, and/or a bigger row cache. 

But to be honest, with rows in the 10's of MB you will probably only get good 
cache performance with a small set of hot rows. 
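The load-then-evict cycle described above can be demonstrated with a minimal LRU cache sketch (not Cassandra's implementation, just the access pattern): cyclically scanning a working set larger than the cache yields a full cache with zero hits.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache to illustrate the eviction pattern described above."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.requests = 0

    def get(self, key):
        self.requests += 1
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=5)
# Cyclically scan 10 rows through a 5-row cache: rows 1-5 are always
# evicted by rows 6-10 before they are read again, so every read misses.
for _ in range(3):
    for row in range(10):
        if cache.get(row) is None:
            cache.put(row, f"row-{row}")
print(cache.hits, cache.requests)  # prints: 0 30
```

This matches the "cache full, 0 hits, 2576 requests" symptom: the cache is doing work, but the hot set never fits.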

Hope that helps. 



-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/12/2012, at 5:11 AM, Yiming Sun  wrote:

> Does anyone have any comments/suggestions for me regarding this?  Thanks
> 
> 
> I am trying to understand some strange behavior of cassandra row cache.  We 
> have a 6-node Cassandra cluster in a single data center on 2 racks, and the 
> neighboring nodes on the ring are from alternative racks.  Each node has 1GB 
> row cache, with key cache disabled.   The cluster uses PropertyFileSnitch, 
> and the ColumnFamily I fetch from uses NetworkTopologyStrategy, with 
> replication factor of 2.  My client code uses Hector to fetch a fixed set of 
> rows from cassandra
> 
> What I don't quite understand is even after I ran the client code several 
> times, there are always some nodes with 0 row cache hits, despite that the 
> row cache from all nodes are filled and all nodes receive requests.
> 
> Which nodes have 0 hits seem to be strongly related to the following:
> 
>  - the set of row keys to fetch
>  - the order of the set of row keys to fetch
>  - the list of hosts passed to Hector's CassandraHostConfigurator
>  - the order of the list of hosts passed to Hector
> 
> Can someone shed some light on how exactly the row cache works and hopefully 
> also explain the behavior I have been seeing?  I thought if the fixed set of 
> row keys are the only thing I am fetching (each row should be on the 
> order of 10's of MBs, no more than 100MB), and each node gets requests, and 
> its row cache is filled, there's gotta be some hits.  Apparently this is not 
> the case.   Thanks.
> 
> cluster information:
> 
> Address DC  RackStatus State   Load
> Effective-Ownership Token   
>   
>  141784319550391026443072753096570088105 
> x.x.x.1DC1 r1  Up Normal  587.46 GB   33.33%  
> 0   
> x.x.x.2DC1 r2  Up Normal  591.21 GB   33.33%  
> 28356863910078205288614550619314017621  
> x.x.x.3DC1 r1  Up Normal  594.97 GB   33.33%  
> 56713727820156410577229101238628035242  
> x.x.x.4DC1 r2  Up Normal  587.15 GB   33.33%  
> 85070591730234615865843651857942052863  
> x.x.x.5DC1 r1  Up Normal  590.26 GB   33.33%  
> 113427455640312821154458202477256070484 
> x.x.x.6DC1 r2  Up Normal  583.21 GB   33.33%  
> 141784319550391026443072753096570088105
> 
> 
> [user@node]$ ./checkinfo.sh   
> *** x.x.x.4
> Token: 85070591730234615865843651857942052863
> Gossip active: true
> Thrift active: true
> Load : 587.15 GB
> Generation No: 1354074048
> Uptime (seconds) : 36957
> Heap Memory (MB) : 2027.29 / 3948.00
> Data Center  : DC1
> Rack : r2
> Exceptions   : 0
> 
> Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, 
> NaN recent hit rate, 14400 save period in seconds
> Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0 
> hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
> 
> *** x.x.x.6
> Token: 141784319550391026443072753096570088105
> Gossip active: true
> Thrift active: true
> Load : 583.21 GB
> Generation No: 1354074461
> Uptime (seconds) : 36535
> Heap Memory (MB) : 828.71 / 3948.00
> Data Center  : DC1
> Rack : r2
> Exceptions   : 0
> 
> Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, 
> NaN recent hit rate, 

Re: Row caching + Wide row column family == almost crashed?

2012-12-04 Thread aaron morton
I responded on your other thread. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/12/2012, at 5:31 PM, Yiming Sun  wrote:

> I ran into a different problem with Row cache recently, sent a message to the 
> list, but it didn't get picked up.  I am hoping someone can help me 
> understand the issue.  Our data also has rather wide rows, not necessarily in 
> the thousands range, but definitely in the upper-hundreds levels.   They are 
> hosted in v1.1.1.   I was doing a performance test and enabled off-heap row 
> cache of 1GB for each of our cassandra nodes (each node has at least 16GB of 
> memory).   The test code was requesting a fixed set of 5000 rows from the 
> cluster and ran a few times, but using nodetool info,  the row cache hit rate 
> was very low, and a few of the nodes had 0 hits despite the row cache being 
> full.
> 
> so what i was trying to understand is how the row cache can be full but with 
> 0 hits?
> 
> 
> On Mon, Dec 3, 2012 at 6:55 PM, Bill de hÓra  wrote:
> A Cassandra JVM will generally not function well with caches and wide 
> rows. Probably the most important thing to understand is Ed's point, that the 
> row cache caches the entire row, not just the slice that was read out. What 
> you've seen is almost exactly the observed behaviour I'd expect with enabling 
> either cache provider over wide rows.
> 
>  - the on-heap cache will result in evictions that crush the JVM trying to 
> manage garbage. This is also the case if the rows have an uneven size 
> distribution (as small rows can push out a single large row, large rows push 
> out many small ones, etc).
> 
>  - the off heap cache will spend a lot of time serializing and deserializing 
> wide rows, such that it can increase latency relative to just reading from 
> disk and leverage the filesystem's cache directly.
> 
> The cache resizing behaviour does exist to preserve the server's memory, but 
> it can also cause a death spiral in the on-heap case, because a relatively 
> smaller cache may result in data being evicted more frequently.  I've seen 
> cases where sizing up the cache can stabilise a server's memory.
> 
> This isn't just a Cassandra thing, it simply happens to be very evident with 
> that system - generally to get an effective benefit from a cache, the data 
> should be contiguously sized and not too large to allow effective cache 
> 'lining'.
> 
> Bill
> 
> 
> On 02/12/12 21:36, Mike wrote:
> Hello,
> 
> We recently hit an issue within our Cassandra based application.  We
> have a relatively new Column Family with some very wide rows (10's of
> thousands of columns, or more in some cases).  During a periodic
> activity, we query ranges of columns to retrieve various pieces of
> information, a segment at a time.
> 
> We do these same queries frequently at various stages of the process,
> and I thought the application could see a performance benefit from row
> caching.  We have a small row cache (100MB per node) already enabled,
> and I enabled row caching on the new column family.
> 
> The results were very negative.  When performing range queries with a
> limit of 200 results, for a small minority of the rows in the new column
> family, performance plummeted.  CPU utilization on the Cassandra node
> went through the roof, and it started chewing up memory.  Some queries
> to this column family hung completely.
> 
> According to the logs, we started getting frequent GCInspector
> messages.  Cassandra started flushing the largest mem_tables due to
> hitting the "flush_largest_memtables_at" of 75%, and scaling back the
> key/row caches.  However, to Cassandra's credit, it did not die with an
> OutOfMemory error.  Its emergency measures to conserve
> memory worked, and the cluster stayed up and running.  No real errors
> showed in the logs, except for Messages getting dropped, which I believe
> was caused by what was going on with CPU and memory.
> 
> Disabling row caching on this new column family has resolved the issue
> for now, but, is there something fundamental about row caching that I am
> missing?
> 
> We are running Cassandra 1.1.2 with a 6 node cluster, with a replication
> factor of 3.
> 
> Thanks,
> -Mike
> 
> 
> 
> 



Nodes not synced

2012-12-04 Thread Adeel Akbar

Hi,

I have set up a 2-node cluster with replication factor 2. I have restored a 
snapshot of another cluster on Node A and restarted the cassandra process. 
Node B still has not received any updates/data from Node A. Do we need to execute 
any command to sync both nodes?


# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
Address   DC  RackStatus State Load
Effective-Ownership Token

91902851206288351623775585543017122534
XX.XX.XX.XXA  0   0   Up Normal *264.98 GB* 
0.00% 59394911263811417432307015371109991999
XX.XX.XX.XXB  0   0   Up Normal *67.34 KB* 
0.00%   91902851206288351623775585543017122534

--


Looking for your prompt response.

*Adeel*



Re: strange row cache behavior

2012-12-04 Thread Yiming Sun
Hi Aaron,

Thank you, and your explanation makes sense.  At the time, I thought having
1GB of row cache on each node was plenty enough, because there was an
aggregated 6GB cache, but you are right, with each row in 10's of MBs, some
of the nodes can go into a constant load and evict cycle and would have
negative effects on the performance.  I will try as you suggested to 1.)
reduce the requested entry set, and 2.) increase the row cache size and see
if they get better hits, and also do 3) by reversing the requested entry
list in alternate runs.

Our data space has close to 3 million rows, but we haven't gotten enough
usage statistics to know what rows are hot.  Does this mean we should not
enable row caches until we are absolutely sure about what's hot (I think
there is a reason why row caches are disabled by default) ?  It also seems
from my test that OS page cache works much better, but it could be that OS
page cache can utilize all the available memory so it is essentially larger
-- I guess I will find out by doing 2.) above.

best,

-- Y.



On Tue, Dec 4, 2012 at 4:47 AM, aaron morton wrote:

> > Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes),
> 0 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
>
> So the cache is pretty much full, there is only 1 MB free.
>
> There were 2,576 read requests that tried to get a row from the cache.
> Zero of those had a hit. If you have 6 nodes and RF 2, each node has  one
> third of the data in the cluster (from the effective ownership info). So
> depending on the read workload the number of read requests on each node may
> be different.
>
> What I think is happening is reads are populating the row cache, then
> subsequent reads are evicting items from the row cache before you get back
> to reading the original rows. So if you read rows 1 to 5, they are put in
> the cache, when you read rows 6 to 10 they are put in and evict rows 1 to
> 5. Then you read rows 1 to 5 again they are not in the cache.
>
> Try testing with a lower number of hot rows, and/or a bigger row cache.
>
> But to be honest, with rows in the 10's of MB you will probably only get
> good cache performance with a small set of hot rows.
>
> Hope that helps.
>
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1/12/2012, at 5:11 AM, Yiming Sun  wrote:
>
> > Does anyone have any comments/suggestions for me regarding this?  Thanks
> >
> >
> > I am trying to understand some strange behavior of cassandra row cache.
>  We have a 6-node Cassandra cluster in a single data center on 2 racks, and
> the neighboring nodes on the ring are from alternative racks.  Each node
> has 1GB row cache, with key cache disabled.   The cluster uses
> PropertyFileSnitch, and the ColumnFamily I fetch from uses
> NetworkTopologyStrategy, with replication factor of 2.  My client code uses
> Hector to fetch a fixed set of rows from cassandra
> >
> > What I don't quite understand is even after I ran the client code
> several times, there are always some nodes with 0 row cache hits, despite
> that the row cache from all nodes are filled and all nodes receive requests.
> >
> > Which nodes have 0 hits seem to be strongly related to the following:
> >
> >  - the set of row keys to fetch
> >  - the order of the set of row keys to fetch
> >  - the list of hosts passed to Hector's CassandraHostConfigurator
> >  - the order of the list of hosts passed to Hector
> >
> > Can someone shed some light on how exactly the row cache works and
> hopefully also explain the behavior I have been seeing?  I thought if the
> fixed set of row keys are the only thing I am fetching (each row
> should be on the order of 10's of MBs, no more than 100MB), and each node
> gets requests, and its row cache is filled, there's gotta be some hits.
> Apparently this is not the case.   Thanks.
> >
> > cluster information:
> >
> > Address DC  RackStatus State   Load
>  Effective-Ownership Token
> >
>141784319550391026443072753096570088105
> > x.x.x.1DC1 r1  Up Normal  587.46 GB   33.33%
>  0
> > x.x.x.2DC1 r2  Up Normal  591.21 GB   33.33%
>  28356863910078205288614550619314017621
> > x.x.x.3DC1 r1  Up Normal  594.97 GB   33.33%
>  56713727820156410577229101238628035242
> > x.x.x.4DC1 r2  Up Normal  587.15 GB   33.33%
>  85070591730234615865843651857942052863
> > x.x.x.5DC1 r1  Up Normal  590.26 GB   33.33%
>  113427455640312821154458202477256070484
> > x.x.x.6DC1 r2  Up Normal  583.21 GB   33.33%
>  141784319550391026443072753096570088105
> >
> >
> > [user@node]$ ./checkinfo.sh
> > *** x.x.x.4
> > Token: 85070591730234615865843651857

Re: Row caching + Wide row column family == almost crashed?

2012-12-04 Thread Yiming Sun
Yup, got it.  Thanks Aaron.


On Tue, Dec 4, 2012 at 4:47 AM, aaron morton wrote:

> I responded on your other thread.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/12/2012, at 5:31 PM, Yiming Sun  wrote:
>
> I ran into a different problem with Row cache recently, sent a message to
> the list, but it didn't get picked up.  I am hoping someone can help me
> understand the issue.  Our data also has rather wide rows, not necessarily
> in the thousands range, but definitely in the upper-hundreds levels.   They
> are hosted in v1.1.1.   I was doing a performance test and enabled off-heap
> row cache of 1GB for each of our cassandra nodes (each node has at least
> 16GB of memory).   The test code was requesting a fixed set of 5000 rows
> from the cluster and ran a few times, but using nodetool info,  the row
> cache hit rate was very low, and a few of the nodes had 0 hits despite the
> row cache being full.
>
> so what i was trying to understand is how the row cache can be full but
> with 0 hits?
>
>
> On Mon, Dec 3, 2012 at 6:55 PM, Bill de hÓra  wrote:
>
>> A Cassandra JVM will generally not function well with caches and
>> wide rows. Probably the most important thing to understand is Ed's point,
>> that the row cache caches the entire row, not just the slice that was read
>> out. What you've seen is almost exactly the observed behaviour I'd expect
>> with enabling either cache provider over wide rows.
>>
>>  - the on-heap cache will result in evictions that crush the JVM trying
>> to manage garbage. This is also the case if the rows have an uneven size
>> distribution (as small rows can push out a single large row, large rows
>> push out many small ones, etc).
>>
>>  - the off heap cache will spend a lot of time serializing and
>> deserializing wide rows, such that it can increase latency relative to just
>> reading from disk and leverage the filesystem's cache directly.
>>
>> The cache resizing behaviour does exist to preserve the server's memory,
>> but it can also cause a death spiral in the on-heap case, because a
>> relatively smaller cache may result in data being evicted more frequently.
>>  I've seen cases where sizing up the cache can stabilise a server's memory.
>>
>> This isn't just a Cassandra thing, it simply happens to be very evident
>> with that system - generally to get an effective benefit from a cache, the
>> data should be contiguously sized and not too large to allow effective
>> cache 'lining'.
>>
>> Bill
>>
>>
>> On 02/12/12 21:36, Mike wrote:
>>
>>> Hello,
>>>
>>> We recently hit an issue within our Cassandra based application.  We
>>> have a relatively new Column Family with some very wide rows (10's of
>>> thousands of columns, or more in some cases).  During a periodic
>>> activity, we query ranges of columns to retrieve various pieces of
>>> information, a segment at a time.
>>>
>>> We do these same queries frequently at various stages of the process,
>>> and I thought the application could see a performance benefit from row
>>> caching.  We have a small row cache (100MB per node) already enabled,
>>> and I enabled row caching on the new column family.
>>>
>>> The results were very negative.  When performing range queries with a
>>> limit of 200 results, for a small minority of the rows in the new column
>>> family, performance plummeted.  CPU utilization on the Cassandra node
>>> went through the roof, and it started chewing up memory.  Some queries
>>> to this column family hung completely.
>>>
>>> According to the logs, we started getting frequent GCInspector
>>> messages.  Cassandra started flushing the largest mem_tables due to
>>> hitting the "flush_largest_memtables_at" of 75%, and scaling back the
>>> key/row caches.  However, to Cassandra's credit, it did not die with an
>>> OutOfMemory error.  Its emergency measures to conserve
>>> memory worked, and the cluster stayed up and running.  No real errors
>>> showed in the logs, except for Messages getting dropped, which I believe
>>> was caused by what was going on with CPU and memory.
>>>
>>> Disabling row caching on this new column family has resolved the issue
>>> for now, but, is there something fundamental about row caching that I am
>>> missing?
>>>
>>> We are running Cassandra 1.1.2 with a 6 node cluster, with a replication
>>> factor of 3.
>>>
>>> Thanks,
>>> -Mike
>>>
>>>
>>>
>>
>
>


Re: Data backup and restore

2012-12-04 Thread Tomas Nunez
Hi

I think he was talking about the "fragmentation" of the snapshot. In
cassandra 1.0.X all ColumnFamilies are in the same directory, but in
cassandra 1.1.X each ColumnFamily is in its own directory, and snapshots of
each ColumnFamily are inside this directory.

1.0.X Snapshot directory:
/cassandra/data//snapshots/

1.1.X Snapshot directory
/cassandra/data///snapshots/

In 1.0.X you can restore a Keyspace backup by copying just one directory.
In 1.1.X it seems you need to copy one directory for each ColumnFamily,
which is a little more complicated.
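As an illustration of that per-ColumnFamily layout (a throwaway demo tree in a temp directory stands in for the real data directory; the keyspace, CF, and file names are hypothetical), a small Python sketch that collects every CF's snapshot into a single restore directory:

```python
import shutil
import tempfile
from pathlib import Path

# Demo tree mimicking the 1.1.x layout:
#   <data_dir>/<keyspace>/<cf>/snapshots/<snapshot_name>/
data_dir = Path(tempfile.mkdtemp())
dest = Path(tempfile.mkdtemp())
keyspace, snapshot = "example", "cassandra_bkup"

for cf in ("cf1", "cf2"):  # two fake column families with one snapshot each
    snapdir = data_dir / keyspace / cf / "snapshots" / snapshot
    snapdir.mkdir(parents=True)
    (snapdir / f"{cf}-he-1-Data.db").write_text("data")

# Collect every CF's snapshot into one per-keyspace restore directory.
for snapdir in (data_dir / keyspace).glob(f"*/snapshots/{snapshot}"):
    cf = snapdir.parent.parent.name  # .../<cf>/snapshots/<name>
    (dest / cf).mkdir(parents=True, exist_ok=True)
    for f in snapdir.iterdir():
        shutil.copy2(f, dest / cf)

print(sorted(p.name for p in dest.rglob("*.db")))
```

Under 1.0.x the equivalent would be a single directory copy; under 1.1.x something like this walk over each CF directory is needed.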



2012/12/1 Tyler Hobbs 

> The nodetool snapshot command has keyspace and column family options (from
> nodetool --help):
>
> snapshot [keyspaces...] -cf [columnfamilyName] -t [snapshotName] - Take a
> snapshot of the optionally specified column family of the specified
> keyspaces using optional name snapshotName
>
>
> On Wed, Nov 28, 2012 at 5:40 AM, Adeel Akbar <
> adeel.ak...@panasiangroup.com> wrote:
>
>>  Dear All,
>>
>> I have a Cassandra 1.1.4 cluster with 2 nodes. I need to take a backup and
>> restore it on staging for testing purposes. I took a snapshot with the below
>> mentioned command, but it created a snapshot of every Keyspace's column
>> family. Is there a quicker way to take a backup and restore it?
>>
>> /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost snapshot -t
>> cassandra_bkup
>>
>> *Snapshot directory:*
>> /var/log/cassandra/data//>
>> --
>>
>>
>> Thanks & Regards
>>
>> *Adeel** Akbar*
>>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>
>


-- 
Tomàs Núñez
IT-Sysprod, Groupalia
www.groupalia.com
Tel. +34 93 159 31 00  Fax. +34 93 396 18 52
Llull, 95-97, 2º planta, 08005 Barcelona
Skype: tomas.nunez.groupalia
tomas.nu...@groupalia.com

Re: Row caching + Wide row column family == almost crashed?

2012-12-04 Thread Mike

Thanks for all the responses!

On 12/3/2012 6:55 PM, Bill de hÓra wrote:
A Cassandra JVM will generally not function well with caches and 
wide rows. Probably the most important thing to understand is Ed's 
point, that the row cache caches the entire row, not just the slice 
that was read out. What you've seen is almost exactly the observed 
behaviour I'd expect with enabling either cache provider over wide rows.


 - the on-heap cache will result in evictions that crush the JVM 
trying to manage garbage. This is also the case if the rows have an 
uneven size distribution (as small rows can push out a single large 
row, large rows push out many small ones, etc).


 - the off heap cache will spend a lot of time serializing and 
deserializing wide rows, such that it can increase latency relative to 
just reading from disk and leverage the filesystem's cache directly.


The cache resizing behaviour does exist to preserve the server's 
memory, but it can also cause a death spiral in the on-heap case, 
because a relatively smaller cache may result in data being evicted 
more frequently.  I've seen cases where sizing up the cache can 
stabilise a server's memory.


This isn't just a Cassandra thing, it simply happens to be very 
evident with that system - generally to get an effective benefit from 
a cache, the data should be contiguously sized and not too large to 
allow effective cache 'lining'.


Bill

On 02/12/12 21:36, Mike wrote:

Hello,

We recently hit an issue within our Cassandra based application.  We
have a relatively new Column Family with some very wide rows (10's of
thousands of columns, or more in some cases).  During a periodic
activity, we range over the columns to retrieve various pieces of
information, a segment at a time.

We do these same queries frequently at various stages of the process,
and I thought the application could see a performance benefit from row
caching.  We have a small row cache (100MB per node) already enabled,
and I enabled row caching on the new column family.

The results were very negative.  When performing range queries with a
limit of 200 results, for a small minority of the rows in the new column
family, performance plummeted.  CPU utilization on the Cassandra node
went through the roof, and it started chewing up memory.  Some queries
to this column family hung completely.

According to the logs, we started getting frequent GCInspector
messages.  Cassandra started flushing the largest mem_tables due to
hitting the "flush_largest_memtables_at" of 75%, and scaling back the
key/row caches.  However, to Cassandra's credit, it did not die with an
OutOfMemory error.  Its emergency measures to conserve memory worked,
and the cluster stayed up and running.  No real errors showed in the
logs, except for messages getting dropped, which I believe was caused
by what was going on with CPU and memory.

Disabling row caching on this new column family has resolved the issue
for now, but, is there something fundamental about row caching that I am
missing?

We are running Cassandra 1.1.2 with a 6 node cluster, with a replication
factor of 3.

Thanks,
-Mike








Diagnosing memory issues

2012-12-04 Thread Mike

Hello,

Our Cassandra cluster has, relatively recently, started experiencing 
memory pressure that I am in the midst of diagnosing.  Our system has 
uneven levels of traffic, relatively light during the day, but extremely 
heavy during some overnight processing.  We have started getting a message:


WARN [ScheduledTasks:1] 2012-12-04 09:08:58,579 GCInspector.java (line 
145) Heap is 0.7520105072262254 full.  You may need to reduce memtable 
and/or cache sizes.  Cassandra will now flush up to the two largest 
memtables to free up memory.  Adjust flush_largest_memtables_at 
threshold in cassandra.yaml if you don't want Cassandra to do this 
automatically


I've started implementing some instrumentation to gather stats from JMX 
to determine what is happening.  However, last night, the GCInspector 
was kind enough to log the information below.  A couple of things jumped 
out at me.


The maximum heap for the Cassandra is 4GB.  We are running Cassandra 
1.1.2, on a 6 node cluster, with a replication factor of 3.  All our 
queries use LOCAL_QUORUM consistency.


Adding up the caches + the memtable "data" in the trace below, comes to 
under 600MB


The number that really jumps out at me below is the number of Pending 
requests for the Message Service.  24,000+ pending requests.


Does this number represent the number of outstanding client requests 
that this node is processing?  If so, does this mean we potentially have 
24,000 responses being pulled into memory, thereby causing this memory 
issue?  What else should I look at?


INFO [ScheduledTasks:1] 2012-12-04 09:00:37,585 StatusLogger.java (line 
57) Pool NameActive   Pending   Blocked
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,695 StatusLogger.java 
(line 72) ReadStage3266 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,696 StatusLogger.java 
(line 72) RequestResponseStage  0   193 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,696 StatusLogger.java 
(line 72) ReadRepairStage   0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,696 StatusLogger.java 
(line 72) MutationStage 2 2 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,697 StatusLogger.java 
(line 72) ReplicateOnWriteStage 5 5 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,698 StatusLogger.java 
(line 72) GossipStage   013 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,698 StatusLogger.java 
(line 72) AntiEntropyStage  0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,698 StatusLogger.java 
(line 72) MigrationStage0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,699 StatusLogger.java 
(line 72) StreamStage   0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,699 StatusLogger.java 
(line 72) MemtablePostFlusher   0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,699 StatusLogger.java 
(line 72) FlushWriter   0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,700 StatusLogger.java 
(line 72) MiscStage 0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,700 StatusLogger.java 
(line 72) commitlog_archiver0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,700 StatusLogger.java 
(line 72) InternalResponseStage 0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,701 StatusLogger.java 
(line 72) AntiEntropySessions   0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,701 StatusLogger.java 
(line 72) HintedHandoff 0 0 0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,702 StatusLogger.java 
(line 77) CompactionManager 2 4
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,702 StatusLogger.java 
(line 89) MessagingServicen/a24,229
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,702 StatusLogger.java 
(line 99) Cache Type Size Capacity KeysToSave Provider
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,702 StatusLogger.java 
(line 100) KeyCache2184533 2184533 all
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,703 StatusLogger.java 
(line 106) RowCache   52385581 
52428800  all 
org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,703 StatusLogger.java 
(line 113) ColumnFamilyMemtable ops,data
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,703 StatusLogger.java 
(line 116) system.NodeIdInfo 0,0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,704 StatusLogger.java 
(line 116) system.IndexInfo  0,0
 INFO [ScheduledTasks:1] 2012-12-04 09:00:37,705 StatusLogger.java 
(line 116

[BETA RELEASE] Apache Cassandra 1.2.0-beta3 released

2012-12-04 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the third beta for
the future Apache Cassandra 1.2.0.

Let me first stress that this is beta software and as such is *not* ready
for
production use.

This release is still beta and as such may contain bugs. Any help testing
this beta would be gladly appreciated, and if you encounter any problem
during your testing, please report[3,4] it. Be sure to take a look at the
change
log[1] and the release notes[2] to see where Cassandra 1.2 differs from the
previous series.

Apache Cassandra 1.2.0-beta3[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 12x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/LEmPN (CHANGES.txt)
[2]: http://goo.gl/tI66z (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.2.0-beta3


Re: Having issues with node token collisions when starting up Cassandra nodes in a cluster on VMWare.

2012-12-04 Thread aaron morton
> We do not want to manually set the value for initial_token for each node 
> (kind of defeats the goal of being dynamic..)
You *really* do want to do this. 
Adding without setting a token will result in an unbalanced cluster. 

The 1.1 distro includes a token generator in tools/bin/token-generator. 
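For reference, balanced tokens for RandomPartitioner are just evenly spaced points in the 2**127 token space; a sketch of the arithmetic (the helper name is ours, not the tool's):

```python
# A sketch of the arithmetic tools/bin/token-generator performs for
# RandomPartitioner: evenly spaced points in the 0 .. 2**127 - 1 token
# space. The function name is ours, not the tool's.
def balanced_tokens(node_count):
    return [i * (2 ** 127) // node_count for i in range(node_count)]

for token in balanced_tokens(4):
    print(token)
# set each node's initial_token in cassandra.yaml to one of these values
```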

> 1) Kill all cassandra instances and delete data & commit log files on each 
> node.
Did you delete the System keyspace data ?

> 3) Run nodetool -h W.W.W.W  ring and see:
> -
> Address DC  RackStatus State   Load
> Effective-Ownership Token
> S.S.S.S datacenter1 rack1   Up Normal  28.37 GB
> 100.00% 24360745721352799263907128727168388463
Is the W.W.W.W machine a different node running in the cluster? Or was 
nodetool run on S.S.S.S?

>  INFO [GossipStage:1] 2012-11-29 21:16:02,195 StorageService.java (line 1138) 
> Nodes /X.X.X.X and /Y.Y.Y.Y have the same token 
> 113436792799830839333714191906879955254.  /X.X.X.X is the new owner
This looks like the previous ring state was read from the system keyspace. 
Either by one of the other nodes, which then gossiped it around, or by this one.


When a node automatically selects a token at bootstrap it logs a message such 
as "New token will be {} to assume load from {}". Do you see that? If not, the 
token has been read from the system KS. 

Hope that helps

 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/12/2012, at 11:46 AM, John Buczkowski  
wrote:

> Hi:
> Are there any known issues with initial_token collision when adding nodes to 
> a cluster in a VM environment?
> I'm working on a 4 node cluster set up on a VM. We're running into issues 
> when we attempt to add nodes to the cluster.
> In the cassandra.yaml file, initial_token is left blank.
> Since we're running > 1.0 cassandra, auto_bootstrap should be true by default.
> 
> It's my understanding that each of the nodes in the cluster should be 
> assigned an initial token at startup.
> This is not what we're currently seeing. 
> We do not want to manually set the value for initial_token for each node 
> (kind of defeats the goal of being dynamic..)
> We also have set the partitioner to random:  partitioner: 
> org.apache.cassandra.dht.RandomPartitioner
> I've outlined the steps we follow and results we are seeing below.
> Can someone please advise as to what we're missing here?
> 
> Here are the detailed steps we are taking:
> 1) Kill all cassandra instances and delete data & commit log files on each 
> node.
> 2) Startup Seed Node (S.S.S.S)
> -
> Starts up fine.
> 3) Run nodetool -h W.W.W.W  ring and see:
> -
> Address DC  RackStatus State   Load
> Effective-Ownership Token
> S.S.S.S datacenter1 rack1   Up Normal  28.37 GB
> 100.00% 24360745721352799263907128727168388463
> 
> 4) X.X.X.X Startup
> -
>  INFO [GossipStage:1] 2012-11-29 21:16:02,194 Gossiper.java (line 850) Node 
> /X.X.X.X is now part of the cluster
>  INFO [GossipStage:1] 2012-11-29 21:16:02,194 Gossiper.java (line 816) 
> InetAddress /X.X.X.X is now UP
>  INFO [GossipStage:1] 2012-11-29 21:16:02,195 StorageService.java (line 1138) 
> Nodes /X.X.X.X and /Y.Y.Y.Y have the same token 
> 113436792799830839333714191906879955254.  /X.X.X.X is the new owner
>  WARN [GossipStage:1] 2012-11-29 21:16:02,195 TokenMetadata.java (line 160) 
> Token 113436792799830839333714191906879955254 changing ownership from 
> /Y.Y.Y.Y to /X.X.X.X
> 5) Run nodetool -h W.W.W.W  ring and see:
> -
> Address DC  RackStatus State   Load
> Effective-Ownership Token
>   
>  113436792799830839333714191906879955254
> S.S.S.S datacenter1 rack1   Up Normal  28.37 GB
> 100.00% 24360745721352799263907128727168388463
> W.W.W.W datacenter1 rack1   Up Normal  123.87 KB   
> 100.00% 113436792799830839333714191906879955254
> 
> 6) Y.Y.Y.Y Startup
> -
>  INFO [GossipStage:1] 2012-11-29 21:17:36,458 Gossiper.java (line 850) Node 
> /Y.Y.Y.Y is now part of the cluster
>  INFO [GossipStage:1] 2012-11-29 21:17:36,459 Gossiper.java (line 816) 
> InetAddress /Y.Y.Y.Y is now UP
>  INFO [GossipStage:1] 2012-11-29 21:17:36,459 StorageService.java (line 1138) 
> Nodes /Y.Y.Y.Y and /X.X.X.X have the same token 
> 113436792799830839333714191906879955254.  /Y.Y.Y.Y is the new owner
>  WARN [GossipStage:1] 2012-11-29 21:17:36,459 TokenMetadata.java (line 160) 
> Token 113436792799830839333714191906879955254 changing ownership from 
> /X.X.X.X to /Y.Y.Y.Y
> 
> 7) Run nodetool -h W.W.W.W  ring and see:
> ---

Re: Help on MMap of SSTables

2012-12-04 Thread aaron morton
> Will MMapping data files be detrimental for reads, in this case?
No. 

> In general, when should we opt for MMap data files and what are the factors 
> that need special attention when enabling the same?
mmapping is the default, so I would say use it until you have a reason not to. 

mmapping will map the entire file, but pages of data are read into memory on 
demand and purged when space is needed. 
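The demand-paging behaviour is easy to demonstrate in isolation (nothing Cassandra-specific here): map a file far larger than what you read, and only the touched pages are faulted in.

```python
import mmap
import os
import tempfile

# Illustrative only (nothing Cassandra-specific): create a 16 MB file,
# map the whole thing, but read just one 4 KB slice. The mapping covers
# the full file while only the touched pages are faulted into memory.
path = os.path.join(tempfile.mkdtemp(), "sstable-like.bin")
with open(path, "wb") as f:
    f.seek(16 * 1024 * 1024 - 1)  # sparse file: no 16 MB of real I/O
    f.write(b"\x00")

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    offset = 8 * 1024 * 1024
    chunk = mm[offset:offset + 4096]  # pages in only this region
    print(len(mm), len(chunk))        # → 16777216 4096
    mm.close()
```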

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/12/2012, at 11:59 PM, Ravikumar Govindarajan 
 wrote:

> Our current SSTable sizes are far greater than RAM. {150 Gigs of data, 32GB 
> RAM}. Currently we run with mlockall and mmap_index_only options and don't 
> experience swapping at all.
> 
> We use wide rows and size-tiered-compaction, so a given key will definitely 
> be spread across multiple sstables. Will MMapping data files be detrimental 
> for reads, in this case?
> 
> In general, when should we opt for MMap data files and what are the factors 
> that need special attention when enabling the same?
> 
> --
> Ravi



Re: Nodes not synced

2012-12-04 Thread aaron morton
> Node B still does not get any update/data from Node A. Do we need to execute 
> any command to sync both nodes?
Are you seeing the MutationStage completed tasks count change in nodetool 
tpstats ?

> # /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
> Address   DC  RackStatus State   Load
> Effective-Ownership Token   
>   
>  91902851206288351623775585543017122534  
> XX.XX.XX.XXA  0   0   Up Normal  264.98 GB   
> 0.00%   59394911263811417432307015371109991999  
> XX.XX.XX.XXB  0   0   Up Normal  67.34 KB
> 0.00%   91902851206288351623775585543017122534  
Something looks odd with the effective ownership here. 
Check the schema has the RF you think it does. 

You can also run nodetool repair to make sure the data is fully distributed. 

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/12/2012, at 2:26 AM, Adeel Akbar  wrote:

> Hi,
> 
> I have set up a 2-node cluster with replication factor 2. I have restored a 
> snapshot of another cluster on Node A and restarted the cassandra process. Node 
> B still does not get any update/data from Node A. Do we need to execute any 
> command to sync both nodes?
> 
> # /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
> Address   DC  RackStatus State   Load
> Effective-Ownership Token   
>   
>  91902851206288351623775585543017122534  
> XX.XX.XX.XXA  0   0   Up Normal  264.98 GB   
> 0.00%   59394911263811417432307015371109991999  
> XX.XX.XX.XXB  0   0   Up Normal  67.34 KB
> 0.00%   91902851206288351623775585543017122534
> -- 
> 
> Looking for your prompt response. 
> 
> Adeel
> 



Re: strange row cache behavior

2012-12-04 Thread aaron morton
>  Does this mean we should not enable row caches until we are absolutely sure 
> about what's hot (I think there is a reason why row caches are disabled by 
> default) ?
Yes and Yes. 
Row cache takes memory and CPU; unless you know you are getting a benefit from 
it, leave it off. The key cache and OS disk cache will help. If you find latency 
is an issue then start poking around.
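The thrashing pattern described earlier in this thread -- reads populating the cache only to be evicted before they are read again -- can be sketched with a toy LRU (illustrative only, not Cassandra's implementation):

```python
from collections import OrderedDict

def hit_rate(capacity, keys):
    """Toy LRU keyed by entry count -- illustrative only. Cycling over a
    working set even slightly larger than the cache gives zero hits."""
    cache, hits = OrderedDict(), 0
    for k in keys:
        if k in cache:
            hits += 1
            cache.move_to_end(k)           # mark as most recently used
        else:
            cache[k] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(keys)

# 10 hot rows cycled through a cache that holds only 8: every read misses
print(hit_rate(8, list(range(10)) * 100))   # → 0.0
# the same workload with room for all 10 hits on every pass after the first
print(hit_rate(10, list(range(10)) * 100))  # → 0.99
```

This is why the 0-hit counters in the original report are consistent with a full cache: full and useful are not the same thing.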

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/12/2012, at 4:23 AM, Yiming Sun  wrote:

> Hi Aaron,
> 
> Thank you, and your explanation makes sense.  At the time, I thought having 
> 1GB of row cache on each node was plenty enough, because there was an 
> aggregated 6GB cache, but you are right, with each row in 10's of MBs, some 
> of the nodes can go into a constant load and evict cycle and would have 
> negative effects on the performance.  I will try as you suggested to 1.) 
> reduce the requested entry set, and 2.) increase the row cache size and see 
> if they get better hits, and also do 3) by reversing the requested entry list 
> in alternate runs.
> 
> Our data space has close to 3 million rows, but we haven't gotten enough 
> usage statistics to know what rows are hot.  Does this mean we should not 
> enable row caches until we are absolutely sure about what's hot (I think 
> there is a reason why row caches are disabled by default) ?  It also seems 
> from my test that OS page cache works much better, but it could be that OS 
> page cache can utilize all the available memory so it is essentially larger 
> -- I guess I will find out by doing 2.) above.
> 
> best,
> 
> -- Y.
> 
> 
> 
> On Tue, Dec 4, 2012 at 4:47 AM, aaron morton  wrote:
> > Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0 
> > hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
> 
> So the cache is pretty much full, there is only 1 MB free.
> 
> There were 2,576 read requests that tried to get a row from the cache. Zero 
> of those had a hit. If you have 6 nodes and RF 2, each node has  one third of 
> the data in the cluster (from the effective ownership info). So depending on 
> the read workload the number of read requests on each node may be different.
> 
> What I think is happening is reads are populating the row cache, then 
> subsequent reads are evicting items from the row cache before you get back to 
> reading the original rows. So if you read rows 1 to 5, they are put in the 
> cache, when you read rows 6 to 10 they are put in and evict rows 1 to 5. Then 
> you read rows 1 to 5 again they are not in the cache.
> 
> Try testing with a lower number of hot rows, and/or a bigger row cache.
> 
> But to be honest, with rows in the 10's of MB you will probably only get good 
> cache performance with a small set of hot rows.
> 
> Hope that helps.
> 
> 
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 1/12/2012, at 5:11 AM, Yiming Sun  wrote:
> 
> > Does anyone have any comments/suggestions for me regarding this?  Thanks
> >
> >
> > I am trying to understand some strange behavior of cassandra row cache.  We 
> > have a 6-node Cassandra cluster in a single data center on 2 racks, and the 
> > neighboring nodes on the ring are from alternative racks.  Each node has 
> > 1GB row cache, with key cache disabled.   The cluster uses 
> > PropertyFileSnitch, and the ColumnFamily I fetch from uses 
> > NetworkTopologyStrategy, with replication factor of 2.  My client code uses 
> > Hector to fetch a fixed set of rows from cassandra
> >
> > What I don't quite understand is even after I ran the client code several 
> > times, there are always some nodes with 0 row cache hits, despite that the 
> > row cache from all nodes are filled and all nodes receive requests.
> >
> > Which nodes have 0 hits seem to be strongly related to the following:
> >
> >  - the set of row keys to fetch
> >  - the order of the set of row keys to fetch
> >  - the list of hosts passed to Hector's CassandraHostConfigurator
> >  - the order of the list of hosts passed to Hector
> >
> > Can someone shed some lights on how exactly the row cache works and 
> > hopefully also explain the behavior I have been seeing?  I thought if the 
> > fixed set of the rows keys are the only thing I am fetching (each row 
> > should be on the order of 10's of MBs, no more than 100MB), and each node 
> > gets requests, and its row cache is filled, there's gotta be some hits.  
> > Apparently this is not the case.   Thanks.
> >
> > cluster information:
> >
> > Address DC  RackStatus State   Load
> > Effective-Ownership Token
> > 
> >141784319550391026443072753096570088105
> > x.x.x.1DC1 r1  Up Normal  587.46 GB   33.33%
> >   0
> > x

Re: Data backup and restore

2012-12-04 Thread aaron morton
I wrote a script to sym link the snapshots together the other day 
https://github.com/amorton/cass_snapshot_link

I've not really used it in anger yet; that is to say, I wrote it for fun and it 
worked on my MacBook. If you use it, let me know if it works. 
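For anyone curious what such a script involves, here is a rough sketch of the idea under the 1.1 on-disk layout (this is not the actual cass_snapshot_link code, and the helper name is ours):

```python
import os
import tempfile

def link_snapshot(data_dir, snapshot_name, dest_dir):
    """Collect a named snapshot, scattered under every keyspace/column
    family directory in the 1.1 layout
    (<data>/<keyspace>/<cf>/snapshots/<name>/), into one symlink tree
    that can be copied or archived in a single pass."""
    linked = []
    for ks in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, ks)
        if not os.path.isdir(ks_path):
            continue
        for cf in sorted(os.listdir(ks_path)):
            snap = os.path.join(ks_path, cf, "snapshots", snapshot_name)
            if not os.path.isdir(snap):
                continue
            target = os.path.join(dest_dir, ks, cf)
            os.makedirs(target, exist_ok=True)
            for name in sorted(os.listdir(snap)):
                os.symlink(os.path.join(snap, name),
                           os.path.join(target, name))
                linked.append(os.path.join(ks, cf, name))
    return linked

# tiny fixture to exercise it
root = tempfile.mkdtemp()
data = os.path.join(root, "data")
os.makedirs(os.path.join(data, "ks1", "cf1", "snapshots", "bkup"))
open(os.path.join(data, "ks1", "cf1", "snapshots", "bkup",
                  "ks1-cf1-hd-1-Data.db"), "w").close()
linked = link_snapshot(data, "bkup", os.path.join(root, "out"))
print(linked)
```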

Cheers
A

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/12/2012, at 6:51 AM, Alain RODRIGUEZ  wrote:

> Hi Adeel,
> 
> I am not sure this is the best solution but we did it this way:
> 
> On one production server :
> - $cassandra-cli -f show_schema > schema (show_schema file contains "use 
> ; show schema;")
> - Then open the schema file and remove the header lines; your file must start 
> with "create keyspace..." 
> 
> On one dev server
> - Copy the schema file
> - $cassandra-cli -f schema
> 
> Now you have your keyspace in your dev env.
> 
> if you have 1 node dev cluster and RF=total number of node in production, 
> then simply snapshot whatever you want to restore (CF or all keyspace) then 
> copy these files in the CF directories of your test cluster and refresh the 
> new sstables.
> 
> if RF < total number of node in production then you need to take the sstable 
> from various nodes and take care not overriding files with the same name 
> while copying files.
> 
> I have some bash/parallel-ssh scripts to do this, but not on this computer.
> 
> I would be glad learning how other people do this.
> 
> Alain
> 
> 
> 2012/12/4 Yang 
> my guess (from what I learnt on this forum): you probably have to manually 
> create the schema on the new cluster. shutdown new cluster. overwrite the 
> column family files with your backup on all nodes in the new cluster, then 
> boot up.
> 
> 
> On Tue, Dec 4, 2012 at 8:19 AM, Adeel Akbar  
> wrote:
> Hi Tomas,
> 
> You are right and now my question is how I restore on test cluster. Do I need 
> to create column families and then copy snapshot on each directory?
> 
> 
> Thanks & Regards
> 
> Adeel Akbar
> 
> On 12/4/2012 9:09 PM, Tomas Nunez wrote:
>> Hi
>> 
>> I think he was talking about the "fragmentation" of the snapshot. In 
>> cassandra 1.0.X all ColumnFamilies are in the same directory, but in 
>> cassandra 1.1.X each ColumnFamily is in its own directory, and snapshots of 
>> each ColumnFamily are inside this directory.
>> 
>> 1.0.X Snapshot directory:
>> /cassandra/data//snapshots/
>> 
>> 1.1.X Snapshot directory
>> /cassandra/data///snapshots/
>> 
>> In 1.0.X you can restore a Keyspace backup by copying just one directory. In 
>> 1.1.X it seems you need to copy one directory for each ColumnFamily, which 
>> is a little more complicated.
>> 
>> 
>> 
>> 2012/12/1 Tyler Hobbs 
>> The nodetool snapshot command has keyspace and column family options (from 
>> nodetool --help):
>> 
>> snapshot [keyspaces...] -cf [columnfamilyName] -t [snapshotName] - Take a 
>> snapshot of the optionally specified column family of the specified 
>> keyspaces using optional name snapshotName
>> 
>> 
>> On Wed, Nov 28, 2012 at 5:40 AM, Adeel Akbar  
>> wrote:
>> Dear All, 
>> 
>> I have a Cassandra 1.1.4 cluster with 2 nodes. I need to take a backup and 
>> restore it on staging for testing purposes. I have taken a snapshot with the 
>> below mentioned command, but it created a snapshot for every keyspace's column 
>> family. Is there a quicker way to take a backup and restore it? 
>> 
>> /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost snapshot -t 
>> cassandra_bkup
>> 
>> Snapshot directory:
>> /var/log/cassandra/data//> 
>> -- 
>> 
>> Thanks & Regards
>> 
>> Adeel Akbar
>> 
>> 
>> 
>> 
>> -- 
>> Tyler Hobbs
>> DataStax
>> 
>> 
>> 
>> 
>> -- 
>> 
>> www.groupalia.com
>> Tomàs Núñez
>> IT-Sysprod
>> Tel. + 34 93 159 31 00 
>> Fax. + 34 93 396 18 52
>> Llull, 95-97, 2º planta, 08005 Barcelona
>> Skype: tomas.nunez.groupalia
>> tomas.nu...@groupalia.com
> 
> 
> 



Re: Diagnosing memory issues

2012-12-04 Thread aaron morton
For background, a discussion on estimating working set 
http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html . You can 
also just look at the size of tenured heap after a CMS. 

Are you seeing lots of ParNew or CMS ?

GC activity is a result of configuration *and* workload. Look in your data 
model for wide rows, or long lived rows that get a lot of deletes, and look in 
your code for large reads / writes (e.g. sometimes we read 100,000 columns from 
a row).

> The number that really jumps out at me below is the number of Pending 
> requests for the Message Service.  24,000+ pending requests.
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,702 StatusLogger.java (line 89) 
> MessagingServicen/a24,229
Technically speaking that ain't right. 
The whole server looks unhappy. 

Are there any errors in the logs ? 
Are all the nodes up ? 

A very blunt approach is to reduce the in_memory_compaction_limit and the 
concurrent_compactors or compaction_throughput_mb_per_sec. This reduces the 
impact compaction and repair have on the system and may give you breathing 
space to look at other causes. Once you have a feel for what's going on you can 
turn them up. 

Hope that helps. 
A

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/12/2012, at 7:04 AM, Mike  wrote:

> Hello,
> 
> Our Cassandra cluster has, relatively recently, started experiencing memory 
> pressure that I am in the midst of diagnosing.  Our system has uneven levels 
> of traffic, relatively light during the day, but extremely heavy during some 
> overnight processing.  We have started getting a message:
> 
> WARN [ScheduledTasks:1] 2012-12-04 09:08:58,579 GCInspector.java (line 145) 
> Heap is 0.7520105072262254 full.  You may need to reduce memtable and/or 
> cache sizes.  Cassandra will now flush up to the two largest memtables to 
> free up memory.  Adjust flush_largest_memtables_at threshold in 
> cassandra.yaml if you don't want Cassandra to do this automatically
> 
> I've started implementing some instrumentation to gather stats from JMX to 
> determine what is happening.  However, last night, the GCInspector was kind 
> enough to log the information below.  Couple of things jumped out at me.
> 
> The maximum heap for the Cassandra is 4GB.  We are running Cassandra 1.1.2, 
> on a 6 node cluster, with a replication factor of 3.  All our queries use 
> LOCAL_QUORUM consistency.
> 
> Adding up the caches + the memtable "data" in the trace below, comes to under 
> 600MB
> 
> The number that really jumps out at me below is the number of Pending 
> requests for the Message Service.  24,000+ pending requests.
> 
> Does this number represent the number of outstanding client requests that 
> this node is processing?  If so, does this mean we potentially have 24,000 
> responses being pulled into memory, thereby causing this memory issue?  What 
> else should I look at?
> 
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,585 StatusLogger.java (line 57) 
> Pool NameActive   Pending   Blocked
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,695 StatusLogger.java (line 72) 
> ReadStage3266 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,696 StatusLogger.java (line 72) 
> RequestResponseStage  0   193 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,696 StatusLogger.java (line 72) 
> ReadRepairStage   0 0 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,696 StatusLogger.java (line 72) 
> MutationStage 2 2 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,697 StatusLogger.java (line 72) 
> ReplicateOnWriteStage 5 5 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,698 StatusLogger.java (line 72) 
> GossipStage   013 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,698 StatusLogger.java (line 72) 
> AntiEntropyStage  0 0 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,698 StatusLogger.java (line 72) 
> MigrationStage0 0 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,699 StatusLogger.java (line 72) 
> StreamStage   0 0 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,699 StatusLogger.java (line 72) 
> MemtablePostFlusher   0 0 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,699 StatusLogger.java (line 72) 
> FlushWriter   0 0 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,700 StatusLogger.java (line 72) 
> MiscStage 0 0 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,700 StatusLogger.java (line 72) 
> commitlog_archiver0 0 0
> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,700 StatusLogger.java (line 72) 
> InternalResponseS

Re: strange row cache behavior

2012-12-04 Thread Yiming Sun
Got it.  Thanks again, Aaron.

-- Y.


On Tue, Dec 4, 2012 at 3:07 PM, aaron morton wrote:

>  Does this mean we should not enable row caches until we are absolutely
> sure about what's hot (I think there is a reason why row caches are
> disabled by default) ?
>
> Yes and Yes.
> Row cache takes memory and CPU, unless you know you are getting a benefit
> from it leave it off. The key cache and os disk cache will help. If you
> find latency is an issue then start poking around.
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/12/2012, at 4:23 AM, Yiming Sun  wrote:
>
> Hi Aaron,
>
> Thank you, and your explanation makes sense.  At the time, I thought having
> 1GB of row cache on each node was plenty enough, because there was an
> aggregated 6GB cache, but you are right, with each row in 10's of MBs, some
> of the nodes can go into a constant load and evict cycle and would have
> negative effects on the performance.  I will try as you suggested to 1.)
> reduce the requested entry set, and 2.) increase the row cache size and see
> if they get better hits, and also do 3) by reversing the requested entry
> list in alternate runs.
>
> Our data space has close to 3 million rows, but we haven't gotten enough
> usage statistics to know what rows are hot.  Does this mean we should not
> enable row caches until we are absolutely sure about what's hot (I think
> there is a reason why row caches are disabled by default) ?  It also seems
> from my test that OS page cache works much better, but it could be that OS
> page cache can utilize all the available memory so it is essentially larger
> -- I guess I will find out by doing 2.) above.
>
> best,
>
> -- Y.
>
>
>
> On Tue, Dec 4, 2012 at 4:47 AM, aaron morton wrote:
>
>> > Row Cache: size 1072651974 (bytes), capacity 1073741824
>> (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in
>> seconds
>>
>> So the cache is pretty much full, there is only 1 MB free.
>>
>> There were 2,576 read requests that tried to get a row from the cache.
>> Zero of those had a hit. If you have 6 nodes and RF 2, each node has  one
>> third of the data in the cluster (from the effective ownership info). So
>> depending on the read workload the number of read requests on each node may
>> be different.
>>
>> What I think is happening is reads are populating the row cache, then
>> subsequent reads are evicting items from the row cache before you get back
>> to reading the original rows. So if you read rows 1 to 5, they are put in
>> the cache, when you read rows 6 to 10 they are put in and evict rows 1 to
>> 5. Then you read rows 1 to 5 again they are not in the cache.
>>
>> Try testing with a lower number of hot rows, and/or a bigger row cache.
>>
>> But to be honest, with rows in the 10's of MB you will probably only get
>> good cache performance with a small set of hot rows.
>>
>> Hope that helps.
>>
>>
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 1/12/2012, at 5:11 AM, Yiming Sun  wrote:
>>
>> > Does anyone have any comments/suggestions for me regarding this?  Thanks
>> >
>> >
>> > I am trying to understand some strange behavior of cassandra row cache.
>>  We have a 6-node Cassandra cluster in a single data center on 2 racks, and
>> the neighboring nodes on the ring are from alternative racks.  Each node
>> has 1GB row cache, with key cache disabled.   The cluster uses
>> PropertyFileSnitch, and the ColumnFamily I fetch from uses
>> NetworkTopologyStrategy, with replication factor of 2.  My client code uses
>> Hector to fetch a fixed set of rows from cassandra
>> >
>> > What I don't quite understand is even after I ran the client code
>> several times, there are always some nodes with 0 row cache hits, despite
>> that the row cache from all nodes are filled and all nodes receive requests.
>> >
>> > Which nodes have 0 hits seems to be strongly related to the following:
>> >
>> >  - the set of row keys to fetch
>> >  - the order of the set of row keys to fetch
>> >  - the list of hosts passed to Hector's CassandraHostConfigurator
>> >  - the order of the list of hosts passed to Hector
>> >
>> > Can someone shed some light on how exactly the row cache works and
>> hopefully also explain the behavior I have been seeing?  I thought that if
>> this fixed set of row keys is the only thing I am fetching (each row
>> should be on the order of tens of MBs, no more than 100MB), each node
>> gets requests, and its row cache is filled, there would have to be some hits.
>> Apparently this is not the case.  Thanks.
>> >
>> > cluster information:
>> >
>> > Address DC  RackStatus State   Load
>>  Effective-Ownership Token
>> >
>>141784319550391026443072753096570088105
>> > x.x.x.1DC1 r1  Up Normal  587.46 GB
>> 33.33%

Fwd: Loading SSTables failing via Cassandra SSTableLoader on mulit-node cluster.

2012-12-04 Thread Pradeep Kumar Mantha
Hi!

I am trying to load generated sstables onto a running multi-node
Cassandra cluster, but I see problems only with the multi-node cluster;
loading to a single node works fine.

The Cassandra version used is 1.1.2.
The Cassandra cluster appears to be active:

-bash-3.2$ nodetool -host 129.56.57.45 -p 7199 ring
Address DC  RackStatus State   Load
Effective-Ownership Token

13087783343113017825514407978144931209
129.56.57.45datacenter1 rack1   Up Normal  57.49 KB
92.31%  0
129.56.57.46datacenter1 rack1   Up Normal  50.6 KB
7.69%   13087783343113017825514407978144931209
-bash-3.2$
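As an aside, the Effective-Ownership above (92.31% / 7.69%) shows the two initial tokens are far from balanced, though this is separate from the streaming failure. For RandomPartitioner, evenly spaced tokens for N nodes are `i * 2**127 / N`; a quick sketch:

```python
def balanced_tokens(node_count):
    """Evenly spaced initial tokens for RandomPartitioner (ring spans 0..2**127)."""
    ring = 2 ** 127
    return [i * ring // node_count for i in range(node_count)]

# For the 2-node cluster above:
for token in balanced_tokens(2):
    print(token)
# 0
# 85070591730234615865843651857942052864

# The second token actually in use covers only a small slice of the ring:
print(round(13087783343113017825514407978144931209 / 2 ** 127 * 100, 2))  # 7.69
```

The 7.69% figure computed from the in-use token matches the effective ownership nodetool reports for 129.56.57.46.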


I tried sstableloader from a Cassandra node (129.56.57.45) and from
another outside machine, but I get the same error in both cases.


Error:

-bash-3.2$ sstableloader -d 129.56.57.45 Blast/Blast_NR/
log4j:WARN No appenders could be found for logger
(org.apache.cassandra.io.sstable.SSTableReader).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
for more info.
Streaming revelant part of Blast/Blast_NR/Blast-Blast_NR-hd-1-Data.db
to [/129.56.57.46, /129.56.57.45]

progress: [/129.56.57.46 0/0 (100)] [/129.56.57.45 0/1 (0)] [total: 0
- 0MB/s (avg: 0MB/s)]Streaming session to /129.56.57.45 failed
Exception in thread "Streaming to /129.56.57.45:1"
java.lang.RuntimeException: java.net.ConnectException: Connection
timed out
at 
org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at java.net.Socket.connect(Socket.java:469)
at java.net.Socket.&lt;init&gt;(Socket.java:366)Streaming session to
/129.56.57.46 failed

at java.net.Socket.&lt;init&gt;(Socket.java:267)
at 
org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:96)
at 
org.apache.cassandra.streaming.FileStreamTask.connectAttempt(FileStreamTask.java:245)
at 
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more
Exception in thread "Streaming to /129.56.57.46:1"
java.lang.RuntimeException: java.net.ConnectException: Connection
timed out
at 
org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at java.net.Socket.connect(Socket.java:469)
at java.net.Socket.&lt;init&gt;(Socket.java:366)
at java.net.Socket.&lt;init&gt;(Socket.java:267)
at 
org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:96)
at 
org.apache.cassandra.streaming.FileStreamTask.connectAttempt(FileStreamTask.java:245)
at 
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more
progress: [/129.56.57.46 0/0 (100)] [/129.56.57.45 0/1 (0)] [total: 0
- 0MB/s (avg: 0MB/s)]Streaming to the following hosts failed:
[/129.56.57.45, /129.56.57.46]
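A "Connection timed out" during streaming usually means the storage_port (7000 in the config below) is not reachable from the loading machine, e.g. blocked by a firewall. A quick reachability check, sketched as a hypothetical helper (any equivalent of `telnet host 7000` works just as well):

```python
import socket

def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From the loading machine, check the storage port on every target node, e.g.:
#   port_open("129.56.57.45", 7000)
#   port_open("129.56.57.46", 7000)
```

If either check fails, open storage_port between the hosts before retrying the load; the rpc_port (9160) being open is not sufficient, since sstableloader streams over the storage port.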



Configuration on 129.56.57.45(cassandra.yaml):

rpc_address: 129.56.57.45
listen_address: 129.56.57.45
storage_port: 7000
rpc_port: 9160
seed_provider:
# Addresses of hosts that are deemed contact points.
# Cassandra nodes use this list of hosts to find each other and learn
# the topology of the ring.  You must change this if you are running
# multiple no