Re: Losing keyspace on cassandra upgrade

2012-09-20 Thread Thomas Stets
On Wed, Sep 19, 2012 at 5:12 PM, Michael Kjellman
wrote:

> Sounds like you are losing your system keyspace. When you say nothing
> important changed between yaml files do you mean with or without your
> changes?
>

I compared the 1.1.1 cassandra.yaml (with my changes) to the cassandra.yaml
distributed with 1.1.5. The only differences were my changes (hosts, ports
and paths) and some comments.



>
> Did your data directories change in the migration? Permissions okay?
>

The data directory containing my keyspace has not changed. Directly after
startup Cassandra began a compaction of its system keyspace (something I saw
in all cases), so that obviously has changed. Permissions are OK.


  Thomas


Re: Invalid Counter Shard errors?

2012-09-20 Thread Alain RODRIGUEZ
"I think that's inconsistent with the hypothesis that unclean shutdown is
the sole cause of these problems"

I agree; we have never shut down any node, nor had any crash, and yet we
have these bugs.

About your side note:

We know about it, but we couldn't find any other way to provide real-time
analytics. If you know of one, we would be really glad to hear about it.
We need both to serve statistics in real time and to be accurate about
prices, and we need consistency between what's shown in our graphs and
tables and the invoices we provide to our customers.
What we do is try to avoid timeouts as much as possible (increasing the
time before a timeout and keeping the CPU load as low as possible). In
order to keep latency low for the user, we first write the events to a
message queue (Kestrel) and then process them with Storm, which writes the
events and increments counters in Cassandra.

Once again, if you have an idea for a better way of doing this, we are
always happy to learn and to try to improve our architecture and our process.

Alain

2012/9/20 Peter Schuller 

> The significance I think is: If it is indeed the case that the higher
> value is always *in fact* correct, I think that's inconsistent with
> the hypothesis that unclean shutdown is the sole cause of these
> problems - as long as the client is truly submitting non-idempotent
> counter increments without a read-before-write.
>
> As a side note: If you're using these counters for stuff like
> determining amounts of money to be paid by somebody, consider the
> non-idempotence of counter increments. Any write that increments a
> counter, that fails by e.g. Timeout *MAY OR MAY NOT* have been applied
> and cannot be safely retried. Cassandra counters are generally not
> useful if *strict* correctness is desired, for this reason.
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>
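For readers hitting the same accuracy requirement, one commonly suggested
alternative is to make the billable writes idempotent and keep counters only
for the fast, approximate view. A minimal CQL3 sketch (not from the thread;
table and column names are invented) stores each event under a
client-generated id, so a retry after a timeout simply overwrites the same
row instead of double-counting:

  -- Sketch only: each event is keyed by a client-generated event_id,
  -- so the exact same INSERT can be retried safely after a timeout.
  CREATE TABLE billing_events (
    account_id uuid,
    event_id   uuid,   -- generated once per event, reused on retry
    amount     int,
    PRIMARY KEY (account_id, event_id)
  );

  INSERT INTO billing_events (account_id, event_id, amount)
  VALUES (f47ac10b-58cc-4372-a567-0e02b2c3d479,
          9a7b2c44-1d3e-4c5f-8a6b-2e1f0d9c8b7a, 15);

Exact totals for invoicing can then be computed by summing the events (for
example in the Storm topology or a batch job), while counters keep serving
the real-time, slightly lossy graphs.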


Re: CQL3 - collections

2012-09-20 Thread Sylvain Lebresne
I wrote an answer on the blog post
(http://www.datastax.com/dev/blog/cql3_collections#comment-127093).

--
Sylvain

On Thu, Sep 20, 2012 at 7:13 AM, Roshni Rajagopal
 wrote:
> Hi,
>
>  CQL3, has collections support as described in this link
> http://www.datastax.com/dev/blog/cql3_collections
>
> So looks like when you have a column called email and you could store 1
> value called a...@xyz.com, now you can store a list.
> would it be possible to store a name value pair, like column name is Item Id
> and value is a collection with 2 elements in each row like {{'name' ,
> 'apple'}, {'Descr' , 'Granny Smith'}, {'Qty','3'} }
>
> or would we need to have a column named Item Id and value is {'apple',
> 'granny smith', '3'}. and the application needs to figure out that 1st
> element is name, second is descr etc
>
> Also can you store different data types in each row of the list or do all
> elements need to be of the same type?
>
> I'm guessing we can't do both, but can anyone confirm?
>
>
>
>
> Regards,
> Roshni
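The answer itself is in the blog comment linked above, but for reference the
CQL3 collections shipping with Cassandra 1.2 include a map type that matches
the name/value shape in the question. A small sketch (table name and values
invented); note that all map keys share one declared type and all values
share another, so mixed value types per entry are not supported:

  CREATE TABLE items (
    item_id uuid PRIMARY KEY,
    attrs   map<text, text>
  );

  INSERT INTO items (item_id, attrs)
  VALUES (62c36092-82a1-4f35-8aae-b4b2a5f6b1c4,
          {'name': 'apple', 'descr': 'Granny Smith', 'qty': '3'});

  -- A single entry can be updated without rewriting the whole map:
  UPDATE items SET attrs['qty'] = '4'
  WHERE item_id = 62c36092-82a1-4f35-8aae-b4b2a5f6b1c4;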


Re: Data Modeling - JSON vs Composite columns

2012-09-20 Thread Sylvain Lebresne
On Wed, Sep 19, 2012 at 2:00 PM, Roshni Rajagopal
 wrote:
> Hi,
>
> There was a conversation on this some time earlier, and to continue it
>
> Suppose I want to associate a user to  an item, and I want to also store 3
> commonly used attributes without needing to go to an entity item column
> family , I have 2 options :-
>
> A) use composite columns
> UserId1 : {
>   <itemId1>:<name> = Betty Crocker,
>   <itemId1>:<descr> = Cake,
>   <itemId1>:<qty> = 5,
>   <itemId2>:<name> = Nutella,
>   <itemId2>:<descr> = Choc spread,
>   <itemId2>:<qty> = 15
> }
>
> B) use a json with the data
> UserId1 : {
>   <itemId1> = {name: Betty Crocker, descr: Cake, Qty: 5},
>   <itemId2> = {name: Nutella, descr: Choc spread, Qty: 15}
> }

> How does approach B work in CQL

The way CQL3 handles this (the precision is because CQL2 basically doesn't
have any good solution for this problem) is by using composite columns,
i.e. it uses solution A). However, the whole point is that the syntax is, we
think, much more friendly than if you were to use composites yourself in,
say, Thrift.

More concretely, you'd handle your 'shopping cart' example with the following
table:
  CREATE TABLE shopping_cart (
  userId uuid,
  itemId uuid,
  name text,
  descr text,
  qty int,
  PRIMARY KEY (userId, itemId)
  )

The way this will be laid out internally is pretty much exactly the layout
you've described for A).
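To make the mapping concrete, a short illustration of how that table is used
(the uuids and values below are invented):

  INSERT INTO shopping_cart (userId, itemId, name, descr, qty)
  VALUES (c37d661d-7e61-49ea-96a5-68c34e83db3a,
          6fa459ea-ee8a-3ca4-894e-db77e160355e,
          'Betty Crocker', 'Cake', 5);

  -- All items in one user's cart come back as ordinary rows:
  SELECT itemId, name, descr, qty FROM shopping_cart
  WHERE userId = c37d661d-7e61-49ea-96a5-68c34e83db3a;

Each such CQL row becomes a set of itemId-prefixed composite columns inside
the single userId row, i.e. exactly option A) above.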

> Can we read/write a JSON easily in CQL?

There is no specific support for JSON in CQL (or anywhere else in the
Cassandra data model, for that matter). You can obviously insert a JSON
string, but Cassandra won't use it in any way (so you'd have to read the
string back and extract whatever field you're interested in client side).
Using the "CQL" way described above is the preferred approach.

--
Sylvain


Re: Invalid Counter Shard errors?

2012-09-20 Thread Alain RODRIGUEZ
Oh, I just saw your first mail.

"I don't see a negative number in you paste?"

(03a227f0-a5c3-11e1--b7f5e49dceff, 1, -1) and
(03a227f0-a5c3-11e1--b7f5e49dceff,
1, 1)
(03a227f0-a5c3-11e1--b7f5e49dceff, 4, -5000) and
(03a227f0-a5c3-11e1--b7f5e49dceff, 4, 2)
(03a227f0-a5c3-11e1--b7f5e49dceff, 19, -3) and
(03a227f0-a5c3-11e1--b7f5e49dceff,
19, 19)

The counts in the left-hand parentheses are negative values, and we never
decrement counters.

Thanks for your explanations.

Alain

2012/9/20 Alain RODRIGUEZ 

> "I think that's inconsistent with the hypothesis that unclean shutdown is
> the sole cause of these problems"
>
> I agree, we just never shut down any node, neither had any crash, and yet
> we have these bugs.
>
> About your side note :
>
> We know about it, but we couldn't find any other way to be able to provide
> real-time analytics. If you do so, we would be really glad to hear about it.
>  We need both to serve statistics in real-time and be accurate about
> prices and we need a coherence between what's shown in our graphics and
> tables and the invoices we provide to our customers.
> What we do is trying to avoid timeouts as much as possible (increasing the
> time before a timeout and getting a the lowest CPU load possible). In order
> to keep a low latency for the user we write first the events in a queue
> message (Kestrel) and then we process it with storm, which writes the
> events and increments counters in Cassandra.
>
> Once again if you got a clue about a better way of doing this, we are
> always happy to learn and try to enhance our architecture and our process.
>
> Alain
>
>
> 2012/9/20 Peter Schuller 
>
>> The significance I think is: If it is indeed the case that the higher
>> value is always *in fact* correct, I think that's inconsistent with
>> the hypothesis that unclean shutdown is the sole cause of these
>> problems - as long as the client is truly submitting non-idempotent
>> counter increments without a read-before-write.
>>
>> As a side note: If hou're using these counters for stuff like
>> determining amounts of money to be payed by somebody, consider the
>> non-idempotense of counter increments. Any write that increments a
>> counter, that fails by e.g. Timeout *MAY OR MAY NOT* have been applied
>> and cannot be safely retried. Cassandra counters are generally not
>> useful if *strict* correctness is desired, for this reason.
>>
>> --
>> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>>
>
>


Re: Data Modeling - JSON vs Composite columns

2012-09-20 Thread Sylvain Lebresne
On Wed, Sep 19, 2012 at 3:32 PM, Brian O'Neill  wrote:
> That said, I'm keeping a close watch on:
> https://issues.apache.org/jira/browse/CASSANDRA-3647
>
> But if this is CQL only, I'm not sure how much use it will be for us
> since we're coming in from different clients.
> Anyone know how/if collections will be available from other clients?

It is CQL only. Internally collections are implemented using multiple
columns but there is nothing specific in the storage engine for them,
so a high-level Thrift client could do what CQL does.

--
Sylvain


sometimes get timeout while batch inserting. (using pycassa)

2012-09-20 Thread Yan Chunlu
I am testing the performance of 1 Cassandra node on a production server. I
wrote a script to insert 1 million items into Cassandra. The data is like
below:

prefix = "benchmark_"
dct = {}
for i in range(0, 100):
    key = "%s%d" % (prefix, i)
    dct[key] = "abc" * 200

and the inserting code is like this:
# (assumes "import pickle" and pycassa's ConsistencyLevel are in scope)
with cf.batch(write_consistency_level=ConsistencyLevel.ONE) as b:
    for key, val in dct.items():
        b.insert('%s%s' % (prefix, key),
                 {'value': pickle.dumps(val)},
                 ttl=None)


Sometimes I get a timeout error (details here: https://gist.github.com/3754965)
while it's executing. Sometimes it runs okay.

The script and Cassandra have run smoothly on my MacBook many times; the
configuration of my Mac is "2.4 GHz Intel Core 2 Duo", 8GB memory, with an
SSD disk though.

I really have no idea why this is...

The reason I am doing this test is that on the other production server, my
3-node cluster also gives the pycassa client "timeout" errors, making the
system unstable. But I am not sure what the problem is. Is it a bug in the
Python library?
Thanks for any further help!

The test script is running on server A and Cassandra is running on server B.
The CPU of B is: "Intel(R) Xeon(R) CPU X3470 @ 2.93GHz Quadcore"

The sys stats on B are normal:

*vmstat 2*
procs ---memory-- ---swap-- -io -system--
cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id
wa
 1  0 3643716 134876 191720 235262411 14400 22  3
74  0
 1  0 3643716 132016 191728 235518000 0   288 4701 16764  9  4
87  0
 0  0 3643716 129700 191736 235799600 0  5772 3775 17139  9  4
87  0
 0  0 3643716 127468 191744 2360420   32032   404 4490 17487 11  3
85  0
*
*
*iostat -x 2*

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
avgqu-sz   await r_await w_await  svctm  %util
sda   0.00   230.001.00   15.00 6.00   980.00   123.25
0.032.008.001.60   1.12   1.80
sdb   0.00 0.000.000.00 0.00 0.00 0.00
0.000.000.000.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  11.521.211.990.480.00   84.80

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
avgqu-sz   await r_await w_await  svctm  %util
sda   7.00   184.00   12.50   12.0078.00   784.0070.37
0.114.658.320.83   1.88   4.60
sdb   0.00 0.000.000.00 0.00 0.00 0.00
0.000.000.000.00   0.00   0.00

*free -t*
 total   used   free sharedbuffers cached
Mem:  16467952   16378592  89360  0 1520322452216
-/+ buffers/cache:   137743442693608
Swap:  728743636437163643720
Total:23755388   200223083733080

*uptime*
 04:52:57 up 422 days, 19:59,  1 user,  load average: 2.71, 2.09, 1.48


Re: sometimes get timeout while batch inserting. (using pycassa)

2012-09-20 Thread Yan Chunlu
forgot to mention the rpc configuration in cassandra.yaml is:

rpc_timeout_in_ms: 2

and the cassandra version on production server is: 1.1.3

the cassandra version I am using on my macbook is:  1.0.10

On Thu, Sep 20, 2012 at 6:07 PM, Yan Chunlu  wrote:

> I am testing the performance of 1 cassandra node on a production server.
>  I wrote a script to insert 1 million items into cassandra. the data is
> like below:
>
> *prefix = "benchmark_"*
> *dct = {}*
> *for i in range(0,100):*
> *key = "%s%d" % (prefix,i)*
> *dct[key] = "abc"*200*
>
> and the inserting code is like this:
> *
> *
> *cf.batch(write_consistency_level = CL_ONEl):*
> *cf.insert('%s%s' % (prefix, key),*
> *  {'value':
> pickle.dumps(val)},*
> *  ttl = None)*
>
>
> sometimes I get timeout error (detail here:https://gist.github.com/3754965)
>  while it's executing. sometime it runs okay.
>
> while the script and cassandra run smoothly on my macbook(for many times),
> the configuration of my mac is " 2.4 GHz Intel Core 2 Duo", 8GB memory,
> SSD disk though.
>
> really have no idea why is this...
>
> the reason I am do this test is that on the other production server, my 3
> nodes cluster also give the pycassa client "timeout" error. make the system
> unstable. but I am not sure what the problem is. is it the bug of python
> library?
> thanks for any further help!
>
> the test script is running on server A and cassandra is running on server
> B.
> the CPU of B is : "Intel(R) Xeon(R) CPU X3470  @ 2.93GHz Quadcore"
>
> the sys stats on B is normal:
>
> *vmstat 2*
> procs ---memory-- ---swap-- -io -system--
> cpu
>  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id
> wa
>  1  0 3643716 134876 191720 235262411 14400 22  3
> 74  0
>  1  0 3643716 132016 191728 235518000 0   288 4701 16764  9  4
> 87  0
>  0  0 3643716 129700 191736 235799600 0  5772 3775 17139  9  4
> 87  0
>  0  0 3643716 127468 191744 2360420   32032   404 4490 17487 11  3
> 85  0
> *
> *
> *iostat -x 2*
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sda   0.00   230.001.00   15.00 6.00   980.00   123.25
> 0.032.008.001.60   1.12   1.80
> sdb   0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>   11.521.211.990.480.00   84.80
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sda   7.00   184.00   12.50   12.0078.00   784.0070.37
> 0.114.658.320.83   1.88   4.60
> sdb   0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
>
> *free -t*
>  total   used   free sharedbuffers cached
> Mem:  16467952   16378592  89360  0 1520322452216
> -/+ buffers/cache:   137743442693608
> Swap:  728743636437163643720
> Total:23755388   200223083733080
>
> *uptime*
>  04:52:57 up 422 days, 19:59,  1 user,  load average: 2.71, 2.09, 1.48
>
>
>


[problem with OOM in nodes]

2012-09-20 Thread Denis Gabaydulin
Hi, all!

We have a cluster with 7 virtual nodes (disk storage is connected to the
nodes with iSCSI). The storage schema is:

Reports:{
1:{
1:{"value1":"some val", "value2":"some val"},
2:{"value1":"some val", "value2":"some val"}
...
},
2:{
1:{"value1":"some val", "value2":"some val"},
2:{"value1":"some val", "value2":"some val"}
...
}
...
}

create keyspace osmp_reports
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 4}
  and durable_writes = true;

use osmp_reports;

create column family QueryReportResult
  with column_type = 'Super'
  and comparator = 'BytesType'
  and subcomparator = 'BytesType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and read_repair_chance = 1.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 432000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY';

=

Read/Write CL: 2

Most of the reports are small, but some of them could have half a million
rows (xml). Typical operations on this dataset are:

count report rows by report_id (the top-level id of the super column);
get columns (report rows) by range predicate and limit for a given report_id.

Data is written once and is never updated.

So, from time to time a couple of nodes crash with an OOM exception. A heap
dump says that we have a lot of super columns in memory.
For example, I see that one of the reports is in memory entirely. How could
that be possible? If we don't load the whole report, could Cassandra be
doing this for some internal reason?

What should we do to avoid OOMs?


Re: Composite Column Types Storage

2012-09-20 Thread Ravikumar Govindarajan
As I understand from the link below, burning the column index info into the
sstable index files will not only let reads skip sstables but also reduce
disk seeks from 3 to 2 for wide rows.

Our index files are always mmapped, so there is only one random seek for a
named column query. I think that is a wonderful improvement.

Shouldn't we be wary of a spike in heap usage from promoting column indexes
to the index file?

Wouldn't it be nice to have, say, every 128th entry written out to disk,
while loading only every 512th index entry into memory during start-up,
just as a balancing factor?

--
Ravi

On Tue, Sep 18, 2012 at 4:47 PM, Sylvain Lebresne wrote:

> > Range queries do not use bloom filters. It holds good for
> composite-columns
> > also right?
>
> Since I assume you are referring to column's bloom filters (key's bloom
> filters
> are always used) then yes, that holds good for composite columns.
> Currently,
> composite column names are completely opaque to the storage engine.
>
> >  alone could have gone into the bloom-filter, speeding up
> my
> > queries really effectively
>
> True, though https://issues.apache.org/jira/browse/CASSANDRA-2319 (in 1.2
> only
> however) should help quite a lot here. Basically it will allow to skip the
> sstable based on the column index. Granted, this is less fine-grained
> than a
> bloom filter (though on the other hand there are no false positives), but I
> suspect that in most real life workload it won't be too much worse.
>
> --
> Sylvain
>


Re: Losing keyspace on cassandra upgrade

2012-09-20 Thread Thomas Stets
A follow-up:

Currently I'm back on version 1.1.1.

I tried - unsuccessfully - the following things:

1. Create the missing keyspace on the 1.1.5 node, then copy the files back
into the data directory.
This failed, since the keyspace was already known on the other node in the
cluster.

2. Shut down the 1.1.1 node that still has the keyspace, then create the
keyspace on the 1.1.5 node.
This failed since the node could not distribute the information through the
cluster.

3. Restore the system keyspace from the snapshot I made before the upgrade.
The restore seemed to work, but the node behaved just like after the
update: it just forgot my keyspace.

Right now I'm at a loss on how to proceed. Any ideas? I'm pretty sure I can
reproduce the problem, so if anyone has an idea on what to try, or where to
look, I can do some tests (within limits).


On Wed, Sep 19, 2012 at 4:43 PM, Thomas Stets wrote:

> I consistently keep losing my keyspace on upgrading from cassandra 1.1.1
> to 1.1.5
>
> I have the same cassandra keyspace on all our staging systems:
>
> development:  a 3-node cluster
> integration: a 3-node cluster
> QS: a 2-node cluster
> (productive will be a 4-node cluster, which is as yet not active)
>
> All clusters were running cassandra 1.1.1. Before going productive I
> wanted to upgrade to the
> latest productive version of cassandra.
>
> In all cases my keyspace disappeared when I started the cluster with
> cassandra 1.1.5.
> On the development system I didn't realize at first what was happening. I
> just wondered that nodetool
> showed a very low amount of data. On integration I saw the problem
> quickly, but could not recover the
> data. I re-installed the cassandra cluster from scratch, and populated it
> with our test data, so our
> developers could work.
>
 ...

>
>
>   TIA, Thomas
>


Re: [problem with OOM in nodes]

2012-09-20 Thread Denis Gabaydulin
p.s. Cassandra 1.1.4

On Thu, Sep 20, 2012 at 3:27 PM, Denis Gabaydulin  wrote:
> Hi, all!
>
> We have a cluster with virtual 7 nodes (disk storage is connected to
> nodes with iSCSI). The storage schema is:
>
> Reports:{
> 1:{
> 1:{"value1":"some val", "value2":"some val"},
> 2:{"value1":"some val", "value2":"some val"}
> ...
> },
> 2:{
> 1:{"value1":"some val", "value2":"some val"},
> 2:{"value1":"some val", "value2":"some val"}
> ...
> }
> ...
> }
>
> create keyspace osmp_reports
>   with placement_strategy = 'SimpleStrategy'
>   and strategy_options = {replication_factor : 4}
>   and durable_writes = true;
>
> use osmp_reports;
>
> create column family QueryReportResult
>   with column_type = 'Super'
>   and comparator = 'BytesType'
>   and subcomparator = 'BytesType'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'BytesType'
>   and read_repair_chance = 1.0
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 432000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy =
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY';
>
> =
>
> Read/Write CL: 2
>
> Most of the reports are small, but some of them could have a half
> mullion of rows (xml). Typical operations on this dataset is:
>
> count report rows by report_id (top level id of super column);
> get columns (report_rows) by range predicate and limit for given report_id.
>
> A data is written once and hasn't never been updated.
>
> So, time to time a couple of nodes crashes with OOM exception. Heap
> dump says, that we have a lot of super columns in memory.
> For example, I see one of the reports is in memory entirely. How it
> could be possible? If we don't load the whole report, cassandra could
> whether do this for some internal reasons?
>
> What should we do to avoid OOMs?


OOM when applying migrations

2012-09-20 Thread Vanger

Hello,
We are trying to add new nodes to our *6-node* cassandra cluster with 
RF=3 cassandra version 1.0.11. We are *adding 18 new nodes* one-by-one.


The first strange thing I've noticed is that the number of completed
MigrationStage tasks in nodetool tpstats grows for every new node, while
the schema is not changed. For now, with a 21-node ring, the final join
shows 184683 migrations, while with 7 nodes it was about 50k migrations.
In fact it seems that this number is not the number of applied migrations.
When I grep the log file with

grep "Applying migration" /var/log/cassandra/system.log -c

the result for each new node is pretty much the same - around 7500 "Applying
migration" lines found in the log.


And the real problem is that new nodes now fail with Out Of Memory while
building the schema from migrations. In the logs we can find the following:


WARN [ScheduledTasks:1] 2012-09-19 18:51:22,497 GCInspector.java (line 
145) Heap is 0.7712290960125684 full.  You may need to reduce memtable 
and/or cache sizes. Cassandra will now flush up to the two largest 
memtables to free up memory.  Adjust flush_largest_memtables_at 
threshold in cassandra.yaml if you don't want Cassandra to do this 
automatically
 INFO [ScheduledTasks:1] 2012-09-19 18:51:22,498 StorageService.java 
(line 2658) Unable to reduce heap usage since there are no dirty column 
families


 WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 GCInspector.java (line 
139) Heap is 0.853078131310858 full. You may need to reduce memtable 
and/or cache sizes. Cassandra is now reducing cache sizes to free up 
memory. Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java 
(line 187) Reducing AppUser RowCache capacity from 10 to 0 to reduce 
memory pressure
 WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java 
(line 187) Reducing AppUser KeyCache capacity from 10 to 0 to reduce 
memory pressure
 WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java 
(line 187) Reducing PaymentClaim KeyCache capacity from 5 to 0 to 
reduce memory pressure
 WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java 
(line 187) Reducing Organization RowCache capacity from 1000 to 0 to 
reduce memory pressure

 .
 INFO [main] 2012-09-19 18:57:14,181 StorageService.java (line 668) 
JOINING: waiting for schema information to complete
ERROR [Thread-28] 2012-09-19 18:57:14,198 AbstractCassandraDaemon.java 
(line 139) Fatal exception in thread Thread[Thread-28,5,main]

java.lang.OutOfMemoryError: Java heap space
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:140)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:115)

...
ERROR [ReadStage:353] 2012-09-19 18:57:20,453 
AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
Thread[ReadStage:353,5,main]

java.lang.OutOfMemoryError: Java heap space
at 
org.apache.cassandra.service.MigrationManager.makeColumns(MigrationManager.java:256)
at 
org.apache.cassandra.db.DefinitionsUpdateVerbHandler.doVerb(DefinitionsUpdateVerbHandler.java:51)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)



Originally the max heap size was set to 6G. Then we increased the heap size
limit to 8G and it works, but the warnings are still present:


 WARN [ScheduledTasks:1] 2012-09-20 11:39:11,373 GCInspector.java (line 
145) Heap is 0.7760745735786222 full. You may need to reduce memtable 
and/or cache sizes.  Cassandra will now flush up to the two largest 
memtables to free up memory.  Adjust flush_largest_memtables_at 
threshold in cassandra.yaml if you don't want Cassandra to do this 
automatically
 INFO [ScheduledTasks:1] 2012-09-20 11:39:11,374 StorageService.java 
(line 2658) Unable to reduce heap usage since there are no dirty column 
families


This is probably a bug in applying migrations.
Could anyone explain why Cassandra behaves this way? Could you please
recommend something to cope with this situation?

Thank you in advance.

--
W/ best regards,
Sergey B.



Re: OOM when applying migrations

2012-09-20 Thread Jason Wee
Hi, when the heap goes above 70% usage, you should be able to see in the
log many flushes, or the row cache size being reduced. Did you restart the
cassandra daemon on the node that threw the OOM?

On Thu, Sep 20, 2012 at 9:11 PM, Vanger  wrote:

>  Hello,
> We are trying to add new nodes to our *6-node* cassandra cluster with
> RF=3 cassandra version 1.0.11. We are *adding 18 new nodes* one-by-one.
>
> First strange thing, I've noticed, is the number of completed
> MigrationStage in nodetool tpstats grows for every new node, while schema
> is not changed. For now with 21-nodes ring, for final join it shows 184683
> migrations, while with 7-nodes it was about 50k migrations.
> In fact it seems that this number is not a number of applied migrations.
> When i grep log file with
> grep "Applying migration" /var/log/cassandra/system.log -c
> For each new node result is pretty much the same - around 7500 "Applying
> migration" found in log.
>
> And the real problem is that now new nodes fail with Out Of Memory while
> building schema from migrations. In logs we can find the following:
>
> WARN [ScheduledTasks:1] 2012-09-19 18:51:22,497 GCInspector.java (line
> 145) Heap is 0.7712290960125684 full.  You may need to reduce memtable
> and/or cache sizes.  Cassandra will now flush up to the two largest
> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
> in cassandra.yaml if you don't want Cassandra to do this automatically
>  INFO [ScheduledTasks:1] 2012-09-19 18:51:22,498 StorageService.java (line
> 2658) Unable to reduce heap usage since there are no dirty column families
> 
>  WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 GCInspector.java (line
> 139) Heap is 0.853078131310858 full.  You may need to reduce memtable
> and/or cache sizes.  Cassandra is now reducing cache sizes to free up
> memory.  Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you
> don't want Cassandra to do this automatically
>  WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java
> (line 187) Reducing AppUser RowCache capacity from 10 to 0 to reduce
> memory pressure
>  WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java
> (line 187) Reducing AppUser KeyCache capacity from 10 to 0 to reduce
> memory pressure
>  WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java
> (line 187) Reducing PaymentClaim KeyCache capacity from 5 to 0 to
> reduce memory pressure
>  WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java
> (line 187) Reducing Organization RowCache capacity from 1000 to 0 to reduce
> memory pressure
>  .
>  INFO [main] 2012-09-19 18:57:14,181 StorageService.java (line 668)
> JOINING: waiting for schema information to complete
> ERROR [Thread-28] 2012-09-19 18:57:14,198 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[Thread-28,5,main]
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:140)
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:115)
> ...
> ERROR [ReadStage:353] 2012-09-19 18:57:20,453 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[ReadStage:353,5,main]
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.cassandra.service.MigrationManager.makeColumns(MigrationManager.java:256)
> at
> org.apache.cassandra.db.DefinitionsUpdateVerbHandler.doVerb(DefinitionsUpdateVerbHandler.java:51)
> at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>
>
> Originally "max heap size" was set to 6G. Then we increased heap size
> limit to 8G and it works. But warnings still present
>
>  WARN [ScheduledTasks:1] 2012-09-20 11:39:11,373 GCInspector.java (line
> 145) Heap is 0.7760745735786222 full.  You may need to reduce memtable
> and/or cache sizes.  Cassandra will now flush up to the two largest
> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
> in cassandra.yaml if you don't want Cassandra to do this automatically
>  INFO [ScheduledTasks:1] 2012-09-20 11:39:11,374 StorageService.java (line
> 2658) Unable to reduce heap usage since there are no dirty column families
>
> It is probably a bug in applying migrations.
> Could anyone explain why cassandra behaves this way? Could you please
> recommend us smth to cope with this situation?
> Thank you in advance.
>
> --
> W/ best regards,
> Sergey B.
>
>


Re: Composite Column Types Storage

2012-09-20 Thread Sylvain Lebresne
> As I understand from the link below, burning column index-info onto the
> sstable index files will not only eliminate sstables but also reduce disk
> seeks from 3 to 2 for wide rows.

Yes.

> Shouldn't we be wary of the spike in heap usage by promoting column indexes
> to index file?

If you're talking about the index files getting bigger, that's not
really a problem per se; mmapped files are not part of the heap and
it's all dealt with by the file system.
Now it's true that the column index is also promoted into the index
summary that is loaded in memory. However, how much is loaded into this
summary is still configurable, so overall that shouldn't be a problem
either (fyi, https://issues.apache.org/jira/browse/CASSANDRA-4478 is
relevant to that discussion too).

--
Sylvain
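For reference, the knob Sylvain mentions is, in the 1.1-era cassandra.yaml,
the index sampling interval; shown here only as a sketch:

  # Sample one row-index entry per this many keys for the in-memory summary.
  # Larger values use less heap, at the cost of scanning a bit more of the
  # on-disk index per lookup.
  index_interval: 128

Raising it is essentially the "load only every Nth entry in memory"
balancing factor Ravikumar asks about.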


Re: sometimes get timeout while batch inserting. (using pycassa)

2012-09-20 Thread Tyler Hobbs
That's showing a client-side socket timeout.  By default, the timeout for
pycassa connections is fairly low, at 0.5 seconds. With the default batch
insert size of 100 rows, you're probably hitting this timeout
occasionally.  I suggest lowering the batch size and using multiple threads
for the highest write throughput, but you could also just increase the
timeout on the ConnectionPool if you don't care that much.

P.S.: There is a pycassa-specific mailing list:
https://groups.google.com/forum/?fromgroups#!forum/pycassa-discuss
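A rough sketch of those suggestions (the keyspace, column family and numbers
below are illustrative, not taken from the original script):

  import pycassa
  from pycassa import ConsistencyLevel

  # Raise the client-side socket timeout from the 0.5 s default.
  pool = pycassa.ConnectionPool('MyKeyspace',
                                server_list=['serverB:9160'],
                                timeout=5)
  cf = pycassa.ColumnFamily(pool, 'MyCF')

  # Smaller batches are less likely to trip the timeout; queue_size is
  # the number of mutations sent per Thrift call.
  with cf.batch(queue_size=25,
                write_consistency_level=ConsistencyLevel.ONE) as b:
      for i in range(1000):
          b.insert('benchmark_%d' % i, {'value': 'abc' * 200})

For the highest throughput, several threads would each run a loop like this
against the shared pool.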

On Thu, Sep 20, 2012 at 5:14 AM, Yan Chunlu  wrote:

> forgot to mention the rpc configuration in cassandra.yaml is:
>
> rpc_timeout_in_ms: 2
>
> and the cassandra version on production server is: 1.1.3
>
> the cassandra version I am using on my macbook is:  1.0.10
>
>
> On Thu, Sep 20, 2012 at 6:07 PM, Yan Chunlu  wrote:
>
>> I am testing the performance of 1 cassandra node on a production server.
>>  I wrote a script to insert 1 million items into cassandra. the data is
>> like below:
>>
>> *prefix = "benchmark_"*
>> *dct = {}*
>> *for i in range(0,100):*
>> *key = "%s%d" % (prefix,i)*
>> *dct[key] = "abc"*200*
>>
>> and the inserting code is like this:
>> *
>> *
>> *cf.batch(write_consistency_level = CL_ONEl):*
>> *cf.insert('%s%s' % (prefix, key),*
>> *  {'value':
>> pickle.dumps(val)},*
>> *  ttl = None)*
>>
>>
>> sometimes I get timeout error (detail here:
>> https://gist.github.com/3754965)  while it's executing. sometime it runs
>> okay.
>>
>> while the script and cassandra run smoothly on my macbook(for many
>> times), the configuration of my mac is " 2.4 GHz Intel Core 2 Duo", 8GB
>> memory, SSD disk though.
>>
>> really have no idea why is this...
>>
>> the reason I am do this test is that on the other production server, my 3
>> nodes cluster also give the pycassa client "timeout" error. make the system
>> unstable. but I am not sure what the problem is. is it the bug of python
>> library?
>> thanks for any further help!
>>
>> the test script is running on server A and cassandra is running on server
>> B.
>> the CPU of B is : "Intel(R) Xeon(R) CPU X3470  @ 2.93GHz Quadcore"
>>
>> the sys stats on B is normal:
>>
>> *vmstat 2*
>> procs ---memory-- ---swap-- -io -system--
>> cpu
>>  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy
>> id wa
>>  1  0 3643716 134876 191720 235262411 14400 22  3
>> 74  0
>>  1  0 3643716 132016 191728 235518000 0   288 4701 16764  9
>>  4 87  0
>>  0  0 3643716 129700 191736 235799600 0  5772 3775 17139  9
>>  4 87  0
>>  0  0 3643716 127468 191744 2360420   32032   404 4490 17487 11
>>  3 85  0
>> *
>> *
>> *iostat -x 2*
>>
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> sda   0.00   230.001.00   15.00 6.00   980.00
>> 123.25 0.032.008.001.60   1.12   1.80
>> sdb   0.00 0.000.000.00 0.00 0.00
>> 0.00 0.000.000.000.00   0.00   0.00
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>   11.521.211.990.480.00   84.80
>>
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> sda   7.00   184.00   12.50   12.0078.00   784.00
>>  70.37 0.114.658.320.83   1.88   4.60
>> sdb   0.00 0.000.000.00 0.00 0.00
>> 0.00 0.000.000.000.00   0.00   0.00
>>
>> *free -t*
>>  total   used   free sharedbuffers cached
>> Mem:  16467952   16378592  89360  0 1520322452216
>> -/+ buffers/cache:   137743442693608
>> Swap:  728743636437163643720
>> Total:23755388   200223083733080
>>
>> *uptime*
>>  04:52:57 up 422 days, 19:59,  1 user,  load average: 2.71, 2.09, 1.48
>>
>>
>>
>


-- 
Tyler Hobbs
DataStax 


any ways to have compaction use less disk space?

2012-09-20 Thread Hiller, Dean
While disk space is cheap, nodes are not that cheap, and systems usually have
a 1TB limit on each node, which means we would really love not to add more
nodes until we hit 70% disk space instead of the normal 50% that we have read
about due to compaction.

Is there any way to use less disk space during compactions?
Is there any work being done so that compactions take less space in the
future, meaning we can buy fewer nodes?

Thanks,
Dean


Re: any ways to have compaction use less disk space?

2012-09-20 Thread Aaron Turner
1. Use compression

2. Used Leveled Compaction

Also, 1TB/node is a lot larger than the normal recommendation...
generally speaking it's more in the 300-400GB range.
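A cassandra-cli sketch of both suggestions for an existing column family
(the name is invented). Leveled compaction needs temporary space on the
order of 10x sstable_size_in_mb per compaction rather than the worst-case
~50% of the data size that size-tiered can need, and compression shrinks
the data being compacted in the first place:

  update column family MyCF with
    compaction_strategy = 'LeveledCompactionStrategy' and
    compaction_strategy_options = {sstable_size_in_mb: 10} and
    compression_options = {sstable_compression: SnappyCompressor,
                           chunk_length_kb: 64};

The trade-off is extra compaction I/O, so it suits read- and update-heavy
workloads better than pure write-once ones.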

On Thu, Sep 20, 2012 at 8:10 PM, Hiller, Dean  wrote:
> While diskspace is cheap, nodes are not that cheap, and usually systems have 
> a 1T limit on each node which means we would love to really not add more 
> nodes until we hit 70% disk space instead of the normal 50% that we have read 
> about due to compaction.
>
> Is there any way to use less disk space during compactions?
> Is there any work being done so that compactions take less space in the 
> future meaning we can buy less nodes?
>
> Thanks,
> Dean



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"


Using the commit log for external synchronization

2012-09-20 Thread Ben Hood
Hi,

I'd like to incrementally synchronize data written to Cassandra into
an external store without having to maintain an index to do this, so I
was wondering whether anybody is using the commit log to establish
what updates have taken place since a given point in time?

Cheers,

Ben


Code example for CompositeType.Builder and SSTableSimpleUnsortedWriter

2012-09-20 Thread Edward Kibardin
Hi Everyone,

I'm writing a conversion tool from CSV files to SSTable
using SSTableSimpleUnsortedWriter and unable to find a good example of
using CompositeType.Builder with SSTableSimpleUnsortedWriter.
It also will be great if someone had an sample code for insert/update only
a single value in composites (if it possible in general).

Quick Google search didn't help me, so I'll be very appreciated for the
correct sample ;)

Thanks in advance,
Ed
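For what it's worth, a rough, untested sketch along the lines of the 1.1
bulk-loading API (keyspace, column family, types and path below are all
invented, so adjust to the real schema):

  import java.io.File;
  import java.util.Arrays;

  import org.apache.cassandra.db.marshal.AbstractType;
  import org.apache.cassandra.db.marshal.CompositeType;
  import org.apache.cassandra.db.marshal.LongType;
  import org.apache.cassandra.db.marshal.UTF8Type;
  import org.apache.cassandra.dht.RandomPartitioner;
  import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
  import org.apache.cassandra.utils.ByteBufferUtil;

  public class CsvToSSTable
  {
      public static void main(String[] args) throws Exception
      {
          // Column names are (text, long) composites, e.g. ("price", 20120920).
          CompositeType comparator = CompositeType.getInstance(
                  Arrays.<AbstractType<?>>asList(UTF8Type.instance, LongType.instance));

          SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                  new File("/tmp/MyKeyspace/MyCF"), // output dir, must exist
                  new RandomPartitioner(),
                  "MyKeyspace", "MyCF",
                  comparator, null,                 // no subcomparator
                  64);                              // buffer size in MB

          long timestamp = System.currentTimeMillis() * 1000;

          writer.newRow(ByteBufferUtil.bytes("row-key-1"));

          // One composite column name per (field, id) pair from the CSV line.
          CompositeType.Builder name = new CompositeType.Builder(comparator);
          name.add(ByteBufferUtil.bytes("price"));
          name.add(ByteBufferUtil.bytes(20120920L));
          writer.addColumn(name.build(), ByteBufferUtil.bytes("9.99"), timestamp);

          writer.close();
      }
  }

On the second question: with composites, every (name, value) pair is still
just an ordinary column, so writing a single composite-named column as above
is itself the "single value" insert or update; the other components of the
row do not need to be rewritten.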


Re: OOM when applying migrations

2012-09-20 Thread Tyler Hobbs
This should explain the schema issue in 1.0 that has been fixed in 1.1:
http://www.datastax.com/dev/blog/the-schema-management-renaissance

On Thu, Sep 20, 2012 at 10:17 AM, Jason Wee  wrote:

> Hi, when the heap is going more than 70% usage, you should be able to see
> in the log, many flushing, or reducing the row cache size down. Did you
> restart the cassandra daemon in the node that thrown OOM?
>
>
> On Thu, Sep 20, 2012 at 9:11 PM, Vanger  wrote:
>
>>  Hello,
>> We are trying to add new nodes to our *6-node* cassandra cluster with
>> RF=3 cassandra version 1.0.11. We are *adding 18 new nodes* one-by-one.
>>
>> First strange thing, I've noticed, is the number of completed
>> MigrationStage in nodetool tpstats grows for every new node, while schema
>> is not changed. For now with 21-nodes ring, for final join it shows 184683
>> migrations, while with 7-nodes it was about 50k migrations.
>> In fact it seems that this number is not a number of applied migrations.
>> When i grep log file with
>> grep "Applying migration" /var/log/cassandra/system.log -c
>> For each new node result is pretty much the same - around 7500 "Applying
>> migration" found in log.
>>
>> And the real problem is that now new nodes fail with Out Of Memory while
>> building schema from migrations. In logs we can find the following:
>>
>> WARN [ScheduledTasks:1] 2012-09-19 18:51:22,497 GCInspector.java (line
>> 145) Heap is 0.7712290960125684 full.  You may need to reduce memtable
>> and/or cache sizes.  Cassandra will now flush up to the two largest
>> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
>> in cassandra.yaml if you don't want Cassandra to do this automatically
>>  INFO [ScheduledTasks:1] 2012-09-19 18:51:22,498 StorageService.java
>> (line 2658) Unable to reduce heap usage since there are no dirty column
>> families
>> 
>>  WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 GCInspector.java (line
>> 139) Heap is 0.853078131310858 full.  You may need to reduce memtable
>> and/or cache sizes.  Cassandra is now reducing cache sizes to free up
>> memory.  Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you
>> don't want Cassandra to do this automatically
>>  WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java
>> (line 187) Reducing AppUser RowCache capacity from 10 to 0 to reduce
>> memory pressure
>>  WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java
>> (line 187) Reducing AppUser KeyCache capacity from 10 to 0 to reduce
>> memory pressure
>>  WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java
>> (line 187) Reducing PaymentClaim KeyCache capacity from 5 to 0 to
>> reduce memory pressure
>>  WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java
>> (line 187) Reducing Organization RowCache capacity from 1000 to 0 to reduce
>> memory pressure
>>  .
>>  INFO [main] 2012-09-19 18:57:14,181 StorageService.java (line 668)
>> JOINING: waiting for schema information to complete
>> ERROR [Thread-28] 2012-09-19 18:57:14,198 AbstractCassandraDaemon.java
>> (line 139) Fatal exception in thread Thread[Thread-28,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:140)
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:115)
>> ...
>> ERROR [ReadStage:353] 2012-09-19 18:57:20,453
>> AbstractCassandraDaemon.java (line 139) Fatal exception in thread
>> Thread[ReadStage:353,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> at
>> org.apache.cassandra.service.MigrationManager.makeColumns(MigrationManager.java:256)
>> at
>> org.apache.cassandra.db.DefinitionsUpdateVerbHandler.doVerb(DefinitionsUpdateVerbHandler.java:51)
>> at
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>>
>>
>> Originally "max heap size" was set to 6G. Then we increased heap size
>> limit to 8G and it works. But warnings still present
>>
>>  WARN [ScheduledTasks:1] 2012-09-20 11:39:11,373 GCInspector.java (line
>> 145) Heap is 0.7760745735786222 full.  You may need to reduce memtable
>> and/or cache sizes.  Cassandra will now flush up to the two largest
>> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
>> in cassandra.yaml if you don't want Cassandra to do this automatically
>>  INFO [ScheduledTasks:1] 2012-09-20 11:39:11,374 StorageService.java
>> (line 2658) Unable to reduce heap usage since there are no dirty column
>> families
>>
>> It is probably a bug in applying migrations.
>> Could anyone explain why cassandra behaves this way? Could you please
>> recommend us smth to cope with this situation?
>> Thank you in advance.
>>
>> --
>> W/ best regards,
>> Sergey B.
>>
>>
>


-- 
Tyler Hobbs
DataStax 


Re: [problem with OOM in nodes]

2012-09-20 Thread Tyler Hobbs
I'm not 100% sure that I understand your data model and read patterns correctly,
but it sounds like you have large supercolumns and are requesting some of
the subcolumns from individual super columns.  If that's the case, the
issue is that Cassandra must deserialize the entire supercolumn in memory
whenever you read *any* of the subcolumns.  This is one of the reasons why
composite columns are recommended over supercolumns.
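For reference, a rough CQL3 sketch of the composite-column shape being
recommended, assuming numeric report and row ids and the two text values
from the example data (adjust the types to the real data):

  CREATE TABLE query_report_result (
    report_id bigint,
    row_id    bigint,
    value1    text,
    value2    text,
    PRIMARY KEY (report_id, row_id)
  );

  -- Counting rows and reading a range no longer pulls a whole
  -- super column into memory:
  SELECT COUNT(*) FROM query_report_result WHERE report_id = 42;
  SELECT * FROM query_report_result
  WHERE report_id = 42 AND row_id >= 1000 AND row_id < 2000 LIMIT 500;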

On Thu, Sep 20, 2012 at 6:45 AM, Denis Gabaydulin  wrote:

> p.s. Cassandra 1.1.4
>
> On Thu, Sep 20, 2012 at 3:27 PM, Denis Gabaydulin 
> wrote:
> > Hi, all!
> >
> > We have a cluster with virtual 7 nodes (disk storage is connected to
> > nodes with iSCSI). The storage schema is:
> >
> > Reports:{
> > 1:{
> > 1:{"value1":"some val", "value2":"some val"},
> > 2:{"value1":"some val", "value2":"some val"}
> > ...
> > },
> > 2:{
> > 1:{"value1":"some val", "value2":"some val"},
> > 2:{"value1":"some val", "value2":"some val"}
> > ...
> > }
> > ...
> > }
> >
> > create keyspace osmp_reports
> >   with placement_strategy = 'SimpleStrategy'
> >   and strategy_options = {replication_factor : 4}
> >   and durable_writes = true;
> >
> > use osmp_reports;
> >
> > create column family QueryReportResult
> >   with column_type = 'Super'
> >   and comparator = 'BytesType'
> >   and subcomparator = 'BytesType'
> >   and default_validation_class = 'BytesType'
> >   and key_validation_class = 'BytesType'
> >   and read_repair_chance = 1.0
> >   and dclocal_read_repair_chance = 0.0
> >   and gc_grace = 432000
> >   and min_compaction_threshold = 4
> >   and max_compaction_threshold = 32
> >   and replicate_on_write = true
> >   and compaction_strategy =
> > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
> >   and caching = 'KEYS_ONLY';
> >
> > =
> >
> > Read/Write CL: 2
> >
> > Most of the reports are small, but some of them could have a half
> > mullion of rows (xml). Typical operations on this dataset is:
> >
> > count report rows by report_id (top level id of super column);
> > get columns (report_rows) by range predicate and limit for given
> report_id.
> >
> > A data is written once and hasn't never been updated.
> >
> > So, time to time a couple of nodes crashes with OOM exception. Heap
> > dump says, that we have a lot of super columns in memory.
> > For example, I see one of the reports is in memory entirely. How it
> > could be possible? If we don't load the whole report, cassandra could
> > whether do this for some internal reasons?
> >
> > What should we do to avoid OOMs?
>



-- 
Tyler Hobbs
DataStax 


Re: Cassandra supercolumns with same name

2012-09-20 Thread Tyler Hobbs
If you're seeing that in cassandra-cli, it's possible that there are some
non-printable characters in the name that the cli doesn't display, like the
NUL char (ascii 0).  I opened a ticket for that somewhere, but in the
meantime, you may want to verify that they are identical with a real client.

On Tue, Sep 18, 2012 at 4:03 AM, aaron morton wrote:

> They are. Can you provide some more information ?
>
> What happens when you read the super column ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/09/2012, at 5:33 AM, Cyril Auburtin 
> wrote:
>
> First sorry but I'm using an old version 0.7.10
>
> and recently I've come up seeing this
>
> => (super_column=mymed_embrun.ma...@gmail.com,
>  (column=permission, value=1, timestamp=1347895421475))
> => (super_column=mymed_embrun.ma...@gmail.com,
>  (column=email, value=embrun.ma...@gmail.com, timestamp=1347894698217)
>  (column=id, value=mymed_embrun.ma...@gmail.com,
> timestamp=1347894698217)
>  (column=permission, value=0, timestamp=1347894698217)
>  (column=profile, value=e24af776b4a025456bd50f55633b2419,
> timestamp=1347894698217))
>
> as a part of of a supercolumnFamily
>
> I thought supercolumn was meant to be unique?
>
>
>


-- 
Tyler Hobbs
DataStax 


Re: Using the commit log for external synchronization

2012-09-20 Thread Data Craftsman 木匠
This will be a good new feature. I guess the development team don't
have time on this yet.  ;)


On Thu, Sep 20, 2012 at 1:29 PM, Ben Hood <0x6e6...@gmail.com> wrote:
> Hi,
>
> I'd like to incrementally synchronize data written to Cassandra into
> an external store without having to maintain an index to do this, so I
> was wondering whether anybody is using the commit log to establish
> what updates have taken place since a given point in time?
>
> Cheers,
>
> Ben



-- 
Thanks,

Charlie (@mujiang) 木匠
===
Data Architect Developer 汉唐 田园牧歌DBA
http://mujiang.blogspot.com


Re: Using the commit log for external synchronization

2012-09-20 Thread Michael Kjellman
+1. Would be a pretty cool feature

Right now I write once to cassandra and once to kafka.

On 9/20/12 4:13 PM, "Data Craftsman 木匠" 
wrote:

>This will be a good new feature. I guess the development team don't
>have time on this yet.  ;)
>
>
>On Thu, Sep 20, 2012 at 1:29 PM, Ben Hood <0x6e6...@gmail.com> wrote:
>> Hi,
>>
>> I'd like to incrementally synchronize data written to Cassandra into
>> an external store without having to maintain an index to do this, so I
>> was wondering whether anybody is using the commit log to establish
>> what updates have taken place since a given point in time?
>>
>> Cheers,
>>
>> Ben
>
>
>
>-- 
>Thanks,
>
>Charlie (@mujiang) 木匠
>===
>Data Architect Developer 汉唐 田园牧歌DBA
>http://mujiang.blogspot.com






Re: Using the commit log for external synchronization

2012-09-20 Thread Brian O'Neill

Along those lines...

We sought to use triggers for external synchronization.   If you read through 
this issue:
https://issues.apache.org/jira/browse/CASSANDRA-1311

You'll see the idea of leveraging a commit log for synchronization, via 
triggers.

We went ahead and implemented this concept in:
https://github.com/hmsonline/cassandra-triggers

With that, via AOP, you get handed the mutation as things change.  We used it 
for synchronizing SOLR.  

fwiw,
-brian



On Sep 20, 2012, at 7:18 PM, Michael Kjellman wrote:

> +1. Would be a pretty cool feature
> 
> Right now I write once to cassandra and once to kafka.
> 
> On 9/20/12 4:13 PM, "Data Craftsman 木匠" 
> wrote:
> 
>> This will be a good new feature. I guess the development team don't
>> have time on this yet.  ;)
>> 
>> 
>> On Thu, Sep 20, 2012 at 1:29 PM, Ben Hood <0x6e6...@gmail.com> wrote:
>>> Hi,
>>> 
>>> I'd like to incrementally synchronize data written to Cassandra into
>>> an external store without having to maintain an index to do this, so I
>>> was wondering whether anybody is using the commit log to establish
>>> what updates have taken place since a given point in time?
>>> 
>>> Cheers,
>>> 
>>> Ben
>> 
>> 
>> 
>> -- 
>> Thanks,
>> 
>> Charlie (@mujiang) 木匠
>> ===
>> Data Architect Developer 汉唐 田园牧歌DBA
>> http://mujiang.blogspot.com
> 
> 
> 
> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Re: Is Cassandra right for me?

2012-09-20 Thread aaron morton
> Actually, if I use community edition for now, I wouldn't be able to use 
> hadoop against data stored in CFS? 
AFAIK DSC is a packaged deployment of Apache Cassandra. You should be able to
use Hadoop against it, in the same way you can use Hadoop against Apache
Cassandra.

You "can do" anything with computers if you have enough time and patience. DSE 
reduces the amount of time and patience needed to run Hadoop over Cassandra. 
Specifically it helps by providing a HDFS and Hive Meta Store that run on 
Cassandra. This reduces the number of moving parts you need to provision. 

> Would writes on HDFS be so quick as in Cassandra?
Yes and no. 
HDFS uses a big block size, so while it may absorb writes quickly you may not
be able to read them immediately.
Remember you may need a HDFS layer for intermediate results. 
 
> would I have advantages in using Cassandra instead of HBase?

Cassandra provides no single point of failure, great scalability, tuneable 
consistency, a flexible data model and very easy single package deployment. My 
HBase knowledge is limited, but I would check those points and go with whatever 
you feel comfortable with. 

> If everything in my model fits into a relational database, if my data is 
> structured, would it still be a good idea to use Cassandra? Why?
It's reasonable to use cassandra for structured data. After a few iterations of 
development you may find that the current structure is not the best for a 
non-RDBMS. e.g. It's often easier to work with larger entities that violate 
Normal Form requirements.

There are lots of advantages to use Cassandra, just as there are benefits to 
using a RDBMS rather than custom flat files. If you feel your project will 
benefit from those advantages, and/or you are technically curious, I would 
recommend  trying Cassandra. 

Choose a small part of your product and create a proof of concept; it should
only take a week or so. Make as many mistakes as you can as fast as you can
and have fun.

Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/09/2012, at 1:51 AM, Marcelo Elias Del Valle  wrote:

> Aaron,
> 
> Thank you very much for the answers! Helped me a lot!
> I would like just a bit more clarification about the points bellow, if 
> you allow me:
> 
> You can query your data using Hadoop easily enough. You may want take a look 
> at DSE from  http://datastax.com/ it makes using Hadoop and Solr with 
> cassandra easier.
> Actually, if I use community edition for now, I wouldn't be able to use 
> hadoop against data stored in CFS? We are considering the enterprise edition 
> here, but the best scenario would be using it just when really needed. Would 
> writes on HDFS be so quick as in Cassandra?
> 
> It depends on how many moving parts you are comfortable with. Same for the 
> questions about HDFS etc. Start with the smallest about of infrastructure.
> Sorry, I didn't really understand this part. I am not sure what you wanted to 
> say, but the question was about using nosql instead a relational database in 
> this case. If learning nosql is not a problem, would I have advantages in 
> using Cassandra instead of HBase? If everything in my model fits into a 
> relational database, if my data is structured, would it still be a good idea 
> to use Cassandra? Why?
> 
> 
> Thanks,
> Marcelo.
> 
> 2012/9/18 aaron morton 
>> Also, I saw a presentation which said that if I don't have rows with more 
>> than a hundred rows in Cassandra, whether I am doing something wrong or I 
>> shouldn't be using Cassandra. 
> I do not agree with that statement. (I read that as rows with ore than a 
> hundred _columns_)
> 
>> I need to support a high volume of writes per second. I might have a billion 
>> writes per hour
> Thats about 280K /sec. Netflix did a benchmark that shows 1.1M/sec 
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
> 
>> I need to write non-structured data that will be processed later by hadoop 
>> processes to generate structured data from it. Later, I index the structured 
>> data using SOLR or SOLANDRA, so the data can be consulted by my end user 
>> application. Is Cassandra recommended for that, or should I be thinking in 
>> writting directly to HDFS files, for instance? What's the main advantage I 
>> get from storing data in a nosql service like Cassandra, when compared to 
>> storing files into HDFS?
> You can query your data using Hadoop easily enough. You may want take a look 
> at DSE from  http://datastax.com/ it makes using Hadoop and Solr with 
> cassandra easier. 
> 
>> If I don't need to perform complicated queries in Cassandra, should I store 
>> the json-like data just as a column value? I am afraid of doing something 
>> wrong here, as I would need just to store the json file and some more 5 or 6 
>> fields to query the files later.
> Store the data in the way that best supports the read queries you want to 
> 

Re: Row caches

2012-09-20 Thread aaron morton
Set the caching attribute for the CF. It defaults to keys_only; the other
values are all, rows_only and none.

See http://www.datastax.com/dev/blog/caching-in-cassandra-1-1

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
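For example, from cassandra-cli (the column family name is invented), the
attribute can be changed on an existing column family with something like:

  update column family MyCF with caching = 'rows_only';

The global key and row cache sizes themselves are set in cassandra.yaml in
1.1, which is why nodetool setcachecapacity no longer works per column family.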

On 19/09/2012, at 1:34 PM, Jason Wee  wrote:

> which version is that? in version, 1.1.2 , nodetool does take the column 
> family.
> 
> setcachecapacity - 
> Set the key and row cache capacities of a given column family
> 
> On Wed, Sep 19, 2012 at 2:15 AM, rohit reddy  
> wrote:
> Hi,
> 
> Is it possible to enable row cache per column family after the column family 
> is created.
> 
> nodetool setcachecapacity does not take the column family as input.
> 
> Thanks
> Rohit
> 



Re: Disk configuration in new cluster node

2012-09-20 Thread aaron morton
> Would it help if I partitioned the computing resources of my physical 
> machines into VMs? 
No. 
Just like cutting a cake into smaller pieces does not mean you can eat more 
without getting fat.

In the general case (regular HDDs, 1 GbE, and 8 to 16 virtual cores with 8GB
to 16GB of RAM), you can expect to comfortably run up to 400GB of data per
node (maybe 500GB). That is replicated storage, so 400 / 3 = 133GB if you
replicate data 3 times.
  
Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/09/2012, at 3:42 PM, Віталій Тимчишин  wrote:

> Network also matters. It would take a lot of time sending 6TB over 1Gb link, 
> even fully saturating it. IMHO You can try with 10Gb, but you will need to 
> raise your streaming/compaction limits a lot.
> Also you will need to ensure that your compaction can keep up. It is often 
> done in one thread and I am not sure if it will be enough for you. As of 
> parallel compaction, I don't know exact limitations and if it will be working 
> in your case.
> 
> 2012/9/18 Casey Deccio 
> On Tue, Sep 18, 2012 at 1:54 AM, aaron morton  wrote:
>> each with several disks having large capacity, totaling 10 - 12 TB.  Is this 
>> (another) bad idea?
> 
> Yes. Very bad. 
> If you had 6TB on an average system with spinning disks you would measure 
> duration of repairs and compactions in days. 
> 
> If you want to store 12 TB of data you will need more machines. 
>  
> 
> Would it help if I partitioned the computing resources of my physical 
> machines into VMs?  For example, I put four VMs on each of three physical 
> machines, each with a dedicated 2TB drive.  I can now have four tokens in the 
> ring and a RF of 3.  And of course, I can arrange them into a way that makes 
> the most sense.  Is this getting any better, or am I missing the point?
> 
> Casey
> 
> 
> 
> -- 
> Best regards,
>  Vitalii Tymchyshyn



Re: Solr Use Cases

2012-09-20 Thread aaron morton
> Also, Cassandra is great for writes but not as optimized for reads. 

From Cassandra 1.0 onwards, read throughput is on a par with writes: 
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance

Your mileage may vary depending on the workload. 

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/09/2012, at 3:08 AM, Michael Kjellman  wrote:

> If I were you I would look into ElasticSearch unless you are okay updating 
> the search cache very infrequently.
> 
> I tried Solandra vs ElasticSearch in our use case and there was no contest.
> 
> Also, Cassandra is great for writes but not as optimized for reads. Honestly, 
> it all depends on your use case, and which brand of Lucene you pick depends on it.
> 
> I would benchmark it and see what sticks.
> 
> On Sep 19, 2012, at 5:28 AM, "Roshni Rajagopal" <roshni_rajago...@hotmail.com> wrote:
> 
> Hi,
> 
> Im new to Solr, and I hear that Solr is a great tool for improving search 
> performance
> Im unsure whether Solr or DSE Search is a must for all cassandra deployments
> 
> 1. For performance - I thought cassandra had great read & write performance. 
> When should solr be used ?
> Taking the following use cases for cassandra from the datastax FAQ page, in 
> which cases would Solr be useful, and whether for all?
> 
> *   Time series data management
> *   High-velocity device data ingestion and analysis
> *   Media streaming (e.g., music, movies)
> *   Social media input and analysis
> *   Online web retail (e.g., shopping carts, user transactions)
> *   Web log management / analysis
> *   Web click-stream analysis
> *   Real-time data analytics
> *   Online gaming (e.g., real-time messaging)
> *   Write-intensive transaction systems
> *   Buyer event analytics
> *   Risk analysis and management
> 
> 2. what changes to cassandra data modeling does Solr bring? We have some 
> guidelines & best practices around cassandra data modeling.
> Is Solr so powerful, that it does not matter how data is modelled in 
> cassandra? Are there different best practices for cassandra data modeling 
> when Solr is in the picture?
> Is this something we should keep in mind while modeling for cassandra today- 
> that it should be  good to be used via Solr in future?
> 
> 3. Does Solr come with any drawbacks, like it's not real time?
> 
> I can & should read the manual, but it will be great if someone can explain 
> at a high level.
> 
> Thank you!
> 
> 
> Regards,
> Roshni
> 
> 
> 



Re: Setting the default replication factor for Solandra cores

2012-09-20 Thread aaron morton
> I want to set the replication factor = 2, 
This is part of the CREATE KEYSPACE command, not sure where this is in 
solandra. 

I would recommend using RF 3 as a minimum. 

> , and the default replications strategy to be RackAwareStrategy.
That's a very old strategy. 
The default is NetworkTopologyStrategy which is the new RAS. 
Again , this is set as part of the CREATE KEYSPACE command. 

You can update both using UPDATE KEYSPACE in the cli. Check the operations page 
in the wiki for info on changing replication. 
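
For example, in cassandra-cli (keyspace and data centre names are placeholders, and the per-DC counts are only illustrative):

  [default@unknown] UPDATE KEYSPACE MyKeyspace
      WITH placement_strategy = 'NetworkTopologyStrategy'
      AND strategy_options = {DC1: 2, DC2: 2};

The data centre names must match what your snitch reports.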

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/09/2012, at 3:17 AM, Michael Kjellman  wrote:

> If I recall correctly you should make those changes in the schema through the 
> CLI.
> 
> I never ended up running Solandra in production though so I'm not sure if 
> anyone else has better options. Why is the CLI not enough?
> 
> On Sep 19, 2012, at 5:56 AM, "Safdar Kureishy" <safdar.kurei...@gmail.com> wrote:
> 
> Hi,
> 
> This question is related to Solandra, but since it sits on top of Cassandra, 
> I figured I'd use this mailing list (since there isn't another one I know of 
> for Solandra). Apologies in advance if this is the wrong place for this.
> 
> I'm trying to setup a Solandra cluster, across 2 centers. I want to set the 
> replication factor = 2, and the default replications strategy to be 
> RackAwareStrategy. Is there somewhere in cassandra.yaml or 
> solandra.properties that I can provide these parameters so that I won't need 
> to use cassandra-cli manually? I couldn't find any such property so far...
> 
> Thanks in advance.
> 
> Safdar
> 
> 
> 



Re: Correct model

2012-09-20 Thread aaron morton
> I created the following model: an UserCF, whose key is a userID generated by 
> TimeUUID, and a RequestCF, whose key is composite: UserUUID + timestamp. For 
> each user, I will store basic data and, for each request, I will insert a lot 
> of columns.

I would consider:

# User CF
* row_key: user_id
* columns: user properties, key=value

# UserRequests CF
* row_key: <user_id : partition_start> where partition_start is the start of a 
time partition that makes sense in your domain, e.g. partition monthly. 
Generally you want to avoid rows that grow forever; as a rule of thumb avoid rows 
more than a few 10's of MB. 
* columns: two possible approaches:
1) If the requests are immutable and you generally want all of the data, 
store the request in a single column using JSON or similar, with the column 
name a timestamp. 
2) Otherwise use a composite column name of <timestamp : request_property> to store the request in many columns. 
* In either case consider using Reversed comparators so the most recent 
columns are first  see http://thelastpickle.com/2011/10/03/Reverse-Comparators/

# GlobalRequests CF
* row_key: partition_start - time partition as above. It may be easier 
to use the same partition scheme. 
* column name: <timestamp : user_id>
* column value: empty 

> - Select all the requests for an user

Work out the current partition client side, get the first N columns. Then page. 

> - Select all the users which has new requests, since date D
Work out the current partition client side, get the first N columns from 
GlobalRequests, make a multi get call to UserRequests 

NOTE: Assuming the size of the global requests space is not huge.
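
A rough cassandra-cli sketch of the two request CFs, assuming approach 1 above, a LongType partition_start (e.g. the month start in epoch millis) and TimeUUIDs for both user ids and request timestamps. Every name and type here is illustrative only:

  create column family UserRequests
    with key_validation_class = 'CompositeType(TimeUUIDType, LongType)'
    and comparator = 'ReversedType(TimeUUIDType)'
    and default_validation_class = UTF8Type
    and comment = 'row = (user_id, partition_start); column name = request timestamp, reversed; value = request JSON';

  create column family GlobalRequests
    with key_validation_class = LongType
    and comparator = 'CompositeType(ReversedType(TimeUUIDType), TimeUUIDType)'
    and default_validation_class = UTF8Type
    and comment = 'row = partition_start; column name = (request timestamp reversed, user_id); value = empty';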

Hope that helps. 
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/09/2012, at 11:19 AM, Marcelo Elias Del Valle  wrote:

> In your first email, you get a request and seem to shove it and a user in
> generating the ids which means that user never generates a request ever
> again???  If a user sends multiple requests in, how are you looking up his
> TimeUUID row key from your first email(I would do the same in my
> implementation)?
> 
> Actually, I don't get it from Cassandra. I am using Cassandra for the writes, 
> but to find the userId I look on a pre-indexed structure, because I think the 
> reads would be faster this way. I need to find the userId by some key fields, 
> so I use an index like this:
> 
> user ID 5596 -> { name -> "john denver", phone -> " ", field3 -> 
> "field 3 data", field 10 -> "field 10 data"}
>
> The values are just examples. This part is not implemented yet and I am 
> looking for alternatives. Currently we have some similar indexes in SOLR, but 
> we are thinking of keeping the index in memory and replicating it manually in 
> the cluster, or using Voldemort, etc. 
> I might be wrong, but I think Cassandra is great for writes, but a solution 
> like this would be better for reads.
> 
>  
> If you had an ldap unique username, I would just use that as the primary
> key meaning you NEVER have to do reads.  If you have a username and need
> to lookup a UUID, you would have to do that in both implementations... not a
> real big deal though... a quick lookup table does the trick there and
> in most cases is still fast enough (i.e. read before write here is ok in a
> lot of cases).
> 
> That X-ref table would simply be rowkey=username and value=the user's real
> primary key
> 
> Though again, we use ldap and know no one's username is really going to
> change so username is our primary key.
> 
> In my case, a single user can have thousands of requests. In my userCF, I 
> will have just 1 user with uuid X, but I am not sure about what to have in my 
> requestCF.
>  
> -- 
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr



Re: Setting the default replication factor for Solandra cores

2012-09-20 Thread shubham srivastava
With Solandra as well you can use the Cassandra CLI to do the needful. The
location would be [~/Solandra/bin/] .

Regards,
Shubham

On Fri, Sep 21, 2012 at 6:56 AM, aaron morton wrote:

> I want to set the replication factor = 2,
>
> This is part of the CREATE KEYSPACE command, not sure where this is in
> solandra.
>
> I would recommend using RF 3 as a minimum.
>
> , and the default replications strategy to be RackAwareStrategy.
>
> That's a very old strategy.
> The default is NetworkTopologyStrategy which is the new RAS.
> Again , this is set as part of the CREATE KEYSPACE command.
>
> You can update both using UPDATE KEYSPACE in the cli. Check the operations
> page in the wiki for info on changing replication.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/09/2012, at 3:17 AM, Michael Kjellman 
> wrote:
>
> If I recall correctly you should make those changes in the schema through
> the CLI.
>
> I never ended up running Solandra in production though so I'm not sure if
> anyone else has better options. Why is the CLI not enough?
>
> On Sep 19, 2012, at 5:56 AM, "Safdar Kureishy" <safdar.kurei...@gmail.com> wrote:
>
> Hi,
>
> This question is related to Solandra, but since it sits on top of
> Cassandra, I figured I'd use this mailing list (since there isn't another
> one I know of for Solandra). Apologies in advance if this is the wrong
> place for this.
>
> I'm trying to setup a Solandra cluster, across 2 centers. I want to set
> the replication factor = 2, and the default replications strategy to be
> RackAwareStrategy. Is there somewhere in cassandra.yaml or
> solandra.properties that I can provide these parameters so that I won't
> need to use cassandra-cli manually? I couldn't find any such property so
> far...
>
> Thanks in advance.
>
> Safdar
>
>
>
>
>


Re: Should row keys be inserted in ascending order?

2012-09-20 Thread Tyler Hobbs
Rows are actually stored on disk in the order of the hash of their keys
when using RandomPartitioner.

Furthermore, the rows are stored in SSTables, which are immutable, and are
periodically compacted together.  There's no shifting involved.  This gives
an overview: http://wiki.apache.org/cassandra/MemtableSSTable
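
One way to see this from cassandra-cli (keyspace, CF and keys below are made up): with RandomPartitioner a range scan such as list returns rows in the order of the MD5 token of each key, which is stable but unrelated to insertion order or to the keys' natural sort order.

  create column family Users
    with key_validation_class = UTF8Type
    and comparator = UTF8Type
    and default_validation_class = UTF8Type;
  set Users['alice']['name'] = 'Alice';
  set Users['bob']['name'] = 'Bob';
  set Users['carol']['name'] = 'Carol';
  list Users;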

On Thu, Sep 20, 2012 at 7:52 PM, Cory Mintz  wrote:

> The DataModel page in the Cassandra Wiki (
> http://wiki.apache.org/cassandra/DataModel) says:
>
>
>
> "In Cassandra, each column family is stored in a separate file, and the
> file is sorted in row (i.e. key) major order."
>
>
>
> Does this mean that new row keys should be ascending? If they are not
> ascending does that mean all
>
> of the data after the new key needs to be shifted down?
>
>
> Thanks.
>
> Cory
>



-- 
Tyler Hobbs
DataStax 


Re: persistent compaction issue (1.1.4 and 1.1.5)

2012-09-20 Thread Michael Kjellman
Ended up switching the biggest offending column families back to size tiered 
compaction and pending compactions across the cluster dropped to 0 very quickly.
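
For reference, that kind of switch is just another cassandra-cli schema change of the same shape as the one quoted below, e.g. reusing the column family name from that session (not necessarily the exact statement that was run):

  [default@] UPDATE COLUMN FAMILY screenshots WITH compaction_strategy = SizeTieredCompactionStrategy;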

On Sep 19, 2012, at 10:55 PM, "Michael Kjellman"  
wrote:

> After changing my ss_table_size as recommended my pending compactions across 
> the cluster have leveled off at 34808 but it isn't progressing after 24 hours 
> at that level.
> 
> As I've already changed the most offending column families I think the only 
> option I have left is to remove the .json files from all of the column 
> families and do another rolling restart...
> 
> Developing... Thanks for the help so far
> 
> On Sep 19, 2012, at 10:35 PM, "Віталій Тимчишин" <tiv...@gmail.com> wrote:
> 
> I did see problems with schema agreement on 1.1.4, but they did go away after 
> rolling restart (BTW: it would be still good to check describe schema for 
> unreachable). Same rolling restart helped to force compactions after moving 
> to Leveled compaction. If your compactions still don't go, you can try 
> removing *.json files from the data directory of the stopped node to force 
> moving all SSTables to level0.
> 
> Best regards, Vitalii Tymchyshyn
> 
> 2012/9/19 Michael Kjellman <mkjell...@barracuda.com>
> Potentially the pending compactions are a symptom and not the root
> cause/problem.
> 
> When updating a 3rd column family with a larger sstable_size_in_mb it
> looks like the schema may not be in a good state
> 
> [default@] UPDATE COLUMN FAMILY screenshots WITH
> compaction_strategy=LeveledCompactionStrategy AND
> compaction_strategy_options={sstable_size_in_mb: 200};
> 290cf619-57b0-3ad1-9ae3-e313290de9c9
> Waiting for schema agreement...
Warning: unreachable nodes 10.8.30.102
The schema has not settled in 10 seconds; further migrations are ill-advised until it does.
> Versions are UNREACHABLE:[10.8.30.102],
> 290cf619-57b0-3ad1-9ae3-e313290de9c9:[10.8.30.15, 10.8.30.14, 10.8.30.13,
> 10.8.30.103, 10.8.30.104, 10.8.30.105, 10.8.30.106],
> f1de54f5-8830-31a6-9cdd-aaa6220cccd1:[10.8.30.101]
> 
> 
> However, tpstats looks good. And the schema changes eventually do get
> applied on *all* the nodes (even the ones that seem to have different
> schema versions). There are no communications issues between the nodes and
> they are all in the same rack
> 
> root@:~# nodetool tpstats
> Pool Name               Active   Pending   Completed   Blocked   All time blocked
> ReadStage                    0         0     1254592         0                  0
> RequestResponseStage         0         0     9480827         0                  0
> MutationStage                0         0     8662263         0                  0
> ReadRepairStage              0         0      339158         0                  0
> ReplicateOnWriteStage        0         0           0         0                  0
> GossipStage                  0         0     1469197         0                  0
> AntiEntropyStage             0         0           0         0                  0
> MigrationStage               0         0        1808         0                  0
> MemtablePostFlusher          0         0         248         0                  0
> StreamStage                  0         0           0         0                  0
> FlushWriter                  0         0         248         0                  4
> MiscStage                    0         0           0         0                  0
> commitlog_archiver           0         0           0         0                  0
> InternalResponseStage        0         0        5286         0                  0
> HintedHandoff                0         0          21         0                  0
> 
> Message type   Dropped
> RANGE_SLICE  0
> READ_REPAIR  0
> BINARY   0
> READ 0
> MUTATION 0
> REQUEST_RESPONSE 0
> 
> So I'm guessing maybe the different schema versions may be potentially
> stopping compactions? Will compactions still happen if there are different
> versions of the schema?
> 
> 
> 
> 
> 
> On 9/18/12 11:38 AM, "Michael Kjellman" <mkjell...@barracuda.com> wrote:
> 
>> Thanks, I just modified the schema on the worst offending column family
>> (as determined by the .json) from 10MB to 200MB.
>> 
>> Should I kick off a compaction on this cf now/repair?/scrub?
>> 
>> Thanks
>> 
>> -michael
>> 
>> From: Віталій Тимчишин <tiv...@gmail.com>
>> Reply-To: user@cassandra.apache.org
>> To: user@cassandra.apache.org

Re: Row caches

2012-09-20 Thread rohit reddy
Got it. Thanks for the replies

On Fri, Sep 21, 2012 at 6:30 AM, aaron morton wrote:

> Set the caching attribute for the CF. It defaults to keys_only; the other
> values are all, rows_only and none.
>
> See http://www.datastax.com/dev/blog/caching-in-cassandra-1-1
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/09/2012, at 1:34 PM, Jason Wee  wrote:
>
> which version is that? In version 1.1.2, nodetool does take the column
> family.
>
> setcachecapacity
> - Set the key and row cache capacities of a given column family
>
> On Wed, Sep 19, 2012 at 2:15 AM, rohit reddy 
> wrote:
>
>> Hi,
>>
>> Is it possible to enable row cache per column family after the column
>> family is created.
>>
>> *nodetool setcachecapacity* does not take the column family as input.
>>
>> Thanks
>> Rohit
>>
>
>
>