Re: Cassandra disk usage

2014-04-14 Thread Yulian Oifa
Hello
The load of data on the 3 nodes is:

Address      DC  Rack  Status  State   Load      Owns    Token
                                                         113427455640312821154458202477256070485
172.19.10.1  19  10    Up      Normal  22.16 GB  33.33%  0
172.19.10.2  19  10    Up      Normal  19.89 GB  33.33%  56713727820156410577229101238628035242
172.19.10.3  19  10    Up      Normal  30.74 GB  33.33%  113427455640312821154458202477256070485

Best regards
Yulian Oifa



On Sun, Apr 13, 2014 at 9:17 PM, Mark Reddy  wrote:

> I will change the data I am storing to decrease the usage; for the value I
>> will find some small value to store. Previously I used the same value, since this
>> table is an index used only for search purposes and does not really have a value.
>
>
> If you don't need a value, you don't have to store anything. You can store
> the column name and leave the value empty; this is a common practice.
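>
> For example, a minimal cassandra-cli sketch (the column family, row key and
> column name here are purely illustrative, and it assumes a UTF8-typed
> comparator and validator):
>
> set myIndexCF['someRowKey']['123456789012345'] = '';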
>
> 1) What should be the recommended read and write consistency and replication
>> factor for 3 nodes, with the option of increasing the number of servers later?
>
>
> Both consistency level and replication factor are tuneable depending on
> your application constraints. I'd say a CL of QUORUM and an RF of 3 is the
> general practice.
>
> Still it has 1.5x the overall data; how can this be resolved and what is
>> the reason for that?
>
>
> As Michał pointed out, there is a 15-byte column overhead to consider
> here, where:
>
> total_column_size = column_name_size + column_value_size + 15
>
>
> This link might shed some light on this:
> http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html
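>
> As a rough back-of-the-envelope illustration (counting only the 15-byte
> regular-column overhead and ignoring per-row and index overhead):
>
> total_column_size = 15 (name) + 15 (value) + 15 (overhead) = 45 bytes
> 100,000,000 columns * 45 bytes ~= 4.5 GB per replica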
>
> Also I see that the data is a different size on all nodes; does that mean
>> that the servers are out of sync?
>
>
> How much is it out by? Data size may differ due to deletes, as you
> mentioned you do deletes. What is the output of 'nodetool ring'?
>
>
> On Sun, Apr 13, 2014 at 6:42 PM, Michal Michalski <
> michal.michal...@boxever.com> wrote:
>
>> > Each column has a name of 15 chars (digits) and the same 15 chars in the
>> value (also digits).
>> > Each column should take 30 bytes.
>>
>> Remember the standard Cassandra column overhead which is, as far
>> as I remember, 15 bytes, so it's 45 bytes in total - 50% more than you
>> estimated, which roughly matches your 3 GB vs 4.5 GB case.
>>
>> There's also a per-row overhead, but I'm not sure about its size in
>> current C* versions - I remember it was about 25 bytes or so some time ago,
>> but it's not important in your case.
>>
>> Kind regards,
>> Michał Michalski,
>> michal.michal...@boxever.com
>>
>>
>> On 13 April 2014 17:48, Yulian Oifa  wrote:
>>
>>> Hello Mark, and thanks for your reply.
>>> 1) I store it as a UTF8 string. All digits are from 0x30 to 0x39 and should
>>> take 1 byte per digit. Since all characters are digits, it should take 15
>>> bytes.
>>> 2) I will change the data I am storing to decrease the usage; for the value
>>> I will find some small value to store. Previously I used the same value, since
>>> this table is an index used only for search purposes and does not really have a value.
>>> 3) You are right, I read and write at QUORUM, and it was my mistake (I
>>> thought that if I write at QUORUM then data would be written to 2 nodes only).
>>> If I check the keyspace:
>>> create keyspace USER_DATA
>>>   with placement_strategy = 'NetworkTopologyStrategy'
>>>   and strategy_options = [{19 : 3}]
>>>   and durable_writes = true;
>>>
>>> it has a replication factor of 3.
>>> Therefore I have several questions:
>>> 1) What should be the recommended read and write consistency and replication
>>> factor for 3 nodes, with the option of increasing the number of servers later?
>>> 2) It still has 1.5x the overall data; how can this be resolved and what
>>> is the reason for that?
>>> 3) Also I see that the data is a different size on all nodes; does that
>>> mean that the servers are out of sync???
>>>
>>> Thanks and best regards
>>> Yulian Oifa
>>>
>>>
>>> On Sun, Apr 13, 2014 at 7:03 PM, Mark Reddy wrote:
>>>
 What are you storing these 15 chars as; string, int, double, etc.? 15
 chars does not translate to 15 bytes.

 You may be mixing up replication factor and quorum when you say "Cassandra
 cluster has 3 servers, and data is stored in quorum ( 2 servers )."
 You read and write at quorum, which is (RF/2)+1 where RF=replication_factor, and your
 data is replicated to the number of nodes you specify in your replication
 factor. Could you clarify?

 Also, if you are concerned about disk usage, why are you storing the
 same 15 char value in both the column name and value? You could just store
 it as the name and halve your data usage :)




 On Sun, Apr 13, 2014 at 4:26 PM, Yulian Oifa wrote:

> I have a column family with 2 rows.
> The 2 rows have 100 million columns overall.
> Each column has a name of 15 chars (digits) and the same 15 chars in the
> value (also digits).
> Each column should take 30 bytes.
> The

Replication Factor question

2014-04-14 Thread Markus Jais
Hello,

currently reading the "Practical Cassandra". In the section about replication 
factors the book says:

"It is generally not recommended to set a replication factor of 3 if you have 
fewer than six nodes in a data center".

Why is that? What problems would arise if I had a replication factor of 3 and 
only 5 nodes?

Does that mean that for a replication of 4 I would need at least 8 nodes and 
for a factor of 5 at least 10 nodes?

Not saying that I would factor 5 andn 10 nodes, just curious about how this 
works.

All the best,

Markus

Re: Replication Factor question

2014-04-14 Thread Sergey Murylev
Hi Markus,
> "It is generally not recommended to set a replication factor of 3 if
> you have fewer than six nodes in a data center".
Actually, you can create a cluster with 3 nodes and replication factor 3.
But in that case, if one of them fails, the cluster becomes inconsistent.
So the minimum reasonable number of nodes for replication factor 3 is 4;
with 4 nodes we can tolerate a single node failure. But in that situation
each node contains 3/4 of all the data, which is not very good. The number 6
is recommended because then each node contains only 1/2 of all the data,
which is a quite reasonable overhead.
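
A quick back-of-the-envelope way to see this: with RF replicas spread over N
nodes, each node holds roughly RF/N of the data set:

RF=3, N=3  ->  3/3 = 100% per node
RF=3, N=4  ->  3/4 =  75% per node
RF=3, N=6  ->  3/6 =  50% per node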

Typically Cassandra clusters don't have a large replication factor; usually
it is 3 (failure of any single node doesn't break the cluster) or 5 (failure
of any two nodes doesn't break the cluster).

For more details you can have a look at a replication factor calculator.

--
Thanks,
Sergey

On 14/04/14 13:25, Markus Jais wrote:
> Hello,
>
> currently reading the "Practical Cassandra". In the section about
> replication factors the book says:
>
> "It is generally not recommended to set a replication factor of 3 if
> you have fewer than six nodes in a data center".
>
> Why is that? What problems would arise if I had a replication factor
> of 3 and only 5 nodes?
>
> Does that mean that for a replication of 4 I would need at least 8
> nodes and for a factor of 5 at least 10 nodes?
>
> Not saying that I would use a factor of 5 and 10 nodes, just curious about how
> this works.
>
> All the best,
>
> Markus



signature.asc
Description: OpenPGP digital signature


Re: Replication Factor question

2014-04-14 Thread Tupshin Harper
I do not agree with this advice.  It can be perfectly reasonable to have
#nodes < 2*RF.

It is common to deploy a 3 node cluster with RF=3 and it works fine as long
as each node can handle 100% of your data, and keep up with the workload.

-Tupshin
On Apr 14, 2014 5:25 AM, "Markus Jais"  wrote:

> Hello,
>
> currently reading the "Practical Cassandra". In the section about
> replication factors the book says:
>
> "It is generally not recommended to set a replication factor of 3 if you
> have fewer than six nodes in a data center".
>
> Why is that? What problems would arise if I had a replication factor of 3
> and only 5 nodes?
>
> Does that mean that for a replication of 4 I would need at least 8 nodes
> and for a factor of 5 at least 10 nodes?
>
> Not saying that I would use a factor of 5 and 10 nodes, just curious about how
> this works.
>
> All the best,
>
> Markus
>


Re: Replication Factor question

2014-04-14 Thread Markus Jais
Hi all,

thanks. Very helpful.

@Tupshin: With a 3 node cluster and RF 3, isn't it a problem if one node fails 
(due to hardware problems, for example)? According to the C* docs, writes fail 
if the number of nodes is smaller than the RF.
I agree that it will run fine as long as all nodes are up and they can handle 
the load but eventually hardware will fail.

Markus





Tupshin Harper wrote at 13:44 on Monday, 14 April 2014:
 
I do not agree with this advice.  It can be perfectly reasonable to have #nodes 
< 2*RF. 
>It is common to deploy a 3 node cluster with RF=3 and it works fine as long as 
>each node can handle 100% of your data, and keep up with the workload. 
>-Tupshin 
>On Apr 14, 2014 5:25 AM, "Markus Jais"  wrote:
>
>Hello,
>>
>>
>>currently reading the "Practical Cassandra". In the section about replication 
>>factors the book says:
>>
>>
>>"It is generally not recommended to set a replication factor of 3 if you have 
>>fewer than six nodes in a data center".
>>
>>
>>Why is that? What problems would arise if I had a replication factor of 3 and 
>>only 5 nodes?
>>
>>
>>Does that mean that for a replication of 4 I would need at least 8 nodes and 
>>for a factor of 5 at least 10 nodes?
>>
>>
>>Not saying that I would use a factor of 5 and 10 nodes, just curious about how this 
>>works.
>>
>>
>>All the best,
>>
>>
>>Markus
>
>

Re: Replication Factor question

2014-04-14 Thread Tupshin Harper
With 3 nodes, and RF=3, you can always use CL=ALL if all nodes are up,
QUORUM if 1 node is down, and ONE if any two nodes are down.

The exact same thing is true if you have more nodes.
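
To spell out the arithmetic for RF=3 (independent of cluster size):

CL=ALL    needs 3 replicas up
CL=QUORUM needs floor(3/2) + 1 = 2 replicas up
CL=ONE    needs 1 replica up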

-Tupshin
On Apr 14, 2014 7:51 AM, "Markus Jais"  wrote:

> Hi all,
>
> thanks. Very helpful.
>
> @Tupshin: With a 3 node cluster and RF 3 isn't it a problem if one node
> fails (due to hardware problems, for example). According to the C* docs,
> writes fail if the number of nodes is smaller than the RF.
> I agree that it will run fine as long as all nodes are up and they can
> handle the load but eventually hardware will fail.
>
> Markus
>
>
>
>
>
>   Tupshin Harper wrote at 13:44 on Monday, 14 April
> 2014:
>
> I do not agree with this advice.  It can be perfectly reasonable to have
> #nodes < 2*RF.
> It is common to deploy a 3 node cluster with RF=3 and it works fine as
> long as each node can handle 100% of your data, and keep up with the
> workload.
> -Tupshin
> On Apr 14, 2014 5:25 AM, "Markus Jais"  wrote:
>
> Hello,
>
> currently reading the "Practical Cassandra". In the section about
> replication factors the book says:
>
> "It is generally not recommended to set a replication factor of 3 if you
> have fewer than six nodes in a data center".
>
> Why is that? What problems would arise if I had a replication factor of 3
> and only 5 nodes?
>
> Does that mean that for a replication of 4 I would need at least 8 nodes
> and for a factor of 5 at least 10 nodes?
>
> Not saying that I would use a factor of 5 and 10 nodes, just curious about how
> this works.
>
> All the best,
>
> Markus
>
>
>
>


Re: clearing tombstones?

2014-04-14 Thread William Oberman
I'm still somewhat in the middle of the process, but it's far enough along
to report back.

1.) I changed GCGraceSeconds of the CF to 0 using cassandra-cli
2.) I ran nodetool compact on a single node of the nine (I'll call it
"1").  It took 5-7 hours, and reduced the CF from ~450 to ~75GB (*).
3.) I ran nodetool compact on nodes 2, 3, ... while watching write/read
latency averages in OpsCenter.  I got all of the way to 9 without any ill
effect.
4.) 2->9 all completed with similar results.

(*) So, I left out one detail that changed the math (I said above I
expected to clear down to at most 50GB).  I found a small bug in my delete
code mid-last week.  Basically, it deleted all of the rows I wanted, but
due to a race condition, there was a chance I'd delete rows in the middle
of doing new inserts.  Luckily, even in this case, it wasn't "end of the
world", but I stopped the cleanup anyways and added a time check (as all of
the rows I wanted to delete were older than 30 days).  I *thought* I'd
restarted the cleanup threads on a smaller dataset due to all of the
deletes, but instead I saw millions & millions of empty rows (the
tombstones).  Thus the start of this "clear the tombstones" subtask to the
original goal, and the reason I didn't see a 90%+ reduction in size.

In any case, now I'm running the cleanup process again, which will be
followed by ANOTHER round of compactions, and then I'll finally turn
GCGraceSeconds back on.
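
For reference, the cycle described above boils down to something like this (a
sketch only; the keyspace/CF names are illustrative and 864000 is just the
10-day default being restored):

# in cassandra-cli, against the keyspace
update column family myCF with gc_grace = 0;

# on each of the nine nodes in turn
nodetool -h <node> compact myKeyspace myCF

# once the tombstones are gone
update column family myCF with gc_grace = 864000;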

On the read/write production side, you'd never know anything happened.
 Good job on the distributed system! :-)

Thanks again,

will


On Fri, Apr 11, 2014 at 1:02 PM, Mark Reddy  wrote:

> Thats great Will, if you could update the thread with the actions you
> decide to take and the results that would be great.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 5:53 PM, William Oberman  > wrote:
>
>> I've learned a *lot* from this thread.  My thanks to all of the
>> contributors!
>>
>> Paulo: Good luck with LCS.  I wish I could help there, but all of my CF's
>> are SizeTiered (mostly as I'm on the same schema/same settings since 0.7...)
>>
>> will
>>
>>
>>
>> On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib wrote:
>>
>>>
>>> Levelled Compaction is a wholly different beast when it comes to
>>> tombstones.
>>>
>>> The tombstones are inserted, like any other write really, at the lower
>>> levels in the leveldb hierarchy.
>>>
>>> They are only removed after they have had the chance to "naturally"
>>> migrate upwards in the leveldb hierarchy to the highest level in your data
>>> store.  How long that takes depends on:
>>>  1. The amount of data in your store and the number of levels your LCS
>>> strategy has
>>> 2. The amount of new writes entering the bottom funnel of your leveldb,
>>> forcing upwards compaction and combining
>>>
>>> To give you an idea, I had a similar scenario and ran a (slow,
>>> throttled) delete job on my cluster around December-January.  Here's a
>>> graph of the disk space usage on one node.  Notice the still-declining
>>> usage long after the cleanup job has finished (sometime in January).  I
>>> tend to think of tombstones in LCS as little bombs that get to explode much
>>> later in time:
>>>
>>> http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
>>>
>>>
>>>
>>> On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <
>>> paulo.mo...@chaordicsystems.com> wrote:
>>>
>>> I have a similar problem here, I deleted about 30% of a very large CF
>>> using LCS (about 80GB per node), but still my data hasn't shrunk, even though
>>> I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool
>>> scrub force a minor compaction?
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>>
>>> On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy wrote:
>>>
 Yes, running nodetool compact (major compaction) creates one large
 SSTable. This will mess up the heuristics of the SizeTiered strategy (is
 this the compaction strategy you are using?) leading to multiple 'small'
 SSTables alongside the single large SSTable, which results in increased
 read latency. You will incur the operational overhead of having to manage
 compactions if you wish to compact these smaller SSTables. For all these
 reasons it is generally advised to stay away from running compactions
 manually.

 Assuming that this is a production environment and you want to keep
 everything running as smoothly as possible I would reduce the gc_grace on
 the CF, allow automatic minor compactions to kick in and then increase the
 gc_grace once again after the tombstones have been removed.


 On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
 ober...@civicscience.com> wrote:

> So, if I was impatient and just "wanted to make this happen now", I
> could:
>
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) run nodetool compact (*)
> 3.) Change GCGraceSeconds of the CF back to 10 days
>
> Since I have ~900M tombstones, even if I miss a few due to

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread DuyHai Doan
Hello William

 From the doc:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_tuning_bloom_filters_c.html

After updating the value of bloom_filter_fp_chance on a table, Bloom
filters need to be regenerated in one of these ways:

   - Initiate compaction
   - Upgrade SSTables


Regards

 Duy Hai DOAN


On Mon, Apr 14, 2014 at 3:44 PM, William Oberman
wrote:

> I had a thread on this forum about clearing junk from a CF.  In my case,
> it's ~90% of ~1 billion rows.
>
> One side effect I had hoped for was a reduction in the size of the bloom
> filter.  But, according to nodetool cfstats, it's still fairly large
> (~1.5GB of RAM).
>
> Do bloom filters ever resize themselves when the CF suddenly gets smaller?
>
>
> My next test will be restarting one of the instances, though I'll have to
> wait on that operation so I thought I'd ask in the meantime.
>
> will
>


Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Michal Michalski
Bloom filters are built on creation / rebuild of SSTable. If you removed
the data, but the old SSTables weren't compacted or you didn't rebuild them
manually, bloom filters will stay the same size.

M.

Kind regards,
Michał Michalski,
michal.michal...@boxever.com


On 14 April 2014 14:44, William Oberman  wrote:

> I had a thread on this forum about clearing junk from a CF.  In my case,
> it's ~90% of ~1 billion rows.
>
> One side effect I had hoped for was a reduction in the size of the bloom
> filter.  But, according to nodetool cfstats, it's still fairly large
> (~1.5GB of RAM).
>
> Do bloom filters ever resize themselves when the CF suddenly gets smaller?
>
>
> My next test will be restarting one of the instances, though I'll have to
> wait on that operation so I thought I'd ask in the meantime.
>
> will
>


bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
I had a thread on this forum about clearing junk from a CF.  In my case,
it's ~90% of ~1 billion rows.

One side effect I had hoped for was a reduction in the size of the bloom
filter.  But, according to nodetool cfstats, it's still fairly large
(~1.5GB of RAM).

Do bloom filters ever resize themselves when the CF suddenly gets smaller?

My next test will be restarting one of the instances, though I'll have to
wait on that operation so I thought I'd ask in the meantime.

will


Re: bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
I didn't cross link my thread, but the basic idea is I've done:

1.) Process that deleted ~900M of ~1G rows from a CF
2.) Set GCGraceSeconds to 0 on CF
3.) Run nodetool compact on all N nodes

And I checked, and all N nodes have bloom filters using 1.5 +/- .2 GB of
RAM (I didn't explicitly write down the before numbers, but they seem about
the same).  So, compaction didn't change the BFs (unless Cassandra needs
a 2nd compaction to see all of the data cleared by the 1st compaction).

will


On Mon, Apr 14, 2014 at 9:52 AM, Michal Michalski <
michal.michal...@boxever.com> wrote:

> Bloom filters are built on creation / rebuild of SSTable. If you removed
> the data, but the old SSTables weren't compacted or you didn't rebuild them
> manually, bloom filters will stay the same size.
>
> M.
>
> Kind regards,
> Michał Michalski,
> michal.michal...@boxever.com
>
>
> On 14 April 2014 14:44, William Oberman  wrote:
>
>> I had a thread on this forum about clearing junk from a CF.  In my case,
>> it's ~90% of ~1 billion rows.
>>
>> One side effect I had hoped for was a reduction in the size of the bloom
>> filter.  But, according to nodetool cfstats, it's still fairly large
>> (~1.5GB of RAM).
>>
>> Do bloom filters ever resize themselves when the CF suddenly gets
>> smaller?
>>
>> My next test will be restarting one of the instances, though I'll have to
>> wait on that operation so I thought I'd ask in the meantime.
>>
>> will
>>
>
>


Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Michal Michalski
Did you set Bloom Filter's FP chance before or after the step 3) above? If
you did it before, C* should build Bloom Filters properly. If not - that's
the reason.

Kind regards,
Michał Michalski,
michal.michal...@boxever.com


On 14 April 2014 15:04, William Oberman  wrote:

> I didn't cross link my thread, but the basic idea is I've done:
>
> 1.) Process that deleted ~900M of ~1G rows from a CF
> 2.) Set GCGraceSeconds to 0 on CF
> 3.) Run nodetool compact on all N nodes
>
> And I checked, and all N nodes have bloom filters using 1.5 +/- .2 GB of
> RAM (I didn't explicitly write down the before numbers, but they seem about
> the same) .  So, compaction didn't change the BF's (unless cassandra needs
> a 2nd compaction to see all of the data cleared by the 1st compaction).
>
> will
>
>
> On Mon, Apr 14, 2014 at 9:52 AM, Michal Michalski <
> michal.michal...@boxever.com> wrote:
>
>> Bloom filters are built on creation / rebuild of SSTable. If you removed
>> the data, but the old SSTables weren't compacted or you didn't rebuild them
>> manually, bloom filters will stay the same size.
>>
>> M.
>>
>> Kind regards,
>> Michał Michalski,
>> michal.michal...@boxever.com
>>
>>
>> On 14 April 2014 14:44, William Oberman  wrote:
>>
>>> I had a thread on this forum about clearing junk from a CF.  In my case,
>>> it's ~90% of ~1 billion rows.
>>>
>>> One side effect I had hoped for was a reduction in the size of the bloom
>>> filter.  But, according to nodetool cfstats, it's still fairly large
>>> (~1.5GB of RAM).
>>>
>>> Do bloom filters ever resize themselves when the CF suddenly gets
>>> smaller?
>>>
>>> My next test will be restarting one of the instances, though I'll have
>>> to wait on that operation so I thought I'd ask in the meantime.
>>>
>>> will
>>>
>>
>>
>
>
>


Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Michal Michalski
Sorry, I misread the question - I thought you've also changed FP chance
value, not only removed the data.

Kind regards,
Michał Michalski,
michal.michal...@boxever.com


On 14 April 2014 15:07, Michal Michalski wrote:

> Did you set Bloom Filter's FP chance before or after the step 3) above? If
> you did it before, C* should build Bloom Filters properly. If not - that's
> the reason.
>
> Kind regards,
> Michał Michalski,
> michal.michal...@boxever.com
>
>
> On 14 April 2014 15:04, William Oberman  wrote:
>
>> I didn't cross link my thread, but the basic idea is I've done:
>>
>> 1.) Process that deleted ~900M of ~1G rows from a CF
>> 2.) Set GCGraceSeconds to 0 on CF
>> 3.) Run nodetool compact on all N nodes
>>
>> And I checked, and all N nodes have bloom filters using 1.5 +/- .2 GB of
>> RAM (I didn't explicitly write down the before numbers, but they seem about
>> the same) .  So, compaction didn't change the BF's (unless cassandra needs
>> a 2nd compaction to see all of the data cleared by the 1st compaction).
>>
>> will
>>
>>
>> On Mon, Apr 14, 2014 at 9:52 AM, Michal Michalski <
>> michal.michal...@boxever.com> wrote:
>>
>>> Bloom filters are built on creation / rebuild of SSTable. If you removed
>>> the data, but the old SSTables weren't compacted or you didn't rebuild them
>>> manually, bloom filters will stay the same size.
>>>
>>> M.
>>>
>>> Kind regards,
>>> Michał Michalski,
>>> michal.michal...@boxever.com
>>>
>>>
>>> On 14 April 2014 14:44, William Oberman wrote:
>>>
 I had a thread on this forum about clearing junk from a CF.  In my
 case, it's ~90% of ~1 billion rows.

 One side effect I had hoped for was a reduction in the size of the
 bloom filter.  But, according to nodetool cfstats, it's still fairly large
 (~1.5GB of RAM).

 Do bloom filters ever resize themselves when the CF suddenly gets
 smaller?

 My next test will be restarting one of the instances, though I'll have
 to wait on that operation so I thought I'd ask in the meantime.

 will

>>>
>>>
>>
>>
>>
>


Re: bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
Ah, so I could change the chance value to "poke it".  Good to know!


On Mon, Apr 14, 2014 at 10:12 AM, Michal Michalski <
michal.michal...@boxever.com> wrote:

> Sorry, I misread the question - I thought you've also changed FP chance
> value, not only removed the data.
>
> Kind regards,
> Michał Michalski,
> michal.michal...@boxever.com
>
>
> On 14 April 2014 15:07, Michal Michalski wrote:
>
>> Did you set Bloom Filter's FP chance before or after the step 3) above?
>> If you did it before, C* should build Bloom Filters properly. If not -
>> that's the reason.
>>
>> Kind regards,
>> Michał Michalski,
>> michal.michal...@boxever.com
>>
>>
>> On 14 April 2014 15:04, William Oberman  wrote:
>>
>>> I didn't cross link my thread, but the basic idea is I've done:
>>>
>>> 1.) Process that deleted ~900M of ~1G rows from a CF
>>> 2.) Set GCGraceSeconds to 0 on CF
>>> 3.) Run nodetool compact on all N nodes
>>>
>>> And I checked, and all N nodes have bloom filters using 1.5 +/- .2 GB of
>>> RAM (I didn't explicitly write down the before numbers, but they seem about
>>> the same) .  So, compaction didn't change the BF's (unless cassandra needs
>>> a 2nd compaction to see all of the data cleared by the 1st compaction).
>>>
>>> will
>>>
>>>
>>> On Mon, Apr 14, 2014 at 9:52 AM, Michal Michalski <
>>> michal.michal...@boxever.com> wrote:
>>>
 Bloom filters are built on creation / rebuild of SSTable. If you
 removed the data, but the old SSTables weren't compacted or you didn't
 rebuild them manually, bloom filters will stay the same size.

 M.

 Kind regards,
 Michał Michalski,
 michal.michal...@boxever.com


 On 14 April 2014 14:44, William Oberman wrote:

> I had a thread on this forum about clearing junk from a CF.  In my
> case, it's ~90% of ~1 billion rows.
>
> One side effect I had hoped for was a reduction in the size of the
> bloom filter.  But, according to nodetool cfstats, it's still fairly large
> (~1.5GB of RAM).
>
> Do bloom filters ever resize themselves when the CF suddenly gets
> smaller?
>
> My next test will be restarting one of the instances, though I'll have
> to wait on that operation so I thought I'd ask in the meantime.
>
> will
>


>>>
>>>
>>>
>>
>


Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Mark Reddy
Hi Will,

You can run 'nodetool upgradesstables'; this will rewrite the SSTables and
regenerate the bloom filters for those tables, which will reduce their
usage.
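
A minimal sketch of that (the keyspace and CF names are placeholders; on
versions that support it, the -a/--include-all-sstables flag forces rewriting
SSTables that are already on the current format):

nodetool -h <node> upgradesstables -a myKeyspace myCF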


Mark


On Mon, Apr 14, 2014 at 3:16 PM, William Oberman
wrote:

> Ah, so I could change the chance value to "poke it".  Good to know!
>
>
>
> On Mon, Apr 14, 2014 at 10:12 AM, Michal Michalski <
> michal.michal...@boxever.com> wrote:
>
>> Sorry, I misread the question - I thought you've also changed FP chance
>> value, not only removed the data.
>>
>> Kind regards,
>> Michał Michalski,
>> michal.michal...@boxever.com
>>
>>
>> On 14 April 2014 15:07, Michal Michalski wrote:
>>
>>> Did you set Bloom Filter's FP chance before or after the step 3) above?
>>> If you did it before, C* should build Bloom Filters properly. If not -
>>> that's the reason.
>>>
>>> Kind regards,
>>> Michał Michalski,
>>> michal.michal...@boxever.com
>>>
>>>
>>> On 14 April 2014 15:04, William Oberman wrote:
>>>
 I didn't cross link my thread, but the basic idea is I've done:

 1.) Process that deleted ~900M of ~1G rows from a CF
 2.) Set GCGraceSeconds to 0 on CF
 3.) Run nodetool compact on all N nodes

 And I checked, and all N nodes have bloom filters using 1.5 +/- .2 GB
 of RAM (I didn't explicitly write down the before numbers, but they seem
 about the same) .  So, compaction didn't change the BF's (unless cassandra
 needs a 2nd compaction to see all of the data cleared by the 1st
 compaction).

 will


 On Mon, Apr 14, 2014 at 9:52 AM, Michal Michalski <
 michal.michal...@boxever.com> wrote:

> Bloom filters are built on creation / rebuild of SSTable. If you
> removed the data, but the old SSTables weren't compacted or you didn't
> rebuild them manually, bloom filters will stay the same size.
>
> M.
>
> Kind regards,
> Michał Michalski,
> michal.michal...@boxever.com
>
>
> On 14 April 2014 14:44, William Oberman wrote:
>
>> I had a thread on this forum about clearing junk from a CF.  In my
>> case, it's ~90% of ~1 billion rows.
>>
>> One side effect I had hoped for was a reduction in the size of the
>> bloom filter.  But, according to nodetool cfstats, it's still fairly 
>> large
>> (~1.5GB of RAM).
>>
>> Do bloom filters ever resize themselves when the CF suddenly gets
>> smaller?
>>
>> My next test will be restarting one of the instances, though I'll
>> have to wait on that operation so I thought I'd ask in the meantime.
>>
>> will
>>
>
>



>>>
>>
>
>
>


Re: C* 1.2.15 Decommission issues

2014-04-14 Thread Jeremiah D Jordan
Russell,
The hinted handoff manager is checking for hints to see if it needs to pass 
those off during the decommission so that the hints don't get lost.  You most 
likely have a lot of hints, or a bunch of tombstones, or something in the table 
causing the query to time out.  You aren't seeing any other exceptions in your 
logs before the timeout, are you?  Raising the read timeout period on your nodes 
before you decommission them, or manually deleting the hints CF, should most 
likely get you past this.  If you delete them, you would then want to make 
sure you run a full cluster repair when you are done with all of your 
decommissions, to propagate data from any hints you deleted.
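
As an illustration of the first suggestion, the relevant cassandra.yaml settings on 
the nodes being decommissioned would look something like this (the values are purely 
illustrative, not a recommendation):

read_request_timeout_in_ms: 60000
range_request_timeout_in_ms: 60000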

-Jeremiah Jordan

On Apr 10, 2014, at 1:08 PM, Russell Bradberry  wrote:

> We have about a 30 node cluster running the latest C* 1.2 series DSE.  One 
> datacenter uses VNodes and the other datacenter has VNodes disabled (because 
> it is running DSE Search)
> 
> We have been replacing nodes in the VNode datacenter with faster ones and we 
> have yet to have a successful decommission.  Every time we attempt to 
> decommission a node we get an “Operation Timed Out” error and the 
> decommission fails.  We keep retrying it and sometimes it will work and other 
> times we will just give up and force the node removal.  It seems though, that 
> all the data has streamed out of the node before the decommission fails.
> 
> What exactly does it need to read before leaving that would cause this?  We 
> also have noticed that in several nodes after the removal that there are 
> ghost entries for the removed node in the system.peers table and this doesn’t 
> get removed until we restart Cassandra on that node.
> 
> Also, we have noticed that running repairs with VNodes is considerably 
> slower. Is this a misconfiguration? Or is it expected that VNodes repairs 
> will be slow?
> 
> 
> Here is the stack trace from the decommission failure:
> 
> Exception in thread "main" java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
> received only 0 responses.
> at 
> org.apache.cassandra.db.HintedHandOffManager.getHintsSlice(HintedHandOffManager.java:578)
> at 
> org.apache.cassandra.db.HintedHandOffManager.listEndpointsPendingHints(HintedHandOffManager.java:528)
> at 
> org.apache.cassandra.service.StorageService.streamHints(StorageService.java:2925)
> at 
> org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2905)
> at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2866)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
> at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
> at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
> at sun.rmi.transport.Transport$1.run(Transport.java:177)
> at sun.rmi.transport.Transport$1.run(Transport.java:174)
> 

Re: Cassandra Strange behaviour

2014-04-14 Thread Yulian Oifa
Adding some more log:
INFO [FlushWriter:22] 2014-04-14 19:23:13,443 Memtable.java (line 254)
Completed flushing /opt/cassandra/data/USER_DATA/freeNumbers-g-1074-Data.db
(37824462 bytes)
 WARN [CompactionExecutor:258] 2014-04-14 19:23:31,915
CompactionManager.java (line 509) insufficient space to compact all
requested files
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1060-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-890-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-978-Data.db')
 INFO [COMMIT-LOG-WRITER] 2014-04-14 19:23:45,037 CommitLogSegment.java
(line 59) Creating new commitlog segment
/opt/cassandra/commitlog/CommitLog-1397492625037.log
 WARN [CompactionExecutor:258] 2014-04-14 19:23:51,916
CompactionManager.java (line 509) insufficient space to compact all
requested files
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1060-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-890-Data.db')
ERROR [CompactionExecutor:258] 2014-04-14 19:23:51,916
CompactionManager.java (line 513) insufficient space to compact even the
two smallest files, aborting
 INFO [NonPeriodicTasks:1] 2014-04-14 19:24:01,073 MeteredFlusher.java
(line 62) flushing high-traffic column family CFS(Keyspace='USER_DATA',
ColumnFamily='freeNumbers')
 INFO [NonPeriodicTasks:1] 2014-04-14 19:24:01,074 ColumnFamilyStore.java
(line 1128) Enqueuing flush of
Memtable-freeNumbers@1751772888(37509120/308358832
serialized/live bytes, 833536 ops)
 INFO [FlushWriter:22] 2014-04-14 19:24:01,074 Memtable.java (line 237)
Writing Memtable-freeNumbers@1751772888(37509120/308358832 serialized/live
bytes, 833536 ops)
 INFO [FlushWriter:22] 2014-04-14 19:24:02,575 Memtable.java (line 254)
Completed flushing /opt/cassandra/data/USER_DATA/freeNumbers-g-1075-Data.db
(37917606 bytes)

Best regards
Yulian Oifa


Cassandra Strange behaviour

2014-04-14 Thread Yulian Oifa
Hello to all,
I have a Cassandra cluster with 3 nodes and RF=3, writing with QUORUM.
The application wrote several million records today to a specific CF.
After that, one of the servers went wild; it is eating up the disk.
As I see from the logs, hinted handoffs from the 2 other servers are occurring
to this server.
On this server I see that data is flushed to disk every few seconds:
 WARN [CompactionExecutor:249] 2014-04-14 19:17:38,633
CompactionManager.java (line 509) insufficient space to compact all
requested files
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-548-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1060-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-890-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-978-Data.db')
 WARN [CompactionExecutor:249] 2014-04-14 19:17:58,647
CompactionManager.java (line 509) insufficient space to compact all
requested files
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1060-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-890-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-978-Data.db')
 INFO [COMMIT-LOG-WRITER] 2014-04-14 19:18:06,232 CommitLogSegment.java
(line 59) Creating new commitlog segment
/opt/cassandra/commitlog/CommitLog-1397492286232.log
 WARN [CompactionExecutor:249] 2014-04-14 19:18:18,648
CompactionManager.java (line 509) insufficient space to compact all
requested files
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1060-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-890-Data.db')
ERROR [CompactionExecutor:249] 2014-04-14 19:18:18,649
CompactionManager.java (line 513) insufficient space to compact even the
two smallest files, aborting
 INFO [NonPeriodicTasks:1] 2014-04-14 19:18:25,228 MeteredFlusher.java
(line 62) flushing high-traffic column family CFS(Keyspace='USER_DATA',
ColumnFamily='freeNumbers')
 INFO [NonPeriodicTasks:1] 2014-04-14 19:18:25,228 ColumnFamilyStore.java
(line 1128) Enqueuing flush of
Memtable-freeNumbers@1950635535(37693440/309874109
serialized/live bytes, 837632 ops)
 INFO [FlushWriter:22] 2014-04-14 19:18:25,229 Memtable.java (line 237)
Writing Memtable-freeNumbers@1950635535(37693440/309874109 serialized/live
bytes, 837632 ops)
 INFO [FlushWriter:22] 2014-04-14 19:18:26,871 Memtable.java (line 254)
Completed flushing /opt/cassandra/data/USER_DATA/freeNumbers-g-1066-Data.db
(38103944 bytes)
 INFO [CompactionExecutor:251] 2014-04-14 19:18:26,872
CompactionManager.java (line 542) Compacting Minor:
[SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1065-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1063-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1064-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1066-Data.db')]
 INFO [CompactionExecutor:251] 2014-04-14 19:18:26,878
CompactionController.java (line 146) Compacting large row
USER_DATA/freeNumbers:8bdf9678-6d70-11e3-85ab-80e385abf85d (151810145
bytes) incrementally


However, the total data of this CF is around 4.5 GB, while the disk usage for this
CF on this server exceeds 20 GB.
I have tried restarting this server and a rolling restart of all servers, with no
luck; it continues to write data, and I cannot run compact either.
How can I stop that?
Best regards
Yulian Oifa


Re: Replication Factor question

2014-04-14 Thread Robert Coli
On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais  wrote:

> "It is generally not recommended to set a replication factor of 3 if you
> have fewer than six nodes in a data center".
>

I have a detailed post about this somewhere in the archives of this list
(which I can't seem to find right now..) but briefly, the "6-for-3" advice
relates to the percentage of capacity you have remaining when you have a
node down. It has become slightly less accurate over time because vnodes
reduce bootstrap time and there have been other improvements to node
startup time.

If you have fewer than 6 nodes with RF=3, you lose >1/6th of capacity when
you lose a single node, which is a significant percentage of total cluster
capacity. You then lose another meaningful percentage of your capacity when
your existing nodes participate in rebuilding the missing node. If you are
then unlucky enough to lose another node, you are missing a very
significant percentage of your cluster capacity and have to use a
relatively small fraction of it to rebuild the now two down nodes.

I wouldn't generalize the rule of thumb as "don't run under N=RF*2", but
rather as "probably don't run RF=3 under about 6 nodes". IOW, in my view,
the most operationally sane initial number of nodes for RF=3 is likely
closer to 6 than 3.
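
To put rough numbers on it (a back-of-the-envelope illustration that ignores
the extra load of streaming data to a replacement node):

capacity remaining with one node down ~= (N-1)/N
N=3: 2/3 ~= 67%     N=6: 5/6 ~= 83%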

=Rob


Lots of commitlog files

2014-04-14 Thread Donald Smith
1. With cassandra 2.0.6, we have 547G of files in /var/lib/commitlog/.  I 
started a "nodetool flush" 65 minutes ago; it's still running.  The 17536 
commitlog files have been created in the last 3 days.  (The node has 2.1T of 
sstables data in /var/lib/cassandra/data/.  This is in staging, not prod.) Why 
so many commit logs?  Here are our commitlog-related settings in cassandra.yaml:

commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
# The size of the individual commitlog file segments.  A commitlog
# archiving commitlog segments (see commitlog_archiving.properties),
commitlog_segment_size_in_mb: 32
# Total space to use for commitlogs.  Since commitlog segments are
# segment and remove it.  So a small total commitlog space will tend
# commitlog_total_space_in_mb: 4096

Maybe we should set commitlog_total_space_in_mb to something other than the 
default. According to OpsCenter, commitlog_total_space_in_mb is "None". But 
it seems odd that there'd be so many commit logs.
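
If we do end up overriding it, the change would be a one-liner in cassandra.yaml 
along these lines (the value here is purely illustrative):

commitlog_total_space_in_mb: 8192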

The node is under heavy write load.   There are about 2900 compactions pending.

We are NOT archiving commitlogs, via commitlog_archiving.properties.

BTW, the documentation for nodetool 
says:
Flush

Flushes memtables (in memory) to SSTables (on disk), which also enables 
CommitLog segments to be deleted.
But even after doing a flush, the /var/lib/commitlog dir still has 1G of files, 
even after waiting 30  minutes.  Each file is 32M in size, plus or minus a few 
bytes.  I tried this on other clusters, with much smaller amounts of data.   
Even restarting Cassandra doesn't help.

I surmise that the 1GB of commit logs are normal: they probably allocate that 
space as a workspace.


Thanks,  Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]


Re: Replication Factor question

2014-04-14 Thread Tupshin Harper
tl;dr make sure you have enough capacity in the event of node failure. For
light workloads, that can be fulfilled with nodes=rf.

-Tupshin
On Apr 14, 2014 2:35 PM, "Robert Coli"  wrote:

> On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais  wrote:
>
>> "It is generally not recommended to set a replication factor of 3 if you
>> have fewer than six nodes in a data center".
>>
>
> I have a detailed post about this somewhere in the archives of this list
> (which I can't seem to find right now..) but briefly, the "6-for-3" advice
> relates to the percentage of capacity you have remaining when you have a
> node down. It has become slightly less accurate over time because vnodes
> reduce bootstrap time and there have been other improvements to node
> startup time.
>
> If you have fewer than 6 nodes with RF=3, you lose >1/6th of capacity when
> you lose a single node, which is a significant percentage of total cluster
> capacity. You then lose another meaningful percentage of your capacity when
> your existing nodes participate in rebuilding the missing node. If you are
> then unlucky enough to lose another node, you are missing a very
> significant percentage of your cluster capacity and have to use a
> relatively small fraction of it to rebuild the now two down nodes.
>
> I wouldn't generalize the rule of thumb as "don't run under N=RF*2", but
> rather as "probably don't run RF=3 under about 6 nodes". IOW, in my view,
> the most operationally sane initial number of nodes for RF=3 is likely
> closer to 6 than 3.
>
> =Rob
>
>


Re: Intermittent long application pauses on nodes

2014-04-14 Thread Ken Hancock
Searching my list archives shows this thread evaporated.  Was a root
cause ever found?  Very curious.




On Mon, Feb 3, 2014 at 11:52 AM, Benedict Elliott Smith <
belliottsm...@datastax.com> wrote:

> Hi Frank,
>
> The "9391" under RevokeBias is the number of milliseconds spent
> synchronising on the safepoint prior to the VM operation, i.e. the time it
> took to ensure all application threads were stopped. So this is the
> culprit. Notice that the time spent spinning/blocking for the threads we
> are supposed to be waiting on is very low; it looks to me that this is time
> spent waiting for CMS threads to yield, though it is very hard to say with
> absolute certainty. It doesn't look like the issue is actually the
> RevokeBias itself, anyway.
>
> I think we should take this off list. It definitely warrants a ticket,
> though I expect this will be difficult to pin down, so you will have to be
> willing to experiment a bit with us, but we would be very grateful for the
> help. If you can pin down and share a specific workload that triggers this
> we may be able to do it without you though!
>
> It's possible that this is a JVM issue, but if so there may be some
> remedial action we can take anyway. There are some more flags we should
> add, but we can discuss that once you open a ticket. If you could include
> the strange JMX error as well, that might be helpful.
>
> Thanks,
>
> Benedict
>
>
> On 3 February 2014 15:34, Frank Ng  wrote:
>
>> I was able to send SafePointStatistics to another log file via the
>> additional JVM flags and recently noticed a pause of 9.3936600 seconds.
>> Here are the log entries:
>>
>> GC Log file:
>> ---
>> 2014-01-31T07:49:14.755-0500: 137460.842: Total time for which
>> application threads were stopped: 0.1095540 seconds
>> 2014-01-31T07:51:01.870-0500: 137567.957: Total time for which
>> application threads were stopped: 9.3936600 seconds
>> 2014-01-31T07:51:02.537-0500: 137568.623: Total time for which
>> application threads were stopped: 0.1207440 seconds
>>
>> JVM Stdout Log File:
>> ---
>>          vmop                   [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
>> 137460.734: GenCollectForAllocation  [ 421  0  0 ]  [  0  0    23  0  84 ]  0
>> 137558.562: RevokeBias               [ 462  2  9 ]  [ 13  0  9391  1   0 ]  0
>> ctxk='javax/management/ObjectName$Property'
>> witness='javax/management/ObjectName$PatternProperty' stamp='137568.503'/>
>> 137568.500: Deoptimize               [ 481  1  5 ]  [  0  0   118  0   1 ]  0
>> 137569.625: no vm operation          [ 483  0  1 ]  [  0  0    18  0   0 ]  0
>> 137571.641: no vm operation          [ 483  0  1 ]  [  0  0    42  1   0 ]  0
>> 137575.703: no vm operation          [ 483  0  1 ]  [  0  0    25  1   0 ]  0
>>
>> If SafepointStatistics are printed before the Application Stop times,
>> then it seems that the RevokeBias was the cause of the pause.
>> If SafepointStatistics are printed after the Application Stop times, then
>> it seems that the Deoptimize was the cause of the pause.
>> In addition, I see a strange dependency failed error relating to JMX in
>> the JVM stdout log file.
>>
>> thanks
>>
>>
>> On Wed, Jan 29, 2014 at 4:44 PM, Benedict Elliott Smith <
>> belliottsm...@datastax.com> wrote:
>>
>>> Add some more flags: -XX:+UnlockDiagnosticVMOptions -XX:LogFile=${path}
>>> -XX:+LogVMOutput
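>>>
>>> For reference, in cassandra-env.sh the combined set of flags discussed in this
>>> thread would look roughly like this (a sketch; ${path} is a placeholder):
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
>>> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions -XX:LogFile=${path} -XX:+LogVMOutput"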
>>>
>>> I never figured out what kills stdout for C*. It's a library we depend
>>> on, didn't try too hard to figure out which one.
>>>
>>>
>>> On 29 January 2014 21:07, Frank Ng  wrote:
>>>
 Benedict,
 Thanks for the advice.  I've tried turning on
 PrintSafepointStatistics.  However, that info is only sent to the STDOUT
 console.  The cassandra startup script closes the STDOUT when it finishes,
 so nothing is shown for safepoint statistics once it's done starting up.
 Do you know how to startup cassandra and send all 

RE: Lots of commitlog files

2014-04-14 Thread Donald Smith
Another thing.   cassandra.yaml says:

# Total space to use for commitlogs.  Since commitlog segments are
# mmapped, and hence use up address space, the default size is 32
# on 32-bit JVMs, and 1024 on 64-bit JVMs.
#
# If space gets above this value (it will round up to the next nearest
# segment multiple), Cassandra will flush every dirty CF in the oldest
# segment and remove it.  So a small total commitlog space will tend
# to cause more flush activity on less-active columnfamilies.
# commitlog_total_space_in_mb: 4096

We're using a 64 bit linux with a 64 bit JVM:

Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)

but our commit log files are each 32MB in size. Is this indicative of a bug?  
Shouldn't they be 1024MB in size?

  Don

From: Donald Smith
Sent: Monday, April 14, 2014 12:04 PM
To: 'user@cassandra.apache.org'
Subject: Lots of commitlog files

1. With cassandra 2.0.6, we have 547G of files in /var/lib/commitlog/.  I 
started a "nodetool flush" 65 minutes ago; it's still running.  The 17536 
commitlog files have been created in the last 3 days.  (The node has 2.1T of 
sstables data in /var/lib/cassandra/data/.  This is in staging, not prod.) Why 
so many commit logs?  Here are our commitlog-related settings in cassandra.yaml:
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
# The size of the individual commitlog file segments.  A commitlog
# archiving commitlog segments (see commitlog_archiving.properties),
commitlog_segment_size_in_mb: 32
# Total space to use for commitlogs.  Since commitlog segments are
# segment and remove it.  So a small total commitlog space will tend
# commitlog_total_space_in_mb: 4096

Maybe we should set commitlog_total_space_in_mb to something other than the 
default. According to OpsCenter, commitlog_total_space_in_mb is "None". But 
it seems odd that there'd be so many commit logs.

The node is under heavy write load.   There are about 2900 compactions pending.

We are NOT archiving commitlogs, via commitlog_archiving.properties.

BTW, the documentation for nodetool 
says:
Flush

Flushes memtables (in memory) to SSTables (on disk), which also enables 
CommitLog segments to be deleted.
But even after doing a flush, the /var/lib/commitlog dir still has 1G of files, 
even after waiting 30  minutes.  Each file is 32M in size, plus or minus a few 
bytes.  I tried this on other clusters, with much smaller amounts of data.   
Even restarting Cassandra doesn't help.

I surmise that the 1GB of commit logs are normal: they probably allocate that 
space as a workspace.


Thanks,  Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]
