"Not enough replicas available for query" after reboot

2016-02-04 Thread Flavien Charlon
Hi,

My cluster was running fine. I rebooted all three nodes (one by one), and
now all nodes are back up and running. "nodetool status" shows UP for all
three nodes on all three nodes:

--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  xx.xx.xx.xx  331.84 GB  1       ?     d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
UN  xx.xx.xx.xx  317.2 GB   1       ?     de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
UN  xx.xx.xx.xx  291.61 GB  1       ?     b489c970-68db-44a7-90c6-be734b41475f  RAC1

However, now the client application fails to run queries on the cluster
with:

Cassandra.UnavailableException: Not enough replicas available for query at
consistency Quorum (2 required but only 1 alive)


The replication factor is 3. I am running Cassandra 2.1.7.

Any idea where that could come from or how to troubleshoot this further?

Best,
Flavien


Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Flavien Charlon
Yes, all three nodes see all three nodes as UN.

Also, connecting from a local Cassandra machine using cqlsh, I can run the
same query just fine (with QUORUM consistency level).
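
For reference, a minimal sketch of that kind of check, run from one of the
nodes (the address, keyspace and table names are placeholders, not the real
schema):

    # Run a QUORUM read through cqlsh (placeholder address/keyspace/table):
    cqlsh 10.0.0.1 <<'EOF'
    CONSISTENCY QUORUM;
    SELECT * FROM my_keyspace.my_table LIMIT 1;
    EOF

    # And confirm every node reports the same view of the ring:
    nodetool status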

On 4 February 2016 at 21:02, Robert Coli  wrote:

> On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon <
> flavien.char...@gmail.com> wrote:
>
>> My cluster was running fine. I rebooted all three nodes (one by one), and
>> now all nodes are back up and running. "nodetool status" shows UP for all
>> three nodes on all three nodes:
>>
>> --  AddressLoad   Tokens  OwnsHost ID
>>   Rack
>> UN  xx.xx.xx.xx331.84 GB  1   ?
>> d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
>> UN  xx.xx.xx.xx317.2 GB   1   ?
>> de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
>> UN  xx.xx.xx.xx  291.61 GB  1   ?
>> b489c970-68db-44a7-90c6-be734b41475f  RAC1
>>
>> However, now the client application fails to run queries on the cluster
>> with:
>>
>> Cassandra.UnavailableException: Not enough replicas available for query
>>> at consistency Quorum (2 required but only 1 alive)
>>
>>
> Do *all* nodes see each other as UP/UN?
>
> =Rob
>
>


Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Flavien Charlon
I'm using the C# driver 2.5.2. I did try to restart the client application,
but that didn't make any difference, I still get the same error after
restart.

On 4 February 2016 at 21:54,  wrote:

> What client are you using?
>
>
>
> It is possible that the client saw nodes down and has kept them marked
> that way (without retrying). Depending on the client, you may have options
> to set in RetryPolicy, FailoverPolicy, etc. A bounce of the client will
> probably fix the problem for now.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Flavien Charlon [mailto:flavien.char...@gmail.com]
> *Sent:* Thursday, February 04, 2016 4:06 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: "Not enough replicas available for query" after reboot
>
>
>
> Yes, all three nodes see all three nodes as UN.
>
>
>
> Also, connecting from a local Cassandra machine using cqlsh, I can run the
> same query just fine (with QUORUM consistency level).
>
>
>
> On 4 February 2016 at 21:02, Robert Coli  wrote:
>
> On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon <
> flavien.char...@gmail.com> wrote:
>
> My cluster was running fine. I rebooted all three nodes (one by one), and
> now all nodes are back up and running. "nodetool status" shows UP for all
> three nodes on all three nodes:
>
>
>
> --  AddressLoad   Tokens  OwnsHost ID
>   Rack
>
> UN  xx.xx.xx.xx331.84 GB  1   ?
> d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
>
> UN  xx.xx.xx.xx317.2 GB   1   ?
> de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
>
> UN  xx.xx.xx.xx  291.61 GB  1   ?
> b489c970-68db-44a7-90c6-be734b41475f  RAC1
>
>
>
> However, now the client application fails to run queries on the cluster
> with:
>
>
>
> Cassandra.UnavailableException: Not enough replicas available for query at
> consistency Quorum (2 required but only 1 alive)
>
>
>
> Do *all* nodes see each other as UP/UN?
>
>
>
> =Rob
>
>
>
>
>


Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Flavien Charlon
No, there was no other change. I did run "apt-get upgrade" before
rebooting, but Cassandra has not been upgraded.

On 4 February 2016 at 22:48, Bryan Cheng  wrote:

> Hey Flavien!
>
> Did your reboot come with any other changes (schema, configuration,
> topology, version)?
>
> On Thu, Feb 4, 2016 at 2:06 PM, Flavien Charlon  > wrote:
>
>> I'm using the C# driver 2.5.2. I did try to restart the client
>> application, but that didn't make any difference, I still get the same
>> error after restart.
>>
>> On 4 February 2016 at 21:54,  wrote:
>>
>>> What client are you using?
>>>
>>>
>>>
>>> It is possible that the client saw nodes down and has kept them marked
>>> that way (without retrying). Depending on the client, you may have options
>>> to set in RetryPolicy, FailoverPolicy, etc. A bounce of the client will
>>> probably fix the problem for now.
>>>
>>>
>>>
>>>
>>>
>>> Sean Durity
>>>
>>>
>>>
>>> *From:* Flavien Charlon [mailto:flavien.char...@gmail.com]
>>> *Sent:* Thursday, February 04, 2016 4:06 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: "Not enough replicas available for query" after reboot
>>>
>>>
>>>
>>> Yes, all three nodes see all three nodes as UN.
>>>
>>>
>>>
>>> Also, connecting from a local Cassandra machine using cqlsh, I can run
>>> the same query just fine (with QUORUM consistency level).
>>>
>>>
>>>
>>> On 4 February 2016 at 21:02, Robert Coli  wrote:
>>>
>>> On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon <
>>> flavien.char...@gmail.com> wrote:
>>>
>>> My cluster was running fine. I rebooted all three nodes (one by one),
>>> and now all nodes are back up and running. "nodetool status" shows UP for
>>> all three nodes on all three nodes:
>>>
>>>
>>>
>>> --  AddressLoad   Tokens  OwnsHost ID
>>> Rack
>>>
>>> UN  xx.xx.xx.xx331.84 GB  1   ?
>>> d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
>>>
>>> UN  xx.xx.xx.xx317.2 GB   1   ?
>>> de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
>>>
>>> UN  xx.xx.xx.xx  291.61 GB  1   ?
>>> b489c970-68db-44a7-90c6-be734b41475f  RAC1
>>>
>>>
>>>
>>> However, now the client application fails to run queries on the cluster
>>> with:
>>>
>>>
>>>
>>> Cassandra.UnavailableException: Not enough replicas available for query
>>> at consistency Quorum (2 required but only 1 alive)
>>>
>>>
>>>
>>> Do *all* nodes see each other as UP/UN?
>>>
>>>
>>>
>>> =Rob
>>>
>>>
>>>
>>>
>>>
>>
>>
>


Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Flavien Charlon
Yes, that works with consistency ALL.

I restarted one of the Cassandra instances, and it seems to be working again
now. I'm not sure what happened.
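
In case it comes back, a few server-side checks that can help tell a wedged
node apart from a healthy-looking one (a sketch; run these on each node):

    # Schema agreement across the cluster (a stuck node may report a
    # divergent schema version):
    nodetool describecluster

    # Is the native transport (CQL) actually accepting connections here?
    nodetool statusbinary

    # Gossip state as seen by this node:
    nodetool gossipinfo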

On 4 February 2016 at 23:48, Peddi, Praveen  wrote:

> Are you able to run queries using cqlsh with consistency ALL?
>
> On Feb 4, 2016, at 6:32 PM, Flavien Charlon 
> wrote:
>
> No, there was no other change. I did run "apt-get upgrade" before
> rebooting, but Cassandra has not been upgraded.
>
> On 4 February 2016 at 22:48, Bryan Cheng  wrote:
>
>> Hey Flavien!
>>
>> Did your reboot come with any other changes (schema, configuration,
>> topology, version)?
>>
>> On Thu, Feb 4, 2016 at 2:06 PM, Flavien Charlon <
>> flavien.char...@gmail.com> wrote:
>>
>>> I'm using the C# driver 2.5.2. I did try to restart the client
>>> application, but that didn't make any difference, I still get the same
>>> error after restart.
>>>
>>> On 4 February 2016 at 21:54,  wrote:
>>>
>>>> What client are you using?
>>>>
>>>>
>>>>
>>>> It is possible that the client saw nodes down and has kept them marked
>>>> that way (without retrying). Depending on the client, you may have options
>>>> to set in RetryPolicy, FailoverPolicy, etc. A bounce of the client will
>>>> probably fix the problem for now.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Sean Durity
>>>>
>>>>
>>>>
>>>> *From:* Flavien Charlon [mailto:flavien.char...@gmail.com]
>>>> *Sent:* Thursday, February 04, 2016 4:06 PM
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: "Not enough replicas available for query" after reboot
>>>>
>>>>
>>>>
>>>> Yes, all three nodes see all three nodes as UN.
>>>>
>>>>
>>>>
>>>> Also, connecting from a local Cassandra machine using cqlsh, I can run
>>>> the same query just fine (with QUORUM consistency level).
>>>>
>>>>
>>>>
>>>> On 4 February 2016 at 21:02, Robert Coli  wrote:
>>>>
>>>> On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon <
>>>> flavien.char...@gmail.com> wrote:
>>>>
>>>> My cluster was running fine. I rebooted all three nodes (one by one),
>>>> and now all nodes are back up and running. "nodetool status" shows UP for
>>>> all three nodes on all three nodes:
>>>>
>>>>
>>>>
>>>> --  AddressLoad   Tokens  OwnsHost ID
>>>> Rack
>>>>
>>>> UN  xx.xx.xx.xx331.84 GB  1   ?
>>>> d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
>>>>
>>>> UN  xx.xx.xx.xx317.2 GB   1   ?
>>>> de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
>>>>
>>>> UN  xx.xx.xx.xx  291.61 GB  1   ?
>>>> b489c970-68db-44a7-90c6-be734b41475f  RAC1
>>>>
>>>>
>>>>
>>>> However, now the client application fails to run queries on the cluster
>>>> with:
>>>>
>>>>
>>>>
>>>> Cassandra.UnavailableException: Not enough replicas available for query
>>>> at consistency Quorum (2 required but only 1 alive)
>>>>
>>>>
>>>>
>>>> Do *all* nodes see each other as UP/UN?
>>>>
>>>>
>>>>
>>>> =Rob
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>


Compaction failing to trigger

2015-01-18 Thread Flavien Charlon
Hi,

I am using Size Tiered Compaction (Cassandra 2.1.2). Minor compaction is not
triggering even though it should. See the SSTables on disk:
http://pastebin.com/PSwZ5mrT

You can see that we have 41 SSTables between 60MB and 85MB, which should
trigger compaction unless I am missing something.

Is that a bug?
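
For anyone looking into this, the usual way to inspect the compaction state
is something like the following (a sketch; keyspace/table names and the data
path are placeholders):

    # Any compactions running or queued on this node?
    nodetool compactionstats

    # Per-table statistics, including the SSTable count (placeholder names):
    nodetool cfstats my_keyspace.my_table

    # SSTable data files and their sizes on disk (placeholder data directory):
    ls -lhS /var/lib/cassandra/data/my_keyspace/my_table-*/*-Data.db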

Thanks,
Flavien


Re: Compaction failing to trigger

2015-01-18 Thread Flavien Charlon
It's set on all the tables, since I'm using the default everywhere. But for
that particular table there are 41 SSTables between 60MB and 85MB, and it
should only take 4 of them for compaction to kick in.

As this is probably a bug and, going back through the mailing list archive,
it seems it has already been reported:

   - Is there a workaround?
   - What is the JIRA ticket number?
   - Will it be fixed in 2.1.3?

Thanks
Flavien

On 19 January 2015 at 01:23, 严超  wrote:

> It seems Size Tiered Compaction is configured per table. Which table did
> you set the compaction strategy on?
> A minor compaction does not involve all the tables in a keyspace.
> Ref:
>
> http://datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_configure_compaction_t.html
>
> http://datastax.com/documentation/cql/3.1/cql/cql_reference/tabProp.html?scroll=tabProp__moreCompaction
>
> Best Regards!
>
> Chao Yan
> My twitter: Andy Yan @yanchao727 <https://twitter.com/yanchao727>
> My Weibo: http://weibo.com/herewearenow
>
> 2015-01-19 3:51 GMT+08:00 Flavien Charlon :
>
>> Hi,
>>
>> I am using Size Tier Compaction (Cassandra 2.1.2). Minor compaction is
>> not triggering even though it should. See the SSTables on disk:
>> http://pastebin.com/PSwZ5mrT
>>
>> You can see that we have 41 SSTable between 60MB and 85MB, which should
>> trigger compaction unless I am missing something.
>>
>> Is that a bug?
>>
>> Thanks,
>> Flavien
>>
>
>


Re: Compaction failing to trigger

2015-01-19 Thread Flavien Charlon
Thanks Roland. Good to know, I will try that. Do you know the JIRA ticket
number of that bug?
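
For reference, a minimal sketch of applying the setting quoted below and then
checking whether compactions start (keyspace/table names are placeholders;
'cold_reads_to_omit' is a 2.1-era option):

    # Placeholder keyspace/table -- substitute the real ones.
    cqlsh <<'EOF'
    ALTER TABLE my_keyspace.my_table
      WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                         'min_threshold': '4',
                         'max_threshold': '32',
                         'cold_reads_to_omit': 0.0};
    EOF

    # Then watch for pending/active compactions to appear:
    nodetool compactionstats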

Thanks,
Flavien

On 19 January 2015 at 06:15, Roland Etzenhammer 
wrote:

> Hi Flavien,
>
> I hit some problem with minor compations recently (just some days ago) -
> but with many more tables. In my case compactions got not triggered, you
> can check this with nodetool compactionstats.
>
> Reason for me was that those minor compactions did not get triggered since
> there were almost no reads on that tables. Setting 'cold_reads_to_omit' to
> 0 did the job for me:
>
> ALTER TABLE  WITH compaction = {'class':
> 'SizeTieredCompactionStrategy', 'min_threshold': '4', 'max_threshold':
> '32', 'cold_reads_to_omit': 0.0};
>
> Credits to Tyler and Eric for the pointers.
>
> Cheers,
> Roland
>


How do replica become out of sync

2015-01-19 Thread Flavien Charlon
Hi,

When writing to Cassandra using CL = Quorum (or anything less than ALL), is
it correct to say that Cassandra tries to write to all the replicas, but
only waits for Quorum?

If so, what can cause some replicas to become out of sync when they're all
online?

Thanks
Flavien


Re: How do replica become out of sync

2015-01-19 Thread Flavien Charlon
Thanks Andi. The reason I was asking is that even though my nodes have been
100% available and no write has been rejected, when running an incremental
repair the logs still indicate that some ranges are out of sync (which then
results in large amounts of compaction). How can this be possible?

I found this (http://stackoverflow.com/a/20928922/980059), which seems to
indicate this could be because, in parallel repairs, the Merkle trees are
computed at different times, so repair thinks some ranges are out of sync if
the data has changed in the meantime.

Is that accurate? Does sequential repair alleviate this issue (since it
uses snapshots)?
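
For what it's worth, the relevant nodetool invocations on the 2.1 line look
roughly like this (a sketch; the keyspace name is a placeholder and the flags
are worth double-checking against "nodetool help repair" for your exact
version):

    # Sequential repair (the 2.1 default): replicas are compared against
    # snapshots, which avoids the moving-target effect described above.
    nodetool repair -pr my_keyspace

    # Parallel repair: faster, but Merkle trees are built from live data on
    # all replicas at once.
    nodetool repair -par -pr my_keyspace

    # Incremental repair, the mode discussed in this thread:
    nodetool repair -inc -pr my_keyspace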

Thanks
Flavien

On 19 January 2015 at 23:57, Andreas Finke 
wrote:

>  Hi,
>
>
>  right, QUORUM means that data is written to all replicas, but the
> coordinator waits for QUORUM responses before returning to the client. If
> a replica is out of sync due to a network or internal issue, then
> consistency is ensured through:
>
>  - Hinted Handoff (automatic):
>    http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_about_hh_c.html
>  - Read Repair (automatic):
>    http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dmlClientRequestsRead.html
>  - nodetool repair (manual):
>    http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>
>  Regards
> Andi
>  --
> *From:* Flavien Charlon [flavien.char...@gmail.com]
> *Sent:* 19 January 2015 22:50
> *To:* user@cassandra.apache.org
> *Subject:* How do replica become out of sync
>
>   Hi,
>
>  When writing to Cassandra using CL = Quorum (or anything less than ALL),
> is it correct to say that Cassandra tries to write to all the replica, but
> only waits for Quorum?
>
>  If so, what can cause some replica to become out of sync when they're
> all online?
>
>  Thanks
> Flavien
>


Re: Compaction failing to trigger

2015-01-21 Thread Flavien Charlon
What version of Cassandra are you running?


2.1.2

Are they all "live"? Are there pending compactions, or exceptions regarding
> compactions in your logs?


Yes they are all live according to cfstats. There is no pending compaction
or exception in the logs.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/


This doesn't really answer my question. I asked whether this particular bug
(which I can't find in JIRA) is planned to be fixed in 2.1.3, not whether
2.1.3 would be production ready.

While we're on this topic, the version numbering is very misleading.
Versions which are not recommended for production should be very explicitly
labelled as such (beta, for example), and 2.1.0 should really be what you
now call 2.1.6.

Setting 'cold_reads_to_omit' to 0 did the job for me


Thanks, I've tried it, and it works. This should probably be made the
default IMO.

Flavien


On 20 January 2015 at 22:51, Eric Stevens  wrote:

> @Rob - he's probably referring to the thread titled "Reasons for nodes not
> compacting?" where Tyler speculates that the tables are falling below the
> cold read threshold for compaction.  He speculated it may be a bug.  At the
> same time in a different thread, Roland had a similar problem, and Tyler's
> proposed workaround seemed to work for him.
>
> On Tue, Jan 20, 2015 at 3:35 PM, Robert Coli  wrote:
>
>> On Sun, Jan 18, 2015 at 6:06 PM, Flavien Charlon <
>> flavien.char...@gmail.com> wrote:
>>
>>> It's set on all the tables, as I'm using the default for all the tables.
>>> But for that particular table there are 41 SSTables between 60MB and 85MB,
>>> it should only take 4 for the compaction to kick in.
>>>
>>
>> What version of Cassandra are you running?
>>
>> Are they all "live"? Are there pending compactions, or exceptions
>> regarding compactions in your logs?
>>
>>
>>> As this is probably a bug and going back in the mailing list archive, it
>>> seems it's already been reported:
>>>
>>
>> This is a weird statement. Are you saying that you've found it in the
>> mailing list archives? If so, why not paste the threads so those of us who
>> might remember can refer to them?
>>
>>>
>>>- Will it be fixed in 2.1.3?
>>>
>>>
>> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>
>>
>> =Rob
>>
>>
>


Re: How do replica become out of sync

2015-01-21 Thread Flavien Charlon
Quite a few; see here: http://pastebin.com/SMnprHdp. In total about 3,000
ranges across the 3 nodes.

This is with vnodes disabled. It was at least an order of magnitude worse
when we had it enabled.

Flavien

On 20 January 2015 at 22:22, Robert Coli  wrote:

> On Mon, Jan 19, 2015 at 5:44 PM, Flavien Charlon <
> flavien.char...@gmail.com> wrote:
>
>> Thanks Andi. The reason I was asking is that even though my nodes have
>> been 100% available and no write has been rejected, when running an
>> incremental repair, the logs still indicate that some ranges are out of
>> sync (which then results in large amounts of compaction), how can this be
>> possible?
>>
>
> This is most likely, as you conjecture, due to slight differences between
> nodes at the time of Merkle Tree calculation.
>
> How many rows differ?
>
> =Rob
>
>


Re: Does nodetool repair stop the node to answer requests ?

2015-01-22 Thread Flavien Charlon
I don't think you can do nodetool repair on a single-node cluster.

Still, one day or another you'll have to reboot your server, at which point
your cluster will be down. If you want high availability, you should use a
three-node cluster with RF = 3.
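
As a sketch, the replication factor is a per-keyspace setting, e.g. (keyspace
name and strategy below are illustrative placeholders):

    cqlsh <<'EOF'
    CREATE KEYSPACE my_keyspace
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
    EOF

    # An existing keyspace can be changed the same way, followed by a repair
    # so the new replicas receive the data:
    #   ALTER KEYSPACE my_keyspace
    #     WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
    #   nodetool repair my_keyspace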

On 22 January 2015 at 18:10, Robert Coli  wrote:

> On Thu, Jan 22, 2015 at 9:36 AM, SEGALIS Morgan 
> wrote:
>
>> So I wondered, does a nodetool repair make the server stop serving
>> requests, or does it just use a lot of resources but still serve requests?
>>
>
> In pathological cases, repair can cause a node to seriously degrade. If
> you are operating correctly, it just uses lots of resources but still
> serves requests.
>
> =Rob
> http://twitter.com/rcolidba
>


Re: How to deal with too many sstables

2015-02-02 Thread Flavien Charlon
Did you run incremental repair? Incremental repair is broken in 2.1 and
tends to create way too many SSTables.
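
A quick way to see how bad it is per table (a sketch; the keyspace name and
data path are placeholders, and the cfstats labels vary slightly between
versions):

    # SSTable count for every table on this node:
    nodetool cfstats | grep -E 'Table:|SSTable count'

    # Or count data files on disk for one keyspace:
    find /var/lib/cassandra/data/my_keyspace -name '*-Data.db' | wc -l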

On 2 February 2015 at 18:05, 曹志富  wrote:

> Hi all,
> I have an 18-node C* cluster running Cassandra 2.1.2. Some nodes have
> about 40,000+ SSTables.
>
> my compaction strategy is STCS.
>
> Could someone suggest a solution for dealing with this situation?
>
> Thanks.
> --
> 曹志富
> Mobile: 18611121927
> Email: caozf.zh...@gmail.com
> Weibo: http://weibo.com/boliza/
>


Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Flavien Charlon
I already experienced the same problem (hundreds of thousands of SSTables)
with Cassandra 2.1.2. It seems to appear when running an incremental repair
while there is a medium to high insert load on the cluster. The repair gets
into a bad state and starts creating way more SSTables than it should (even
when there should be nothing to repair).
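
Eric's suggestion quoted below, expressed as a rough sketch (paths, batch
size and directory layout are placeholders, and this is only an outline of
the idea, not a tested procedure):

    # Assumes the node is back up and the OpsCenter/rollups60 sstables were
    # moved aside to /backup/rollups60 beforehand. Repeat until the backup
    # directory is empty.
    cd /backup/rollups60
    for f in $(ls *-Data.db | head -2000); do
        base="${f%-Data.db}"
        # Move all components of this sstable back into the live data dir
        mv "${base}"-* /var/lib/cassandra/data/OpsCenter/rollups60*/
    done
    nodetool refresh OpsCenter rollups60
    nodetool compact OpsCenter rollups60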

On 10 February 2015 at 15:46, Eric Stevens  wrote:

> This kind of recovery is definitely not my strong point, so feedback on
> this approach would certainly be welcome.
>
> As I understand it, if you really want to keep that data, you ought to be
> able to mv it out of the way to get your node online, then move those files
> back in several thousand at a time, nodetool refresh OpsCenter rollups60 &&
> nodetool compact OpsCenter rollups60; rinse and repeat.  This should let
> you incrementally restore the data in that keyspace without putting so many
> sstables in there that it ooms your cluster again.
>
> On Tue, Feb 10, 2015 at 3:38 PM, Chris Lohfink 
> wrote:
>
>> yeah... probably just 2.1.2 things and not compactions.  Still probably
>> want to do something about the 1.6 million files though.  It may be worth
>> just mv/rm'ing the 60 sec rollup data unless you're really attached to it.
>>
>> Chris
>>
>> On Tue, Feb 10, 2015 at 4:04 PM, Paul Nickerson  wrote:
>>
>>> I was having trouble with snapshots failing while trying to repair that
>>> table (
>>> http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html). I
>>> have a repair running on it now, and it seems to be going successfully this
>>> time. I am going to wait for that to finish, then try a manual nodetool
>>> compact. If that goes successfully, then would it be safe to chalk the lack
>>> of compaction on this table in the past up to 2.1.2 problems?
>>>
>>>
>>>  ~ Paul Nickerson
>>>
>>> On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink 
>>> wrote:
>>>
 Your cluster is probably having issues with compactions (with STCS you
 should never have this many).  I would probably punt with
 OpsCenter/rollups60. Turn the node off and move all of the sstables off to
 a different directory for backup (or just rm if you really don't care about
 1 minute metrics), then turn the server back on.

 Once you get your cluster running again go back and investigate why
 compactions stopped, my guess is you hit an exception in past that killed
 your CompactionExecutor and things just built up slowly until you got to
 this point.

 Chris

 On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson 
 wrote:

> Thank you Rob. I tried a 12 GiB heap size, and still crashed out.
> There are 1,617,289 files under OpsCenter/rollups60.
>
> Once I downgraded Cassandra to 2.1.1 (apt-get install cassandra=2.1.1), I
> was able to start up Cassandra OK with the default heap size formula.
>
> Now my cluster is running multiple versions of Cassandra. I think I
> will downgrade the rest to 2.1.1.
>
>  ~ Paul Nickerson
>
> On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli 
> wrote:
>
>> On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson 
>> wrote:
>>
>>> I am getting an out of memory error when I try to start Cassandra on
>>> one of my nodes. Cassandra will run for a minute, and then exit without
>>> outputting any error in the log file. It is happening while SSTableReader
>>> is opening a couple hundred thousand things.
>>>
>> ...
>>
>>> Does anyone know how I might get Cassandra on this node running again?
>>> I'm not very familiar with correctly tuning Java memory parameters, and
>>> I'm not sure if that's the right solution in this case anyway.
>>>
>>
>> Try running 2.1.1, and/or increasing heap size beyond 8gb.
>>
>> Are there actually that many SSTables on disk?
>>
>> =Rob
>>
>>
>
>

>>>
>>
>


Re-bootstrap node after disk failure

2015-03-24 Thread Flavien Charlon
Hi,

What is the process to re-bootstrap a node after hard drive failure
(Cassandra 2.1.3)?

This is the same node as before, but the data folder has been wiped, and I
would like to re-bootstrap it from the data stored on the other nodes of the
cluster (I have RF=3).

I am not using vnodes.

Thanks
Flavien


Re: Re-bootstrap node after disk failure

2015-03-24 Thread Flavien Charlon
Is that what this command does? In that case, the documentation is
misleading, because it says: "Use this command to bring up a new data center
in an existing cluster", which is not really what I'm trying to do.

On 24 March 2015 at 21:12, Phil Yang  wrote:

> you can use "nodetool rebuild" in this node.
>
> 2015-03-25 9:20 GMT+08:00 Flavien Charlon :
>
>> Hi,
>>
>> What is the process to re-bootstrap a node after hard drive failure
>> (Cassandra 2.1.3)?
>>
>> This is the same node as previously, but the data folder has been wiped,
>> and I would like to re-bootstrap it from the data stored on the other nodes
>> of the cluster (I have RF=3).
>>
>> I am not using vnodes.
>>
>> Thanks
>> Flavien
>>
>
>
>
> --
> Thanks,
> Phil Yang
>
>