Re: missing rows while importing data using sstable loader

2016-02-05 Thread Romain Hardouin
> What is the best practice to create sstables?

When you run "nodetool flush", Cassandra writes all memtables to disk,
i.e. it produces sstables.
(You can also create sstables yourself with CQLSSTableWriter, but I don't
think that was the point of your question.)
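For illustration, here is a minimal Python sketch of the flush-then-load order, assuming both commands run on the source node that holds the sstables. The helper names `build_import_commands` and `run_import`, and every keyspace, table, path and host value, are hypothetical. It relies only on `nodetool flush <keyspace> <table>` and `sstableloader -d <hosts> <dir>`. The sketch assumes the load directory's path ends in `<keyspace>/<table>`, because sstableloader derives the keyspace and table from the path; check the sstableloader docs for your version. Copying the flushed sstables into that directory is not shown.

```python
import subprocess
from typing import List, Sequence


def build_import_commands(keyspace: str, table: str, table_dir: str,
                          target_hosts: Sequence[str]) -> List[List[str]]:
    """Return, in order, the commands for a flush-then-load import.

    Hypothetical helper: all arguments are placeholders for your own values.
    table_dir is assumed to hold this table's sstables and to have a path
    ending in <keyspace>/<table>. (Copying sstables there is not shown.)
    """
    return [
        # 1. Write the table's memtables to disk as sstables, so recent
        #    writes are not left behind in memory.
        ["nodetool", "flush", keyspace, table],
        # 2. Stream the sstables into the target cluster; -d takes a
        #    comma-separated list of initial contact hosts.
        ["sstableloader", "-d", ",".join(target_hosts), table_dir],
    ]


def run_import(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_import(build_import_commands(
        "my_ks", "my_table", "/tmp/load/my_ks/my_table", ["10.0.0.1", "10.0.0.2"]))
```

The dry run only prints `nodetool flush my_ks my_table` followed by `sstableloader -d 10.0.0.1,10.0.0.2 /tmp/load/my_ks/my_table`. The flush has to happen on every source node whose sstables you intend to load.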


Re: missing rows while importing data using sstable loader

2016-02-05 Thread Victor Chen
Arindam,

What can you share regarding the source from which you are importing data?
Is it a separate Cassandra cluster? If so, how many nodes and datacenters?
What is the RF (replication factor) of the source cluster? How certain are you
that the rows actually exist in the set of sstables you are passing to
sstableloader? I ask because, hypothetically, if you load sstables from a
single node of a 3-node, single-DC source cluster with RF=2, you won't be
importing the full set of data that existed in the source cluster. In that
case you'd need to load sstables from at least two nodes to import the full
data set, because of the RF. (If RF were 3, a single node would be enough;
if RF=1, you'd need the sstables from all three nodes.)
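To make the replication arithmetic concrete, here is a small Python model. It assumes a single datacenter, one token per node and SimpleStrategy-style placement, where each range is also stored on the next RF-1 nodes clockwise. Clusters with vnodes or NetworkTopologyStrategy place replicas differently, so treat this as a sketch. The function names are hypothetical.

```python
from itertools import combinations
from typing import Set


def replicas(n_nodes: int, rf: int, primary: int) -> Set[int]:
    """Nodes holding the range whose primary owner is `primary`: the owner plus
    the next rf-1 nodes clockwise (SimpleStrategy-style, one token per node)."""
    return {(primary + i) % n_nodes for i in range(rf)}


def covers_all_data(loaded: Set[int], n_nodes: int, rf: int) -> bool:
    """True if every range has at least one replica among the loaded nodes."""
    return all(replicas(n_nodes, rf, p) & loaded for p in range(n_nodes))


def min_nodes_needed(n_nodes: int, rf: int) -> int:
    """Smallest k such that loading ANY k nodes' sstables covers all ranges."""
    for k in range(1, n_nodes + 1):
        if all(covers_all_data(set(c), n_nodes, rf)
               for c in combinations(range(n_nodes), k)):
            return k
    return n_nodes


if __name__ == "__main__":
    for rf in (1, 2, 3):
        print(f"3 nodes, RF={rf}: load sstables from {min_nodes_needed(3, rf)} node(s)")
```

For 3 nodes, the model gives 3, 2 and 1 nodes for RF=1, 2 and 3, matching the numbers above. The general pattern is N - RF + 1. It holds for any placement that puts each range on RF distinct nodes, because a set of nodes that leaves out at most RF - 1 nodes cannot miss every replica of a range.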

On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> Hi,
>
> I am importing data to a new cassandra cluster using sstableloader. The
> sstableloader runs without any warning or error. But I am missing around
> 1000 rows.
>
> Any feedback will be highly appreciated.
>
> Kind Regards,
> Arindam Choudhury
>


Re: missing rows while importing data using sstable loader

2016-02-05 Thread Jack Krupansky
I sent a message to DataStax Docs to add this nodetool flush suggestion to
the doc for sstableloader.

-- Jack Krupansky

On Fri, Feb 5, 2016 at 3:35 AM, Romain Hardouin  wrote:

> > What is the best practice to create sstables?
>
> When you run "nodetool flush", Cassandra writes all memtables to disk,
> i.e. it produces sstables.
> (You can also create sstables yourself with CQLSSTableWriter, but I
> don't think that was the point of your question.)
>


Re: Restart Cassandra automatically

2016-02-05 Thread Robert Coli
On Thu, Feb 4, 2016 at 8:26 PM, Debraj Manna 
wrote:

> What is the best way to keep cassandra running? My requirement is if for
> some reason cassandra stops then it should get started automatically.
>
I recommend against this mode of operation. When automatically restarting,
you have no idea how long Cassandra has been stopped and for what reason.
In some cases, you really do not want it to start up and attempt to
participate in whatever cluster it was formerly participating in.

I understand this creates a support overhead, especially with very large
clusters, but it's difficult for me to accept the premise that net
operational safety will be improved by naively restarting nodes.

=Rob
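One way to see Rob's point is that what a node should do on restart depends on how long it was down, and a blind restart loop does not check that. Here is a minimal Python sketch, assuming the stock defaults `max_hint_window_in_ms` = 10800000 (3 hours) and `gc_grace_seconds` = 864000 (10 days). `restart_advice` is a hypothetical helper. Both settings can be changed, and `gc_grace_seconds` is set per table.

```python
from datetime import datetime, timedelta

# Stock defaults; your cluster may override them (gc_grace_seconds is per table).
MAX_HINT_WINDOW = timedelta(milliseconds=10_800_000)  # max_hint_window_in_ms: 3 hours
GC_GRACE = timedelta(seconds=864_000)                  # gc_grace_seconds: 10 days


def restart_advice(stopped_at: datetime, now: datetime,
                   hint_window: timedelta = MAX_HINT_WINDOW,
                   gc_grace: timedelta = GC_GRACE) -> str:
    """Hypothetical helper: what an operator should consider before restarting."""
    downtime = now - stopped_at
    if downtime > gc_grace:
        # Other replicas may already have purged tombstones for data this node
        # still holds, so rejoining could resurrect deleted rows.
        return "do not rejoin: wipe the node and re-bootstrap or replace it"
    if downtime > hint_window:
        # Coordinators stopped storing hints for this node after the window,
        # so it missed writes that hinted handoff will not replay.
        return "start, then run nodetool repair"
    # Within the hint window, stored hints should usually replay missed writes.
    return "start; hints should cover the missed writes"


if __name__ == "__main__":
    stopped = datetime(2016, 2, 4, 20, 26)
    print(restart_advice(stopped, stopped + timedelta(hours=5)))
```

The sketch ignores the reason for the stop (disk failure, OOM, operator action), which is the other unknown Rob mentions.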


Re: Any tips on how to track down why Cassandra won't cluster?

2016-02-05 Thread Richard L. Burton III
Thanks everyone. The issue was a missing firewall entry in the security
groups that prevented Cassandra from clustering.

Alain, initially I was using the private IP. I went to IRC and someone
suggested using the public IP, even though I wasn't doing multi-region
clustering.

At some point I should write a sanity-check tool that checks all the
configuration options, validates ports, etc., to help automate validation.
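Here is a minimal Python sketch of the port-validation part of such a tool. It only attempts TCP connections using the standard library. The port list assumes Cassandra's default ports: 7000 inter-node, 7001 inter-node SSL, 7199 JMX, 9042 CQL and 9160 Thrift. `check_ports` is a hypothetical name. Run it from every node against every other node, because each node's security group may admit traffic from some peers and not others. Note also that nodes gossip over 7000 (or 7001 with SSL), not 9042.

```python
import socket
from typing import Dict, Iterable

# Cassandra's default ports (2.x era); change them if cassandra.yaml or
# cassandra-env.sh override the defaults.
DEFAULT_PORTS = {
    7000: "inter-node gossip/storage (storage_port)",
    7001: "inter-node over SSL (ssl_storage_port)",
    7199: "JMX, used by nodetool",
    9042: "CQL native transport (native_transport_port)",
    9160: "Thrift clients (rpc_port)",
}


def check_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Return {port: reachable} by attempting a TCP connection to each port."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, DNS failure, ...
            results[port] = False
    return results


if __name__ == "__main__":
    import sys
    for peer in sys.argv[1:]:  # e.g. python check_ports.py 10.0.0.2 10.0.0.3
        for port, ok in check_ports(peer, DEFAULT_PORTS).items():
            print(f"{peer}:{port} {'open' if ok else 'CLOSED'}  {DEFAULT_PORTS[port]}")
```

A successful connection only shows that the port is reachable. It does not confirm that the Cassandra settings behind it (cluster name, seeds, snitch) are consistent.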

Cheers,

On Thu, Feb 4, 2016 at 6:37 PM, Alain RODRIGUEZ  wrote:

> Hi Richard,
>
> I think you just can't use Ec2Snitch with public IPs.
>
> See
> https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchEC2_t.html
>
> Precisely "Because private IPs are used, this snitch does not work across
> multiple regions"
>
> 54.*.*.* looks like a public one.
>
> You can stick with the private IPs (with the limitation above, although
> you can work around it with a VPN tunnel across regions). In this case, set
> listen_address to the private IP and comment out broadcast_address. You can
> also use Ec2MultiRegionSnitch, but then be careful with the broadcast_address
> (public IP) and listen_address (private IP) settings in the cassandra.yaml
> files, and with port management in the AWS console.
>
> Also, as your nodes have already bootstrapped, you might have to clean the
> Cassandra data folder, usually something like rm -rf /var/lib/cassandra/*
> *Warning:* you will lose all the data, but this "cluster" doesn't look
> like a running cluster; only you can know :-).
>
> Any suggestions on how to track down what might trigger this problem?
>
>
> This kind of issue might be due to:
>
> - Different cluster names
> - *Bad configuration* (IPs, Snitch + configuration files, ...) <--
> probably your case
> - Ports (firewall, AWS rules...) <-- telnet might be useful here
> - Seeds being different on the nodes <-- make sure that your seeds are
> the same on every node
>
> Hope this will be enough to get you out of this,
>
> C*heers,
> -
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
>
>
> 2016-02-04 16:35 GMT+00:00 Victor Chen :
>
>> Along the lines of what Ben and Bryan suggested, what are you using to
>> verify ports are open? If you do something like:
>>
>> node1$ nc -zv node2 9042
>> node2$ nc -zv node1 9042
>>
>> does it succeed from both nodes?
>> Does the first node 'know' that it is a seed, i.e. is the first node
>> listed in its own seeds list?
>> What does the system.log show as both nodes are spun up?
>>
>>
>> On Wed, Feb 3, 2016 at 7:20 PM, Bryan Cheng 
>> wrote:
>>
>>>
>>> On Wed, 3 Feb 2016 at 11:49 Richard L. Burton III
>>> wrote:
>>>
>>>> Any suggestions on how to track down what might trigger this problem?
>>>> I'm not receiving any exceptions.
>>>>
>>>
>>> You're not getting "Unable to gossip with any seeds" on the second node?
>>> What does nodetool status show on both machines?
>>>
>>
>>
>


-- 
-Richard L. Burton III
@rburton
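The checklist in Alain's reply covers cluster names, seeds, and listen/broadcast addresses per snitch, and it could also be scripted. Here is a minimal Python sketch. It assumes each node's cassandra.yaml has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and uses the stock `seed_provider` layout. `check_cluster` and `is_private` are hypothetical helpers that encode only the rules stated in that reply.

```python
import ipaddress
from typing import Dict, List, Optional


def seeds_of(conf: dict) -> List[str]:
    # Stock cassandra.yaml layout:
    # seed_provider: [{class_name: ..., parameters: [{seeds: "ip1,ip2"}]}]
    raw = conf["seed_provider"][0]["parameters"][0]["seeds"]
    return sorted(s.strip() for s in raw.split(",") if s.strip())


def is_private(addr: Optional[str]) -> Optional[bool]:
    # None when the value is unset or not a literal IP (e.g. a hostname).
    if not addr:
        return None
    try:
        return ipaddress.ip_address(addr).is_private
    except ValueError:
        return None


def check_cluster(confs: Dict[str, dict]) -> List[str]:
    """confs maps node name -> parsed cassandra.yaml; returns problem messages."""
    problems: List[str] = []
    names = {node: c.get("cluster_name") for node, c in confs.items()}
    if len(set(names.values())) > 1:
        problems.append(f"cluster_name differs: {names}")
    seeds = {node: seeds_of(c) for node, c in confs.items()}
    if len({tuple(s) for s in seeds.values()}) > 1:
        problems.append(f"seed lists differ: {seeds}")
    for node, c in confs.items():
        snitch = c.get("endpoint_snitch", "")
        listen = is_private(c.get("listen_address"))
        bcast = is_private(c.get("broadcast_address"))
        if snitch.endswith("Ec2Snitch"):
            # Ec2Snitch: private listen_address, broadcast_address commented out.
            if listen is False:
                problems.append(f"{node}: Ec2Snitch needs a private listen_address")
            if bcast is False:
                problems.append(f"{node}: Ec2Snitch with a public broadcast_address")
        elif snitch.endswith("Ec2MultiRegionSnitch"):
            # Ec2MultiRegionSnitch: private listen_address, public broadcast_address.
            if listen is False:
                problems.append(f"{node}: listen_address should be the private IP")
            if bcast is not False:
                problems.append(f"{node}: broadcast_address should be the public IP")
    return problems
```

An empty result only means that these few rules pass. Firewall and security group rules still need a separate check, which was the actual cause in this thread.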


Questions about Counter updates.

2016-02-05 Thread Dikang Gu
Hi there,

I have a cluster with a lot of counter updates. When I run
`nodetool tpstats`, I see a lot of completed MutationStage tasks but none
in CounterMutationStage. Is this normal, or is it something I should
worry about?

I'm using Cassandra 2.1.8 and the C driver.

Pool Name               Active   Pending   Completed   Blocked   All time blocked
CounterMutationStage         0         0           0         0                  0
ReadStage                    0         0          25         0                  0
RequestResponseStage         0         0          21         0                  0
MutationStage                0         0    19284070         0                  0

Thanks

-- 
Dikang


Cassandra + OpsWorks

2016-02-05 Thread Richard L. Burton III
Although I have Chef + Knife Solo setting up my servers, I'm very curious
whether anyone is using Cassandra + OpsWorks.

I ask because it seems like a very good solution for setting up servers in
AWS and scaling them out.

-- 
-Richard L. Burton III
@rburton


Re: Cassandra + OpsWorks

2016-02-05 Thread Will Hayworth
I am! :)

I've made some changes to the community Chef cookbook to make things work
well (like seed search): https://github.com/wsh/cassandra-chef-cookbook.
(Note that it also has some default parameters and other settings that may
only apply to my cluster, YMMV, etc.) My cluster is tiny (12 nodes
across three AZs in two regions--unfortunately, OpsWorks stacks are
per-region, so I have two :/), but so far so good.

___
Will Hayworth
Developer, Engagement Engine
My pronoun is "they". 



On Fri, Feb 5, 2016 at 12:18 PM, Richard L. Burton III 
wrote:

> Although I have Chef + Knife Solo setting up my servers, I'm very curious
> whether anyone is using Cassandra + OpsWorks.
>
> I ask because it seems like a very good solution for setting up servers in
> AWS and scaling them out.
>
> --
> -Richard L. Burton III
> @rburton
>