Cassandra on Red Hat 6.3

2013-02-18 Thread amulya rattan
I followed the step-by-step instructions for installing Cassandra on Red Hat
Linux Server 6.3 from the datastax site, without much success. Apparently
it installs fine, but starting the cassandra service does nothing (no ports are
bound, so opscenter/cli doesn't work). When I check the service's status, it
shows "Cassandra dead but pid file exists". When I try launching Cassandra
from /usr/sbin, it throws "Error opening zip file or JAR manifest missing :
/lib/jamm-0.2.5.jar" and stops, so clearly that's why the service isn't running.

While I investigate it further, I thought it'd be worthwhile to put this on
the list and see if anybody else has seen a similar issue. I must point out that
this is a fresh machine with a fresh Cassandra installation, so no conflicts
with any previous installations are possible. Has anybody else come across
something similar?
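
(For whoever hits this next: the bare "/lib/jamm-0.2.5.jar" in the error suggests
CASSANDRA_HOME is empty when the startup script builds the JVM options. A sketch
of the relevant cassandra-env.sh line as I understand the packaging; verify
against the actual script:)

    # cassandra-env.sh builds the memory-meter agent path roughly like this;
    # if CASSANDRA_HOME is unset, it degenerates to /lib/jamm-0.2.5.jar
    JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"
    # quick sanity check before starting the service:
    echo "CASSANDRA_HOME=$CASSANDRA_HOME"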

~Amulya


Re: NPE in running "ClientOnlyExample"

2013-02-18 Thread Abhijit Chanda
I hope you have already gone through this link:
https://github.com/zznate/hector-examples. If not, I suggest you go
through it; you can also refer to
http://hector-client.github.com/hector/build/html/documentation.html.
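
For reference, creating the keyspace and column family through Hector looks
roughly like this (a minimal sketch assuming Hector 1.x against a local node on
9160; "MyKeyspace" and "MyCF" are placeholder names):

    import java.util.Arrays;
    import me.prettyprint.cassandra.service.ThriftKsDef;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
    import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
    import me.prettyprint.hector.api.factory.HFactory;

    public class CreateSchema {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
            // define the column family first, then wrap it in a keyspace definition
            ColumnFamilyDefinition cfDef =
                    HFactory.createColumnFamilyDefinition("MyKeyspace", "MyCF");
            KeyspaceDefinition ksDef = HFactory.createKeyspaceDefinition(
                    "MyKeyspace", ThriftKsDef.DEF_STRATEGY_CLASS, 1, Arrays.asList(cfDef));
            // true = block until the schema change has propagated to the cluster
            cluster.addKeyspace(ksDef, true);
        }
    }

Doing the DDL through a client library like this avoids the fat-client path
from the example program entirely.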


Best Regards,


On Mon, Feb 18, 2013 at 12:15 AM, Jain Rahul  wrote:

> Thanks Edward,
>
> My bad. I was confused, as it does seem to create the keyspace too, as I
> understand it (although I'm not sure):
>
> List<CfDef> cfDefList = new ArrayList<CfDef>();
> CfDef columnFamily = new CfDef(KEYSPACE, COLUMN_FAMILY);
> cfDefList.add(columnFamily);
> try
> {
>     client.system_add_keyspace(new KsDef(KEYSPACE,
>         "org.apache.cassandra.locator.SimpleStrategy", 1, cfDefList));
>     int magnitude = client.describe_ring(KEYSPACE).size();
>
> Can I request you to please point me to some examples with which I can start?
> I tried to look at some examples from hector, but they seem to be in line with
> Cassandra's 1.1 version.
>
> Regards,
> Rahul
>
>
> -Original Message-
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: 17 February 2013 21:49
> To: user@cassandra.apache.org
> Subject: Re: NPE in running "ClientOnlyExample"
>
> This is a bad example to follow. This is the internal client the Cassandra
> nodes use to talk to each other (the fat client); usually you do not use this
> unless you want to write some embedded code on the Cassandra server.
>
> Typically clients use thrift/native transport. But you are likely getting
> the error you are seeing because the keyspace or column family is not
> created yet.
>
> On Sat, Feb 16, 2013 at 11:41 PM, Jain Rahul 
> wrote:
> > Hi All,
> >
> >
> >
> > I am a newbie to Cassandra and am trying to run an example program
> > "ClientOnlyExample" taken from
> > https://raw.github.com/apache/cassandra/cassandra-1.2/examples/client_only/src/ClientOnlyExample.java.
> > But while executing the program it gives me a null pointer exception.
> > Can you guys please help me figure out what I am missing?
> >
> >
> >
> > I am using Cassandra 1.2.1 version. I have pasted the logs at
> > http://pastebin.com/pmADWCYe
> >
> >
> >
> > Exception in thread "main" java.lang.NullPointerException
> >
> >   at
> > org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:71)
> >
> >   at
> > org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:66)
> >
> >   at
> > org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:61)
> >
> >   at
> > org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:56)
> >
> >   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:183)
> >
> >   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:204)
> >
> >   at ClientOnlyExample.testWriting(ClientOnlyExample.java:78)
> >
> >   at ClientOnlyExample.main(ClientOnlyExample.java:135)
> >
> >
> >
> > Regards,
> >
> > Rahul
> >
>



-- 
Abhijit Chanda
+91-974395


Re: cassandra vs. mongodb quick question

2013-02-18 Thread Vegard Berget
 

Just out of curiosity:

When using compression, does this affect things one way or another?
Is 300G the (compressed) SSTable size, or the total size of the data?

.vegard,
- Original Message -
From: user@cassandra.apache.org
Sent: Mon, 18 Feb 2013 08:41:25 +1300
Subject: Re: cassandra vs. mongodb quick question

If you have spinning disk and 1G networking and no virtual nodes, I
would still say 300G to 500G is a soft limit.
If you are using virtual nodes, SSD, a JBOD disk configuration or
faster networking you may go higher.
The limiting factors are the time it takes to repair, the time it
takes to replace a node, and the memory considerations for 100's of
millions of rows. If the performance of those operations is
acceptable to you, then go crazy.
Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
 On 16/02/2013, at 9:05 AM, "Hiller, Dean"  wrote: 
So I found out mongodb varies their node size from 1T to 42T per node
depending on the profile.  So if I was going to be writing a lot but
rarely changing rows, could I also use cassandra with a per node size
of +20T or is that not advisable?

Thanks,
Dean

 




Firewall logging to Cassandra

2013-02-18 Thread Sloot, Hans-Peter
Hi,

Is anyone using Cassandra to store firewall logs?
If so, any pointers to share?

Regards  Hans-Peter



Hans-Peter Sloot
Oracle Technical Expert
Oracle 10g/11g Certified Master
Global Fact ATS NL
T + 31 6 303 83 499







RE: Mutation dropped

2013-02-18 Thread Kanwar Sangha
Thanks Aaron.

Does the rpc_timeout not control the client timeout? Is there any param which
is configurable to control the replication timeout between nodes? Or is the same
param used to control that, since the other node is also like a client?



From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 17 February 2013 11:26
To: user@cassandra.apache.org
Subject: Re: Mutation dropped

You are hitting the maximum throughput on the cluster.

The messages are dropped because the node fails to start processing them before
rpc_timeout.

However the request is still a success because the client-requested CL was
achieved.

Testing with RF 2 and CL 1 really just tests the disks on one local machine.
Both nodes replicate each row, and writes are sent to each replica, so the only
thing the client is waiting on is the local node to write to its commit log.

Testing with (and running in prod) RF 3 and CL QUORUM is a more real world
scenario.
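
For reference, the knob lives in cassandra.yaml; names and defaults here are
from memory, so double-check your version (1.2 split the single setting into
per-operation timeouts):

    # 1.1.x and earlier: one setting
    rpc_timeout_in_ms: 10000
    # 1.2.x: per-operation settings
    read_request_timeout_in_ms: 10000
    write_request_timeout_in_ms: 10000

Raising it only hides the symptom, though; the dropped mutations are telling
you the nodes cannot keep up.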

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/02/2013, at 9:42 AM, Kanwar Sangha <kan...@mavenir.com> wrote:


Hi - Is there a parameter which can be tuned to prevent the mutations from 
being dropped ? Is this logic correct ?

Node A and B with RF=2, CL =1. Load balanced between the two.

--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.x.x.x   746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
UN  10.x.x.x   880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1

Once we hit a very high TPS (around 50k/sec of inserts), the nodes start
falling behind and we see the mutation dropped messages. But there are no
failures on the client. Does that mean the other node is not able to persist
the replicated data? Is there some timeout associated with replicated data
persistence?

Thanks,
Kanwar







From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 14 February 2013 09:08
To: user@cassandra.apache.org
Subject: Mutation dropped

Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing a
lot of mutation dropped messages. I understand that this is due to the replica
not being written to the other node? RF = 2, CL = 1.

From the wiki -
For MUTATION messages this means that the mutation was not applied to all 
replicas it was sent to. The inconsistency will be repaired by Read Repair or 
Anti Entropy Repair

Thanks,
Kanwar




Re: Nodetool doesn't show two nodes

2013-02-18 Thread Boris Solovyov
I think it is actually more of a problem that there were no error messages
or other indication of what went wrong in the setup where the nodes
couldn't contact each other. Should I file an issue report on this? Clearly
Cassandra must have tried to contact some IP on port 7000 and failed. Why
didn't it log? That would have saved me about 10 hours :-P


On Sun, Feb 17, 2013 at 11:54 PM, Jared Biel
wrote:

> This is something that I found while using the multi-region snitch -
> it uses public IPs for communication. See the original ticket here:
> https://issues.apache.org/jira/browse/CASSANDRA-2452. It'd be nice if
> it used the private IPs to communicate with nodes that are in the same
> region as itself, but I do not believe this is the case. Be aware that
> you will be charged for external data transfer even for nodes in the
> same region, because the traffic will not fall under their free (for
> same AZ) or reduced (for cross-AZ, same region) tiers.
>
> If you continue using this snitch in the mean time, it is not
> necessary (or recommended) to have those ports open to 0.0.0.0/0.
> You'll simply need to add the public IPs of your C* servers to the
> correct security group(s) to allow access.
>
> There's something else that's a little strange about the EC2 snitches:
> "us-east-1" is (incorrectly) represented as the datacenter "us-east".
> Other regions are recognized and named properly (us-west-2 for
> example) This is kind-of covered in the ticket here:
> https://issues.apache.org/jira/browse/CASSANDRA-4026 I wish it could
> be fixed properly.
>
> Good luck!
>
>
> On 17 February 2013 16:16, Boris Solovyov 
> wrote:
> > OK. I got it. I realized that storage_port wasn't actually open between the
> > nodes, because it is using the public IP. (I did find this information in
> > the docs, after looking more... it is in the section on "Types of snitches."
> > It explains everything I found by trial and error.)
> >
> > After opening this port 7000 to all IP addresses, the cluster boots OK
> and
> > the two nodes see each other. Now I have the happy result. But my nodes
> are
> > wide open to the entire internet on port 7000. This is a serious problem.
> > This obviously can't be put into production.
> >
> > I definitely need cross-continent deployment. Single AZ or single region
> > deployment is not going to be enough. How do people solve this in
> practice?
>
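
P.S. For anyone else locking this down: with the EC2 API tools of this era,
restricting storage_port to just a peer's public IP is one line per peer
(security group name and address below are invented):

    # allow Cassandra's storage_port only from the other node's public IP
    ec2-authorize my-cassandra-sg -P tcp -p 7000 -s 203.0.113.10/32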


Re: Cassandra Geospatial Search

2013-02-18 Thread Hiller, Dean
We have not quite gotten to it yet; work has been driven by paying customers
at this time, and the one customer who wants it keeps pushing it out for
other things they want.

Thanks,
Dean

On 2/15/13 4:16 PM, "Drew Kutcharian"  wrote:

>Hey Dean, do you guys have any thoughts on how to implement it yet?
>
>On Feb 15, 2013, at 6:18 AM, "Hiller, Dean"  wrote:
>
>> Yes, this is in PlayOrm's roadmap as well but not there yet.
>> 
>> Dean
>> 
>> On 2/13/13 6:42 PM, "Drew Kutcharian"  wrote:
>> 
>>> Hi Guys,
>>> 
>>> Has anyone on this mailing list tried to build a bounding box style
>>>(get
>>> the records inside a known bounding box) geospatial search? I've been
>>> researching this a bit and seems like the only attempt at this was by
>>> SimpleGeo guys, but there isn't much public info out there on how they
>>> did it besides a video.
>>> 
>>> -- Drew
>>> 
>> 
>
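
P.S. For anyone researching the same thing, one common starting point (a sketch
of the general technique, not what SimpleGeo shipped) is to key rows by geohash,
so a bounding box becomes a handful of contiguous key-range scans plus
client-side filtering at the edges. A minimal encoder:

    // Standard geohash encoding (base32 alphabet, interleaved lon/lat bits).
    public final class Geohash {
        private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

        public static String encode(double lat, double lon, int precision) {
            double[] latRange = {-90, 90}, lonRange = {-180, 180};
            StringBuilder hash = new StringBuilder();
            boolean evenBit = true; // geohash starts with a longitude bit
            int bit = 0, ch = 0;
            while (hash.length() < precision) {
                double[] range = evenBit ? lonRange : latRange;
                double value = evenBit ? lon : lat;
                double mid = (range[0] + range[1]) / 2;
                ch <<= 1;
                if (value >= mid) { ch |= 1; range[0] = mid; } else { range[1] = mid; }
                evenBit = !evenBit;
                if (++bit == 5) { hash.append(BASE32.charAt(ch)); bit = 0; ch = 0; }
            }
            return hash.toString();
        }

        public static void main(String[] args) {
            // nearby points share a prefix, so rows keyed by geohash cluster together
            System.out.println(encode(40.7484, -73.9857, 7));
        }
    }

Note the prefix-scan trick needs an order-preserving partitioner or a separate
index row per prefix, which is its own can of worms.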



Re: Nodetool doesn't show two nodes

2013-02-18 Thread Edward Capriolo
These issues are more cloud specific than they are cassandra specific.
Cloud executives tell me in white papers that cloud is awesome and you
can fire all your sysadmins and network people and save money.

This is what happens when you believe cloud executives and their white
papers: you spend 10+ hours troubleshooting cloud networking problems.

On Mon, Feb 18, 2013 at 9:12 AM, Boris Solovyov
 wrote:
> I think it is actually more of a problem that there were no error messages
> or other indication of what went wrong in the setup where the nodes couldn't
> contact each other. Should I file an issue report on this? Clearly Cassandra
> must have tried to contact some IP on port 7000 and failed. Why didn't it log?
> That would have saved me about 10 hours :-P
>
>
> On Sun, Feb 17, 2013 at 11:54 PM, Jared Biel 
> wrote:
>>
>> This is something that I found while using the multi-region snitch -
>> it uses public IPs for communication. See the original ticket here:
>> https://issues.apache.org/jira/browse/CASSANDRA-2452. It'd be nice if
>> it used the private IPs to communicate with nodes that are in the same
>> region as itself, but I do not believe this is the case. Be aware that
>> you will be charged for external data transfer even for nodes in the
>> same region, because the traffic will not fall under their free (for
>> same AZ) or reduced (for cross-AZ, same region) tiers.
>>
>> If you continue using this snitch in the mean time, it is not
>> necessary (or recommended) to have those ports open to 0.0.0.0/0.
>> You'll simply need to add the public IPs of your C* servers to the
>> correct security group(s) to allow access.
>>
>> There's something else that's a little strange about the EC2 snitches:
>> "us-east-1" is (incorrectly) represented as the datacenter "us-east".
>> Other regions are recognized and named properly (us-west-2 for
>> example) This is kind-of covered in the ticket here:
>> https://issues.apache.org/jira/browse/CASSANDRA-4026 I wish it could
>> be fixed properly.
>>
>> Good luck!
>>
>>
>> On 17 February 2013 16:16, Boris Solovyov 
>> wrote:
>> > OK. I got it. I realized that storage_port wasn't actually open between
>> > the
>> > nodes, because it is using the public IP. (I did find this information
>> > in
> > the docs, after looking more... it is in the section on "Types of snitches."
> > It explains everything I found by trial and error.)
>> >
>> > After opening this port 7000 to all IP addresses, the cluster boots OK
>> > and
>> > the two nodes see each other. Now I have the happy result. But my nodes
>> > are
>> > wide open to the entire internet on port 7000. This is a serious
>> > problem.
>> > This obviously can't be put into production.
>> >
>> > I definitely need cross-continent deployment. Single AZ or single region
>> > deployment is not going to be enough. How do people solve this in
>> > practice?
>
>


Re: Nodetool doesn't show two nodes

2013-02-18 Thread Boris Solovyov
I don't think it is cloud at all, and I am no newcomer to sysadmin work (though
I am relatively new to the AWS cloud). The mistake is clearly mine, but also
clearly easy to make -- so I assume a lot of other people must make it too.
But the logs don't provide any guidance. Or is this another mistake I made,
which prevents the logs from helping me?

Another thing I saw, by the way: the docs say you can clear out your
Cassandra data with "rm -rf /var/lib/cassandra/*", but actually it should be
/var/lib/cassandra/*/*. In the RHEL setup, /var/lib/cassandra has root
ownership, so if you remove the 3 subdirectories under it, Cassandra can't
create the directories it needs to run. (I guess the docs assume you are
starting Cassandra by executing the binary, running it as root.) In any
case, if you then start the service and the directories don't exist,
Cassandra dies. But what is the log message? Something totally obscure that
has nothing to do with "my data directory doesn't exist and I can't make
it" :-D
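
For the archives, a reset that keeps the package-owned directories and their
ownership intact looks like this (paths per the RHEL package layout; verify on
your own box):

    sudo service cassandra stop
    sudo rm -rf /var/lib/cassandra/data/* \
                /var/lib/cassandra/commitlog/* \
                /var/lib/cassandra/saved_caches/*
    sudo service cassandra start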

Don't get me wrong, I am not complaining; so far things are going as well
as I expect from learning complex new opensource software! But just to
point out: although not all software can be perfectly documented and have
perfect log messages, there are probably 5% of problems/mistakes that are made
95% of the time, and "firewall not open on port 7000" or "data directory not
there" seem to be among those, so it's a good idea to have helpful, specific
log messages for them. Now, that is my opinion; what is yours, should I file
feature requests? As a newcomer to Cassandra I don't want to just walk in like
a bull in a china shop and start telling everyone what is wrong and what they
should fix. To MAKE ME HAPPY :-D


On Mon, Feb 18, 2013 at 9:44 AM, Edward Capriolo wrote:

> These issues are more cloud specific than they are cassandra specific.
> Cloud executives tell me in white papers that cloud is awesome and you
> can fire all your sysadmins and network people and save money.
>
> This is what happens when you believe cloud executives and their white
> papers: you spend 10+ hours troubleshooting cloud networking problems.
>
> On Mon, Feb 18, 2013 at 9:12 AM, Boris Solovyov
>  wrote:
> > I think it is actually more of a problem that there were no error
> messages
> > or other indication of what went wrong in the setup where the nodes
> couldn't
> > contact. Should I file issue report on this? Clearly Cassandra must have
> > tried to contact some IP on port 7000 and failed. Why didn't it log? That
> > would have saved me about 10 hours :-P
> >
> >
> > On Sun, Feb 17, 2013 at 11:54 PM, Jared Biel <
> jared.b...@bolderthinking.com>
> > wrote:
> >>
> >> This is something that I found while using the multi-region snitch -
> >> it uses public IPs for communication. See the original ticket here:
> >> https://issues.apache.org/jira/browse/CASSANDRA-2452. It'd be nice if
> >> it used the private IPs to communicate with nodes that are in the same
> >> region as itself, but I do not believe this is the case. Be aware that
> >> you will be charged for external data transfer even for nodes in the
> >> same region because the traffic will not fall under their free (for
> >> same AZ) or reduced (for cross-AZ, same region) tiers.
> >>
> >> If you continue using this snitch in the mean time, it is not
> >> necessary (or recommended) to have those ports open to 0.0.0.0/0.
> >> You'll simply need to add the public IPs of your C* servers to the
> >> correct security group(s) to allow access.
> >>
> >> There's something else that's a little strange about the EC2 snitches:
> >> "us-east-1" is (incorrectly) represented as the datacenter "us-east".
> >> Other regions are recognized and named properly (us-west-2 for
> >> example) This is kind-of covered in the ticket here:
> >> https://issues.apache.org/jira/browse/CASSANDRA-4026 I wish it could
> >> be fixed properly.
> >>
> >> Good luck!
> >>
> >>
> >> On 17 February 2013 16:16, Boris Solovyov 
> >> wrote:
> >> > OK. I got it. I realized that storage_port wasn't actually open
> between
> >> > the
> >> > nodes, because it is using the public IP. (I did find this information
> >> > in
> >> > the docs, after looking more... it is in section on "Types of
> snitches."
> >> > It
> >> > explains everything I found by trial and error.)
> >> >
> >> > After opening this port 7000 to all IP addresses, the cluster boots OK
> >> > and
> >> > the two nodes see each other. Now I have the happy result. But my
> nodes
> >> > are
> >> > wide open to the entire internet on port 7000. This is a serious
> >> > problem.
> >> > This obviously can't be put into production.
> >> >
> >> > I definitely need cross-continent deployment. Single AZ or single
> region
> >> > deployment is not going to be enough. How do people solve this in
> >> > practice?
> >
> >
>


Re: Both nodes own 100% of cluster

2013-02-18 Thread Víctor Hugo Oliveira Molinar
Why have you assigned a generated token to both nodes? And how did you
calculate it?

Shouldn't you choose one of them to have the '0' start value as its token?
At least that is what is said in the tutorials I've read.


On Mon, Feb 18, 2013 at 2:55 PM, Boris Solovyov wrote:

> What does it mean that each node owns an effective 100% of the cluster? Both
> nodes report the same output.
>
> [ec2-user@ip-10-152-162-228 ~]$ nodetool status
> Datacenter: us-east
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  10.152.162.228  62.98 KB  256     100.0%            7c50a482-1a0b-4dda-a58c-9232c2f18149  1a
> UN  10.147.166.207  60.98 KB  256     100.0%            4aebbf59-dbe5-4736-a7b7-6a59611e66e5  1a
>
>


Re: Both nodes own 100% of cluster

2013-02-18 Thread Boris Solovyov
These are running the latest Cassandra 1.2 with 256 vnodes each.


On Mon, Feb 18, 2013 at 2:07 PM, Víctor Hugo Oliveira Molinar <
vhmoli...@gmail.com> wrote:

> Why have you assigned a generated token to both nodes? And how did you
> calculate it?
>
> Shouldn't you choose one of them to have the '0' start value as its token?
> At least that is what is said in the tutorials I've read.
>
>
> On Mon, Feb 18, 2013 at 2:55 PM, Boris Solovyov 
> wrote:
>
>> What does it mean that each node owns an effective 100% of the cluster? Both
>> nodes report the same output.
>>
>> [ec2-user@ip-10-152-162-228 ~]$ nodetool status
>> Datacenter: us-east
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load      Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.152.162.228  62.98 KB  256     100.0%            7c50a482-1a0b-4dda-a58c-9232c2f18149  1a
>> UN  10.147.166.207  60.98 KB  256     100.0%            4aebbf59-dbe5-4736-a7b7-6a59611e66e5  1a
>>
>>
>


Re: Both nodes own 100% of cluster

2013-02-18 Thread Alain RODRIGUEZ
I don't know cassandra 1.2 very well; 100% of effective ownership probably
means that you're running with RF = number of nodes.

Alain


2013/2/18 Boris Solovyov 

> These are running the latest Cassandra 1.2 with 256 vnodes each.
>
>
> On Mon, Feb 18, 2013 at 2:07 PM, Víctor Hugo Oliveira Molinar <
> vhmoli...@gmail.com> wrote:
>
>> Why have you assigned a generated token to both nodes? And how did you
>> calculate it?
>>
>> Shouldn't you choose one of them to have the '0' start value as its token?
>> At least that is what is said in the tutorials I've read.
>>
>>
>> On Mon, Feb 18, 2013 at 2:55 PM, Boris Solovyov > > wrote:
>>
>>> What does it mean that each node owns an effective 100% of the cluster?
>>> Both nodes report the same output.
>>>
>>> [ec2-user@ip-10-152-162-228 ~]$ nodetool status
>>> Datacenter: us-east
>>> ===
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address         Load      Tokens  Owns (effective)  Host ID                               Rack
>>> UN  10.152.162.228  62.98 KB  256     100.0%            7c50a482-1a0b-4dda-a58c-9232c2f18149  1a
>>> UN  10.147.166.207  60.98 KB  256     100.0%            4aebbf59-dbe5-4736-a7b7-6a59611e66e5  1a
>>>
>>>
>>
>


Re: Both nodes own 100% of cluster

2013-02-18 Thread Hiller, Dean
Yes, for instance I have 6 nodes and each has 50% effective ownership because I
have RF=3: each row is written to 3 of the 6 nodes, so each node owns 3/6 = 50%
of the data.

Dean

From: Alain RODRIGUEZ <arodr...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, February 18, 2013 12:23 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Both nodes own 100% of cluster

I don't know cassandra 1.2 very well; 100% of effective ownership probably
means that you're running with RF = number of nodes.

Alain


2013/2/18 Boris Solovyov <boris.solov...@gmail.com>
These are running the latest Cassandra 1.2 with 256 vnodes each.


On Mon, Feb 18, 2013 at 2:07 PM, Víctor Hugo Oliveira Molinar
<vhmoli...@gmail.com> wrote:
Why have you assigned a generated token to both nodes? And how did you
calculate it?

Shouldn't you choose one of them to have the '0' start value as its token?
At least that is what is said in the tutorials I've read.


On Mon, Feb 18, 2013 at 2:55 PM, Boris Solovyov
<boris.solov...@gmail.com> wrote:
What does it mean that each node owns an effective 100% of the cluster? Both
nodes report the same output.

[ec2-user@ip-10-152-162-228 ~]$ nodetool status
Datacenter: us-east
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns (effective)  Host ID                               Rack
UN  10.152.162.228  62.98 KB  256     100.0%            7c50a482-1a0b-4dda-a58c-9232c2f18149  1a
UN  10.147.166.207  60.98 KB  256     100.0%            4aebbf59-dbe5-4736-a7b7-6a59611e66e5  1a






Re: Both nodes own 100% of cluster

2013-02-18 Thread Boris Solovyov
That makes sense, thanks.


On Mon, Feb 18, 2013 at 2:26 PM, Hiller, Dean  wrote:

> Yes, for instance I have 6 nodes and each has 50% effective ownership because
> I have RF=3: each row is written to 3 of the 6 nodes, so each node owns 3/6 =
> 50% of the data.


Re: Deleting old items during compaction (WAS: Deleting old items)

2013-02-18 Thread aaron morton
Sorry, missed the Counters part.

You are probably interested in this one 
https://issues.apache.org/jira/browse/CASSANDRA-5228

Add your need to the ticket to help it along. IMHO if you have write-once,
read-many time series data, the SSTables are effectively doing horizontal
partitioning for you. So being able to "drop a partition" would make life
easier.

If you can delete the entire row, then the deletes have less impact than
per-column deletes. However the old rows will not be purged from disk unless
all fragments of the row are involved in a compaction process. So it may take
some time to purge from disk, depending on the workload.
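
To make the row-level version concrete, a quick CQL sketch (table and key names
invented; and remember the wiki caveat that a deleted counter row must not be
incremented again):

    -- non-counter data: set a TTL at write time so old columns expire on their own
    INSERT INTO metrics (key, ts, value) VALUES ('sensor-1', 1361145600, 42)
    USING TTL 2592000;

    -- counters: one row-level tombstone instead of a tombstone per column
    DELETE FROM counters_by_month WHERE key = 'sensor-1:2013-01';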

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013, at 10:43 AM, Ilya Grebnov  wrote:

> According to https://issues.apache.org/jira/browse/CASSANDRA-2103 There is no 
> support for time to live (TTL) on counter columns. Did I miss something?
>  
> Thanks,
> Ilya
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Sunday, February 17, 2013 9:16 AM
> To: user@cassandra.apache.org
> Subject: Re: Deleting old items during compaction (WAS: Deleting old items)
>  
> That's what the TTL does. 
>  
> Manually delete all the older data now, then start using TTL. 
>  
> Cheers
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 13/02/2013, at 11:08 PM, Ilya Grebnov  wrote:
> 
> 
> Hi,
>  
> We are looking for a solution to the same problem. We have a wide column
> family with counters and we want to delete old data, like 1 month old. One
> potential idea was to implement a hook in the compaction code and drop
> columns which we don't need. Is this a viable option?
>  
> Thanks,
> Ilya
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Tuesday, February 12, 2013 9:01 AM
> To: user@cassandra.apache.org
> Subject: Re: Deleting old items
>  
> So is it possible to delete all the data inserted in some CF between 2 dates 
> or data older than 1 month ?
> No. 
>  
> You need to issue row level deletes. If you don't know the row key you'll 
> need to do range scans to locate them. 
>  
> If you are deleting parts of wide rows consider reducing the
> min_compaction_threshold on the CF to 2
>  
> Cheers
>  
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
> 
> 
> 
> Hi,
>  
> I would like to know if there is a way to delete old/unused data easily ?
>  
> I know about TTL but there are 2 limitations of TTL:
>  
> - AFAIK, there is no TTL on counter columns
> - TTL needs to be defined at write time, so it's too late for data already
> inserted.
>  
> I also could use a standard "delete" but it seems inappropriate for such a
> massive amount of data.
>  
> In some cases, I don't know the row key and would like to delete all the rows 
> starting by, let's say, "1050#..." 
>  
> Even better, I understood that columns are always inserted in C* with (name, 
> value, timestamp). So is it possible to delete all the data inserted in some 
> CF between 2 dates or data older than 1 month ?
>  
> Alain
>  



Re: nodetool repair with vnodes

2013-02-18 Thread aaron morton
> So, running it periodically on just one node is enough for cluster 
> maintenance ? 
In the special case where you have RF == Number of nodes. 

The recommended approach is to use -pr and run it on each node periodically. 

> Also: running it with -pr does output:
That does not look right. There should be messages about requesting and
receiving Merkle trees from other nodes, and that certain CFs are in sync.
These are all logged from the AntiEntropyService.

> Is there a way to run it only for all vnodes on a single physical node ?
it should be doing that. 

Look for messages like this in the log:

    logger.info(String.format("[repair #%s] new session: will sync %s on range %s for %s.%s",
        getName(), repairedNodes(), range, tablename, Arrays.toString(cfnames)));

They say how much is going to be synced, and with what. Try running repair with
-pr on one of the nodes not already repaired.
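
Something like this, with invented host names:

    # primary-range repair on each node in turn; together they cover the ring
    for h in node1 node2 node3; do
        nodetool -h $h repair -pr keyspace_test
    done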

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013, at 11:12 AM, Marco Matarazzo  wrote:

>>> So, to me, it's like the "nodetool repair" command is always running on the
>>> same single node and repairing everything.
>> If you use nodetool repair without the -pr flag in your setup (3 nodes and I 
>> assume RF 3) it will repair all token ranges in the cluster. 
> 
> That's correct, 3 nodes and RF 3. Sorry for not specifying it in the 
> beginning.
> 
> 
> So, running it periodically on just one node is enough for cluster
> maintenance? Does this depend on the fact that every vnode's data is related
> to the previous and next vnode, and does this particular setup make that
> enough, as it covers every physical node?
> 
> 
> Also: running it with -pr does output:
> 
> [2013-02-17 12:29:25,293] Nothing to repair for keyspace 'system'
> [2013-02-17 12:29:25,301] Starting repair command #2, repairing 1 ranges for 
> keyspace keyspace_test
> [2013-02-17 12:29:28,028] Repair session 487d0650-78f5-11e2-a73a-2f5b109ee83c 
> for range (-9177680845984855691,-9171525326632276709] finished
> [2013-02-17 12:29:28,028] Repair command #2 finished
> 
> … that, as far as I can understand, works on the first vnode on the specified 
> node, or so it seems from the output range. Am I right? Is there a way to run 
> it for all vnodes on a single physical node only?
> 
> Thank you!
> 
> --
> Marco Matarazzo



Re: Cassandra on Red Hat 6.3

2013-02-18 Thread aaron morton
Nothing jumps out. 

Check /var/log/cassandra/output.log, that's where stdout and stderr are
directed.

Check file permissions. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013, at 9:08 PM, amulya rattan  wrote:

> I followed the step-by-step instructions for installing Cassandra on Red Hat
> Linux Server 6.3 from the datastax site, without much success. Apparently it
> installs fine, but starting the cassandra service does nothing (no ports are
> bound, so opscenter/cli doesn't work). When I check the service's status, it
> shows "Cassandra dead but pid file exists". When I try launching Cassandra
> from /usr/sbin, it throws "Error opening zip file or JAR manifest missing :
> /lib/jamm-0.2.5.jar" and stops, so clearly that's why the service isn't
> running.
>
> While I investigate it further, I thought it'd be worthwhile to put this on
> the list and see if anybody else has seen a similar issue. I must point out
> that this is a fresh machine with a fresh Cassandra installation, so no
> conflicts with any previous installations are possible. Has anybody else come
> across something similar?
> 
> ~Amulya



Re: NPE in running "ClientOnlyExample"

2013-02-18 Thread aaron morton
And you can never go wrong relying on the documentation for the python pycassa
library; it has some handy tutorials for getting started.

http://pycassa.github.com/pycassa/

cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013, at 9:51 PM, Abhijit Chanda  wrote:

> I hope you have already gone through this link:
> https://github.com/zznate/hector-examples. If not, I suggest you go through
> it; you can also refer to
> http://hector-client.github.com/hector/build/html/documentation.html.
> 
> 
> Best Regards,
> 
> 
> On Mon, Feb 18, 2013 at 12:15 AM, Jain Rahul  wrote:
> Thanks Edward,
> 
> My bad. I was confused, as it does seem to create the keyspace too, as I
> understand it (although I'm not sure):
> 
> List<CfDef> cfDefList = new ArrayList<CfDef>();
> CfDef columnFamily = new CfDef(KEYSPACE, COLUMN_FAMILY);
> cfDefList.add(columnFamily);
> try
> {
>     client.system_add_keyspace(new KsDef(KEYSPACE,
>         "org.apache.cassandra.locator.SimpleStrategy", 1, cfDefList));
>     int magnitude = client.describe_ring(KEYSPACE).size();
> 
> Can I request you to please point me to some examples with which I can start?
> I tried to look at some examples from hector, but they seem to be in line
> with Cassandra's 1.1 version.
> 
> Regards,
> Rahul
> 
> 
> -Original Message-
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: 17 February 2013 21:49
> To: user@cassandra.apache.org
> Subject: Re: NPE in running "ClientOnlyExample"
> 
> This is a bad example to follow. This is the internal client the Cassandra
> nodes use to talk to each other (the fat client); usually you do not use this
> unless you want to write some embedded code on the Cassandra server.
> 
> Typically clients use thrift/native transport. But you are likely getting the 
> error you are seeing because the keyspace or column family is not created yet.
> 
> On Sat, Feb 16, 2013 at 11:41 PM, Jain Rahul  wrote:
> > Hi All,
> >
> >
> >
> > I am a newbie to Cassandra and am trying to run an example program
> > "ClientOnlyExample" taken from
> > https://raw.github.com/apache/cassandra/cassandra-1.2/examples/client_only/src/ClientOnlyExample.java.
> > But while executing the program it gives me a null pointer exception.
> > Can you guys please help me figure out what I am missing?
> >
> >
> >
> > I am using Cassandra 1.2.1 version. I have pasted the logs at
> > http://pastebin.com/pmADWCYe
> >
> >
> >
> > Exception in thread "main" java.lang.NullPointerException
> >
> >   at
> > org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:71)
> >
> >   at
> > org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:66)
> >
> >   at
> > org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:61)
> >
> >   at
> > org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:56)
> >
> >   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:183)
> >
> >   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:204)
> >
> >   at ClientOnlyExample.testWriting(ClientOnlyExample.java:78)
> >
> >   at ClientOnlyExample.main(ClientOnlyExample.java:135)
> >
> >
> >
> > Regards,
> >
> > Rahul
> >
> 
> 
> 
> -- 
> Abhijit Chanda
> +91-974395



Re: cassandra vs. mongodb quick question

2013-02-18 Thread aaron morton
My experience is that repair of 300GB of compressed data takes longer than
300GB of uncompressed, but I cannot point to an exact number. Calculating the
differences is mostly CPU bound and works on the uncompressed data.

Streaming uses compression (after uncompressing the on disk data).

So if you have 300GB of compressed data, take a look at how long repair takes 
and see if you are comfortable with that. You may also want to test replacing a 
node so you can get the procedure documented and understand how long it takes.  

The idea of the soft 300GB to 500GB limit came about because of a number of
cases where people had 1 TB on a single node and they were surprised it took
days to repair or replace. If you know how long things may take, and that fits
in your operations, then go with it.

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013, at 10:08 PM, Vegard Berget  wrote:

>  
> Just out of curiosity:
>
> When using compression, does this affect things one way or another? Is 300G
> the (compressed) SSTable size, or the total size of the data?
> 
> .vegard,
> 
> 
> - Original Message -
> From: user@cassandra.apache.org
> Sent: Mon, 18 Feb 2013 08:41:25 +1300
> Subject: Re: cassandra vs. mongodb quick question
> 
> 
> If you have spinning disk and 1G networking and no virtual nodes, I would 
> still say 300G to 500G is a soft limit. 
> 
> If you are using virtual nodes, SSD, JBOD disk configuration or faster 
> networking you may go higher. 
> 
> The limiting factors are the time it takes to repair, the time it takes to
> replace a node, and the memory considerations for 100's of millions of rows.
> If the performance of those operations is acceptable to you, then go crazy.
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/02/2013, at 9:05 AM, "Hiller, Dean"  wrote:
> 
> So I found out mongodb varies their node size from 1T to 42T per node 
> depending on the profile.  So if I was going to be writing a lot but rarely 
> changing rows, could I also use cassandra with a per node size of +20T or is 
> that not advisable?
> 
> Thanks,
> Dean
> 



Re: Firewall logging to Cassandra

2013-02-18 Thread aaron morton
You may be interested in something like this for connecting to flume 
https://github.com/thobbs/flume-cassandra-plugin

There is probably something similar for kafka out there. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013, at 10:14 PM, "Sloot, Hans-Peter"  
wrote:

> Hi,
>  
> Is anyone using Cassandra to store firewall logs?
> If so, any pointers to share?
>  
> Regards  Hans-Peter
>  
>  
> 
> Hans-Peter Sloot
> Oracle Technical Expert
> Oracle 10g/11g Certified Master
> Global Fact ATS NL
> T + 31 6 303 83 499
> 
>  
> 
> 
> 
> 
> 



Re: Cassandra on Red Hat 6.3

2013-02-18 Thread amulya rattan
It's throwing MalformedURLException

Error: Exception thrown by the agent : java.net.MalformedURLException:
Local host name unknown: java.net.UnknownHostException: ip-10-0-0-228:
ip-10-0-0-228

Where should I set the correct IP of the machine?




2013/2/19 aaron morton 

> Nothing jumps out.
>
> Check /var/log/cassandra/output.log , that's where stdout and std err are
> directed.
>
> Check file permissions.
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/02/2013, at 9:08 PM, amulya rattan  wrote:
>
> I followed the step-by-step instructions for installing Cassandra on Red Hat
> Linux Server 6.3 from the datastax site, without much success. Apparently
> it installs fine, but starting the cassandra service does nothing (no ports
> are bound, so opscenter/cli doesn't work). When I check the service's status,
> it shows "Cassandra dead but pid file exists". When I try launching Cassandra
> from /usr/sbin, it throws "Error opening zip file or JAR manifest missing :
> /lib/jamm-0.2.5.jar" and stops, so clearly that's why the service isn't
> running.
>
> While I investigate it further, I thought it'd be worthwhile to put this
> on the list and see if anybody else has seen a similar issue. I must point
> out that this is a fresh machine with a fresh Cassandra installation, so no
> conflicts with any previous installations are possible. Has anybody else
> come across something similar?
>
> ~Amulya
>
>
>


Re: Deleting old items during compaction (WAS: Deleting old items)

2013-02-18 Thread Alain RODRIGUEZ
"However the old rows will not be purged from disk unless all fragments of
the row are involved in a compaction process. So it may take some time to
purge from disk, depending on the workload. "

http://wiki.apache.org/cassandra/Counters

The doc says: "Counter removal is intrinsically limited. For instance, if
you issue very quickly the sequence "increment, remove, increment" it is
possible for the removal to be lost (if for some reason the remove happens
to be the last received messages). Hence, removal of counters is provided
for definitive removal only, that is when the deleted counter is not
incremented afterwards. This holds for row deletion too: if you delete a row
of counters, incrementing any counter in that row (that existed before the
deletion) will result in an undetermined behavior. Note that if you need to
reset a counter, one option (that is unfortunately not concurrent safe)
could be to read its *value* and add *-value*."

Just wanted to add that we experienced it. While the deleted data was being
purged from disk, we couldn't write anything into that row. I mean, we weren't
able to create any new column.

I just wanted to let you know in case it could help.



2013/2/18 aaron morton 

> Sorry, missed the Counters part.
>
> You are probably interested in this one
> https://issues.apache.org/jira/browse/CASSANDRA-5228
>
> Add your need to the ticket to help it along. IMHO if you have write-once,
> read-many time series data, the SSTables are effectively doing horizontal
> partitioning for you. So being able to "drop a partition" would make life
> easier.
>
> If you can delete the entire row, then the deletes have less impact than
> per-column deletes. However the old rows will not be purged from disk unless
> all fragments of the row are involved in a compaction process. So it may take
> some time to purge from disk, depending on the workload.
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/02/2013, at 10:43 AM, Ilya Grebnov  wrote:
>
> According to https://issues.apache.org/jira/browse/CASSANDRA-2103 There
> is no support for time to live (TTL) on counter columns. Did I miss
> something?
>
> Thanks,
> Ilya
> From: aaron morton [mailto:aa...@thelastpickle.com]
> Sent: Sunday, February 17, 2013 9:16 AM
> To: user@cassandra.apache.org
> Subject: Re: Deleting old items during compaction (WAS: Deleting old items)
>
> That's what the TTL does.
>
> Manually delete all the older data now, then start using TTL.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/02/2013, at 11:08 PM, Ilya Grebnov  wrote:
>
> Hi,
>  
> We are looking for a solution to the same problem. We have a wide column
> family with counters and we want to delete old data, like 1 month old. One
> potential idea was to implement a hook in the compaction code and drop
> columns which we don't need. Is this a viable option?
>  
> Thanks,
> Ilya
> From: aaron morton [mailto:aa...@thelastpickle.com]
> Sent: Tuesday, February 12, 2013 9:01 AM
> To: user@cassandra.apache.org
> Subject: Re: Deleting old items
>
> > So is it possible to delete all the data inserted in some CF between 2
> > dates or data older than 1 month ?
> No.
>
> You need to issue row level deletes. If you don't know the row key you'll
> need to do range scans to locate them.
>
> If you are deleting parts of wide rows consider reducing the
> min_compaction_threshold on the CF to 2
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
>
> Hi,
>
> I would like to know if there is a way to delete old/unused data easily?
>
> I know about TTL but there are 2 limitations of TTL:
>
> - AFAIK, there is no TTL on counter columns
> - TTL needs to be defined at write time, so it's too late for data already
> inserted.
>
> I also could use a standard "delete" but it seems inappropriate for such a
> massive amount of data.
>
> In some cases, I don't know the row key and would like to delete all the
> rows starting by, let's say, "1050#..."
>
> Even better, I understood that columns are always inserted in C* with
> (name, value, timestamp). So is it possible to delete all the data inserted
> in some CF between 2 dates or data older than 1 month ?
>
> Alain
>
>
>


Re: cassandra vs. mongodb quick question(good additional info)

2013-02-18 Thread Hiller, Dean
I thought about this more, and even with a 10Gbit network, it could take on the
order of weeks to bring up a replacement node at realistic streaming rates if
mongodb did truly have a 42T / node like I had heard. I wrote the below email
to the person I heard this from, going back to basics, which really puts some
perspective on it… (and a lot of people don't even have a 10Gbit network like
we do)

Nodes are hooked up by at most a 10G network right now, where that is 10
gigabit. We are talking about 10 terabytes on disk per node recently.

Google "10 gigabit in gigabytes" gives me 1.25 gigabytes/second  (yes I could 
have divided by 8 in my head but eh…course when I saw the number, I went duh)

So trying to transfer 10 terabytes, or 10,000 gigabytes, to a node that we are
bringing online to replace a dead node would take approximately 2.2 hours at
full line rate, and that assumes no one else is using the bandwidth ;).
10,000 GB * 1 s/1.25 GB = 8,000 s, or about 2.2 hours; more like 4.5 hours if
we only get 50% of the network. And that is the wire-speed best case: real
streaming and repair run far below line rate, which is where the multi-day
numbers come from.
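
Spelling the unit chain out (the 50 MB/s on the last line is an assumed
real-world repair/stream rate, not a measurement):

    10 Gbit/s              = 1.25 GB/s
    10,000 GB / 1.25 GB/s  = 8,000 s    ~ 2.2 hours   (full line rate)
    10,000 GB / 0.625 GB/s = 16,000 s   ~ 4.4 hours   (50% of the link)
    10,000 GB / 0.05 GB/s  = 200,000 s  ~ 2.3 days    (throttled streaming/repair)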

So bringing a new node up to speed once one has crashed is hours at best and
days in practice. I think this is the main reason the 1 terabyte soft limit
exists to begin with, right?

From an ops perspective, this could sound like a nightmare scenario of waiting
days… maybe it is livable though. Either way, I thought it would be good to
share the numbers. ALSO, that is assuming the bus with its 10 disks can keep up
with 10G. Can it? What is the limit of throughput on a bus per second on the
computers we have, as on wikipedia there is a huge variance?

What is the rate of the disks too (multiplied by 10 of course)?  Will they keep 
up with a 10G rate for bringing a new node online?

This all comes into play even more so when you want to double the size of your
cluster, of course, as all nodes have to transfer half of what they have to the
new nodes that come online (cassandra actually has a very data center/rack
aware topology to transfer data correctly and not use up all bandwidth
unnecessarily… I am not sure mongodb has that). Anyways, just food for thought.

From: aaron morton <aa...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, February 18, 2013 1:39 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Vegard Berget
<p...@fantasista.no>
Subject: Re: cassandra vs. mongodb quick question

My experience is that repair of 300GB of compressed data takes longer than
300GB of uncompressed, but I cannot point to an exact number. Calculating the
differences is mostly CPU bound and works on the uncompressed data.

Streaming uses compression (after uncompressing the on disk data).

So if you have 300GB of compressed data, take a look at how long repair takes 
and see if you are comfortable with that. You may also want to test replacing a 
node so you can get the procedure documented and understand how long it takes.

The idea of the soft 300GB to 500GB limit came about because of a number of
cases where people had 1 TB on a single node and they were surprised it took
days to repair or replace. If you know how long things may take, and that fits
in your operations, then go with it.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013, at 10:08 PM, Vegard Berget <p...@fantasista.no> wrote:



Just out of curiosity:

When using compression, does this affect things one way or another? Is 300G
the (compressed) SSTable size, or the total size of the data?

.vegard,

- Original Message -
From: user@cassandra.apache.org
To: <user@cassandra.apache.org>
Sent: Mon, 18 Feb 2013 08:41:25 +1300
Subject: Re: cassandra vs. mongodb quick question


If you have spinning disk and 1G networking and no virtual nodes, I would still 
say 300G to 500G is a soft limit.

If you are using virtual nodes, SSD, JBOD disk configuration or faster 
networking you may go higher.

The limiting factors are the time it takes to repair, the time it takes to
replace a node, and the memory considerations for 100's of millions of rows.
If the performance of those operations is acceptable to you, then go crazy.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/02/2013, at 9:05 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

So I found out mongodb varies their node size from 1T to 42T per node depending 
on the profile.  So if I was going to be writing a lot but rarely changing 
rows, could I also use cassandra with a per node size of +20T or is that not 
advisable?

Thanks,
Dean




Cassandra backup

2013-02-18 Thread Kanwar Sangha
Hi - We have a requirement to store around 90 days of data per user. The last
7 days of data is going to be accessed frequently. Is there a way we can have
the recent data (7 days) on SSD and the rest of the data on
HDD? Do we take a snapshot every 7 days and use a separate 'archive' cluster
to serve the old data and an 'active' cluster to serve recent data?

Any links/thoughts would be helpful.

Thanks,
Kanwar


Re: Cassandra backup

2013-02-18 Thread Michael Kjellman
There is this:

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement

But you'll need to design your data model around the fact that this is only as 
granular as 1 column family

Best,
michael

From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, February 18, 2013 6:06 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Cassandra backup

Hi – We have a requirement to store around 90 days of data per user. The last
7 days of data is going to be accessed frequently. Is there a way we can have
the recent data (7 days) on SSD and the rest of the data on
HDD? Do we take a snapshot every 7 days and use a separate 'archive' cluster
to serve the old data and an 'active' cluster to serve recent data?

Any links/thoughts would be helpful.

Thanks,
Kanwar


RE: Cassandra backup

2013-02-18 Thread Kanwar Sangha
Thanks. I will look into the details.

One issue I see is that if I have only one column family, which needs only the
last 7 days of data on SSD and the rest on HDD, how will that work?
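
One workaround, since placement is only as granular as a column family: bucket
the data by time into one CF per week, keep the current buckets on SSD, and
migrate aged-out buckets to HDD with the node down (a sketch of the idea with
invented paths and names, not something the blog post covers; whether sstables
behind a symlink behave cleanly under compaction is something to test first):

    sudo service cassandra stop
    # move an aged-out weekly bucket to HDD and leave a symlink behind
    mv /ssd/cassandra/data/MyKeyspace/msgs_2013w05 /hdd/cassandra/data/MyKeyspace/
    ln -s /hdd/cassandra/data/MyKeyspace/msgs_2013w05 \
          /ssd/cassandra/data/MyKeyspace/msgs_2013w05
    sudo service cassandra start

Reads for the last 7 days then touch at most two weekly buckets.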

From: Michael Kjellman [mailto:mkjell...@barracuda.com]
Sent: 18 February 2013 20:08
To: user@cassandra.apache.org
Subject: Re: Cassandra backup

There is this:

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement

But you'll need to design your data model around the fact that this is only as 
granular as 1 column family

Best,
michael

From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, February 18, 2013 6:06 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Cassandra backup

Hi - We have a requirement to store around 90 days of data per user. The last
7 days of data is going to be accessed frequently. Is there a way we can have
the recent data (7 days) on SSD and the rest of the data on
HDD? Do we take a snapshot every 7 days and use a separate 'archive' cluster
to serve the old data and an 'active' cluster to serve recent data?

Any links/thoughts would be helpful.

Thanks,
Kanwar


Re: Cassandra on Red Hat 6.3

2013-02-18 Thread Michael Shuler
On 02/18/2013 03:07 PM, amulya rattan wrote:
> It's throwing MalformedURLException
> 
> Error: Exception thrown by the agent : java.net.MalformedURLException:
> Local host name unknown: java.net.UnknownHostException: ip-10-0-0-228:
> ip-10-0-0-228
> 
> Where should I set the correct IP of the machine?

The correct question might be "where should I let the machine know who
it is?"  ;-)

It looks like the machine's hostname is "ip-10-0-0-228" and it does not
know how to reach that name over the network, since it cannot resolve
that name to an IP address. 'ping ip-10-0-0-228' will also fail on this
machine, I assume. However your machines are being installed and set up,
the setup is broken if they don't know their own names; this is easily fixable.

Here's an example - this box's hostname is "pono" and FQDN is
"pono.12.am" - the machine knows where to find these via entries in
/etc/hosts (as well as an entry for its real IP):

mshuler@pono:~$ ping -c1 pono
PING pono.12.am (127.0.1.1) 56(84) bytes of data.
64 bytes from pono.12.am (127.0.1.1): icmp_req=1 ttl=64 time=0.021 ms

--- pono.12.am ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.021/0.021/0.021/0.000 ms
mshuler@pono:~$ ping -c1 pono.12.am
PING pono.12.am (127.0.1.1) 56(84) bytes of data.
64 bytes from pono.12.am (127.0.1.1): icmp_req=1 ttl=64 time=0.021 ms

--- pono.12.am ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.021/0.021/0.021/0.000 ms
mshuler@pono:~$ cat /etc/hosts
127.0.0.1   localhost
127.0.1.1   pono.12.am pono
10.214.235.223  pono.12.am pono

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
mshuler@pono:~$
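
Applied to the box in question, assuming the usual EC2 convention that the
hostname ip-10-0-0-228 corresponds to 10.0.0.228 (verify against the
instance's actual private IP):

    # /etc/hosts on the affected machine
    127.0.0.1    localhost
    10.0.0.228   ip-10-0-0-228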

-- 
Kind regards,
Michael