Re: Cassandra feature enhancement

2015-04-09 Thread Ali Akhtar
If I were you, I would learn Cassandra's internals and how it works (there
are several webinars that you can watch). Once you understand its
internals, you'll be in a much better position to think of a feature
enhancement you can do.

You'll also be in a better position to do future Cassandra projects.

On Thu, Apr 9, 2015 at 2:35 PM, Divya Divs  wrote:

> Hi sir,
> I'm an M.Tech student, and my academic project is on Cassandra. I have
> built the Cassandra source code (https://github.com/apache/cassandra) with
> Ant and run it in Eclipse Juno. I have to implement a feature enhancement in
> Cassandra and analyze my application on it. Could you please suggest what
> kind of feature enhancement I could do? A simple one is enough. Please
> guide me. Thanks in advance.
>
> Thanks and Regards,
> Divya


Re: When to use STCS/DTCS/LCS

2015-04-09 Thread Divya Divs
Hi sir,
I'm an M.Tech student, and my academic project is on Cassandra. I have built
the Cassandra source code (https://github.com/apache/cassandra) with Ant and
run it in Eclipse Juno. I have to implement a feature enhancement in Cassandra
and analyze my application on it. Could you please suggest what kind of
feature enhancement I could do? A simple one is enough. Please guide me.
Thanks in advance.

Thanks and Regards,
Divya


Re: [Cassandra 2.0] truncate table

2015-04-09 Thread Anuj Wadehra
You can try doing it from cassandra-cli: set the consistency level to ALL and then
truncate.


Anuj Wadehra


From:"Parth Setya" 
Date:Thu, 9 Apr, 2015 at 7:31 pm
Subject:Re: [Cassandra 2.0] truncate table

As per this thread:

http://stackoverflow.com/questions/10520110/how-do-i-delete-all-data-in-a-cassandra-column-family

What you can do to physically remove the files is to go to
/var/lib/cassandra/data/keyspace_name and manually delete the directory
named after that column family. Do this on all the nodes.


Re: When to use STCS/DTCS/LCS

2015-04-09 Thread Alain RODRIGUEZ
+1 Jens, and there is a specific mailing list for developers
(http://cassandra.apache.org/#lists).

But it looks like a great move. Good luck, Divya.

C*heers,

Alain

2015-04-09 12:44 GMT+02:00 Jens Rantil :

> Divya,
>
> Please start a new thread for that. Or is your question related
> specifically to this thread?
>
> Thanks,
> Jens
>


Re: [Cassandra 2.0] truncate table

2015-04-09 Thread Laing, Michael
Nodetool clearsnapshot

On Thursday, April 9, 2015, Eduardo Cusa 
wrote:

> Hi guys, I truncated a column family that is 31 GB in size, and the
> disk space was not released.
>
> What else do I have to do?
>
> Regards
> Eduardo
>
>
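`nodetool clearsnapshot` helps here because TRUNCATE snapshots the table by default, so the space stays on disk until that snapshot is cleared on every node. Below is a minimal sketch for checking how much a node holds in snapshots before you clear them. It assumes the default data directory `/var/lib/cassandra/data` mentioned in this thread and a `<keyspace>/<table>/snapshots/<snapshot>/` layout. `snapshot_usage` and `human` are hypothetical helpers. The sketch sums apparent file sizes, so any snapshot file still hard-linked to a live SSTable is over-counted as reclaimable.

```python
import os
from typing import Dict


def snapshot_usage(data_dir: str = "/var/lib/cassandra/data") -> Dict[str, int]:
    """Return {"<keyspace>/<table dir>": bytes stored under its snapshots/ dir}.

    Apparent sizes are summed. Snapshot files are hard links, so bytes still
    shared with live SSTables would not actually be freed by clearsnapshot.
    """
    usage: Dict[str, int] = {}
    for root, dirs, _files in os.walk(data_dir):
        if os.path.basename(root) != "snapshots":
            continue
        total = 0
        for snap_root, _subdirs, snap_files in os.walk(root):
            for name in snap_files:
                total += os.path.getsize(os.path.join(snap_root, name))
        table_dir = os.path.relpath(os.path.dirname(root), data_dir)
        usage[table_dir] = usage.get(table_dir, 0) + total
        dirs[:] = []  # subtree already summed; stop the outer walk descending
    return usage


def human(n: float) -> str:
    """Format a byte count with binary units."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TiB"
```

Run it on each node, for example `print({t: human(n) for t, n in snapshot_usage().items()})`, and then run `nodetool clearsnapshot` on each node. The `auto_snapshot` setting in cassandra.yaml controls whether TRUNCATE takes the snapshot at all.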


Re: cqlsh commands for importing .CSV files into cassandra

2015-04-09 Thread Sebastian Estevez
Try this loader if cqlsh doesn't meet your needs:
https://github.com/brianmhess/cassandra-loader

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com


On Wed, Apr 8, 2015 at 12:54 PM, Michael Dykman  wrote:

> http://docs.datastax.com/en/cql/3.0/cql/cql_reference/copy_r.html
>
> This only works through cqlsh.
>
> On Wed, Apr 8, 2015 at 1:48 PM, Divya Divs 
> wrote:
>
>> Hi,
>> Please tell me the cqlsh commands for importing .csv file datasets into
>> Cassandra. Please help me get started. I am using Windows.
>>
>
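For the Windows question quoted above, the command Michael links to is cqlsh's `COPY ... FROM`. Below is a minimal sketch that builds such a statement from a CSV file's header row. It assumes the first row holds column names that match the table's columns, and that cqlsh accepts forward slashes in Windows paths. `read_header`, `quote_ident` and `copy_from_statement` are hypothetical helpers, and the sketch does not detect CQL reserved words.

```python
import csv
import re
from typing import List, Sequence

# Unquoted CQL identifiers are case-insensitive; anything that is not plain
# lower-case needs double quotes to keep its exact spelling.
_PLAIN_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def quote_ident(name: str) -> str:
    """Return a CQL identifier, double-quoting it unless it is plain lower-case.

    Assumption: reserved words used as column names are NOT detected here.
    """
    if _PLAIN_IDENT.match(name):
        return name
    return '"' + name.replace('"', '""') + '"'


def read_header(csv_path: str) -> List[str]:
    """Read the first CSV row, assumed to hold the column names."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [col.strip() for col in next(csv.reader(f))]


def copy_from_statement(keyspace: str, table: str,
                        columns: Sequence[str], csv_path: str) -> str:
    """Build a cqlsh COPY ... FROM statement for a CSV file with a header row."""
    # Assumption: forward slashes work for Windows paths; single quotes in the
    # path are doubled, as inside any CQL string literal.
    path = csv_path.replace("\\", "/").replace("'", "''")
    cols = ", ".join(quote_ident(c) for c in columns)
    return (f"COPY {quote_ident(keyspace)}.{quote_ident(table)} ({cols}) "
            f"FROM '{path}' WITH HEADER = true;")
```

For example, `copy_from_statement("demo", "users", ["id", "name"], r"C:\data\users.csv")` (the keyspace, table and path are hypothetical) gives a statement you can paste into cqlsh. The table must already exist. COPY is a cqlsh command, so it cannot be sent through a driver.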


Re: Cassandra feature enhancement

2015-04-09 Thread Job Thomas
Hi Divya.


There is a SubQuery implementation available in a Git repository. You could do
some enhancements to make it useful for production.


https://github.com/jobmthomas/Cassandra-SubQuery


From: Ali Akhtar 
Sent: Thursday, April 9, 2015 3:13 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra feature enhancement

If I were you, I would learn Cassandra's internals and how it works (there are
several webinars that you can watch). Once you understand its internals,
you'll be in a much better position to think of a feature enhancement you can
do.

You'll also be in a better position to do future Cassandra projects.



Re: When to use STCS/DTCS/LCS

2015-04-09 Thread Jens Rantil
Divya,

Please start a new thread for that. Or is your question related
specifically to this thread?

Thanks,
Jens

On Thu, Apr 9, 2015 at 11:34 AM, Divya Divs 
wrote:

> Hi sir,
> I'm an M.Tech student, and my academic project is on Cassandra. I have
> built the Cassandra source code (https://github.com/apache/cassandra) with
> Ant and run it in Eclipse Juno. I have to implement a feature enhancement in
> Cassandra and analyze my application on it. Could you please suggest what
> kind of feature enhancement I could do? A simple one is enough. Please
> guide me. Thanks in advance.
>
> Thanks and Regards,
> Divya


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: When to use STCS/DTCS/LCS

2015-04-09 Thread Divya Divs
Thank you very much, Alain.

On Thu, Apr 9, 2015 at 8:50 PM, Alain RODRIGUEZ  wrote:

> +1 Jens, and there is a specific mailing list for developers
> (http://cassandra.apache.org/#lists).
>
> But it looks like a great move. Good luck, Divya.
>
> C*heers,
>
> Alain
>


Re: [Cassandra 2.0] truncate table

2015-04-09 Thread Eduardo Cusa
Yes, with nodetool clearsnapshot we recovered approximately 90 GB.
Thanks!
On Apr 9, 2015 11:44 AM, "Laing, Michael" wrote:

rtfm - truncate creates snapshots by default; they must be cleared on all
nodes to recover *disk space* as requested by the OP.



Re: [Cassandra 2.0] truncate table

2015-04-09 Thread Parth Setya
As per this thread:

http://stackoverflow.com/questions/10520110/how-do-i-delete-all-data-in-a-cassandra-column-family

What you can do to physically remove the files is to go to
/var/lib/cassandra/data/keyspace_name and manually delete the directory
named after that column family. Do this on all the nodes.
On Apr 9, 2015 7:26 PM, "Eduardo Cusa" 
wrote:

> Hi guys, I truncated a column family that is 31 GB in size, and the
> disk space was not released.
>
> What else do I have to do?
>
> Regards
> Eduardo
>
>


Cassandra feature enhancement

2015-04-09 Thread Divya Divs
Hi sir,
I'm an M.Tech student, and my academic project is on Cassandra. I have built
the Cassandra source code (https://github.com/apache/cassandra) with Ant and
run it in Eclipse Juno. I have to implement a feature enhancement in Cassandra
and analyze my application on it. Could you please suggest what kind of
feature enhancement I could do? A simple one is enough. Please guide me.
Thanks in advance.

Thanks and Regards,
Divya


Re: Availability testing of Cassandra nodes

2015-04-09 Thread Jiri Horky
Hi Jack,

it seems there is some misunderstanding. There are two separate things. One
is that Cassandra works for the application, which may (and should) be true
even if some of the nodes are actually down. The other is that, even in this
case, you want to be notified that there are faulty Cassandra nodes.

I am trying to tackle the latter case; I am not having issues with how
client-side load balancing works.

Jirka H.

On 04/09/2015 07:15 AM, Ajay wrote:
> Adding the Java driver forum.
>
> We would also like to know more about this.
>
> -
> Ajay
>
> On Wed, Apr 8, 2015 at 8:15 PM, Jack Krupansky wrote:
>
> Just a couple of quick comments:
>
> 1. The driver is supposed to be doing availability and load
> balancing already.
> 2. If your cluster is lightly loaded, it isn't necessary to be so
> precise with load balancing.
> 3. If your cluster is heavily loaded, it won't help. The solution is
> to expand your cluster so that precise balancing of requests
> (beyond what the driver does) is not required.
>
> Is there anything special about your use case that you feel is
> worth the extra treatment?
>
> If you are having problems with the driver balancing requests and
> properly detecting available nodes, or see some room for
> improvement, make sure to report the issues so that they can be fixed.
>
>
> -- Jack Krupansky
>
> On Wed, Apr 8, 2015 at 10:31 AM, Jiri Horky wrote:
>
> Hi all,
>
> we are thinking about how best to approach availability testing of
> Cassandra nodes. It is becoming more and more apparent that it is a rather
> complex task. We thought that we should try to read and write a unique
> value with a low TTL to a "monitoring" keyspace through each Cassandra
> node. This helps to find issues, but it also triggers flapping of
> unaffected hosts, as the key of the value being inserted sometimes belongs
> to an affected host and sometimes not. Now, we could calculate the right
> value to insert so we can be sure it will hit the host we are connecting
> to, but then you have replication factor and consistency level, so you
> cannot really be sure that it actually tests the ability of the given host
> to write values.
>
> So we ended up thinking that the best approach is to connect to each
> individual host, read some system keyspace (which might be on a different
> disk drive...), which should be local, and then check several JMX values
> that could indicate an error, plus JVM statistics (full heap, GC overhead).
> Moreover, we will monitor our applications that use Cassandra (mostly
> with the DataStax driver) more closely and try to get failed-node
> information from them.
>
> How do others do the testing?
>
> Jirka H.
>
>
>
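Jiri's plan is to check each node on its own: a local system-keyspace read plus a few JMX and JVM values. The evaluation step of that plan can be sketched as a pure function. How the readings are collected is left out. The `HostReadings` fields, the JMX attribute named in the comments, and both thresholds are assumptions for illustration, not recommended values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HostReadings:
    """Values gathered from ONE node, without going through other replicas."""
    host: str
    local_read_ok: bool      # e.g. a system-table read sent only to this node
    heap_used_bytes: int     # e.g. JMX java.lang:type=Memory, HeapMemoryUsage.used
    heap_max_bytes: int      # e.g. JMX java.lang:type=Memory, HeapMemoryUsage.max
    gc_time_fraction: float  # share of the last interval spent in GC (0.0 to 1.0)


@dataclass
class Thresholds:
    max_heap_fraction: float = 0.90  # hypothetical alert level
    max_gc_fraction: float = 0.10    # hypothetical alert level


def problems(r: HostReadings, t: Optional[Thresholds] = None) -> List[str]:
    """Return human-readable problems for one node; an empty list means healthy."""
    t = t or Thresholds()
    found: List[str] = []
    if not r.local_read_ok:
        found.append("local system-keyspace read failed")
    if r.heap_max_bytes > 0:
        heap = r.heap_used_bytes / r.heap_max_bytes
        if heap > t.max_heap_fraction:
            found.append(f"heap {heap:.0%} used")
    if r.gc_time_fraction > t.max_gc_fraction:
        found.append(f"GC overhead {r.gc_time_fraction:.0%}")
    return found


def report(readings: List[HostReadings]) -> Dict[str, List[str]]:
    """Map each unhealthy host to its problems; healthy hosts are omitted."""
    out: Dict[str, List[str]] = {}
    for r in readings:
        found = problems(r)
        if found:
            out[r.host] = found
    return out
```

Here is one untested way to fill `local_read_ok` with the DataStax Python driver. Pin a session to a single node with `WhiteListRoundRobinPolicy([host])` and read `system.local`. That table is node-local, so the check does not depend on which replicas own a test key.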



[Cassandra 2.0] truncate table

2015-04-09 Thread Eduardo Cusa
Hi guys, I truncated a column family that is 31 GB in size, and the disk
space was not released.

What else do I have to do?

Regards
Eduardo


Re: When to use STCS/DTCS/LCS

2015-04-09 Thread Job Thomas
Hi Divya.

There is a SubQuery implementation available in a Git repository. You could do
some more enhancements to make it useful for production.

https://github.com/jobmthomas/Cassandra-SubQuery

From: Divya Divs 
Sent: Thursday, April 9, 2015 9:00:43 PM
To: user@cassandra.apache.org
Subject: Re: When to use STCS/DTCS/LCS

Thank you very much, Alain.



Re: OrderPreservingPartitioner and compound partition key

2015-04-09 Thread Serega Sheypak
I understand the reason, but if I use OrderPreservingPartitioner and have a
compound partition key, can I select using only the FIRST component of the
compound partition key?

2015-04-08 20:43 GMT+02:00 Robert Coli :

> On Wed, Apr 8, 2015 at 1:27 AM, Serega Sheypak 
> wrote:
>
>> and I set OrderPreservingPartitioner as a partitioner for the table
>>
>
> As a general statement, you almost certainly do not want to use the
> OrderPreservingPartitioner for any purpose.
>
> It should probably be called the
> DontUseThisIfYouWantMostOfTheAdvantagesOfADistributedSystemPartitioner.
>
> =Rob
>
>
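Serega's follow-up can be illustrated with the byte layout of a compound partition key. As far as I know, Cassandra serializes each component of a compound partition key (CompositeType) as a 2-byte big-endian length, then the component's bytes, then a one-byte end-of-component marker. The sketch below assumes that layout. It also assumes that an ordered partitioner compares the serialized key byte by byte. Under those assumptions, keys sort by the *length* of the first component before its content, so the order is not the plain order of the first component. Treat this as an illustration, not a statement of what OrderPreservingPartitioner does internally.

```python
import struct
from typing import List, Sequence, Tuple


def composite_key(components: Sequence[bytes]) -> bytes:
    """Serialize key components as <2-byte big-endian length><bytes><0x00>.

    Assumption: this mirrors CompositeType's layout for compound partition keys.
    """
    out = bytearray()
    for c in components:
        out += struct.pack(">H", len(c))  # unsigned short length prefix
        out += c
        out += b"\x00"                    # end-of-component byte
    return bytes(out)


def ordered_like_partitioner(keys: Sequence[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Sort text keys by their serialized bytes (assumed byte-wise ordering)."""
    return sorted(keys, key=lambda k: composite_key([p.encode("utf-8") for p in k]))
```

With keys `("aa", "x")`, `("ab", "y")` and `("b", "x")`, this ordering puts `("b", "x")` first because its first component is shorter. Under these assumptions, a range over the first component alone would not map to one contiguous stretch of keys unless all first components have the same length.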


Re: [Cassandra 2.0] truncate table

2015-04-09 Thread Laing, Michael
rtfm - truncate creates snapshots by default; they must be cleared on all
nodes to recover *disk space* as requested by the OP.

On Thu, Apr 9, 2015 at 10:17 AM, Anuj Wadehra 
wrote:

> You can try doing it from cassandra-cli: set the consistency level to ALL
> and then truncate.
>
> Anuj Wadehra


Re: When to use STCS/DTCS/LCS

2015-04-09 Thread Alain RODRIGUEZ
I guess this gives a good idea of when to use one or the other (STCS / LCS);
I had not heard of DTCS so far...

http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

C*heers,

Alain

2015-04-09 7:09 GMT+02:00 Ajay :

> Hi,
>
> What are the guidelines on when to use STCS/DTCS/LCS? The preferred way is
> to test with each of them and find the best fit, but are there some
> guidelines or best practices (from experience) on which one to use when?
>
> Thanks
> Ajay
>
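Here is the STCS side of the comparison in concrete form. Size-tiered compaction groups SSTables of similar size into buckets and compacts a bucket once it holds enough tables. The sketch below is a simplified model of that bucketing, written from memory of Cassandra's defaults: `bucket_low` 0.5, `bucket_high` 1.5, `min_sstable_size` 50 MB, `min_threshold` 4, `max_threshold` 32. It ignores details such as read hotness, so treat it as an approximation rather than Cassandra's code.

```python
from typing import List, Sequence

MB = 1024 * 1024


def stcs_buckets(sizes: Sequence[int], bucket_low: float = 0.5,
                 bucket_high: float = 1.5,
                 min_sstable_size: int = 50 * MB) -> List[List[int]]:
    """Group SSTable sizes (bytes) into buckets of similar size."""
    buckets: List[List[int]] = []
    averages: List[float] = []
    for size in sorted(sizes):
        for i, avg in enumerate(averages):
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                buckets[i].append(size)
                averages[i] = sum(buckets[i]) / len(buckets[i])
                break
        else:
            # No existing bucket is close enough: start a new one.
            buckets.append([size])
            averages.append(float(size))
    return buckets


def compaction_candidates(sizes: Sequence[int], min_threshold: int = 4,
                          max_threshold: int = 32) -> List[List[int]]:
    """Buckets large enough to compact, each capped at max_threshold SSTables."""
    return [b[:max_threshold] for b in stcs_buckets(sizes)
            if len(b) >= min_threshold]
```

Only similarly sized SSTables are merged, so a partition's data can stay spread across several tiers. That read-side cost is what leveled compaction is designed to bound, at the price of more compaction I/O.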


Spark Cassandra Connector for Python

2015-04-09 Thread mwiewiorski

Hi,

At https://github.com/datastax/spark-cassandra-connector I see that you
are extending the API that Spark provides for interacting with RDDs to
leverage some native Cassandra features. We are using Apache Cassandra
together with PySpark to do some analytics, and since we have the community
version, we use classic API calls like sc.newAPIHadoopRDD, which means
writing converters for the data in Scala. We would like to use calls such as
sc.cassandraTable, but I don't see these methods anywhere in PySpark, and
https://github.com/datastax/spark-cassandra-connector does not even
mention access from Python.


In 
http://www.datastax.com/documentation/datastax_enterprise/4.7/datastax_enterprise/spark/sparkPySpark.html 
I see, however, that you are using these methods in PySpark. Does that mean
the Spark Cassandra Connector for Python is available only in DataStax
Enterprise, and we have to buy it to use that API and features like
server-side filtering from PySpark?


Also at 
https://github.com/Parsely/pyspark-cassandra/blob/master/src/main/python/pyspark_cassandra.py 
I see that there is some effort to expose CassandraSparkContext to
Python. Does that mean those guys are duplicating your work?


Regards,
Marek Wiewiórski
Opera Software


RE: Availability testing of Cassandra nodes

2015-04-09 Thread SEAN_R_DURITY
I do two types of node monitoring. On each host, we have a process monitor 
looking for the cassandra process. If it goes down, it will get restarted (if a 
flag is set appropriately).

Secondly, from a remote host, I have an hourly check of all nodes where I 
essentially log in to each node and execute nodetool info. If that returns an 
error, then the node is probably “up,” but hung. (Or the flag above is not set 
properly and the host was bounced/patched, but cassandra did not start.) I 
email details to the support team to investigate.


Sean Durity
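Sean's hourly remote check can be sketched as follows. This assumes the monitoring host can run `ssh <host> nodetool info` non-interactively; the thread only says he logs in to each node. The sketch adds a timeout because a hung node may never answer. `check_node` and `hourly_report` are hypothetical names.

```python
import subprocess
from typing import Callable, Dict, Iterable, Tuple

Runner = Callable[..., subprocess.CompletedProcess]


def check_node(host: str, run: Runner = subprocess.run,
               timeout_s: float = 60.0) -> Tuple[str, str]:
    """Run `nodetool info` on one node over ssh and classify the outcome."""
    cmd = ["ssh", host, "nodetool", "info"]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # No answer at all: the process may be "up" but hung.
        return "HUNG", f"no answer within {timeout_s:g}s"
    except OSError as exc:
        # ssh itself could not be started on the monitoring host.
        return "ERROR", str(exc)
    if result.returncode != 0:
        return "ERROR", (result.stderr or result.stdout or "").strip()
    return "OK", ""


def hourly_report(hosts: Iterable[str],
                  run: Runner = subprocess.run) -> Dict[str, Tuple[str, str]]:
    """Return only the nodes that need attention, e.g. to email to support."""
    report: Dict[str, Tuple[str, str]] = {}
    for host in hosts:
        status = check_node(host, run)
        if status[0] != "OK":
            report[host] = status
    return report
```

The `run` parameter exists so the check can be exercised without real nodes. A remote alternative that avoids ssh is `nodetool -h <host> info`, which needs the node's JMX port to be reachable from the monitoring host.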

From: Jiri Horky [mailto:ho...@avast.com]
Sent: Thursday, April 09, 2015 4:32 AM
To: user@cassandra.apache.org; java-driver-u...@lists.datastax.com
Subject: Re: Availability testing of Cassandra nodes

Hi Jack,

it seems there is some misunderstanding. There are two separate things. One
is that Cassandra works for the application, which may (and should) be true
even if some of the nodes are actually down. The other is that, even in this
case, you want to be notified that there are faulty Cassandra nodes.

I am trying to tackle the latter case; I am not having issues with how
client-side load balancing works.

Jirka H.