Re: How to create a table in Cassandra

2012-01-28 Thread aaron morton
Check the documentation here: http://www.datastax.com/docs/1.0/index

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/01/2012, at 3:36 AM, anandbab...@polarisft.com wrote:

> 
> Can anyone tell me how to create a table in Cassandra? I have
> installed it and I am new to this.
> Thanks,
> Barnabas



Re: Thrift vs. CQL

2012-01-28 Thread aaron morton
Use a higher-level client for your language (see 
http://wiki.apache.org/cassandra/ClientOptions) and avoid the question. 

The difference is mostly of concern to client writers. 

(both are supported now and that situation may not last for ever. )

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/01/2012, at 1:26 AM, bxqdev wrote:

> Hello!
> 
>  DataStax's Cassandra documentation says that the CQL API is the future of 
> the Cassandra API. It also says that eventually the Thrift API will be removed 
> completely. Is it true? Do you have any plans to remove the Thrift API, leaving 
> only the CQL API?
> 
> thanks.



Re: Restart cassandra every X days?

2012-01-28 Thread aaron morton
There are no blockers to upgrading to 1.0.X.

A 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/01/2012, at 7:48 AM, R. Verlangen wrote:

> Ok. It seems that an upgrade might fix these problems. Is Cassandra 1.x.x stable 
> enough to upgrade to, or should we wait a couple of weeks?
> 
> 2012/1/27 Edward Capriolo 
> I would not say that issuing a restart after x days is a good idea. You are 
> mostly developing a superstition. You should find the source of the problem. 
> It could be JMX or Thrift clients not closing connections. We don't restart 
> nodes on a regimen; they work fine.
> 
> 
> On Thursday, January 26, 2012, Mike Panchenko  wrote:
> > There are two relevant bugs (that I know of), both resolved in somewhat 
> > recent versions, which make somewhat regular restarts beneficial
> > https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak in 
> > GCInspector, fixed in 0.7.9/0.8.5)
> > https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap fragmentation 
> > due to the way memtables used to be allocated, refactored in 1.0.0)
> > Restarting daily is probably too frequent for either one of those problems. 
> > We usually notice degraded performance in our ancient cluster after ~2 
> > weeks w/o a restart.
> > As Aaron mentioned, if you have plenty of disk space, there's no reason to 
> > worry about "cruft" sstables. The size of your active set is what matters, 
> > and you can determine if that's getting too big by watching for iowait (due 
> > to reads from the data partition) and/or paging activity of the java 
> > process. When you hit that problem, the solution is to 1. try to tune your 
> > caches and 2. add more nodes to spread the load. I'll reiterate - looking 
> > at raw disk space usage should not be your guide for that.
> > "Forcing" a gc generally works, but should not be relied upon (note 
> > "suggest" in 
> > http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()). It's 
> > great news that 1.0 uses a better mechanism for releasing unused sstables.
> > nodetool compact triggers a "major" compaction and is no longer 
> > recommended by DataStax (details at the bottom of 
> > http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction).
> > Hope this helps.
> > Mike.
> > On Wed, Jan 25, 2012 at 5:14 PM, aaron morton  
> > wrote:
> >
> > That disk usage pattern is to be expected in pre-1.0 versions. Disk usage 
> > is far less interesting than free disk space: if it's using 60 GB and there 
> > is 200 GB, that's OK. If it's using 60 GB and there is 6 MB free, that's a problem.
> > In pre-1.0 versions, compacted files are deleted from disk only once the JVM 
> > decides to GC all remaining references. If there is not enough space on disk 
> > (to store the total size of the files it is about to write or compact), 
> > a GC is forced and the files are deleted. Otherwise they will get deleted at 
> > some point in the future. 
> > In 1.0, files are reference counted and space is freed much sooner. 
> > With regard to regular maintenance, nodetool cleanup removes data from a 
> > node that it is no longer a replica for. This is only of use when you have 
> > done a token move. 
> > I would not recommend a daily restart of the cassandra process. You will 
> > lose all the run-time optimizations the JVM has made (I think the mapped 
> > file pages will stay resident). It also adds entropy to the system, which 
> > must be repaired via hinted handoff (HH), read repair (RR) or nodetool repair. 
> > If you want to see compacted files purged faster, the best approach would be 
> > to upgrade to 1.0. 
> > Hope that helps. 
> > -
> > Aaron Morton
> > Freelance Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> > On 26/01/2012, at 9:51 AM, R. Verlangen wrote:
> >
> > In his message he explains that it's for " Forcing a GC ". GC stands for 
> > garbage collection. For some more background see:  
> > http://en.wikipedia.org/wiki/Garbage_collection_(computer_science) 
> > Cheers!
> >
> > 2012/1/25 
> >
> > Karl,
> >
> > Can you give a little more detail on these 2 lines? What do they do?
> >
> > java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080
> > java.lang:type=Memory gc
> >
> > Thank you,
> > Mike
> >
> > -Original Message-
> > From: Karl Hiramoto [mailto:k...@hiramoto.org]
> > Sent: Wednesday, January 25, 2012 12:26 PM
> > To: user@cassandra.apache.org
> > Subject: Re: Restart cassandra every X days?
> >
> >
> > On 01/25/12 19:18, R. Verlangen wrote:
> >> Ok, thank you for your feedback. I'll add these tasks to our daily
> >> cassandra maintenance cronjob. Hopefully this will keep things under
> >> control.
> >
> > I forgot to mention that we found that Forcing a GC also cleans up some
> > space.
> >
> >
> > in a cronjob you can do this with
> > http://crawler.archive.org/cmdline-jmxclient/
> >
> >
> > my cron
> 



Re: Thrift vs. CQL

2012-01-28 Thread aaron morton
Please disregard my "(both are supported now and that situation may not last 
for ever. )" comment. 

Aaron

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com




Re: Thrift vs. CQL

2012-01-28 Thread bxqdev

I like Thrift and I want to use it.
I want to use procedural code instead of a query language.
I just need to be sure it won't be dropped one day in favor of CQL.
In my opinion, they can't replace each other.
They are different APIs with different philosophies.





Re: Thrift vs. CQL

2012-01-28 Thread bxqdev



On 1/28/2012 12:41 PM, aaron morton wrote:

Please disregard my "(both are supported now and that situation may not last for 
ever. )" comment.


OK, but why did you change your mind?









Re: Restart cassandra every X days?

2012-01-28 Thread R. Verlangen
Ok, seems that it's clear what I should do next ;-)

2012/1/28 aaron morton 

> There are no blockers to upgrading to 1.0.X.

Re: Restart cassandra every X days?

2012-01-28 Thread Maxim Potekhin

Sorry if this has been covered; I was concentrating solely on 0.8.x.
Can I just download 1.0.x and continue using the same data on the same cluster?

Maxim




Re: how to delete data with level compaction

2012-01-28 Thread Peter Schuller
> I'm using level compaction and I have about 200GB compressed in my
> largest CFs. The disks are getting full. This is time-series data so I
> want to drop data that is a couple of months old. It's pretty easy for
> me to iterate through the relevant keys and delete the rows. But will
> that do anything?
> I currently have the majority of sstables at generation 4. Deleting rows
> will initially just create a ton of tombstones. For them to actually
> free up significant space they need to get promoted to gen 4 and cause a
> compaction there, right? nodetool compact doesn't do anything with level
> compaction, it seems. Am I doomed?
> (Ok, I'll whip out my CC and order more disk ;-)

There is a delay before you will see the effects of deletions in terms
of disk space, yes. However, this is not normally a problem because
you effectively reach a steady state of disk usage. It only becomes a
problem if you're almost *entirely* full and are trying to delete data
in a panic.

How far away are you from entirely full? Are you just worried about
the future or are you about to run out of disk space right now?

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: how to delete data with level compaction

2012-01-28 Thread Thorsten von Eicken
On 1/28/2012 9:34 AM, Peter Schuller wrote:
> How far away are you from entirely full? Are you just worried about
> the future or are you about to run out of disk space right now?
I'm at 80%, so not quite panic yet ;-)

I'm wondering, in the steady state, how much of the space used will
contain deleted data.
Thorsten


Re: how to delete data with level compaction

2012-01-28 Thread Peter Schuller
> I'm at 80%, so not quite panic yet ;-)
>
> I'm wondering, in the steady state, how much of the space used will
> contain deleted data.

That depends entirely on your workload, including:

* How big the data that you are deleting is in relation to the size of
tombstones
* How long the average piece of data lives before being deleted
* How much other data never being deleted you have in relation to the
data that gets deleted
* How it varies over time and where you are in your "cycle"

I can't offer any mathematics that will give you the actual answer
for leveled compaction (and I don't have a good feel for it as I
haven't used leveled compaction in production systems yet). It's
strongly recommended to graph disk space for your particular
workload/application and see how it behaves over time. And *have
alerts* on disk space running out. Have a good amount of margin. Less
so with leveled compaction than size tiered compaction, but still
important.
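
For the alerting part, here is a minimal sketch of a free-space check
that could run from cron or a monitoring agent. It assumes Python 3 and
uses only the standard library. The function name, the 20% default margin
and the idea of passing the data directory on the command line are
illustrative assumptions, not values from this thread; choose a margin
that suits your own compaction strategy.

```python
import shutil
import sys


def check_free_space(path, min_free_fraction=0.2):
    """Return (ok, free_fraction) for the filesystem that holds `path`.

    `min_free_fraction` is a hypothetical margin (20% by default), not a
    recommendation from the thread.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Some pseudo-filesystems report no capacity; treat that as a failure.
        return False, 0.0
    free_fraction = usage.free / usage.total
    return free_fraction >= min_free_fraction, free_fraction


if __name__ == "__main__":
    # Pass the Cassandra data directory as the first argument; "." is only
    # a placeholder default so the script runs anywhere.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    ok, fraction = check_free_space(data_dir)
    print("%s: %.1f%% free" % (data_dir, fraction * 100))
    # A non-zero exit code lets cron or a monitoring wrapper raise the alert.
    sys.exit(0 if ok else 1)
```

Logging the printed percentage over time also gives you the graph of
disk space mentioned above.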

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)