last record rowId

2011-06-15 Thread karim abbouh
In my Java application, whenever we insert we need to know the last rowId so
that we can insert the new record at rowId+1; for that we currently save this
rowId in a file.
Is there another way to know the last record's rowId?
thanks
B.R


Re: Where is my data?

2011-06-15 Thread Sylvain Lebresne
You can use the thrift call describe_ring(). It returns, for each range of
the ring, the endpoints that are replicas for that range. Once any range has
all of its endpoints unavailable, that range of the data is unavailable.
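
A minimal sketch of that call against the 0.8 Thrift API (the host, port and
keyspace name here are placeholders, not from the thread):

import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.TokenRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class RingInspector {
    public static void main(String[] args) throws Exception {
        // Connect to any live node in the cluster.
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        // describe_ring returns one TokenRange per range of the ring, each
        // listing the endpoints (replicas) responsible for that range.
        List<TokenRange> ranges = client.describe_ring("MyKeyspace");
        for (TokenRange range : ranges) {
            System.out.println("(" + range.getStart_token() + ", " + range.getEnd_token()
                    + "] -> " + range.getEndpoints());
        }
        transport.close();
    }
}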

--
Sylvain

On Tue, Jun 14, 2011 at 11:33 PM, AJ  wrote:
> Is there an official deterministic formula to compute the various subsets of
> a given cluster that comprises a complete set of data (redundant rows ok)?
>  IOW, if multiple nodes become unavailable one at a time, at what point can
> I say <100% of my data is available?
>
> Obviously, the method would have to take into consideration the ring layout
> along with the partition type, the # of nodes, replication_factor,
> replication strat, etc..
>
> Thanks!
>


Re: possible 'coming back to life' bug with counters

2011-06-15 Thread Sylvain Lebresne
Let me point out that the current thread is about counter removal, not about
counter TTL. Counter expiration has other problems, so even if you do not
care about incrementing a counter again after it expires, it will
still not work for you
(please see the discussion on
https://issues.apache.org/jira/browse/CASSANDRA-2103
for details).

As for solutions, people are looking at specific compaction strategies
to achieve something
roughly similar to expiring counters (see
https://issues.apache.org/jira/browse/CASSANDRA-2735
for instance).

--
Sylvain

On Wed, Jun 15, 2011 at 8:29 AM, Viktor Jevdokimov
 wrote:
> What if it is OK for our case and we need counters with TTL?
> For us Counters and TTL both are important. After column is expired it is
> not important what value counter will have.
> Scanning millions rows just to delete expired ones is not a solution.
>
> 2011/6/14 Sylvain Lebresne 
>>
>> As listed here: http://wiki.apache.org/cassandra/Counters, counter
>> deletion is
>> provided as a convenience for permanent deletion of counters but, because
>> of the design of counters, it is never safe to issue an increment on a
>> counter that
>> has been deleted (that is, you will experience back to life behavior
>> sometimes in
>> that case).
>> More precisely, you'd have to wait long enough after a deletion to start
>> incrementing the counter again. But in the worst cases, long enough is
>> something
>> like gc_grace_seconds + major compaction.
>>
>> This is *not* something that is likely to change anytime soon (I don't
>> think this is
>> fixable with the current design for counters).
>>
>> --
>> Sylvain
>>
>> On Sat, Jun 11, 2011 at 3:54 AM, David Hawthorne 
>> wrote:
>> > Please take a look at this thread over in the hector-users mailing list:
>> >
>> > http://groups.google.com/group/hector-users/browse_thread/thread/99835159b9ea1766
>> > It looks as if the deleted columns are coming back to life when they
>> > shouldn't be.
>> > I don't want to open a bug on something if it's already got one that I
>> > just
>> > couldn't find when I scanned the list of open bugs.
>> > I'm using hector 0.8 against cassandra 0.8 release.  I can give you
>> > whatever
>> > logs or files you'd like.
>
>


Cassandra DC Upcoming Meetup

2011-06-15 Thread Chris Burroughs
Cassandra DC's first meetup of the pizza and talks variety will be on
July 6th. There will be an introductory sort of presentation and a
totally cool one on Pig integration.

If you are in the DC area it would be great to see you there.

http://www.meetup.com/Cassandra-DC-Meetup/events/22145481/


Re: New web client & future API

2011-06-15 Thread AJ

Nice interface... and someone has good taste in music.

BTW, I'm new to web programming, what did you use for the web 
components?  JSF, JavaScript, something else?


On 6/14/2011 7:42 AM, Markus Wiesenbacher | Codefreun.de wrote:


Hi,

what is the future API for Cassandra? Thrift, Avro, CQL?

I just released an early version of my web client 
(http://www.codefreun.de/apollo) which is Thrift-based, and therefore 
I would like to know what the future is ...


Many thanks
MW




Re: New web client & future API

2011-06-15 Thread Holger Hoffstaette
On Wed, 15 Jun 2011 10:04:53 +1200, aaron morton wrote:

> Avro is dead.

Just so that this is not misunderstood: "for Cassandra".
Avro itself (and -ipc) is far from dead.

-h




Re: last record rowId

2011-06-15 Thread Utku Can Topçu
As far as I can tell, this functionality doesn't exist.

However, you could insert the rowId as a column in a separate row, and then
request the latest column of that row to find the last rowId.
I think this would work for you, but every insert would then need a get
request first, which I think could become a performance issue.
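
A rough sketch of that read against the 0.8 Thrift API (the column family
"RowIdIndex" and the index row key are made up for illustration; it assumes
set_keyspace() has already been called on the client):

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class LastRowIdLookup {
    // Each insert also writes a column (named so it sorts in insertion order,
    // e.g. a TimeUUID) into one well-known "index" row. The newest column in
    // that row then tells you the last rowId used.
    public static ByteBuffer lastColumnValue(Cassandra.Client client) throws Exception {
        ByteBuffer indexRowKey = ByteBuffer.wrap("last_row_id".getBytes("UTF-8"));
        ColumnParent parent = new ColumnParent("RowIdIndex");

        // Empty start/finish with reversed=true and count=1 returns only the
        // newest column according to the comparator.
        SliceRange range = new SliceRange(
                ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), true, 1);
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);

        List<ColumnOrSuperColumn> result =
                client.get_slice(indexRowKey, parent, predicate, ConsistencyLevel.QUORUM);
        return result.isEmpty() ? null : result.get(0).getColumn().value;
    }
}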

Regards,
Utku

On Wed, Jun 15, 2011 at 11:14 AM, karim abbouh  wrote:

> in my java application,when we try to insert we should all the time know
> the last rowId
> in order the insert the new record in rowId+1,so for that we should save
> this rowId in a file
> is there other way to know the last record rowId?
> thanks
> B.R
>


Re: New web client & future API

2011-06-15 Thread Jeremy Hanna
Yes - avro is alive and well.  Avro as an RPC alternative for Cassandra is 
dead.  See reasoning here: http://goo.gl/urENc

On Jun 15, 2011, at 8:28 AM, Holger Hoffstaette wrote:

> On Wed, 15 Jun 2011 10:04:53 +1200, aaron morton wrote:
> 
>> Avro is dead.
> 
> Just so that this is not misunderstood: "for Cassandra".
> Avro itself (and -ipc) is far from dead.
> 
> -h
> 
> 



Re: Where is my data?

2011-06-15 Thread AJ

Thanks

On 6/15/2011 3:20 AM, Sylvain Lebresne wrote:

You can use the thrift call describe_ring(). It will returns a map
that associate to each range of the
ring who is a replica. Once any range has all it's endpoint
unavailable, that range of the data is unavailable.

--
Sylvain





Re: New web client & future API

2011-06-15 Thread Eric Evans
On Tue, 2011-06-14 at 09:49 -0400, Victor Kabdebon wrote:
> Actually from what I understood (please correct me if I am wrong) CQL
> is based on Thrift / Avro.

In this project, we tend to use the word "Thrift" as a sort of shorthand
for "Cassandra's RPC interface", and not, "The serialization and RPC
framework from the Apache Thrift project".

CQL does not (yet) have its own networking protocol, so it uses Thrift
as a means of delivering queries and serializing the results, but it is
*not* a wrapper around the existing RPC methods.  The query string you
provide is parsed entirely on the server.
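
A small sketch of what that looks like in 0.8 from a Java Thrift client (the
"users" column family and key are placeholders, and the keyspace is assumed
to have been set on the client already):

import java.nio.ByteBuffer;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Compression;
import org.apache.cassandra.thrift.CqlResult;
import org.apache.cassandra.thrift.CqlRow;

public class CqlOverThrift {
    // The CQL string is shipped to the server over the existing Thrift
    // connection; parsing and execution happen entirely server-side, and
    // Thrift is only the transport for the string and the result rows.
    public static void run(Cassandra.Client client) throws Exception {
        ByteBuffer query = ByteBuffer.wrap(
                "SELECT * FROM users WHERE KEY = 'jsmith'".getBytes("UTF-8"));
        CqlResult result = client.execute_cql_query(query, Compression.NONE);
        if (result.getRows() != null) {
            for (CqlRow row : result.getRows()) {
                System.out.println(row);
            }
        }
    }
}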

-- 
Eric Evans
eev...@rackspace.com



Re: New web client & future API

2011-06-15 Thread Markus Wiesenbacher | Codefreun.de
I am using a Javascript framework, Sencha ExtJS. The format between UI and 
servlets is JSON.

Thanks for your response, and for agreeing with my music taste ;)


Am 15.06.2011 um 15:48 schrieb AJ :

> Nice interface... and someone has good taste in music.
> 
> BTW, I'm new to web programming, what did you use for the web components?  
> JSF, JavaScript, something else?
> 
> On 6/14/2011 7:42 AM, Markus Wiesenbacher | Codefreun.de wrote:
>> 
>> 
>> Hi,
>> 
>> what is the future API for Cassandra? Thrift, Avro, CQL?
>> 
>> I just released an early version of my web client 
>> (http://www.codefreun.de/apollo) which is Thrift-based, and therefore I 
>> would like to know what the future is ...
>> 
>> Many thanks
>> MW
> 


Re: New web client & future API

2011-06-15 Thread Victor Kabdebon
Ok thanks for the update. I thought the query string was translated to
Thrift, then sent to the server.

Victor Kabdebon

2011/6/15 Eric Evans 

> On Tue, 2011-06-14 at 09:49 -0400, Victor Kabdebon wrote:
> > Actually from what I understood (please correct me if I am wrong) CQL
> > is based on Thrift / Avro.
>
> In this project, we tend to use the word "Thrift" as a sort of shorthand
> for "Cassandra's RPC interface", and not, "The serialization and RPC
> framework from the Apache Thrift project".
>
> CQL does not (yet )have its own networking protocol, so it uses Thrift
> as a means of delivering queries, and serializing the results, but it is
> *not* a wrapper around the existing RPC methods.  The query string you
> provide is parsed entirely on the server.
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Atomicity of batch updates

2011-06-15 Thread Artem Orobets
Hi,
The wiki says that a write operation is atomic within a ColumnFamily
(http://wiki.apache.org/cassandra/ArchitectureOverview, chapter "write
properties").
If I use a batch update for a single CF and get an exception in the last
mutation operation, does that mean all previous operations will be reverted?
If not, what does atomic mean in this context?


Re: New web client & future API

2011-06-15 Thread Jeffrey Kesselman
Correct me if I'm wrong, but AFAIK Hector is the only higher-level
API I would consider "complete" right now, with support for things
like fail-over.

I notice in the latest Hector build he is starting to add CQL support,
so that's what I'm sticking with.  When he has CQL support done I'll
decide if I want to use it or stick with the programmatic API.

On Wed, Jun 15, 2011 at 10:35 AM, Victor Kabdebon
 wrote:
> Ok thanks for the update. I thought the query string was translated to
> Thrift, then send to a server.
>
> Victor Kabdebon
>
> 2011/6/15 Eric Evans 
>>
>> On Tue, 2011-06-14 at 09:49 -0400, Victor Kabdebon wrote:
>> > Actually from what I understood (please correct me if I am wrong) CQL
>> > is based on Thrift / Avro.
>>
>> In this project, we tend to use the word "Thrift" as a sort of shorthand
>> for "Cassandra's RPC interface", and not, "The serialization and RPC
>> framework from the Apache Thrift project".
>>
>> CQL does not (yet )have its own networking protocol, so it uses Thrift
>> as a means of delivering queries, and serializing the results, but it is
>> *not* a wrapper around the existing RPC methods.  The query string you
>> provide is parsed entirely on the server.
>>
>> --
>> Eric Evans
>> eev...@rackspace.com
>>
>
>



-- 
It's always darkest just before you are eaten by a grue.


Re: Forcing Cassandra to free up some space

2011-06-15 Thread Shotaro Kamio
We've encountered a situation where compacted sstable files aren't
deleted after node repair. Even when gc is triggered via jmx, it
sometimes leaves compacted files. In one case, a lot of files were left,
and some have stayed around for more than 10 hours already. There is no
guarantee that gc will clean up all compacted sstable files.

We have a great interest in the following ticket.
https://issues.apache.org/jira/browse/CASSANDRA-2521


Regards,
Shotaro


On Fri, May 27, 2011 at 11:27 AM, Jeffrey Kesselman  wrote:
> Im also not sure that will guarantee all space is cleaned up.  It
> really depends on what you are doing inside Cassandra.  If you have
> your on garbage collect that is just in some way tied to the gc run,
> then it will run when  it runs.
>
> If otoh you are associating records in your storage with specific
> objects in memory and using one of the post-mortem hooks (finalize or
> PhantomReference) to tell you to clean up that particular record then
> its quite possible they wont all get cleaned up.  In general hotspot
> does not find and clean every candidate object on every GC run.  It
> starts with the easiest/fastest to find and then sees what more it
> thinks it needs to do to create enough memory for anticipated near
> future needs.
>
> On Thu, May 26, 2011 at 10:16 PM, Jonathan Ellis  wrote:
>> In summary, system.gc works fine unless you've deliberately done
>> something like setting the -XX:-DisableExplicitGC flag.
>>
>> On Thu, May 26, 2011 at 5:58 PM, Konstantin  Naryshkin
>>  wrote:
>>> So, in summary, there is no way to predictably and efficiently tell 
>>> Cassandra to get rid of all of the extra space it is using on disk?
>>>
>>> - Original Message -
>>> From: "Jeffrey Kesselman" 
>>> To: user@cassandra.apache.org
>>> Sent: Thursday, May 26, 2011 8:57:49 PM
>>> Subject: Re: Forcing Cassandra to free up some space
>>>
>>> Which JVM?  Which collector?  There have been and continue to be many.
>>>
>>> Hotspot itself supports a number of different collectors with
>>> different behaviors.   Many of them do not collect every candidate on
>>> every gc, but merely the easiest ones to find.  This is why depending
>>> on finalizers is a *bad* idea in java code.  They may well never get
>>> run.  (Finalizer is one of a few features the Sun Java team always
>>> regretted putting in Java to start with.  It has caused quite a few
>>> application problems over the years)
>>>
>>> The really important thing is that NONE of these behaviors of the
>>> colelctors are guaranteed by specification not to change from version
>>> to version.  Basing your code on non-specified behaviors is a good way
>>> to hit mysterious failures on updates.
>>>
>>> For instance, in the mid 90s, IBM had a mode of their Vm called
>>> "infinite heap."  it *never* garbage collected, even if you called
>>> System.gc.  Instead it just threw away address space and counted on
>>> the total memory needs for the life of the program being less then the
>>> total addressable space of the processor.
>>>
>>> It was *very* fast for certain kinds of applications.
>>>
>>> Far from being pedantic, not depending on undocumented behavior is
>>> simply good engineering.
>>>
>>>
>>> On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis  wrote:
 I've read the relevant source. While you're pedantically correct re
 the spec, you're wrong as to what the JVM actually does.

 On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman  
 wrote:
> Some references...
>
> "An object enters an unreachable state when no more strong references
> to it exist. When an object is unreachable, it is a candidate for
> collection. Note the wording: Just because an object is a candidate
> for collection doesn't mean it will be immediately collected. The JVM
> is free to delay collection until there is an immediate need for the
> memory being consumed by the object."
>
> http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394
>
> and "Calling the gc method suggests that the Java Virtual Machine
> expend effort toward recycling unused objects"
>
> http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc()
>
> It goes on to say that the VM will make a "best effort", but "best
> effort" is *deliberately* left up to the definition of the gc
> implementor.
>
> I guess you missed the many lectures I have given on this subject over
> the years at Java One Conferences
>
> On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis  wrote:
>> It's a common misunderstanding that system.gc is only a suggestion; on
>> any VM you're likely to run Cassandra on, System.gc will actually
>> invoke a full collection.
>>
>> On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman  
>> wrote:
>>> Actually this is no gaurantee.   Its a common misunderstanding that
>>> System.gc "forces" gc.  It does not. It is a suggestion only. The v

Re: New web client & future API

2011-06-15 Thread Nate McCall
CQL support in Hector has been available since the 0.8.0 release. See details here:
https://github.com/rantav/hector/wiki/Using-CQL
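
A rough sketch of what using it looks like, based on that wiki page (the
cluster address, keyspace and column family names are placeholders, and the
exact class names should be checked against the page above):

import me.prettyprint.cassandra.model.CqlQuery;
import me.prettyprint.cassandra.model.CqlRows;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;

public class HectorCqlExample {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("test-cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);
        StringSerializer se = StringSerializer.get();

        // CqlQuery hands the query string to the server and maps the result
        // back into Hector's row/column types.
        CqlQuery<String, String, String> query =
                new CqlQuery<String, String, String>(keyspace, se, se, se);
        query.setQuery("SELECT * FROM users");
        QueryResult<CqlRows<String, String, String>> result = query.execute();
        System.out.println(result.get());
    }
}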



On Wed, Jun 15, 2011 at 9:46 AM, Jeffrey Kesselman  wrote:
> Correct me if I'm wrong, but AFAIK Hector is the  only higher level
> APi I would consider "complete' right now, with support for things
> like fail-over.
>
> I notice in the latest Hector build he is starting to add CQL support,
> so thats what I'm sticking with.  When he has CQL support done I'll
> decide if I want to use it or stick with the programmatic API.
>
> On Wed, Jun 15, 2011 at 10:35 AM, Victor Kabdebon
>  wrote:
>> Ok thanks for the update. I thought the query string was translated to
>> Thrift, then send to a server.
>>
>> Victor Kabdebon
>>
>> 2011/6/15 Eric Evans 
>>>
>>> On Tue, 2011-06-14 at 09:49 -0400, Victor Kabdebon wrote:
>>> > Actually from what I understood (please correct me if I am wrong) CQL
>>> > is based on Thrift / Avro.
>>>
>>> In this project, we tend to use the word "Thrift" as a sort of shorthand
>>> for "Cassandra's RPC interface", and not, "The serialization and RPC
>>> framework from the Apache Thrift project".
>>>
>>> CQL does not (yet )have its own networking protocol, so it uses Thrift
>>> as a means of delivering queries, and serializing the results, but it is
>>> *not* a wrapper around the existing RPC methods.  The query string you
>>> provide is parsed entirely on the server.
>>>
>>> --
>>> Eric Evans
>>> eev...@rackspace.com
>>>
>>
>>
>
>
>
> --
> It's always darkest just before you are eaten by a grue.
>


sstable2json2sstable bug with json data stored

2011-06-15 Thread Timo Nentwig

Hi!

I couldn't find anyone via Google who has experienced this yet, so here it is (0.8):

{
  "foo":{
"foo":{
  "foo":"bar",
  "foo":"bar",
  "foo":"bar",
  "foo":"",
  "foo":"bar",
  "foo":"bar",
  "id":123456
}  },
  "foo":null
}

(json can likely be boiled down even more...)

[default@foo] set 
transactions[test][data]='{"foo":{"foo":{"foo":"bar","foo":"bar","foo":"bar","foo":"","foo":"bar","foo":"bar","id":123456}},"foo":null}';

$ ./sstable2json /var/lib/cassandra/data/foo/transactions-g-1-Data.db > /tmp/foo
$ cat /tmp/foo
{
"74657374": [["data", 
"{"foo":{"foo":{"foo":"bar","foo":"bar","foo":"bar","foo":"","foo":"bar","foo":"bar","id":123456}},"foo":null}",
 1308152085301000]]
}

$ ./json2sstable -s -c transactions -K foo /tmp/json /tmp/ss-g-1-Data.db
Counting keys to import, please wait... (NOTE: to skip this use -n )
org.codehaus.jackson.JsonParseException: Unexpected character ('f' (code 102)): 
was expecting comma to separate ARRAY entries
 at [Source: /tmp/json; line: 2, column: 27]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:929)
at 
org.codehaus.jackson.impl.JsonParserBase._reportError(JsonParserBase.java:632)
at 
org.codehaus.jackson.impl.JsonParserBase._reportUnexpectedChar(JsonParserBase.java:565)
at 
org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:128)
at 
org.codehaus.jackson.impl.JsonParserBase.skipChildren(JsonParserBase.java:263)
at 
org.apache.cassandra.tools.SSTableImport.importSorted(SSTableImport.java:328)
at 
org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:252)
at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:476)
ERROR: Unexpected character ('f' (code 102)): was expecting comma to separate 
ARRAY entries
 at [Source: /tmp/json; line: 2, column: 27]

create column family transactions
with comparator = AsciiType
and key_validation_class = AsciiType
and default_validation_class = UTF8Type
and keys_cached = 0
and rows_cached = 0
and column_metadata = [{
column_name : uuid,
validation_class : LexicalUUIDType,
index_name : uuid_idx,
index_type : 0
}, {
column_name : session_id,
validation_class : LexicalUUIDType,
index_name : session_id_idx,
index_type : 0
}, {
column_name : guid,
validation_class : LexicalUUIDType,
index_name : guid_idx,
index_type : 0
}, {
column_name : timestamp,
validation_class : LongType
}, {
column_name : completed,
validation_class : BytesType
}, {
column_name : user_id,
validation_class : LongType
}];
;



Re: cascading failures due to memory

2011-06-15 Thread AJ

Sasha,

Did you ever nail down the cause of this problem?

On 5/31/2011 4:01 AM, Sasha Dolgy wrote:

hi everyone,

the current nodes i have deployed (4) have all been working fine, with
not a lot of data ... more reads than writes at the moment.  as i had
monitoring disabled, when one node's OS killed the cassandra process
due to out of memory problems ... that was fine.  24 hours later,
another node, 24 hours later, another node ...until finally, all 4
nodes no longer had cassandra running.

When all nodes are started fresh, CPU utilization is at about 21% on
each box.  after 24 hours, this goes up to 32% and then 51% 24 hours
later.

originally I had thought that this may be a result of 'nodetool
repair' not being run consistently ... after adding a cronjob to run
every 24 hours (staggered between nodes) the problem of the increasing
memory utilization does not resolve.

i've read the operations page and also the
http://wiki.apache.org/cassandra/MemtableThresholds page.  i am
running defaults and 0.7.6-02 ...

what are the best places to start in terms of finding why this is
happening?  CF design / usage?  'nodetool cfstats' gives me some good
info ... and i've already implemented some changes to one CF based on
how it had ballooned (too many rows versus not enough columns)

suggestions appreciated





Re: Forcing Cassandra to free up some space

2011-06-15 Thread Terje Marthinussen
Even if the gc call cleaned all files, it is not really acceptable on a
decent-sized cluster due to the impact a full gc has on performance,
especially an unneeded one.

The delay in file deletion can also at times make it hard to see how much
spare disk you actually have.

We easily see a 100% increase in disk use which extends for long periods of
time before anything gets cleaned up. This can be quite misleading, and I
believe on a couple of occasions we have seen short-term full-disk scenarios
during testing as a result of cleanup not happening entirely when it
should...

Terje

On Wed, Jun 15, 2011 at 11:50 PM, Shotaro Kamio  wrote:

> We've encountered the situation that compacted sstable files aren't
> deleted after node repair. Even when gc is triggered via jmx, it
> sometimes leaves compacted files. In a case, a lot of files are left.
> Some files stay more than 10 hours already. There is no guarantee that
> gc will cleanup all compacted sstable files.
>
> We have a great interest on the following ticket.
> https://issues.apache.org/jira/browse/CASSANDRA-2521
>
>
> Regards,
> Shotaro
>
>
> On Fri, May 27, 2011 at 11:27 AM, Jeffrey Kesselman 
> wrote:
> > Im also not sure that will guarantee all space is cleaned up.  It
> > really depends on what you are doing inside Cassandra.  If you have
> > your on garbage collect that is just in some way tied to the gc run,
> > then it will run when  it runs.
> >
> > If otoh you are associating records in your storage with specific
> > objects in memory and using one of the post-mortem hooks (finalize or
> > PhantomReference) to tell you to clean up that particular record then
> > its quite possible they wont all get cleaned up.  In general hotspot
> > does not find and clean every candidate object on every GC run.  It
> > starts with the easiest/fastest to find and then sees what more it
> > thinks it needs to do to create enough memory for anticipated near
> > future needs.
> >
> > On Thu, May 26, 2011 at 10:16 PM, Jonathan Ellis 
> wrote:
> >> In summary, system.gc works fine unless you've deliberately done
> >> something like setting the -XX:-DisableExplicitGC flag.
> >>
> >> On Thu, May 26, 2011 at 5:58 PM, Konstantin  Naryshkin
> >>  wrote:
> >>> So, in summary, there is no way to predictably and efficiently tell
> Cassandra to get rid of all of the extra space it is using on disk?
> >>>
> >>> - Original Message -
> >>> From: "Jeffrey Kesselman" 
> >>> To: user@cassandra.apache.org
> >>> Sent: Thursday, May 26, 2011 8:57:49 PM
> >>> Subject: Re: Forcing Cassandra to free up some space
> >>>
> >>> Which JVM?  Which collector?  There have been and continue to be many.
> >>>
> >>> Hotspot itself supports a number of different collectors with
> >>> different behaviors.   Many of them do not collect every candidate on
> >>> every gc, but merely the easiest ones to find.  This is why depending
> >>> on finalizers is a *bad* idea in java code.  They may well never get
> >>> run.  (Finalizer is one of a few features the Sun Java team always
> >>> regretted putting in Java to start with.  It has caused quite a few
> >>> application problems over the years)
> >>>
> >>> The really important thing is that NONE of these behaviors of the
> >>> colelctors are guaranteed by specification not to change from version
> >>> to version.  Basing your code on non-specified behaviors is a good way
> >>> to hit mysterious failures on updates.
> >>>
> >>> For instance, in the mid 90s, IBM had a mode of their Vm called
> >>> "infinite heap."  it *never* garbage collected, even if you called
> >>> System.gc.  Instead it just threw away address space and counted on
> >>> the total memory needs for the life of the program being less then the
> >>> total addressable space of the processor.
> >>>
> >>> It was *very* fast for certain kinds of applications.
> >>>
> >>> Far from being pedantic, not depending on undocumented behavior is
> >>> simply good engineering.
> >>>
> >>>
> >>> On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis 
> wrote:
>  I've read the relevant source. While you're pedantically correct re
>  the spec, you're wrong as to what the JVM actually does.
> 
>  On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman 
> wrote:
> > Some references...
> >
> > "An object enters an unreachable state when no more strong references
> > to it exist. When an object is unreachable, it is a candidate for
> > collection. Note the wording: Just because an object is a candidate
> > for collection doesn't mean it will be immediately collected. The JVM
> > is free to delay collection until there is an immediate need for the
> > memory being consumed by the object."
> >
> >
> http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394
> >
> > and "Calling the gc method suggests that the Java Virtual Machine
> > expend effort toward recycling unused objects"
> >
> >
> http://download.oracle.com/javase/6

Re: Forcing Cassandra to free up some space

2011-06-15 Thread Terje Marthinussen
On Thu, Jun 16, 2011 at 12:48 AM, Terje Marthinussen <
tmarthinus...@gmail.com> wrote:

> Even if the gc call cleaned all files, it is not really acceptable on a
> decent sized cluster due to the impact full gc has on performance.
> Especially non-needed ones.
>
>
Not acceptable as running GC on every node in the cluster will further
increase the time period when you have degraded performance.

Terje


What triggers hint delivery?

2011-06-15 Thread Terje Marthinussen
Hi,

I was looking quickly at source code tonight.
As far as I could see from a quick code scan, hint delivery is only
triggered by a state change, i.e. when a node goes from down to up?

If this is indeed the case, it would potentially explain why we sometimes
have hints on machines which do not seem to get played back, but I get the
feeling I must have been missing something when I scanned the code :)

Terje


Re: cascading failures due to memory

2011-06-15 Thread Sasha Dolgy
No.  We upgraded to 0.8 and monitor the systems more closely.  We schedule a repair
every 24hrs via cron and so far no problems.
On Jun 15, 2011 5:44 PM, "AJ"  wrote:
> Sasha,
>
> Did you ever nail down the cause of this problem?
>
> On 5/31/2011 4:01 AM, Sasha Dolgy wrote:
>> hi everyone,
>>
>> the current nodes i have deployed (4) have all been working fine, with
>> not a lot of data ... more reads than writes at the moment. as i had
>> monitoring disabled, when one node's OS killed the cassandra process
>> due to out of memory problems ... that was fine. 24 hours later,
>> another node, 24 hours later, another node ...until finally, all 4
>> nodes no longer had cassandra running.
>>
>> When all nodes are started fresh, CPU utilization is at about 21% on
>> each box. after 24 hours, this goes up to 32% and then 51% 24 hours
>> later.
>>
>> originally I had thought that this may be a result of 'nodetool
>> repair' not being run consistently ... after adding a cronjob to run
>> every 24 hours (staggered between nodes) the problem of the increasing
>> memory utilization does not resolve.
>>
>> i've read the operations page and also the
>> http://wiki.apache.org/cassandra/MemtableThresholds page. i am
>> running defaults and 0.7.6-02 ...
>>
>> what are the best places to start in terms of finding why this is
>> happening? CF design / usage? 'nodetool cfstats' gives me some good
>> info ... and i've already implemented some changes to one CF based on
>> how it had ballooned (too many rows versus not enough columns)
>>
>> suggestions appreciated
>>
>


Re: Forcing Cassandra to free up some space

2011-06-15 Thread AJ
In regards to cleaning up old sstable files, I posed this question 
before as I noticed that after taking a snapshot, the older files 
(pre-compaction) shared no hard links with the snapshots.  Therefore, (if the 
Cass snapshot functionality is working correctly) those older files can 
be manually deleted.  The reasoning is simply that if you were to do 
a backup based on the snapshots that Cass created, those older 
(pre-compaction) files would be left out of the backup; therefore, they 
are no longer needed.


But, I never got a definitive answer to this.  If the Cass snapshot 
functionality can be relied upon with 100% confidence, then all you have 
to do is take a snapshot, then delete all the files with a hard link count 
of 1 and with mod times prior to the snapshotted files.  But, again, this is 
only considered safe if the Cass snapshot function is 100% reliable.  I 
have no reason to believe it's not... just saying.


On 6/15/2011 9:48 AM, Terje Marthinussen wrote:
Even if the gc call cleaned all files, it is not really acceptable on 
a decent sized cluster due to the impact full gc has on performance. 
Especially non-needed ones.


The delay in file deletion can also at times make it hard to see how 
much spare disk you actually have.


We easily see 100% increase in disk use which extends for long periods 
of time before anything gets cleaned up. This can be quite misleading 
and I believe on a couple of occasions we seen short term full disk 
scenarios during testing as a result of cleanup not happening entirely 
when it should...


Terje

On Wed, Jun 15, 2011 at 11:50 PM, Shotaro Kamio > wrote:


We've encountered the situation that compacted sstable files aren't
deleted after node repair. Even when gc is triggered via jmx, it
sometimes leaves compacted files. In a case, a lot of files are left.
Some files stay more than 10 hours already. There is no guarantee that
gc will cleanup all compacted sstable files.

We have a great interest on the following ticket.
https://issues.apache.org/jira/browse/CASSANDRA-2521


Regards,
Shotaro


On Fri, May 27, 2011 at 11:27 AM, Jeffrey Kesselman
> <jef...@gmail.com> wrote:
> Im also not sure that will guarantee all space is cleaned up.  It
> really depends on what you are doing inside Cassandra.  If you have
> your on garbage collect that is just in some way tied to the gc run,
> then it will run when  it runs.
>
> If otoh you are associating records in your storage with specific
> objects in memory and using one of the post-mortem hooks
(finalize or
> PhantomReference) to tell you to clean up that particular record
then
> its quite possible they wont all get cleaned up.  In general hotspot
> does not find and clean every candidate object on every GC run.  It
> starts with the easiest/fastest to find and then sees what more it
> thinks it needs to do to create enough memory for anticipated near
> future needs.
>
> On Thu, May 26, 2011 at 10:16 PM, Jonathan Ellis
<jbel...@gmail.com> wrote:
>> In summary, system.gc works fine unless you've deliberately done
>> something like setting the -XX:-DisableExplicitGC flag.
>>
>> On Thu, May 26, 2011 at 5:58 PM, Konstantin  Naryshkin
>> <konstant...@a-bb.net> wrote:
>>> So, in summary, there is no way to predictably and efficiently
tell Cassandra to get rid of all of the extra space it is using on
disk?
>>>
>>> - Original Message -
>>> From: "Jeffrey Kesselman" <jef...@gmail.com>
>>> To: user@cassandra.apache.org 
>>> Sent: Thursday, May 26, 2011 8:57:49 PM
>>> Subject: Re: Forcing Cassandra to free up some space
>>>
>>> Which JVM?  Which collector?  There have been and continue to
be many.
>>>
>>> Hotspot itself supports a number of different collectors with
>>> different behaviors.   Many of them do not collect every
candidate on
>>> every gc, but merely the easiest ones to find.  This is why
depending
>>> on finalizers is a *bad* idea in java code.  They may well
never get
>>> run.  (Finalizer is one of a few features the Sun Java team always
>>> regretted putting in Java to start with.  It has caused quite
a few
>>> application problems over the years)
>>>
>>> The really important thing is that NONE of these behaviors of the
>>> colelctors are guaranteed by specification not to change from
version
>>> to version.  Basing your code on non-specified behaviors is a
good way
>>> to hit mysterious failures on updates.
>>>
>>> For instance, in the mid 90s, IBM had a mode of their Vm called
>>> "infinite heap."  it *never* garbage collected, even if you called
>>> System.gc.  Instead it just threw away address space and
counted on
>>> the total memory need

Re: Forcing Cassandra to free up some space

2011-06-15 Thread Ryan King
There's a ticket open to address this:

https://issues.apache.org/jira/browse/CASSANDRA-1974

-ryan

On Wed, Jun 15, 2011 at 8:49 AM, Terje Marthinussen
 wrote:
>
>
> On Thu, Jun 16, 2011 at 12:48 AM, Terje Marthinussen
>  wrote:
>>
>> Even if the gc call cleaned all files, it is not really acceptable on a
>> decent sized cluster due to the impact full gc has on performance.
>> Especially non-needed ones.
>>
>
> Not acceptable as running GC on every node in the cluster will further
> increase the time period when you have degraded performance.
>
> Terje
>


Re: Forcing Cassandra to free up some space

2011-06-15 Thread Peter Schuller
> Even if the gc call cleaned all files, it is not really acceptable on a
> decent sized cluster due to the impact full gc has on performance.
> Especially non-needed ones.

You can run with -XX:+ExplicitGCInvokesConcurrent to "safely" trigger
CMS cycles. However, that also means the System.gc() semantics change, so
I'm not sure offhand what will happen to the automatic System.gc() code in
cassandra that attempts to free space.

CASSANDRA-2521 is IMO the real solution.

-- 
/ Peter Schuller


Re: last record rowId

2011-06-15 Thread Jonathan Ellis
You're better served using UUIDs than numeric row IDs for surrogate
keys.  (Of course natural keys work fine too.)
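
For example, a random (type 4) UUID from the JDK needs no coordination between
writers at all, so there is nothing to read back or store in a file before an
insert (a sketch; if you need keys or column names that sort by time, a type 1
"TimeUUID" from a library such as the one bundled with Hector would be used
instead):

import java.nio.ByteBuffer;
import java.util.UUID;

public class RowKeys {
    // Each client generates keys independently -- no "last rowId" to track.
    public static ByteBuffer newRowKey() {
        UUID id = UUID.randomUUID();
        ByteBuffer key = ByteBuffer.allocate(16);
        key.putLong(id.getMostSignificantBits());
        key.putLong(id.getLeastSignificantBits());
        key.rewind();
        return key;
    }
}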

On Wed, Jun 15, 2011 at 9:16 AM, Utku Can Topçu  wrote:
> As far as I can tell, this functionality doesn't exist.
>
> However you can use such a method to insert the rowId into another column
> within a seperate row, and request the latest column.
> I think this would work for you. However every insert would need a get
> request, which I think would be performance issue somehow.
>
> Regards,
> Utku
>
> On Wed, Jun 15, 2011 at 11:14 AM, karim abbouh  wrote:
>>
>> in my java application,when we try to insert we should all the time know
>> the last rowId
>> in order the insert the new record in rowId+1,so for that we should save
>> this rowId in a file
>> is there other way to know the last record rowId?
>> thanks
>> B.R
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: What triggers hint delivery?

2011-06-15 Thread Jonathan Ellis
On Wed, Jun 15, 2011 at 10:53 AM, Terje Marthinussen
 wrote:
> I was looking quickly at source code tonight.
> As far as I could see from a quick code scan, hint delivery is only
> triggered as a state change from a node is down to when it enters up state?

Right.

> If this is indeed the case, it would potentially explain why we sometimes
> have hints on machines which does not seem to get played back

Why is that?  Hints don't get created in the first place unless a node
is in the down state.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


useful little way to run locally with (pig|hive) && cassandra

2011-06-15 Thread Jeremy Hanna
We started doing this recently and thought it might be useful to others.

Pig (and Hive) have a sample function that allows you to sample data from your 
data store.

In pig it looks something like this:
mysample = SAMPLE myrelation 0.01;

One possible use for this, with pig and cassandra, is to solve the conundrum of 
testing locally.  We've wondered how to do this, so we decided to sample a 
column family (or set of CFs), store the sample into HDFS (or CFS), download it 
locally, then import it into your local Cassandra node.  That gives you real 
data to test against with pig/hive or for other purposes.

That way, when you're flying out to the Hadoop Summit or the Cassandra SF 
event, you can play with real data :).

Maybe others have been doing this for years, but if not, we're finding it handy.

Jeremy

Force a node to form part of quorum

2011-06-15 Thread A J
Is there a way to favor a node so that it always participates (or never
participates) in fulfilling read consistency as well as write
consistency?

Thanks
AJ


Re: Docs: Token Selection

2011-06-15 Thread Vijay
The problem in the above approach is you have 2 nodes between 12 to 4 in DC1
but from 4 to 12 you just have 1 (which will cause uneven distribution
of data across the nodes).
It is easier to think of the DCs as separate rings, split each equally, and
interleave them together:

DC1 Node 1 : token 0
DC1 Node 2 : token 8..

DC2 Node 1 : token 4..
DC2 Node 2 : token 12..
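
A small sketch of that calculation for RandomPartitioner (token space 0..2^127):
each DC is split evenly, and each DC is then shifted by a fraction of one slot
so the per-DC rings interleave, which reproduces the 0 / 8.. / 4.. / 12..
layout above.

import java.math.BigInteger;

public class TokenCalculator {
    private static final BigInteger RING = BigInteger.valueOf(2).pow(127);

    public static void main(String[] args) {
        int numDcs = 2;
        int nodesPerDc = 2;

        for (int dc = 0; dc < numDcs; dc++) {
            // Shift each DC by dc/(nodesPerDc*numDcs) of the ring so the
            // per-DC rings interleave instead of overlapping.
            BigInteger dcOffset = RING.multiply(BigInteger.valueOf(dc))
                    .divide(BigInteger.valueOf((long) nodesPerDc * numDcs));
            for (int node = 0; node < nodesPerDc; node++) {
                BigInteger token = RING.multiply(BigInteger.valueOf(node))
                        .divide(BigInteger.valueOf(nodesPerDc))
                        .add(dcOffset);
                System.out.println("DC" + (dc + 1) + " node " + (node + 1) + ": " + token);
            }
        }
    }
}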

Regards,




On Tue, Jun 14, 2011 at 7:31 PM, AJ  wrote:

>  Yes, which means that the ranges overlap each other.
>
> Is this just a convention, or is it technically required when using
> NetworkTopologyStrategy?  Would it be acceptable to split the ranges into
> quarters by ignoring the data centers, such as:
>
> DC1
> node 1 = 0  Range: (12, 16], (0, 0]
> node 2 = 4  Range: (0, 4]
>
> DC2
> node 3 = 8  Range: (4, 8]
> node 4 = 12   Range: (8, 12]
>
> If this is OK, are there any drawbacks to this?
>
>
>
> On 6/14/2011 6:10 PM, Vijay wrote:
>
> Yes... Thats right...  If you are trying to say the below...
>
>  DC1
>  Node1 Owns 50%
>
>  (Ranges 8..4 -> 8..5 & 8..5 -> 0)
>
> Node2 Owns 50%
>
> (Ranges 0 -> 1 & 1 -> 8..4)
>
>
>  DC2
>  Node1 Owns 50%
>
>  (Ranges 8..5 -> 0 & 0 -> 1)
>
>   Node2 Owns 50%
>
>   (Ranges 1 -> 8..4 & 8..4 -> 8..5)
>
>
>  Regards,
> 
>
>
>
> On Tue, Jun 14, 2011 at 3:47 PM, AJ  wrote:
>
>> This http://wiki.apache.org/cassandra/Operations#Token_selection  says:
>>
>> "With NetworkTopologyStrategy, you should calculate the tokens the nodes
>> in each DC independantly."
>>
>> and gives the example:
>>
>> DC1
>> node 1 = 0
>> node 2 = 85070591730234615865843651857942052864
>>
>> DC2
>> node 3 = 1
>> node 4 = 85070591730234615865843651857942052865
>>
>>
>> So, according to the above, the token ranges would be (abbreviated nums):
>>
>> DC1
>> node 1 = 0  Range: (8..4, 16], (0, 0]
>> node 2 = 8..4   Range: (0, 8..4]
>>
>> DC2
>> node 3 = 1  Range: (8..5, 16], (0, 1]
>> node 4 = 8..5   Range: (1, 8..5]
>>
>>
>> If the above is correct, then I would be surprised as this paragraph is
>> the only place were one would discover this and may be easy to miss...
>> unless there's a doc buried somewhere in plain view that I missed.
>>
>> So, have I interpreted this paragraph correctly?  Was this design to help
>> keep data somewhat localized if that was important, such as a geographically
>> dispersed DC?
>>
>> Thanks!
>>
>
>
>


Re: Docs: Token Selection

2011-06-15 Thread Vijay
Correction

"The problem in the above approach is you have 2 nodes between 12 to 4 in
DC1 but from 4 to 12  you just have 1"

should be

"The problem in the above approach is you have 1 node between 0-4 (25%) and
one node covering the rest, which is 4-16, 0-0 (75%)"

Regards,




On Wed, Jun 15, 2011 at 11:10 AM, Vijay  wrote:

> The problem in the above approach is you have 2 nodes between 12 to 4 in
> DC1 but from 4 to 12  you just have 1 (Which will cause uneven
> distribution of data the node)
> It is easier to think of the DCs as ring and split equally and interleave
> them together
>
> DC1 Node 1 : token 0
> DC1 Node 2 : token 8..
>
> DC2 Node 1 : token 4..
> DC2 Node 1 : token 12..
>
> Regards,
> 
>
>
>
>
> On Tue, Jun 14, 2011 at 7:31 PM, AJ  wrote:
>
>>  Yes, which means that the ranges overlap each other.
>>
>> Is this just a convention, or is it technically required when using
>> NetworkTopologyStrategy?  Would it be acceptable to split the ranges into
>> quarters by ignoring the data centers, such as:
>>
>> DC1
>> node 1 = 0  Range: (12, 16], (0, 0]
>> node 2 = 4  Range: (0, 4]
>>
>> DC2
>> node 3 = 8  Range: (4, 8]
>> node 4 = 12   Range: (8, 12]
>>
>> If this is OK, are there any drawbacks to this?
>>
>>
>>
>> On 6/14/2011 6:10 PM, Vijay wrote:
>>
>> Yes... Thats right...  If you are trying to say the below...
>>
>>  DC1
>>  Node1 Owns 50%
>>
>>  (Ranges 8..4 -> 8..5 & 8..5 -> 0)
>>
>> Node2 Owns 50%
>>
>> (Ranges 0 -> 1 & 1 -> 8..4)
>>
>>
>>  DC2
>>  Node1 Owns 50%
>>
>>  (Ranges 8..5 -> 0 & 0 -> 1)
>>
>>   Node2 Owns 50%
>>
>>   (Ranges 1 -> 8..4 & 8..4 -> 8..5)
>>
>>
>>  Regards,
>> 
>>
>>
>>
>> On Tue, Jun 14, 2011 at 3:47 PM, AJ  wrote:
>>
>>> This http://wiki.apache.org/cassandra/Operations#Token_selection  says:
>>>
>>> "With NetworkTopologyStrategy, you should calculate the tokens the nodes
>>> in each DC independantly."
>>>
>>> and gives the example:
>>>
>>> DC1
>>> node 1 = 0
>>> node 2 = 85070591730234615865843651857942052864
>>>
>>> DC2
>>> node 3 = 1
>>> node 4 = 85070591730234615865843651857942052865
>>>
>>>
>>> So, according to the above, the token ranges would be (abbreviated nums):
>>>
>>> DC1
>>> node 1 = 0  Range: (8..4, 16], (0, 0]
>>> node 2 = 8..4   Range: (0, 8..4]
>>>
>>> DC2
>>> node 3 = 1  Range: (8..5, 16], (0, 1]
>>> node 4 = 8..5   Range: (1, 8..5]
>>>
>>>
>>> If the above is correct, then I would be surprised as this paragraph is
>>> the only place were one would discover this and may be easy to miss...
>>> unless there's a doc buried somewhere in plain view that I missed.
>>>
>>> So, have I interpreted this paragraph correctly?  Was this design to help
>>> keep data somewhat localized if that was important, such as a geographically
>>> dispersed DC?
>>>
>>> Thanks!
>>>
>>
>>
>>
>


prep for cassandra storage from pig

2011-06-15 Thread William Oberman
I think I'm stuck on typing issues trying to store data in cassandra.  To
verify, cassandra wants (key, {tuples})

My pig script is fairly brief:
raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS
(key:chararray, columns:bag {column:tuple (name, value)});
--columns == timeUUID -> JSON
rows = FOREACH raw GENERATE key, FLATTEN(columns);
alias_target_day = FOREACH rows {
--I wrote a specialized parser that does exactly what I need
observation_map = com.civicscience.pig.ParseObservation($2);
GENERATE $0 as alias, observation_map#'_fqt' as target,
observation_map#'_day' as day;
};
grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1,
COUNT($1)) as day_count;

This gets me:
(targetA, (day1, count))
(targetA, (day2, count))
(targetB, (day1, count))


But, cassandra wants the 2nd item to be a bag.  So, I tried:
X = FOREACH grouping GENERATE group.$0 as target, TOBAG(TOTUPLE(group.$1,
COUNT($1))) as day_count;

But this results in:
(targetA, {((day1, count))})
(targetA, {((day2, count))})
(targetB, {((day1, count))})
It's hard to see, but the 2nd item now has a nested tuple as the first
value, which is still bad.

How do I get (key, {tuple})?  I wasn't sure where to post this (pig or
cassandra), so I'm posting to the pig list too.

will


Re: Atomicity of batch updates

2011-06-15 Thread chovatia jaydeep
A Cassandra write operation is atomic for all the columns/super columns under a 
given row key in a Column Family. So in your case the previous operations 
(assuming each operation was on a separate key) will not be reverted.
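
A sketch of what that means for a 0.8 Thrift batch_mutate call (the column
family and row keys are placeholders, and set_keyspace() is assumed to have
been called already): the two mutations under "row1" form one atomic unit,
while the mutation under "row2" is independent and is not rolled back if
something else in the batch fails.

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.Mutation;

public class BatchExample {
    private static ByteBuffer utf8(String s) throws Exception {
        return ByteBuffer.wrap(s.getBytes("UTF-8"));
    }

    private static Mutation insert(String name, String value) throws Exception {
        // Timestamps are conventionally microseconds since the epoch.
        Column column = new Column(utf8(name), utf8(value), System.currentTimeMillis() * 1000);
        ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
        cosc.setColumn(column);
        Mutation mutation = new Mutation();
        mutation.setColumn_or_supercolumn(cosc);
        return mutation;
    }

    public static void write(Cassandra.Client client) throws Exception {
        Map<ByteBuffer, Map<String, List<Mutation>>> mutations =
                new HashMap<ByteBuffer, Map<String, List<Mutation>>>();

        // Atomic unit: all columns written under the row key "row1".
        Map<String, List<Mutation>> row1 = new HashMap<String, List<Mutation>>();
        row1.put("MyCF", Arrays.asList(insert("a", "1"), insert("b", "2")));
        mutations.put(utf8("row1"), row1);

        // Separate row key, separate unit of work.
        Map<String, List<Mutation>> row2 = new HashMap<String, List<Mutation>>();
        row2.put("MyCF", Arrays.asList(insert("c", "3")));
        mutations.put(utf8("row2"), row2);

        client.batch_mutate(mutations, ConsistencyLevel.QUORUM);
    }
}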

Thank you,
Jaydeep




From: Artem Orobets 
To: "user@cassandra.apache.org" 
Cc: Andrey Lomakin 
Sent: Wednesday, 15 June 2011 7:42 AM
Subject: Atomicity of batch updates


 
Hi,
Wiki says that write operation is atomic within ColumnFamily
(http://wiki.apache.org/cassandra/ArchitectureOverview chapter “write 
properties”).
If I use batch update for single CF, and get an exception in
last mutation operation, is it means that all previous operation will be
reverted.
If no, what means atomic in this context?


upgrading from cassandra 0.7.3 to 0.8.0

2011-06-15 Thread Anurag Gujral
Hi All,
  I had a cassandra node which was running on cassandra 0.7.3.
Without changing the data directories I installed cassandra 0.8.0 but when I
query data I get timeouts.
Can someone please guide me on how to go about upgrading from cassandra 0.7.3 to
cassandra 0.8.0?

Thanks
Anurag


Re: useful little way to run locally with (pig|hive) && cassandra

2011-06-15 Thread Jeremy Hanna
Cool - thanks Dmitriy!

On Jun 15, 2011, at 12:54 PM, Dmitriy Ryaboy wrote:

> Another tip:
> If you parametrize your load statements, it becomes easy to switch
> between loading from something like Cassandra, and reading from HDFS
> or local fs directly.
> 
> Also:
> Try using Pig's "illustrate" command when working through your flows
> -- it does some clever things that go far beyond simple random
> sampling of source data, in order to ensure that you can see the
> effects of doing filters, that joins get (possibly artificial)
> matching keys even if you sampled in a way that didn't actually
> produce any, etc.
> 
> D
> 
> On Wed, Jun 15, 2011 at 10:35 AM, Jeremy Hanna
>  wrote:
>> We started doing this recently and thought it might be useful to others.
>> 
>> Pig (and Hive) have a sample function that allows you to sample data from 
>> your data store.
>> 
>> In pig it looks something like this:
>> mysample = SAMPLE myrelation 0.01;
>> 
>> One possible use for this, with pig and cassandra is to solve a conundrum of 
>> testing locally.  We've wondered how to do this so we decided to do sampling 
>> of a column family (or set of CFs), store into HDFS (or CFS), download 
>> locally, then import into your local Cassandra node.  That gives you real 
>> data to test against with pig/hive or for other purposes.
>> 
>> That way, when you're flying out to the Hadoop Summit or the Cassandra SF 
>> event, you can play with real data :).
>> 
>> Maybe others have been doing this for years, but if not, we're finding it 
>> handy.
>> 
>> Jeremy



Re: When does it make sense to use TimeUUID?

2011-06-15 Thread chovatia jaydeep
Hi Sameer,

One example: store all the tweets for a given user in a Column 
Family, where the row key is the user name/user id and the column name is of 
TimeUUID type, representing the tweet arrival time. A user would generally 
like to see the tweets sorted by their arrival time, so TimeUUID 
helps here.

Thank you,
Jaydeep



From: Sameer Farooqui 
To: user@cassandra.apache.org
Sent: Tuesday, 14 June 2011 5:16 PM
Subject: When does it make sense to use TimeUUID?


I would like to store some timestamped user info in a Column Family with the 
usernames as the row key and different timestamps as column names. Each user 
might have a thousand timestamped data.

I understand that the version 1 UUIDs that Cassandra uses combine the MAC address of 
the computer generating the UUID with the number of 100-nanosecond intervals 
since the beginning of the Gregorian calendar.

So, if user1 had data stored for an event at Jan 30, 2011/2:15pm and user2 had 
an event at the exact same time, the data could potentially be stored in 
different column names? So, I would have to know the MAC of the generating 
computer in order to do a column slice, right? 

When does it make sense to use TimeUUID vs just a time string like 
20110130141500 and comparator type UTF8?

- Sameer

Re: upgrading from cassandra 0.7.3 to 0.8.0

2011-06-15 Thread Jonathan Ellis
Are there exceptions in the Cassandra log?

On Wed, Jun 15, 2011 at 1:54 PM, Anurag Gujral  wrote:
> Hi All,
>           I had a cassandra node which was running on cassandra 0.7.3.
> Without changing the data directories I installed cassandra 0.8.0 but when I
> query data I get timeouts.
> Can somehow please guide me how to go about upgrade from cassandra 0.7.3 to
> cassandra 0.8.0.
> Thanks
> Anurag



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: prep for cassandra storage from pig

2011-06-15 Thread Jeremy Hanna
Hi Will,

That's partly why I like to use FromCassandraBag and ToCassandraBag from 
pygmalion - it does the work for you to get it back into a form that cassandra 
understands.

Others may know better how to massage the data into that form using just pig, 
but if all else fails, you could write a udf to do that.

Jeremy

On Jun 15, 2011, at 1:17 PM, William Oberman wrote:

> I think I'm stuck on typing issues trying to store data in cassandra.  To 
> verify, cassandra wants (key, {tuples})
> 
> My pig script is fairly brief:
> raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS 
> (key:chararray, columns:bag {column:tuple (name, value)});
> --colums == timeUUID -> JSON
> rows = FOREACH raw GENERATE key, FLATTEN(columns);
> alias_target_day = FOREACH rows {
> --I wrote a specialized parser that does exactly what I need
> observation_map = com.civicscience.pig.ParseObservation($2);
> GENERATE $0 as alias, observation_map#'_fqt' as target, 
> observation_map#'_day' as day;
> };
> grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
> X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1, 
> COUNT($1)) as day_count;
> 
> This gets me:
> (targetA, (day1, count))
> (targetA, (day2, count))
> (targetB, (day1, count))
> 
> 
> But, cassandra wants the 2nd item to be a bag.  So, I tried:
> X = FOREACH grouping GENERATE group.$0 as target, TOBAG(TOTUPLE(group.$1, 
> COUNT($1))) as day_count;
> 
> But this results in:
> (targetA, {((day1, count))})
> (targetA, {((day2, count))})
> (targetB, {((day1, count))})
> It's hard to see, but the 2nd item now has a nested tuple as the first value, 
> which is still bad.
> 
> How to I get (key, {tuple})???  I wasn't sure where to post this (pig or 
> cassandra), so I'm posting to the pig list too.
> 
> will



Re: prep for cassandra storage from pig

2011-06-15 Thread William Oberman
My problem is the column names are dynamic (a date), and pygmalion seems to
want the column names to be fixed at "compile time" (the script).

On Wed, Jun 15, 2011 at 3:04 PM, Jeremy Hanna wrote:

> Hi Will,
>
> That's partly why I like to use FromCassandraBag and ToCassandraBag from
> pygmalion - it does the work for you to get it back into a form that
> cassandra understands.
>
> Others may know better how to massage the data into that form using just
> pig, but if all else fails, you could write a udf to do that.
>
> Jeremy
>
> On Jun 15, 2011, at 1:17 PM, William Oberman wrote:
>
> > I think I'm stuck on typing issues trying to store data in cassandra.  To
> verify, cassandra wants (key, {tuples})
> >
> > My pig script is fairly brief:
> > raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS
> (key:chararray, columns:bag {column:tuple (name, value)});
> > --colums == timeUUID -> JSON
> > rows = FOREACH raw GENERATE key, FLATTEN(columns);
> > alias_target_day = FOREACH rows {
> > --I wrote a specialized parser that does exactly what I need
> > observation_map = com.civicscience.pig.ParseObservation($2);
> > GENERATE $0 as alias, observation_map#'_fqt' as target,
> observation_map#'_day' as day;
> > };
> > grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
> > X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1,
> COUNT($1)) as day_count;
> >
> > This gets me:
> > (targetA, (day1, count))
> > (targetA, (day2, count))
> > (targetB, (day1, count))
> > 
> >
> > But, cassandra wants the 2nd item to be a bag.  So, I tried:
> > X = FOREACH grouping GENERATE group.$0 as target, TOBAG(TOTUPLE(group.$1,
> COUNT($1))) as day_count;
> >
> > But this results in:
> > (targetA, {((day1, count))})
> > (targetA, {((day2, count))})
> > (targetB, {((day1, count))})
> > It's hard to see, but the 2nd item now has a nested tuple as the first
> value, which is still bad.
> >
> > How to I get (key, {tuple})???  I wasn't sure where to post this (pig or
> cassandra), so I'm posting to the pig list too.
> >
> > will
>
>


-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: prep for cassandra storage from pig

2011-06-15 Thread William Oberman
I'll do a reply all, to keep this more consistent (sorry!).

Rather than staying stuck, I wrote a custom function: TupleToBagOfTuple. I'm
curious if I could have avoided it with proper pig scripting though.
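
William's UDF itself isn't shown here; a minimal sketch of that kind of
function (wrap the single tuple argument in a bag so the relation matches the
(key, {columns}) shape CassandraStorage expects) might look like this:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

public class TupleToBagOfTuple extends EvalFunc<DataBag> {
    @Override
    public DataBag exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        // The first argument passed to the UDF, e.g. the (day1, count) tuple.
        Tuple inner = (Tuple) input.get(0);
        DataBag bag = BagFactory.getInstance().newDefaultBag();
        bag.add(inner);
        return bag;
    }
}

Used in place of TOBAG around the TOTUPLE(...) call in the GENERATE, it would
produce (targetA, {(day1, count)}) instead of (targetA, {((day1, count))}).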

On Wed, Jun 15, 2011 at 3:08 PM, William Oberman
wrote:

> My problem is the column names are dynamic (a date), and pygmalion seems to
> want the column names to be fixed at "compile time" (the script).
>
>
> On Wed, Jun 15, 2011 at 3:04 PM, Jeremy Hanna 
> wrote:
>
>> Hi Will,
>>
>> That's partly why I like to use FromCassandraBag and ToCassandraBag from
>> pygmalion - it does the work for you to get it back into a form that
>> cassandra understands.
>>
>> Others may know better how to massage the data into that form using just
>> pig, but if all else fails, you could write a udf to do that.
>>
>> Jeremy
>>
>> On Jun 15, 2011, at 1:17 PM, William Oberman wrote:
>>
>> > I think I'm stuck on typing issues trying to store data in cassandra.
>>  To verify, cassandra wants (key, {tuples})
>> >
>> > My pig script is fairly brief:
>> > raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS
>> (key:chararray, columns:bag {column:tuple (name, value)});
>> > --colums == timeUUID -> JSON
>> > rows = FOREACH raw GENERATE key, FLATTEN(columns);
>> > alias_target_day = FOREACH rows {
>> > --I wrote a specialized parser that does exactly what I need
>> > observation_map = com.civicscience.pig.ParseObservation($2);
>> > GENERATE $0 as alias, observation_map#'_fqt' as target,
>> observation_map#'_day' as day;
>> > };
>> > grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
>> > X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1,
>> COUNT($1)) as day_count;
>> >
>> > This gets me:
>> > (targetA, (day1, count))
>> > (targetA, (day2, count))
>> > (targetB, (day1, count))
>> > 
>> >
>> > But, cassandra wants the 2nd item to be a bag.  So, I tried:
>> > X = FOREACH grouping GENERATE group.$0 as target,
>> TOBAG(TOTUPLE(group.$1, COUNT($1))) as day_count;
>> >
>> > But this results in:
>> > (targetA, {((day1, count))})
>> > (targetA, {((day2, count))})
>> > (targetB, {((day1, count))})
>> > It's hard to see, but the 2nd item now has a nested tuple as the first
>> value, which is still bad.
>> >
>> > How to I get (key, {tuple})???  I wasn't sure where to post this (pig or
>> cassandra), so I'm posting to the pig list too.
>> >
>> > will
>>
>>
>
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) ober...@civicscience.com
>



-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: prep for cassandra storage from pig

2011-06-15 Thread Jeremy Hanna
Yeah - for completely dynamic column names, From/To Cassandra Bag 
doesn't handle that.  It does handle prefixed names though - like link* will 
get a bag of all the columns that start with link.  But it sounds like you are 
doing what I would have to do if I got into a nested data conundrum.  Like I 
said, others may have better advice for getting the data the way you want it.

On Jun 15, 2011, at 2:08 PM, William Oberman wrote:

> My problem is the column names are dynamic (a date), and pygmalion seems to
> want the column names to be fixed at "compile time" (the script).
> 
> On Wed, Jun 15, 2011 at 3:04 PM, Jeremy Hanna 
> wrote:
> 
>> Hi Will,
>> 
>> That's partly why I like to use FromCassandraBag and ToCassandraBag from
>> pygmalion - it does the work for you to get it back into a form that
>> cassandra understands.
>> 
>> Others may know better how to massage the data into that form using just
>> pig, but if all else fails, you could write a udf to do that.
>> 
>> Jeremy
>> 
>> On Jun 15, 2011, at 1:17 PM, William Oberman wrote:
>> 
>>> I think I'm stuck on typing issues trying to store data in cassandra.  To
>> verify, cassandra wants (key, {tuples})
>>> 
>>> My pig script is fairly brief:
>>> raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS
>> (key:chararray, columns:bag {column:tuple (name, value)});
>>> --columns == timeUUID -> JSON
>>> rows = FOREACH raw GENERATE key, FLATTEN(columns);
>>> alias_target_day = FOREACH rows {
>>>--I wrote a specialized parser that does exactly what I need
>>>observation_map = com.civicscience.pig.ParseObservation($2);
>>>GENERATE $0 as alias, observation_map#'_fqt' as target,
>> observation_map#'_day' as day;
>>> };
>>> grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
>>> X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1,
>> COUNT($1)) as day_count;
>>> 
>>> This gets me:
>>> (targetA, (day1, count))
>>> (targetA, (day2, count))
>>> (targetB, (day1, count))
>>> 
>>> 
>>> But, cassandra wants the 2nd item to be a bag.  So, I tried:
>>> X = FOREACH grouping GENERATE group.$0 as target, TOBAG(TOTUPLE(group.$1,
>> COUNT($1))) as day_count;
>>> 
>>> But this results in:
>>> (targetA, {((day1, count))})
>>> (targetA, {((day2, count))})
>>> (targetB, {((day1, count))})
>>> It's hard to see, but the 2nd item now has a nested tuple as the first
>> value, which is still bad.
>>> 
>>> How to I get (key, {tuple})???  I wasn't sure where to post this (pig or
>> cassandra), so I'm posting to the pig list too.
>>> 
>>> will
>> 
>> 
> 
> 
> -- 
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) ober...@civicscience.com



Re: Docs: Token Selection

2011-06-15 Thread AJ

On 6/15/2011 12:14 PM, Vijay wrote:

Correction

"The problem in the above approach is you have 2 nodes between 12 to 4 
in DC1 but from 4 to 12  you just have 1"


should be

"The problem in the above approach is you have 1 node between 0-4 
(25%) and and one node covering the rest which is 4-16, 0-0 (75%)"


Regards,




Ok, I think you are saying that the computed token range intervals are 
incorrect and that they would be:


DC1
*node 1 = 0  Range: (4, 16], (0, 0]
node 2 = 4  Range: (0, 4]

DC2
*node 3 = 8  Range: (12, 16], (0, 8]
node 4 = 12   Range: (8, 12]

If so, then yes, this is what I am seeking to confirm since I haven't 
found any documentation stating this directly and that reference that I 
gave only implies this; that is, that the token ranges are calculated 
per data center rather than per cluster.  I just need someone to confirm 
that 100% because it doesn't sound right to me based on everything else 
I've read.


SO, the question is:  Does Cass calculate the consecutive node token 
ranges A.) per data center, or B.) for the whole cluster?


From all I understand, the answer is B.  But, that documentation 
(reprinted below) implies A... or something that doesn't make sense to 
me because of the token placement in the example:


"With NetworkTopologyStrategy, you should calculate the tokens the nodes 
in each DC independantly...


DC1
node 1 = 0
node 2 = 85070591730234615865843651857942052864

DC2
node 3 = 1
node 4 = 85070591730234615865843651857942052865"


However, I do see why this would be helpful, but first I'm just asking if this 
token assignment is absolutely mandatory
or if it's just a technique to achieve some end.





Re: Multi data center configuration - A question on read correction

2011-06-15 Thread Selva Kumar
Thanks Jonathan. Can we turn off RR by setting read_repair_chance = 0? Please advise.

Selva





From: Jonathan Ellis 
To: user@cassandra.apache.org
Sent: Tue, June 14, 2011 8:59:41 PM
Subject: Re: Multi data center configuration - A question on read correction

That's just read repair sending MD5s of the data for comparison.  So
net traffic is light.

You can turn off RR but the downsides can be large.  Turning it down
to say 10% can be reasonable tho.

But again, if network traffic is your concern you should be fine.
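
For reference, read repair is controlled per column family by read_repair_chance; 0.0 turns it off and 0.1 corresponds to the 10% suggested above. A sketch of changing it through the Thrift API of that era follows; the keyspace and column family names are made up, and the same change can also be made interactively from cassandra-cli with an update column family statement.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.CfDef;
import org.apache.cassandra.thrift.KsDef;

public class ReadRepairTuning {
    // Sketch only: "client" is an already-connected Thrift Cassandra.Client;
    // "MyKS" / "MyCF" are placeholder names.
    static void setReadRepairChance(Cassandra.Client client, double chance) throws Exception {
        client.set_keyspace("MyKS");
        KsDef ksDef = client.describe_keyspace("MyKS");
        for (CfDef cfDef : ksDef.getCf_defs()) {
            if (cfDef.getName().equals("MyCF")) {
                cfDef.setRead_repair_chance(chance);        // 0.0 disables read repair
                client.system_update_column_family(cfDef);  // push the schema change
            }
        }
    }
}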

On Tue, Jun 14, 2011 at 8:44 PM, Selva Kumar  wrote:
> I have setup a multiple data center configuration in Cassandra. My primary
> intention is to minimize the network traffic between DC1 and DC2. Want DC1
> read requests be served with out reaching DC2 nodes. After going through
> documentation, i felt following setup would do.
>
>
> Replica Placement Strategy: NetworkTopologyStrategy
> Replication Factor: 3
> strategy_options:
> DC1 : 2
> DC2 : 1
> endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
> Read Consistency Level: LOCAL_QUORUM
> Write Consistency Level: LOCAL_QUORUM
>
> File: cassandra-topology.properties
> # Cassandra Node IP=Data Center:Rack
> 10.10.10.149=DC1:RAC1
> 10.10.10.150=DC1:RAC1
> 10.10.10.151=DC1:RAC1
>
> 10.20.10.153=DC2:RAC1
> 10.20.10.154=DC2:RAC1
> # default for unknown nodes
> default=DC1:RAC1
>
> Question I have:
> 1. Created a java program to test. It was querying with consistency level
> LOCAL_QUORUM on a DC1 node. Read count(Through cfstats) on the DC2 node
> showed read happened there too. Is it because of read correction?. Is there
> way to avoid doing read correction in DC2 nodes, when we query DC1 nodes.
>
> Thanks
> Selva



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Docs: Token Selection

2011-06-15 Thread Vijay
All you heard is right...
You are not overriding Cassandra's token assignment by saying here is your
token...

Logic is:
Calculate a token for the given key...
find the node in each region independently (If you use NTS and if you set
the strategy options which says you want to replicate to the other
region)...
Search for the ranges in each region independently
Replicate the data to that node.

For multi DC, cassandra needs nodes to be equally partitioned within each
dc (if you care that the load is equally distributed), and
there shouldn't be any collision of tokens within the cluster

The documentation tried to explain the same and the example in the
documentation.
Hope this clarifies...

More examples if it helps

DC1 Node 1 : token 0
DC1 Node 2 : token 8..

DC2 Node 1 : token 4..
DC2 Node 1 : token 12..

or

DC1 Node 1 : token 0
DC1 Node 2 : token 1..

DC2 Node 1 : token 8..
DC2 Node 1 : token  7..
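
For the arithmetic behind examples like these (RandomPartitioner, token space 0..2^127): space the tokens evenly within each DC, then add a small per-DC offset so no two nodes in the whole cluster share a token. A small sketch that reproduces the numbers in the documentation quoted further down (0 and 85070591730234615865843651857942052864 for DC1, 1 and ...865 for DC2):

import java.math.BigInteger;

public class TokenCalc {
    private static final BigInteger RING = BigInteger.valueOf(2).pow(127);

    // i-th node (0-based) of a DC with nodesInDc nodes; dcOffset is 0 for DC1,
    // 1 for DC2, ... so tokens never collide across data centers.
    static BigInteger token(int i, int nodesInDc, int dcOffset) {
        return RING.multiply(BigInteger.valueOf(i))
                   .divide(BigInteger.valueOf(nodesInDc))
                   .add(BigInteger.valueOf(dcOffset));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 2; i++)
            System.out.println("DC1 node " + (i + 1) + " = " + token(i, 2, 0));
        for (int i = 0; i < 2; i++)
            System.out.println("DC2 node " + (i + 1) + " = " + token(i, 2, 1));
    }
}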

Regards,




On Wed, Jun 15, 2011 at 12:28 PM, AJ  wrote:

>  On 6/15/2011 12:14 PM, Vijay wrote:
>
> Correction
>
>  "The problem in the above approach is you have 2 nodes between 12 to 4 in
> DC1 but from 4 to 12  you just have 1"
>
>  should be
>
>  "The problem in the above approach is you have 1 node between 0-4 (25%)
> and and one node covering the rest which is 4-16, 0-0 (75%)"
>
> Regards,
> 
>
>
> Ok, I think you are saying that the computed token range intervals are
> incorrect and that they would be:
>
> DC1
> *node 1 = 0  Range: (4, 16], (0, 0]
>
> node 2 = 4  Range: (0, 4]
>
> DC2
> *node 3 = 8  Range: (12, 16], (0, 8]
>
> node 4 = 12   Range: (8, 12]
>
> If so, then yes, this is what I am seeking to confirm since I haven't found
> any documentation stating this directly and that reference that I gave only
> implies this; that is, that the token ranges are calculated per data center
> rather than per cluster.  I just need someone to confirm that 100% because
> it doesn't sound right to me based on everything else I've read.
>
> SO, the question is:  Does Cass calculate the consecutive node token ranges
> A.) per cluster, or B.) for the whole data center?
>
> From all I understand, the answer is B.  But, that documentation (reprinted
> below) implies A... or something that doesn't make sense to me because of
> the token placement in the example:
>
> "With NetworkTopologyStrategy, you should calculate the tokens the nodes in
> each DC independantly...
>
> DC1
> node 1 = 0
> node 2 = 85070591730234615865843651857942052864
>
> DC2
> node 3 = 1
> node 4 = 850705917302346158658436518579
> 42052865"
>
>
> However, I do see why this would be helpful, but first I'm just asking if 
> this token assignment is absolutely mandatory
> or if it's just a technique to achieve some end.
>
>
>
>


Re: Docs: Token Selection

2011-06-15 Thread AJ
Vijay, thank you for your thoughtful reply.  Will Cass complain if I 
don't set up my tokens like in the examples?


On 6/15/2011 2:41 PM, Vijay wrote:

All you heard is right...
You are not overriding Cassandra's token assignment by saying here is 
your token...


Logic is:
Calculate a token for the given key...
find the node in each region independently (If you use NTS and if you 
set the strategy options which says you want to replicate to the other 
region)...

Search for the ranges in each region independntly
Replicate the data to that node.

For multi DC cassandra needs nodes to be equally partitioned 
within each dc (If you care that the load equally distributed) as 
well as there shouldn't be any collusion of tokens within a cluster


The documentation tried to explain the same and the example in the 
documentation.

Hope this clarifies...

More examples if it helps

DC1 Node 1 : token 0
DC1 Node 2 : token 8..

DC2 Node 1 : token 4..
DC2 Node 1 : token 12..

or

DC1 Node 1 : token 0
DC1 Node 2 : token 1..

DC2 Node 1 : token 8..
DC2 Node 1 : token  7..

Regards,




On Wed, Jun 15, 2011 at 12:28 PM, AJ > wrote:


On 6/15/2011 12:14 PM, Vijay wrote:

Correction

"The problem in the above approach is you have 2 nodes between 12
to 4 in DC1 but from 4 to 12  you just have 1"

should be

"The problem in the above approach is you have 1 node between 0-4
(25%) and and one node covering the rest which is 4-16, 0-0 (75%)"

Regards,




Ok, I think you are saying that the computed token range intervals
are incorrect and that they would be:

DC1
*node 1 = 0  Range: (4, 16], (0, 0]

node 2 = 4  Range: (0, 4]

DC2
*node 3 = 8  Range: (12, 16], (0, 8]

node 4 = 12   Range: (8, 12]

If so, then yes, this is what I am seeking to confirm since I
haven't found any documentation stating this directly and that
reference that I gave only implies this; that is, that the token
ranges are calculated per data center rather than per cluster.  I
just need someone to confirm that 100% because it doesn't sound
right to me based on everything else I've read.

SO, the question is:  Does Cass calculate the consecutive node
token ranges A.) per cluster, or B.) for the whole data center?

From all I understand, the answer is B.  But, that documentation
(reprinted below) implies A... or something that doesn't make
sense to me because of the token placement in the example:

"With NetworkTopologyStrategy, you should calculate the tokens the
nodes in each DC independantly...

DC1 node 1 = 0 node 2 = 85070591730234615865843651857942052864 DC2
node 3 = 1 node 4 = 850705917302346158658436518579
42052865"


However, I do see why this would be helpful, but first I'm just asking if 
this token assignment is absolutely mandatory
or if it's just a technique to achieve some end.








Slowdowns during repair

2011-06-15 Thread Aurynn Shaw

Hey all;

So, we have Cassandra running on a 5-server ring, with a RF of 3, and 
we're regularly seeing major slowdowns in read & write performance while 
running nodetool repair, as well as the occasional Cassandra crash 
during the repair window - slowdowns past 10 seconds to perform a single 
write.


The repair cycle runs nightly on a different server, so each server has 
it run once a week.


We're running 0.7.0 currently, and we'll be upgrading to 0.7.6 shortly.

System load on the Cassandra servers is never more than 10% CPU and 
utterly minimal IO usage, so I wouldn't think we'd be seeing issues 
quite like this.


What sort of knobs should I be looking at tuning to reduce the impact 
that nodetool repair has on Cassandra? What questions should I be asking 
as to why Cassandra slows down to the level that it does, and what I 
should be optimizing?


Additionally, what should I be looking for in the logs when this is 
happening? There's a lot in the logs, but I'm not sure what to look for.


Cassandra is, in this instance, backing a system that supports around a 
million requests a day, so not terribly heavy traffic.


Thanks,

Aurynn


Easy way to overload a single node on purpose?

2011-06-15 Thread Suan Aik Yeo
Here's a weird one... what's the best way to get a Cassandra node into a
"half-crashed" state?

We have a 3-node cluster running 0.7.5. A few days ago this happened
organically to node1 - the partition the commitlog was on was 100% full and
there was a "No space left on device" error, and after a while, although the
cluster and node1 was still up, to the other nodes it was down, and messages
like:
DEBUG 14:36:55,546 ... timed out
started to show up in its debug logs.

We have a tool to indicate to the load balancer that a Cassandra node is
down, but it didn't detect it that time. Now I'm having trouble
purposefully getting the node back to that state, so that I can try other
monitoring methods. I've tried to fill up the commitlog partition with other
files, and although I get the "No space left on device" error, the node
still doesn't go down and show the other symptoms it showed before.

Also, if anyone could recommend a good way for a node itself to detect that
it's in such a state I'd be interested in that too. Currently what we're
doing is making a "describe_cluster_name()" thrift call, but that still
worked when the node was "down". I'm thinking of something like
reading/writing to a fixed value in a keyspace as a check... Unfortunately
Java-based solutions are out of the question.


Thanks,
Suan
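
For what it's worth, the read/write probe idea can be as small as writing a sentinel column and reading it back, treating any exception or timeout as "down". The sketch below uses the Thrift API and Java purely for concreteness (the equivalent calls exist in the Perl Thrift bindings); the keyspace and column family names are placeholders.

import java.nio.ByteBuffer;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;

public class HealthProbe {
    static ByteBuffer utf8(String s) throws Exception {
        return ByteBuffer.wrap(s.getBytes("UTF-8"));
    }

    // Sketch only: "client" is an already-connected Thrift Cassandra.Client
    // pointed at the node being checked; "Health" / "Probe" are made-up names.
    static boolean isHealthy(Cassandra.Client client) {
        try {
            client.set_keyspace("Health");

            Column col = new Column();
            col.setName(utf8("ts"));
            col.setValue(utf8(Long.toString(System.currentTimeMillis())));
            col.setTimestamp(System.currentTimeMillis() * 1000);

            // Exercise the write and read paths, not just the Thrift connection.
            client.insert(utf8("probe"), new ColumnParent("Probe"), col, ConsistencyLevel.ONE);
            ColumnPath path = new ColumnPath("Probe");
            path.setColumn(utf8("ts"));
            client.get(utf8("probe"), path, ConsistencyLevel.ONE);
            return true;
        } catch (Exception e) {
            return false;  // timeouts / unavailable => treat the node as down
        }
    }
}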


Re: Is there a way from a running Cassandra node to determine whether or not itself is "up"?

2011-06-15 Thread Suan Aik Yeo
Thanks, Aaron, but we determined that adding Java into the equation just
brings in too much complexity for something that's called out of an Nginx
Perl module. Right now I'm having trouble even replicating the above
scenario and posted a question here:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Easy-way-to-overload-a-single-node-on-purpose-tt6480958.html


- Suan

On Thu, Jun 9, 2011 at 3:58 AM, aaron morton wrote:

> None via thrift that I can recall, but the StorageService MBean exposes
> getLiveNodes() this is what nodetool uses to see which nodes are live.
>
> From the code...
>/**
> * Retrieve the list of live nodes in the cluster, where "liveness" is
> * determined by the failure detector of the node being queried.
> *
> * @return set of IP addresses, as Strings
> */
>public List getLiveNodes();
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9 Jun 2011, at 17:56, Suan Aik Yeo wrote:
>
> > Is there a way (preferably an exposed method accessible through Thrift),
> from a running Cassandra node to determine whether or not itself is "up"?
> (Per Cassandra standards, I'm assuming based on the gossip protocol).
> Another way to think of what I'm looking for is basically running "nodetool
> ring" just on myself, but I'm only interested in knowing whether I'm "Up" or
> "Down"?
> >
> > I'm currently using the "describe_cluster" method, but earlier today when
> the commitlogs for a node filled up and it appeared down to the other nodes,
> describe_cluster() still worked fine, thus failing the check.
> >
> > Thanks,
> > Suan
>
>


Re: Docs: Token Selection

2011-06-15 Thread Vijay
No it won't; it will assume you are doing the right thing...

Regards,




On Wed, Jun 15, 2011 at 2:34 PM, AJ  wrote:

>  Vijay, thank you for your thoughtful reply.  Will Cass complain if I don't
> setup my tokens like in the examples?
>
>
> On 6/15/2011 2:41 PM, Vijay wrote:
>
> All you heard is right...
> You are not overriding Cassandra's token assignment by saying here is your
> token...
>
>  Logic is:
> Calculate a token for the given key...
> find the node in each region independently (If you use NTS and if you set
> the strategy options which says you want to replicate to the other
> region)...
> Search for the ranges in each region independntly
> Replicate the data to that node.
>
> For multi DC cassandra needs nodes to be equally partitioned within each
> dc (If you care that the load equally distributed) as well as
> there shouldn't be any collusion of tokens within a cluster
>
>  The documentation tried to explain the same and the example in the
> documentation.
> Hope this clarifies...
>
>  More examples if it helps
>
>   DC1 Node 1 : token 0
> DC1 Node 2 : token 8..
>
>  DC2 Node 1 : token 4..
> DC2 Node 1 : token 12..
>
>  or
>
>  DC1 Node 1 : token 0
> DC1 Node 2 : token 1..
>
>  DC2 Node 1 : token 8..
> DC2 Node 1 : token  7..
>
>  Regards,
> 
>
>
>
> On Wed, Jun 15, 2011 at 12:28 PM, AJ  wrote:
>
>>  On 6/15/2011 12:14 PM, Vijay wrote:
>>
>> Correction
>>
>>  "The problem in the above approach is you have 2 nodes between 12 to 4
>> in DC1 but from 4 to 12  you just have 1"
>>
>>  should be
>>
>>  "The problem in the above approach is you have 1 node between 0-4 (25%)
>> and and one node covering the rest which is 4-16, 0-0 (75%)"
>>
>> Regards,
>> 
>>
>>
>>  Ok, I think you are saying that the computed token range intervals are
>> incorrect and that they would be:
>>
>> DC1
>> *node 1 = 0  Range: (4, 16], (0, 0]
>>
>> node 2 = 4  Range: (0, 4]
>>
>> DC2
>>  *node 3 = 8  Range: (12, 16], (0, 8]
>>
>> node 4 = 12   Range: (8, 12]
>>
>>  If so, then yes, this is what I am seeking to confirm since I haven't
>> found any documentation stating this directly and that reference that I gave
>> only implies this; that is, that the token ranges are calculated per data
>> center rather than per cluster.  I just need someone to confirm that 100%
>> because it doesn't sound right to me based on everything else I've read.
>>
>> SO, the question is:  Does Cass calculate the consecutive node token
>> ranges A.) per cluster, or B.) for the whole data center?
>>
>> From all I understand, the answer is B.  But, that documentation
>> (reprinted below) implies A... or something that doesn't make sense to me
>> because of the token placement in the example:
>>
>> "With NetworkTopologyStrategy, you should calculate the tokens the nodes
>> in each DC independantly...
>>
>> DC1
>> node 1 = 0
>> node 2 = 85070591730234615865843651857942052864
>>
>> DC2
>> node 3 = 1
>> node 4 = 850705917302346158658436518579
>> 42052865"
>>
>>
>> However, I do see why this would be helpful, but first I'm just asking if 
>> this token assignment is absolutely mandatory
>> or if it's just a technique to achieve some end.
>>
>>
>>
>>
>
>


Re: What triggers hint delivery?

2011-06-15 Thread Terje Marthinussen
I suspect a few possibilities:
1. I have not checked, but what happens (in terms of hint delivery) if a
node tries to write something but the write times out even if the node is
marked as up?
2. I would assume there can be ever so slight variations in how different
nodes in the cluster think the rest of the cluster is up. These events will
of course typically  be short lived (unless some sort of long term split
brain situation occurs), but if you are writing data while for instance a
node is restarting, I would not be surprised if there are race conditions
where A sees B as down, sends a hint to C, but C already thinks B is up
3. I have observed situations where it seems like a node comes up
but for some reason takes a while to get really operational. Hint delivery
fails, the hint sender gives up and nothing more happens.

It may be an idea to let a node check whether it has hints on heartbeats
(potentially not on every heartbeat, but at a regular interval)?

Terje

On Thu, Jun 16, 2011 at 2:08 AM, Jonathan Ellis  wrote:

> On Wed, Jun 15, 2011 at 10:53 AM, Terje Marthinussen
>  wrote:
> > I was looking quickly at source code tonight.
> > As far as I could see from a quick code scan, hint delivery is only
> > triggered as a state change from a node is down to when it enters up
> state?
>
> Right.
>
> > If this is indeed the case, it would potentially explain why we sometimes
> > have hints on machines which does not seem to get played back
>
> Why is that?  Hints don't get created in the first place unless a node
> is in the down state.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: What triggers hint delivery?

2011-06-15 Thread Jonathan Ellis
You're right, those could all cause what you are seeing.

We used to have a "re-check hourly" scheduled task, but took it out
because it was very very performance intensive -- at the time, hints
were not stored by machine so asking "does machine X have any hints"
required scanning all hints.  Should be fine to add that back now.
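
For illustration, the periodic re-check could be as simple as a scheduled task. The sketch below is generic Java; hintedNodes() and deliverHintsTo() are hypothetical stand-ins for whatever the hint store exposes, not actual Cassandra methods.

import java.net.InetAddress;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HintRecheckSketch {
    // Hypothetical hooks standing in for the real hint store / delivery code.
    interface HintStore {
        Set<InetAddress> hintedNodes();         // nodes we still hold hints for
        void deliverHintsTo(InetAddress node);  // attempt delivery if the node is up
    }

    // Re-check on a fixed interval, instead of only on a down->up state change,
    // so hints stranded by the race conditions described above still get played.
    static void scheduleRecheck(final HintStore store, long intervalMinutes) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                for (InetAddress node : store.hintedNodes()) {
                    store.deliverHintsTo(node);
                }
            }
        }, intervalMinutes, intervalMinutes, TimeUnit.MINUTES);
    }
}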

On Wed, Jun 15, 2011 at 7:48 PM, Terje Marthinussen
 wrote:
> I suspect a few possibilities:
> 1. I have not checked, but what happens (in terms of hint delivery) if a
> node tries to write something but the write times out even if the node is
> marked as up?
> 2. I would assume there can be ever so slight variations in how different
> nodes in the cluster think the rest of the cluster is up. These events will
> of course typically  be short lived (unless some sort of long term split
> brain situation occurs), but if you are writing data while for instance a
> node is restarting, I would not be surprised if there are race conditions
> where A see B as down, sends a hint to C but C already think B is up
> 3. I have observed situations where it seems like a node comes in up state
> but for some reason takes a while to get really operational. Hint delivery
> fails, the hint sender gives up and nothing more happens.
>
> May be an idea to let a node check if it has hints on heartbeats maybe
> (potentially not all of them, but at a regular interval)?
>
> Terje
>
> On Thu, Jun 16, 2011 at 2:08 AM, Jonathan Ellis  wrote:
>>
>> On Wed, Jun 15, 2011 at 10:53 AM, Terje Marthinussen
>>  wrote:
>> > I was looking quickly at source code tonight.
>> > As far as I could see from a quick code scan, hint delivery is only
>> > triggered as a state change from a node is down to when it enters up
>> > state?
>>
>> Right.
>>
>> > If this is indeed the case, it would potentially explain why we
>> > sometimes
>> > have hints on machines which does not seem to get played back
>>
>> Why is that?  Hints don't get created in the first place unless a node
>> is in the down state.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


downgrading from cassandra 0.8 to 0.7.3

2011-06-15 Thread Anurag Gujral
Hi All,
  I moved to cassandra 0.8.0 from cassandra-0.7.3. When I try to
move back I get the following error:
java.lang.RuntimeException: Can't open sstables from the future! Current
version f, found file: /data/cassandra/data/system/Schema-g-9.

Please suggest.

Thanks
Anurag


Re: downgrading from cassandra 0.8 to 0.7.3

2011-06-15 Thread Terje Marthinussen
Can't help you with that.
You may have to go the json2sstable route and re-import into 0.7.3

But... why would you want to go back to 0.7.3?

Terje
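
For the record, "the json2sstable route" means exporting each sstable to JSON and importing it again with the older release's tools, roughly as below. Paths and names are placeholders, and whether 0.8 JSON output imports cleanly into 0.7.3 is not guaranteed.

# with the 0.8 tools, which can read the "g" format
bin/sstable2json /data/cassandra/data/MyKS/MyCF-g-1-Data.db > MyCF-1.json
# with the 0.7.3 tools, writing an sstable the older node can open
bin/json2sstable -K MyKS -c MyCF MyCF-1.json /data/cassandra/data/MyKS/MyCF-f-1-Data.db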

On Thu, Jun 16, 2011 at 10:30 AM, Anurag Gujral wrote:

> Hi All,
>   I moved to cassandra 0.8.0 from cassandra-0.7.3 when I  try to
> move back I get the following error:
> java.lang.RuntimeException: Can't open sstables from the future! Current
> version f, found file: /data/cassandra/data/system/Schema-g-9.
>
> Please suggest.
>
> Thanks
> Anurag
>
>
>
>
>


Re: Forcing Cassandra to free up some space

2011-06-15 Thread Terje Marthinussen
Watching this on a node here right now and it sort of shows how bad this can
get.
This node still has 109GB free disk by the way...

INFO [CompactionExecutor:5] 2011-06-16 09:11:59,164 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:5] 2011-06-16 09:12:23,929 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:5] 2011-06-16 09:12:46,489 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:17:53,299 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:18:17,782 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:18:42,078 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:19:06,984 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:19:32,079 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:19:57,265 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:20:22,706 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:20:47,331 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:21:13,062 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:21:38,288 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:22:03,500 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:22:29,407 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:22:55,577 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:23:20,951 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:23:46,448 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:3] 2011-06-16 09:24:12,030 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [ScheduledTasks:1] 2011-06-16 09:29:29,494 GCInspector.java (line 128)
GC for ParNew: 392 ms, 398997776 reclaimed leaving 2334786808 used; max is
10844635136
 INFO [ScheduledTasks:1] 2011-06-16 09:29:32,831 GCInspector.java (line 128)
GC for ParNew: 737 ms, 332336832 reclaimed leaving 2473311448 used; max is
10844635136
 INFO [CompactionExecutor:6] 2011-06-16 09:48:00,633 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:6] 2011-06-16 09:48:26,119 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:6] 2011-06-16 09:48:49,002 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:6] 2011-06-16 10:10:20,196 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:6] 2011-06-16 10:10:45,322 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:6] 2011-06-16 10:11:07,619 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:7] 2011-06-16 11:01:45,562 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:7] 2011-06-16 11:02:10,236 StorageService.java
(line 2071) requesting GC to free disk space
 INFO [CompactionExecutor:7] 2011-06-16 11:05:31,297 StorageService.java
(line 2071) requesting GC to free disk space

If I look at the data dir, I see 46 *Compacted files which make up an
additional 137GB of space.
The oldest of these Compacted files dates back to Jun 16th 01:26.

If these got deleted, there should actually be enough disk for the node to
run a full compaction run if needed.

Either the GC cleanup tactic is seriously flawed or  we have a potential bug
keeping references far longer than needed?

Terje



On Wed, Jun 15, 2011 at 11:50 PM, Shotaro Kamio  wrote:

> We've encountered the situation that compacted sstable files aren't
> deleted after node repair. Even when gc is triggered via jmx, it
> sometimes leaves compacted files. In a case, a lot of files are left.
> Some files stay more than 10 hours already. There is no guarantee that
> gc will cleanup all compacted sstable files.
>
> We have a great interest on the following ticket.
> https://issues.apache.org/jira/browse/CASSANDRA-2521
>
>
> Regards,
> Shotaro
>
>
> On Fri, May 27, 2011 at 11:27 AM, Jeffrey Kesselman 
> wrote:
> > Im also not sure that will guarantee all space is cleaned up.  It
> > really depends on what you are doing inside Cassandra.  If you have
> > your on garbage collect that is just in some way tied to the gc run,
> > then it will run wh

Re: Docs: Token Selection

2011-06-15 Thread AJ
Ok.  I understand the reasoning you laid out.  But, I think it should be 
documented more thoroughly.  I was trying to get an idea as to how 
flexible Cass lets you be with the various combinations of strategies, 
snitches, token ranges, etc..


It would be instructional to see what a graphical representation of a 
cluster ring with multiple data centers looks like.  Google turned-up 
nothing.  I imagine it's a multilayer ring; one layer per data center 
with the nodes of one layer slightly offset from the ones in the other 
(based on the example in the wiki).  I would also like to know which 
node is next in the ring so as to understand replica placement in, 
for example, the OldNetworkTopologyStrategy when its doc states,


"...It places one replica in a different data center from the first (if 
there is any such data center), the third replica in a different rack in 
the first datacenter, and any remaining replicas on the first unused 
nodes on the ring."


I can only assume for now that "the ring" referred to is the "local" 
ring of the first data center.



On 6/15/2011 5:51 PM, Vijay wrote:

No it wont it will assume you are doing the right thing...

Regards,




On Wed, Jun 15, 2011 at 2:34 PM, AJ > wrote:


Vijay, thank you for your thoughtful reply.  Will Cass complain if
I don't setup my tokens like in the examples?


On 6/15/2011 2:41 PM, Vijay wrote:

All you heard is right...
You are not overriding Cassandra's token assignment by saying
here is your token...

Logic is:
Calculate a token for the given key...
find the node in each region independently (If you use NTS and if
you set the strategy options which says you want to replicate to
the other region)...
Search for the ranges in each region independntly
Replicate the data to that node.

For multi DC cassandra needs nodes to be equally partitioned
within each dc (If you care that the load equally
distributed) as well as there shouldn't be any collusion of
tokens within a cluster

The documentation tried to explain the same and the example in
the documentation.
Hope this clarifies...

More examples if it helps

DC1 Node 1 : token 0
DC1 Node 2 : token 8..

DC2 Node 1 : token 4..
DC2 Node 1 : token 12..

or

DC1 Node 1 : token 0
DC1 Node 2 : token 1..

DC2 Node 1 : token 8..
DC2 Node 1 : token  7..

Regards,




On Wed, Jun 15, 2011 at 12:28 PM, AJ mailto:a...@dude.podzone.net>> wrote:

On 6/15/2011 12:14 PM, Vijay wrote:

Correction

"The problem in the above approach is you have 2 nodes
between 12 to 4 in DC1 but from 4 to 12  you just have 1"

should be

"The problem in the above approach is you have 1 node
between 0-4 (25%) and and one node covering the rest which
is 4-16, 0-0 (75%)"

Regards,




Ok, I think you are saying that the computed token range
intervals are incorrect and that they would be:

DC1
*node 1 = 0  Range: (4, 16], (0, 0]

node 2 = 4  Range: (0, 4]

DC2
*node 3 = 8  Range: (12, 16], (0, 8]

node 4 = 12   Range: (8, 12]

If so, then yes, this is what I am seeking to confirm since I
haven't found any documentation stating this directly and
that reference that I gave only implies this; that is, that
the token ranges are calculated per data center rather than
per cluster.  I just need someone to confirm that 100%
because it doesn't sound right to me based on everything else
I've read.

SO, the question is:  Does Cass calculate the consecutive
node token ranges A.) per cluster, or B.) for the whole data
center?

From all I understand, the answer is B.  But, that
documentation (reprinted below) implies A... or something
that doesn't make sense to me because of the token placement
in the example:

"With NetworkTopologyStrategy, you should calculate the
tokens the nodes in each DC independantly...

DC1 node 1 = 0 node 2 =
85070591730234615865843651857942052864 DC2 node 3 = 1 node 4
= 850705917302346158658436518579
42052865"


However, I do see why this would be helpful, but first I'm just asking 
if this token assignment is absolutely mandatory
or if it's just a technique to achieve some end.











Re: Forcing Cassandra to free up some space

2011-06-15 Thread Jeffrey Kesselman
The GC cleanup approach, if depending on specific objects being GCd,
is fundamentally flawed.

I brought this up earlier, won't restart that thread.  It should be in
the archives.
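
The alternative discussed elsewhere in this thread (CASSANDRA-2521) is explicit reference counting instead of waiting for the collector. Below is a generic illustration of that idea, not Cassandra's actual code: the obsolete file is deleted deterministically as soon as the last reader releases it.

import java.io.File;
import java.util.concurrent.atomic.AtomicInteger;

// Generic illustration of explicit reference counting for an obsolete data file:
// deletion happens as soon as the last reader releases it, with no dependence on
// when (or whether) the JVM happens to garbage-collect the objects referencing it.
public class RefCountedFile {
    private final File file;
    private final AtomicInteger refs = new AtomicInteger(1); // 1 = held by the owning table
    private volatile boolean obsolete = false;

    public RefCountedFile(File file) { this.file = file; }

    public void acquire() { refs.incrementAndGet(); }           // a reader starts using the file

    public void markObsolete() { obsolete = true; release(); }  // compaction replaced it; drop the owner's ref

    public void release() {
        if (refs.decrementAndGet() == 0 && obsolete) {
            file.delete();                                      // deterministic cleanup
        }
    }
}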


On Wed, Jun 15, 2011 at 10:17 PM, Terje Marthinussen
 wrote:
> Watching this on a node here right now and it sort of shows how bad this can
> get.
> This node still has 109GB free disk by the way...
> INFO [CompactionExecutor:5] 2011-06-16 09:11:59,164 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:5] 2011-06-16 09:12:23,929 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:5] 2011-06-16 09:12:46,489 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:17:53,299 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:18:17,782 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:18:42,078 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:19:06,984 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:19:32,079 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:19:57,265 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:20:22,706 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:20:47,331 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:21:13,062 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:21:38,288 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:22:03,500 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:22:29,407 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:22:55,577 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:23:20,951 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:23:46,448 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:3] 2011-06-16 09:24:12,030 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [ScheduledTasks:1] 2011-06-16 09:29:29,494 GCInspector.java (line 128)
> GC for ParNew: 392 ms, 398997776 reclaimed leaving 2334786808 used; max is
> 10844635136
>  INFO [ScheduledTasks:1] 2011-06-16 09:29:32,831 GCInspector.java (line 128)
> GC for ParNew: 737 ms, 332336832 reclaimed leaving 2473311448 used; max is
> 10844635136
>  INFO [CompactionExecutor:6] 2011-06-16 09:48:00,633 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:6] 2011-06-16 09:48:26,119 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:6] 2011-06-16 09:48:49,002 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:6] 2011-06-16 10:10:20,196 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:6] 2011-06-16 10:10:45,322 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:6] 2011-06-16 10:11:07,619 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:7] 2011-06-16 11:01:45,562 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:7] 2011-06-16 11:02:10,236 StorageService.java
> (line 2071) requesting GC to free disk space
>  INFO [CompactionExecutor:7] 2011-06-16 11:05:31,297 StorageService.java
> (line 2071) requesting GC to free disk space
> If I look at the data dir, I see 46 *Compacted files which makes up an
> additional 137GB of space.
> The oldest of these Compacted files dates back to Jun 16th 01:26.
> If these got deleted, there should actually be enough disk for the node to
> run a full compaction run if needed.
> Either the GC cleanup tactic is seriously flawed or  we have a potential bug
> keeping references far longer than needed?
> Terje
>
>
> On Wed, Jun 15, 2011 at 11:50 PM, Shotaro Kamio  wrote:
>>
>> We've encountered the situation that compacted sstable files aren't
>> deleted after node repair. Even when gc is triggered via jmx, it
>> sometimes leaves compacted files. In a case, a lot of files are left.
>> Some files stay more than 10 hours already. There is no guarantee that
>> gc will cleanup all compacted sstable files.
>>
>> We have a great interest on the followin

Re: Is there a way from a running Cassandra node to determine whether or not itself is "up"?

2011-06-15 Thread Jake Luciani
To force a node "down" you can use nodetool disablegossip.

On Wed, Jun 15, 2011 at 6:42 PM, Suan Aik Yeo  wrote:

> Thanks, Aaron, but we determined that adding Java into the equation just
> brings in too much complexity for something that's called out of an Nginx
> Perl module. Right now I'm having trouble even replicating the above
> scenario and posted a question here:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Easy-way-to-overload-a-single-node-on-purpose-tt6480958.html
>
>
> - Suan
>
>
> On Thu, Jun 9, 2011 at 3:58 AM, aaron morton wrote:
>
>> None via thrift that I can recall, but the StorageService MBean exposes
>> getLiveNodes() this is what nodetool uses to see which nodes are live.
>>
>> From the code...
>>/**
>> * Retrieve the list of live nodes in the cluster, where "liveness" is
>> * determined by the failure detector of the node being queried.
>> *
>> * @return set of IP addresses, as Strings
>> */
>>public List getLiveNodes();
>>
>> Hope that helps.
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 9 Jun 2011, at 17:56, Suan Aik Yeo wrote:
>>
>> > Is there a way (preferably an exposed method accessible through Thrift),
>> from a running Cassandra node to determine whether or not itself is "up"?
>> (Per Cassandra standards, I'm assuming based on the gossip protocol).
>> Another way to think of what I'm looking for is basically running "nodetool
>> ring" just on myself, but I'm only interested in knowing whether I'm "Up" or
>> "Down"?
>> >
>> > I'm currently using the "describe_cluster" method, but earlier today
>> when the commitlogs for a node filled up and it appeared down to the other
>> nodes, describe_cluster() still worked fine, thus failing the check.
>> >
>> > Thanks,
>> > Suan
>>
>>
>


-- 
http://twitter.com/tjake


Re: Forcing Cassandra to free up some space

2011-06-15 Thread Ryan King
There's a ticket open for this:
https://issues.apache.org/jira/browse/CASSANDRA-2521. Vote on it if
you think its important.

-ryan

On Wed, Jun 15, 2011 at 7:34 PM, Jeffrey Kesselman  wrote:
> The GC cleanup approach, if depending on specific objects being GCd,
> is fundamentally flawed.
>
> I brought this up earlier, won't restart that thread.  It should be in
> the archives.
>
>
> On Wed, Jun 15, 2011 at 10:17 PM, Terje Marthinussen
>  wrote:
>> Watching this on a node here right now and it sort of shows how bad this can
>> get.
>> This node still has 109GB free disk by the way...
>> INFO [CompactionExecutor:5] 2011-06-16 09:11:59,164 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:5] 2011-06-16 09:12:23,929 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:5] 2011-06-16 09:12:46,489 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:17:53,299 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:18:17,782 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:18:42,078 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:19:06,984 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:19:32,079 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:19:57,265 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:20:22,706 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:20:47,331 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:21:13,062 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:21:38,288 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:22:03,500 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:22:29,407 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:22:55,577 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:23:20,951 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:23:46,448 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:24:12,030 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [ScheduledTasks:1] 2011-06-16 09:29:29,494 GCInspector.java (line 128)
>> GC for ParNew: 392 ms, 398997776 reclaimed leaving 2334786808 used; max is
>> 10844635136
>>  INFO [ScheduledTasks:1] 2011-06-16 09:29:32,831 GCInspector.java (line 128)
>> GC for ParNew: 737 ms, 332336832 reclaimed leaving 2473311448 used; max is
>> 10844635136
>>  INFO [CompactionExecutor:6] 2011-06-16 09:48:00,633 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:6] 2011-06-16 09:48:26,119 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:6] 2011-06-16 09:48:49,002 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:6] 2011-06-16 10:10:20,196 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:6] 2011-06-16 10:10:45,322 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:6] 2011-06-16 10:11:07,619 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:7] 2011-06-16 11:01:45,562 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:7] 2011-06-16 11:02:10,236 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:7] 2011-06-16 11:05:31,297 StorageService.java
>> (line 2071) requesting GC to free disk space
>> If I look at the data dir, I see 46 *Compacted files which makes up an
>> additional 137GB of space.
>> The oldest of these Compacted files dates back to Jun 16th 01:26.
>> If these got deleted, there should actually be enough disk for the node to
>> run a full compaction run if needed.
>> Either the GC cleanup tactic is seriously flawed or  we have a potential bug
>> keeping references far longer than needed?
>> Terje
>>
>>
>> On Wed, Jun 15, 2011 at 11:50 PM, Shotaro Kamio  wrote:
>>>
>>> We've encountered the situation that compacted sstable files aren't
>>> deleted after node r

Re: Docs: Token Selection

2011-06-15 Thread Vijay
+1 for more documentation (I guess contributions are always welcome). I
will try to write it down sometime when we have a bit more time...

0.8 nodetool ring command adds the DC and RAC information

http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
http://www.datastax.com/products/opscenter

Hope this helps...

Regards,




On Wed, Jun 15, 2011 at 7:24 PM, AJ  wrote:

>  Ok.  I understand the reasoning you laid out.  But, I think it should be
> documented more thoroughly.  I was trying to get an idea as to how flexible
> Cass lets you be with the various combinations of strategies, snitches,
> token ranges, etc..
>
> It would be instructional to see what a graphical representation of a
> cluster ring with multiple data centers looks like.  Google turned-up
> nothing.  I imagine it's a multilayer ring; one layer per data center with
> the nodes of one layer slightly offset from the ones in the other (based on
> the example in the wiki).  I would also like to know which node is next in
> the ring such so as to understand replica placement in, for example, the
> OldNetworkTopologyStrategy when it's doc states,
>
> "...It places one replica in a different data center from the first (if
> there is any such data center), the third replica in a different rack in the
> first datacenter, and any remaining replicas on the first unused nodes on
> the ring."
>
> I can only assume for now that "the ring" referred to is the "local" ring
> of the first data center.
>
>
>
> On 6/15/2011 5:51 PM, Vijay wrote:
>
> No it wont it will assume you are doing the right thing...
>
> Regards,
> 
>
>
>
> On Wed, Jun 15, 2011 at 2:34 PM, AJ  wrote:
>
>>  Vijay, thank you for your thoughtful reply.  Will Cass complain if I
>> don't setup my tokens like in the examples?
>>
>>
>> On 6/15/2011 2:41 PM, Vijay wrote:
>>
>> All you heard is right...
>> You are not overriding Cassandra's token assignment by saying here is your
>> token...
>>
>>  Logic is:
>> Calculate a token for the given key...
>> find the node in each region independently (If you use NTS and if you set
>> the strategy options which says you want to replicate to the other
>> region)...
>> Search for the ranges in each region independntly
>> Replicate the data to that node.
>>
>> For multi DC cassandra needs nodes to be equally partitioned within each
>> dc (If you care that the load equally distributed) as well as
>> there shouldn't be any collusion of tokens within a cluster
>>
>>  The documentation tried to explain the same and the example in the
>> documentation.
>> Hope this clarifies...
>>
>>  More examples if it helps
>>
>>   DC1 Node 1 : token 0
>> DC1 Node 2 : token 8..
>>
>>  DC2 Node 1 : token 4..
>> DC2 Node 1 : token 12..
>>
>>  or
>>
>>  DC1 Node 1 : token 0
>> DC1 Node 2 : token 1..
>>
>>  DC2 Node 1 : token 8..
>> DC2 Node 1 : token  7..
>>
>>  Regards,
>> 
>>
>>
>>
>> On Wed, Jun 15, 2011 at 12:28 PM, AJ  wrote:
>>
>>>  On 6/15/2011 12:14 PM, Vijay wrote:
>>>
>>> Correction
>>>
>>>  "The problem in the above approach is you have 2 nodes between 12 to 4
>>> in DC1 but from 4 to 12  you just have 1"
>>>
>>>  should be
>>>
>>>  "The problem in the above approach is you have 1 node between 0-4 (25%)
>>> and and one node covering the rest which is 4-16, 0-0 (75%)"
>>>
>>> Regards,
>>> 
>>>
>>>
>>>  Ok, I think you are saying that the computed token range intervals are
>>> incorrect and that they would be:
>>>
>>> DC1
>>> *node 1 = 0  Range: (4, 16], (0, 0]
>>>
>>> node 2 = 4  Range: (0, 4]
>>>
>>> DC2
>>>  *node 3 = 8  Range: (12, 16], (0, 8]
>>>
>>> node 4 = 12   Range: (8, 12]
>>>
>>>  If so, then yes, this is what I am seeking to confirm since I haven't
>>> found any documentation stating this directly and that reference that I gave
>>> only implies this; that is, that the token ranges are calculated per data
>>> center rather than per cluster.  I just need someone to confirm that 100%
>>> because it doesn't sound right to me based on everything else I've read.
>>>
>>> SO, the question is:  Does Cass calculate the consecutive node token
>>> ranges A.) per cluster, or B.) for the whole data center?
>>>
>>> From all I understand, the answer is B.  But, that documentation
>>> (reprinted below) implies A... or something that doesn't make sense to me
>>> because of the token placement in the example:
>>>
>>> "With NetworkTopologyStrategy, you should calculate the tokens the nodes
>>> in each DC independantly...
>>>
>>> DC1
>>> node 1 = 0
>>> node 2 = 85070591730234615865843651857942052864
>>>
>>> DC2
>>> node 3 = 1
>>> node 4 = 850705917302346158658436518579
>>> 42052865"
>>>
>>>
>>> However, I do see why this would be helpful, but first I'm just asking if 
>>> this token assignment is absolutely mandatory
>>> or if it's just a technique to achieve some end.
>>>
>>>
>>>
>>>
>>
>>
>
>


Re: What's the best approach to search in Cassandra

2011-06-15 Thread Mark Kerzner
Jake,

You need to maintain a huge number of distinct indexes.

Are we talking about secondary indexes? If yes, this sounds like exactly my
problem. There is so little documentation! - but I think that if I read all
there is on GitHub, I can probably start using it.

Thank you,
Mark

On Fri, Jun 3, 2011 at 8:07 PM, Jake Luciani  wrote:

> Mark,
>
> Check out Solandra.  http://github.com/tjake/Solandra
>
>
> On Fri, Jun 3, 2011 at 7:56 PM, Mark Kerzner wrote:
>
>> Hi,
>>
>> I need to store, say, 10M-100M documents, with each document having say
>> 100 fields, like author, creation date, access date, etc., and then I want
>> to ask questions like
>>
>> give me all documents whose author is like abc**, and creation date any
>> time in 2010 and access date in 2010-2011, and so on, perhaps 10-20
>> conditions, matching a list of some keywords.
>>
>> What's best, Lucene, Katta, Cassandra CF with secondary indices, or plan
>> scan and compare of every record?
>>
>> Thanks a bunch!
>>
>> Mark
>>
>
>
>
> --
> http://twitter.com/tjake
>


Re: What's the best approach to search in Cassandra

2011-06-15 Thread Sasha Dolgy
Datastax has pretty sufficient documentation on their site for secondary
indexes.
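
To give a flavor of what a secondary-index query looks like at the Thrift level (0.7/0.8), here is a sketch; the keyspace, column family and column names are made up, and the "author" column is assumed to carry a KEYS index. One caveat relevant to the original question: get_indexed_slices requires at least one EQ expression, so purely range-based conditions (e.g. only date ranges) can't be expressed this way, which is part of why Solandra was suggested.

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.IndexClause;
import org.apache.cassandra.thrift.IndexExpression;
import org.apache.cassandra.thrift.IndexOperator;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class IndexedQuerySketch {
    // Sketch only: "client" is an already-connected Cassandra.Client;
    // "Docs" / "documents" / "author" are placeholder names.
    static List<KeySlice> documentsByAuthor(Cassandra.Client client, String author) throws Exception {
        client.set_keyspace("Docs");

        IndexExpression byAuthor = new IndexExpression(
                ByteBuffer.wrap("author".getBytes("UTF-8")),
                IndexOperator.EQ,
                ByteBuffer.wrap(author.getBytes("UTF-8")));
        IndexClause clause = new IndexClause(
                Arrays.asList(byAuthor),
                ByteBuffer.wrap(new byte[0]),   // start_key: from the beginning
                100);                           // page size

        SlicePredicate allColumns = new SlicePredicate();
        allColumns.setSlice_range(new SliceRange(
                ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 100));

        return client.get_indexed_slices(
                new ColumnParent("documents"), clause, allColumns, ConsistencyLevel.ONE);
    }
}
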
On Jun 16, 2011 6:57 AM, "Mark Kerzner"  wrote:
> Jake,
>
> *You need to maintain a huge number of distinct indexes.*
> *
> *
> *Are we talking about secondary indexes? If yes, this sounds like exactly
my
> problem. There is so little documentation! - but I think that if I read
all
> there is on GitHub, I can probably start using it.
> *
>
> Thank you,
> Mark
>
> On Fri, Jun 3, 2011 at 8:07 PM, Jake Luciani  wrote:
>
>> Mark,
>>
>> Check out Solandra. http://github.com/tjake/Solandra
>>
>>
>> On Fri, Jun 3, 2011 at 7:56 PM, Mark Kerzner wrote:
>>
>>> Hi,
>>>
>>> I need to store, say, 10M-100M documents, with each document having say
>>> 100 fields, like author, creation date, access date, etc., and then I
want
>>> to ask questions like
>>>
>>> give me all documents whose author is like abc**, and creation date any
>>> time in 2010 and access date in 2010-2011, and so on, perhaps 10-20
>>> conditions, matching a list of some keywords.
>>>
>>> What's best, Lucene, Katta, Cassandra CF with secondary indices, or plan
>>> scan and compare of every record?
>>>
>>> Thanks a bunch!
>>>
>>> Mark
>>>
>>
>>
>>
>> --
>> http://twitter.com/tjake
>>


Important Variables for Scaling

2011-06-15 Thread Schuilenga, Jan Taeke
Which variables (for instance: throughput, CPU, I/O, connections) are
leading in deciding to add a node to a Cassandra setup which is put
under strain? We are trying to prove scalability, but when is the right
time to add a node to get the optimum scalability result?