date:20101201

1. So posterior to 0.6.6/0.7 ,  minor compaction and major compaction both
can clean out rows 'tagged'  tombstones , this kind of clean out doesn't
mead remove it from the disk permanently.
The real remove is done by the jvm GC ?
2. The intence of compaction is merging multi sstables into one , clean out
the tombstone , let the un-tombstones  rows be into a new ordered sstable ?



On Wed, Dec 1, 2010 at 7:30 PM, Sylvain Lebresne  wrote:

> On Wed, Dec 1, 2010 at 12:11 PM, Ying Tang  wrote:
> > And i have another question , what's the difference between minor
> > compaction and major compaction?
>
> A major compaction is a compaction that compact *all* the SSTables of a
> given
> column family (compaction compacts one CF at a time).
>
> Before https://issues.apache.org/jira/browse/CASSANDRA-1074
> (introduced in 0.6.6 and
> recent 0.7 betas/rcs), major compactions where the only ones that removed
> the
> tombstones (see http://wiki.apache.org/cassandra/DistributedDeletes)
> and this is the
> reason major compaction exists. Now, with #1074, minor compactions
> should remove most
> if not all tombstones, so major compaction are not or much less useful
> (it may depend on your
> workload though as minor can't always delete the tombstones).
>
> --
> Sylvain
>
> >
> > On 12/1/10, Chen Xinli  wrote:
> >> 2010/12/1 Ying Tang 
> >>
> >>> Every time cassandra creates a new sstable , it will call the
> >>> CompactionManager.submitMinorIfNeeded  ? And if the number of memtables
> is
> >>> beyond  MinimumCompactionThreshold  , the minor compaction will be
> called.
> >>> And there is also a method named CompactionManager.submitMajor , and
> the
> >>> call relationship is :
> >>>
> >>> NodeCmd -- > NodeProbe -->StorageService.forceTableCompaction -->
> >>> Table.forceCompaction -->CompactionManager.performMajor -->
> >>> CompactionManager.submitMajor
> >>>
> >>> ColumnFamilyStore.forceMajorCompaction -->
> CompactionManager.performMajor
> >>> --> CompactionManager.submitMajor
> >>>
> >>>
> >>> HintedHandOffManager
> >>>  --> CompactionManager.submitMajor
> >>>
> >>> So i have 3 questions:
> >>> 1. Once a new sstable has been created ,
> >>> CompactionManager.submitMinorIfNeeded  will be called , minorCompaction
> >>> maybe called .
> >>> But when will the majorCompaction be called ? Just the NodeCmd ?
> >>>
> >>
> >> Yes, majorCompaction must be called manually from NodeCmd
> >>
> >>
> >>> 2. Which jobs will minorCompaction and majorCompaction do ?
> >>> Will minorCompaction delete the data that have been marked as
> deleted
> >>> ?
> >>> And how about the major compaction ?
> >>>
> >>
> >> Compaction only mark sstables as deleted. Deletion will be done when
> there
> >> are full gc, or node restarted.
> >>
> >>
> >>> 3. When gc be called ? Every time compaction been called?
> >>>
> >>
> >> GC has nothing to do with compaction, you may mistake the two
> conceptions
> >>
> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>> Ivy Tang
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Best Regards,
> >> Chen Xinli
> >>
> >
> >
> > --
> > Best regards,
> >
> > Ivy Tang
> >
>



-- 
Best regards,

Ivy Tang

Re: Can not connect to cassandra 0.7 using CLI

try
 bin/cassandra-cli --host 

On Wed, Dec 1, 2010 at 7:29 PM, Joshua Partogi wrote:

> Hi there,
>
> I just downloaded cassandra 0.7rc1. I started it using bin/cassandra
> without making any configuration changes.
>
> I then tried to connect using the CLI with command like this:
>
> f...@ubuntu:~/Applications/apache-cassandra-0.7.0-rc1$ bin/cassandra-cli
> Welcome to cassandra CLI.
>
> Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
> [defa...@unknown] connect localhost/9160;
> Exception connecting to localhost/9160. Reason: Connection refused.
>
> Why am I getting connection refused? I didn't experience this with
> cassandra 0.6.8.
>
> Thank you in advance for your help.
>
> Kind regards,
> Joshua.
>
> --
> http://twitter.com/jpartogi 
>



-- 
Best regards,

Ivy Tang

Re: When to call the major compaction ?

I'm confused , plz ingore the mail above.
Here is my confusion ,
   posterior to 0.6.6/0.7  , minor compaction and major compaction both  can
clean out rows 'tagged'  tombstones  , and generate a new , without
tombstones , sstable .
And the tombstones remains in memory ,waiting to be removed by jvm gc .
Am i right?

On Wed, Dec 1, 2010 at 9:10 PM, Ying Tang  wrote:

> 1. So posterior to 0.6.6/0.7 ,  minor compaction and major compaction both
> can clean out rows 'tagged'  tombstones , this kind of clean out doesn't
> mead remove it from the disk permanently.
> The real remove is done by the jvm GC ?
> 2. The intence of compaction is merging multi sstables into one , clean out
> the tombstone , let the un-tombstones  rows be into a new ordered sstable ?
>
>
>
> On Wed, Dec 1, 2010 at 7:30 PM, Sylvain Lebresne wrote:
>
>> On Wed, Dec 1, 2010 at 12:11 PM, Ying Tang  wrote:
>> > And i have another question , what's the difference between minor
>> > compaction and major compaction?
>>
>> A major compaction is a compaction that compact *all* the SSTables of a
>> given
>> column family (compaction compacts one CF at a time).
>>
>> Before https://issues.apache.org/jira/browse/CASSANDRA-1074
>> (introduced in 0.6.6 and
>> recent 0.7 betas/rcs), major compactions where the only ones that removed
>> the
>> tombstones (see http://wiki.apache.org/cassandra/DistributedDeletes)
>> and this is the
>> reason major compaction exists. Now, with #1074, minor compactions
>> should remove most
>> if not all tombstones, so major compaction are not or much less useful
>> (it may depend on your
>> workload though as minor can't always delete the tombstones).
>>
>> --
>> Sylvain
>>
>> >
>> > On 12/1/10, Chen Xinli  wrote:
>> >> 2010/12/1 Ying Tang 
>> >>
>> >>> Every time cassandra creates a new sstable , it will call the
>> >>> CompactionManager.submitMinorIfNeeded  ? And if the number of
>> memtables is
>> >>> beyond  MinimumCompactionThreshold  , the minor compaction will be
>> called.
>> >>> And there is also a method named CompactionManager.submitMajor , and
>> the
>> >>> call relationship is :
>> >>>
>> >>> NodeCmd -- > NodeProbe -->StorageService.forceTableCompaction -->
>> >>> Table.forceCompaction -->CompactionManager.performMajor -->
>> >>> CompactionManager.submitMajor
>> >>>
>> >>> ColumnFamilyStore.forceMajorCompaction -->
>> CompactionManager.performMajor
>> >>> --> CompactionManager.submitMajor
>> >>>
>> >>>
>> >>> HintedHandOffManager
>> >>>  --> CompactionManager.submitMajor
>> >>>
>> >>> So i have 3 questions:
>> >>> 1. Once a new sstable has been created ,
>> >>> CompactionManager.submitMinorIfNeeded  will be called ,
>> minorCompaction
>> >>> maybe called .
>> >>> But when will the majorCompaction be called ? Just the NodeCmd ?
>> >>>
>> >>
>> >> Yes, majorCompaction must be called manually from NodeCmd
>> >>
>> >>
>> >>> 2. Which jobs will minorCompaction and majorCompaction do ?
>> >>> Will minorCompaction delete the data that have been marked as
>> deleted
>> >>> ?
>> >>> And how about the major compaction ?
>> >>>
>> >>
>> >> Compaction only mark sstables as deleted. Deletion will be done when
>> there
>> >> are full gc, or node restarted.
>> >>
>> >>
>> >>> 3. When gc be called ? Every time compaction been called?
>> >>>
>> >>
>> >> GC has nothing to do with compaction, you may mistake the two
>> conceptions
>> >>
>> >>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best regards,
>> >>>
>> >>> Ivy Tang
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Best Regards,
>> >> Chen Xinli
>> >>
>> >
>> >
>> > --
>> > Best regards,
>> >
>> > Ivy Tang
>> >
>>
>
>
>
> --
> Best regards,
>
> Ivy Tang
>
>
>
>


-- 
Best regards,

Ivy Tang

Re: Can not connect to cassandra 0.7 using CLI

2010-12-01 Thread Brayton Thompson

All of the times I have had similar issues the problem has always been 
misconfigured iptables. You said it was running fine on 0.6.8 though?
On the same box or a different box?

On Dec 1, 2010, at 6:29 AM, Joshua Partogi wrote:

> Hi there,
> 
> I just downloaded cassandra 0.7rc1. I started it using bin/cassandra without 
> making any configuration changes.
> 
> I then tried to connect using the CLI with command like this:
> 
> f...@ubuntu:~/Applications/apache-cassandra-0.7.0-rc1$ bin/cassandra-cli
> Welcome to cassandra CLI.
> 
> Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
> [defa...@unknown] connect localhost/9160;
> Exception connecting to localhost/9160. Reason: Connection refused.
> 
> Why am I getting connection refused? I didn't experience this with cassandra 
> 0.6.8.
> 
> Thank you in advance for your help.
> 
> Kind regards,
> Joshua.
> 
> -- 
> http://twitter.com/jpartogi

Sorted Integer -> UUID

2010-12-01 Thread Benjamin Waldher


I have a fairly simple problem that might require a complicated solution.

I need to store Integer -> UUID in a column family, and be able to query 
(and then paginate) the rows ordered by the integer in descending order. 
This is simple enough if no two rows have the same integer, as the 
integer could be a column name which can easily be sorted. However, in 
my scenario, two rows may have the same Integer value. As such, I would 
need to use the integer as the key in the column family. However, this 
means I must use OrderPreservingPartitioner, which is going to cause a 
huge load imbalance on one of my nodes.


How can I have a sorted set of rows of Integer -> UUID where the integer 
may exist many times?

Re: When to call the major compaction ?

2010-12-01 Thread Nick Bailey

The part about gc refers to old sstable files on disk. After a compaction,
the old files on disk will be deleted when garbage collection happens.

On Wed, Dec 1, 2010 at 7:31 AM, Ying Tang  wrote:

> I'm confused , plz ingore the mail above.
> Here is my confusion ,
>posterior to 0.6.6/0.7  , minor compaction and major compaction both
> can clean out rows 'tagged'  tombstones  , and generate a new , without
> tombstones , sstable .
> And the tombstones remains in memory ,waiting to be removed by jvm gc .
> Am i right?
>
> On Wed, Dec 1, 2010 at 9:10 PM, Ying Tang  wrote:
>
>> 1. So posterior to 0.6.6/0.7 ,  minor compaction and major compaction
>> both  can clean out rows 'tagged'  tombstones , this kind of clean out
>> doesn't mead remove it from the disk permanently.
>> The real remove is done by the jvm GC ?
>> 2. The intence of compaction is merging multi sstables into one , clean
>> out the tombstone , let the un-tombstones  rows be into a new ordered
>> sstable ?
>>
>>
>>
>> On Wed, Dec 1, 2010 at 7:30 PM, Sylvain Lebresne wrote:
>>
>>> On Wed, Dec 1, 2010 at 12:11 PM, Ying Tang 
>>> wrote:
>>> > And i have another question , what's the difference between minor
>>> > compaction and major compaction?
>>>
>>> A major compaction is a compaction that compact *all* the SSTables of a
>>> given
>>> column family (compaction compacts one CF at a time).
>>>
>>> Before https://issues.apache.org/jira/browse/CASSANDRA-1074
>>> (introduced in 0.6.6 and
>>> recent 0.7 betas/rcs), major compactions where the only ones that removed
>>> the
>>> tombstones (see http://wiki.apache.org/cassandra/DistributedDeletes)
>>> and this is the
>>> reason major compaction exists. Now, with #1074, minor compactions
>>> should remove most
>>> if not all tombstones, so major compaction are not or much less useful
>>> (it may depend on your
>>> workload though as minor can't always delete the tombstones).
>>>
>>> --
>>> Sylvain
>>>
>>> >
>>> > On 12/1/10, Chen Xinli  wrote:
>>> >> 2010/12/1 Ying Tang 
>>> >>
>>> >>> Every time cassandra creates a new sstable , it will call the
>>> >>> CompactionManager.submitMinorIfNeeded  ? And if the number of
>>> memtables is
>>> >>> beyond  MinimumCompactionThreshold  , the minor compaction will be
>>> called.
>>> >>> And there is also a method named CompactionManager.submitMajor , and
>>> the
>>> >>> call relationship is :
>>> >>>
>>> >>> NodeCmd -- > NodeProbe -->StorageService.forceTableCompaction -->
>>> >>> Table.forceCompaction -->CompactionManager.performMajor -->
>>> >>> CompactionManager.submitMajor
>>> >>>
>>> >>> ColumnFamilyStore.forceMajorCompaction -->
>>> CompactionManager.performMajor
>>> >>> --> CompactionManager.submitMajor
>>> >>>
>>> >>>
>>> >>> HintedHandOffManager
>>> >>>  --> CompactionManager.submitMajor
>>> >>>
>>> >>> So i have 3 questions:
>>> >>> 1. Once a new sstable has been created ,
>>> >>> CompactionManager.submitMinorIfNeeded  will be called ,
>>> minorCompaction
>>> >>> maybe called .
>>> >>> But when will the majorCompaction be called ? Just the NodeCmd ?
>>> >>>
>>> >>
>>> >> Yes, majorCompaction must be called manually from NodeCmd
>>> >>
>>> >>
>>> >>> 2. Which jobs will minorCompaction and majorCompaction do ?
>>> >>> Will minorCompaction delete the data that have been marked as
>>> deleted
>>> >>> ?
>>> >>> And how about the major compaction ?
>>> >>>
>>> >>
>>> >> Compaction only mark sstables as deleted. Deletion will be done when
>>> there
>>> >> are full gc, or node restarted.
>>> >>
>>> >>
>>> >>> 3. When gc be called ? Every time compaction been called?
>>> >>>
>>> >>
>>> >> GC has nothing to do with compaction, you may mistake the two
>>> conceptions
>>> >>
>>> >>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Best regards,
>>> >>>
>>> >>> Ivy Tang
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards,
>>> >> Chen Xinli
>>> >>
>>> >
>>> >
>>> > --
>>> > Best regards,
>>> >
>>> > Ivy Tang
>>> >
>>>
>>
>>
>>
>> --
>> Best regards,
>>
>> Ivy Tang
>>
>>
>>
>>
>
>
> --
> Best regards,
>
> Ivy Tang
>
>
>
>

Re: When to call the major compaction ?

2010-12-01 Thread Chen Xinli

2010/12/1 Ying Tang 

> I'm confused , plz ingore the mail above.
> Here is my confusion ,
>posterior to 0.6.6/0.7  , minor compaction and major compaction both
> can clean out rows 'tagged'  tombstones  , and generate a new , without
> tombstones , sstable .
>

This is right.


> And the tombstones remains in memory ,waiting to be removed by jvm gc .
> Am i right?
>

No! Compactions merge several old sstables into one, and mark old sstables
as deleted which will be deleted while jvm gc.
SSTable are files on harddisk, nothing to do with memory. You'd better have
a look at Google's bigtable paper.


>
> On Wed, Dec 1, 2010 at 9:10 PM, Ying Tang  wrote:
>
>> 1. So posterior to 0.6.6/0.7 ,  minor compaction and major compaction
>> both  can clean out rows 'tagged'  tombstones , this kind of clean out
>> doesn't mead remove it from the disk permanently.
>> The real remove is done by the jvm GC ?
>> 2. The intence of compaction is merging multi sstables into one , clean
>> out the tombstone , let the un-tombstones  rows be into a new ordered
>> sstable ?
>>
>>
>>
>> On Wed, Dec 1, 2010 at 7:30 PM, Sylvain Lebresne wrote:
>>
>>> On Wed, Dec 1, 2010 at 12:11 PM, Ying Tang 
>>> wrote:
>>> > And i have another question , what's the difference between minor
>>> > compaction and major compaction?
>>>
>>> A major compaction is a compaction that compact *all* the SSTables of a
>>> given
>>> column family (compaction compacts one CF at a time).
>>>
>>> Before https://issues.apache.org/jira/browse/CASSANDRA-1074
>>> (introduced in 0.6.6 and
>>> recent 0.7 betas/rcs), major compactions where the only ones that removed
>>> the
>>> tombstones (see http://wiki.apache.org/cassandra/DistributedDeletes)
>>> and this is the
>>> reason major compaction exists. Now, with #1074, minor compactions
>>> should remove most
>>> if not all tombstones, so major compaction are not or much less useful
>>> (it may depend on your
>>> workload though as minor can't always delete the tombstones).
>>>
>>> --
>>> Sylvain
>>>
>>> >
>>> > On 12/1/10, Chen Xinli  wrote:
>>> >> 2010/12/1 Ying Tang 
>>> >>
>>> >>> Every time cassandra creates a new sstable , it will call the
>>> >>> CompactionManager.submitMinorIfNeeded  ? And if the number of
>>> memtables is
>>> >>> beyond  MinimumCompactionThreshold  , the minor compaction will be
>>> called.
>>> >>> And there is also a method named CompactionManager.submitMajor , and
>>> the
>>> >>> call relationship is :
>>> >>>
>>> >>> NodeCmd -- > NodeProbe -->StorageService.forceTableCompaction -->
>>> >>> Table.forceCompaction -->CompactionManager.performMajor -->
>>> >>> CompactionManager.submitMajor
>>> >>>
>>> >>> ColumnFamilyStore.forceMajorCompaction -->
>>> CompactionManager.performMajor
>>> >>> --> CompactionManager.submitMajor
>>> >>>
>>> >>>
>>> >>> HintedHandOffManager
>>> >>>  --> CompactionManager.submitMajor
>>> >>>
>>> >>> So i have 3 questions:
>>> >>> 1. Once a new sstable has been created ,
>>> >>> CompactionManager.submitMinorIfNeeded  will be called ,
>>> minorCompaction
>>> >>> maybe called .
>>> >>> But when will the majorCompaction be called ? Just the NodeCmd ?
>>> >>>
>>> >>
>>> >> Yes, majorCompaction must be called manually from NodeCmd
>>> >>
>>> >>
>>> >>> 2. Which jobs will minorCompaction and majorCompaction do ?
>>> >>> Will minorCompaction delete the data that have been marked as
>>> deleted
>>> >>> ?
>>> >>> And how about the major compaction ?
>>> >>>
>>> >>
>>> >> Compaction only mark sstables as deleted. Deletion will be done when
>>> there
>>> >> are full gc, or node restarted.
>>> >>
>>> >>
>>> >>> 3. When gc be called ? Every time compaction been called?
>>> >>>
>>> >>
>>> >> GC has nothing to do with compaction, you may mistake the two
>>> conceptions
>>> >>
>>> >>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Best regards,
>>> >>>
>>> >>> Ivy Tang
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards,
>>> >> Chen Xinli
>>> >>
>>> >
>>> >
>>> > --
>>> > Best regards,
>>> >
>>> > Ivy Tang
>>> >
>>>
>>
>>
>>
>> --
>> Best regards,
>>
>> Ivy Tang
>>
>>
>>
>>
>
>
> --
> Best regards,
>
> Ivy Tang
>
>
>
>


-- 
Best Regards,
Chen Xinli

Re: TheGC inspector's frequency

2010-12-01 Thread Jonathan Ellis

The key statement:

if (gcw.getDuration() > MIN_DURATION_TPSTATS)
{
logStats();
}

On Wed, Dec 1, 2010 at 2:44 AM, Ying Tang  wrote:
> The GCInspector's start() method ,
> In this method ,
> StorageService.scheduledTasks.scheduleWithFixedDelay(t,
> INTERVAL_IN_MS, INTERVAL_IN_MS, TimeUnit.MILLISECONDS);
> t is Runnable t and it's run method is logIntervalGCStats.
> According to this code segment , the logIntervalGCStats should be run
> every second.
> But the log of cassandra  shows the logIntervalGCStats didn't run
> every second,it's  disorder .
> How this happened?
>
> --
> Best regards,
>
> Ivy Tang
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Sorted Integer -> UUID

2010-12-01 Thread Daniel Lundin

Unless I misunderstand the Q, composing the column names with the row
keys and merging the resulting would yield something useful.

keyA => (1, uuid), (2, uuid), (3, uid)
keyB => (1, uuid), (2, uuid), (3, uid)

Should be transformed into:

 (1, keyA, uuid),
 (1, keyB, uuid),
 (2, keyA, uuid),
 (2, keyB, uuid),
 (3, keyA, uuid),
 (3, keyB, uuid)

map + merge to the rescue.

On Wed, Dec 1, 2010 at 3:33 PM, Benjamin Waldher  wrote:
> I have a fairly simple problem that might require a complicated solution.
>
> I need to store Integer -> UUID in a column family, and be able to query
> (and then paginate) the rows ordered by the integer in descending order.
> This is simple enough if no two rows have the same integer, as the integer
> could be a column name which can easily be sorted. However, in my scenario,
> two rows may have the same Integer value. As such, I would need to use the
> integer as the key in the column family. However, this means I must use
> OrderPreservingPartitioner, which is going to cause a huge load imbalance on
> one of my nodes.
>
> How can I have a sorted set of rows of Integer -> UUID where the integer may
> exist many times?
>

Re: [RELEASE] 0.7.0 rc1

2010-12-01 Thread Olivier Rosello

> FYI, 0.7.0~rc1 debs are available in a new PPA for experimental
> releases:
> 
> http://launchpad.net/~cassandra-ubuntu/+archive/experimental
> 

It seems there is a dependancy on libjets3t-java

Is it really needed ? This dependancy cannot be resolved on Ubuntu Lucid :-(

Re: C++ client for Cassandra

2010-12-01 Thread Jonathan Ellis

There is https://github.com/posulliv/libcassandra, but I think it's
0.6 only atm.

On Wed, Dec 1, 2010 at 12:13 AM, Narendra Sharma
 wrote:
> Are there any C++ clients out there similar to Hector (in terms of features)
> for Cassandra? I am looking for C++ Client for Cassandra 0.7.
>
> Thanks,
> Naren
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: C++ client for Cassandra

2010-12-01 Thread David Replogle

And I've contacted Padraig and he has no intention of upgrading to 0.7. I'm 
working heavily in C++ and Cassandra, though, so hopefully I can contribute in 
some way eventually. I may be able to help a little bit with C++ and Cassandra 
if you're totally stuck, but I'm basically just using thrift, no wizardry 
really going on.

David
Sent from my iPhone

On Dec 1, 2010, at 11:34 AM, Jonathan Ellis  wrote:

> There is https://github.com/posulliv/libcassandra, but I think it's
> 0.6 only atm.
> 
> On Wed, Dec 1, 2010 at 12:13 AM, Narendra Sharma
>  wrote:
>> Are there any C++ clients out there similar to Hector (in terms of features)
>> for Cassandra? I am looking for C++ Client for Cassandra 0.7.
>> 
>> Thanks,
>> Naren
>> 
>> 
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

Re: Is there any way to store muti-version data based on the timestamp?

2010-12-01 Thread Ed Anuff

If you go this route, be sure to take a look at the custom column comparator
I wrote to make this sort of thing easier:

https://github.com/edanuff/CassandraCompositeType

On Wed, Dec 1, 2010 at 4:56 AM, Daniel Lundin  wrote:

> You could also use a standard column family, composing the version
> into the column name:
>
>  foo => { bar:v1 => data, bar:v2 => data, bar:v3 => data }
>
> Here, there's a cost on retrieval of course, which may or may not work
> depending on your access pattern. If you do large slices, it's
> probably not an option. It could be feasible to write a custom
> comparator sorting on some version component, to allow efficient
> slicing of the "latest" versions.
>
>

How to shutdown Cassandra?

2010-12-01 Thread rambabu pakala

Hi,
 
Can someone please let me know how to shutdown Cassandra on Windows Environment?
 
stop-server is actually kiing the Cassandra Server and I was unable to 
create/get a pid file for the Cassandra process. Are there any setup steps that 
are needed that I am missing?
 
Thanks,
-Ram.

Re: Is there any way to store muti-version data based on the timestamp?

2010-12-01 Thread Robert Coli


On 12/1/10 4:56 AM, Daniel Lundin wrote:

Correct. Unlike BigTable and HBase, Cassandra columns don't have a
version dimension.
Timestamp is used for (crude) conflict resolution, and older versions
are always overwritten.
I would be careful with the word "overwritten" here as it obfuscated the 
immutability of SSTables, and conceptual understanding of same is 
important to understanding the actual versioning behavior and how it 
relates to what data has to be read to satisfy queries. :)


=Rob

How to shutdown Cassandra on Ubuntu?

2010-12-01 Thread Melton Low

I am just starting to play with Cassandra.  My environment is Ubuntu Lucid
10.04 and latest Cassandra stable 0.6.8 binaries.

I am unclear on how to shut down the Cassandra server. Documentation is
quite clear on starting up the server.  I didn't find anything useful on the
user mailing list archive. No luck Googling as well unless I missed it.

A pointer would be appreciated.

Mel

Re: How to shutdown Cassandra on Ubuntu?

2010-12-01 Thread Rafał Krupiński

On Wed, Dec 1, 2010 at 18:39, Melton Low  wrote:
> I am just starting to play with Cassandra.  My environment is Ubuntu Lucid
> 10.04 and latest Cassandra stable 0.6.8 binaries.
> I am unclear on how to shut down the Cassandra server. Documentation is
> quite clear on starting up the server.  I didn't find anything useful on the
> user mailing list archive. No luck Googling as well unless I missed it.
> A pointer would be appreciated.
> Mel

Debian/Ubuntu specific way:
sudo invoke-rc.d cassandra stop

-- 
Pozdrawiam / Best Regards
Rafal Krupinski

Re: C++ client for Cassandra

2010-12-01 Thread Chris Trimble

Are there any that compile on Windows without the need for linking in
cygwin?

  C

On Tue, Nov 30, 2010 at 10:16 PM, sharanabasava raddi
wrote:

> Thrift is there..
>
>
> On Wed, Dec 1, 2010 at 11:43 AM, Narendra Sharma <
> narendra.sha...@gmail.com> wrote:
>
>> Are there any C++ clients out there similar to Hector (in terms of
>> features) for Cassandra? I am looking for C++ Client for Cassandra 0.7.
>>
>> Thanks,
>> Naren
>>
>>
>>
>

Re: C++ client for Cassandra

2010-12-01 Thread Adi

You can look at this patch. It has a patched version for thrift revision
818530. You will need to apply the patch to the thrift version which your
cassandra release is using. It is thrift revision 917130 for the later
releases of the 0.6 branch not sure about 0.7.

https://issues.apache.org/jira/browse/THRIFT-591

-Adi

On Wed, Dec 1, 2010 at 1:01 PM, Chris Trimble  wrote:

> Are there any that compile on Windows without the need for linking in
> cygwin?
>
>   C
>
>
> On Tue, Nov 30, 2010 at 10:16 PM, sharanabasava raddi  > wrote:
>
>> Thrift is there..
>>
>>
>> On Wed, Dec 1, 2010 at 11:43 AM, Narendra Sharma <
>> narendra.sha...@gmail.com> wrote:
>>
>>> Are there any C++ clients out there similar to Hector (in terms of
>>> features) for Cassandra? I am looking for C++ Client for Cassandra 0.7.
>>>
>>> Thanks,
>>> Naren
>>>
>>>
>>>
>>
>

Data Model Question

2010-12-01 Thread Pablo D. Salgado

Hello,

I need to store "products" data (product.name, product.price, product.state
and product.owner) in Cassandra 0.7 rc1.
The problem is that I need to get "products"  where product.price > XX AND
product.price < XX AND product.name = XXX AND product.state = XXX. Also I
need return the products with pagination sorted by one of their differents
fields (product.name, product.price, product.state or product.owner). This
is a for an "advance product search" functionality.
- I know that I can do the WHERE clause with secondary index of Cassandra
0.7 but I can't make the pagination because I don't know how to implement
the "previuos" functionality for Row Pagination. (I can use OPP if needed)
- Also I know how to do pagination on columns but I can't do the WHERE
clause with more than two fields because the result may be not sorted by the
correct field.
Do you have any idea how to do the data model to reach this requirement?

Thank you in advance,

Pablo D. Salgado
psalg...@colpix.net
http://www.colpix.net

Re: How to shutdown Cassandra?

There is no "shutdown" command in cassandra, it's designed to be stopped by killing it. The pid file is created by the *nix scripts and is not supported by the cassandra.bat file for windows. There have been a couple of discussions on running cassandra as a service under windows...http://www.mail-archive.com/user@cassandra.apache.org/msg01765.htmlhttp://www.mail-archive.com/user@cassandra.apache.org/msg04021.htmlhttp://coderjournal.com/2010/06/run-cassandra-as-a-windows-service/Hope that helps. AaronOn 02 Dec, 2010,at 06:14 AM, rambabu pakala  wrote:Hi,

Can someone please let me know how to shutdown Cassandra on Windows Environment?

stop-server is actually kiing the Cassandra Server and I was unable to create/get a pid file for the Cassandra process. Are there any setup steps that are needed that I am missing?

Thanks,
-Ram.

Range Queries in RP on SCF in 0.7 with UUID SCs

2010-12-01 Thread Frank LoVecchio

Is it possible to perform paginated queries using Random Partitioner in 0.7
with Super Column Families whose Super Columns are UUID's?  I don't believe
it is, based on this article:
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner,
and my attempts with Pelops.

Let's say I get the last 25 inserts for a key.  I now know the 25th Super
Column UUID, and I want to start from there and get the next 25 inserts.  Do
I have to change over to Order Preserving Partitioner?  I'd like to avoid
this at all costs.

Thanks,

Frank

Re: GC Exceptions and cluster nodes are dying

Running nodes with different JVM heap sizes would not be recommended practice, for many reasons. Nor would I recommend running them with all the memory the machine has, it will just lead to the OS swapping the JVM out to disk and considerable slow things down.I would suggest a heap size of 1.5 or 2.0 GB for each node, and have a read of the JVM Heap Size section here http://wiki.apache.org/cassandra/MemtableThresholds . AFAIK the logs are showing your cluster was under heavy GC pressure. Finally, the ActiveCount error message was a known issue in beta 2. Treat yourself and try RC1 :)http://www.mail-archive.com/user@cassandra.apache.org/msg06298.htmlAaronOn 02 Dec, 2010,at 12:33 AM, asakka  wrote:
Hello,

I'm making some tests on a  data model with 3 CF and 1 SCF,  I want to start
by inserting 1 million rows (my target is to have 1billion rows) . 
I have three nodes cluster  (I'm using the same machines with 3GB of RAM
each , intel core2 duo 1,6GHZ), RF = 2, CL = 1, HEAPSIZE of the seed = 3GO
(it was 1.5GO, I've doubled it to avoid the heap size exception I had) , the
other two nodes are 1.5GO. 
 
I am using cassandra (V0.7.0-beta2) and Hector (V0.7.0.18) . I'm making
insertion in batch mode using hector Mutator.
My disk_access_mode is standard.
I reduced also my memtable_throughput_in_mb to 64, but the problem persists
and I have the following exception :
I want to know if it is a configuration or hardware problem ?

INFO [Timer-0] 2010-12-01 10:34:42,124 Gossiper.java (line 196) InetAddress
/10.0.100.215 is now dead.
 INFO [GOSSIP_STAGE:1] 2010-12-01 10:34:44,188 Gossiper.java (line 594) Node
/10.0.100.215 has restarted, now UP again
 INFO [GOSSIP_STAGE:1] 2010-12-01 10:34:44,189 StorageService.java (line
643) Node /10.0.100.215 state jump to normal
 INFO [GOSSIP_STAGE:1] 2010-12-01 10:34:44,189 StorageService.java (line
650) Will not change my token ownership to /10.0.100.215
 INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:34:44,189
HintedHandOffManager.java (line 196) Started hinted handoff for endpoint
/10.0.100.215
 INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:34:44,189
HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to
endpoint /10.0.100.215
 INFO [GC inspection] 2010-12-01 10:40:29,141 GCInspector.java (line 129) GC
for ParNew: 750 ms, 14693208 reclaimed leaving 2140055192 used; max is
3355312128
 INFO [GC inspection] 2010-12-01 10:40:30,280 GCInspector.java (line 129) GC
for ParNew: 445 ms, 17042288 reclaimed leaving 2178211008 used; max is
3355312128
 INFO [WRITE-/10.0.100.214] 2010-12-01 10:40:31,552
OutboundTcpConnection.java (line 115) error writing to /10.0.100.214
 INFO [GC inspection] 2010-12-01 10:40:32,280 GCInspector.java (line 129) GC
for ParNew: 211 ms, 25550568 reclaimed leaving 2235227312 used; max is
3355312128
 INFO [GC inspection] 2010-12-01 10:40:34,320 GCInspector.java (line 129) GC
for ParNew: 290 ms, 26512896 reclaimed leaving 2277013184 used; max is
3355312128
 INFO [GC inspection] 2010-12-01 10:40:35,950 GCInspectorjava (line 129) GC
for ParNew: 506 ms, 24319976 reclaimed leaving 2303739672 used; max is
3355312128
 INFO [GC inspection] 2010-12-01 10:40:37,202 GCInspector.java (line 129) GC
for ParNew: 462 ms, 31759008 reclaimed leaving 2306914712 used; max is
3355312128
 INFO [GC inspection] 2010-12-01 10:40:42,629 GCInspector.java (line 129) GC
for ParNew: 445 ms, 14769312 reclaimed leaving 2327064920 used; max is
3355312128
 INFO [GC inspection] 2010-12-01 10:40:43,969 GCInspector.java (line 129) GC
for ParNew: 720 ms, 14804208 reclaimed leaving 2366434112 used; max is
3355312128
 INFO [GC inspection] 2010-12-01 10:40:45,372 GCInspector.java (line 129) GC
for ParNew: 325 ms, 23112128 reclaimed leaving 2421032952 used; max is
3355312128
 INFO [GC inspection] 2010-12-01 10:40:47,843 GCInspector.java (line 129) GC
for ParNew: 801 ms, 26014296 reclaimed leaving 2474278880 used; max is
3355312128
 INFO [Timer-0] 2010-12-01 10:41:18,451 Gossiper.java (line 196) InetAddress
/10.0.100.215 is now dead.
 INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:41:19,362
HintedHandOffManager.java (line 196) Started hinted handoff for endpoint
/10.0.100.215
 INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:41:19,975
HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to
endpoint /10.0.100.215
 INFO [GOSSIP_STAGE:1] 2010-12-01 10:41:19,506 Gossiper.java (line 580)
InetAddress /10.0.100.215 is now UP
 INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:28,873 SSTablejava (line
145) Deleted /var/lib/cassandra/data/SAE4/Document-e-20-<>
 INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:28,952 SSTable.java (line
145) Deleted /var/lib/cassandra/data/system/LocationInfo-e-148-<>
 INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,053 SSTable.java (line
145) Deleted /var/lib/cassandra/data/SAE4/Account-e-7-<>
 INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,163 SSTable.java (line
145) Deleted /var/lib/cassandra/data/SAE4/Account-e-12-<>
 INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,2

Re: Can not connect to cassandra 0.7 using CLI

2010-12-01 Thread Joshua Partogi

Hi Brayton.

Thanks for the reply. It was running find on 0.6.8 on the same box.

Kind regards,
Joshua

On Thu, Dec 2, 2010 at 1:05 AM, Brayton Thompson wrote:

> All of the times I have had similar issues the problem has always been
> misconfigured iptables. You said it was running fine on 0.6.8 though?
> On the same box or a different box?
>
> On Dec 1, 2010, at 6:29 AM, Joshua Partogi wrote:
>
> Hi there,
>
> I just downloaded cassandra 0.7rc1. I started it using bin/cassandra
> without making any configuration changes.
>
> I then tried to connect using the CLI with command like this:
>
> f...@ubuntu:~/Applications/apache-cassandra-0.7.0-rc1$ bin/cassandra-cli
> Welcome to cassandra CLI.
>
> Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
> [defa...@unknown] connect localhost/9160;
> Exception connecting to localhost/9160. Reason: Connection refused.
>
> Why am I getting connection refused? I didn't experience this with
> cassandra 0.6.8.
>
> Thank you in advance for your help.
>
> Kind regards,
> Joshua.
>
> --
> http://twitter.com/jpartogi 
>
>
>


-- 
http://twitter.com/jpartogi

Re: Sorted Integer -> UUID

Could you use a Super CF?Super Col name is the Integer, and the Col Names are the UUID. Not sure what your col values are or your key. There are some limitations to Super CF but I do not think they would apply in this case http://wiki.apache.org/cassandra/CassandraLimitationsYou can the slice the super col names (your integers) and get back the super col and all it's columns. Or you could also use a two CF solution...Index CF where you integer is the column name, not sure what your key is. The column value is not important.  Item CF where the row key is the Integer, col names are the UUID not sure what the col value is. Some things to consider...- is there a natural grouping to your integers ? e.g. every day- what is the column value ? Will this make for big rows?Hope that helps. AaronOn 02 Dec, 2010,at 04:56 AM, Daniel Lundin  wrote:Unless I misunderstand the Q, composing the column names with the row
keys and merging the resulting would yield something useful.

keyA => (1, uuid), (2, uuid), (3, uid)
keyB => (1, uuid), (2, uuid), (3, uid)

Should be transformed into:

 (1, keyA, uuid),
 (1, keyB, uuid),
 (2, keyA, uuid),
 (2, keyB, uuid),
 (3, keyA, uuid),
 (3, keyB, uuid)

map + merge to the rescue.

On Wed, Dec 1, 2010 at 3:33 PM, Benjamin Waldher  wrote:
> I have a fairly simple problem that might require a complicated solution.
>
> I need to store Integer -> UUID in a column family, and be able to query
> (and then paginate) the rows ordered by the integer in descending order.
> This is simple enough if no two rows have the same integer, as the integer
> could be a column name which can easily be sorted. However, in my scenario,
> two rows may have the same Integer value. As such, I would need to use the
> integer as the key in the column family. However, this means I must use
> OrderPreservingPartitioner, which is going to cause a huge load imbalance on
> one of my nodes.
>
> How can I have a sorted set of rows of Integer -> UUID where the integer may
> exist many times?
>

Re: Can not connect to cassandra 0.7 using CLI

Take a look at your cassandra.yaml file at the rpc_address this is the address it's listening to connections on. The comments there should help, if you set it to 0.0.0.0 it will bind to all interfaces.  Probably not what you want in production but handy for dev.Hope that helps. AaronOn 02 Dec, 2010,at 09:40 AM, Joshua Partogi  wrote:Hi Brayton.Thanks for the reply. It was running find on 0.6.8 on the same box.Kind regards,JoshuaOn Thu, Dec 2, 2010 at 1:05 AM, Brayton Thompson  wrote:
All of the times I have had similar issues the problem has always been misconfigured iptables. You said it was running fine on 0.6.8 though?
On the same box or a different box?On Dec 1, 2010, at 6:29 AM, Joshua Partogi wrote:Hi there,I just downloaded cassandra 0.7rc1. I started it using bin/cassandra without making any configuration changes.
I then tried to connect using the CLI with command like this:f...@ubuntu:~/Applications/apache-cassandra-0.7.0-rc1$ bin/cassandra-cli
Welcome to cassandra CLI.Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.[defa...@unknown] connect localhost/9160;Exception connecting to localhost/9160. Reason: Connection refused.

Why am I getting connection refused? I didn't experience this with cassandra 0.6.8.Thank you in advance for your help.Kind regards,Joshua.-- http://twitter.com/jpartogi


-- http://twitter.com/jpartogi

Re: Can not connect to cassandra 0.7 using CLI

2010-12-01 Thread Joshua Partogi

It is set to localhost I didn't change it and it is the same as configured
in 0.6.8. Why doesn't it work out of the box?

Thanks heaps.

On Thu, Dec 2, 2010 at 7:49 AM, Aaron Morton wrote:

> Take a look at your cassandra.yaml file at the rpc_address this is the
> address it's listening to connections on. The comments there should help, if
> you set it to 0.0.0.0 it will bind to all interfaces.  Probably not what you
> want in production but handy for dev.
>
> Hope that helps.
> Aaron
>
> On 02 Dec, 2010,at 09:40 AM, Joshua Partogi  wrote:
>
> Hi Brayton.
>
> Thanks for the reply. It was running find on 0.6.8 on the same box.
>
> Kind regards,
> Joshua
>
> On Thu, Dec 2, 2010 at 1:05 AM, Brayton Thompson wrote:
>
>> All of the times I have had similar issues the problem has always been
>> misconfigured iptables. You said it was running fine on 0.6.8 though?
>> On the same box or a different box?
>>
>>
>> On Dec 1, 2010, at 6:29 AM, Joshua Partogi wrote:
>>
>> Hi there,
>>
>> I just downloaded cassandra 0.7rc1. I started it using bin/cassandra
>> without making any configuration changes.
>>
>> I then tried to connect using the CLI with command like this:
>>
>> f...@ubuntu:~/Applications/apache-cassandra-0.7.0-rc1$ bin/cassandra-cli
>> Welcome to cassandra CLI.
>>
>> Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
>> [defa...@unknown] connect localhost/9160;
>> Exception connecting to localhost/9160. Reason: Connection refused.
>>
>> Why am I getting connection refused? I didn't experience this with
>> cassandra 0.6.8.
>>
>> Thank you in advance for your help.
>>
>> Kind regards,
>> Joshua.
>>
>> --
>> http://twitter.com/jpartogi 
>>
>>
>>
>
>
> --
> http://twitter.com/jpartogi
>
>


-- 
http://twitter.com/jpartogi

Re: Range Queries in RP on SCF in 0.7 with UUID SCs

The Partitioner applies to the row keys, not the columns. Their order is determined by the compare_with and compare_subcolumns_with CF settings So where you say "get the last 25 inserts for a key" I'm translating that into "get the most recent 25 super columns for a row, where the super column names are UUID's and the CF definition has compare_with:  TimeUUIDType" You can send a get_slice where the SliceRange has count=25 and reversed=True. If you know the 25th col name you can use it as the start param, and AFAIK the operation would go faster. Not sure how this translates into calls against Pelops. This assumes you are using time / v1 UUID's that have an increasing order. Hope that helps. AaronOn 02 Dec, 2010,at 09:19 AM, Frank LoVecchio  wrote:Is it possible to perform paginated queries using Random Partitioner in 0.7 with Super Column Families whose Super Columns are UUID's?  I don't believe it is, based on this article: http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner, and my attempts with Pelops.
Let's say I get the last 25 inserts for a key.  I now know the 25th Super Column UUID, and I want to start from there and get the next 25 inserts.  Do I have to change over to Order Preserving Partitioner?  I'd like to avoid this at all costs.
Thanks,Frank

Re: Range Queries in RP on SCF in 0.7 with UUID SCs

2010-12-01 Thread Frank LoVecchio

Hey Aaron,


Yes, in regards to SCF definition, you are correct:


name: Sensor

  column_type: Super

  compare_with: TimeUUIDType

  gc_grace_seconds: 864000

  keys_cached: 1.0

  read_repair_chance: 1.0

  rows_cached: 0.0

I'm not quite sure I follow you, though, as I think I'm doing what you
specify.  The Pelops code is below.  Basically, I want to get rows starting
from a Super Column with a specific UUID and limit the number, just as you
inferred.  When I run this code I just get the last N values (25 in this
case) if non-reversed, and the first N values if reversed.  However,
regardless of what start param we use (Super Column UUID is String startKey
below), we still get the same values for the specified amount (e.g. the same
25).

*public* *void* getSuperRowKeys(String rowKey, String columnFamily,
*int* limit,
String startKey) *throws* Exception {

 *byte*[] byteArray = UuidHelper.*timeUuidStringToBytes*(startKey);

ByteBuffer bb = ByteBuffer.*wrap*(byteArray);

*new* UUID (bb.getLong(), bb.getLong());

 List cols = selector.getPageOfSuperColumnsFromRow(columnFamily,
rowKey, Bytes.*fromByteBuffer*(bb), *false*, limit, ConsistencyLevel.*ONE*);


 *for* (SuperColumn col : cols) {

*if* (col.getName() != *null*) {


System.*out*.println("NAME: " + UUID.*nameUUIDFromBytes*(col.getName()));


*for* (Column c : col.columns) {

System.*out*.println("\t\tName: " + Bytes.*toUTF8*(c.getName())

+ " Value: " + Bytes.*toUTF8*(c.getValue())

+ " timestamp: " + c.timestamp);


}


}

}


}

Here is some example data from the CLI.  If we specify
2f814d30-f758-11df-2f81-4d30f75811df
as the start param (second super column down), we still get
52e6540-f759-11df-952e-6540f75911df
(first super column) returned.

=> (super_column=952e6540-f759-11df-952e-6540f75911df,
 (column=64617465, value=323031302d31312d32332032333a32393a30332e303030,
timestamp=1290554997141000)
 (column=65787472615f696e666f, value=6e6f6e65,
timestamp=1290554997141000)
 (column=726561736f6e, value=6e6f6e65, timestamp=1290554997141000)
 (column=7365636f6e64735f746f5f6e657874, value=373530,
timestamp=1290554997141000)
 (column=73657269616c, value=393135353032353731,
timestamp=1290554997141000)
 (column=737461747573, value=5550, timestamp=1290554997141000)
 (column=74797065, value=486561727462656174,
timestamp=1290554997141000))
=> (super_column=2f814d30-f758-11df-2f81-4d30f75811df,
 (column=64617465, value=323031302d31312d32332032333a31393a30332e303030,
timestamp=129055439706)
 (column=65787472615f696e666f, value=6e6f6e65,
timestamp=129055439706)
 (column=726561736f6e, value=6e6f6e65, timestamp=129055439706)
 (column=7365636f6e64735f746f5f6e657874, value=373530,
timestamp=129055439706)
 (column=73657269616c, value=393135353032353731,
timestamp=129055439706)
 (column=737461747573, value=5550, timestamp=129055439706)
 (column=74797065, value=486561727462656174,
timestamp=129055439706))
=> (super_column=7c959f00-f757-11df-7c95-9f00f75711df,
 (column=64617465, value=323031302d31312d32332032333a31343a30332e303030,
timestamp=1290554096881000)
 (column=65787472615f696e666f, value=6e6f6e65,
timestamp=1290554096881000)
 (column=726561736f6e, value=6e6f6e65, timestamp=1290554096881000)
 (column=7365636f6e64735f746f5f6e657874, value=373530,
timestamp=1290554096881000)
 (column=73657269616c, value=393135353032353731,
timestamp=1290554096881000)
 (column=737461747573, value=5550, timestamp=1290554096881000)
 (column=74797065, value=486561727462656174,
timestamp=1290554096881000))
=> (super_column=c9be6330-f756-11df-c9be-6330f75611df,
 (column=64617465, value=323031302d31312d32332032333a30393a30332e303030,
timestamp=1290553796836000)
 (column=65787472615f696e666f, value=6e6f6e65,
timestamp=1290553796836000)
 (column=726561736f6e, value=6e6f6e65, timestamp=1290553796836000)
 (column=7365636f6e64735f746f5f6e657874, value=373530,
timestamp=1290553796836000)
 (column=73657269616c, value=393135353032353731,
timestamp=1290553796836000)
 (column=737461747573, value=5550, timestamp=1290553796836000)
 (column=74797065, value=486561727462656174,
timestamp=1290553796836000))
=> (super_column=17108150-f756-11df-1710-8150f75611df,
 (column=64617465, value=323031302d31312d32332032333a30343a30332e303030,
timestamp=1290553497067000)
 (column=65787472615f696e666f, value=6e6f6e65,
timestamp=1290553497067000)
 (column=726561736f6e, value=6e6f6e65, timestamp=1290553497067000)
 (column=7365636f6e64735f746f5f6e657874, value=373530,
timestamp=1290553497067000)
 (column=73657269616c, value=393135353032353731,
timestamp=1290553497067000)
 (column=737461747573, value=5550, timestamp=1290553497067000)
 (column=74797065, value=486561727462656174,
timestamp=1290553497067000))
=> (super_column=641da730-f755-11df-641d-a730f75511df,
 (column=64617465, value=323031302d31312d32332032323a35393a30332e

[no subject]

2010-12-01 Thread Moldován Eduárd


 unsubscribe

unsubscribe

2010-12-01 Thread Moldován Eduárd


 unsubscribe

Re: Solr DataImportHandler (DIH) and Cassandra

Try the solr source code. AaronOn 30 Nov, 2010,at 01:37 PM, Mark  wrote:
The DataSource subclass route is what I will probably be interested
in. Are there are working examples of this already out there? 

  

  
  

On 11/29/10 12:32 PM, Aaron Morton wrote:

  AFAIK there is nothing pre-written to pull the data out for
you. 
  
  
  You should be able to create your DataSource sub class http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/DataSource.html Using
the Hector java library to pull data from Cassandra. 
  
  
  
  I'm guessing you will need to consider how to perform delta
imports. Perhaps using the secondary indexes in 0.7* , or
maintaining your own queues or indexes to know what has
changed. 
  
  
  There is also the Lucandra project, not exactly what your
after but may be of interest anyway 

https://github.com/tjake/Lucandra
  
  
  Hope that helps.
  Aaron
  

On 30 Nov, 2010,at 05:04 AM, Mark
 wrote:

  
  

  
Is there anyway to use DIH to import
  from Cassandra? Thanks

thrift error

2010-12-01 Thread Michael Fortin

Hello,

I'm trying to insert a super column but I can't get passed this error.  

the error:
InvalidRequestException(why:column name must not be empty)
at 
org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:14408)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:828)
at 
org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:800)

Family def:
- {name: Super, column_type: Super, compare_with: TimeUUIDType, 
compare_subcolumns_with: LongType}

this is the code I'm calling:
...
client = new Cassandra.Client(protocol, protocol)
client.insert(key, columnParent, c, level)


And the values from the debugger:
key = {java.nio.heapbytebuf...@3168}  "java.nio.HeapByteBuffer[pos=0 lim=16 
cap=16]"
columnParent = {org.apache.cassandra.thrift.columnpar...@3169}  
"ColumnParent(column_family:Super, super_column:00 00 00 00 00 00 00 0C)"
c = {org.apache.cassandra.thrift.col...@3170}  "Column(name:63 6F 6C 75 6D 6E, 
value:76 61 6C 75 65, timestamp:1291243840220)"


I can insert a standard column without any issues with the same codebase..  
What column name must not be empty??  Clearly it's not.  What am I missing?  

Thanks
Mike

Re: Range Queries in RP on SCF in 0.7 with UUID SCs

When you say "I want to get rows starting from a Super Column..." it's a bit confusing. Do you want to get super columns from a single row, or multiple rows? I'm assuming you are talking about getting columns from a single row / key as that's what your code does.For the pelops code, it looks OK but I've not used Pelops. You can turn the logging up on the server and check the command that is sent to it. I'm would guess there is something wrong with the way you are transforming the start key For your cli example what was the command you executed ? AaronOn 02 Dec, 2010,at 11:03 AM, Frank LoVecchio  wrote:
Hey Aaron, 

Yes, in regards to SCF definition, you are correct:  

name: Sensor
      column_type: Super
      compare_with: TimeUUIDType
      gc_grace_seconds: 864000
      keys_cached: 1.0
      read_repair_chance: 1.0
      rows_cached: 0.0

I'm not quite sure I follow you, though, as I think I'm doing what you specify.  The Pelops code is below.  Basically, I want to get rows starting from a Super Column with a specific UUID and limit the number, just as you inferred.  When I run this code I just get the last N values (25 in this case) if non-reversed, and the first N values if reversed.  However, regardless of what start param we use (Super Column UUID is String startKey below), we still get the same values for the specified amount (e.g. the same 25).  

public void getSuperRowKeys(String rowKey, String columnFamily, int limit, String startKey) throws Exception {

	
		byte[] byteArray = UuidHelper.timeUuidStringToBytes(startKey);
		ByteBuffer bb = ByteBuffer.wrap(byteArray);
		new UUID (bb.getLong(), bb.getLong());
		
		
		List cols = selector.getPageOfSuperColumnsFromRow(columnFamily, rowKey, Bytes.fromByteBuffer(bb), false, limit, ConsistencyLevel.ONE);


	
		for (SuperColumn col : cols) {
			if (col.getName() != null) {

System.out.println("NAME: " + UUID.nameUUIDFromBytes(col.getName()));

for (Column c : col.columns) {
	System.out.println("\t\tName: " + Bytes.toUTF8(c.getName())
			+ " Value: " + Bytes.toUTF8(c.getValue())
			+ " timestamp: " + c.timestamp);

}

			}
		}

	}
Here is some example data from the CLI.  If we specify 2f814d30-f758-11df-2f81-4d30f75811df as the start param (second super column down), we still get 52e6540-f759-11df-952e-6540f75911df (first super column) returned.

=> (super_column=952e6540-f759-11df-952e-6540f75911df,     (column=64617465, value=323031302d31312d32332032333a32393a30332e303030, timestamp=1290554997141000)     (column=65787472615f696e666f, value=6e6f6e65, timestamp=1290554997141000)
     (column=726561736f6e, value=6e6f6e65, timestamp=1290554997141000)     (column=7365636f6e64735f746f5f6e657874, value=373530, timestamp=1290554997141000)     (column=73657269616c, value=393135353032353731, timestamp=1290554997141000)
     (column=737461747573, value=5550, timestamp=1290554997141000)     (column=74797065, value=486561727462656174, timestamp=1290554997141000))=> (super_column=2f814d30-f758-11df-2f81-4d30f75811df,
     (column=64617465, value=323031302d31312d32332032333a31393a30332e303030, timestamp=129055439706)     (column=65787472615f696e666f, value=6e6f6e65, timestamp=129055439706)     (column=726561736f6e, value=6e6f6e65, timestamp=129055439706)
     (column=7365636f6e64735f746f5f6e657874, value=373530, timestamp=129055439706)     (column=73657269616c, value=393135353032353731, timestamp=129055439706)     (column=737461747573, value=5550, timestamp=129055439706)
     (column=74797065, value=486561727462656174, timestamp=129055439706))=> (super_column=7c959f00-f757-11df-7c95-9f00f75711df,     (column=64617465, value=323031302d31312d32332032333a31343a30332e303030, timestamp=1290554096881000)
     (column=65787472615f696e666f, value=6e6f6e65, timestamp=1290554096881000)     (column=726561736f6e, value=6e6f6e65, timestamp=1290554096881000)     (column=7365636f6e64735f746f5f6e657874, value=373530, timestamp=1290554096881000)
     (column=73657269616c, value=393135353032353731, timestamp=1290554096881000)     (column=737461747573, value=5550, timestamp=1290554096881000)     (column=74797065, value=486561727462656174, timestamp=1290554096881000))
=> (super_column=c9be6330-f756-11df-c9be-6330f75611df,     (column=64617465, value=323031302d31312d32332032333a30393a30332e303030, timestamp=1290553796836000)     (column=65787472615f696e666f, value=6e6f6e65, timestamp=1290553796836000)
     (column=726561736f6e, value=6e6f6e65, timestamp=1290553796836000)     (column=7365636f6e64735f746f5f6e657874, value=373530, timestamp=1290553796836000)     (column=73657269616c, value=393135353032353731, timestamp=1290553796836000)
     (column=737461747573, value=5550, timestamp=1290553796836000)     (column=74797065, value=486561727462656174, timestamp=1290553796836000))=> (super_column=17108150-f756-11df-1710-8150f75611df,
     (column=64617465, value=323031302d31312d32332032333a30343a30332e303030, timestamp=1290553497067000)

Re: How to shutdown Cassandra?

2010-12-01 Thread rambabu pakala

Hi Aaron,

Thanks, that helped and it works.

--- On Wed, 12/1/10, Aaron Morton  wrote:

From: Aaron Morton 
Subject: Re: How to shutdown Cassandra?
To: user@cassandra.apache.org
Date: Wednesday, December 1, 2010, 12:18 PM

There is no "shutdown" command in cassandra, it's designed to be stopped by 
killing it. 

The pid file is created by the *nix scripts and is not supported by the 
cassandra.bat file for windows. 

There have been a couple of discussions on running cassandra as a service under 
windows...
http://www.mail-archive.com/user@cassandra.apache.org/msg01765.html
http://www.mail-archive.com/user@cassandra.apache.org/msg04021.html
http://coderjournal.com/2010/06/run-cassandra-as-a-windows-service/

Hope that helps. 
Aaron

On 02 Dec, 2010,at 06:14 AM, rambabu pakala  wrote:

Hi,

Can someone please let me know how to shutdown Cassandra on Windows Environment?

stop-server is actually kiing the Cassandra Server and I was unable to 
create/get a pid file for the Cassandra process. Are there any setup steps that 
are needed that I am missing?

Thanks,
-Ram.

Re: Range Queries in RP on SCF in 0.7 with UUID SCs

2010-12-01 Thread Frank LoVecchio

Actually, it was a class issue at this line:

System.*out*.println("NAME: " + UUID.*nameUUIDFromBytes*(col.getName()));

The native Pelops class timeUuidHelper is what should be used.

On Wed, Dec 1, 2010 at 4:16 PM, Aaron Morton wrote:

> When you say "I want to get rows starting from a Super Column..." it's a
> bit confusing. Do you want to get super columns from a single row, or
> multiple rows? I'm assuming you are talking about getting columns from a
> single row / key as that's what your code does.
>
> For the pelops code, it looks OK but I've not used Pelops. You can turn the
> logging up on the server and check the command that is sent to it. I'm would
> guess there is something wrong with the way you are transforming the start
> key
>
> For your cli example what was the command you executed ?
>
> Aaron
>
> On 02 Dec, 2010,at 11:03 AM, Frank LoVecchio  wrote:
>
> Hey Aaron,
>
>
> Yes, in regards to SCF definition, you are correct:
>
>
> name: Sensor
>
>   column_type: Super
>
>   compare_with: TimeUUIDType
>
>   gc_grace_seconds: 864000
>
>   keys_cached: 1.0
>
>   read_repair_chance: 1.0
>
>   rows_cached: 0.0
>
> I'm not quite sure I follow you, though, as I think I'm doing what you
> specify.  The Pelops code is below.  Basically, I want to get rows
> starting from a Super Column with a specific UUID and limit the number, just
> as you inferred.  When I run this code I just get the last N values (25 in
> this case) if non-reversed, and the first N values if reversed.  However,
> regardless of what start param we use (Super Column UUID is String startKey
> below), we still get the same values for the specified amount (e.g. the same
> 25).
>
> *public* *void* getSuperRowKeys(String rowKey, String columnFamily, *int* 
> limit,
> String startKey) *throws* Exception {
>
>   *byte*[] byteArray = UuidHelper.*timeUuidStringToBytes*(startKey);
>
>  ByteBuffer bb = ByteBuffer.*wrap*(byteArray);
>
>  *new* UUID (bb.getLong(), bb.getLong());
>
>List cols = 
> selector.getPageOfSuperColumnsFromRow(columnFamily,
> rowKey, Bytes.*fromByteBuffer*(bb), *false*, limit, ConsistencyLevel.*ONE*
> );
>
>
>   *for* (SuperColumn col : cols) {
>
>  *if* (col.getName() != *null*) {
>
>
>  System.*out*.println("NAME: " + UUID.*nameUUIDFromBytes*(col.getName()));
>
>
> *for* (Column c : col.columns) {
>
> System.*out*.println("\t\tName: " + Bytes.*toUTF8*(c.getName())
>
> + " Value: " + Bytes.*toUTF8*(c.getValue())
>
> + " timestamp: " + c.timestamp);
>
>
> }
>
>
> }
>
> }
>
>
> }
>
> Here is some example data from the CLI.  If we specify 
> 2f814d30-f758-11df-2f81-4d30f75811df
> as the start param (second super column down), we still get 
> 52e6540-f759-11df-952e-6540f75911df
> (first super column) returned.
>
> => (super_column=952e6540-f759-11df-952e-6540f75911df,
>  (column=64617465,
> value=323031302d31312d32332032333a32393a30332e303030,
> timestamp=1290554997141000)
>  (column=65787472615f696e666f, value=6e6f6e65,
> timestamp=1290554997141000)
>  (column=726561736f6e, value=6e6f6e65, timestamp=1290554997141000)
>  (column=7365636f6e64735f746f5f6e657874, value=373530,
> timestamp=1290554997141000)
>  (column=73657269616c, value=393135353032353731,
> timestamp=1290554997141000)
>  (column=737461747573, value=5550, timestamp=1290554997141000)
>  (column=74797065, value=486561727462656174,
> timestamp=1290554997141000))
> => (super_column=2f814d30-f758-11df-2f81-4d30f75811df,
>  (column=64617465,
> value=323031302d31312d32332032333a31393a30332e303030,
> timestamp=129055439706)
>  (column=65787472615f696e666f, value=6e6f6e65,
> timestamp=129055439706)
>  (column=726561736f6e, value=6e6f6e65, timestamp=129055439706)
>  (column=7365636f6e64735f746f5f6e657874, value=373530,
> timestamp=129055439706)
>  (column=73657269616c, value=393135353032353731,
> timestamp=129055439706)
>  (column=737461747573, value=5550, timestamp=129055439706)
>  (column=74797065, value=486561727462656174,
> timestamp=129055439706))
> => (super_column=7c959f00-f757-11df-7c95-9f00f75711df,
>  (column=64617465,
> value=323031302d31312d32332032333a31343a30332e303030,
> timestamp=1290554096881000)
>  (column=65787472615f696e666f, value=6e6f6e65,
> timestamp=1290554096881000)
>  (column=726561736f6e, value=6e6f6e65, timestamp=1290554096881000)
>  (column=7365636f6e64735f746f5f6e657874, value=373530,
> timestamp=1290554096881000)
>  (column=73657269616c, value=393135353032353731,
> timestamp=1290554096881000)
>  (column=737461747573, value=5550, timestamp=1290554096881000)
>  (column=74797065, value=486561727462656174,
> timestamp=1290554096881000))
> => (super_column=c9be6330-f756-11df-c9be-6330f75611df,
>  (column=64617465,
> value=323031302d31312d32332032333a30393a30332e303030,
> timestamp=1290553796836000)
>  (column=65787472615f696e666f, value=6e6f6e65,
> timestamp=1290553796836000)
>  (column=7265

Re: thrift error

2010-12-01 Thread Tyler Hobbs

Is there a particular reason why you're not using a high level client?

http://wiki.apache.org/cassandra/ClientOptions

Raw thrift is painful in many ways.

- Tyler

On Wed, Dec 1, 2010 at 5:06 PM, Michael Fortin  wrote:

> Hello,
>
> I'm trying to insert a super column but I can't get passed this error.
>
> the error:
> InvalidRequestException(why:column name must not be empty)
>at
> org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:14408)
>at
> org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:828)
>at
> org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:800)
>
> Family def:
> - {name: Super, column_type: Super, compare_with: TimeUUIDType,
> compare_subcolumns_with: LongType}
>
> this is the code I'm calling:
> ...
> client = new Cassandra.Client(protocol, protocol)
> client.insert(key, columnParent, c, level)
>
>
> And the values from the debugger:
> key = {java.nio.heapbytebuf...@3168}  "java.nio.HeapByteBuffer[pos=0
> lim=16 cap=16]"
> columnParent = {org.apache.cassandra.thrift.columnpar...@3169}
>  "ColumnParent(column_family:Super, super_column:00 00 00 00 00 00 00 0C)"
> c = {org.apache.cassandra.thrift.col...@3170}  "Column(name:63 6F 6C 75 6D
> 6E, value:76 61 6C 75 65, timestamp:1291243840220)"
>
>
> I can insert a standard column without any issues with the same codebase..
>  What column name must not be empty??  Clearly it's not.  What am I missing?
>
> Thanks
> Mike
>
>
>

OutOfMemory exceptions w/ Cassandra 0.6.8

Hi,

We have a small cluster of 3 Cassandra servers running w/ full
replication. Every once in a while we get an OutOfMemory exception and
have to restart servers. Sometimes just restarting doesn’t do it and
we have to clean the commitlog or data directory.

We are running Cassandra 0.6.8. There is only 1 keyspace and 3 column
families. There are less than 1000 keys across all column families.
There is roughly 1 write request per second and 1 read request. Each
server is allocated 1GB.  Size of all files in data directory of the
only column family is ~300MB. MemtableThroughputInMB is throttled way
down to 2 and BinaryMemtableThroughputInMB to 8 (w/ higher values we
were running out of memory extremely fast, this way it works for a
couple of days w/o crashing).

Last time this issue happened, I didn’t clear the commitlog/data
folders, enabled gc logging and restarted Cassandra. It crashes really
fast, but what is really strange is that it seems like it still has
plenty of memory when the error happens, last 3 lines from gc log:
21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]
21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]
21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]
The full log is here: http://pastebin.com/XGRSRcBd

I’ve tried increasing the memory up to 1.5GB, but it still doesn’t start.

Any ideas what might be the problem here?

Thank you,
Aram

Re: Range Queries in RP on SCF in 0.7 with UUID SCs

2010-12-01 Thread Dan Washusen

Using the methods on the Bytes class would be preferable.  The byte[]
related methods on UuidHelper should have been deprecated with the Bytes
class was introduced...

e.g. new Bytes(col.getName()).toUuid()

Cheers,
Dan

On Thu, Dec 2, 2010 at 10:26 AM, Frank LoVecchio  wrote:

> Actually, it was a class issue at this line:
>
> System.*out*.println("NAME: " + UUID.*nameUUIDFromBytes*(col.getName()));
>
> The native Pelops class timeUuidHelper is what should be used.
>
> On Wed, Dec 1, 2010 at 4:16 PM, Aaron Morton wrote:
>
>> When you say "I want to get rows starting from a Super Column..." it's a
>> bit confusing. Do you want to get super columns from a single row, or
>> multiple rows? I'm assuming you are talking about getting columns from a
>> single row / key as that's what your code does.
>>
>> For the pelops code, it looks OK but I've not used Pelops. You can turn
>> the logging up on the server and check the command that is sent to it. I'm
>> would guess there is something wrong with the way you are transforming the
>> start key
>>
>> For your cli example what was the command you executed ?
>>
>> Aaron
>>
>> On 02 Dec, 2010,at 11:03 AM, Frank LoVecchio  wrote:
>>
>> Hey Aaron,
>>
>>
>> Yes, in regards to SCF definition, you are correct:
>>
>>
>> name: Sensor
>>
>>   column_type: Super
>>
>>   compare_with: TimeUUIDType
>>
>>   gc_grace_seconds: 864000
>>
>>   keys_cached: 1.0
>>
>>   read_repair_chance: 1.0
>>
>>   rows_cached: 0.0
>>
>> I'm not quite sure I follow you, though, as I think I'm doing what you
>> specify.  The Pelops code is below.  Basically, I want to get rows
>> starting from a Super Column with a specific UUID and limit the number, just
>> as you inferred.  When I run this code I just get the last N values (25 in
>> this case) if non-reversed, and the first N values if reversed.  However,
>> regardless of what start param we use (Super Column UUID is String startKey
>> below), we still get the same values for the specified amount (e.g. the same
>> 25).
>>
>> *public* *void* getSuperRowKeys(String rowKey, String columnFamily, *int* 
>> limit,
>> String startKey) *throws* Exception {
>>
>>   *byte*[] byteArray = UuidHelper.*timeUuidStringToBytes*(startKey);
>>
>>  ByteBuffer bb = ByteBuffer.*wrap*(byteArray);
>>
>>  *new* UUID (bb.getLong(), bb.getLong());
>>
>>List cols = 
>> selector.getPageOfSuperColumnsFromRow(columnFamily,
>> rowKey, Bytes.*fromByteBuffer*(bb), *false*, limit, ConsistencyLevel.*ONE
>> *);
>>
>>
>>   *for* (SuperColumn col : cols) {
>>
>>  *if* (col.getName() != *null*) {
>>
>>
>>  System.*out*.println("NAME: " + UUID.*nameUUIDFromBytes*
>> (col.getName()));
>>
>>
>> *for* (Column c : col.columns) {
>>
>> System.*out*.println("\t\tName: " + Bytes.*toUTF8*(c.getName())
>>
>> + " Value: " + Bytes.*toUTF8*(c.getValue())
>>
>> + " timestamp: " + c.timestamp);
>>
>>
>> }
>>
>>
>> }
>>
>> }
>>
>>
>> }
>>
>> Here is some example data from the CLI.  If we specify 
>> 2f814d30-f758-11df-2f81-4d30f75811df
>> as the start param (second super column down), we still get 
>> 52e6540-f759-11df-952e-6540f75911df
>> (first super column) returned.
>>
>> => (super_column=952e6540-f759-11df-952e-6540f75911df,
>>  (column=64617465,
>> value=323031302d31312d32332032333a32393a30332e303030,
>> timestamp=1290554997141000)
>>  (column=65787472615f696e666f, value=6e6f6e65,
>> timestamp=1290554997141000)
>>  (column=726561736f6e, value=6e6f6e65, timestamp=1290554997141000)
>>  (column=7365636f6e64735f746f5f6e657874, value=373530,
>> timestamp=1290554997141000)
>>  (column=73657269616c, value=393135353032353731,
>> timestamp=1290554997141000)
>>  (column=737461747573, value=5550, timestamp=1290554997141000)
>>  (column=74797065, value=486561727462656174,
>> timestamp=1290554997141000))
>> => (super_column=2f814d30-f758-11df-2f81-4d30f75811df,
>>  (column=64617465,
>> value=323031302d31312d32332032333a31393a30332e303030,
>> timestamp=129055439706)
>>  (column=65787472615f696e666f, value=6e6f6e65,
>> timestamp=129055439706)
>>  (column=726561736f6e, value=6e6f6e65, timestamp=129055439706)
>>  (column=7365636f6e64735f746f5f6e657874, value=373530,
>> timestamp=129055439706)
>>  (column=73657269616c, value=393135353032353731,
>> timestamp=129055439706)
>>  (column=737461747573, value=5550, timestamp=129055439706)
>>  (column=74797065, value=486561727462656174,
>> timestamp=129055439706))
>> => (super_column=7c959f00-f757-11df-7c95-9f00f75711df,
>>  (column=64617465,
>> value=323031302d31312d32332032333a31343a30332e303030,
>> timestamp=1290554096881000)
>>  (column=65787472615f696e666f, value=6e6f6e65,
>> timestamp=1290554096881000)
>>  (column=726561736f6e, value=6e6f6e65, timestamp=1290554096881000)
>>  (column=7365636f6e64735f746f5f6e657874, value=373530,
>> timestamp=1290554096881000)
>>  (column=73657269616c, value=393135353032353731,
>> timestamp=12905540968

Re: thrift error

Try turning up the logging on the server side to DEBUG and see what it says. Chances are you are not sending what you think you are. Or if you feel like it put a breakpoint in o.a.c.thrift.Cassandra$Client.send_insert to see how when the client is doing. I agree with Tyler, higher level clients are *much* easier. But it's sometimes fun to see what's happening on the inside. AaronOn 02 Dec, 2010,at 12:59 PM, Tyler Hobbs  wrote:Is there a particular reason why you're not using a high level client?http://wiki.apache.org/cassandra/ClientOptionsRaw thrift is painful in many ways.
- TylerOn Wed, Dec 1, 2010 at 5:06 PM, Michael Fortin  wrote:
Hello,

I'm trying to insert a super column but I can't get passed this error.

the error:
InvalidRequestException(why:column name must not be empty)
        at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:14408)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:828)
        at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:800)

Family def:
- {name: Super, column_type: Super, compare_with: TimeUUIDType, compare_subcolumns_with: LongType}

this is the code I'm calling:
...
client = new Cassandra.Client(protocol, protocol)
client.insert(key, columnParent, c, level)


And the values from the debugger:
key = {java.nio.heapbytebuf...@3168}  "java.nioHeapByteBuffer[pos=0 lim=16 cap=16]"
columnParent = {org.apache.cassandra.thrift.columnpar...@3169}  "ColumnParent(column_family:Super, super_column:00 00 00 00 00 00 00 0C)"
c = {org.apache.cassandra.thrift.col...@3170}  "Column(name:63 6F 6C 75 6D 6E, value:76 61 6C 75 65, timestamp:1291243840220)"


I can insert a standard column without any issues with the same codebase..  What column name must not be empty??  Clearly it's not.  What am I missing?

Thanks
Mike

Re: OutOfMemory exceptions w/ Cassandra 0.6.8

Do you have a log message for the OOM? And some GC messages around it? Have you tried watching the server with jconsole?Is the OOM happening on system start or after it's been running ? Or both?Do you have any row/key caches? Cannot remember but is 0.6* has this but have you enabled the save cache feature?Aaron On 02 Dec, 2010,at 01:28 PM, Aram Ayazyan wrote:Hi,

We have a small cluster of 3 Cassandra servers running w/ full
replication. Every once in a while we get an OutOfMemory exception and
have to restart servers. Sometimes just restarting doesn’t do it and
we have to clean the commitlog or data directory.

We are running Cassandra 0.6.8. There is only 1 keyspace and 3 column
families. There are less than 1000 keys across all column families.
There is roughly 1 write request per second and 1 read request. Each
server is allocated 1GB. Size of all files in data directory of the
only column family is ~300MB. MemtableThroughputInMB is throttled way
down to 2 and BinaryMemtableThroughputInMB to 8 (w/ higher values we
were running out of memory extremely fast, this way it works for a
couple of days w/o crashing).

Last time this issue happened, I didn’t clear the commitlog/data
folders, enabled gc logging and restarted Cassandra. It crashes really
fast, but what is really strange is that it seems like it still has
plenty of memory when the error happens, last 3 lines from gc log:
21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]
21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]
21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]
The full log is here: http://pastebin.com/XGRSRcBd

I’ve tried increasing the memory up to 1.5GB, but it still doesn’t start.

Any ideas what might be the problem here?

Thank you,
Aram

Re: OutOfMemory exceptions w/ Cassandra 0.6.8

Hi Aaron,

OOM is happening both after the system has been running for a while as
well as when I restart it afterwards. The only way to make it run
after it has crashed, is to remove everything from data and commitlog
directories. Unfortunately I don't have the original log from when
cassandra crashed earlier, but might have some soon if another node
crashes.

This particular exception happened during start-up:
ERROR [main] 2010-12-01 14:58:37,795 CassandraDaemon.java (line 242)
Exception encountered during startup.
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:597)
at 
org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService.(PeriodicCommitLogExecutorService.java:57)
at 
org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService.(PeriodicCommitLogExecutorService.java:40)
at 
org.apache.cassandra.db.commitlog.CommitLog.(CommitLog.java:117)
at org.apache.cassandra.db.commitlog.CommitLog.(CommitLog.java:71)
at 
org.apache.cassandra.db.commitlog.CommitLog$CLHandle.(CommitLog.java:85)
at 
org.apache.cassandra.db.commitlog.CommitLog.instance(CommitLog.java:80)
at 
org.apache.cassandra.db.ColumnFamilyStore.maybeSwitchMemtable(ColumnFamilyStore.java:469)
at 
org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:517)
at org.apache.cassandra.db.Table.flush(Table.java:431)
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:291)
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
at 
org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)

And here is the full GC log: http://pastebin.com/XGRSRcBd (all 21
seconds of it).

Thank you,
Aram

On Wed, Dec 1, 2010 at 4:55 PM, Aaron Morton  wrote:
> Do you have a log message for the OOM? And some GC messages around it? Have
> you tried watching the server with jconsole?
> Is the OOM happening on system start or after it's been running ? Or both?
> Do you have any row/key caches? Cannot remember but is 0.6* has this but
> have you enabled the save cache feature?
> Aaron
>
> On 02 Dec, 2010,at 01:28 PM, Aram Ayazyan  wrote:
>
> Hi,
>
> We have a small cluster of 3 Cassandra servers running w/ full
> replication. Every once in a while we get an OutOfMemory exception and
> have to restart servers. Sometimes just restarting doesn’t do it and
> we have to clean the commitlog or data directory.
>
> We are running Cassandra 0.6.8. There is only 1 keyspace and 3 column
> families. There are less than 1000 keys across all column families.
> There is roughly 1 write request per second and 1 read request. Each
> server is allocated 1GB. Size of all files in data directory of the
> only column family is ~300MB. MemtableThroughputInMB is throttled way
> down to 2 and BinaryMemtableThroughputInMB to 8 (w/ higher values we
> were running out of memory extremely fast, this way it works for a
> couple of days w/o crashing).
>
> Last time this issue happened, I didn’t clear the commitlog/data
> folders, enabled gc logging and restarted Cassandra. It crashes really
> fast, but what is really strange is that it seems like it still has
> plenty of memory when the error happens, last 3 lines from gc log:
> 21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]
> 21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]
> 21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]
> The full log is here: http://pastebin.com/XGRSRcBd
>
> I’ve tried increasing the memory up to 1.5GB, but it still doesn’t start.
>
> Any ideas what might be the problem here?
>
> Thank you,
> Aram
>

Re: OutOfMemory exceptions w/ Cassandra 0.6.8

Regarding caches, I haven't explicitly enabled them and the
"saved_caches" directory is empty.

-Aram

On Wed, Dec 1, 2010 at 5:05 PM, Aram Ayazyan  wrote:
> Hi Aaron,
>
> OOM is happening both after the system has been running for a while as
> well as when I restart it afterwards. The only way to make it run
> after it has crashed, is to remove everything from data and commitlog
> directories. Unfortunately I don't have the original log from when
> cassandra crashed earlier, but might have some soon if another node
> crashes.
>
> This particular exception happened during start-up:
> ERROR [main] 2010-12-01 14:58:37,795 CassandraDaemon.java (line 242)
> Exception encountered during startup.
> java.lang.OutOfMemoryError: unable to create new native thread
>        at java.lang.Thread.start0(Native Method)
>        at java.lang.Thread.start(Thread.java:597)
>        at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService.(PeriodicCommitLogExecutorService.java:57)
>        at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService.(PeriodicCommitLogExecutorService.java:40)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog.(CommitLog.java:117)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog.(CommitLog.java:71)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog$CLHandle.(CommitLog.java:85)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog.instance(CommitLog.java:80)
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.maybeSwitchMemtable(ColumnFamilyStore.java:469)
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:517)
>        at org.apache.cassandra.db.Table.flush(Table.java:431)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:291)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
>        at 
> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
>        at 
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
>
> And here is the full GC log: http://pastebin.com/XGRSRcBd (all 21
> seconds of it).
>
> Thank you,
> Aram
>
> On Wed, Dec 1, 2010 at 4:55 PM, Aaron Morton  wrote:
>> Do you have a log message for the OOM? And some GC messages around it? Have
>> you tried watching the server with jconsole?
>> Is the OOM happening on system start or after it's been running ? Or both?
>> Do you have any row/key caches? Cannot remember but is 0.6* has this but
>> have you enabled the save cache feature?
>> Aaron
>>
>> On 02 Dec, 2010,at 01:28 PM, Aram Ayazyan  wrote:
>>
>> Hi,
>>
>> We have a small cluster of 3 Cassandra servers running w/ full
>> replication. Every once in a while we get an OutOfMemory exception and
>> have to restart servers. Sometimes just restarting doesn’t do it and
>> we have to clean the commitlog or data directory.
>>
>> We are running Cassandra 0.6.8. There is only 1 keyspace and 3 column
>> families. There are less than 1000 keys across all column families.
>> There is roughly 1 write request per second and 1 read request. Each
>> server is allocated 1GB. Size of all files in data directory of the
>> only column family is ~300MB. MemtableThroughputInMB is throttled way
>> down to 2 and BinaryMemtableThroughputInMB to 8 (w/ higher values we
>> were running out of memory extremely fast, this way it works for a
>> couple of days w/o crashing).
>>
>> Last time this issue happened, I didn’t clear the commitlog/data
>> folders, enabled gc logging and restarted Cassandra. It crashes really
>> fast, but what is really strange is that it seems like it still has
>> plenty of memory when the error happens, last 3 lines from gc log:
>> 21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]
>> 21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]
>> 21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]
>> The full log is here: http://pastebin.com/XGRSRcBd
>>
>> I’ve tried increasing the memory up to 1.5GB, but it still doesn’t start.
>>
>> Any ideas what might be the problem here?
>>
>> Thank you,
>> Aram
>>
>

Re: OutOfMemory exceptions w/ Cassandra 0.6.8

2010-12-01 Thread Jonathan Ellis

Stack trace looks like an OS-level thread limit causing problems, not
actually memory.

On Wed, Dec 1, 2010 at 7:05 PM, Aram Ayazyan  wrote:
> Hi Aaron,
>
> OOM is happening both after the system has been running for a while as
> well as when I restart it afterwards. The only way to make it run
> after it has crashed, is to remove everything from data and commitlog
> directories. Unfortunately I don't have the original log from when
> cassandra crashed earlier, but might have some soon if another node
> crashes.
>
> This particular exception happened during start-up:
> ERROR [main] 2010-12-01 14:58:37,795 CassandraDaemon.java (line 242)
> Exception encountered during startup.
> java.lang.OutOfMemoryError: unable to create new native thread
>        at java.lang.Thread.start0(Native Method)
>        at java.lang.Thread.start(Thread.java:597)
>        at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService.(PeriodicCommitLogExecutorService.java:57)
>        at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService.(PeriodicCommitLogExecutorService.java:40)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog.(CommitLog.java:117)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog.(CommitLog.java:71)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog$CLHandle.(CommitLog.java:85)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog.instance(CommitLog.java:80)
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.maybeSwitchMemtable(ColumnFamilyStore.java:469)
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:517)
>        at org.apache.cassandra.db.Table.flush(Table.java:431)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:291)
>        at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
>        at 
> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
>        at 
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
>
> And here is the full GC log: http://pastebin.com/XGRSRcBd (all 21
> seconds of it).
>
> Thank you,
> Aram
>
> On Wed, Dec 1, 2010 at 4:55 PM, Aaron Morton  wrote:
>> Do you have a log message for the OOM? And some GC messages around it? Have
>> you tried watching the server with jconsole?
>> Is the OOM happening on system start or after it's been running ? Or both?
>> Do you have any row/key caches? Cannot remember but is 0.6* has this but
>> have you enabled the save cache feature?
>> Aaron
>>
>> On 02 Dec, 2010,at 01:28 PM, Aram Ayazyan  wrote:
>>
>> Hi,
>>
>> We have a small cluster of 3 Cassandra servers running w/ full
>> replication. Every once in a while we get an OutOfMemory exception and
>> have to restart servers. Sometimes just restarting doesn’t do it and
>> we have to clean the commitlog or data directory.
>>
>> We are running Cassandra 0.6.8. There is only 1 keyspace and 3 column
>> families. There are less than 1000 keys across all column families.
>> There is roughly 1 write request per second and 1 read request. Each
>> server is allocated 1GB. Size of all files in data directory of the
>> only column family is ~300MB. MemtableThroughputInMB is throttled way
>> down to 2 and BinaryMemtableThroughputInMB to 8 (w/ higher values we
>> were running out of memory extremely fast, this way it works for a
>> couple of days w/o crashing).
>>
>> Last time this issue happened, I didn’t clear the commitlog/data
>> folders, enabled gc logging and restarted Cassandra. It crashes really
>> fast, but what is really strange is that it seems like it still has
>> plenty of memory when the error happens, last 3 lines from gc log:
>> 21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]
>> 21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]
>> 21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]
>> The full log is here: http://pastebin.com/XGRSRcBd
>>
>> I’ve tried increasing the memory up to 1.5GB, but it still doesn’t start.
>>
>> Any ideas what might be the problem here?
>>
>> Thank you,
>> Aram
>>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

cassandra version update and my cluster

2010-12-01 Thread Nick Santini

Hi,
this is an hypothetical situation highly likely to happen:

I have a cassandra 0.7 cluster, filled with production data, and I want to
upgrade cassandra to the 0.8 version (and so on as new versions get
released)

what happen with my data and keyspace / column families definitions? whats
the process to upgrade cassandra in a production cluster?


Thahks


Nicolas Santini

Re: When to call the major compaction ?

@Chen Xinli
"and mark old sstables as deleted which will be deleted while jvm gc."
SSTable is on the harddisk , how could jvm gc delete it ? JVM GC is in
charge the using of the space in the memory.

@Nick
The GC in cassandra doesn't refer to jvm gc ? This kind of gc is cassandda's
gc , intend to remove the unused file on harddisk ?



On Wed, Dec 1, 2010 at 10:54 PM, Chen Xinli  wrote:

>
>
>  2010/12/1 Ying Tang 
>
>> I'm confused , plz ingore the mail above.
>>  Here is my confusion ,
>>posterior to 0.6.6/0.7  , minor compaction and major compaction both
>> can clean out rows 'tagged'  tombstones  , and generate a new , without
>> tombstones , sstable .
>>
>
> This is right.
>
>
>> And the tombstones remains in memory ,waiting to be removed by jvm gc
>> .
>> Am i right?
>>
>
> No! Compactions merge several old sstables into one, and mark old sstables
> as deleted which will be deleted while jvm gc.
> SSTable are files on harddisk, nothing to do with memory. You'd better have
> a look at Google's bigtable paper.
>
>
>>
>>   On Wed, Dec 1, 2010 at 9:10 PM, Ying Tang wrote:
>>
>>> 1. So posterior to 0.6.6/0.7 ,  minor compaction and major compaction
>>> both  can clean out rows 'tagged'  tombstones , this kind of clean out
>>> doesn't mead remove it from the disk permanently.
>>> The real remove is done by the jvm GC ?
>>> 2. The intence of compaction is merging multi sstables into one , clean
>>> out the tombstone , let the un-tombstones  rows be into a new ordered
>>> sstable ?
>>>
>>>
>>>
>>> On Wed, Dec 1, 2010 at 7:30 PM, Sylvain Lebresne wrote:
>>>
 On Wed, Dec 1, 2010 at 12:11 PM, Ying Tang 
 wrote:
 > And i have another question , what's the difference between minor
 > compaction and major compaction?

 A major compaction is a compaction that compact *all* the SSTables of a
 given
 column family (compaction compacts one CF at a time).

 Before https://issues.apache.org/jira/browse/CASSANDRA-1074
 (introduced in 0.6.6 and
 recent 0.7 betas/rcs), major compactions where the only ones that
 removed the
 tombstones (see http://wiki.apache.org/cassandra/DistributedDeletes)
 and this is the
 reason major compaction exists. Now, with #1074, minor compactions
 should remove most
 if not all tombstones, so major compaction are not or much less useful
 (it may depend on your
 workload though as minor can't always delete the tombstones).

 --
 Sylvain

 >
 > On 12/1/10, Chen Xinli  wrote:
 >> 2010/12/1 Ying Tang 
 >>
 >>> Every time cassandra creates a new sstable , it will call the
 >>> CompactionManager.submitMinorIfNeeded  ? And if the number of
 memtables is
 >>> beyond  MinimumCompactionThreshold  , the minor compaction will be
 called.
 >>> And there is also a method named CompactionManager.submitMajor , and
 the
 >>> call relationship is :
 >>>
 >>> NodeCmd -- > NodeProbe -->StorageService.forceTableCompaction -->
 >>> Table.forceCompaction -->CompactionManager.performMajor -->
 >>> CompactionManager.submitMajor
 >>>
 >>> ColumnFamilyStore.forceMajorCompaction -->
 CompactionManager.performMajor
 >>> --> CompactionManager.submitMajor
 >>>
 >>>
 >>> HintedHandOffManager
 >>>  --> CompactionManager.submitMajor
 >>>
 >>> So i have 3 questions:
 >>> 1. Once a new sstable has been created ,
 >>> CompactionManager.submitMinorIfNeeded  will be called ,
 minorCompaction
 >>> maybe called .
 >>> But when will the majorCompaction be called ? Just the NodeCmd ?
 >>>
 >>
 >> Yes, majorCompaction must be called manually from NodeCmd
 >>
 >>
 >>> 2. Which jobs will minorCompaction and majorCompaction do ?
 >>> Will minorCompaction delete the data that have been marked as
 deleted
 >>> ?
 >>> And how about the major compaction ?
 >>>
 >>
 >> Compaction only mark sstables as deleted. Deletion will be done when
 there
 >> are full gc, or node restarted.
 >>
 >>
 >>> 3. When gc be called ? Every time compaction been called?
 >>>
 >>
 >> GC has nothing to do with compaction, you may mistake the two
 conceptions
 >>
 >>
 >>>
 >>>
 >>>
 >>> --
 >>> Best regards,
 >>>
 >>> Ivy Tang
 >>>
 >>>
 >>>
 >>>
 >>
 >>
 >> --
 >> Best Regards,
 >> Chen Xinli
 >>
 >
 >
 > --
 > Best regards,
 >
 > Ivy Tang
 >

>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Ivy Tang
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Ivy Tang
>>
>>
>>
>>
>
>
> --
> Best Regards,
> Chen Xinli
>



-- 
Best regards,

Ivy Tang

how can i ran the word count example on windows?

2010-12-01 Thread Bingbing Liu

i don't know how to set the command line 

is there a word_count.bat ? like the word_count in the bin on linux?

2010-12-02 



Bingbing Liu

Re: When to call the major compaction ?

2010-12-01 Thread Chen Xinli

You are right, jvm gc is for memory.
In cassandra, there is a small trick called *PhantomReference*, which will
be called when jvm gc. And deletion is actually done in PhantomReference.

2010/12/2 Ying Tang 

> @Chen Xinli
> "and mark old sstables as deleted which will be deleted while jvm gc."
> SSTable is on the harddisk , how could jvm gc delete it ? JVM GC is in
> charge the using of the space in the memory.
>
> @Nick
> The GC in cassandra doesn't refer to jvm gc ? This kind of gc is
> cassandda's gc , intend to remove the unused file on harddisk ?
>
>
>
> On Wed, Dec 1, 2010 at 10:54 PM, Chen Xinli  wrote:
>
>>
>>
>>  2010/12/1 Ying Tang 
>>
>>> I'm confused , plz ingore the mail above.
>>>  Here is my confusion ,
>>>posterior to 0.6.6/0.7  , minor compaction and major compaction both
>>> can clean out rows 'tagged'  tombstones  , and generate a new , without
>>> tombstones , sstable .
>>>
>>
>> This is right.
>>
>>
>>> And the tombstones remains in memory ,waiting to be removed by jvm gc
>>> .
>>> Am i right?
>>>
>>
>> No! Compactions merge several old sstables into one, and mark old sstables
>> as deleted which will be deleted while jvm gc.
>> SSTable are files on harddisk, nothing to do with memory. You'd better
>> have a look at Google's bigtable paper.
>>
>>
>>>
>>>   On Wed, Dec 1, 2010 at 9:10 PM, Ying Tang wrote:
>>>
 1. So posterior to 0.6.6/0.7 ,  minor compaction and major compaction
 both  can clean out rows 'tagged'  tombstones , this kind of clean out
 doesn't mead remove it from the disk permanently.
 The real remove is done by the jvm GC ?
 2. The intence of compaction is merging multi sstables into one , clean
 out the tombstone , let the un-tombstones  rows be into a new ordered
 sstable ?



 On Wed, Dec 1, 2010 at 7:30 PM, Sylvain Lebresne wrote:

> On Wed, Dec 1, 2010 at 12:11 PM, Ying Tang 
> wrote:
> > And i have another question , what's the difference between minor
> > compaction and major compaction?
>
> A major compaction is a compaction that compact *all* the SSTables of a
> given
> column family (compaction compacts one CF at a time).
>
> Before https://issues.apache.org/jira/browse/CASSANDRA-1074
> (introduced in 0.6.6 and
> recent 0.7 betas/rcs), major compactions where the only ones that
> removed the
> tombstones (see http://wiki.apache.org/cassandra/DistributedDeletes)
> and this is the
> reason major compaction exists. Now, with #1074, minor compactions
> should remove most
> if not all tombstones, so major compaction are not or much less useful
> (it may depend on your
> workload though as minor can't always delete the tombstones).
>
> --
> Sylvain
>
> >
> > On 12/1/10, Chen Xinli  wrote:
> >> 2010/12/1 Ying Tang 
> >>
> >>> Every time cassandra creates a new sstable , it will call the
> >>> CompactionManager.submitMinorIfNeeded  ? And if the number of
> memtables is
> >>> beyond  MinimumCompactionThreshold  , the minor compaction will be
> called.
> >>> And there is also a method named CompactionManager.submitMajor ,
> and the
> >>> call relationship is :
> >>>
> >>> NodeCmd -- > NodeProbe -->StorageService.forceTableCompaction -->
> >>> Table.forceCompaction -->CompactionManager.performMajor -->
> >>> CompactionManager.submitMajor
> >>>
> >>> ColumnFamilyStore.forceMajorCompaction -->
> CompactionManager.performMajor
> >>> --> CompactionManager.submitMajor
> >>>
> >>>
> >>> HintedHandOffManager
> >>>  --> CompactionManager.submitMajor
> >>>
> >>> So i have 3 questions:
> >>> 1. Once a new sstable has been created ,
> >>> CompactionManager.submitMinorIfNeeded  will be called ,
> minorCompaction
> >>> maybe called .
> >>> But when will the majorCompaction be called ? Just the NodeCmd
> ?
> >>>
> >>
> >> Yes, majorCompaction must be called manually from NodeCmd
> >>
> >>
> >>> 2. Which jobs will minorCompaction and majorCompaction do ?
> >>> Will minorCompaction delete the data that have been marked as
> deleted
> >>> ?
> >>> And how about the major compaction ?
> >>>
> >>
> >> Compaction only mark sstables as deleted. Deletion will be done when
> there
> >> are full gc, or node restarted.
> >>
> >>
> >>> 3. When gc be called ? Every time compaction been called?
> >>>
> >>
> >> GC has nothing to do with compaction, you may mistake the two
> conceptions
> >>
> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>> Ivy Tang
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Best Regards,
> >> Chen Xinli
> >>
> >
> >
> > --
> > Best regards,
> >
> > Ivy Tang
> >
>

Re: how can i ran the word count example on windows?

2010-12-01 Thread Jeremy Hanna

There isn't currently, but perhaps you could contribute one :).  If you take a 
look at the sh script in the bin directory of the word count example, it 
shouldn't be terribly difficult to mimic the behavior.  It's mostly just 
setting up the classpath and executing the Java class with some arguments.

On Dec 1, 2010, at 8:26 PM, Bingbing Liu wrote:

> i don't know how to set the command line
>  
> is there a word_count.bat ? like the word_count in the bin on linux?
>  
> 2010-12-02
> Bingbing Liu

Re: OutOfMemory exceptions w/ Cassandra 0.6.8

Thanks a lot Jonathan! That seems to be it, since the exact same
configuration w/ the same data starts up and works fine on a different
server.

-Aram

On Wed, Dec 1, 2010 at 5:24 PM, Jonathan Ellis  wrote:
> Stack trace looks like an OS-level thread limit causing problems, not
> actually memory.
>
> On Wed, Dec 1, 2010 at 7:05 PM, Aram Ayazyan  wrote:
>> Hi Aaron,
>>
>> OOM is happening both after the system has been running for a while as
>> well as when I restart it afterwards. The only way to make it run
>> after it has crashed, is to remove everything from data and commitlog
>> directories. Unfortunately I don't have the original log from when
>> cassandra crashed earlier, but might have some soon if another node
>> crashes.
>>
>> This particular exception happened during start-up:
>> ERROR [main] 2010-12-01 14:58:37,795 CassandraDaemon.java (line 242)
>> Exception encountered during startup.
>> java.lang.OutOfMemoryError: unable to create new native thread
>>        at java.lang.Thread.start0(Native Method)
>>        at java.lang.Thread.start(Thread.java:597)
>>        at 
>> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService.(PeriodicCommitLogExecutorService.java:57)
>>        at 
>> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService.(PeriodicCommitLogExecutorService.java:40)
>>        at 
>> org.apache.cassandra.db.commitlog.CommitLog.(CommitLog.java:117)
>>        at 
>> org.apache.cassandra.db.commitlog.CommitLog.(CommitLog.java:71)
>>        at 
>> org.apache.cassandra.db.commitlog.CommitLog$CLHandle.(CommitLog.java:85)
>>        at 
>> org.apache.cassandra.db.commitlog.CommitLog.instance(CommitLog.java:80)
>>        at 
>> org.apache.cassandra.db.ColumnFamilyStore.maybeSwitchMemtable(ColumnFamilyStore.java:469)
>>        at 
>> org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:517)
>>        at org.apache.cassandra.db.Table.flush(Table.java:431)
>>        at 
>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:291)
>>        at 
>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
>>        at 
>> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
>>        at 
>> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
>>
>> And here is the full GC log: http://pastebin.com/XGRSRcBd (all 21
>> seconds of it).
>>
>> Thank you,
>> Aram
>>
>> On Wed, Dec 1, 2010 at 4:55 PM, Aaron Morton  wrote:
>>> Do you have a log message for the OOM? And some GC messages around it? Have
>>> you tried watching the server with jconsole?
>>> Is the OOM happening on system start or after it's been running ? Or both?
>>> Do you have any row/key caches? Cannot remember but is 0.6* has this but
>>> have you enabled the save cache feature?
>>> Aaron
>>>
>>> On 02 Dec, 2010,at 01:28 PM, Aram Ayazyan  wrote:
>>>
>>> Hi,
>>>
>>> We have a small cluster of 3 Cassandra servers running w/ full
>>> replication. Every once in a while we get an OutOfMemory exception and
>>> have to restart servers. Sometimes just restarting doesn’t do it and
>>> we have to clean the commitlog or data directory.
>>>
>>> We are running Cassandra 0.6.8. There is only 1 keyspace and 3 column
>>> families. There are less than 1000 keys across all column families.
>>> There is roughly 1 write request per second and 1 read request. Each
>>> server is allocated 1GB. Size of all files in data directory of the
>>> only column family is ~300MB. MemtableThroughputInMB is throttled way
>>> down to 2 and BinaryMemtableThroughputInMB to 8 (w/ higher values we
>>> were running out of memory extremely fast, this way it works for a
>>> couple of days w/o crashing).
>>>
>>> Last time this issue happened, I didn’t clear the commitlog/data
>>> folders, enabled gc logging and restarted Cassandra. It crashes really
>>> fast, but what is really strange is that it seems like it still has
>>> plenty of memory when the error happens, last 3 lines from gc log:
>>> 21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]
>>> 21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]
>>> 21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]
>>> The full log is here: http://pastebin.com/XGRSRcBd
>>>
>>> I’ve tried increasing the memory up to 1.5GB, but it still doesn’t start.
>>>
>>> Any ideas what might be the problem here?
>>>
>>> Thank you,
>>> Aram
>>>
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

TheGC inspector's frequency

The GCInspector's start() method ,
In this method ,
StorageService.scheduledTasks.scheduleWithFixedDelay(t,
INTERVAL_IN_MS, INTERVAL_IN_MS, TimeUnit.MILLISECONDS);
t is Runnable t and it's run method is logIntervalGCStats.
According to this code segment , the logIntervalGCStats should be run
every second.
But the log of cassandra  shows the logIntervalGCStats didn't run
every second,it's  disorder .
How this happened?

-- 
Best regards,

Ivy Tang

Re: When to call the major compaction ?

2010-12-01 Thread Chen Xinli

2010/12/1 Ying Tang 

> Every time cassandra creates a new sstable , it will call the
> CompactionManager.submitMinorIfNeeded  ? And if the number of memtables is
> beyond  MinimumCompactionThreshold  , the minor compaction will be called.
> And there is also a method named CompactionManager.submitMajor , and the
> call relationship is :
>
> NodeCmd -- > NodeProbe -->StorageService.forceTableCompaction -->
> Table.forceCompaction -->CompactionManager.performMajor -->
> CompactionManager.submitMajor
>
> ColumnFamilyStore.forceMajorCompaction --> CompactionManager.performMajor
> --> CompactionManager.submitMajor
>   
> 
> HintedHandOffManager
>  --> CompactionManager.submitMajor
>
> So i have 3 questions:
> 1. Once a new sstable has been created ,
> CompactionManager.submitMinorIfNeeded  will be called , minorCompaction
> maybe called .
> But when will the majorCompaction be called ? Just the NodeCmd ?
>

Yes, majorCompaction must be called manually from NodeCmd


> 2. Which jobs will minorCompaction and majorCompaction do ?
> Will minorCompaction delete the data that have been marked as deleted ?
> And how about the major compaction ?
>

Compaction only mark sstables as deleted. Deletion will be done when there
are full gc, or node restarted.


> 3. When gc be called ? Every time compaction been called?
>

GC has nothing to do with compaction, you may mistake the two conceptions


>
>
>
> --
> Best regards,
>
> Ivy Tang
>
>
>
>


-- 
Best Regards,
Chen Xinli

Re: When to call the major compaction ?