Re: question on saved_cache_directory

2011-03-28 Thread Peter Schuller
>     I have an SSD and a normal disk. I am using the SSD for the data
> directory; should I also use the SSD for the saved_cache directory?

It won't really hurt, but there's no need: saved caches are dumped and
read sequentially. No random I/O.


-- 
/ Peter Schuller


Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Stephen Connolly
iterate.

otherwise, if that will be too slow and you will do it often, the nosql way
is to maintain a separate column family, updated on each row add/delete, to
hold the answer for you.

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 28 Mar 2011 07:40, "Sheng Chen"  wrote:
> Hi all,
> I want to know how many records I am holding in Cassandra, just like
> count(*) in sql.
> What can I do ? Thank you.
>
> Sheng
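
A rough, untested sketch of the "iterate" option against the 0.7 Thrift API:
page through get_range_slices asking for zero columns per row and count the
keys as they come back. The keyspace and column family names below
("Keyspace1", "Standard1") are placeholders.

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class RowCount
{
    public static void main(String[] args) throws Exception
    {
        TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("Keyspace1");

        // Ask for zero columns per row; we only want the row keys.
        SlicePredicate keysOnly = new SlicePredicate();
        keysOnly.setSlice_range(new SliceRange(
                ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 0));

        int pageSize = 1000;
        KeyRange page = new KeyRange();
        page.setCount(pageSize);
        page.start_key = ByteBuffer.allocate(0);
        page.end_key = ByteBuffer.allocate(0);

        long total = 0;
        ByteBuffer lastKey = null;
        while (true)
        {
            List<KeySlice> slices = client.get_range_slices(
                    new ColumnParent("Standard1"), keysOnly, page, ConsistencyLevel.ONE);
            for (KeySlice slice : slices)
            {
                // Each page after the first starts with the previous page's
                // last key, so skip it to avoid double counting.
                if (slice.key.equals(lastKey))
                    continue;
                total++;
            }
            if (slices.size() < pageSize)
                break;
            lastKey = slices.get(slices.size() - 1).key;
            page.start_key = lastKey;
        }
        transport.close();

        // Caveat: rows consisting only of tombstones are still returned by
        // range scans, so this can overcount recently deleted rows.
        System.out.println(total + " rows");
    }
}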


balance between concurrent_[reads|writes] and feeding/reading threads in clients

2011-03-28 Thread Terje Marthinussen
Hi,

I was pondering how the concurrent_reads and concurrent_writes settings
balance against the maximum number of read/write threads in clients.

Let's say we have 3 nodes, with concurrent reads/writes set to 8.
That is, 8*3=24 threads for reading and writing.

Replication factor is 3.

Let's say we have clients that in total set up 16 connections to each node.

Now all the clients write at the same time. Since the replication factor is
3, you could get up to 16*3=48 concurrent write requests per node (which
need to be handled by 8 threads)?

What is the result if this load continues?
Could you see replication of data failing (at least initially), causing
all kinds of fun timeouts around the system?

Same on the read side.
If all clients read at the same time with consistency level QUORUM, you get
16*2 read requests in the best case (and more in the worst case)?

Could you see one node answering but another timing out due to lack
of read threads, causing read repair which further degrades things?

How does this queue up internally between thrift, gossip and the threads
doing the actual reads and writes?

Regards,
Terje
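
For what it's worth, here is a rough model of where the excess requests go; a
sketch of the idea, not Cassandra's actual code. Each stage behaves roughly
like a fixed-size thread pool with an unbounded queue, so overload queues up
rather than failing outright, and only surfaces as timeouts once requests sit
queued past rpc_timeout.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ReadStageModel
{
    public static void main(String[] args)
    {
        // Model of one node's read stage: concurrent_reads (8) worker threads
        // and an unbounded queue for whatever cannot run immediately.
        ThreadPoolExecutor readStage = new ThreadPoolExecutor(
                8, 8, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

        // 48 requests arrive at once: 8 execute, 40 wait in the queue. Nothing
        // is rejected; waiting requests only fail if the coordinator gives up
        // on them (rpc_timeout) while they are still queued.
        for (int i = 0; i < 48; i++)
        {
            readStage.execute(new Runnable()
            {
                public void run()
                {
                    // read one row
                }
            });
        }
        System.out.println("queued: " + readStage.getQueue().size());
        readStage.shutdown();
    }
}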


Re: How to repair HintsColumnFamily?

2011-03-28 Thread Shotaro Kamio
I see. Then, I'll remove the HintsColumnFamily.

Because our cluster has a lot of data, running repair takes a long time
(more than a day), and it's kind of a pain. It often fills the disk,
creates many sstables and degrades read performance.
If it were easy to fix the hints, that would be a less painful solution. But I
understand there's no other option in this case.


Thanks,
Shotaro


On Sun, Mar 27, 2011 at 11:51 PM, Jonathan Ellis  wrote:
> Why would you try to repair hints?
>
> If you run repair on the non-system data then you don't need the hint
> data and can remove it.
>
> On Sun, Mar 27, 2011 at 12:17 AM, Shotaro Kamio  wrote:
>> Hi,
>>
>> Our cluster uses cassandra 0.7.4 (upgraded from 0.7.3) with
>> replication = 3. I found that error occurs on one node during hinted
>> handoff with following error (log #1 below).
>> When I tried out "scrub system HintsColumnFamily", I saw an ERROR in
>> log (log #2 below).
>> Do you think these errors are critical ?
>> I tried to "repair system HintsColumnFamily". But, it refuses to run
>> with "No neighbors". I can understand because hints are not
>> replicated. But then, is there any way to fix it without data loss?
>>
>>  INFO [manual-repair-0996a2ec-26d3-4243-9586-d56daf30f9bd] 2011-03-27
>> 13:55:05,664 AntiEntropyService.java (line 752) No neighbors to repair
>> with: manual-repair-0996a2ec-26d3-4243-9586-d56daf30f9bd completed.
>>
>>
>> Best regards,
>> Shotaro
>>
>>
>>  Log #1: Error on hinted handoff
>> 
>>
>> ERROR [HintedHandoff:1] 2011-03-26 20:04:22,528
>> DebuggableThreadPoolExecutor.java (line 103) Error in
>> ThreadPoolExecutor
>> java.lang.RuntimeException: java.lang.RuntimeException: error reading
>> 4976040 of 4976067
>>        at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>        at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>        at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>        at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.lang.RuntimeException: error reading 4976040 of 4976067
>>        at 
>> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83)
>>        at 
>> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
>>        at 
>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>        at 
>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>        at 
>> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
>>        at 
>> org.apache.commons.collections.iterators.CollatingIterator.anyHasNext(CollatingIterator.java:364)
>>        at 
>> org.apache.commons.collections.iterators.CollatingIterator.hasNext(CollatingIterator.java:217)
>>        at 
>> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:63)
>>        at 
>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>        at 
>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>        at 
>> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
>>        at 
>> org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130)
>>        at 
>> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1368)
>>        at 
>> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1245)
>>        at 
>> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1173)
>>        at 
>> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:321)
>>        at 
>> org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
>>        at 
>> org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:409)
>>        at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>        ... 3 more
>> Caused by: java.io.EOFException
>>        at java.io.RandomAccessFile.readByte(RandomAccessFile.java:591)
>>        at 
>> org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:324)
>>        at 
>> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:335)
>>        at 
>> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:351)
>>        at 
>> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:311)
>>        at 
>> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
>>        ... 21 more
>>
>> --
>>
>>  Log #2: Error on scrub ---
>>
>>  INFO [CompactionExecutor:1] 20

Help on how to configure an off-site DR node.

2011-03-28 Thread Brian Lycett
Hello.

I'm setting up a cluster that has three nodes in our production rack.
My intention is to have a replication factor of two for this.
For disaster recovery purposes, I need to have another node (or two?)
off-site.

The off-site node is entirely for the purpose of having an offsite
backup of the data - no clients will connect to it.

My question is, is it possible to configure Cassandra so that the
offsite node will have a full copy of the data set?
That is, somehow guarantee that a replica of all data will be written to
it, but without having to resort to an ALL consistency level for writes?
Although the offsite node will be on a 20Mbit leased line, I'd rather not
risk the link going down and breaking the cluster.

I've seen this suggestion here:
http://www.datastax.com/docs/0.7/operations/datacenter#disaster
but that configuration is vulnerable to the link breaking, and uses four
nodes in the offsite location.


Regards,

Brian




Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Joshua Partogi
Not all NoSQL is like that. Or perhaps the term NoSQL has become vague
these days.

On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly
 wrote:
> iterate.
>
> otherwise if that will be too slow and you will do it often, the nosql way
> is to create a separate column family updated with each row add/delete to
> hold the answer for you.
>
> - Stephen
>
> ---
> Sent from my Android phone, so random spelling mistakes, random nonsense
> words and other nonsense are a direct result of using swype to type on the
> screen
>
> On 28 Mar 2011 07:40, "Sheng Chen"  wrote:
>> Hi all,
>> I want to know how many records I am holding in Cassandra, just like
>> count(*) in sql.
>> What can I do ? Thank you.
>>
>> Sheng
>



-- 
http://twitter.com/jpartogi


Something about cassandra API

2011-03-28 Thread An Zhuo
Hi, I've been learning about Cassandra and found that there are two packages
for accessing it: Avro and Thrift.

So which is the suitable way to go with Java, Avro or Thrift? Thank you.

2011-03-28 



An Zhuo 


Re: Something about cassandra API

2011-03-28 Thread Norman Maurer
Hi there,

you would be better off using a high-level client like Hector or Pelops.

See:
http://wiki.apache.org/cassandra/ClientOptions

But to answer your question... If you really want to use something low-level,
then Thrift is the way to go...


Bye,
Norman



2011/3/28 An Zhuo 

>  HI, I've learned something about Cassandra and find that there are two
> packages about how to access cassandra: avro and thrift。
>
> So how should I choose the suitable way with java, avro or thrift? thank
> you.
>
> 2011-03-28
> --
> An Zhuo
>


RE: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Or Yanay
I use one of two ways to achieve that:
  1. Run a map/reduce job. Pig is really helpful in these cases. Make sure you
run your MR using a Hadoop task tracker on your nodes - or your performance
will take a hit.
  2. Dump all keys using the sstablekeys script from the relevant files on all
machines and count the unique values. I do that using "sort -n keys.txt | uniq
>> unique_keys.txt".

Dumping all keys is much faster but less elegant, and can be more annoying if
you want to do it from your application.

Hope that does the trick for you.
-Orr

-Original Message-
From: Joshua Partogi [mailto:joshua.j...@gmail.com] 
Sent: Monday, March 28, 2011 2:39 PM
To: user@cassandra.apache.org
Subject: Re: newbie question: how do I know the total number of rows of a cf?

Not all NoSQL is like that. Or perhaps the term NoSQL has became vague
these days.

On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly
 wrote:
> iterate.
>
> otherwise if that will be too slow and you will do it often, the nosql way
> is to create a separate column family updated with each row add/delete to
> hold the answer for you.
>
> - Stephen
>
> ---
> Sent from my Android phone, so random spelling mistakes, random nonsense
> words and other nonsense are a direct result of using swype to type on the
> screen
>
> On 28 Mar 2011 07:40, "Sheng Chen"  wrote:
>> Hi all,
>> I want to know how many records I am holding in Cassandra, just like
>> count(*) in sql.
>> What can I do ? Thank you.
>>
>> Sheng
>



-- 
http://twitter.com/jpartogi


Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Stephen Connolly
ok, so not all nosql has column families...

just

s/nosql/cassandra/g

on my previous post ;-)

On 28 March 2011 13:38, Joshua Partogi  wrote:
> Not all NoSQL is like that. Or perhaps the term NoSQL has became vague
> these days.
>
> On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly
>  wrote:
>> iterate.
>>
>> otherwise if that will be too slow and you will do it often, the nosql way
>> is to create a separate column family updated with each row add/delete to
>> hold the answer for you.
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on the
>> screen
>>
>> On 28 Mar 2011 07:40, "Sheng Chen"  wrote:
>>> Hi all,
>>> I want to know how many records I am holding in Cassandra, just like
>>> count(*) in sql.
>>> What can I do ? Thank you.
>>>
>>> Sheng
>>
>
>
>
> --
> http://twitter.com/jpartogi
>


Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Stephen Connolly
for #2 you could pipe through wc -l to get the answer

sort -n keys.txt | uniq | wc -l

but both examples are just refinements of iterate.

#1 is just a distributed iterate
#2 is just an optimized iterate based on knowledge of the on-disk
format (and may give inaccurate results... tombstones...)

On 28 March 2011 14:16, Or Yanay  wrote:
> I use one of two ways to achieve that:
>  1. run a map reduce. Pig is really helpful in these cases. Make sure you run 
> your MR using Hadoop task tracker on your nodes - or your performance will 
> take a hit.
>  2. dump all keys using sstablekeys script from relevant files on all 
> machines and count unique values. I do that using "sort -n  keys.txt |uniq >> 
> unique_keys.txt"
>
> Dumping all keys is much faster but less elegant and can be more annoying if 
> you want do that from your application.
>
> Hope that do the trick for you.
> -Orr
>
> -Original Message-
> From: Joshua Partogi [mailto:joshua.j...@gmail.com]
> Sent: Monday, March 28, 2011 2:39 PM
> To: user@cassandra.apache.org
> Subject: Re: newbie question: how do I know the total number of rows of a cf?
>
> Not all NoSQL is like that. Or perhaps the term NoSQL has became vague
> these days.
>
> On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly
>  wrote:
>> iterate.
>>
>> otherwise if that will be too slow and you will do it often, the nosql way
>> is to create a separate column family updated with each row add/delete to
>> hold the answer for you.
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on the
>> screen
>>
>> On 28 Mar 2011 07:40, "Sheng Chen"  wrote:
>>> Hi all,
>>> I want to know how many records I am holding in Cassandra, just like
>>> count(*) in sql.
>>> What can I do ? Thank you.
>>>
>>> Sheng
>>
>
>
>
> --
> http://twitter.com/jpartogi
>


Re: Something about cassandra API

2011-03-28 Thread Stephen Connolly
FYI, Avro is in all likelihood being removed in 0.8.

2011/3/28 Norman Maurer :
> Hi there,
>
> you would be better of to use a high-level client like hector or pelops.
>
> See:
> http://wiki.apache.org/cassandra/ClientOptions
>
> But to answer your question... If you really want to use something lowlevel
> then Thrift is the way to go...
>
>
> Bye,
> Norman
>
>
>
> 2011/3/28 An Zhuo 
>>
>> HI, I've learned something about Cassandra and find that there are two
>> packages about how to access cassandra: avro and thrift。
>>
>> So how should I choose the suitable way with java, avro or thrift? thank
>> you.
>>
>> 2011-03-28
>> 
>> An Zhuo
>


Problem about freeing space after a major compaction

2011-03-28 Thread Roberto Bentivoglio
Hi all,
we're working on a Cassandra 0.7.0 production environment with a data store
of nearly 500 GB.
We need to periodically remove the tombstones from deleted/expired data by
performing a major compaction through nodetool.
After invoking the compaction on a single column family we can see in
JConsole that LiveSSTableCount goes from 15 to 3 while
LiveDiskSpaceUsed goes from 90GB to 50GB.
The problem is that the space on the file system is still held by
Cassandra (I assume by the old SSTables) and isn't freed. We have tried
to force a full GC from JConsole as described in
http://wiki.apache.org/cassandra/MemtableSSTable without any success. The
space is freed only after a database restart.

How can we free this disk space without restarting the db?

Thank you very much,
Roberto Bentivoglio


Re: Poor performance on small data set

2011-03-28 Thread Sébastien Kondov
Hi,

Just to report that I finally compiled the Thrift extension to a .dll and
performance has improved. I was forced to switch to a VC9 build of PHP; VC6
isn't supported by PHP anymore.

Average access times were pretty bad before (70-100ms per row) and are now
5-10ms. So nearly 10x faster, thanks to the new extension .dll and maybe the
VC9 PHP.

So that's good news... but 10ms is not really good performance compared to
MySQL or memcached on inserts. So I'll run a new test in a virtual machine
(XP) to see the impact of Windows 7.


I would like your advice on these numbers:

CPU: U9400 @ 1.4GHz
Windows 7 32-bit
RAM: 4 GB
(my dev config; in future it'll run on a unix server)

Testing by reading/inserting the same row id 1000 times:

Read: 7.2 sec to read 1000 rows
Insert: 8.5 sec to insert 1000 rows

strlen of 1 row serialized = 2604 chars
1 row = 20 columns
(When I say row, I mean a row in the MySQL sense.)


Does that sound right to you?
Is performance limited by the CPU?


Another observation: when I store my row serialized in one column, I get a
performance boost.

cf[id][serialized] = serialize(row)

read/insert: 2.3 sec/1000 rows

versus 7-8 sec/1000 rows when the row is not serialized.

*So performance does not depend on size but on the number of columns.*

So the conclusion is that it's better to store a data row serialized, since I
will read all the data of a row every time anyway.




Thank you,

Vodnok

2011/3/12 Tyler Hobbs 

> On Sat, Mar 12, 2011 at 6:45 AM, Vodnok  wrote:
>
>>
>> THRIFT-638 : It seems to be a solution but i don't know how to patch this
>> on my environement phpcassa has a C extension but it's hard for me to build
>> a php extension
>>
>
> The master branch of phpcassa includes the changes from THRIFT-638.
>
> --
> Tyler Hobbs
> Software Engineer, DataStax 
> Maintainer of the pycassa  Cassandra
> Python client library
>
>


Re: Problem about freeing space after a major compaction

2011-03-28 Thread Ching-Cheng Chen
Tombstone removal also depends on your gc grace period setting.

If you are pretty sure that you have a proper gc grace period set and are
still on 0.7.0, then it's probably related to this bug:

https://issues.apache.org/jira/browse/CASSANDRA-2059

Regards,



Chen

Senior Developer, EvidentSoftware(Leaders in Monitoring of NoSQL & JAVA )

http://www.evidentsoftware.com

On Mon, Mar 28, 2011 at 10:40 AM, Roberto Bentivoglio <
roberto.bentivog...@gmail.com> wrote:

> Hi all,
> we're working on a Cassandra 0.7.0 production enviroment with a store of
> data near to 500 GB.
> We need to periodically remove the tombstones from deleted/expired data
> performing a major compaction operation through nodetool.
> After invoking the compaction on a single column family we can see from
> JConsole that the LiveSSTableCount is going from 15 to 3 while the
> LiveDiskSpaceUsed is going from 90GB to 50GB.
> The problem now is that the space on the file system is been taken from
> Cassandra (I assumed from the old SSTable) and it isn't freed. We have tried
> to perform a full GC from the JConsole as described in
> http://wiki.apache.org/cassandra/MemtableSSTable without any success. The
> space is freed only after a database restart.
>
> How can we free this disk space without restart the db?
>
> Thanks you very much,
> Roberto Bentivoglio
>


Re: Problem about freeing space after a major compaction

2011-03-28 Thread Roberto Bentivoglio
Hi Chen,
we've set the gc grace period of the column families to 0, as suggested for a
single-node environment.
Could this setting cause the problem? I don't think so...

Thanks,
Roberto

On 28 March 2011 16:54, Ching-Cheng Chen  wrote:

> tombstones removal also depends on your gc grace period setting.
>
> If you are pretty sure that you have proper gc grace period set and still
> on 0.7.0, then probably related to this bug.
>
> https://issues.apache.org/jira/browse/CASSANDRA-2059
>
> Regards,
>
> 
>
> Chen
>
> Senior Developer, EvidentSoftware(Leaders in Monitoring of NoSQL & JAVA )
>
> http://www.evidentsoftware.com
>
> On Mon, Mar 28, 2011 at 10:40 AM, Roberto Bentivoglio <
> roberto.bentivog...@gmail.com> wrote:
>
>> Hi all,
>> we're working on a Cassandra 0.7.0 production enviroment with a store of
>> data near to 500 GB.
>> We need to periodically remove the tombstones from deleted/expired data
>> performing a major compaction operation through nodetool.
>> After invoking the compaction on a single column family we can see from
>> JConsole that the LiveSSTableCount is going from 15 to 3 while the
>> LiveDiskSpaceUsed is going from 90GB to 50GB.
>> The problem now is that the space on the file system is been taken from
>> Cassandra (I assumed from the old SSTable) and it isn't freed. We have tried
>> to perform a full GC from the JConsole as described in
>> http://wiki.apache.org/cassandra/MemtableSSTable without any success. The
>> space is freed only after a database restart.
>>
>> How can we free this disk space without restart the db?
>>
>> Thanks you very much,
>> Roberto Bentivoglio
>>
>
>


Re: Problem about freeing space after a major compaction

2011-03-28 Thread Ching-Cheng Chen
AFAIK, setting gc_grace_period to 0 shouldn't cause this issue. In fact,
that's what I'm using now in a single-node environment like yours.

However, I'm using 0.7.2 with some patches. If you are still on 0.7.0, most
likely you are being hit by this bug.
You might want to patch it or upgrade to the latest release.

https://issues.apache.org/jira/browse/CASSANDRA-2059

Regards,



Chen

Senior Developer, EvidentSoftware(Leaders in Monitoring of NoSQL & JAVA )

http://www.evidentsoftware.com

On Mon, Mar 28, 2011 at 11:04 AM, Roberto Bentivoglio <
roberto.bentivog...@gmail.com> wrote:

> Hi Chen,
> we've set the gc grace period of the column families to 0 as suggest in a
> single node enviroment.
> Can this setting cause the problem? I don't think so...
>
> Thanks,
> Roberto
>
> On 28 March 2011 16:54, Ching-Cheng Chen wrote:
>
>> tombstones removal also depends on your gc grace period setting.
>>
>> If you are pretty sure that you have proper gc grace period set and still
>> on 0.7.0, then probably related to this bug.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2059
>>
>> Regards,
>>
>> 
>>
>> Chen
>>
>> Senior Developer, EvidentSoftware(Leaders in Monitoring of NoSQL & JAVA )
>>
>> http://www.evidentsoftware.com
>>
>> On Mon, Mar 28, 2011 at 10:40 AM, Roberto Bentivoglio <
>> roberto.bentivog...@gmail.com> wrote:
>>
>>> Hi all,
>>> we're working on a Cassandra 0.7.0 production enviroment with a store of
>>> data near to 500 GB.
>>> We need to periodically remove the tombstones from deleted/expired data
>>> performing a major compaction operation through nodetool.
>>> After invoking the compaction on a single column family we can see from
>>> JConsole that the LiveSSTableCount is going from 15 to 3 while the
>>> LiveDiskSpaceUsed is going from 90GB to 50GB.
>>> The problem now is that the space on the file system is been taken from
>>> Cassandra (I assumed from the old SSTable) and it isn't freed. We have tried
>>> to perform a full GC from the JConsole as described in
>>> http://wiki.apache.org/cassandra/MemtableSSTable without any success.
>>> The space is freed only after a database restart.
>>>
>>> How can we free this disk space without restart the db?
>>>
>>> Thanks you very much,
>>> Roberto Bentivoglio
>>>
>>
>>
>


Re: Problem about freeing space after a major compaction

2011-03-28 Thread Roberto Bentivoglio
Thank you again; we're going to upgrade our environment.

Regards,
Roberto

On 28 March 2011 17:08, Ching-Cheng Chen  wrote:

>
> AFAIK, setting gc_grace_period to 0 shouldn't cause this issue.   In fact,
> that what I'm using now in a single node environment like yours.
>
> However, I'm using 0.7.2 with some patches.   If you are still using 0.7.0,
> most likely you got hit with this bug.
> You might want to patch it or upgrade to latest release.
>
> https://issues.apache.org/jira/browse/CASSANDRA-2059
>
> Regards,
>
> 
>
> Chen
>
> Senior Developer, EvidentSoftware(Leaders in Monitoring of NoSQL & JAVA )
>
> http://www.evidentsoftware.com
>
> On Mon, Mar 28, 2011 at 11:04 AM, Roberto Bentivoglio <
> roberto.bentivog...@gmail.com> wrote:
>
>> Hi Chen,
>> we've set the gc grace period of the column families to 0 as suggest in a
>> single node enviroment.
>> Can this setting cause the problem? I don't think so...
>>
>> Thanks,
>> Roberto
>>
>> On 28 March 2011 16:54, Ching-Cheng Chen wrote:
>>
>>> tombstones removal also depends on your gc grace period setting.
>>>
>>> If you are pretty sure that you have proper gc grace period set and still
>>> on 0.7.0, then probably related to this bug.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-2059
>>>
>>> Regards,
>>>
>>> 
>>>
>>> Chen
>>>
>>> Senior Developer, EvidentSoftware(Leaders in Monitoring of NoSQL & JAVA )
>>>
>>> http://www.evidentsoftware.com
>>>
>>> On Mon, Mar 28, 2011 at 10:40 AM, Roberto Bentivoglio <
>>> roberto.bentivog...@gmail.com> wrote:
>>>
 Hi all,
 we're working on a Cassandra 0.7.0 production enviroment with a store of
 data near to 500 GB.
 We need to periodically remove the tombstones from deleted/expired data
 performing a major compaction operation through nodetool.
 After invoking the compaction on a single column family we can see from
 JConsole that the LiveSSTableCount is going from 15 to 3 while the
 LiveDiskSpaceUsed is going from 90GB to 50GB.
 The problem now is that the space on the file system is been taken from
 Cassandra (I assumed from the old SSTable) and it isn't freed. We have 
 tried
 to perform a full GC from the JConsole as described in
 http://wiki.apache.org/cassandra/MemtableSSTable without any success.
 The space is freed only after a database restart.

 How can we free this disk space without restart the db?

 Thanks you very much,
 Roberto Bentivoglio

>>>
>>>
>>
>


Re: Something about cassandra API

2011-03-28 Thread Eric Evans
On Mon, 2011-03-28 at 14:21 +0100, Stephen Connolly wrote:
> FYI Avro is in all likelyhood being removed in 0.8

FWIW, Avro is long-gone at this point.

-- 
Eric Evans
eev...@rackspace.com



Re: Something about cassandra API

2011-03-28 Thread Eric Evans
On Mon, 2011-03-28 at 14:51 +0200, Norman Maurer wrote:
> you would be better of to use a high-level client like hector or
> pelops.
> 
> See:
> http://wiki.apache.org/cassandra/ClientOptions
> 
> But to answer your question... If you really want to use something
> lowlevel then Thrift is the way to go...

If you are targeting 0.8, then CQL is another option
(https://svn.apache.org/viewvc/cassandra/trunk/doc/cql/CQL.html?view=co).

There is a JDBC driver in-tree (see drivers/java/).
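
For the curious, a minimal sketch of what the JDBC route might look like. The
driver class name, JDBC URL format and query below are illustrative
assumptions, not confirmed details; check drivers/java/ for the real ones.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CqlSketch
{
    public static void main(String[] args) throws Exception
    {
        // Assumed driver class and URL scheme; verify against drivers/java/.
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        Connection conn =
                DriverManager.getConnection("jdbc:cassandra:localhost:9160/Keyspace1");
        Statement stmt = conn.createStatement();
        // Hypothetical query; the CQL grammar is still settling at this point.
        ResultSet rs = stmt.executeQuery("SELECT 'name' FROM Standard1 WHERE KEY = 'key1'");
        while (rs.next())
            System.out.println(rs.getString(1));
        rs.close();
        stmt.close();
        conn.close();
    }
}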

> 2011/3/28 An Zhuo 
> 
> >  HI, I've learned something about Cassandra and find that there are
> two
> > packages about how to access cassandra: avro and thrift。
> >
> > So how should I choose the suitable way with java, avro or thrift?
> thank
> > you. 
-- 
Eric Evans
eev...@rackspace.com



Re: debian/ubuntu mirror down?

2011-03-28 Thread Eric Evans
On Fri, 2011-03-25 at 13:54 -0700, Shashank Tiwari wrote:
> The Ubuntu Software Update seems to complain --
> Failed to fetch
> http://www.apache.org/dist/cassandra/debian/dists/unstable/main/binary-amd64/Packages.gz
> 403  Forbidden [IP: 140.211.11.131 80]
> Failed to fetch
> http://www.apache.org/dist/cassandra/debian/dists/unstable/main/source/Sources.gz
> 403  Forbidden [IP: 140.211.11.131 80]
> 
> Has something changed or is the mirror down? 

It's working now.

-- 
Eric Evans
eev...@rackspace.com



Re: Something about cassandra API

2011-03-28 Thread Stephen Connolly
On 28 March 2011 16:33, Eric Evans  wrote:
> On Mon, 2011-03-28 at 14:21 +0100, Stephen Connolly wrote:
>> FYI Avro is in all likelyhood being removed in 0.8
>
> FWIW, Avro is long-gone at this point.

You have the advantage of actually being a Cassandra Dev as opposed to
being a Cassandra Hanger-on like me ;-)
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: ParNew (promotion failed)

2011-03-28 Thread Peter Schuller
> But he's talking about "promotion failed" which is about heap
> fragmentation, not "concurrent mode failure" which would indicate CMS
> too late.  So increasing young generation size + tenuring threshold is
> probably the way to go (especially in a read-heavy workload;
> increasing tenuring will just mean copying data in memtables around
> between survivor spaces for a write-heavy load).

Thanks for the catch. You're right.

For interested parties:

This caused me to look into when 'promotion failed' and 'concurrent
mode failure' are actually reported. With some background here (from
2006, so potentially out of date):

  http://blogs.sun.com/jonthecollector/entry/when_the_sum_of_the

I looked at a semi-recent openjdk7 (so it may have changed since 1.6).
"concurrent mode failure" seems to be logged in two cases; one is
CMSCollector::do_mark_sweep_work(). The other is
CMSCollector::acquire_control_and_collect().

The former is called by the latter if it is determined that compaction
should happen, which seems to boil down to whether the
incremental collection is "believed" to fail (my source navigation fu
failed me and I'm for some reason unable to find the implementation of
collection_attempt_is_safe() that applies...). The other concurrent
mode failure is if acquire_control_and_collect() determines that one
is already in progress.

That seems consistent with the blog entry.

"promotion failed" seems reported when an actual
next_gen->par_promote() call fails for a specific object.

So, my reading is that while 'promotion failed' can indeed be an
indicator of promotion failure due to fragmentation alone (if a
promotion were to fail in spite of there being plenty of free space
left), it can also have a cause overlapping with concurrent mode
failure in case a young-gen collection was attempted under the belief
that there would be enough space - only to then fail.

However, given the reported numbers (CMS:
1341669K->1142937K(2428928K)) it does seem clear that finding
contiguous free space is indeed the problem.

Running with -XX:PrintFLSStatistics=1 may yield interesting results,
but of course won't actually help.

-- 
/ Peter Schuller


Re: fabric script for cassandra

2011-03-28 Thread Sal Fuentes
I know you can find other scripts for provisioning a Cassandra cluster on
GitHub, but they may be outdated (thinking 0.6 releases):

[Chef scripts]
https://github.com/b/cookbooks/tree/cassandra
https://github.com/fuentesjr/cass-pack

[Shell scripts]
https://github.com/digitalreasoning/PyStratus

Hope that helps



On Sun, Mar 27, 2011 at 10:40 AM, Anurag Gujral wrote:

> Hi All,
>   Does anyone knows where to get /have a fabric script for
> deploying cassandra on multiple machines.
> Thanks
> Anurag
>
>


-- 
Salvador Fuentes Jr.


New committer Sylvain Lebresne

2011-03-28 Thread Jonathan Ellis
The Cassandra PMC has voted to add Sylvain as a committer.

Welcome, Sylvain, and thanks for the hard work!

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: memtable_threshold

2011-03-28 Thread ruslan usifov
2011/3/29 Narendra Sharma 

> This is because the memtable threshold is not correct to the last byte. The
> threshold basically account for column name, value and timestamp (or the
> serialized column). It doesn't account for all the in-memory overhead for
> maintaining the data and references etc.
>
>
An overhead of 2x? Hmm, why so much? Also, in JMX (through jconsole) I don't
see any overhead: when a memtable reaches its memory threshold it is reset
to a very low value (about 300 - 400 KB).


Re: memtable_threshold

2011-03-28 Thread Narendra Sharma
The following shows how the size of a memtable is updated:

currentThroughput.addAndGet(cf.size());

This is what jconsole/JMX shows, and it doesn't account for the overhead of
holding the data in in-memory data structures.

The size of CF, SuperColumn and Column is calculated as following:

Column Size:

public int size()
{
    /*
     * Size of a column is =
     *   size of a name (short + length of the string)
     * + 1 byte to indicate if the column has been deleted
     * + 8 bytes for timestamp
     * + 4 bytes which basically indicates the size of the byte array
     * + entire byte array.
     */
    return DBConstants.shortSize_ + name.remaining() + DBConstants.boolSize_
           + DBConstants.tsSize_ + DBConstants.intSize_ + value.remaining();
}

SuperColumn Size:

public int size()
{
    int size = 0;
    for (IColumn subColumn : getSubColumns())
    {
        size += subColumn.serializedSize();
    }
    return size;
}

ColumnFamily Size:

int size()
{
    int size = 0;
    for (IColumn column : columns.values())
    {
        size += column.size();
    }
    return size;
}

Hope this makes it clear.

Thanks,
Naren

On Mon, Mar 28, 2011 at 2:15 PM, ruslan usifov wrote:

>
>
> 2011/3/29 Narendra Sharma 
>
>> This is because the memtable threshold is not correct to the last byte.
>> The threshold basically account for column name, value and timestamp (or the
>> serialized column). It doesn't account for all the in-memory overhead for
>> maintaining the data and references etc.
>>
>>
> Overhead in 2 times Hm why so many. Also in JMX (throw jconsole) i
> don't see any overhead, when memtable reach it memory threshold it will be
> reset to very low value (about 300 - 400 KB)
>



-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: New committer Sylvain Lebresne

2011-03-28 Thread Edward Capriolo
Congratulations Sylvain!

On Mon, Mar 28, 2011 at 4:33 PM, Jonathan Ellis  wrote:
> The Cassandra PMC has voted to add Sylvain as a committer.
>
> Welcome, Sylvain, and thanks for the hard work!
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: New committer Sylvain Lebresne

2011-03-28 Thread Chris Goffinet
Congratulations Sylvain!

On Mon, Mar 28, 2011 at 2:56 PM, Edward Capriolo wrote:

> Congratulations Sylvain!
>
> On Mon, Mar 28, 2011 at 4:33 PM, Jonathan Ellis  wrote:
> > The Cassandra PMC has voted to add Sylvain as a committer.
> >
> > Welcome, Sylvain, and thanks for the hard work!
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
> >
>


Re: memtable_threshold

2011-03-28 Thread Jonathan Ellis
It's closer to 8x than 2x for small values. Java objects simply use a
lot more memory than you'd think, and it takes multiple objects to
store a column.

http://kohlerm.blogspot.com/2008/12/how-much-memory-is-used-by-my-java.html
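
As a back-of-envelope illustration of that multiplier (every byte count below
is a rough 64-bit JVM assumption made for the sake of the arithmetic, not
Cassandra's actual accounting):

public class ColumnFootprint
{
    // Serialized size, as in Column.size() quoted earlier in this thread:
    // short + name + deletion flag + timestamp + value length + value.
    static long serialized(int nameLen, int valueLen)
    {
        return 2 + nameLen + 1 + 8 + 4 + valueLen;
    }

    // Very rough live-heap estimate. Every constant below is an assumption.
    static long inMemory(int nameLen, int valueLen)
    {
        long columnObject = 16 + 8 + 8 + 8;        // header, 2 refs, long timestamp
        long bufferWrappers = 2 * 48;              // two ByteBuffer objects
        long backingArrays = (16 + nameLen) + (16 + valueLen); // byte[] headers + data
        long mapEntry = 48;                        // memtable map entry + key ref, roughly
        return columnObject + bufferWrappers + backingArrays + mapEntry;
    }

    public static void main(String[] args)
    {
        // A 10-byte name, 8-byte value column: 33 bytes serialized versus
        // roughly 234 bytes live, i.e. about 7x; large values dilute the ratio.
        System.out.println(serialized(10, 8) + " vs ~" + inMemory(10, 8));
    }
}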

On Mon, Mar 28, 2011 at 4:15 PM, ruslan usifov  wrote:
>
>
> 2011/3/29 Narendra Sharma 
>>
>> This is because the memtable threshold is not correct to the last byte.
>> The threshold basically account for column name, value and timestamp (or the
>> serialized column). It doesn't account for all the in-memory overhead for
>> maintaining the data and references etc.
>
> Overhead in 2 times Hm why so many. Also in JMX (throw jconsole) i don't
> see any overhead, when memtable reach it memory threshold it will be reset
> to very low value (about 300 - 400 KB)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


design cassandra issue client when moving from version 0.6.* to 0.7.3

2011-03-28 Thread Anurag Gujral
Hi All,
 I am currently porting a Cassandra C++ client from 0.6.* to 0.7.3.
The C++ client in 0.6.* used the function
conn->client->send_multiget_slice, which took cseqid as a parameter.
The signature of the function in 0.6.* was:

void CassandraClient::send_multiget_slice(const std::string& keyspace, const
std::vector<std::string>& keys, const ColumnParent& column_parent, const
SlicePredicate& predicate, const ConsistencyLevel consistency_level, const
int32_t cseqid)

In case send_multiget_slice did not return success, the code used to wait on
the socket by calling select, and would read the data when it became
available using recv_multiget_slice, provided the cseqid passed to
send_multiget_slice was the same as that in the call to recv_multiget_slice.

In Cassandra 0.7.3 the functions send_multiget_slice and recv_multiget_slice
don't take cseqid as a parameter.

How can I accomplish the 0.6.* behaviour in 0.7.3?

Please suggest.
Thanks
Anurag


Re: New committer Sylvain Lebresne

2011-03-28 Thread Jake Luciani
Great job, well deserved Sylvain!

On Mon, Mar 28, 2011 at 4:33 PM, Jonathan Ellis  wrote:

> The Cassandra PMC has voted to add Sylvain as a committer.
>
> Welcome, Sylvain, and thanks for the hard work!
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
http://twitter.com/tjake


atomicity in cassandra

2011-03-28 Thread Saurabh Sehgal
I have seen this question pop up once or twice in mailing lists regarding
atomicity when using batch_mutate() operations. I understand that the
operations in batch_mutate() should be idempotent and do not get rolled back
on failures. However, a client crashing (due to machine issues, networking
issues, etc.) in the middle of such a transaction can leave the data in an
inconsistent state. Is there a way to detect such inconsistencies? Will
Cassandra keep a log of failed batch_mutate() operations, or partially
completed operations, that might require manual intervention when the client
comes back up?


Re: atomicity in cassandra

2011-03-28 Thread Narendra Sharma
There is no undo or redo log in Cassandra. From Cassandra's perspective, if
an operation gets logged in the commit log, it is considered committed.

Remember eventual consistency...
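
A minimal sketch of the client-side discipline that implies: give mutations
explicit timestamps so they are idempotent, and re-apply the whole batch on
failure, since replaying already-applied columns just overwrites them with
identical data. The consistency level and retry policy here are illustrative.

import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.Mutation;
import org.apache.cassandra.thrift.TimedOutException;
import org.apache.cassandra.thrift.UnavailableException;
import org.apache.thrift.TException;

public class IdempotentBatch
{
    static void mutateWithRetry(Cassandra.Client client,
                                Map<ByteBuffer, Map<String, List<Mutation>>> batch,
                                int maxAttempts) throws TException
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                client.batch_mutate(batch, ConsistencyLevel.QUORUM);
                return;
            }
            catch (TimedOutException e)
            {
                // The batch may have been partially applied; because the
                // mutations carry fixed timestamps, re-applying converges
                // the data rather than corrupting it.
                if (attempt == maxAttempts) throw new TException(e);
            }
            catch (UnavailableException e)
            {
                if (attempt == maxAttempts) throw new TException(e);
            }
        }
    }
}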



On Mon, Mar 28, 2011 at 6:21 PM, Saurabh Sehgal wrote:

> I have seen this question pop up once or twice in mailing lists regarding
> atomicity when using batch_mutate() operations. I understand that the
> operations in batch_mutate() should be idempotent and do not get rolled back
> on failures. However, a client crashing (due to machine issues, networking
> issue etc) in the middle of such a transaction can leave the data in an
> inconsistent state. Is there a way to figure out such inconsistencies ? Will
> Cassandra keep a log of failed batch_mutate() operations, or partially
> completed operations, that might require manual intervention when the client
> comes back up ?
>
>


-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Gossip mysteries (0.7.4 on EC2)

2011-03-28 Thread Alexis Lê-Quôc
Hi,

To make a long story short, I'm trying to understand the logic behind
gossip. The following is an excerpt from a log captured today.

2011-03-28T18:37:56.505316+00:00 Node /10.96.81.193 has restarted, now UP again
2011-03-28T18:37:56.505316+00:00 Node /10.96.81.193 state jump to normal
2011-03-28T18:37:56.505468+00:00 Node /10.124.143.36 has restarted, now UP again
2011-03-28T18:37:56.505468+00:00 Node /10.124.143.36 state jump to normal
2011-03-28T18:37:56.559275+00:00 Binding thrift service to /10.208.214.3:9160
2011-03-28T18:37:56.563852+00:00 Using TFastFramedTransport with a max frame 
size of 15728640 bytes.
2011-03-28T18:37:56.574986+00:00 Listening for thrift clients...
2011-03-28T18:37:56.805426+00:00 Node /10.202.61.193 has restarted, now UP again
2011-03-28T18:37:56.806171+00:00 Nodes /10.202.61.193 and /10.96.81.193 have 
the same token [XXX].  Ignoring /10.202.61.193

What's surprising is that 10.96.81.193 has been gone for at least a week (and
was the subject of bug CASSANDRA-2371).

> 2011-03-28T18:37:56.505316+00:00 Node /10.96.81.193 has restarted, now UP 
> again
> 2011-03-28T18:37:56.505316+00:00 Node /10.96.81.193 state jump to normal

The code I'm seeing is:
624  private void handleMajorStateChange(InetAddress ep, EndpointState epState)
625  {
626  if (endpointStateMap.get(ep) != null)
627  logger.info("Node {} has restarted, now UP again", ep);


endpointStateMap is not up-to-date. Shouldn't a gossip exchange mark the node
as down immediately? Or, if there's a network partition, the node could at
least be marked as in an unknown state. I'm interpreting UP as available from
Cassandra's point of view.

> 2011-03-28T18:37:56.505468+00:00 Node /10.124.143.36 has restarted, now UP 
> again
> 2011-03-28T18:37:56.505468+00:00 Node /10.124.143.36 state jump to normal

That's fine, the node is actually up.

> 2011-03-28T18:37:56.805426+00:00 Node /10.202.61.193 has restarted, now UP 
> again

Fine too, 10.202.61.193 is up (for real).

2011-03-28T18:37:56.806171+00:00 Nodes /10.202.61.193 and /10.96.81.193 have 
the same token [XXX].  Ignoring /10.202.61.193

It's not clear to me on what basis that decision is made. From the code I see:

728  InetAddress currentOwner = tokenMetadata_.getEndpoint(token);
729  if (currentOwner == null)
730  {
...
735  }
736  else if (endpoint.equals(currentOwner))
737  {
...
741  }
742  else if (Gossiper.instance.compareEndpointStartup(endpoint, currentOwner) 
> 0)
743  {
...
749  }
750  else
751  {
752  logger_.info(String.format("Nodes %s and %s have the same token %s. 
Ignoring %s",
753 endpoint, currentOwner, token, endpoint));
754  }

So that decision is made by default, which in this particular case does not 
work. I've only cursorily looked at the source code so I don't know how 
tokenMetadata gets updated.

I've also seen non-converging gossip views today, where some ghost nodes show
up and others don't. My recourse so far has been removetoken (force), since
the initial removetokens left one node streaming to another while the
recipient was unaware of any such streaming, as shown in the following
excerpts:

On Node 1 (sender) -- stuck forever in that state
Mode: Normal
Streaming to: /Node2
 
/data/cassandra/data/Intake/Metrics-f-2271-Data.db/(0,601518756),(601518756,700910088)
 progress=127695240/700910088 - 18%



On Node 2 (recipient) at the very same moment:
Mode: Normal
Not sending any streams.
Not receiving any streams.


This kind of discrepancy is also hard to understand given that nodetool ring on 
Node 1 yields:

...
Node2 Up  Normal 52.04 GB 51.03% token1

and the same command on Node 2 yields:

...
Node1 Up  Normal 50.89 GB 23.97% token2

Any light shed on both issues is appreciated.

-- 
Alexis Lê-Quôc (@datadoghq)