Re: ebs or ephemeral

2011-10-10 Thread aaron morton
6 nodes and RF3 will mean you can handle between 1 and 2 failed nodes. 

see http://thelastpickle.com/2011/06/13/Down-For-Me/
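
As a rough sketch of the arithmetic (a hedged aside; it assumes reads and 
writes at QUORUM):

    RF=3
    QUORUM=$(( RF / 2 + 1 ))    # 2 of the 3 replicas must respond
    echo "survives $(( RF - QUORUM )) down replica(s) per range"

With 6 nodes, a second failed node is survivable only if the two dead nodes do 
not hold two replicas of the same range, hence "between 1 and 2".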
Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7/10/2011, at 9:37 PM, Madalina Matei wrote:

> Hi Aaron,
>  
> For a 6 nodes cluster, what RF can we use in order to support 2 failed nodes?
> From the article that you sent i understood "avoid EMS" and use ephemeral. am 
> i missing anything?
>  
> Thank you so much for your help,
> Madaina
> On Fri, Oct 7, 2011 at 9:15 AM, aaron morton  wrote:
> DataStax have pre-built AMIs here 
> http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami
>  
> 
> And an explanation of why we normally avoid ephemeral. 
> 
> Also, I would go with 6 nodes. You will then be able to handle up to 2 failed 
> nodes. 
> 
> Hope that helps. 
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 7/10/2011, at 9:11 PM, Yi Yang wrote:
> 
>> Obviously ephemeral. It has higher IO availability, will not affect your 
>> Ethernet IO performance, and it is free (included in the instance price);
>> the redundancy is provided by Cassandra itself.
>> Sent from my BlackBerry® wireless device
>> 
>> From: Madalina Matei 
>> Date: Fri, 7 Oct 2011 09:02:06 +0100
>> To: 
>> ReplyTo: user@cassandra.apache.org
>> Subject: ebs or ephemeral
>> 
>> Hi,
>> 
>>  I'm looking to deploy a 5-node cluster in EC2 with RF3 and QUORUM CL.
>> 
>>  Could you please advise me on EBS vs ephemeral storage?
>> 
>> Cheers,
>> Madalina
> 
> 



Re: 0.7.9 RejectedExecutionException

2011-10-10 Thread aaron morton
Have you checked /var/log/cassandra/output.txt (the packaged install pipes std 
out/err to there) or the system logs ? If there are no errors in the logs it 
may well be something external killing it. 
 
With regard to memory usage, it's hard for people to help unless you provide 
some numbers. What do you mean by MAX heap ? Is this the max used heap size 
reported by JMX or the -Xmx setting passed to the server ? 
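
One hedged way to get both numbers at once (nodetool as of 0.7; localhost and 
the default 0.7 JMX port are assumptions):

    nodetool -h localhost -p 8080 info | grep -i heap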

Cheers
  
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8/10/2011, at 7:02 AM, Ashley Martens wrote:

> Okay, this is still a problem. This node keeps dying at 1am every day, most 
> times without an error in the log. I'd appreciate any help in tracking down 
> why.
> 
> Additionally, I don't understand why 0.7.x uses *way* more RAM than 0.6.x 
> and 0.8.x, from a top or ps perspective. I'm now watching the JVM memory and 
> it seems to be more in line with 0.6.x but the MAX heap is crazy high (28G on 
> my servers).



Re: 54 memtable flushes in hour at peaktime

2011-10-10 Thread aaron morton
It's not a problem by itself; compaction will do its thing. If you are also 
seeing read latency increase, it may be something you want to look at.  

What version are you using ? The tuning is different (i.e. it gets easier) 
between versions 0.7, 0.8 and 1.0. 

It's probably just the case that you are writing a lot of data. Look for log 
messages from ColumnFamilyStore that start with "Enqueuing flush of Memtable…" 
They will tell you how many serialized bytes and operations the memtable soaked 
up before being flushed. 
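
A hedged way to count the flushes per hour straight from the log (path and 
log-line format assumed from a packaged install):

    grep "Enqueuing flush of Memtable" /var/log/cassandra/system.log \
      | awk '{ print $3, substr($4, 1, 2) }' | sort | uniq -c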

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10/10/2011, at 6:56 AM, Tomer B wrote:

> Hi
> 
> at the highest-traffic hours I get 54 memtable flushes; this happens for a few 
> hours during the day, and at the rest of the hours it ranges from 0 to 10.
> 
> Should I be doing anything about it? Is that number at a critical level? Can I 
> live with 54 memtable flushes per hour during peak hours? (I might expect 
> higher peaks coming during this year).
> 
> (The rest of my memtables with lower traffic range at about 1-4 memtable 
> flushes per hour).
> 
> thanks



Re: Existing column(s) not readable

2011-10-10 Thread aaron morton
What error are you seeing  in the server logs ? Are the columns unreadable at 
all Consistency Levels ? i.e. are the columns unreadable on all nodes.

What is the upgrade history of the cluster ? What version did it start at ? 

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10/10/2011, at 7:42 AM, Thomas Richter wrote:

> Hi,
> 
> here is some further information. Compaction did not help, but data is
> still there when I dump the row with sstable2json.
> 
> Best,
> 
> Thomas
> 
> On 10/08/2011 11:30 PM, Thomas Richter wrote:
>> Hi,
>> 
>> we are running a 3 node cassandra (0.7.6-2) cluster and some of our
>> column families contain quite large rows (400k+ columns, 4-6GB row size).
>> Replication factor is 3 for all keyspaces. The cluster is running fine
>> for several months now and we never experienced any serious trouble.
>> 
>> Some days ago we noticed that some previously written columns could not
>> be read. This does not always happen, and only some dozen columns out of
>> 400k are affected.
>> 
>> After ruling out application logic as a cause I dumped the row in
>> question with sstable2json and the columns are there (and are not marked
>> for deletion).
>> 
>> Next thing was setting up a fresh single node cluster and copying the
>> column family data to that node. Columns could not be read either.
>> Right now I'm running a nodetool compact for the cf to see if data could
>> be read afterwards.
>> 
>> Is there any explanation for such behavior? Are there any suggestions
>> for further investigation?
>> 
>> TIA,
>> 
>> Thomas
> 
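
For reference, the kind of per-row dump Thomas describes can be produced with 
the 0.7 tools roughly like this (the sstable path and the hex-encoded row key 
are placeholders, not values from this thread):

    bin/sstable2json /var/lib/cassandra/data/MyKeyspace/MyCF-f-1-Data.db \
      -k 726f776b6579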



Re: ebs or ephemeral

2011-10-10 Thread Sasha Dolgy
just catching the tail end of this discussion.  aaron, in your previous
email, you said "And an explanation of why we normally avoid ephemeral. "
 shouldn't this be, avoiding EBS?  EBS was a nightmare for us in terms
of performance.

On Mon, Oct 10, 2011 at 9:23 AM, aaron morton wrote:

> 6 nodes and RF3 will mean you can handle between 1 and 2 failed nodes.
>
> see http://thelastpickle.com/2011/06/13/Down-For-Me/
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7/10/2011, at 9:37 PM, Madalina Matei wrote:
>
> Hi Aaron,
>
> For a 6 nodes cluster, what RF can we use in order to support 2 failed
> nodes?
> From the article that you sent i understood "avoid EMS" and use ephemeral.
> am i missing anything?
>
> Thank you so much for your help,
> Madaina
> On Fri, Oct 7, 2011 at 9:15 AM, aaron morton wrote:
>
>> DataStax have pre-built AMIs here
>> http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami
>>
>>
>> And an explanation of why we normally avoid ephemeral.
>>
>> Also, I would go with 6 nodes. You will then be able to handle up to 2
>> failed nodes.
>>
>> Hope that helps.
>>
>>


Re: ebs or ephemeral

2011-10-10 Thread aaron morton
yes, should have been 

And an explanation of why we normally avoid *EBS*. 

My bad. 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10/10/2011, at 9:03 PM, Sasha Dolgy wrote:

> just catching the tail end of this discussion.  aaron, in your previous 
> email, you said "And an explanation of why we normally avoid ephemeral. " 
>  shouldn't this be, avoiding EBS?  EBS was a nightmare for us in terms of 
> performance.  
> 
> On Mon, Oct 10, 2011 at 9:23 AM, aaron morton  wrote:
> 6 nodes and RF3 will mean you can handle between 1 and 2 failed nodes. 
> 
> see http://thelastpickle.com/2011/06/13/Down-For-Me/
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 7/10/2011, at 9:37 PM, Madalina Matei wrote:
> 
>> Hi Aaron,
>>  
>> For a 6 nodes cluster, what RF can we use in order to support 2 failed nodes?
>> From the article that you sent i understood "avoid EMS" and use ephemeral. 
>> am i missing anything?
>>  
>> Thank you so much for your help,
>> Madaina
>> On Fri, Oct 7, 2011 at 9:15 AM, aaron morton  wrote:
>> DataStax have pre-built AMIs here 
>> http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami
>>  
>> 
>> And an explanation of why we normally avoid ephemeral. 
>> 
>> Also, I would go with 6 nodes. You will then be able to handle up to 2 
>> failed nodes. 
>> 
>> Hope that helps. 
>> 
> 



Re: ebs or ephemeral

2011-10-10 Thread Yi Yang
Agreed, EBS is not so good for Cassandra, and in previous conversations on this 
mailing list people have tended to prefer ephemeral storage.

Sent from my BlackBerry® wireless device

-Original Message-
From: Sasha Dolgy 
Date: Mon, 10 Oct 2011 10:03:26 
To: 
Reply-To: user@cassandra.apache.org
Subject: Re: ebs or ephemeral

just catching the tail end of this discussion.  aaron, in your previous
email, you said "And an explanation of why we normally avoid ephemeral. "
 shouldn't this be, avoiding EBS?  EBS was a nightmare for us in terms
of performance.

On Mon, Oct 10, 2011 at 9:23 AM, aaron morton wrote:

> 6 nodes and RF3 will mean you can handle between 1 and 2 failed nodes.
>
> see http://thelastpickle.com/2011/06/13/Down-For-Me/
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7/10/2011, at 9:37 PM, Madalina Matei wrote:
>
> Hi Aaron,
>
> For a 6 nodes cluster, what RF can we use in order to support 2 failed
> nodes?
> From the article that you sent i understood "avoid EMS" and use ephemeral.
> am i missing anything?
>
> Thank you so much for your help,
> Madaina
> On Fri, Oct 7, 2011 at 9:15 AM, aaron morton wrote:
>
>> DataStax have pre-built AMIs here
>> http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami
>>
>>
>> And an explanation of why we normally avoid ephemeral.
>>
>> Also, I would go with 6 nodes. You will then be able to handle up to 2
>> failed nodes.
>>
>> Hope that helps.
>>
>>



"Insufficient space" on 1.0.0-rc2 when compacting compressed CFs

2011-10-10 Thread Günter Ladwig
Hi,

I couldn't find anything on this issue, but maybe my google-fu is weak.

I'm running a Cassandra 1.0.0-rc2 cluster with compression enabled for both of 
the CFs I have right now. The load on a single node is about 32GB (disk is 
80GB per node). 

Whenever I try to run a compaction using nodetool on one of the CFs, I get the 
message "insufficient space to compact all requested files" in the log (it goes 
on to compact some SSTables, but not all). As not even half of the disk is 
used, compaction should be possible, right? Or does Cassandra use the 
uncompressed size to check whether there is enough space or not? I estimate 
that the data is compressed by a factor of about 3x.
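
A back-of-the-envelope check with the numbers above, assuming the free-space 
test really is done against uncompressed sizes:

    LOAD_GB=32; RATIO=3; DISK_GB=80
    echo "uncompressed estimate: $(( LOAD_GB * RATIO )) GB"
    echo "free disk:             $(( DISK_GB - LOAD_GB )) GB"

An estimate of 96 GB against 48 GB of free disk would explain the refusal.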

Cheers,
Günter

--  
Dipl.-Inform. Günter Ladwig

Karlsruhe Institute of Technology (KIT)
Institute AIFB

Englerstraße 11 (Building 11.40, Room 250)
76131 Karlsruhe, Germany
Phone: +49 721 608-47946
Email: guenter.lad...@kit.edu
Web: www.aifb.kit.edu

KIT – University of the State of Baden-Württemberg and National Large-scale 
Research Center of the Helmholtz Association





Re: Existing column(s) not readable

2011-10-10 Thread Thomas Richter
Hi,

no errors in the server logs. The columns are unreadable on all nodes at
any consistency level (ONE, QUORUM, ALL). We started with 0.7.3 and
upgraded to 0.7.6-2 two months ago.

Best,

Thomas

On 10/10/2011 10:03 AM, aaron morton wrote:
> What error are you seeing  in the server logs ? Are the columns unreadable at 
> all Consistency Levels ? i.e. are the columns unreadable on all nodes.
> 
> What is the upgrade history of the cluster ? What version did it start at ? 
> 
> Cheers
> 
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 10/10/2011, at 7:42 AM, Thomas Richter wrote:
> 
>> Hi,
>>
>> here is some further information. Compaction did not help, but data is
>> still there when I dump the row with sstable2json.
>>
>> Best,
>>
>> Thomas
>>
>> On 10/08/2011 11:30 PM, Thomas Richter wrote:
>>> Hi,
>>>
>>> we are running a 3 node cassandra (0.7.6-2) cluster and some of our
>>> column families contain quite large rows (400k+ columns, 4-6GB row size).
>>> Replication factor is 3 for all keyspaces. The cluster is running fine
>>> for several months now and we never experienced any serious trouble.
>>>
>>> Some days ago we noticed that some previously written columns could not
>>> be read. This does not always happen, and only some dozen columns out of
>>> 400k are affected.
>>>
>>> After ruling out application logic as a cause I dumped the row in
>>> question with sstable2json and the columns are there (and are not marked
>>> for deletion).
>>>
>>> Next thing was setting up a fresh single node cluster and copying the
>>> column family data to that node. Columns could not be read either.
>>> Right now I'm running a nodetool compact for the cf to see if data could
>>> be read afterwards.
>>>
>>> Is there any explanation for such behavior? Are there any suggestions
>>> for further investigation?
>>>
>>> TIA,
>>>
>>> Thomas
>>
> 



Re: "Insufficient space" on 1.0.0-rc2 when compacting compressed CFs

2011-10-10 Thread Sylvain Lebresne
On Mon, Oct 10, 2011 at 10:08 AM, Günter Ladwig  wrote:
> Hi,
>
> I couldn't find anything on this issue, but maybe my google-fu is weak.
>
> I'm running a Cassandra 1.0.0-rc2 cluster with compression enabled for all of 
> the two CFs I have right now. The load on a single node is about 32GB (disk 
> is 80GB per node).
>
> Whenever I try to run a compaction using nodetool on one of the CFs, I get 
> the message "insufficient space to compact all requested files" in the log 
> (it goes on to compact some SSTables, but not all). As not even half of the
> disk is used, compaction should be possible, right? Or does Cassandra
> use the uncompressed size to check whether there is enough space or
> not? I estimate that the data is compressed by a factor of about 3x.

We do use the uncompressed size to check if there is enough room to
compact, which is a bug. I've created
https://issues.apache.org/jira/browse/CASSANDRA-3338 to fix it.
Thanks for the report.

--
Sylvain

>
> Cheers,
> Günter
>
> --
> Dipl.-Inform. Günter Ladwig
>
> Karlsruhe Institute of Technology (KIT)
> Institute AIFB
>
> Englerstraße 11 (Building 11.40, Room 250)
> 76131 Karlsruhe, Germany
> Phone: +49 721 608-47946
> Email: guenter.lad...@kit.edu
> Web: www.aifb.kit.edu
>
> KIT – University of the State of Baden-Württemberg and National Large-scale 
> Research Center of the Helmholtz Association
>
>


[RELEASE] Apache Cassandra 0.8.7 released

2011-10-10 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 0.8.7.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1]. Please pay attention to the
release notes[2] before upgrading and let us know[3] if you were to encounter
any problem.

Have fun!


[1]: http://goo.gl/8bCMG (CHANGES.txt)
[2]: http://goo.gl/nOkhy (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Volunteers needed - Wiki

2011-10-10 Thread aaron morton
Hi there, 
The devs have been very busy and Cassandra 1.0 is just around the 
corner and full of new features. To celebrate I'm trying to give the wiki some 
loving to make things a little more welcoming for new users.

To keep things manageable I'd like to focus on completeness and 
correctness for now, and worry about being super awesome later. For example the 
nodetool page is incomplete http://wiki.apache.org/cassandra/NodeTool , we do 
not have anything about CQL, and the config page is from 0.7 
http://wiki.apache.org/cassandra/StorageConfiguration

As a starting point I've created a draft home page 
http://wiki.apache.org/cassandra/FrontPage_draft_aaron/ . I also hope to use 
this as a planning tool where we can mark off what's in progress or has been 
completed. 

The guidelines I think we should follow are:
* ensure coverage of 1.0, a best effort for 0.8 and leave any content 
from previous versions. 
* where appropriate include examples from CQL and RPC as both are still 
supported. 

If you would like to contribute to this effort please let me know via 
the email list. It's a great way to contribute to the project and learn how 
Cassandra works, and I'll do my best to help with any questions you may have. 
Or if you have something you've already written that you feel may be of use let 
me know, and we'll see about linking to it.

Thanks. 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com



A good key for data distribution over nodes

2011-10-10 Thread Laurent Aufrechter
Hi,

I am planning to run tests on Cassandra with a few nodes. I want to create a 
column family where the key will be the date down to the second (like 
2011/10/10-16:07:53). Doing so, my keys will be very similar to each other. 
Is it OK to use such keys if I want my data to be evenly distributed across my 
nodes, or do I have to "do something"?

Thanks in advance.

L. Aufrechter

Re: A good key for data distribution over nodes

2011-10-10 Thread David McNelis
You should be ok, depending on the partitioner strategy you use.  The keys
end up stored as a hash (which is why, when you're setting up your nodes, you
can give them a specific token).  Whatever your key is will be used to
create an MD5 hash, and that hash will then determine what node your data will
live on.

So while your distribution won't necessarily be completely balanced, it
should at least be in the right ballpark.

To give you an idea of this in practice, we've got consecutive integer
values as our keys and we're using the random partitioner...we have VERY
close to the same number of keys on each of our nodes.  Then the bigger
question about balancing your load is how big each record is...if they are
consistent in size or vary widely, etc., as that is just as likely to impact
how balanced your loads are.
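
A quick hedged illustration of why near-identical keys still spread out (the 
random partitioner derives the row's token from an MD5 digest of the key):

    echo -n "2011/10/10-16:07:53" | md5sum
    echo -n "2011/10/10-16:07:54" | md5sum

The inputs differ by one second as strings, but the digests (and so the 
tokens) land far apart.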

On Mon, Oct 10, 2011 at 9:09 AM, Laurent Aufrechter <
laurent.aufrech...@yahoo.fr> wrote:

> Hi,
>
> I am planning to run tests on Cassandra with a few nodes. I want to create
> a column family where the key will be the date down to the second (like
> 2011/10/10-16:07:53). Doing so, my keys will be very similar to each
> other. Is it OK to use such keys if I want my data to be evenly distributed
> across my nodes, or do I have to "do something"?
>
> Thanks in advance.
>
> L. Aufrechter
>



-- 
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
o: 630.359.6395
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource.*


Re: 0.7.9 RejectedExecutionException

2011-10-10 Thread Ashley Martens
I have checked both the output file and the system log; neither contains
errors. I don't believe anything external is killing the process; I could be
wrong, but this node's setup is the same as all my other nodes (including
hardware), so it doesn't make much sense.


jsvc.exec -user cassandra -home /usr/lib/jvm/java-6-openjdk/jre/bin/../
-pidfile /var/run/cassandra.pid -errfile &1 -outfile
/var/log/cassandra/output.log -cp
/usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-0.7.8.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/avro-1.4.0-fixes.jar:/usr/share/cassandra/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-codec-1.2.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/concurrentlinkedhashmap-lru-1.1.jar:/usr/share/casandra/guava-r05.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/jetty-6.1.21.jar:/usr/share/cassandra/jetty-util-6.1.21.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json-simple-1.1.jar:/usr/share/cassandra/jug-2.0.0.jar:/usr/share/cassandra/libthrift-0.5.jar:/usr/share/cassandra/log4j-1.2.16.jar:/usr/share/cassandra/servlet-api-2.5-20081211.jar:/usr/share/cassandra/slf4j-api-1.6.1.jar:/usr/share/cassandra/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/snakeyaml-1.6.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
-Dlog4j.configuration=log4j-server.properties
-XX:HeapDumpPath=/var/lib/cassandra/java_1318260751.hprof
-XX:ErrorFile=/var/lib/casandra/hs_err_1318260751.log -ea
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms24196M -Xmx24196M
-Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote.port=8080
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
org.apache.cassandra.thrift.CassandraDaemon

I have munin monitoring of JMX so when I talk about heap max then I'm
referring to:

jmxObjectName java.lang:type=Memory
jmxAttributeName HeapMemoryUsage
jmxAttributeKey max

The other crazy thing is that the heap used is nowhere close to the heap max.

On Mon, Oct 10, 2011 at 12:40 AM, aaron morton wrote:

> Have you checked /var/log/cassandra/output.txt (the packaged install pipes
> std out/err to there) or the system logs ? If there are no errors in the
> logs it may well be something external killing it.
>
> With regard to memory usage, it's hard for people to help unless you
> provide some numbers. What do you mean by MAX heap ? Is this the max used
> heap size reported by JMX or the -Xmx setting passed to the server ?
>
>


factors on the effectiveness of bloom filter?

2011-10-10 Thread Yang
I noticed that 2 of my CFs are showing very different bloom filter
false ratios: one is close to 1.0;
the other one is only 0.3.

They have roughly the same sizes in SSTables and counts; the
difference is key construction:
the one with the 0.3 false ratio has a shorter key.

Assuming the key cannot be changed (or the only possibility to change
the key is simply to juggle the byte order),
is there any measure to increase the effectiveness of bloom filters?

thanks
Yang


Re: factors on the effectiveness of bloom filter?

2011-10-10 Thread Radim Kolar

On 10.10.2011 18:31, Yang wrote:

I noticed that 2 of my CFs are showing very different bloom filter
false ratios, one is close to 1.0;
the other one is only 0.3

Cassandra bloom filters are computed for a 1% false-positive ratio.

is there any measure to increase the effectiveness of bloom filters? 
thanks Yang

Try Hadoop HBase; you can configure it there.


MapReduce with two ethernet cards

2011-10-10 Thread Scott Fines
Hi all,

This may be a silly question, but I'm at a bit of a loss, and was hoping for 
some help.

I have a Cassandra cluster set up with two NICs--one for internal communication 
between cassandra machines (10.1.1.*), and one to respond to Thrift RPC 
(172.28.*.*).

I also have a Hadoop cluster set up, which, for unrelated reasons, has to 
remain separate from Cassandra, so I've written a little MapReduce job to copy 
data from Cassandra to Hadoop. However, when I try to run my job, I get

java.io.IOException: failed connecting to all endpoints 
10.1.1.24,10.1.1.17,10.1.1.16

which is puzzling to me. It seems like the MR is attempting to connect to the 
internal communication IPs instead of the external Thrift IPs. Since I set up a 
firewall to block external access to the internal IPs of Cassandra, this is 
obviously going to fail.

So my question is: why does Cassandra MR seem to be grabbing the listen_address 
instead of the Thrift one. Presuming it's not a funky configuration error or 
something on my part, is that strictly necessary? All told, I'd prefer if it 
was connecting to the Thrift IPs, but if it can't, should I open up port 7000 
or port 9160 between Hadoop and Cassandra?

Thanks for your help,

Scott




Re: Volunteers needed - Wiki

2011-10-10 Thread hani elabed
Hi Aaron,

I can help with the documentation... I grabbed tons of screenshots as I was
installing Cassandra source trunk (1.0.0-rc2?) on my Mac OS X Snow Leopard in
Eclipse Galileo and later Eclipse Indigo, and I will be installing it on Eclipse
for Ubuntu 10.04 soon. I took the screenshots after I noticed the missing
pics in here:

http://wiki.apache.org/cassandra/RunningCassandraInEclipse

so I did plan on helping with the update... I am glad you sent your email
though to get me going.

I am just not sure of the logistics, how to do it, and whether I need to be
granted some write access to the wiki. Please educate...

I can definitely help on the NodeTool and StorageConfiguration as soon as I
can grok them myself, or any other documentation.

Also, your draft front page and focusing on 1.0 first match my thinking.


Hani Elabed


On Mon, Oct 10, 2011 at 4:10 AM, aaron morton wrote:

> Hi there,
> The dev's have been very busy and Cassandra 1.0 is just around the corner
> and full of new features. To celebrate I'm trying to give the wiki some
> loving to make things a little more welcoming for new users.
>
> To keep things manageable I'd like to focus on completeness and correctness
> for now, and worry about being super awesome later. For example the nodetool
> page is incomplete http://wiki.apache.org/cassandra/NodeTool , we do not
> have anything about CQL and config page is from 0.7
> http://wiki.apache.org/cassandra/StorageConfiguration
>
> As a starting point I've created a draft home page
> http://wiki.apache.org/cassandra/FrontPage_draft_aaron/ . I also hope to
> use this as a planning tool where we can mark off what's in progress or has
> been completed.
>
> The guidelines I think we should follow are:
> * ensure coverage of 1.0, a best effort for 0.8 and leave any content from
> previous versions.
> * where appropriate include examples from CQL and RPC as both are still
> supported.
>
> If you would like to contribute to this effort please let me know via the
> email list. It's a great way to contribute to the project and learn how
> Cassandra works, and I'll do my best to help with any questions you may
> have. Or if you have something you've already written that you feel may be
> of use let me know, and we'll see about linking to it.
>
> Thanks.
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
>


Re: how to reduce disk read? (and bloom filter performance)

2011-10-10 Thread Mohit Anchlia
Does that mean you are not updating or deleting rows? Can you look
at JMX values of

BloomFilter* ?

I don't believe bloom filter false positive % value is configurable.
Someone else might be able to throw more light on this.

I believe if you want to keep disk seeks to 1 SSTable you will need to
compact more often.

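A hedged example of pulling the relevant numbers (keyspace/CF names are 
placeholders; the cfstats bloom filter lines are from 0.8-era output):

    # per-read SSTable histogram, like the one quoted below
    nodetool -h localhost cfhistograms MyKeyspace MyCF
    # bloom filter false-positive counts and ratio
    nodetool -h localhost cfstats | grep Bloom
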
On Sun, Oct 9, 2011 at 7:09 AM, Radim Kolar  wrote:
> Dne 7.10.2011 23:16, Mohit Anchlia napsal(a):
>>
>> You'll see output like:
>>
>> Offset      SSTables
>> 1                  8021
>> 2                  783
>>
>> Which means 783 read operations accessed 2 SSTables
>
> thank you for explaining it to me. I see this:
>
> Offset      SSTables
> 1              59323
> 2                857
> 3                  56
>
> it means bloom filter failure ratio over 1%. Cassandra in unit tests expects
> bloom filter false positive less than 1.05%. HBase has configurable bloom
> filters. You can choose 1% or 0.5% - it can make difference for large cache.
>
> But result is that my poor read performance should not be caused by bloom
> filters.
>


Re: Volunteers needed - Wiki

2011-10-10 Thread Brandon Williams
On Mon, Oct 10, 2011 at 11:51 AM, hani elabed  wrote:
> Hi Aaron,
> I can help with the documentation... I grabbed tons of screenshots as I was
> installing Cassandra source trunk(1.0.0.rc2?) on my Mac OS X Snow leopard on
> Eclipse Galileo and later Eclipse Indigo, I will be installing it on Eclipse
> for Ubuntu 10.04 soon. I took the sceenshots after I noticed the missing
> picts in here:
> http://wiki.apache.org/cassandra/RunningCassandraInEclipse

Unfortunately, the ASF no longer allows attachments on the wiki.

-Brandon


Re: MapReduce with two ethernet cards

2011-10-10 Thread Brandon Williams
On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines  wrote:
> Hi all,
> This may be a silly question, but I'm at a bit of a loss, and was hoping for
> some help.
> I have a Cassandra cluster set up with two NICs--one for internal
> communication between cassandra machines (10.1.1.*), and one to respond to
> Thrift RPC (172.28.*.*).
> I also have a Hadoop cluster set up, which, for unrelated reasons, has to
> remain separate from Cassandra, so I've written a little MapReduce job to
> copy data from Cassandra to Hadoop. However, when I try to run my job, I
> get
> java.io.IOException: failed connecting to all endpoints
> 10.1.1.24,10.1.1.17,10.1.1.16
> which is puzzling to me. It seems like the MR is attempting to connect to
> the internal communication IPs instead of the external Thrift IPs. Since I
> set up a firewall to block external access to the internal IPs of Cassandra,
> this is obviously going to fail.
> So my question is: why does Cassandra MR seem to be grabbing the
> listen_address instead of the Thrift one. Presuming it's not a funky
> configuration error or something on my part, is that strictly necessary? All
> told, I'd prefer if it was connecting to the Thrift IPs, but if it can't,
> should I open up port 7000 or port 9160 between Hadoop and Cassandra?
> Thanks for your help,
> Scott

Your cassandra is old, upgrade to the latest version.

-Brandon


AUTO: Manoj Chaudhary is out of the office (returning 10/14/2011)

2011-10-10 Thread Manoj Chaudhary


I am out of the office until 10/14/2011.

I am attending a conference in Europe and meeting customers and partners
from 10/10/2011 to 10/15/2011.

There might be a delay in responding to emails. I will try to respond to
email periodically between meetings and some evenings in the local time
zone.



Note: This is an automated response to your message  "A good key for data
distribution over nodes" sent on 10/10/11 8:09:31.

This is the only notification you will receive while this person is away.

Re: Existing column(s) not readable

2011-10-10 Thread aaron morton
How are they unreadable ? You need to go into some details about what is going 
wrong. 

What sort of read ? 
What client ? 
What is in the logging on client and server side ? 


Try turning the logging up to DEBUG on the server to watch what happens. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10/10/2011, at 9:23 PM, Thomas Richter wrote:

> Hi,
> 
> no errors in the server logs. The columns are unreadable on all nodes at
> any consistency level (ONE, QUORUM, ALL). We started with 0.7.3 and
> upgraded to 0.7.6-2 two months ago.
> 
> Best,
> 
> Thomas
> 
> On 10/10/2011 10:03 AM, aaron morton wrote:
>> What error are you seeing  in the server logs ? Are the columns unreadable 
>> at all Consistency Levels ? i.e. are the columns unreadable on all nodes.
>> 
>> What is the upgrade history of the cluster ? What version did it start at ? 
>> 
>> Cheers
>> 
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 10/10/2011, at 7:42 AM, Thomas Richter wrote:
>> 
>>> Hi,
>>> 
>>> here is some further information. Compaction did not help, but data is
>>> still there when I dump the row with sstable2json.
>>> 
>>> Best,
>>> 
>>> Thomas
>>> 
>>> On 10/08/2011 11:30 PM, Thomas Richter wrote:
>>>> Hi,
>>>> 
>>>> we are running a 3 node cassandra (0.7.6-2) cluster and some of our
>>>> column families contain quite large rows (400k+ columns, 4-6GB row size).
>>>> Replication factor is 3 for all keyspaces. The cluster is running fine
>>>> for several months now and we never experienced any serious trouble.
>>>> 
>>>> Some days ago we noticed that some previously written columns could not
>>>> be read. This does not always happen, and only some dozen columns out of
>>>> 400k are affected.
>>>> 
>>>> After ruling out application logic as a cause I dumped the row in
>>>> question with sstable2json and the columns are there (and are not marked
>>>> for deletion).
>>>> 
>>>> Next thing was setting up a fresh single node cluster and copying the
>>>> column family data to that node. Columns could not be read either.
>>>> Right now I'm running a nodetool compact for the cf to see if data could
>>>> be read afterwards.
>>>> 
>>>> Is there any explanation for such behavior? Are there any suggestions
>>>> for further investigation?
>>>> 
>>>> TIA,
>>>> 
>>>> Thomas
>>> 
>> 
> 



Re: 0.7.9 RejectedExecutionException

2011-10-10 Thread aaron morton
The service keeps dying at the same time every day and there is nothing in the 
app logs, it's going to be something external.

Sorry but I'm not sure what the problem with the memory usage is. Is the server 
running out of memory, or is it experiencing a lot of GC ? 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11/10/2011, at 5:00 AM, Ashley Martens wrote:

> I have checked both the output file and the system log; neither contains 
> errors. I don't believe anything external is killing the process; I could be 
> wrong, but this node's setup is the same as all my other nodes (including 
> hardware), so it doesn't make much sense.
> 
> 
> jsvc.exec -user cassandra -home /usr/lib/jvm/java-6-openjdk/jre/bin/../ 
> -pidfile /var/run/cassandra.pid -errfile &1 -outfile 
> /var/log/cassandra/output.log -cp 
> /usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-0.7.8.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/avro-1.4.0-fixes.jar:/usr/share/cassandra/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-codec-1.2.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/concurrentlinkedhashmap-lru-1.1.jar:/usr/share/casandra/guava-r05.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/jetty-6.1.21.jar:/usr/share/cassandra/jetty-util-6.1.21.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json-simple-1.1.jar:/usr/share/cassandra/jug-2.0.0.jar:/usr/share/cassandra/libthrift-0.5.jar:/usr/share/cassandra/log4j-1.2.16.jar:/usr/share/cassandra/servlet-api-2.5-20081211.jar:/usr/share/cassandra/slf4j-api-1.6.1.jar:/usr/share/cassandra/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/snakeyaml-1.6.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
>  -Dlog4j.configuration=log4j-server.properties 
> -XX:HeapDumpPath=/var/lib/cassandra/java_1318260751.hprof 
> -XX:ErrorFile=/var/lib/casandra/hs_err_1318260751.log -ea 
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms24196M -Xmx24196M 
> -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC 
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
> -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true 
> -Dcom.sun.management.jmxremote.port=8080 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.authenticate=false 
> org.apache.cassandra.thrift.CassandraDaemon
> 
> I have munin monitoring of JMX so when I talk about heap max then I'm 
> referring to:
> 
> jmxObjectName java.lang:type=Memory
> jmxAttributeName HeapMemoryUsage
> jmxAttributeKey max
> 
> The other crazy thing is that the heap used is nowhere close to the heap max.
> 
> On Mon, Oct 10, 2011 at 12:40 AM, aaron morton  
> wrote:
> Have you checked /var/log/cassandra/output.txt (the packaged install pipes 
> std out/err to there) or the system logs ? If there are no errors in the logs 
> it may well be something external killing it.
> 
> With regard to memory usage, it's hard for people to help unless you provide 
> some numbers. What do you mean by MAX heap ? Is this the max used heap size 
> reported by JMX or the -Xmx setting passed to the server ?
> 



Re: 0.7.9 RejectedExecutionException

2011-10-10 Thread Ashley Martens
It is actually not at the exact same time of day. It varies, but happens
within certain blocks of time, like between 00hr and 02hr. The node could be up
for hours, or it could crash again in 15 minutes. The memory is fine, just
using a larger footprint than 0.6 in all ways.

On Mon, Oct 10, 2011 at 1:18 PM, aaron morton wrote:

> The service keeps dying at the same time every day and there is nothing in
> the app logs, it's going to be something external.
>
> Sorry but I'm not sure what the problem with the memory usage is. Is the
> server running out of memory, or is it experiencing a lot of GC ?
>
>


cassandra on laptop

2011-10-10 Thread Gary Jefferson
I'm running an underpowered laptop (ubuntu) for development work. Installing 
Cassandra was easy, and getting the twissandra example app up and working was 
also easy.

Here's the problem: after about a day of letting it run (with no load generated 
to the webapp or db), my laptop now becomes unresponsive. If I'm patient, I can 
shut down the cassandra service and return things to normal. In each of these 
cases, the cassandra process is eating up almost all memory, and everything 
goes to swap.

I can't develop against Cassandra in this environment. I know it isn't set up 
by default to work efficiently on a meager laptop, but are there some common 
settings somewhere that I can just tweak to make life not be so miserable? I 
just want to play with it and try it out for this project I'm working on, but 
that's impractical with default settings. I'm going to have to flee to mongodb 
or something not as good...

I'm also a little nervous about this running on a server now -- I've read 
enough to understand that by default it's set up to eat lots of memory, and I'm 
fine with that... but it just lends itself to all the java bigotry that some of 
us accumulate over the years.

Anyway, if someone can give me a pointer on how to set up to run on a laptop in 
a development setting, big thanks.

Thanks!
Gary


seeking contractor to assist with upgrade/expansion

2011-10-10 Thread Scott Dworkis

hope this is not off topic?

we've been struggling following ostensible procedures for a while now, and are 
ready to pony up for some pro help (but not quite ready to pony up for 
datastax).  please contact me at svd at mylife dot com if you are 
interested.


-scott


Re: Volunteers needed - Wiki

2011-10-10 Thread aaron morton
Thanks, Hani. 
If you would like to update the storage config page that would be 
handy. Just update http://wiki.apache.org/cassandra/FrontPage_draft_aaron/  to 
say you are working on it. Just click the login link at the top to setup an 
account.

wrt setting up eclipse, perhaps you could post your instructions on a 
blog somewhere and we can link to it. 

cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11/10/2011, at 5:51 AM, hani elabed wrote:

> Hi Aaron,
> 
> I can help with the documentation... I grabbed tons of screenshots as I was 
> installing Cassandra source trunk(1.0.0.rc2?) on my Mac OS X Snow leopard on 
> Eclipse Galileo and later Eclipse Indigo, I will be installing it on Eclipse 
> for Ubuntu 10.04 soon. I took the sceenshots after I noticed the missing 
> picts in here: 
> 
> http://wiki.apache.org/cassandra/RunningCassandraInEclipse
> 
> so I did plan on helping with the update... I am glad you sent your email 
> though to get me going.
> 
> I am just not sure of the logistics, how to do it, and if I needed to be 
> granted some write access to the wiki. Please educate...
> 
> I can definitely help on the NodeTool and StorageConfiguration as soon as I 
> can grok them myself, or any other documentation.
> 
> Also you draft front page and focusing first on 1.0 first match my thinking.  
> 
> Hani Elabed
> 
> 
> On Mon, Oct 10, 2011 at 4:10 AM, aaron morton  wrote:
> Hi there, 
>   The dev's have been very busy and Cassandra 1.0 is just around the 
> corner and full of new features. To celebrate I'm trying to give the wiki 
> some loving to make things a little more welcoming for new users.
> 
>   To keep things manageable I'd like to focus on completeness and 
> correctness for now, and worry about being super awesome later. For example 
> the nodetool page is incomplete http://wiki.apache.org/cassandra/NodeTool , 
> we do not have anything about CQL and config page is from 0.7 
> http://wiki.apache.org/cassandra/StorageConfiguration
> 
>   As a starting point I've created a draft home page 
> http://wiki.apache.org/cassandra/FrontPage_draft_aaron/ . I also hope to use 
> this as a planning tool where we can mark off what's in progress or has been 
> completed. 
> 
>   The guidelines I think we should follow are:
>   * ensure coverage of 1.0, a best effort for 0.8 and leave any content 
> from previous versions. 
>   * where appropriate include examples from CQL and RPC as both are still 
> supported. 
> 
>   If you would like to contribute to this effort please let me know via 
> the email list. It's a great way to contribute to the project and learn how 
> Cassandra works, and I'll do my best to help with any questions you may have. 
> Or if you have something you've already written that you feel may be of use 
> let me know, and we'll see about linking to it.
> 
> Thanks. 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> 



Re: Existing column(s) not readable

2011-10-10 Thread Thomas Richter
Hi Aaron,

normally we use hector to access cassandra, but for debugging I switched
to cassandra-cli.

The column cannot be read by a simple
get CFName['rowkey']['colname'];

The response is "Value was not found".
If I query another column, everything is just fine.

Serverlog for unsuccessful read (keyspace and CF names replaced):

DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,739 CassandraServer.java
(line 280) get

DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,744 StorageProxy.java (line
320) Command/ConsistencyLevel is
SliceByNamesReadCommand(table='Keyspace',
key=61636162626139322d396638312d343562382d396637352d393162303337383030393762,
columnParent='QueryPath(columnFamilyName='ColumnFamily',
superColumnName='null', columnName='null')',
columns=[574c303030375030,])/ONE

DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,750 ReadCallback.java (line
86) Blockfor/repair is 1/true; setting up requests to localhost/127.0.0.1

DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,750 StorageProxy.java (line
343) reading data locally

DEBUG [ReadStage:33] 2011-10-10 23:15:29,751 StorageProxy.java (line
448) LocalReadRunnable reading SliceByNamesReadCommand(table='Keyspace',
key=61636162626139322d396638312d343562382d396637352d393162303337383030393762,
columnParent='QueryPath(columnFamilyName='ColumnFamily',
superColumnName='null', columnName='null')', columns=[574c303030375030,])

DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,818 StorageProxy.java (line
393) Read: 67 ms.

Log looks fine to me, but no result is returned.

Best,

Thomas

On 10/10/2011 10:00 PM, aaron morton wrote:
> How are they unreadable ? You need to go into some details about what is 
> going wrong. 
> 
> What sort of read ? 
> What client ? 
> What is in the logging on client and server side ? 
> 
> 
> Try turning the logging up to DEBUG on the server to watch what happens. 
> 
> Cheers
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 10/10/2011, at 9:23 PM, Thomas Richter wrote:
> 
>> Hi,
>>
>> no errors in the server logs. The columns are unreadable on all nodes at
>> any consistency level (ONE, QUORUM, ALL). We started with 0.7.3 and
>> upgraded to 0.7.6-2 two months ago.
>>
>> Best,
>>
>> Thomas
>>
>> On 10/10/2011 10:03 AM, aaron morton wrote:
>>> What error are you seeing  in the server logs ? Are the columns unreadable 
>>> at all Consistency Levels ? i.e. are the columns unreadable on all nodes.
>>>
>>> What is the upgrade history of the cluster ? What version did it start at ? 
>>>
>>> Cheers
>>>
>>>
>>> -
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 10/10/2011, at 7:42 AM, Thomas Richter wrote:
>>>
>>>> Hi,
>>>> 
>>>> here is some further information. Compaction did not help, but data is
>>>> still there when I dump the row with sstable2json.
>>>> 
>>>> Best,
>>>> 
>>>> Thomas
>>>> 
>>>> On 10/08/2011 11:30 PM, Thomas Richter wrote:
>>>>> Hi,
>>>>>
>>>>> we are running a 3 node cassandra (0.7.6-2) cluster and some of our
>>>>> column families contain quite large rows (400k+ columns, 4-6GB row size).
>>>>> Replication factor is 3 for all keyspaces. The cluster is running fine
>>>>> for several months now and we never experienced any serious trouble.
>>>>>
>>>>> Some days ago we noticed that some previously written columns could not
>>>>> be read. This does not always happen, and only some dozen columns out of
>>>>> 400k are affected.
>>>>>
>>>>> After ruling out application logic as a cause I dumped the row in
>>>>> question with sstable2json and the columns are there (and are not marked
>>>>> for deletion).
>>>>>
>>>>> Next thing was setting up a fresh single node cluster and copying the
>>>>> column family data to that node. Columns could not be read either.
>>>>> Right now I'm running a nodetool compact for the cf to see if data could
>>>>> be read afterwards.
>>>>>
>>>>> Is there any explanation for such behavior? Are there any suggestions
>>>>> for further investigation?
>>>>>
>>>>> TIA,
>>>>>
>>>>> Thomas
>>>>
>>>
>>
> 



Efficiency of hector's setRowCount

2011-10-10 Thread Don Smith
Hector's IndexedSlicesQuery has a setRowCount method that you can use to 
page through the results, as described in 
https://github.com/rantav/hector/wiki/User-Guide .


 rangeSlicesQuery.setRowCount(1001);
  .
 rangeSlicesQuery.setKeys(lastRow.getKey(),  "");

Is it efficient?  Specifically, suppose my query returns 100,000 results 
and I page through batches of 1000 at a time (making 100 executions of the 
query). Will it internally retrieve all the results each time (but pass 
only the desired set of 1000 or so to me)? Or will it optimize queries 
to avoid the duplication?  I presume the latter. :)


Can IndexedSlicesQuery's setStartKey method be used for the same effect?

   Thanks,  Don


Re: cassandra on laptop

2011-10-10 Thread Peter Sanford
By default, Cassandra is configured to use half the ram of your
system. That's way overkill for playing around with it on a laptop.
Edit /etc/cassandra/cassandra-env.sh and set max_heap_size_in_mb to
something more suited for your environment.

I have it set to 256M for my laptop (with 4G of ram). This works just
fine for light development tasks and for running our test suite.
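
A minimal sketch of the relevant lines (variable names as in the stock 
cassandra-env.sh; the values are just what suits a small laptop):

    MAX_HEAP_SIZE="256M"
    HEAP_NEWSIZE="64M"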

-psanford

On Mon, Oct 10, 2011 at 1:44 PM, Gary Jefferson
 wrote:
> I'm running an underpowered laptop (ubuntu) for development work. Installing 
> Cassandra was easy, and getting the twissandra example app up and working was 
> also easy.
>
> Here's the problem: after about a day of letting it run (with no load 
> generated to webapp or db), my laptop now becomes unresponsive. If I'm 
> patient, I can shutdown the cassandra service and return things to normal. In 
> each of these cases, the cassandra process is eating up almost all memory, 
> and everything goes to swap.
>
> I can't develop against Cassandra in this environment. I know it isn't set up 
> by default to work efficiently on a meager laptop, but are there some common 
> setting somewhere that I can just tweak to make life not be so miserable? I 
> just want to play with it and try it out for this project I'm working on, but 
> that's impractical with default settings. I'm going to have to flee to 
> mongodb or something not as good...
>
> I'm also a little nervous about this running on a server now -- I've read 
> enough to understand that by default it's set up to eat lots of memory, and 
> I'm fine with that... but it just lends itself to all the java bigotry that 
> some of us accumulate over the years.
>
> Anyway, if someone can give me a pointer on how to set up to run on a laptop 
> in a development setting, big thanks.
>
> Thanks!
> Gary
>


Re: anyway to throttle nodetool repair?

2011-10-10 Thread Yan Chunlu
So how about disk I/O? Is there any way to use ionice to control it?

I have tried to adjust the priority with "ionice -c3 -p [cassandra pid]",
but it seems not to work...
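
(A note on ionice: the -c3 idle class is only honored by the CFQ I/O 
scheduler, so it does nothing if the disks use a different elevator.)

For the throttles Peter mentions below, a hedged sketch of the cassandra.yaml 
settings (names per the 0.8/1.0 defaults; the values are only examples):

    compaction_throughput_mb_per_sec: 8
    # 1.0 only, per CASSANDRA-3080:
    stream_throughput_outbound_megabits_per_sec: 200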

On Wed, Sep 28, 2011 at 12:02 AM, Peter Schuller <
peter.schul...@infidyne.com> wrote:

> > I saw the ticket about compaction throttling, just wonder is that
> necessary
> > to add an option or is there anyway to do repair throttling?
> > every time I run nodetool repair, it uses all disk io and the server load
> > goes up quickly, just wonder is there anyway to make it smoother.
>
> The validating compaction that is part of repair is subject to
> compaction throttling.
>
> The streaming of sstables afterwards is not however. In 1.0 there is
> thottling of streaming:
> https://issues.apache.org/jira/browse/CASSANDRA-3080
>
> --
> / Peter Schuller (@scode on twitter)
>


Re: anyway to throttle nodetool repair?

2011-10-10 Thread Yan Chunlu
I am using commodity hardware, so even a minor compaction makes disk I/O go to
100% and the server load gets very high.

On Tue, Oct 11, 2011 at 11:19 AM, Yan Chunlu  wrote:

> so how about disk io?  is there anyway to use ionice to control it?
>
> I have tried to adjust the priority by "ionice -c3 -p [cassandra pid].
>  seems not working...
>
>
> On Wed, Sep 28, 2011 at 12:02 AM, Peter Schuller <
> peter.schul...@infidyne.com> wrote:
>
>> > I saw the ticket about compaction throttling, just wonder is that
>> necessary
>> > to add an option or is there anyway to do repair throttling?
>> > every time I run nodetool repair, it uses all disk io and the server
>> load
>> > goes up quickly, just wonder is there anyway to make it smoother.
>>
>> The validating compaction that is part of repair is subject to
>> compaction throttling.
>>
>> The streaming of sstables afterwards is not however. In 1.0 there is
>> thottling of streaming:
>> https://issues.apache.org/jira/browse/CASSANDRA-3080
>>
>> --
>> / Peter Schuller (@scode on twitter)
>>
>
>


Multi DC setup

2011-10-10 Thread Cassa L
I am trying to understand the multi-DC setup for Cassandra. As I understand it,
in this setup replicas exist in the same cluster ring, but physically the nodes
are distributed across DCs. Is this correct?
I have two different cluster rings in two DCs and want to replicate data
bidirectionally. They both have the same keyspace. They take data traffic from
different sources, but we want to make sure the data exists in both rings.
What could be the way to achieve this?

Thanks,
L.


Re: Multi DC setup

2011-10-10 Thread Milind Parikh
Why have two rings? Cassandra manages the replication for you... one ring
with physical nodes in two DCs might be a better option. Of course, depending
on the inter-DC failure characteristics, you might need to endure split-brain
for a while.
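
A hedged sketch of what the single spanning ring looks like at the keyspace 
level (cassandra-cli syntax circa 0.8; the DC names must match what your 
snitch reports):

    create keyspace MyKS
      with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options = [{DC1:2, DC2:2}];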

/***
sent from my android...please pardon occasional typos as I respond @ the
speed of thought
/

On Oct 10, 2011 10:09 PM, "Cassa L"  wrote:

I am trying to understand multi DC setup for cassandra. As I understand, in
this setup,  replicas exists in same cluster ring, but physically nodes are
distributed across DCs. Is this correct?
I have two different cluster rings in two DCs, and want to replicate data
bidirectionally. They both have same keyspace. They take  data traffic from
different sources, but we want to make sure, data exists in both the rings.
What could be the way to achieve this?

Thanks,
L.


Re: Volunteers needed - Wiki

2011-10-10 Thread Maki Watanabe
Hello aaron,
I raise my hand too.
If you have a to-do list for the wiki, please let us know.

maki


2011/10/10 aaron morton :
> Hi there,
> The dev's have been very busy and Cassandra 1.0 is just around the corner
> and full of new features. To celebrate I'm trying to give the wiki some
> loving to make things a little more welcoming for new users.
> To keep things manageable I'd like to focus on completeness and correctness
> for now, and worry about being super awesome later. For example the nodetool
> page is incomplete http://wiki.apache.org/cassandra/NodeTool , we do not
> have anything about CQL and config page is from
> 0.7 http://wiki.apache.org/cassandra/StorageConfiguration
> As a starting point I've created a draft home
> page http://wiki.apache.org/cassandra/FrontPage_draft_aaron/ . I also hope
> to use this as a planning tool where we can mark off what's in progress or
> has been completed.
> The guidelines I think we should follow are:
> * ensure coverage of 1.0, a best effort for 0.8 and leave any content from
> previous versions.
> * where appropriate include examples from CQL and RPC as both are still
> supported.
> If you would like to contribute to this effort please let me know via the
> email list. It's a great way to contribute to the project and learn how
> Cassandra works, and I'll do my best to help with any questions you may
> have. Or if you have something you've already written that you feel may be
> of use let me know, and we'll see about linking to it.
> Thanks.
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>



-- 
w3m


Re: Multi DC setup

2011-10-10 Thread Cassa L
We already have two separate rings. The idea of bidirectional sync is that if
one ring is down, we can still send the traffic to the other ring. When the
original cluster comes back, it will pick up the data from the available
cluster. I'm not sure if it makes sense to have separate rings or to combine
these two rings into one.



On Mon, Oct 10, 2011 at 10:17 PM, Milind Parikh wrote:

> Why have two rings? Cassandra manages the replication for you... one ring
> with physical nodes in two DCs might be a better option. Of course, depending
> on the inter-DC failure characteristics, you might need to endure split-brain
> for a while.
>
> /***
> sent from my android...please pardon occasional typos as I respond @ the
> speed of thought
> /
>
> On Oct 10, 2011 10:09 PM, "Cassa L"  wrote:
>
> I am trying to understand multi DC setup for cassandra. As I understand, in
> this setup,  replicas exists in same cluster ring, but physically nodes are
> distributed across DCs. Is this correct?
> I have two different cluster rings in two DCs, and want to replicate data
> bidirectionally. They both have same keyspace. They take  data traffic from
> different sources, but we want to make sure, data exists in both the rings.
> What could be the way to achieve this?
>
> Thanks,
> L.
>
>


Re: Volunteers needed - Wiki

2011-10-10 Thread Sasha Dolgy
maybe that should be the first wiki update ... the TODO

On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe wrote:

> Hello aaron,
> I raise my hand too.
> If you have to-do list about the wiki, please let us know.
>
> maki
>