encryption_options & 0.8

2011-04-25 Thread Sasha Dolgy
Is it possible to store an encrypted keystore_password and truststore_password in the cassandra.yaml? I see that the defaults allow cleartext which isn't suitable when negotiating with security specialists for sign-off of a solution... From: http://svn.apache.org/repos/asf/cassandra/trunk/conf/c

Re: data management / validation

2011-04-25 Thread Sasha Dolgy
I had also posted this last month: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-SSL-amp-Streaming-1567-td6196693.html "I only want to encrypt data from region 1 < -- > region 2 where a vpn is not possible... data communication in the same rack for example, is on a private n

Re: data management / validation

2011-04-25 Thread Sasha Dolgy
Cheers Aaron ... getNaturalEndpoints is perfect. I can pull a random key and analyze the results to check our business rules. Automated through monitoring ... Excellent. As for security, I had looked at the SSL option a month ago maybe ... For us, this would be a great feature, except, by imp

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Jonathan Ellis
You shouldn't be using beta1 unless your goal is to help shake the bugs out. :) 0.7.5 release is in progress; the artifacts are at http://people.apache.org/~slebresne/ On Tue, Apr 26, 2011 at 12:19 AM, Sanjeev Kulkarni wrote: > BTW where do i download 0.7.5? I went > to http://www.apache.org/dyn

Re: data management / validation

2011-04-25 Thread aaron morton
There is a JMX operation to get the endpoints for a token http://wiki.apache.org/cassandra/JmxInterface#org.apache.cassandra.service.StorageService.Operations.getNaturalEndpoints You can also specify a key when using bin/sstable2json if you want to grab the actual data from a file. If you were

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Sanjeev Kulkarni
BTW where do i download 0.7.5? I went to http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.7.5/apache-cassandra-0.7.5-bin.tar.gz but all the links there are broken. I was thinking if I just skip 0.7.5 and go with 0.8-beta1, would that be more advisable? Thanks! On Mon, Apr 25, 2011 at 9:30 PM,

Re: Manual Conflict Resolution in Cassandra

2011-04-25 Thread Narendra Sharma
>>>At t8 The request would not start as the CL level of nodes is not available, the write would not be written to node X. The client would get an UnavailableException. In response it should connect to a new coordinator and try again. [Naren] There may be (and most likely there will be) a window wh

Re: advice for EC2 deployment

2011-04-25 Thread aaron morton
For background see this article: http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers And this recent discussion http://www.mail-archive.com/user@cassandra.apache.org/msg12502.html Issues that may be a concern: - lots of cross AZ latency in us-east, e.g. LOCAL_QUORUM

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Jonathan Ellis
No. You'll need to run scrub. On Mon, Apr 25, 2011 at 11:19 PM, Sanjeev Kulkarni wrote: > Hi, > Thanks for pointing out the fix. My followup question is if I install 0.7.5 > will the problem go away with the current data? > Thanks! > On Mon, Apr 25, 2011 at 8:25 PM, Jonathan Ellis wrote: >> >> A

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Sanjeev Kulkarni
Hi, Thanks for pointing out the fix. My followup question is if I install 0.7.5 will the problem go away with the current data? Thanks! On Mon, Apr 25, 2011 at 8:25 PM, Jonathan Ellis wrote: > Ah... could be https://issues.apache.org/jira/browse/CASSANDRA-2349 > (fixed for 0.7.5) > > On Mon, Ap

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Jonathan Ellis
Ah... could be https://issues.apache.org/jira/browse/CASSANDRA-2349 (fixed for 0.7.5) On Mon, Apr 25, 2011 at 9:47 PM, Sanjeev Kulkarni wrote: > The only other interesting information is that the columns of these rows all > had some ttl attached to them. Not sure if that matters. > Thanks! > > O

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Sanjeev Kulkarni
The only other interesting information is that the columns of these rows all had some ttl attached to them. Not sure if that matters. Thanks! On Mon, Apr 25, 2011 at 5:27 PM, Terje Marthinussen wrote: > First column in the row has offset in the file of 190226525, last valid > column is at 380293

Re: Manual Conflict Resolution in Cassandra

2011-04-25 Thread David Strauss
On Mon, 2011-04-25 at 03:50 -0700, Milind Parikh wrote: > I suppose the term 'silently dropped' is a matter of perspective. C > makes an explicit automated choice of latest-timestamp-wins. In > certain situations, this is not the appropriate choice. I would still insist that using Cassandra and ex

RE: Apt repositories

2011-04-25 Thread David Strauss
On Mon, 2011-04-25 at 15:25 -0700, Gregory Szorc wrote: > If you don't want your APT-sourced packages to upgrade automatically, > I suggest pinning the package. I'm aware that I can pin the package, but it's still a workaround for the Cassandra Apt repository not being set up according to best pra

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Terje Marthinussen
First column in the row has offset in the file of 190226525, last valid column is at 380293592, about 181MB from first column to last. in_memory_compaction_limit was 128MB, so almost certainly above the limit. Terje On Tue, Apr 26, 2011 at 8:53 AM, Terje Marthinussen wrote: > In my case, proba
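
The arithmetic in the message above can be checked with a short sketch. The two offsets and the 128MB limit come from the message; the helper names are hypothetical, and treating "MB" as 1024 * 1024 bytes is an assumption.

```python
# Sketch only: helper names are hypothetical, and "MB" is assumed to mean MiB.
MIB = 1024 * 1024


def span_mib(first_offset: int, last_offset: int) -> float:
    """Distance between two byte offsets in a file, in MiB."""
    return (last_offset - first_offset) / MIB


def exceeds_limit(first_offset: int, last_offset: int, limit_mb: int) -> bool:
    """True if the span between the offsets is larger than the limit."""
    return span_mib(first_offset, last_offset) > limit_mb


# Offsets quoted in the message: first column and last valid column.
span = span_mib(190226525, 380293592)
print(f"span: {span:.1f} MiB")
print("above 128MB limit:", exceeds_limit(190226525, 380293592, 128))
```

Under these assumptions the span rounds to 181 MiB, which matches the "about 181MB" in the message and is above the 128MB limit it mentions.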

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Terje Marthinussen
In my case, probably yes. From the rows I have looked at, I think I have only seen this on rows with 1 million plus columns/supercolumns. May very well have been larger than in memory limit. I think the compacted row I looked closer at was about 200MB and the in memory limit may have been 256MB. I w

Inaugural Austin, TX Apache Cassandra Meetup this Wednesday

2011-04-25 Thread Nate McCall
I hope that everyone in the Central TX area Apache Cassandra community can make it. Details: http://www.meetup.com/Cassandra-Austin/events/17178322/

RE: Apt repositories

2011-04-25 Thread Gregory Szorc
If you don't want your APT-sourced packages to upgrade automatically, I suggest pinning the package. The apt_preferences(5) man page tells you how to do this. The gist is to add the following lines: Package: cassandra Pin: version 0.6.13 Pin-Priority: 1100 (setting the version to the one

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Sanjeev Kulkarni
I pepper my objects based on a hash so without reading the row I can't tell how big it is. Thanks! Sent from my iPhone On Apr 25, 2011, at 10:08 AM, Jonathan Ellis wrote: > Was it on a "large" row? (> in_memory_compaction_limit?) > > I'm starting to suspect that LazilyCompactedRow is computing

RE: OOM on heavy write load

2011-04-25 Thread Shu Zhang
the way I measure actual memtable row sizes is this write X rows into a cassandra node trigger GC record heap usage trigger compaction and GC record heap savings and divide by X for actual cassandra memtable row size in memory Similar process to measure per-key/per-row cache sizes for your data.
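
The measurement described above comes down to one division. A minimal sketch, assuming the two heap readings are taken by hand (for example from a JMX console) after each forced GC; the function name and the sample numbers are hypothetical and are not measurements from the thread.

```python
# Sketch only: the function name and sample figures are hypothetical.
def memtable_row_size(heap_before_bytes: int, heap_after_bytes: int,
                      rows_written: int) -> float:
    """Approximate in-memory size of one memtable row.

    heap_before_bytes: heap usage after writing the rows and forcing a GC.
    heap_after_bytes:  heap usage after the following compaction and GC step.
    rows_written:      the X rows written in the first step.
    """
    if rows_written <= 0:
        raise ValueError("rows_written must be positive")
    heap_savings = heap_before_bytes - heap_after_bytes
    return heap_savings / rows_written


# Made-up readings: 500 MB before, 100 MB after, 100,000 rows written.
print(memtable_row_size(500_000_000, 100_000_000, 100_000), "bytes per row")
```

The same division applies to the per-key and per-row cache measurements the message mentions, with the cache being filled and cleared instead of the memtable.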

RE: OOM on heavy write load

2011-04-25 Thread Shu Zhang
How large are your rows? binary_memtable_throughput_in_mb only tracks size of data, but there is an overhead associated with each row on the order of magnitude of a few KBs. If your row data sizes are really small then the overhead dominates the memory usage and binary_memtable_throughput_in_mb
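
To see why small rows make the per-row overhead dominate, here is a rough estimate. It is a sketch under stated assumptions: the 100-byte row size, the 2 KB overhead, and the 64 MB threshold are illustrative values rather than figures from the thread, and the function names are hypothetical.

```python
# Sketch only: all numbers below are illustrative assumptions.
KIB = 1024
MIB = 1024 * KIB


def rows_for_threshold(threshold_mb: int, data_bytes_per_row: int) -> int:
    """Rows that fit before the tracked data size reaches the threshold."""
    return (threshold_mb * MIB) // data_bytes_per_row


def estimated_heap_bytes(rows: int, data_bytes_per_row: int,
                         overhead_bytes_per_row: int) -> int:
    """Heap used when every row carries a fixed overhead on top of its data."""
    return rows * (data_bytes_per_row + overhead_bytes_per_row)


rows = rows_for_threshold(64, 100)               # tiny 100-byte rows
heap = estimated_heap_bytes(rows, 100, 2 * KIB)  # assume 2 KB overhead per row
print(f"{rows} rows tracked as 64 MB occupy roughly {heap / MIB:.0f} MiB")
```

With these assumed numbers the heap actually occupied is many times larger than the tracked data size, which is the effect the message describes.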

Re: 0.8 loosing nodes?

2011-04-25 Thread Jonathan Ellis
I bet the problem is with the other tasks on the executor that Gossip heartbeat runs on. I see at least two that could cause blocking: hint cleanup post-delivery and flush-expired-memtables, both of which call forceFlush which will block if the flush queue + threads are full. We've run into this

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Jonathan Ellis
Was it on a "large" row? (> in_memory_compaction_limit?) I'm starting to suspect that LazilyCompactedRow is computing row size incorrectly in some cases. On Mon, Apr 25, 2011 at 11:47 AM, Terje Marthinussen wrote: > I have been hunting similar looking corruptions, especially in the hints > colu

Re: 0.8 loosing nodes?

2011-04-25 Thread Terje Marthinussen
Got just enough time to look at this today to verify that: Sometimes nodes (under pressure) fail to send heartbeats for long enough to get marked as dead by other nodes (why is a good question, which I need to check better. Does not seem to be GC). The node does however start sending heart

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Terje Marthinussen
I have been hunting similar looking corruptions, especially in the hints column family, but I believe it occurs somewhere while compacting. I looked in greater detail on one sstable and the row length was longer than the actual data in the row, and as far as I could see, either the length was wro

Re: Ec2 Stress Results

2011-04-25 Thread Joaquin Casares
Did the images have EBS storage or Instance Store storage? Typically EBS volumes aren't the best to be benchmarking against: http://www.mail-archive.com/user@cassandra.apache.org/msg11022.html Joaquin Casares DataStax Software Engineer/Support On Wed, Apr 20, 2011 at 5:12 PM, Jonathan Ellis w

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Sasha Dolgy
honest opinion? smoke and mirrors. i really have no idea. i was surprised to see the latency drop when we started using the VIP's we assigned routing through our ec2 vyatta gateways. it makes it nice because it unties you from being 100% stuck on amazon. you can design your environment for cas

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Milind Parikh
@Sasha Very interesting that you find a big difference in latency between nodes. Any hypothesis on what is going on in internal aws routing that makes it inefficient? Milind On Mon, Apr 25, 2011 at 9:48 AM, Sasha Dolgy wrote: > We use vyatta to create a vip on each instance and act as the ga

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Sanjeev Kulkarni
Hi Sylvain, I started it from 0.7.4 with the patch 2376. No upgrade. Thanks! On Mon, Apr 25, 2011 at 7:48 AM, Sylvain Lebresne wrote: > Hi Sanjeev, > > What's the story of the cluster ? Did you started with 0.7.4, or is it > upgraded from > some earlier version ? > > On Mon, Apr 25, 2011 at 5:54

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Sylvain Lebresne
Hi Sanjeev, What's the story of the cluster? Did you start with 0.7.4, or is it upgraded from some earlier version? On Mon, Apr 25, 2011 at 5:54 AM, Sanjeev Kulkarni wrote: > Hey guys, > Running a one node cassandra server with version 0.7.4 patched > with https://issues.apache.org/jira/brow

Re: JNA C library errors on OSX

2011-04-25 Thread Jonathan Ellis
Pretty sure this is b/c OS X doesn't support posix_fadvise. Since you shouldn't be running OS X as a server OS in production anyway, I wouldn't worry much. Cassandra will still work fine for development w/o native methods. On Mon, Apr 25, 2011 at 7:52 AM, John Lennard wrote: > Hi, > > I am curre

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Sasha Dolgy
We use vyatta to create a vip on each instance and act as the gateway in each zone & region. this allows us to bridge into our own facilities outside of aws. we still can leverage ec2snitch and find a big speed difference wrt latency between nodes when bypassing internal aws routing... On Apr 25

Re: IP address resolution in MultiDC setup

2011-04-25 Thread pankaj soni
scrap the last mail, just finished reading Amazon ec2 resource policy. @milind when deploying cassandra across multiple dcs using your patch, is it possible to have internal network of nodes in each data center talking over private ip? then I assume the node with public ip will act as co-ordinator

JNA C library errors on OSX

2011-04-25 Thread John Lennard
Hi, I am currently testing the current 0.8 beta on my OSX development machine and when cassandra is starting up i am seeing errors from the JNA code as below. john@balorama bin $ sudo -u cassandra ./cassandra -f Password: INFO 00:04:18,013 Logging initialized INFO 00:04:18,027 Heap size: 2126

Re: IP address resolution in MultiDC setup

2011-04-25 Thread pankaj soni
Just read your paper on this. Must say helped a great deal. 1 more query does amazon by default award both external and internal IP address for each node? or we have to explicitly buy the external IP's? I am looking into overlay n/w's. On Mon, Apr 25, 2011 at 5:20 PM, Milind Parikh wrote: > I s

Re: OOM on heavy write load

2011-04-25 Thread Nikolay Kоvshov
I assume if I turn off swap it will just die earlier, no? What is the mechanism of dying? From the link you provided # Row cache is too large, or is caching large rows my row_cache is 0 # The memtable sizes are too large for the amount of heap allocated to the JVM Is my memtable size too la

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Milind Parikh
I stand corrected... I show how cassandra can be deployed in multiple dcs through a simple patch; using public ips. In your scenario with an overlay n/w, you will not require this patch. /*** sent from my android...please pardon occasional typos as I respond @ the speed of thou

Re: IP address resolution in MultiDC setup

2011-04-25 Thread pankaj soni
Could you give the exact name of your paper. It will be easier to search. thanks On Mon, Apr 25, 2011 at 5:13 PM, Milind Parikh wrote: > I have authored exactly this paper... please search this ml. Please be > aware about ec2's internal network as you design your deployment. Ec2 also > does not

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Milind Parikh
I have authored exactly this paper... please search this ml. Please be aware about ec2's internal network as you design your deployment. Ec2 also does not support multicast; which is a pain, but not unsurmountable. /*** sent from my android...please pardon occasional typos as I

Re: IP address resolution in MultiDC setup

2011-04-25 Thread pankaj soni
We are expecting to deploy it on amazon cloud ec2, if it may help. I am sure people would have deployed Cassandra data centers in different regions on cloud before. But I am unable to find documentation of any such deployment online. Because of this multi-regions the public-private IP address issu

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Milind Parikh
It will be through an overlay n/w. unfortunately setting up such n/w is complex. Look @ something like openvpn. If multicast is supported, it will be easier. With complex software such as Cassandra, it is much better to go with the expected flow; rather than devising your own flows. my2c. /***

Re: Manual Conflict Resolution in Cassandra

2011-04-25 Thread Milind Parikh
I suppose the term 'silently dropped' is a matter of perspective. C makes an explicit automated choice of latest-timestamp-wins. In certain situations, this is not the appropriate choice. Regards Milind /*** sent from my android...please pardon occasional typos as I respond @ t
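
For readers new to the term, latest-timestamp-wins can be sketched as below. This is an illustration of the rule as described in the message, not Cassandra's actual code: the Column class is hypothetical, and the tie-break on equal timestamps (the larger value wins) is an assumption made so the result does not depend on argument order.

```python
# Sketch only: Column and reconcile are hypothetical names.
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int  # supplied by the writing client


def reconcile(a: Column, b: Column) -> Column:
    """Pick the surviving version of a column under latest-timestamp-wins."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Assumed tie-break: the larger value wins, so both replicas agree.
    return a if a.value >= b.value else b


older = Column("balance", b"100", timestamp=1)
newer = Column("balance", b"250", timestamp=2)
print(reconcile(older, newer).value)
```

The losing write is discarded without any signal to the client that wrote it, which is the behaviour the thread debates.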

IP address resolution in MultiDC setup

2011-04-25 Thread pankaj soni
Hi, We have a scenario for which we are considering using apache Cassandra for deployment for our data storage needs. The setup is to be spread across multiple data centers in different regions (physical locations). With each data center having multiple nodes. However we can afford at most 1 public

Re: Manual Conflict Resolution in Cassandra

2011-04-25 Thread David Strauss
On Fri, 2011-04-22 at 13:31 -0700, Milind Parikh wrote: > Is there a chance of getting manual conflict resolution in Cassandra? > Please see attachment for why this is important in some cases. You can actually already perform "manual conflict resolution" in Cassandra by naming your columns so that