One last coda, for other Cassandra noobs like me. If you use NetworkTopologyStrategy with replication_factor > 1, make sure you have EC2 instances in multiple availability zones. I was taking baby steps and tried building a cluster in one AZ (before spreading to multiple AZs), and I was getting the most baffling errors ("cassandra_UnavailableException"). After painstakingly debugging the client code, firewalls, etc. for connectivity problems, I finally thought to check the Cassandra server logs, and it turned out my cluster considered itself "unavailable" because it couldn't replicate as much as it wanted to. I kind of wish a different word than "unavailable" had been chosen for this error condition :-)
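The word "unavailable" makes more sense from the coordinator's point of view. Before a request is attempted, Cassandra counts how many replicas for the key it believes are live and rejects the request up front (UnavailableException) if that is fewer than the consistency level requires. Below is a simplified Python model of that check. The function and exception names are hypothetical, and only the ONE, QUORUM and ALL levels are modeled.

```python
class UnavailableError(Exception):
    """Hypothetical stand-in for Cassandra's UnavailableException."""


def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        # A majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def check_available(live_replicas: int, consistency_level: str,
                    replication_factor: int) -> None:
    """Reject the request before sending it if too few replicas are live."""
    needed = required_replicas(consistency_level, replication_factor)
    if live_replicas < needed:
        raise UnavailableError(
            f"{consistency_level} needs {needed} live replicas, "
            f"only {live_replicas} available"
        )


if __name__ == "__main__":
    # RF=3 with only one reachable replica: ONE succeeds, QUORUM (2) fails.
    check_available(1, "ONE", 3)
    try:
        check_available(1, "QUORUM", 3)
    except UnavailableError as err:
        print("unavailable:", err)
```

In other words, nothing is necessarily down. The cluster simply cannot gather enough replicas to honor the requested consistency level.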
will

On Tue, Apr 12, 2011 at 10:37 PM, aaron morton <aa...@thelastpickle.com> wrote:

> If you can use standard + encoded I would go with that.
>
> Aaron
>
> On 13 Apr 2011, at 07:07, William Oberman wrote:
>
> Excellent to know! (and yes, I figure I'll expand someday, so I'm glad I
> found this out before digging a hole).
>
> The other issue I've been pondering is a normal column family of encoded
> objects (in my case JSON) vs. a super column. Based on my use case, things
> I've read, etc... right now I'm coming down on normal + encoded.
>
> will
>
> On Tue, Apr 12, 2011 at 2:57 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> NTS is overkill in the sense that it doesn't really benefit you in a
>> single DC, but if you think you may expand to another DC in the future
>> it's much simpler if you were already using NTS, than first migrating
>> to NTS (changing strategy is painful).
>>
>> I can't think of any downsides to using NTS in a single-DC
>> environment, so that's the "safe" option.
>>
>> On Tue, Apr 12, 2011 at 1:15 PM, William Oberman
>> <ober...@civicscience.com> wrote:
>> > Hi,
>> >
>> > I'm getting closer to committing to Cassandra, and now I'm in system/IT
>> > issues and questions. I'm in the Amazon EC2 cloud. I previously used this
>> > forum to discover the best practice for disk layouts (large instance + the
>> > two ephemeral disks in RAID0 for data + root volume for everything else).
>> > Now I'm hoping to confirm bits and pieces of things I've read about for
>> > snitch/replication strategies. I was thinking of using
>> > endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch
>> > placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
>> > (for people hitting this from the mailing list or Google, I feel obligated
>> > to note that the former setting is in cassandra.yaml, and the latter is an
>> > option on a keyspace).
>> >
>> > But, I'm only in one region. Is using the Amazon snitch/NetworkTopology
>> > overkill, given that everything I have is in one DC? (I believe region==DC
>> > and availability_zone==rack.) I'm using multiple availability zones for
>> > some level of redundancy; I'm just not yet at the point of using multiple
>> > regions. If someday I move to using multiple regions, would that change
>> > the answer?
>> >
>> > Thanks!
>> >
>> > --
>> > Will Oberman
>> > Civic Science, Inc.
>> > 3030 Penn Avenue., First Floor
>> > Pittsburgh, PA 15201
>> > (M) 412-480-7835
>> > (E) ober...@civicscience.com
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
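Will's reading of the topology (region == data center, availability zone == rack) matches how Ec2Snitch derives a node's location from its availability zone name. Below is a rough Python sketch of that mapping. The function names are hypothetical, and the exact splitting rule is an assumption modeled on the Ec2Snitch of this era. Later releases may name data centers differently (for example, keeping the region's numeric suffix).

```python
from collections import defaultdict


def ec2_topology(availability_zone: str) -> tuple[str, str]:
    """Split an AZ name into (data_center, rack), e.g. 'us-east-1a' -> ('us-east', '1a').

    Assumption: mirrors the era's Ec2Snitch, which took the last '-'-separated
    token as the rack and the leading token(s) as the region/data center.
    """
    parts = availability_zone.split("-")
    rack = parts[-1]
    # Two-part names such as 'asia-1a' keep only the first token as the DC.
    data_center = parts[0] if len(parts) < 3 else f"{parts[0]}-{parts[1]}"
    return data_center, rack


def racks_per_dc(zones: list[str]) -> dict[str, set[str]]:
    """Group the AZs that nodes live in by the data center the snitch reports."""
    topology: defaultdict[str, set[str]] = defaultdict(set)
    for az in zones:
        dc, rack = ec2_topology(az)
        topology[dc].add(rack)
    return dict(topology)


if __name__ == "__main__":
    # Three nodes spread over three AZs in one region: one DC, three racks.
    print(racks_per_dc(["us-east-1a", "us-east-1b", "us-east-1c"]))
```

So nodes spread over several AZs in a single region still form one data center, which is the single-DC case Jonathan describes. Whatever name the snitch reports is also the name a NetworkTopologyStrategy keyspace has to use in its per-DC replication options.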