Re: Best Practices for Managing Concurrent Client Connections in Cassandra

2024-02-29 Thread Andrew Weaver
We've used these settings in production with no issues. What has been more valuable to us, though, is limiting the rate of client connections via iptables. Oftentimes users configure an aggressive reconnection policy that floods the cluster with connections in certain circumstances like a node rest
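As an illustration of a less aggressive reconnection policy, here is a minimal sketch of a capped exponential backoff schedule. It is not taken from the thread; the function name and the `base` and `cap` values are assumptions chosen for the example, and real drivers ship their own reconnection policies.

```python
import itertools


def reconnect_delays(base=1.0, cap=60.0):
    """Yield reconnection delays in seconds: base, 2*base, 4*base, ... capped at `cap`.

    `base` and `cap` are hypothetical values picked for illustration only.
    """
    for attempt in itertools.count():
        # Clamp the exponent so the float multiplication can never overflow.
        yield min(cap, base * 2 ** min(attempt, 32))


# First eight delays a client would wait between reconnection attempts.
print(list(itertools.islice(reconnect_delays(), 8)))
```

With a schedule like this, a restarted node sees a trickle of retries from each client rather than a tight reconnect loop.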

Re: Best Practices for Managing Concurrent Client Connections in Cassandra

2024-02-29 Thread Bowen Song via user
They are suitable for production use for protecting your Cassandra server, not the clients. The clients will likely experience an error when the limit is reached, and they need to handle that error appropriately. What you probably really want to do is: 1. change the client's behaviour, limit t

Best Practices for Managing Concurrent Client Connections in Cassandra

2024-02-29 Thread Naman kaushik
Hello Cassandra Community, We've been experiencing occasional spikes in the number of client connections to our Cassandra cluster, particularly during high-volume API request periods. We're using persistent connections, and we've noticed that the number of connections can increase significantly du

Re: Backup cassandra and restore. Best practices

2021-04-07 Thread Alexander Nikolaev
Oh, yes you're right. Sorry for my inattention. I'll check it out. Thank you all! I really appreciate it. With Best Regards, Alexander N. On Tuesday, 6 April 2021, 13:50:45 GMT+3, Alexander DEJANOVSKI wrote: Yes, Minio is supported by Medusa through the S3 compatible backend. I

Re: Backup cassandra and restore. Best practices

2021-04-06 Thread Alexander Nikolaev
Thank you, Erick! This is a useful tool, but we look for smth that could store backups in local S3 (like minio), not Amazon or else. If I understand correctly, Medusa can't do it. Nevertheless, thank you for your response! With Best Regards, Alexander N. On Tuesday, 6 April 2021, 11:35:40 GMT+

Backup cassandra and restore. Best practices

2021-04-06 Thread Alexander Nikolaev
Hello everyone! We have a new Cassandra cluster, which consists of 5 nodes in 1 DC. Now I am looking for a tool that could help back up data to S3 or something local. Could you help me please with best practices for backing up and restoring? Is it enough to have local snapshots? In my opinion, the

Re: Backup cassandra and restore. Best practices

2021-04-06 Thread Alexander DEJANOVSKI
Yes, Minio is supported by Medusa through the S3 compatible backend. I reckon we need to update the docs with a guide on setting up those backends, but it's pretty much the same as ceph s3 rgw in configuring your medusa.ini : - use s3_compatible as storage backend - set the host, port and region se

Re: Backup cassandra and restore. Best practices

2021-04-06 Thread Erick Ramirez
Minio is a supported type -- https://github.com/apache/libcloud/blob/trunk/libcloud/storage/types.py#L108 On Tue, 6 Apr 2021 at 20:29, Erick Ramirez wrote: > This is a useful tool, but we look for smth that could store backups in >> local S3 (like minio), not Amazon or else.. >> > > As I stated

Re: Backup cassandra and restore. Best practices

2021-04-06 Thread Erick Ramirez
> > This is a useful tool, but we look for smth that could store backups in > local S3 (like minio), not Amazon or else.. > As I stated in my response, Medusa supports any S3-like storage that the Apache Libcloud API can access. See the docs I linked. Cheers!

Re: Backup cassandra and restore. Best practices

2021-04-06 Thread Bowen Song
Medusa: "Support for local storage, Google Cloud Storage (GCS) and AWS S3 through Apache Libcloud. Can be extended to support other storage providers supported by Apache Libcloud", and Apache Libcloud supports minio

Re: Backup cassandra and restore. Best practices

2021-04-06 Thread Erick Ramirez
I'd recommend using Medusa (https://github.com/thelastpickle/cassandra-medusa/wiki) -- an open-source tool which automates backups and has support for archiving to S3, Google Cloud and any S3-like storage. Cheers!

cassandra collection best practices and performance

2020-01-07 Thread onmstester onmstester
What is the sweet spot for set and list item counts (in DataStax's documentation, the max is 2 billion)? How does the write and read performance of Set vs List vs a simple partition row compare? Thanks in advance

Re: Schema Management Best Practices

2019-05-10 Thread Alain RODRIGUEZ
Hello Mark Second, any ideas what could be creating bottlenecks for schema alteration? I am not too sure what could be going on to make things that long, but about the corrupted data, I've seen it before. Here are some thoughts around schema changes and finding the bottlenecks: Ideally, use co

Schema Management Best Practices

2019-05-09 Thread Mark Bidewell
I am doing post-mortem on an issue with our cassandra cluster. One of our tables became corrupt and had to be restored via a backup. The table schema has been undergoing active development, so the number of "alter table" statements was quite large (300+). Currently, we use cqlsh to do schema loa

Re: Best practices while designing backup storage system for big Cassandra cluster

2019-04-02 Thread Carl Mueller
for speed mostly during a backup, but >> resiliency and not harming the source cluster mostly I would say. >> Then how fast you write to the backup storage system will probably be >> more often limited by what you can read from the source cluster. >> The backups have to b

Re: Best practices while designing backup storage system for big Cassandra cluster

2019-04-01 Thread Carl Mueller
om running nodes, thus it's easy to > overload the disk (reads), network (export backup data to final > destination), and even CPU (as/if the machine handles the transfer). > > What are the best practices while designing backup storage system for big >> Cassandra cluster? >

Re: Best practices while designing backup storage system for big Cassandra cluster

2019-04-01 Thread Alain RODRIGUEZ
even CPU (as/if the machine handles the transfer). What are the best practices while designing backup storage system for big > Cassandra cluster? What is nice to have (not to say mandatory) is a system of incremental backups. You should not take the data from the nodes every time, or you

Best practices while designing backup storage system for big Cassandra cluster

2019-03-28 Thread manish khandelwal
on SAN help us in this regard? Apart from using SSD disk, what are the alternative approach to make my backup process fast? What are the best practices while designing backup storage system for big Cassandra cluster? Regards Manish

RE: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-26 Thread Kenneth Brotman
!!! Kenneth Brotman From: Eric Plowe [mailto:eric.pl...@gmail.com] Sent: Monday, February 26, 2018 1:14 PM To: user@cassandra.apache.org Subject: Re: Gathering / Curating / Organizing Cassandra Best Practices & Patterns Kenneth, How did you get "caught in the middle" of thi

Re: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-26 Thread Eric Plowe
one. I said > my two cents. I had to vent. I’m back to concentrating on helping the > group. > > > > Kenneth Brotman > > > > *From:* Eric Evans [mailto:john.eric.ev...@gmail.com] > *Sent:* Monday, February 26, 2018 9:16 AM > *To:* user@cassandra.apache.

RE: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-26 Thread Kenneth Brotman
@cassandra.apache.org Subject: Re: Gathering / Curating / Organizing Cassandra Best Practices & Patterns On Sun, Feb 25, 2018 at 8:45 AM, Kenneth Brotman wrote: Chris Mattmann acted without authority and completely improperly as an Apache Software Foundation board member as a board member on their

Re: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-26 Thread Eric Evans
communities? > Kenneth, I really think you need to pump the brakes here. You're leveling some pretty serious accusations, and have now resorted to personal attacks; This is not constructive. *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] > *Sent:* Saturday, February 24,

RE: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-25 Thread Kenneth Brotman
tman [mailto:kenbrot...@yahoo.com.INVALID] Sent: Saturday, February 24, 2018 12:58 PM To: user@cassandra.apache.org Subject: RE: Gathering / Curating / Organizing Cassandra Best Practices & Patterns Jon, This is considered the start of the problem: https://www.mail-archive

RE: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-24 Thread Kenneth Brotman
] Sent: Saturday, February 24, 2018 12:58 PM To: user@cassandra.apache.org Subject: RE: Gathering / Curating / Organizing Cassandra Best Practices & Patterns Jon, This is considered the start of the problem: https://www.mail-archive.com/dev@cassandra.apache.org/msg09050.html Th

RE: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-24 Thread Kenneth Brotman
really said. Kenneth Brotman From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad Sent: Saturday, February 24, 2018 12:26 PM To: Kenneth Brotman Subject: Re: Gathering / Curating / Organizing Cassandra Best Practices & Patterns I really don’t want to continue

RE: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-24 Thread Kenneth Brotman
/ Organizing Cassandra Best Practices & Patterns DataStax academy is great but no, no work needs to be or should be aligned with it. Datastax is an independent company trying to make a profit, they could yank their docs at any time. There’s a reason why we started doing the docs in-tree, there

Re: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-24 Thread Jon Haddad
mailto:kenbrot...@yahoo.com > <mailto:kenbrot...@yahoo.com>] > Sent: Saturday, February 24, 2018 10:16 AM > To: 'user@cassandra.apache.org <mailto:user@cassandra.apache.org>' > Subject: RE: Gathering / Curating / Organizing Cassandra Best Practices & > Patt

RE: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-24 Thread Kenneth Brotman
ubject: RE: Gathering / Curating / Organizing Cassandra Best Practices & Patterns To Rahul, This is your official email (just from me as an individual) requesting your assistance to help solve the knowledge management problem. I can appreciate the work you put into the Awesome Cassandr

Re: Heavy one-off writes best practices

2018-02-06 Thread Romain Hardouin
We use Spark2Cassandra (this fork works with C* 3.0: https://github.com/leoromanovsky/Spark2Cassandra ). SSTables are streamed to Cassandra by Spark2Cassandra (so you need to open port 7000 accordingly). During the benchmark we used 25 EMR nodes, but in production we use fewer nodes to be more gentle wit

Re: Heavy one-off writes best practices

2018-02-06 Thread Julien Moumne
This does look like a very viable solution. Thanks. Could you give us some pointers/documentation on : - how can we build such SSTables using spark jobs, maybe https://github.com/Netflix/sstable-adaptor ? - how do we send these tables to cassandra? does a simple SCP work? - what is the recommen

Re: Heavy one-off writes best practices

2018-02-05 Thread Romain Hardouin
Hi Julien, We have such a use case on some clusters. If you want to insert big batches at a fast pace, the only viable solution is to generate SSTables on the Spark side and stream them to C*. Last time we benchmarked such a job we achieved 1.3 million partitions inserted per second on a 3 C* nodes

Re: Heavy one-off writes best practices

2018-02-04 Thread kurt greaves
> > Would you know if there is evidence that inserting skinny rows in sorted > order (no batching) helps C*? This won't have any effect as each insert will be handled separately by the coordinator (or a different coordinator, even). Sorting is also very unlikely to help even if you did batch. Al

Re: Heavy one-off writes best practices

2018-02-04 Thread Julien Moumne
dra/blob/trunk/ > src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java ) > > > > - Jeff > > -- > Jeff Jirsa > > > On Jan 30, 2018, at 12:12 AM, Julien Moumne wrote: > > Hello, I am looking for best practices for the following use case : > > Once a day, we insert a

Re: Heavy one-off writes best practices

2018-01-30 Thread Jeff Jirsa
ien Moumne wrote: > > Hello, I am looking for best practices for the following use case : > > Once a day, we insert at the same time 10 full tables (several 100GiB each) > using Spark C* driver, without batching, with CL set to ALL. > > Whether skinny rows or wide rows, dat

Re: Heavy one-off writes best practices

2018-01-30 Thread Lucas Benevides
e sets new values to memTable Cleanup Threshold and Key cache size. Although it is not proven that the same results will persist in different environments, it is a good starting point. Lucas Benevides 2018-01-30 6:12 GMT-02:00 Julien Moumne : > Hello, I am looking for best practices for the follo

Re: Heavy one-off writes best practices

2018-01-30 Thread Alain RODRIGUEZ
ite speed in Spark if doable from a service perspective or add nodes, so spark is never strong enough to break regular transactions (this could be very expensive). - Run Spark mostly on off-peak hours - ... probably some more I cannot think of just now :). Is there any best practices we

Re: Heavy one-off writes best practices

2018-01-30 Thread Alain RODRIGUEZ
move, probably very good for some use case. Maybe yours? > - Simply limit the write speed in Spark if doable from a service > perspective or add nodes, so spark is never strong enough to break regular > transactions (this could be very expensive). > - Run Spark mostly on off-peak

Heavy one-off writes best practices

2018-01-30 Thread Julien Moumne
Hello, I am looking for best practices for the following use case : Once a day, we insert at the same time 10 full tables (several 100GiB each) using Spark C* driver, without batching, with CL set to ALL. Whether skinny rows or wide rows, data for a partition key is always completely updated

Nodetool Repair Best Practices

2016-11-21 Thread Daniel Subak
Hey everyone, We've just migrated to a new Cassandra cluster running 3.7 and wanted to get some information on best practices when running nodetool repair; our last cluster was 1.2 and per the documentation it seems that a lot of behavior has changed between those versions. From a rea

Re: Cassandra installation best practices

2016-10-18 Thread kurt Greaves
ont > mehdi.b...@dbi-services.com > www.dbi-services.com > > > > -- > *From: *"Brooke Jensen" > *To: *"user" > *Sent: *Tuesday, October 18, 2016 8:59:14 AM > *Subject: *Re: Cassandra installation best practices > > Hi Mehdi, > In addition, give som

Re: Cassandra installation best practices

2016-10-18 Thread Mehdi Bada
, October 18, 2016 8:59:14 AM Subject: Re: Cassandra installation best practices Hi Mehdi, In addition, give some thought to your cluster topology. For maximum fault tolerance and availability I would recommend using at least three nodes with a replication factor of three. Ideally, you shoul

Re: Cassandra installation best practices

2016-10-17 Thread Brooke Jensen
18 October 2016 at 04:02, Anuj Wadehra wrote: > Hi Mehdi, > > You can refer https://docs.datastax.com/en/landing_page/doc/landing_page/ > recommendedSettings.html . > > Thanks > Anuj > > On Mon, 17 Oct, 2016 at 10:20 PM, Mehdi Bada > > wrote: > Hi all, >

Re: Cassandra installation best practices

2016-10-17 Thread Anuj Wadehra
Hi Mehdi, You can refer to https://docs.datastax.com/en/landing_page/doc/landing_page/recommendedSettings.html. Thanks, Anuj On Mon, 17 Oct, 2016 at 10:20 PM, Mehdi Bada wrote: Hi all, Do any best practices exist for installing Cassandra in a production environment? Is there some standard to follow

Re: Cassandra installation best practices

2016-10-17 Thread Vladimir Yudovin
Azure and SoftLayer. Launch your cluster in minutes. On Mon, 17 Oct 2016 12:50:15 -0400 Mehdi Bada <mehdi.b...@dbi-services.com> wrote: Hi all, Do any best practices exist for installing Cassandra in a production environment? Is there some standard to follow? For instance, th

Cassandra installation best practices

2016-10-17 Thread Mehdi Bada
Hi all, Do any best practices exist for installing Cassandra in a production environment? Is there some standard to follow? For instance, the file system type, etc.

Re: best practices for time-series data with massive amounts of records

2015-03-07 Thread Eric Stevens
It's probably quite rare for extremely large time series data to be querying the whole set of data. Instead there's almost always a "Between X and Y dates" aspect to nearly every real time query you might have against a table like this (with the exception of "most recent N events"). Because of th

Re: best practices for time-series data with massive amounts of records

2015-03-06 Thread graham sanderson
Note that using static column(s) for the “head” value, and trailing TTLed values behind is something we’re considering. Note this is especially nice if your head state includes say a map which is updated by small deltas (individual keys) We have not yet studied the effect of static columns on s

Re: best practices for time-series data with massive amounts of records

2015-03-06 Thread Clint Kelly
Hi all, Thanks for the responses, this was very helpful. I don't know yet what the distribution of clicks and users will be, but I expect to see a few users with an enormous amount of interactions and most users having very few. The idea of doing some additional manual partitioning, and then mai

Re: best practices for time-series data with massive amounts of records

2015-03-03 Thread mck
> Here "partition" is a random digit from 0 to (N*M) > where N=nodes in cluster, and M=arbitrary number. Hopefully it was obvious, but here (unless you've got hot partitions), you don't need N. ~mck
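A minimal sketch of the random sub-partition idea quoted above, assuming a fixed bucket count: each write picks a random bucket, and a read for the same id has to visit every bucket. The names `BUCKETS`, `write_key` and `read_keys` are hypothetical, and the value 8 is an arbitrary stand-in for M.

```python
import random

BUCKETS = 8  # assumption: an arbitrary M; not a value given in the thread


def write_key(user_id, rng=random):
    """Pick the (id, bucket) partition key for one write by choosing a random bucket."""
    return (user_id, rng.randrange(BUCKETS))


def read_keys(user_id):
    """List every (id, bucket) partition key a reader must query to see all writes."""
    return [(user_id, bucket) for bucket in range(BUCKETS)]


print(write_key("user42"))
print(read_keys("user42"))
```

The trade-off is visible in the two functions: writes spread evenly across partitions, while every read fans out to all `BUCKETS` partitions.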

Re: best practices for time-series data with massive amounts of records

2015-03-03 Thread Yulian Oifa
Hello, You can use timeuuid as row key and create a separate CF to be used for indexing. The indexing CF may either use user_id as key or, as a better approach, partition rows by timestamp. In case of partitioning you can create a compound key, in which you will store user_id and timestamp base ( for examp

Re: best practices for time-series data with massive amounts of records

2015-03-03 Thread mck
Clint,
> CREATE TABLE events (
>   id text,
>   date text, // Could also use year+month here or year+week or something else
>   event_time timestamp,
>   event blob,
>   PRIMARY KEY ((id, date), event_time))
> WITH CLUSTERING ORDER BY (event_time DESC);
>
> The downside of this approach is that w
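With a composite partition key of (id, date) as in the quoted table, a range read has to enumerate the date buckets it spans. The following is a rough sketch of that enumeration, assuming the `date` text column holds one ISO-formatted day per partition; the bucket format and the function name `day_partitions` are assumptions, not something stated in the thread.

```python
from datetime import date, timedelta


def day_partitions(user_id, start, end):
    """Return the (id, date) partition keys covering start..end inclusive.

    Assumes one partition per calendar day, with the day stored as an
    ISO string such as "2015-03-01" (a hypothetical choice of format).
    """
    span = (end - start).days
    return [(user_id, (start + timedelta(days=i)).isoformat()) for i in range(span + 1)]


# One query per returned key replaces the single continuous scan.
for key in day_partitions("user42", date(2015, 3, 1), date(2015, 3, 3)):
    print(key)
```

Each returned key maps to one single-partition query, and the results can be concatenated in bucket order.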

Re: best practices for time-series data with massive amounts of records

2015-03-03 Thread Jack Krupansky
I'd recommend using 100K and 10M as rough guidelines for the maximum number of rows and bytes in a single partition. Sure, Cassandra can technically handle a lot more than that, but very large partitions can make your life more difficult. Of course you will have to do a POC to validate the sweet sp
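The rough guidelines above can be turned into back-of-the-envelope arithmetic for choosing a time bucket. This sketch assumes a steady event rate and a fixed row size, which real workloads rarely have; the function name and the sample numbers are hypothetical, and "10M" is read here as 10,000,000 bytes.

```python
MAX_ROWS = 100_000       # rough row guideline from the message above
MAX_BYTES = 10_000_000   # rough size guideline, read as 10,000,000 bytes


def max_bucket_days(rows_per_day, bytes_per_row):
    """Return how many whole days of data fit in one partition under both limits.

    A result of 0 means even a one-day bucket exceeds a guideline, so a finer
    bucket (for example hourly) would be needed.
    """
    by_rows = MAX_ROWS // rows_per_day
    by_bytes = MAX_BYTES // (rows_per_day * bytes_per_row)
    return min(by_rows, by_bytes)


# Hypothetical user: 500 events a day at about 200 bytes each.
print(max_bucket_days(500, 200))
```

As the message says, numbers like these are only a starting point for a proof of concept.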

Re: best practices for time-series data with massive amounts of records

2015-03-03 Thread Jens Rantil
Hi, I have not done something similar, however I have some comments: On Mon, Mar 2, 2015 at 8:47 PM, Clint Kelly wrote: > The downside of this approach is that we can no longer do a simple > continuous scan to get all of the events for a given user. > Sure, but would you really do that real ti

best practices for time-series data with massive amounts of records

2015-03-02 Thread Clint Kelly
Hi all, I am designing an application that will capture time series data where we expect the number of records per user to potentially be extremely high. I am not sure if we will eclipse the max row size of 2B elements, but I assume that we would not want our application to approach that size any

Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
. I >>>>> don't think it's completely comprehensive (but no tool really is) but it >>>>> gets you 90% of the way there. >>>>> >>>>> It's a good idea to run repairs, especially if you're doing deletes or >>>

Re: Cassandra Maintenance Best practices

2014-12-16 Thread Ryan Svihla
> It's a good idea to run repairs, especially if you're doing deletes or >>>> querying at CL=ONE. I assume you're not using quorum, because on RF=2 >>>> that's the same as CL=ALL. >>>> >>>> I recommend at least RF=3 because if yo
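The remark that quorum on RF=2 equals ALL follows from the usual quorum formula, floor(RF / 2) + 1. A small worked sketch (the function names are illustrative only):

```python
def quorum(rf):
    """Number of replicas that must respond for QUORUM at replication factor rf."""
    return rf // 2 + 1


def tolerated_failures(rf):
    """Replicas that can be down while QUORUM operations still succeed."""
    return rf - quorum(rf)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, can lose {tolerated_failures(rf)} replica(s)")
```

At RF=2 the quorum is 2, so a single node outage blocks quorum reads and writes; at RF=3 the quorum is still 2, which leaves room for one node to be down.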

Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
hat's the same as CL=ALL. >>> >>> I recommend at least RF=3 because if you lose 1 server, you're on the >>> edge of data loss. >>> >>> >>> On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi >>> wrote: >>> >>>&

Re: Cassandra Maintenance Best practices

2014-12-16 Thread Ryan Svihla
LL. >> >> I recommend at least RF=3 because if you lose 1 server, you're on the >> edge of data loss. >> >> >> On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi >> wrote: >> >>> Hi, >>> We have Two Node Cluster Configuration in producti

Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
e Cluster Configuration in production with RF=2. >> >> Which means that the data is written in both the clusters and it's >> running for about a month now and has good amount of data. >> >> Questions? >> 1. What are the best practices for maintenance? >> 2. Is OPScenter required to be installed or I can manage with nodetool >> utility? >> 3. Is it necessary to run repair weekly? >> >> thanks >> regards >> Neha >> >

Re: Cassandra Maintenance Best practices

2014-12-15 Thread Neha Trivedi
3 because if you lose 1 server, you're on the edge > of data loss. > > > On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi > wrote: > >> Hi, >> We have Two Node Cluster Configuration in production with RF=2. >> >> Which means that the data is written

Re: Cassandra Maintenance Best practices

2014-12-09 Thread Jonathan Haddad
Configuration in production with RF=2. > > Which means that the data is written in both the clusters and it's running > for about a month now and has good amount of data. > > Questions? > 1. What are the best practices for maintenance? > 2. Is OPScenter required to be ins

Cassandra Maintenance Best practices

2014-12-09 Thread Neha Trivedi
Hi, We have Two Node Cluster Configuration in production with RF=2. Which means that the data is written in both the clusters and it's running for about a month now and has good amount of data. Questions? 1. What are the best practices for maintenance? 2. Is OPScenter required to be install

Re: Wide rows best practices and GC impact

2014-12-04 Thread Jabbar Azam
Hello, I saw this earlier yesterday but didn't want to reply because I didn't know what the cause was. Basically I was using wide rows with Cassandra 1.x and was inserting data constantly. After about 18 hours the JVM would crash with a dump file. For some reason I removed the compaction throttling a

Re: Wide rows best practices and GC impact

2014-12-03 Thread Gianluca Borello
Thanks Robert, I really appreciate your help! I'm still unsure why Cassandra 2.1 seems to perform much better in that same scenario (even setting the same values of compaction threshold and number of compactors), but I guess we'll revisit this when we decide to upgrade to 2.1 in production. On Dec 3, 20

Re: Wide rows best practices and GC impact

2014-12-03 Thread Robert Coli
On Tue, Dec 2, 2014 at 5:01 PM, Gianluca Borello wrote: > We mainly store time series-like data, where each data point is a binary > blob of 5-20KB. We use wide rows, and try to put in the same row all the > data that we usually need in a single query (but not more than that). As a > result, our

Wide rows best practices and GC impact

2014-12-02 Thread Gianluca Borello
pending compactions were never more than 50 (whereas in 2.0.11 it took just about 20 minutes to get to the point of failure) Do you have some best practices on wide rows sizing? Are we doing something wrong by using rows this wide or the problem we experienced is unrelated? Thanks a lot.

Best practices for route tracing

2014-11-16 Thread Clint Kelly
Hi all, I am trying to debug some high-latency outliers (99th percentile) in an application I'm working on. I thought that I could turn on route tracing, print the route traces to logs, and then examine my logs after a load test to find the highest-latency paths and figure out what is going on.

Re: Best practices for frequently updated columns

2014-08-15 Thread Philo Yang
> >> I've read comments about frequent column updates causing compaction >> issues with Cassandra. What is the recommended Cassandra configuration / >> best practices for usage scenarios like this? >> > > If your data is frequently UPDATEd, perhaps a log structured

Re: Best practices for frequently updated columns

2014-08-13 Thread Robert Coli
On Wed, Aug 13, 2014 at 8:01 AM, Jeremy Jongsma wrote: > I've read comments about frequent column updates causing compaction issues > with Cassandra. What is the recommended Cassandra configuration / best > practices for usage scenarios like this? > If your data is frequently U

Best practices for frequently updated columns

2014-08-13 Thread Jeremy Jongsma
for an instrument, some columns will be updated multiple times per second. I've read comments about frequent column updates causing compaction issues with Cassandra. What is the recommended Cassandra configuration / best practices for usage scenarios like this?

Re: Best practices for repair

2014-06-22 Thread Paulo Motta
Paulo > > > On Thu, Jun 19, 2014 at 4:40 PM, Jack Krupansky > wrote: > >> The DataStax doc should be current best practices: >> >> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html >> >> If you or anybody el

Re: Best practices for repair

2014-06-20 Thread Paolo Crosato
that just changing the version on the pom.xml and recompiling it will make it work on 2.0.x. Cheers, Paulo On Thu, Jun 19, 2014 at 4:40 PM, Jack Krupansky <j...@basetechnology.com> wrote: The DataStax doc should be current best practices: http://www.datastax.com/documen

Re: Best practices for repair

2014-06-19 Thread Paulo Ricardo Motta Gomes
Currently available for 1.2.16, but I guess that just changing the version on the pom.xml and recompiling it will make it work on 2.0.x. Cheers, Paulo On Thu, Jun 19, 2014 at 4:40 PM, Jack Krupansky wrote: > The DataStax doc should be current best practices: > http://www.datastax.com/documen

Re: Best practices for repair

2014-06-19 Thread Jack Krupansky
The DataStax doc should be current best practices: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html If you or anybody else finds it inadequate, speak up. -- Jack Krupansky -Original Message- From: Paolo Crosato Sent: Thursday, June 19

Best practices for repair

2014-06-19 Thread Paolo Crosato
procedure would require us to write some java program that calls describe_splits to get the tokens to feed nodetool repair with. The second procedure is available out of the box only in the commercial version of the opscenter, is this true? I would like to know if these are the current best

CQL 3, Schema change management and best practices

2013-12-30 Thread Todd Carrico
Are there published best practices for managing Schema with CQL 3.0? Say for bootstrapping the schema for a new feature? Do folks query the system.schema_keyspaces on startup and create the necessary schema if it doesn't exist? Or do you have one-off scripts that create schema? Is th

Re: best practices on EC2 question

2013-05-17 Thread aaron morton
I was considering that when bootstrapping starts the nodes receive writes so that when the process is complete they have both the data from the streaming process and all writes from the time they started. So that a repair is not needed. Compared to bootstrapping a node from a backup where a (non

Re: best practices on EC2 question

2013-05-17 Thread Robert Coli
On Fri, May 17, 2013 at 11:13 AM, aaron morton wrote: > Bootstrapping a new node into the cluster has a small impact on the existing > nodes and the new nodes have all the data they need when they finish the > process. Sorry for the pedantry, but bootstrapping from existing replicas cannot guar

Re: best practices on EC2 question

2013-05-17 Thread aaron morton
> b) do people skip backups altogether except for huge outages and just let > rebooted server instances come up empty to repopulate via C*? This one. Bootstrapping a new node into the cluster has a small impact on the existing nodes and the new nodes have all the data they need when they fini

Re: best practices on EC2 question

2013-05-16 Thread Janne Jalkanen
On May 16, 2013, at 17:05 , Brian Tarbox wrote: > An alternative that we had explored for a while was to do a two stage backup: > 1) copy a C* snapshot from the ephemeral drive to an EBS drive > 2) do an EBS snapshot to S3. > > The idea being that EBS is quite reliable, S3 is still the emergency

best practices on EC2 question

2013-05-16 Thread Brian Tarbox
From this list and the NYC* conference it seems that the consensus configuration of C* on EC2 is to put the data on an ephemeral drive and then periodically back the drive up to S3...relying on C*'s inherent fault tolerance to deal with any data loss. Fine, and we're doing this...but we find that

Re: Best practices for nodes to come back

2013-03-13 Thread aaron morton
If your node has been dead for less than gc_grace you can return it to the cluster and run nodetool repair (without the -pr). Until repair has completed you will be getting inconsistent results, but if you have been using ONE / ONE for all ops that is a possibility for everything. If the node has b

Re: Primary/secondary index question / best practices?

2012-12-11 Thread Hiller, Dean
"user@cassandra.apache.org<mailto:user@cassandra.apache.org>" >mailto:user@cassandra.apache.org>> >Date: Tuesday, December 11, 2012 3:45 PM >To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" >mailto:user@cassandra.apache.org>> &

Re: Primary/secondary index question / best practices?

2012-12-11 Thread Hiller, Dean
" <user@cassandra.apache.org> Subject: RE: Primary/secondary index question / best practices? Dean, thank you for your response. To the second half of the query, I’m a little concerned about the secondary index approach since the indexes that I want to create are columns with high

RE: Primary/secondary index question / best practices?

2012-12-11 Thread Stephen.M.Thompson
imary/secondary index question / best practices? Hard to help out on a design without specifics but here is some advice based on the limited information. Primary key: yes, must be cluster unique. TimeUUID or UUID. PlayOrm has very unique TimeUUID like keys as in this one 7AL2S8Y.b1 (b1 i

Re: Primary/secondary index question / best practices?

2012-12-11 Thread Hiller, Dean
>" <stephen.m.thomp...@wellsfargo.com> >Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> >Date: Tuesday, December 11, 2012 2:49 PM >To: "user@cassandra.apache.org<mailto:user

Re: Primary/secondary index question / best practices?

2012-12-11 Thread Hiller, Dean
> Date: Tuesday, December 11, 2012 2:49 PM To: "user@cassandra.apache.org" <user@cassandra.apache.org> Subject: Primary/secondary index question / best practices? From my reading, it seems like I need a UUID column that will be my primary i

Primary/secondary index question / best practices?

2012-12-11 Thread Stephen.M.Thompson
Hi folks - I'm doing an informal proof-of-concept with Cassandra and I've been getting some conflicting information about how my data layout should go. Perhaps somebody could point me in the right direction. I have a column family that will have billions of rows of data. The data do not have

Re: EC2 Best Practices

2012-04-29 Thread aaron morton
> Sent: Wed, April 25, 2012 18:11 > Subject: Re: EC2 Best Practices > > > > has anybody written up anything related to recovery for fails in EC2? > > this morning i woke up to find 1 (of 4) nodes marked as unreachable. i used > the datastax (1.0.7) ami to set up my cl

Re: EC2 Best Practices

2012-04-25 Thread Dave Brosius
0 is a perfectly valid id. node - 1 is modulo the maximum token value. That token range is 0 - 2**127, so node - 1 in this case is 2**127. - Original Message - From: "Deno Vichas" <d...@syncopated.net>
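To make the token arithmetic concrete, here is a small sketch of evenly spaced initial tokens on a ring of size 2**127, plus the wrap-around lookup of a node's predecessor. It assumes the RandomPartitioner-style token range mentioned in the message; the function names and the four-node example are illustrative and not taken from the thread.

```python
RING = 2 ** 127  # token space size mentioned in the message above


def initial_tokens(node_count):
    """Evenly spaced tokens for node_count nodes; the first node gets token 0."""
    return [i * RING // node_count for i in range(node_count)]


def previous_token(tokens, index):
    """Token of the node before `index`, wrapping around the ring at index 0."""
    return tokens[(index - 1) % len(tokens)]


tokens = initial_tokens(4)
print(tokens)
print(previous_token(tokens, 0))  # predecessor of the node holding token 0
```

Token 0 is an ordinary position on the ring: the node that owns it simply takes the range that starts after the highest token and wraps around.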

Re: EC2 Best Practices

2012-04-25 Thread Deno Vichas
ork-in-ec2/ Clients need a single port (9160) to talk to the cluster. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/02/2012, at 3:46 AM, Philip Shon wrote: Are there any good resources for best practices when running Cassan

Re: EC2 Best Practices

2012-02-23 Thread aaron morton
single port (9160) to talk to the cluster. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/02/2012, at 3:46 AM, Philip Shon wrote: > Are there any good resources for best practices when running Cassandra within > EC

EC2 Best Practices

2012-02-23 Thread Philip Shon
Are there any good resources for best practices when running Cassandra within EC2? I'm particularly interested in the security issues, when the servers communicating w/ Cassandra are outside of EC2. Thanks, -Phil

Re: best practices for simulating transactions in Cassandra

2011-12-15 Thread John Laban
all subsequent readers or writers of that >>>>>> data >>>>>> would have to check for abandoned transactions and roll them back >>>>>> themselves before they could read the data. I don't think this is >>>>>> possible >&

Re: best practices for simulating transactions in Cassandra

2011-12-15 Thread Boris Yen
>>>> with the XACT_LOG "replay" approach in these slides though, based on how >>>>> the data is indexed (cassandra node token + timeUUID). >>>>> >>>>> >>>>> PS: How are you liking Cages? >>>>> >>

Re: best practices for simulating transactions in Cassandra

2011-12-12 Thread John Laban
>>> Hi John, >>>>> >>>>> I had exactly the same reflections. >>>>> >>>>> I'm using zookeeper and cage to lock and isolate. >>>>> >>>>> but how to rollback? >>>>> It's imposs

Re: best practices for simulating transactions in Cassandra

2011-12-12 Thread John Laban
de token + timeUUID). >>>>> >>>>> >>>>> PS: How are you liking Cages? >>>>> >>>>> >>>>> >>>>> >>>>> 2011/12/6 Jérémy SEVELLEC >>>>> >>>>>> Hi

Re: best practices for simulating transactions in Cassandra

2011-12-12 Thread Dominic Williams
>>> - remove (or expire) your column. >>>> >>>> if there is a problem during "making the job", you keep the >>>> possibility to replay and replay and replay (synchronously or in a batch). >>>> >>>> Regards >>>>

Re: best practices for simulating transactions in Cassandra

2011-12-12 Thread Jake Luciani
s impossible so try replay! >>>>> >>>>> the idea is explained in this presentation >>>>> http://www.slideshare.net/mattdennis/cassandra-data-modeling (starting >>>>> from slide 24) >>>>> >>>>> - insert your

Re: best practices for simulating transactions in Cassandra

2011-12-12 Thread John Laban
ards >>> >>> Jérémy >>> >>> >>> 2011/12/5 John Laban >>> >>>> Hello, >>>> >>>> I'm building a system using Cassandra as a datastore and I have a few >>>> places where I am in need of transac
