Re: Limit on having number of nodes in C* cluster

2017-08-21 Thread Jon Haddad
As far as I know, those 75K nodes are not in a single cluster. If memory serves correctly (and this article seems to indicate that it does http://www.techrepublic.com/article/apples-secret-nosql-sauce-includes-a-hefty-dose-of-cassandra/

Re: Upgrade requirements for upgrading from cassandra 2.1.x to 2.2.x

2017-08-22 Thread Jon Haddad
NEWS.txt is the go-to spot for upgrade instructions, caveats, etc. Jon > On Aug 22, 2017, at 2:46 PM, Chuck Reynolds wrote: > > Anyone? > > From: "Chuck (me) Reynolds" > Reply-To: "user@cassandra.apache.org" > Date: Tuesday, August 22, 2017 at 9:40 AM > To: "user@cassandra.apache.org" > Sub

Re: C* 3 node issue -Urgent

2017-09-06 Thread Jon Haddad
I wouldn’t worry about being meticulous about keeping RF = N as the cluster grows. If you had 60 nodes and your auth data was only on 9 you’d be completely fine. > On Sep 6, 2017, at 11:36 AM, Cogumelos Maravilha > wrote: > > After insert a new node we should: > > ALTER KEYSPACE system_au

Re: Cassandra compatibility matrix

2017-09-07 Thread Jon Haddad
There aren’t any drivers maintained by the Cassandra project. Compatibility is up to each driver. Usually a section is included in the README. For instance, in the DataStax Java Driver: https://github.com/datastax/java-driver#compatibility

Re: Modify keyspace replication strategy and rebalance the nodes

2017-09-18 Thread Jon Haddad
For those of you who like trivia, SimpleSnitch is hard-coded to report every node as being in DC “datacenter1” and rack “rack1”; there’s no way around it. https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/locator/SimpleSnitch.java#L28

Re: Modify keyspace replication strategy and rebalance the nodes

2017-09-18 Thread Jon Haddad
> strategy - he's using the right snitch but SimpleStrategy ignores it > > That's the same reason that adding a new DC doesn't work - the relocation > strategy is dc agnostic and changing it safely IS the problem > > > > -- > Jeff Jirsa > >

Re: Massive deletes -> major compaction?

2017-09-21 Thread Jon Haddad
Have you considered the fantastic DeletingCompactionStrategy? https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy > On Sep 21, 2017, at 11:51 AM, Jeff Jirsa wrote: > >

Re: detail of compactionstats, pending tasks

2017-09-21 Thread Jon Haddad
Pending tasks are not a queue, they are an estimation of the amount of work it would take to reach a perfect compaction point, but the compactions aren’t independent from one another. For instance, with LCS you may have a compaction from L0 -> L1, which triggers a L1 -> L2 compaction. You can’

Reaper 0.7 is released!

2017-09-27 Thread Jon Haddad
Hey folks, We (The Last Pickle) are proud to announce the release of Reaper 0.7! In this release we've added support to run Reaper across multiple data centers as well as supporting Reaper failover when using the Cassandra storage backend. You can grab DEB, RPM and tarballs off the downloads p

Re: Reaper 0.7 is released!

2017-09-27 Thread Jon Haddad
:33 AM -07:00 from Aiman Parvaiz > : > > Thanks!! Love Reaper :) > > Sent from my iPhone > > On Sep 27, 2017, at 10:01 AM, Jon Haddad > wrote: > >> Hey folks, >> >> We (The Last Pickle) are proud to announce the release of Reaper 0.7! In >>

Re: new question ;-) // RE: understanding batch atomicity

2017-09-29 Thread Jon Haddad
The use of “atomic” for batches is misleading. Batches will eventually complete, that doesn’t make them atomic. “All or nothing” is also incorrect, as you can read them in the middle and get “some parts of it”, and without a rollback it’s just “eventually all”. > On Sep 29, 2017, at 10:59 AM

Re: Increasing VNodes

2017-10-04 Thread Jon Haddad
The site (with the docs) is probably more helpful to learn about how reaper works: http://cassandra-reaper.io/ > On Oct 4, 2017, at 9:54 AM, Chris Lohfink wrote: > > Increasing number of tokens will make repairs worse not better. You can just > split the sub range

Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-04 Thread Jon Haddad
Seems pretty overengineered, imo, given you can just save the pagination state as Andy Tolbert pointed out. > On Oct 4, 2017, at 8:38 AM, Daniel Hölbling-Inzko > wrote: > > Thanks for pointing me to Elassandra. > Have you had any experience running this in production at scale? Not sure if > I

Re: Node failure

2017-10-06 Thread Jon Haddad
I’ve had a few use cases for downgrading consistency over the years. If you’re showing a customer dashboard w/ some Ad summary data, it’s great to be right, but showing a number that’s close is better than not being up. > On Oct 6, 2017, at 1:32 PM, Jeff Jirsa wrote: > > I think it was Brando

Re: Could not connect to localhost:9160 when installing Cassandra on AWS

2017-10-10 Thread Jon Haddad
How did you install Cassandra? Try passing the machine’s IP address to cqlsh, like “cqlsh 192.168.1.1" > On Oct 10, 2017, at 10:43 AM, Lutaya Shafiq Holmes > wrote: > > Hello Cassandra Gurus, > > After I installed Cassandra on AWS- This error comes up when I try to > Start CQLSH > > Could

Re: Cassandra 3.11.0 compaction attempting impossible to complete compactions

2017-10-13 Thread Jon Haddad
Can you paste the output of nodetool compactionstats? What you’re describing should not happen. There’s a check that drops sstables out of a compaction task if there isn’t enough available disk space, see https://issues.apache.org/jira/browse/CASSANDRA-12979

Re: Looking for advice and assistance upgrading from Cassandra 1.2.9

2017-10-17 Thread Jon Haddad
I recommend going all the way to 2.2. > On Oct 17, 2017, at 12:37 PM, Jeff Jirsa wrote: > > You’ll go from 1.2 to 2.0 to 2.1 - should be basic steps: > - make sure you have all 1.2 sstables by running upgradesstable > - one node at a time, swap the 1.2 binaries for latest in 2.0 > - once all nod

Re: Inter Data Center Latency calculation of a Multi DC cluster running in AWS

2017-10-17 Thread Jon Haddad
I recommend figuring out the latency between your datacenters. Cassandra isn’t going to be any more than that barring JVM pauses on the remote coordinator. > On Oct 17, 2017, at 4:17 PM, Bill Walters wrote: > > Hi Everyone, > > I need some suggestions on finding the time taken for Cassandra r

Re: Golang + Cassandra + Text Search

2017-10-24 Thread Jon Haddad
When someone talks about full text search, I usually assume there’s more required than keyword search, i.e. simple tokenization and a little stemming. * Term Vectors, commonly used for a “more like this” feature * Ranking of search results * Facets * More complex tokenization like trigrams So anyw

Re: Why don't I see my spark jobs running in parallel in Cassandra/Spark DSE cluster?

2017-10-27 Thread Jon Haddad
Seems like a question better suited for the Spark mailing list or DSE support, not OSS Cassandra. > On Oct 27, 2017, at 8:14 AM, Thakrar, Jayesh > wrote: > > What you have is sequential and hence sequential processing. > Also Spark/Scala are not parallel programming languages. > But even

Re: Tuning bootstrap new node

2017-10-31 Thread Jon Haddad
Of all the settings you could change, why one that’s related to memtables? Streaming doesn’t go through the write path, memtables aren’t involved unless you’re using materialized views or CDC. > On Oct 31, 2017, at 11:44 AM, Anubhav Kale > wrote: > > You can change YAML setting of memtable_c

Re: Stable Cassandra 3.x version for production

2017-11-07 Thread Jon Haddad
I regularly work with teams that have 3.11.{0,1} in prod, and would recommend it for new clusters. Avoid materialized views and SASI until you really understand how they work and their limitations. MVs solve about one use case correctly, SASI is good if you’re querying a single partition *bu

Re: best practice for repair

2017-11-13 Thread Jon Haddad
We (The Last Pickle) maintain Reaper, an open source repair tool, specifically to address all the complexity around repairs. http://cassandra-reaper.io/ Jon > On Nov 13, 2017, at 3:18 AM, Peng Xiao <2535...@qq.com> wrote: > > sub-range repair is much like primary

Reaper 1.0

2017-11-14 Thread Jon Haddad
We’re excited to announce the release of the 1.0 version of Reaper for Apache Cassandra! We’ve made a lot of improvements to the flexibility of managing repairs and simplified the UI based on feedback we’ve received. We’ve written a blog post discussing the changes in detail here: http://thel

Re: Reaper 1.0

2017-11-15 Thread Jon Haddad
thers is strictly prohibited. If you are not the intended recipient (or > authorized to receive for the recipient), please contact the sender by reply > email and delete all copies of this message. > Please click here > <http://www.cisco.com/web/about/doing_business/

Re: CQL Map vs clustering keys

2017-11-15 Thread Jon Haddad
In 3.0, clustering columns are not actually part of the column name anymore. Yay. Aaron Morton wrote a detailed analysis of the 3.x storage engine here: http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

Re: Reaper 1.0

2017-11-17 Thread Jon Haddad
opies of this message. > Please click here > <http://www.cisco.com/web/about/doing_business/legal/cri/index.html> for > Company Registration Information. > > From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad > Sent: Wednesday, November 15, 2017 9:

Re: How quickly we can bootstrap

2017-11-19 Thread Jon Haddad
It sounds like you’re asking how to bootstrap without paying the cost of bootstrapping :) If you want to scale out, you’ll need to deal with the time it takes. You can’t add a node and have it up in 15 minutes, if you’re running 3 TB it’ll take a while. The exact amount of time depends largel
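
A rough back-of-the-envelope check of the 15-minute point above, as a sketch only. The function name and the decimal reading of "3 TB" are assumptions made for illustration, and a real bootstrap involves more than raw transfer time, which this ignores.

```python
def required_throughput_gbit_per_s(data_bytes: float, seconds: float) -> float:
    """Sustained throughput (Gbit/s) needed to move data_bytes in the given time."""
    return data_bytes * 8 / seconds / 1e9


# Assumption: 3 TB means 3 * 10**12 bytes; 15 minutes is 900 seconds.
needed = required_throughput_gbit_per_s(3 * 10**12, 15 * 60)
print(f"{needed:.1f} Gbit/s")  # prints "26.7 Gbit/s"
```

Under those assumptions a single node would have to receive data at well over 20 Gbit/s for the whole window, which is why the time cost cannot simply be wished away.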

Re: Time series modeling in C* for range queries

2017-11-19 Thread Jon Haddad
Hi Junaid, I wrote a blog post a few months ago on massively scalable time series, going into a couple techniques on bucketing that you might find helpful. http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
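
As a generic illustration of time bucketing (not a reproduction of the linked post), here is a minimal sketch that folds a day-sized bucket into the partition key so no single partition grows without bound. The names `day_bucket`, `partition_key` and `series_id`, and the choice of a one-day bucket, are assumptions for the example.

```python
from datetime import datetime, timezone


def day_bucket(ts: datetime) -> str:
    """Return the UTC calendar day a timezone-aware timestamp falls in."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(series_id: str, ts: datetime) -> tuple:
    """Hypothetical composite partition key: (series id, day bucket)."""
    return (series_id, day_bucket(ts))


reading_time = datetime(2017, 11, 19, 23, 59, tzinfo=timezone.utc)
print(partition_key("series-1", reading_time))  # ('series-1', '2017-11-19')
```

A range query spanning several days would then read one partition per day bucket in the range.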

Re: Solr Search With Apache Cassandra

2017-11-20 Thread Jon Haddad
That’s long since been abandoned (last commit was 5 years ago) > On Nov 20, 2017, at 12:10 PM, Nageswara Rao wrote: > > There is a fork with name on this combo called solandra > > https://github.com/tjake/Solandra > > Please check. > > > On 20 Nov 2017 3:

Re: can't reach cassandra outside my lan

2017-11-30 Thread Jon Haddad
Cassandra is listening on your localhost address, 127.0.0.1, not your laptop’s address on the network. Set rpc_address to the address on your network, or use rpc_interface and let Cassandra figure it out. > On Nov 30, 2017, at 10:38 AM, Andrea Giordano > wrote: > > Hi, osx user here. > I hav

Re: Schema version mismatch with 3.0.8 and 3.0.14

2017-12-01 Thread Jon Haddad
Generally speaking, I would never advise someone to add nodes to a cluster using a different version than the rest of the cluster. > On Dec 1, 2017, at 11:58 AM, Jai Bheemsen Rao Dhanwada > wrote: > > Thanks Jeff, > > I did some more testing on this version upgrade and here is brief summary

Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread Jon Haddad
1. No, Apache Cassandra is pretty terrible for search on its own. Even with SASI. 2. Maybe, but it’s complicated, and doing it right takes a lot of experience. I’d use Elastic Search instead. > On Dec 7, 2017, at 5:39 PM, @Nandan@ wrote: > > Hi Peoples, > > As currently around the world

Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread Jon Haddad
> Cassandra and use Elastic Search for Select records from tables. ? > > > On Fri, Dec 8, 2017 at 9:50 AM, Jon Haddad <mailto:j...@jonhaddad.com>> wrote: > 1. No, Apache Cassandra is pretty terrible for search on its own. Even with > SASI. > 2. Maybe, but it’s com

Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread Jon Haddad
> On Fri, Dec 8, 2017 at 10:54 AM, Jon Haddad <mailto:j...@jonhaddad.com>> wrote: > I mean ES is great as a search engine. I would use Cassandra as my source of > truth, and also index my data in ES. > > I typed my original message before I walked my dog, I should ha

Re: Upgrade using rebuild

2017-12-14 Thread Jon Haddad
no > On Dec 14, 2017, at 10:59 AM, Anshu Vajpayee wrote: > > Thanks! I am aware with these steps. > > I m just thinking , is it possible to do the upgrade using nodetool rebuild > like we rebuld new dc ? > > Has anyone tried - upgrade with nodetool rebuild ? > > > > On Thu, 14 Dec 2017 a

Re: Upgrade using rebuild

2017-12-14 Thread Jon Haddad
Heh, hit send accidentally. You generally can’t run rebuild to upgrade, because it’s a streaming operation. Streaming isn’t supported between versions, although on 3.x it might work. > On Dec 14, 2017, at 11:01 AM, Jon Haddad wrote: > > no > >> On Dec 14, 2017, at 10:59

Re: Tablesnap with custom endpoint?

2017-12-14 Thread Jon Haddad
Tablesnap uses boto, you may be able to override the S3 endpoint. This Stack Overflow answer suggests it’s possible, but you might have to modify the tablesnap script a little: https://stackoverflow.com/questions/32618216/overwrite-s3-endpoint-using-boto3-configuration-file

Re: NVMe SSD benchmarking with Cassandra

2018-01-05 Thread Jon Haddad
Generally speaking, disable readahead. After that it's very likely the issue isn’t in the disk settings you’re using, but is actually in your Cassandra config or the data model. How are you measuring things? Are you saturating your disks? What resource is your bottleneck? *Ever

Re: NVMe SSD benchmarking with Cassandra

2018-01-05 Thread Jon Haddad
:48 PM, Jon Haddad wrote: > > Generally speaking, disable readahead. After that it's very likely the issue > isn’t in the disk settings you’re using, but is actually in your > Cassandra config or the data model. How are you measuring things? Are you > satur

Re: 3.0.15 or 3.11.1

2018-01-07 Thread Jon Haddad
There’s a tweak to TWCS in 3.11.1 that lets data expire faster, but I wouldn’t call it unstable in any version I’ve ever used it with. I’ve deployed it on 2.0, 2.1, 2.2 [1], and used it in every version of C* that we’ve shipped it in, and have never had an issue. I would put 3.11.1 in prod over

Re: Full repair caused disk space increase issue

2018-01-09 Thread Jon Haddad
The old files will not be split. TWCS doesn’t ever do that. > On Jan 9, 2018, at 12:26 AM, wxn...@zjqunshuo.com wrote: > > Hi Alex, > After I changed one node to TWCS using JMX command, it started to compact. I > expect the old large sstable files will be split into smaller ones according >

Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-10 Thread Jon Haddad
For what it’s worth, we (TLP) just posted some results comparing pre and post meltdown statistics: http://thelastpickle.com/blog/2018/01/10/meltdown-impact-on-latency.html > On Jan 10, 2018, at 1:57 AM, Steinmaurer, Thom

Re: vnodes: high availability

2018-01-16 Thread Jon Haddad
While all the token math is helpful, I have to also call out the elephant in the room: You have not correctly configured Cassandra for production. If you had used the correct endpoint snitch & network topology strategy, you would be able to withstand the complete failure of an entire availabili

Re: vnodes: high availability

2018-01-16 Thread Jon Haddad
perience setting up cluster with vnodes < 256 for C* > 2.1? > > vnodes=32 also too high, as for me (we need to have much more than 32 servers > per AZ in order to to get 'reliable' cluster) > vnodes=4 seems to be better from HA + balancing trade-off > > Thanks

Re: Is it recommended to enable debug log in production

2018-01-16 Thread Jon Haddad
In certain versions (2.2 specifically) I’ve seen a massive performance hit from the extra logging in some very specific circumstances. In the case I looked at it was due to the added overhead of reflection. The issue I found was resolved in 3.0 (I think), but I always disable DEBUG logging now

Re: vnodes: high availability

2018-01-17 Thread Jon Haddad
> small number of vnodes for the versions using old allocation method because >> of hot-spots, so it's not an option for my particular case (v.2.1) :( >> >> [As far as I can see from the source code this new method wasn't backported >> to 2.1.] >>

Re: question about nodetool decommission

2018-01-17 Thread Jon Haddad
For what it’s worth, it’s going to be a lot faster to rsync the data to a new node and replace the old one than to decommission and bootstrap. > On Jan 17, 2018, at 3:20 PM, Jerome Basa wrote: > >> What C* version you are working with? > 3.0.14 > >> What is the reason you're decommissioning th

Re: unable to start cassandra 3.11.1

2018-02-02 Thread Jon Haddad
Java 9 is a significantly larger issue, see CASSANDRA-9608. > On Feb 2, 2018, at 8:49 AM, Kant Kodali wrote: > > When you say latest Java runtime you mean does it work with Java 9 as well? > > On Fri, Feb 2, 2018 at 5:02 AM, Sam Tunnicliffe > wrote: > I've actually just

Re: Add column if it does not exist?

2018-02-07 Thread Jon Haddad
All of the drivers also have keyspace / table metadata. For instance: https://datastax.github.io/python-driver/api/cassandra/metadata.html I’d be *really* careful how you use this. A lot of teams want to just deploy their c
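
A minimal sketch of checking for a column through driver-style schema metadata before attempting a schema change. It assumes the `metadata.keyspaces[...].tables[...].columns` layout that the DataStax Python driver exposes on `Cluster.metadata`; the helper name `column_exists` and the `shop`/`users` schema are made up for illustration, and stand-in objects are used so the example runs without a cluster.

```python
from types import SimpleNamespace


def column_exists(metadata, keyspace: str, table: str, column: str) -> bool:
    """Check a column against driver-style schema metadata.

    Assumes the layout metadata.keyspaces[ks].tables[table].columns.
    """
    ks = metadata.keyspaces.get(keyspace)
    if ks is None:
        return False
    tbl = ks.tables.get(table)
    return tbl is not None and column in tbl.columns


# Stand-in objects (hypothetical schema); with a real connection you
# would pass cluster.metadata instead.
fake_metadata = SimpleNamespace(
    keyspaces={
        "shop": SimpleNamespace(
            tables={"users": SimpleNamespace(columns={"id": None, "email": None})}
        )
    }
)
print(column_exists(fake_metadata, "shop", "users", "email"))  # True
print(column_exists(fake_metadata, "shop", "users", "phone"))  # False
```

The check is only a snapshot of what the client currently knows about the schema, so the caution in the message above still applies.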

Re: Bootstrapping fails with < 128GB RAM ...

2018-02-07 Thread Jon Haddad
It would be extremely helpful to get some info about your heap. At a bare minimum, a histogram of the heap dump would be useful, but ideally a full heap dump would be best. jmap -dump:live,format=b,file=heap.bin PID Taking a look at that in YourKit should give some pretty quick insight into

Re: GDPR, Right to Be Forgotten, and Cassandra

2018-02-09 Thread Jon Haddad
Give this a read through: https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy Basically you write your own logic for how stuff gets forgotten, then you can recompact ever

Re: GDPR, Right to Be Forgotten, and Cassandra

2018-02-09 Thread Jon Haddad
e new user-defined compaction option recently introduced, provided > you can determine over which SSTables a partition is spread > > On Fri, Feb 9, 2018 at 5:23 PM, Jon Haddad <mailto:j...@jonhaddad.com>> wrote: > Give this a read through: > > https://github.com/prote

Re: storing indexes on ssd

2018-02-13 Thread Jon Haddad
It seems like cart-before-horse decision to assume you want to keep your index files cached but not your data files. Why not rely on lvmcache’s statistics about file access to determine what to keep and what not to? It’s going to keep your most heavily hit blocks in the cache and your least hi

Re: vnode random token assignment and replicated data antipatterns

2018-02-20 Thread Jon Haddad
That’s why you use a NTS + a snitch, it picks replicas based on rack awareness. > On Feb 20, 2018, at 9:33 AM, Carl Mueller > wrote: > > So in theory, one could double a cluster by: > > 1) moving snapshots of each node to a new node. > 2) for each snapshot moved, figure out the primary range o

Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-20 Thread Jon Haddad
The file format is independent from compaction. A compaction strategy only selects sstables to be compacted, that’s its only job. It could have side effects, like generating other files, but any decent compaction strategy will account for the fact that those other files don’t exist. I wrote

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Jon Haddad
Ken, Maybe it’s not clear how open source projects work, so let me try to explain. There’s a bunch of us who either get paid by someone or volunteer in our free time. The folks that get paid (yay!) usually take direction on what the priorities are, and work on projects that directly affect o

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jon Haddad
Great question. Unfortunately, our OSS docs lack a step by step process on how to add a DC, I’ve created a JIRA to do that: https://issues.apache.org/jira/browse/CASSANDRA-14254 The datastax docs are pretty good for this though: https://

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jon Haddad
> > On Thu, Feb 22, 2018 at 12:56 PM Kenneth Brotman > mailto:kenbrot...@yahoo.com.invalid>> wrote: > > That information would have saved me time too. Thanks for making a JIRA for > it Jon. Perhaps this is a good JIRA for me to begin with. > > > > Ken

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-23 Thread Jon Haddad
In my opinion and experience, this isn’t a real problem, since you define a list of seeds as the first few nodes you add to a cluster. When would you add a node to an existing cluster and mark itself as a seed? It’s neither practical nor something you’d do by accident. > On Feb 23, 2018, at

Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

2018-02-24 Thread Jon Haddad
You can’t migrate down that way. The last several nodes you have up will get completely overwhelmed, and you’ll be completely screwed. Please do not give advice like this unless you’ve actually gone through the process or at least have an understanding of how the data will be shifted. Adding

Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

2018-02-24 Thread Jon Haddad
We don’t have this documented *anywhere* right now, I’ve created a JIRA to update the site with the relevant info on this topic: https://issues.apache.org/jira/browse/CASSANDRA-14258 <https://issues.apache.org/jira/browse/CASSANDRA-14258> > On Feb 24, 2018, at 7:44 AM, Jon Hadd

Re: Gathering / Curating / Organizing Cassandra Best Practices & Patterns

2018-02-24 Thread Jon Haddad
DataStax academy is great but no, no work needs to be or should be aligned with it. Datastax is an independent company trying to make a profit, they could yank their docs at any time. There’s a reason why we started doing the docs in-tree, there was too much of a reliance on DS documentation.

Re: How to Parse raw CQL text?

2018-02-26 Thread Jon Haddad
Yes ideally. I’ve been spending a bit of time in the parser the last week. There’s a lot of internals which are still using old terminology and are pretty damn confusing. I’m doing a little investigation into exposing some of the information while also modernizing it. > On Feb 26, 2018, a

Re: Filling in the blank To Do sections on the Apache Cassandra web site

2018-02-27 Thread Jon Haddad
The docs have been in tree for years :) https://github.com/apache/cassandra/tree/trunk/doc There’s even a docker image to build them so you don’t need to mess with sphinx. Check the README for instructions. Jon > On Feb 27, 2018, at 9:49 A

Re: Filling in the blank To Do sections on the Apache Cassandra web site

2018-02-27 Thread Jon Haddad
> wrote: > > I was just getting ready to install sphinx. Cool. > > From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad > Sent: Tuesday, February 27, 2018 9:51 AM > To: user@cassandra.apache.org > Subject: Re: Filling in the blank To Do sec

Re: Adding disk to operating C*

2018-03-09 Thread Jon Haddad
I agree with Jeff - I usually advise teams to cap their density around 3TB, especially with TWCS. Read heavy workloads tend to use smaller datasets and ring size ends up being a function of performance tuning. Since 2.2 bootstrap can now be resumed, which helps quite a bit with the streami

Re: What versions should the documentation support now?

2018-03-12 Thread Jon Haddad
Docs for 3.0 go in the 3.0 branch. I’ve never heard of anyone shipping docs for multiple versions, I don’t know why we’d do that. You can get the docs for any version you need by downloading C*, the docs are included. I’m a firm -1 on changing that process. Jon > On Mar 12, 2018, at 9:19 AM,

Re: Do I need to run repair and compaction every node?

2015-04-13 Thread Jon Haddad
Or use spotify’s reaper and forget about it https://github.com/spotify/cassandra-reaper > On Apr 13, 2015, at 3:45 PM, Robert Coli wrote: > > On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland > wrote: > Nodetool repair -par

Re: Efficient IP address location lookup

2013-11-15 Thread Jon Haddad
Instead of determining your table first, you should figure out what you want to ask Cassandra. What do you want to look up your data by? For each query you may need to store the data multiple times, which is perfectly reasonable and is recommended. On Nov 15, 2013, at 4:36 PM, Jacob Rhoden wr

Re: Struggling to understand CFS and its use.

2013-11-17 Thread Jon Haddad
Having used (and moved off of) Titan I do not recommend it as a primary database. Until it overcomes its extremely unoptimized graph traversals, it will increase the load on your database by several orders of magnitude. As a secondary analytics database, it might do fine. Just don’t rely on

Re: Securing Cassandra database

2014-04-05 Thread Jon Haddad
This isn’t Cassandra specific, but this is why I hate including db configuration with the main codebase instead of making it the responsibility of ops. This case you described shouldn’t even be possible. The production db configs should be provided by the team maintaining the production enviro

Re: Recommended Approach for Config Changes

2014-04-25 Thread Jon Haddad
You might want to take a peek at what’s happening in the process via strace -p or tcpdump. I can’t remember ever waiting an hour for a node to rejoin. On Apr 25, 2014, at 8:59 AM, Tyler Hobbs wrote: > > On Fri, Apr 25, 2014 at 10:43 AM, Phil Burress > wrote: > Thanks. I made a change to a

Re: Cassandra data retention policy

2014-04-28 Thread Jon Haddad
He said below that he’d like to keep the old data, so that might rule out TTLs in any case. You’ve got a few options that I can think of off the top of my head. The easiest from a management perspective is to use one table per month. WhateverData042014 would be this month’s. It’s easy enough
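
The one-table-per-month naming in the message above can be sketched as a small helper. The function name is an assumption for illustration; the `prefix + MMYYYY` pattern is read off the `WhateverData042014` example.

```python
from datetime import date


def monthly_table_name(prefix: str, day: date) -> str:
    """Build a per-month table name as prefix + MMYYYY."""
    return f"{prefix}{day:%m%Y}"


print(monthly_table_name("WhateverData", date(2014, 4, 28)))  # WhateverData042014
```

The application picks the table for each write from the record's date, and old months can be archived or dropped as whole tables.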

Re: Cassandra vs Elasticsearch.

2014-05-03 Thread Jon Haddad
Agreed w/ ES not being the durable data store. I would recommend treating it as ephemeral, and using Cassandra as your source of truth. Keep in mind if you change your ES index mapping, you’ll require a full reindex in order to search the data properly. It’s not like adding a secondary index

Re: Recommendation for hosting multi tenant clusters

2013-08-13 Thread Jon Haddad
I strongly recommend against EBS, even with optimized & ebs provisioned. The throughput you'll get from local drives is significantly better than what you'll get with EBS (even 4K iops provisioned) On Aug 13, 2013, at 2:10 PM, Rahul Gupta wrote: > I am working on requirement to host multi ten

Re: Custom commands in cassandra

2013-08-14 Thread Jon Haddad
Aside from the problems mentioned below, it's a rare case that tightly coupling your application code directly into your database makes it easier to maintain your codebase, especially as you scale. If you roll out your custom Cassandra application, then decide you need search, will you also emb

Re: Configuring ephemeral only column family

2013-08-16 Thread Jon Haddad
+1 for redis for this use case. On Aug 16, 2013, at 10:54 AM, Robert Coli wrote: > On Fri, Aug 16, 2013 at 10:43 AM, Todd Nine wrote: > We're using expiring columns as a mean for locking. > > Perhaps a log structured data store with immutable data files is not ideal > for your use case? >

Re: Failed decommission

2013-08-25 Thread Jon Haddad
We ran into a similar issue as well. I believe we removed the node via cqlsh from the system keyspace, restarted the cluster, then ran a repair. I'm not sure how safe this really is though. On Aug 25, 2013, at 8:47 AM, Mike Heffner wrote: > Janne, > > We ran into this too. Appears it's a b

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad
Does your previous snapshot include the system keyspace? I haven't tried upgrading from 1.0.x then rolling back, but it's possible there are some backwards incompatible changes. Other than that, make sure you also rolled back your config files? On Aug 30, 2013, at 8:57 AM, Mike Neir wrote:

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad
Sorry, I didn't see the test procedure, it's still early. On Aug 30, 2013, at 8:57 AM, Mike Neir wrote: > Greetings folks, > > I'm faced with the need to update a 36 node cluster with roughly 25T of data > on disk to a version of cassandra in the 1.2.x series. While it seems that > 1.2.8 will

Re: CQL & Thrift

2013-08-30 Thread Jon Haddad
If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra wrote: > Hi, > If i a create a table with CQL3 as > > create table user(user_id text PRIMARY KEY, first_name text, last_name text, >

Re: CQL & Thrift

2013-08-30 Thread Jon Haddad
would require coverting it to other types. > > thrift is much more powerful in that respect. > > not everyone needs to take advantage of the full power of dynamic columns. > > > On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad wrote: > Just curious - what do you need to do that req

Re: CQL & Thrift

2013-08-30 Thread Jon Haddad
for select queries. > > CQL is too limiting and negates the power of storing arbitrary data types in > dynamic columns. > > > On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad wrote: > If you're going to work with CQL, work with CQL. If you're going to work > with

Re: CQL & Thrift

2013-08-30 Thread Jon Haddad
ode for the driver they're using. > > > > On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis wrote: > http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows > > > On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wrote: > > my bias perspec

Re: CQL & Thrift

2013-08-30 Thread Jon Haddad
g 30, 2013 at 12:53 PM, Peter Lin wrote: > > my bias perspective, I find the sweet spot is thrift for insert/update and > CQL for select queries. > > CQL is too limiting and negates the power of storing arbitrary data types in > dynamic columns. > > > On Fri, Aug

Re: Cassandra cluster migration in Amazon EC2

2013-09-02 Thread Jon Haddad
If you launch the new servers, have them join the cluster, then decommission the old ones, you'll be able to do it without downtime. It'll also have the effect of randomizing the tokens, I believe. On Sep 2, 2013, at 4:21 PM, Renat Gilfanov wrote: > Hello, > > Currently we have a Cassandra

Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

2013-09-13 Thread Jon Haddad
It sounds like something that's only useful in a really limited use case. In an 11 node cluster, quorum reads / writes would need to come from 6 nodes. It would probably be much slower for both reads & writes. It sounds like what you want is a database with replication, not partiti
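
The "6 of 11" figure above follows from the usual majority rule, sketched here under the assumption that a table replicated to every node of an 11 node cluster has 11 replicas. The helper name is illustrative.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of replicas: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


print(quorum(11))  # 6
print(quorum(3))   # 2
```

With three replicas a quorum operation waits on two nodes; with eleven it waits on six, which is where the extra latency comes from.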

Re: DELETE does not delete :)

2013-10-07 Thread Jon Haddad
I haven't used VMWare but it seems odd that it would lock up the ntp port. Try "ps aux | grep ntp" to see if ntpd is already running. On Oct 7, 2013, at 12:23 AM, Alexander Shutyaev wrote: > Hi Michał, > > I didn't notice your message at first.. Well this seems like a real cause > candidat

Re: one big cluster vs multiple smaller clusters

2013-10-13 Thread Jon Haddad
This is a pretty vague question. What are you trying to achieve? On Oct 12, 2013, at 9:05 PM, Wei Zhu wrote: > Hi, > As we bring more use cases to Cassandra, we have been thinking about the best > way to host it. Let's say we will have 15 physical machines available, we can > use all of them

Re: Output of "nodetool ring" with virtual nodes

2013-10-15 Thread Jon Haddad
It's expected. I think nodetool status is meant to replace nodetool ring. On Oct 15, 2013, at 11:45 AM, Paulo Motta wrote: > Hello, > > I recently did the "Enabling virtual nodes on an existing production cluster" > procedure > (http://www.datastax.com/documentation/cassandra/1.2/webhelp/cas

Re: mixed linux/windows cluster in Cassandra-1.2

2013-10-21 Thread Jon Haddad
I can't imagine any situation where this would be practical. What would be the reason to even consider this? On Oct 21, 2013, at 11:06 AM, Robert Coli wrote: > On Mon, Oct 21, 2013 at 12:55 AM, Илья Шипицин wrote: > is mixed linux/windows cluster configuration supported in 1.2 ? > > I don't

Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Jon Haddad
If you're working with CQL, you don't need to worry about the column names, it's handled for you. If you specify multiple keys as part of the primary key, they become clustering keys and are mapped to the column names. So if you have a sensor_id / time_stamp, all your sensor readings will be i

Re: Too many open files (Cassandra 2.0.1)

2013-10-29 Thread Jon Haddad
In general, my understanding is that memory mapped files use a lot of open file handles. We raise all our DBs to unlimited open files. On Oct 29, 2013, at 8:30 AM, Pieter Callewaert wrote: > Investigated a bit more: > > -I can reproduce it, happened already on several nodes when I

Re: AWS ephemeral instances + backup

2019-12-05 Thread Jon Haddad
You can easily do this with bcache or LVM http://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/. Medusa might be a good route to go down if you want to do backups instead: https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html On Thu, Dec 5, 2019 at 1

Re: Connection Pooling in v4.x Java Driver

2019-12-10 Thread Jon Haddad
I'm not sure how closely the driver maintainers are following this list. You might want to ask on the Java Driver mailing list: https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user On Tue, Dec 10, 2019 at 5:10 PM Caravaggio, Kevin < kevin.caravag...@lowes.com> wrote: >

Re: execute is faster than execute_async?

2019-12-11 Thread Jon Haddad
I'm not sure how you're measuring this - could you share your benchmarking code? I ask because execute calls execute_async under the hood: https://github.com/datastax/python-driver/blob/master/cassandra/cluster.py#L2316 I tested the python driver a ways back and found some weird behavior due to t
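
To make the "execute calls execute_async under the hood" point concrete, here is a toy model built on the standard library. `ToySession` is a hypothetical stand-in, not the driver's code: it only shows the shape of a blocking call implemented as "submit asynchronously, then wait on the future".

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToySession:
    """Toy stand-in for a driver session; not the real driver code."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=4)

    def execute_async(self, query: str) -> Future:
        # Hand the work off and return immediately with a future.
        return self._pool.submit(self._run, query)

    def execute(self, query: str) -> str:
        # The blocking call is just the async call plus a wait on the result.
        return self.execute_async(query).result()

    def shutdown(self) -> None:
        self._pool.shutdown()

    @staticmethod
    def _run(query: str) -> str:
        return f"rows for: {query}"


session = ToySession()
futures = [session.execute_async(f"q{i}") for i in range(3)]  # all in flight
results = [f.result() for f in futures]
blocking_result = session.execute("q3")
session.shutdown()
print(results)
print(blocking_result)
```

In a design like this the synchronous path does strictly more work than the asynchronous one, so a benchmark showing the opposite usually points at how the futures are being collected or timed.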

Re: Streaming Failed during bootstrap of a Replacement node

2019-12-20 Thread Jon Haddad
You should upgrade to Cassandra 3.11.5 before doing anything else. You're running a pretty old and buggy version. There have been hundreds (maybe thousands) of bugs fixed between 3.3 and 3.11.5. On Fri, Dec 20, 2019 at 10:46 AM Nethi, Manoj wrote: > Hi, > > > > We are seeing the following error w

Re: Streaming Failed during bootstrap of a Replacement node

2019-12-20 Thread Jon Haddad
n, > > Yes we will upgrade it soon. But before we can upgrade shouldn’t we get > this lost node in the cluster to be replaced ? > > > > > > > > *From:* Jon Haddad > *Sent:* Friday, December 20, 2019 2:13 PM > *To:* user@cassandra.apache.org > *Subject:* Re:

Re: Question on large partition key

2019-12-31 Thread Jon Haddad
I suggest checking out Aaron Morton's post on the 3.0 storage engine. https://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html On Tue, Dec 31, 2019 at 11:20 AM Subroto Barua wrote: > I have a table --- > > create Table mytable ( > > Id text, > > cdat
