For UI and interactive data exploration, there is already the Cassandra interpreter for Apache Zeppelin, which is more than decent for the job.

On Wed, Feb 21, 2018 at 9:19 AM, Daniel Hölbling-Inzko <daniel.hoelbling-in...@bitmovin.com> wrote:

> But what does this video really show? That Microsoft managed to run
> Cassandra as a SaaS product with a nice UI?
> Google did that years ago with BigTable and Amazon with DynamoDB.
>
> I agree that we need more tools, but not so much for querying (although
> that would also help a bit), but just in general the project feels
> unapproachable right now.
> Besides the excellent DataStax documentation there is little best practice
> knowledge about how to operate and provision Cassandra clusters.
> Having some recipes for Chef, Puppet or Ansible that show the most common
> settings (or some Cloudfoundry/GCP Templates or Helm Charts) would be
> really useful.
> Also a list of all the projects that Cassandra goes well with (like TLP
> Reaper and Netflix's Priam etc.)
>
> greetings Daniel
>
> On Wed, 21 Feb 2018 at 07:23 Kenneth Brotman <kenbrot...@yahoo.com.invalid>
> wrote:
>
>> If you watch this video through you'll see why usability is so
>> important. You can't ignore usability issues.
>>
>> Cassandra does not exist in a vacuum. The competitors are world class.
>>
>> The video is on the New Cassandra API for Azure Cosmos DB:
>> https://www.youtube.com/watch?v=1Sf4McGN1AQ
>>
>> Kenneth Brotman
>>
>> -----Original Message-----
>> From: Daniel Hölbling-Inzko [mailto:daniel.hoelbling-in...@bitmovin.com]
>> Sent: Tuesday, February 20, 2018 1:28 AM
>> To: user@cassandra.apache.org; James Briggs
>> Cc: d...@cassandra.apache.org
>> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>>
>> Hi,
>>
>> I have to add my own two cents here as the main thing that keeps me from
>> really running Cassandra is the amount of pain running it incurs.
>> Not so much because it's actually painful but because the tools are so
>> different and the documentation and best practices are scattered across a
>> dozen outdated DataStax articles and this mailing list etc.
>> We've been hesitant (although our use case is perfect for using Cassandra)
>> to deploy Cassandra to any critical systems as even after a year of running
>> it we still don't have the operational experience to confidently run
>> critical systems with it.
>>
>> Simple things like a foolproof / safe cluster-wide S3 Backup (like
>> Elasticsearch has it) would for example solve a TON of issues for new
>> people. I don't need it auto-scheduled or something, but having to
>> configure cron jobs across the whole cluster is a pain in the ass for small
>> teams.
>> To be honest, even the way snapshots are done right now is already super
>> painful. Every other system I operated so far will just create one backup
>> folder I can export; in C* the Backup is scattered across a bunch of
>> different Keyspace folders etc. Needless to say, it took a while until
>> I trusted my backup scripts fully.
>>
>> And especially for a Database I believe Backup/Restore needs to be a
>> non-issue that's documented front and center. If not, smaller teams just
>> don't have the resources to dedicate to learning and building the tools
>> around it.
>>
>> Now that the team is getting larger we could spare the resources to
>> operate these things, but switching from a well-understood RDBMS schema to
>> Cassandra is now incredibly hard and will probably take years.
>>
>> greetings Daniel
>>
>> On Tue, 20 Feb 2018 at 05:56 James Briggs <james.bri...@yahoo.com.invalid>
>> wrote:
>>
>> > Kenneth:
>> >
>> > What you said is not wrong.
>> >
>> > Vertica and Riak are examples of distributed databases that don't
>> > require hand-holding.
>> >
>> > Cassandra is for Java-programmer DIYers, or more often Datastax
>> > clients, at this point.
>> > Thanks, James.
>> >
>> > ------------------------------
>> > *From:* Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
>> > *To:* user@cassandra.apache.org
>> > *Cc:* d...@cassandra.apache.org
>> > *Sent:* Monday, February 19, 2018 4:56 PM
>> > *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>> >
>> > Jeff, you helped me figure out what I was missing. It just took me a
>> > day to digest what you wrote. I’m coming over from another type of
>> > engineering. I didn’t know and it’s not really documented. Cassandra
>> > runs in a data center. Nowadays that means the nodes are going to be
>> > in managed containers, Docker containers, managed by Kubernetes,
>> > Mesos or something, and for that reason anyone operating Cassandra in a
>> > real world setting would not encounter the issues I raised in the way I
>> > described.
>> >
>> > Shouldn’t the architectural diagrams people reference indicate that in
>> > some way? That would have helped me.
>> >
>> > Kenneth Brotman
>> >
>> > *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
>> > *Sent:* Monday, February 19, 2018 10:43 AM
>> > *To:* 'user@cassandra.apache.org'
>> > *Cc:* 'd...@cassandra.apache.org'
>> > *Subject:* RE: Cassandra Needs to Grow Up by Version Five!
>> >
>> > Well said. Very fair. I wouldn’t mind hearing from others still.
>> > You’re a good guy!
>> >
>> > Kenneth Brotman
>> >
>> > *From:* Jeff Jirsa [mailto:jji...@gmail.com <jji...@gmail.com>]
>> > *Sent:* Monday, February 19, 2018 9:10 AM
>> > *To:* cassandra
>> > *Cc:* Cassandra DEV
>> > *Subject:* Re: Cassandra Needs to Grow Up by Version Five!
>> >
>> > There's a lot of things below I disagree with, but it's ok. I
>> > convinced myself not to nit-pick every point.
>> >
>> > https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of
>> > Stefan's work with cert management
>> >
>> > Beyond that, I encourage you to do what Michael suggested: open JIRAs
>> > for things you care strongly about, work on them if you have time.
>> > Sometime this year we'll schedule an NGCC (Next Generation Cassandra
>> > Conference) where we talk about future project work and direction. I
>> > encourage you to attend if you're able (I encourage anyone who cares
>> > about the direction of Cassandra to attend, it'll probably be either
>> > free or very low cost, just to cover a venue and some food). If
>> > nothing else, you'll meet some of the teams who are working on the
>> > project, and learn why they've selected the projects on which they're
>> > working. You'll have an opportunity to pitch your vision, and maybe you
>> > can talk some folks into helping out.
>> >
>> > - Jeff
>> >
>> > On Mon, Feb 19, 2018 at 1:01 AM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:
>> > Comments inline
>> >
>> > > -----Original Message-----
>> > > From: Jeff Jirsa [mailto:jji...@gmail.com]
>> > > Sent: Sunday, February 18, 2018 10:58 PM
>> > > To: user@cassandra.apache.org
>> > > Cc: d...@cassandra.apache.org
>> > > Subject: Re: Cassandra Needs to Grow Up by Version Five!
>> > >
>> > > Comments inline
>> > >
>> > >> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> wrote:
>> > >>
>> > >> Cassandra feels like an unfinished program to me. The problem is not
>> > >> that it’s open source or cutting edge. It’s an open source cutting
>> > >> edge program that lacks some of its basic functionality. We are all
>> > >> stuck addressing fundamental mechanical tasks for Cassandra because
>> > >> the basic code that would do that part has not been contributed yet.
>> > >>
>> > > There’s probably 2-3 reasons why here:
>> > >
>> > > 1) Historically the PMC has tried to keep the scope of the project very
>> > > narrow. It’s a database. We don’t ship drivers. We don’t ship
>> > > developer tools. We don’t ship fancy UIs. We ship a database. I think
>> > > for the most part the narrow vision has been for the best, but maybe
>> > > it’s time to reconsider some of the scope.
>> > >
>> > > Postgres will autovacuum to prevent wraparound (hopefully), but everyone
>> > > I know running Postgres uses flexible-freeze in cron - sometimes it’s
>> > > ok to let the database have its opinions and let third party tools
>> > > fill in the gaps.
>> > >
>> >
>> > I can appreciate the desire to stay in scope. I believe usability is
>> > king. When users have to learn the database, then learn what they
>> > have to automate, then learn an automation tool and then use the
>> > automation tool to do something that is as fundamental as the
>> > fundamental tasks I described, then something is missing from the
>> > database itself that is adversely affecting usability - and that is
>> > very bad. Where those big companies need to calculate the ROI is in
>> > the cost of acquiring or training the next group of users. Consider
>> > how steep the learning curve is for new users.
>> > Consider the business case for improving ease of use.
>> >
>> > > 2) Cassandra is, by definition, a database for large scale problems. Most
>> > > of the companies working on/with it tend to be big companies. Big
>> > > companies often have pre-existing automation that solved the stuff you
>> > > consider fundamental tasks, so there’s probably nobody actively
>> > > working on the solved problems that you may consider missing features
>> > > - for many people they’re already solved.
>> > >
>> >
>> > I could be wrong but it sounds like a lot of the code work is done,
>> > and if the companies would take the time to contribute more code, then
>> > the rest of the code needed could be generated easily.
>> >
>> > > 3) It’s not nearly as basic as you think it is. Datastax seemingly had a
>> > > multi-person team on opscenter, and while it was better than anything
>> > > else around last time I used it (before it stopped supporting the OSS
>> > > version), it left a lot to be desired.
>> > > It’s probably 2-3 engineers
>> > > working for a month to have any sort of meaningful, reliable, mostly
>> > > trivial cluster-managing UI, and I can think of about 10 JIRAs I’d
>> > > rather see that time be spent on first.
>> >
>> > How about 6-9 engineers working 12 months a year on it then. I'm not
>> > kidding. For a big company with revenues in the tens of billions or
>> > more, and a heavy use of Cassandra nodes, it's easy to make a case for
>> > having a full time person or more that involved. They aren't paying
>> > for using the open source code that is Cassandra. Let's see, what
>> > would the licensing fees be for a big company if the costs were like
>> > what Microsoft or Oracle would charge for their enterprise level
>> > relational database? What's the contribution of one or two people in
>> > comparison?
>> >
>> > >> Ease of use issues need to be given much more attention. For an
>> > >> administrator, the ease of use of Cassandra is very poor.
>> > >>
>> > >> Furthermore, currently Cassandra is an idiot. We have to do everything
>> > >> for Cassandra. Contrast that with the fact that we are in the dawn of
>> > >> artificial intelligence.
>> > >>
>> > >
>> > > And for everything you think is obvious, there’s a 50% chance someone
>> > > else will have already solved it differently, and your obvious new
>> > > solution will be seen as an inconvenient assumption and complexity
>> > > they won’t appreciate. Open source projects get to walk a fine line of
>> > > trying to be useful without making too many assumptions, being “too”
>> > > opinionated, or overstepping bounds. We may be too conservative, but
>> > > it’s very easy to go too far in the opposite direction.
>> > >
>> >
>> > I appreciate that but when such concerns result in inaction instead of
>> > resolution that is no good.
>> >
>> > >> Software exists to automate tasks for humans, not mechanize humans to
>> > >> administer tasks for a database. I’m an engineering type.
>> > >> My job is
>> > >> to apply science and technology to solve real world problems. And
>> > >> that’s where I need an organization’s I.T. talent to focus; not in
>> > >> crank starting an unfinished database.
>> > >>
>> > >
>> > > And that’s why nobody’s done it - we all have bigger problems we’re being
>> > > paid to solve, and nobody’s felt it necessary. Because it’s not
>> > > necessary, it’s nice, but not required.
>> > >
>> >
>> > Of course you would say that, you're Jeff Jirsa. In apprenticeship
>> > speak, you’re a master. It's the classic challenge of trying to get
>> > a master to see the legitimate issues of the apprentices. I do
>> > appreciate the time you give to answer posts to the groups, like this
>> > post. So I don't want you to take anything the wrong way. Where it's
>> > going to bite everyone is in the future adoption rate. It has to be
>> > addressed.
>> >
>> > [snip]
>> >
>> > >> Certificate management should be automated.
>> > >>
>> > > Stefan (in particular) has done a fair amount of work on this, but I’d
>> > > bet 90% of users don’t use SSL and genuinely don’t care.
>> > >
>> >
>> > I didn't realize. Could I trouble you for a link so I could get up to
>> > speed?
>> >
>> > >> Cluster wide management should be a big theme in any next major release.
>> > >>
>> > > Na. Stability and testing should be a big theme in the next major release.
>> > >
>> >
>> > Double Na on that one Jeff. I think you have a concern there about
>> > the need to test sufficiently to ensure the stability of the next
>> > major release. That makes perfect sense - for every release,
>> > especially the major ones. Continuous improvement is not a phase of
>> > development for example. CI should be in everything, in every phase.
>> > Stability and testing are a part of every release, not just one. A major
>> > release should be a nice step from the previous major release though.
>> >
>> > >> What is a major release?
>> > >> How many major releases could a program have
>> > >> before all the coding for basic stuff like installation, configuration
>> > >> and maintenance is included!
>> > >>
>> > >> Finish the basic coding of Cassandra, make it easy to use for
>> > >> administrators, make it smart, add cluster wide management. Keep
>> > >> Cassandra competitive or it will soon be the old Model T we all
>> > >> remember fondly.
>> > >>
>> > >
>> > > Let’s keep some perspective. Most of us came to Cassandra from RDBMS
>> > > worlds where we were building solutions out of a bunch of master/slave
>> > > MySQL / Postgres type databases. I started using Cassandra 0.6 when I
>> > > needed to store something like 400GB/day in 200whatever on spinning
>> > > disks when 100GB felt like a “big” database, and the thought of
>> > > writing runbooks and automation to automatically pick the most up to
>> > > date slave as the new master, promote it, repoint the other slave to
>> > > the new master, then reformat the old master and add it as a new slave
>> > > without downtime and without potentially deleting the company’s whole
>> > > dataset sounded awful.
>> > > Cassandra solved that problem, at the cost of maintaining a few yaml
>> > > (then xml) files. Yes there are rough edges - they get slightly less
>> > > rough on each new release. Can we do better? Sure, use your engineering
>> > > time and send some patches. But the basic stuff is the nuts and bolts
>> > > of the database: I care way more about streaming and compaction than
>> > > I’ll ever care about installation.
>> > >
>> >
>> > I can relate. I was studying the enterprise level MS SQL Server
>> > stuff. I noticed exactly what you described. I decided maybe I'll
>> > just do other stuff and wait for things to develop more. I'm very
>> > excited about the way Cassandra addresses things. Streaming and
>> > compaction - very good. I'm glad. Items related to usability are not
>> > optional though.
>> >
>> > >> I ask the Committee to compile a list of all such items, make a plan,
>> > >> and commit to including the completed and tested code as part of major
>> > >> release 5.0. I further ask that release 4.0 not be delayed and then
>> > >> there be an unusually short skip to version 5.0.
>> > >>
>> > >
>> > > The committers are working their ass off on all sorts of hard problems.
>> > > Some of those are probably even related to Cassandra. If you have an
>> > > idea, open a JIRA. If you have time, send a patch. Or review a patch.
>> > > But don’t expect a bunch of people to set down work on optimizing the
>> > > database to work on packaging and installation, because there’s no ROI
>> > > in it for 99% of the existing committers: we’re working on the
>> > > database to solve problems, and installation isn’t one of those
>> > > problems.
>> >
>> > I'm sure they are working very hard on all kinds of hard problems. I
>> > actually wrote "Committee", not "committers". There is an obvious
>> > shortage of contributors when you consider the size of the
>> > organizations using Cassandra. That leaves the burden on an unfair
>> > few. Installation, or more generally I would say usability, is not that
>> > big a problem for the big companies out there. Good for them.
>> >
>> > Try telling a new organization or a modest size organization that is
>> > struggling to manage their Cassandra cluster that usability is not a
>> > big problem. It truly is a big problem for many stakeholders of
>> > Cassandra. It needs to be given a bigger priority. Hopefully others
>> > will weigh in.
>> >
>> > Kenneth Brotman
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> > For additional commands, e-mail: user-h...@cassandra.apache.org