As someone who came to the codebase post CQL but prior to thrift being removed, +1 to refactor. The current mixing of terminology is a complete nightmare. This would also give a good opportunity document a lot of code that simply isn't documented (or incorrect). I'd say it's worth doing it in multiple steps though, such as refactor of a single class at a time, then followed by refactor of variable names. We've already done one pretty big refactor (InetAddressAndPort) for 4.0, I don't see how a few more could make it any worse (lol).
Row vs partition vs key vs PK is killing me On 20 March 2018 at 22:04, Jon Haddad <j...@jonhaddad.com> wrote: > Whenever I hop around in the codebase, one thing that always manages to > slow me down is needing to understand the context of the variable names > that I’m looking at. We’ve now removed thrift the transport, but the > variables, classes and comments still remain. Personally, I’d like to go > in and pay off as much technical debt as possible by refactoring the code > to be as close to CQL as possible. Rows should be rows, not partitions, > I’d love to see the term column family removed forever in favor of always > using tables. That said, it’s a big task. I did a quick refactor in a > branch, simply changing the ColumnFamilyStore class to TableStore, and > pushed it up to GitHub. [1] > > Didn’t click on the link? That’s ok. The TL;DR is that it’s almost 2K > LOC changed across 275 files. I’ll note that my branch doesn’t change any > of the almost 1000 search results of “columnfamilystore” found in the > codebase and hundreds of tests failed on my branch in CircleCI, so that 2K > LOC change would probably be quite a bit bigger. There is, of course, a > lot more than just renaming this one class, there’s thousands of variable > names using any manner of “cf”, “cfs”, “columnfamily”, names plus comments > and who knows what else. There’s lots of references in probably every file > that would have to get updated. > > What are people’s thoughts on this? We should be honest with ourselves > and know this isn’t going to get any easier over time. It’s only going to > get more confusing for new people to the project, and having to figure out > “what kind of row am i even looking at” is a waste of time. There’s > obviously a much bigger impact than just renaming a bunch of files, there’s > any number of patches and branches that would become outdated, plus anyone > pulling in Cassandra as a dependency would be affected. I don’t really > have a solution for the disruption other than “leave it in place”, but in > my mind that’s not a great (or even good) solution. > > Anyways, enough out of me. My concern for ergonomics and naming might be > significantly higher than the rest of the folks working in the code, and I > wanted to put a feeler out there before I decided to dig into this in a > more serious manner. > > Jon > > [1] > https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1 > < > https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1 > >