Re: Paying off tech debt and correctly naming things

kurt greaves Wed, 21 Mar 2018 01:06:08 -0700

As someone who came to the codebase post CQL but prior to thrift being
removed, +1 to refactor. The current mixing of terminology is a complete
nightmare. This would also give a good opportunity document a lot of code
that simply isn't documented (or incorrect). I'd say it's worth doing it in
multiple steps though, such as refactor of a single class at a time, then
followed by refactor of variable names. We've already done one pretty big
refactor (InetAddressAndPort) for 4.0, I don't see how a few more could
make it any worse (lol).


Row vs partition vs key vs PK is killing me

On 20 March 2018 at 22:04, Jon Haddad <j...@jonhaddad.com> wrote:

> Whenever I hop around in the codebase, one thing that always manages to
> slow me down is needing to understand the context of the variable names
> that I’m looking at.  We’ve now removed thrift the transport, but the
> variables, classes and comments still remain.  Personally, I’d like to go
> in and pay off as much technical debt as possible by refactoring the code
> to be as close to CQL as possible.  Rows should be rows, not partitions,
> I’d love to see the term column family removed forever in favor of always
> using tables.  That said, it’s a big task.  I did a quick refactor in a
> branch, simply changing the ColumnFamilyStore class to TableStore, and
> pushed it up to GitHub. [1]
>
> Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
> LOC changed across 275 files.  I’ll note that my branch doesn’t change any
> of the almost 1000 search results of “columnfamilystore” found in the
> codebase and hundreds of tests failed on my branch in CircleCI, so that 2K
> LOC change would probably be quite a bit bigger.  There is, of course, a
> lot more than just renaming this one class, there’s thousands of variable
> names using any manner of “cf”, “cfs”, “columnfamily”, names plus comments
> and who knows what else.  There’s lots of references in probably every file
> that would have to get updated.
>
> What are people’s thoughts on this?  We should be honest with ourselves
> and know this isn’t going to get any easier over time.  It’s only going to
> get more confusing for new people to the project, and having to figure out
> “what kind of row am i even looking at” is a waste of time.  There’s
> obviously a much bigger impact than just renaming a bunch of files, there’s
> any number of patches and branches that would become outdated, plus anyone
> pulling in Cassandra as a dependency would be affected.  I don’t really
> have a solution for the disruption other than “leave it in place”, but in
> my mind that’s not a great (or even good) solution.
>
> Anyways, enough out of me.  My concern for ergonomics and naming might be
> significantly higher than the rest of the folks working in the code, and I
> wanted to put a feeler out there before I decided to dig into this in a
> more serious manner.
>
> Jon
>
> [1]
> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
> <
> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
> >

Re: Paying off tech debt and correctly naming things

Reply via email to