Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

Alex Petrov Sun, 28 May 2023 07:15:00 -0700

Regarding approachability, one thing I think is worth adding is a DSL. I feel 
like there's enough functionality in Harry, and enough information out there, 
for anyone who needs to write even an involved test. But adoption doesn't 
usually start with complex use-cases, so making it extremely simple to generate 
data and validate that written data is where it's supposed to be should help 
adoption a lot. Unfortunately, more complex use-cases such as group-by support 
or SAI testing will require a bit more knowledge and writing an involved model, 
so I do not see any shortcuts we can take there.


> I do think that moving Harry in-tree would improve approachability

I think it's similar to the situation with the in-jvm dtest API. I feel like we 
would evolve it more actively if we didn't have to cut a release before every 
commit. In other words, I think that changing Harry code and extending 
functionality will be easier, which I think will eventually lead to quicker 
adoption. But of course the act of moving itself does not increase adoption; 
adoption comes from better ergonomics.


On Thu, May 25, 2023, at 8:03 PM, Abe Ratnofsky wrote:
> I'm seeing a few distinct topics here:
> 
> 1. Harry's adoption and approachability
> 
> I agree that approachability is one of Harry's main improvement areas right 
> now. If our goal is to produce a fuzz testing framework for the Cassandra 
> project, then adoption by contributors and usage for new feature development 
> are reasonable indicators for whether we're achieving that goal. If Harry is 
> not getting adopted by contributors outside of Apple, and is not getting used 
> for new feature development, then we should make an effort to understand why. 
> I don't think that a several-hour seminar is the best point of leverage to 
> achieve those goals.
> 
> Here's what I think we do need:
> 
> - The README should be understandable by anyone interested in writing a fuzz 
> test
> - Example tests should be runnable from a fresh clone of Cassandra, in an IDE 
> or on the command line
> - Examples of how we would test new features (like CEP-7, CEP-29, etc) with 
> the fuzz testing framework
> 
> I find the JVM dtest framework accomplishes similar goals, and one reason is 
> because there are plenty of examples, and it's relatively easy to copy and 
> paste one example and have it do what you'd like. I believe the same approach 
> would work for a fuzz testing framework.
> 
> Some of these tasks above are already done for Harry, such as better IDE 
> support for samples. This will be available in OSS Harry shortly.
> 
> 2. Moving Harry in-tree vs. in submodule
> 
> As I understand it, making Harry a submodule of Cassandra would make it 
> easier to deal with versioning, since we wouldn't have to do the entire 
> release dance we need to do for dtest-api, but I don't see this as a big 
> improvement to approachability.
> 
> I do think that moving Harry in-tree would improve approachability, for the 
> same reason as the JVM dtests. It's nice to write a feature or fix, find a 
> similar JVM dtest, copy, paste, and edit, and have something useful.
> 
> 3. General subdivision of Cassandra projects
> 
> This topic has come up quite a few times recently - around shared utilities 
> (CEP-10 concurrency primitives, etc), dtest-api, query parser, etc. The 
> project has tried out a few different approaches on composition of separate 
> projects. Hopefully in the near future we find the one that works best and 
> can start this process of splitting out libraries.
> 
> --
> Abe
> 
>> On May 25, 2023, at 6:36 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>> 
>>> I would really like us to split out utilities into a common project
>> +1 to the sentiment.
>> 
>> Would also advocate strongly for it being more tightly integrated with the 
>> base project than what we've been doing with our ecosystem (i.e. completely 
>> separate projects, not submodules), mostly from a discoverability and 
>> workflow standpoint.
>> 
>> I'm definitely salty about having to have 4 IDE's / projects open just to 
>> work on the entire stack.
>> 
>> On Thu, May 25, 2023, at 5:05 AM, Alex Petrov wrote:
>>> This was not a talk, but rather an interactive workshop. Unfortunately, it 
>>> will not work in a recorded format, but I am trying to work out ways to 
>>> preserve this.
>>> 
>>> On Thu, May 25, 2023, at 10:26 AM, Claude Warren, Jr via dev wrote:
>>>> Since the talk was not accepted for Cassandra Summit, would it be possible 
>>>> to record it as a simple youtube video and publish it so that the detailed 
>>>> information about how to use Harry is not lost?
>>>> 
>>>> On Thu, May 25, 2023 at 7:36 AM Alex Petrov <al...@coffeenco.de> wrote:
>>>>> While we are at it, we may also want to pull the in-jvm dtest API as a 
>>>>> submodule, and actually move some tests that are common between the 
>>>>> branches there.
>>>>> 
>>>>> On Thu, May 25, 2023, at 6:03 AM, Caleb Rackliffe wrote:
>>>>>> Isn’t the other reason Accord works well as a submodule that it has no 
>>>>>> dependencies on C* proper? Harry does at the moment, right? (Not that we 
>>>>>> couldn’t address that…just trying to think this through…)
>>>>>> 
>>>>>>> On May 24, 2023, at 6:54 PM, Benedict <bened...@apache.org> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> In this case Harry is a testing module - it’s not something we will 
>>>>>>> develop in tandem with C* releases, and we will want improvements to be 
>>>>>>> applied across all branches.
>>>>>>> 
>>>>>>> So it seems a natural fit for submodules to me.
>>>>>>> 
>>>>>>> 
>>>>>>>> On 24 May 2023, at 21:09, Caleb Rackliffe <calebrackli...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> > Submodules do have their own overhead and edge cases, so I am mostly 
>>>>>>>> > in favor of using for cases where the code must live outside of tree 
>>>>>>>> > (such as jvm-dtest that lives out of tree as all branches need the 
>>>>>>>> > same interfaces)
>>>>>>>> 
>>>>>>>> Agreed. Basically where I've ended up on this topic.
>>>>>>>> 
>>>>>>>> > We could go over some interesting examples such as testing 2i (SAI)
>>>>>>>> 
>>>>>>>> +100
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, May 24, 2023 at 1:40 PM Alex Petrov <al...@coffeenco.de> wrote:
>>>>>>>>> > I'm about to need to harry test for the paging across tombstone 
>>>>>>>>> > work for https://issues.apache.org/jira/browse/CASSANDRA-18424 
>>>>>>>>> > (that's where my own overlapping fuzzing came in). In the process, 
>>>>>>>>> > I'll see if I can't distill something really simple along the lines 
>>>>>>>>> > of how React approaches it (https://react.dev/learn).
>>>>>>>>> 
>>>>>>>>> We can pick that up as an example, sure. 
>>>>>>>>> 
>>>>>>>>> On Wed, May 24, 2023, at 4:53 PM, Josh McKenzie wrote:
>>>>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>>>>>>> workshop,
>>>>>>>>>> I'm about to need to harry test for the paging across tombstone work 
>>>>>>>>>> for https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's 
>>>>>>>>>> where my own overlapping fuzzing came in). In the process, I'll see 
>>>>>>>>>> if I can't distill something really simple along the lines of how 
>>>>>>>>>> React approaches it (https://react.dev/learn).
>>>>>>>>>> 
>>>>>>>>>> Ideally we'd be able to get something together that's a high level 
>>>>>>>>>> "In the next 15 minutes, you will know and understand A-G and have 
>>>>>>>>>> access to N% of the power of harry" kind of offer.
>>>>>>>>>> 
>>>>>>>>>> Honestly, there's a *lot* in our ecosystem where we could benefit 
>>>>>>>>>> from taking a page from their book in terms of onboarding and 
>>>>>>>>>> getting started IMO.
>>>>>>>>>> 
>>>>>>>>>> On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
>>>>>>>>>>> > I wonder if a mini-onboarding session would be good as a 
>>>>>>>>>>> > community session - go over Harry, how to run it, how to add a 
>>>>>>>>>>> > test?  Would that be the right venue?  I just would like to see 
>>>>>>>>>>> > how we can not only plug it in to regular CI but get everyone 
>>>>>>>>>>> > that wants to add a test be able to know how to get started with 
>>>>>>>>>>> > it.
>>>>>>>>>>> 
>>>>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>>>>>>> workshop, but unfortunately it got declined. Goes without saying, 
>>>>>>>>>>> we can still do it online, time and resources permitting. But 
>>>>>>>>>>> again, I do not think it should be barring us from making Harry a 
>>>>>>>>>>> part of the codebase, as it already is. In fact, we can be 
>>>>>>>>>>> iterating on the development quicker having it in-tree. 
>>>>>>>>>>> 
>>>>>>>>>>> We could go over some interesting examples such as testing 2i 
>>>>>>>>>>> (SAI), modelling Group By tests, or testing repair. If there is 
>>>>>>>>>>> enough appetite and collaboration in the community, I will see if 
>>>>>>>>>>> we can pull something like that together. Input on _what_ you would 
>>>>>>>>>>> like to see / hear / tested is also appreciated. Harry was 
>>>>>>>>>>> developed out of a strong need for large-scale testing, which also 
>>>>>>>>>>> has informed many of its APIs, but we can make it easier to access 
>>>>>>>>>>> for interactive testing / unit tests. We have been doing a lot of 
>>>>>>>>>>> that with Transactional Metadata, too. 
>>>>>>>>>>> 
>>>>>>>>>>> > I'll hold off on this until Alex Petrov chimes in. @Alex -> got 
>>>>>>>>>>> > any thoughts here?
>>>>>>>>>>> 
>>>>>>>>>>> Yes, sorry for not responding on this thread earlier. I cannot 
>>>>>>>>>>> overstate how excited I am about this, and how important I think 
>>>>>>>>>>> this is. Time constraints are somehow hard to overcome, but I hope 
>>>>>>>>>>> the results brought by TCM will make it all worth it.
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, May 24, 2023, at 4:23 PM, Alex Petrov wrote:
>>>>>>>>>>>> I think pulling Harry into the tree will make adoption easier for 
>>>>>>>>>>>> the folks. I have been a bit swamped with Transactional Metadata 
>>>>>>>>>>>> work, but I wanted to make some of the things we were using for 
>>>>>>>>>>>> testing TCM available outside of TCM branch. This includes a bunch 
>>>>>>>>>>>> of helper methods to perform operations on the clusters, data 
>>>>>>>>>>>> generation, and more useful stuff. Of course, the question always 
>>>>>>>>>>>> remains about how much time I want to spend porting it all to 
>>>>>>>>>>>> Gossip, but I think we can find a reasonable compromise. 
>>>>>>>>>>>> 
>>>>>>>>>>>> I would not set this improvement as a prerequisite to pulling 
>>>>>>>>>>>> Harry into the main branch, but rather interpret it as a 
>>>>>>>>>>>> commitment from myself to take community input and make it more 
>>>>>>>>>>>> approachable by the day. 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, May 24, 2023, at 2:44 PM, Josh McKenzie wrote:
>>>>>>>>>>>>>> importantly it’s a million times better than the dtest-api 
>>>>>>>>>>>>>> process - which stymies development due to the friction.
>>>>>>>>>>>>> This is my major concern.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What prompted this thread was harry being external to the core 
>>>>>>>>>>>>> codebase and the lack of adoption and usage of it having led to 
>>>>>>>>>>>>> atrophy of certain aspects of it, which then led to redundant 
>>>>>>>>>>>>> implementation of some fuzz testing and lost time.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We'd all be better served to have this closer to the main 
>>>>>>>>>>>>> codebase as a forcing function to smooth out the rough edges, 
>>>>>>>>>>>>> integrate it, and make it a collective artifact and first class 
>>>>>>>>>>>>> citizen IMO.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have similar opinions about the dtest-api.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, May 24, 2023, at 4:05 AM, Benedict wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It’s not without hiccups, and I’m sure we have more to learn. 
>>>>>>>>>>>>>> But it mostly just works, and importantly it’s a million times 
>>>>>>>>>>>>>> better than the dtest-api process - which stymies development 
>>>>>>>>>>>>>> due to the friction.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 24 May 2023, at 08:39, Mick Semb Wever <m...@apache.org> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> WRT git submodules and CASSANDRA-18204, are we happy with how 
>>>>>>>>>>>>>>> it is working for accord ? 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The time spent on getting that running has been a fair few 
>>>>>>>>>>>>>>> hours, where we could have cut many manual module releases in 
>>>>>>>>>>>>>>> that time. 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> David and folks working on accord ? 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, 23 May 2023 at 20:09, Josh McKenzie 
>>>>>>>>>>>>>>> <jmcken...@apache.org> wrote:
>>>>>>>>>>>>>>>> I'll hold off on this until Alex Petrov chimes in. @Alex -> 
>>>>>>>>>>>>>>>> got any thoughts here?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, May 16, 2023, at 5:17 PM, Jeremy Hanna wrote:
>>>>>>>>>>>>>>>>> I think it would be great to onboard Harry more officially 
>>>>>>>>>>>>>>>>> into the project.  However it would be nice to perhaps do 
>>>>>>>>>>>>>>>>> some sanity checking outside of Apple folks to see how 
>>>>>>>>>>>>>>>>> approachable it is.  That is, can someone take it and just 
>>>>>>>>>>>>>>>>> run it with the current readme without any additional context?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I wonder if a mini-onboarding session would be good as a 
>>>>>>>>>>>>>>>>> community session - go over Harry, how to run it, how to add 
>>>>>>>>>>>>>>>>> a test?  Would that be the right venue?  I just would like to 
>>>>>>>>>>>>>>>>> see how we can not only plug it in to regular CI but get 
>>>>>>>>>>>>>>>>> everyone that wants to add a test be able to know how to get 
>>>>>>>>>>>>>>>>> started with it.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Jeremy
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On May 16, 2023, at 1:34 PM, Abe Ratnofsky <a...@aber.io> 
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Just to make sure I'm understanding the details, this would 
>>>>>>>>>>>>>>>>>> mean apache/cassandra-harry maintains its status as a 
>>>>>>>>>>>>>>>>>> separate repository, apache/cassandra references it as a 
>>>>>>>>>>>>>>>>>> submodule, and clones and builds Harry locally, rather than 
>>>>>>>>>>>>>>>>>> pulling a released JAR. We can then reference Harry as a 
>>>>>>>>>>>>>>>>>> library without maintaining public artifacts for it. Is that 
>>>>>>>>>>>>>>>>>> in line with what you're thinking?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> > I'd also like to see us get a Harry run integrated as part 
>>>>>>>>>>>>>>>>>> > of our pre-commit CI
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'm a strong supporter of this, of course.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On May 16, 2023, at 11:03 AM, Josh McKenzie 
>>>>>>>>>>>>>>>>>>> <jmcken...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Similar to what we've done with accord in 
>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18204, I'd 
>>>>>>>>>>>>>>>>>>> like to discuss bringing cassandra-harry in-tree as a 
>>>>>>>>>>>>>>>>>>> submodule. repo link: 
>>>>>>>>>>>>>>>>>>> https://github.com/apache/cassandra-harry
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Given the value it's brought to the project's stabilization 
>>>>>>>>>>>>>>>>>>> efforts and the movement of other things in the ecosystem 
>>>>>>>>>>>>>>>>>>> to being more integrated (accord, build-scripts 
>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18133), I 
>>>>>>>>>>>>>>>>>>> think having the testing framework better localized and 
>>>>>>>>>>>>>>>>>>> integrated would be a net benefit for adoption, awareness, 
>>>>>>>>>>>>>>>>>>> maintenance, and tighter workflows as we troubleshoot 
>>>>>>>>>>>>>>>>>>> future failures it surfaces.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I'd also like to see us get a Harry run integrated as part 
>>>>>>>>>>>>>>>>>>> of our pre-commit CI (a 5 minute simple soak test for 
>>>>>>>>>>>>>>>>>>> instance) and having that local in this fashion should make 
>>>>>>>>>>>>>>>>>>> that a cleaner integration as well.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thoughts?
