Hi all,

I'm finishing an MSc in which my final project is to implement a new
compaction strategy in Cassandra. I've discussed the main points of the
strategy with other community members and received valuable feedback.
I understand this will be a tough challenge for someone who has never
worked with Cassandra, but after getting to know the technology, I've found
it fascinating. This, combined with always wanting to contribute to an open
source project, led me to choose it as the topic for my MSc project.

But because this is my first time contributing to an open source project,
I have some questions on how to proceed correctly. Looking at the Contribute
<http://wiki.apache.org/cassandra/HowToContribute> page, I see that we're
supposed to create a ticket before starting work on it. Should I just
create one, or does the strategy's usefulness need to be validated by
someone first? In that case, should I just proceed and implement it, or do
something else? And finally, is this the correct mailing list for this sort
of question? :)

As for the code itself, in case I have a question like "Should we be using
an abstract class for compaction classes?" or "What is this method supposed
to do?", can I ask here?
What is the best course of action to learn about the details of the code in
Cassandra? I've seen that it has some comments, but they probably won't be
enough for me.

The strategy I have in mind will be very simple until I finish the MSc.
After that, I'll improve it with other features and the feedback I've received, but for
the moment, it'll rely on a time interval (probably scheduled at specific
hours, maybe during a time with less traffic on the system). During that
time interval, the rows will be made unique across all SSTables, but only
if, after a prior analysis, we find that the row exists in a certain number
of SSTables above a certain threshold.
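
To make the idea concrete, here is a minimal Java sketch of the selection
step. It is not real Cassandra code: each SSTable is modelled as a plain set
of row keys, and the names ThresholdCompactionSketch, inWindow and
rowsToCompact are hypothetical, chosen only for illustration. It shows the
two checks described above: whether we are inside the scheduled time
interval, and which rows appear in more SSTables than the threshold.

```java
import java.time.LocalTime;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative model only; none of this is Cassandra's actual API.
final class ThresholdCompactionSketch {

    // True if 'now' falls inside [start, end). Also handles windows that
    // wrap past midnight, e.g. 23:00 to 04:00.
    static boolean inWindow(LocalTime now, LocalTime start, LocalTime end) {
        if (start.isBefore(end)) {
            return !now.isBefore(start) && now.isBefore(end);
        }
        return !now.isBefore(start) || now.isBefore(end);
    }

    // Prior analysis: count in how many SSTables each row key appears and
    // keep only the keys whose count is strictly above the threshold.
    static Set<String> rowsToCompact(Map<String, Set<String>> sstables, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> rowKeys : sstables.values()) {
            for (String key : rowKeys) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > threshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> sstables = new HashMap<>();
        sstables.put("sstable-1", new HashSet<>(Arrays.asList("r1", "r2")));
        sstables.put("sstable-2", new HashSet<>(Arrays.asList("r1", "r3")));
        sstables.put("sstable-3", new HashSet<>(Arrays.asList("r1", "r2")));

        // Assumed low-traffic window of 01:00 to 05:00.
        LocalTime start = LocalTime.of(1, 0);
        LocalTime end = LocalTime.of(5, 0);
        if (inWindow(LocalTime.of(2, 30), start, end)) {
            // r1 is in 3 SSTables, which is above the threshold of 2.
            System.out.println(rowsToCompact(sstables, 2)); // prints [r1]
        }
    }
}
```

The rows returned would then be the candidates to merge into a single copy;
how that merge is actually performed is left out of the sketch.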

I suppose it's a naive strategy, but the aim here is to give me experience
with C*, and of course I'll be happy to take suggestions. But I'll probably
only use the ideas after delivering the project because, at the moment, I
need to keep it simple. Otherwise, I'll never be able to deliver the
project. :)

Sorry for the long email, and thanks for all the help in advance! I'm very
excited about this project and look forward to being part of this
community!

Best regards
Pedro Gordo
