Hi all,

I'm finishing an MSc in which my final project is to implement a new compaction strategy in Cassandra. I've discussed the main points of the strategy with other community members and received valuable feedback. I understand this will be a tough challenge for someone who has never worked with Cassandra, but after getting to know the technology, I've found it fascinating. This, combined with always having wanted to contribute to an open source project, led me to choose it as the topic for my MSc project.

Because this is my first time contributing to an open source project, I have some questions on how to proceed correctly. Looking at the Contribute <http://wiki.apache.org/cassandra/HowToContribute> page, I see that we're supposed to create a ticket before starting work, so should I just create one, or does the usefulness of the strategy need to be validated by someone first? In that case, should I just proceed and implement it, or do something else? And finally, is this the correct mailing list for this sort of question? :)

As for the code itself, if I have a question like "Should we be using an abstract class for compaction classes?" or "What is this method supposed to do?", can I ask here? What is the best way to learn about the details of the Cassandra code? I saw that it has some comments, but they probably won't be enough for me.

The strategy I have in mind will be very simple until I finish the MSc. After that, I'll improve it with other features and the feedback I got, but for the moment it will run during a time interval (probably scheduled at specific hours, maybe when there is less traffic on the system). During that interval, a row will be made unique across all SSTables, but only if a prior analysis finds that it exists in more than a certain threshold number of SSTables.

I suppose it's a naive strategy, but the aim here is to give me experience with C*, and of course I'll be happy to take suggestions. I'll probably only use those ideas after delivering the project, though, because at the moment I need to keep it simple. Otherwise, I'll never be able to deliver the project. :)

Sorry for the long email, and thanks in advance for all the help! I'm very excited about this project and look forward to being part of this community!

Best regards,
Pedro Gordo