I couldn't agree more: the DataStax docs (try saying that 3 times fast) are definitely the most complete and user-friendly source for end users, while the wiki contains much more detailed information on the architecture and internals.
Ideally, I'd like to see the user docs live somewhere the community can maintain, although I imagine DataStax would keep their own, whether as a mirror or maintained independently.

Another issue I've seen becoming increasingly problematic on the wiki is versioning. With so many significant differences between major releases, it's important that we properly version the docs by release. Some effort has been made here on the API page, but on other pages this has become a problem, especially those covering operational details.

I don't think the wiki is the right place for community-maintained user docs; it doesn't have the necessary structure. Perhaps generated docs, maintained in-tree and hosted somewhere on cassandra.apache.org, might be an option? This would also impose some order on changes, since they would be controlled by committers and managed through JIRA.

These are just some ideas I had while reading Peter's post; feel free to tear them apart if you disagree. Just do it nicely.

Regards,
Nick Telford

On 31 March 2011 17:58, Peter Schuller <peter.schul...@infidyne.com> wrote:
> In response to the apparent mass confusion about nodetool repair that
> became evident in the thread:
>
> http://www.mail-archive.com/user@cassandra.apache.org/msg11755.html
>
> I started looking around to see what is actually claimed about repair.
> I found that the DataStax docs:
>
> http://www.datastax.com/docs/0.7/operations/scheduled_tasks#repair
>
> ... use phrasing which seems very, very wrong. It strongly seems to
> imply that you should not normally run nodetool repair on a cluster.
>
> First of all, have I completely flown off the handle and utterly
> confused myself? Is what I say in the e-mail thread wrong?
>
> On the assumption that I'm not crazy, I think this is a good time to
> talk about documentation. I've been itching for a while about the
> state of documentation.
> There is the ad-hoc wiki, and there is the
> DataStax stuff, but neither is really complete.
>
> What I ask myself is how we can achieve the goal that people who are
> trying to adopt Cassandra can do so, and use it reliably, without
> spending extensive time following mailing lists and JIRA, and trying
> to keep track of what on the wiki is still true and what isn't.
>
> This includes challenges like:
>
> * How to actually phrase and structure documentation in an accessible
>   fashion for people who just want to *use* Cassandra, and not be
>   knee-deep in the community.
>
> * Minimizing the amount of "subtle detail" that you have to get
>   right in order to not have a problem; the very up-to-you-to-fix and
>   not-very-well-advertised state of 'nodetool repair' is a good example.
>   We should do whatever can be done to avoid there even having to *be*
>   documentation for it, except for people who want to know extra details
>   or are doing stuff like not having deletes and wanting to avoid repair.
>
> * Keeping the documentation up-to-date.
>
> Do people agree with the goals and the claim that we're not there?
> What are good ways to achieve the goals?
>
> I keep feeling that there really should be a handbook. The
> DataStax docs seem to be the right "format" (similar to the FreeBSD
> handbook, which is a good example). But it seems we need something
> more agile that people can easily contribute to, and that can still be
> kept up-to-date. So what can be done?
>
> Is having a handbook a good idea? The key property of what I call a
> handbook is that there is some section on e.g. "Cassandra operations"
> that is reasonably long, and that someone can read through from
> beginning to end and get a coherent overall view of how things work,
> and know the important aspects that must be taken care of in
> production clusters.
>
> It's fine if every single little detail and potential kink isn't there
> (like long, long details about how to calculate memtable thresholds).
> But stuff like "yeah, you need to run nodetool repair at least as
> often as X" is important. So are operational best practices for
> performing operations on a cluster in a safe manner (e.g., moving
> nodes, seeds being sensitive, gossip delays, bootstrapping multiple
> nodes at once, etc.).
>
> I'm not sure how to get there. It's not like I'm *so* motivated and
> have *so* much time that if people agree I'll sit down and write 500
> pages of Cassandra handbook. So the question is how to build
> something incrementally that is nevertheless more organized than the wiki.
>
> Thoughts?
>
> --
> / Peter Schuller