istributed
tokens (hashed keys), all sstables are likely to have almost the
entire possible token range in them.
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
de terminology, be stored
separately in the file system.)
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
ion to responsible node. I.e., it probably means the vnode
information must be kept as state. It is probably difficult to
reconcile with balancing solutions like consistent hashing/crush/etc.
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
is the ring delay stuff which makes it un-workable to do at high
granularity, but that should apply to the active range solution too.
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
pack is limited to a handful of instances. In
order for vnodes to be useful with random placement, we'd need much
more than a handful of vnodes per node (cassandra instances in a
"pack" in that model).
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
> I will have to re-read your original post. I seem to have missed something :)
I did, and I may or may not understand what you mean.
Are you comparing vnodes + hashing, with CRUSH + pre-partitioning by
hash + identity hash as you traverse down the topology tree?
--
in unconvinced thus far.
Further, even looking at just the math, the claim cannot possibly hold
as N grows sufficiently large. At some point you will bottleneck on
the network and no longer benefit from a higher RDF, but the
probability of data loss doesn't drop off until you reach DF=number of
partitions (because at that point an increased cluster size doesn't
increase the number of nodes with data sharing with another node).
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Point of clarification: My use of the term "bucket" is completely
unrelated to the term "bucket" used in the CRUSH paper.
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
ssion and details to fill in. I apologize, but again, I really
want to post something now that this is being brought up.
BEGIN un-polished text ("we" = "I"):=
= CRUSHing Cassandra
Author: Peter Schuller
This is a proposal for a significant re-design of some fundamentals of
C
+1 (but FYI changelog has a typo "ahndling").
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
+1
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
commits on
every pull+push iteration.
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
> The Apache Cassandra PMC has voted to add Peter as a committer. Thank
> you Peter, and we look forward to continuing to work with you!
Thank *you*, as do I :)
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
> that's a disambiguation wiki page. what exactly are you talking about?
http://en.wiktionary.org/wiki/when_in_Rome,_do_as_the_Romans_do
Can we *please* stop this thread?
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
into a JIRA ticket after
the fact to figure out what reasoning was).
* You're not rebasing published branches.
The downside I suppose is that the branch count increases.
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
(I don't remember off hand how to tell hector to use auto-discovery.)
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
(And btw, major +1 on the transition to git!)
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
out the specific issue of "git pull" vs "git pull
--rebase" in the simple hacking-away-at-a-single-branch case.
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
> Could this just be commit log replay of the truncate?
Nevermind :)
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Could this just be commit log replay of the truncate?
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
on unless developers ship things early to get it into the
release.
But also keep in mind: If we reach a point where major users of
Cassandra need to run on significantly divergent versions of Cassandra
because the release is just too old, the "normal" mainstream release
will en
are
about being stable, and working, and the version you're upgrading to
should be stable.
(2) Critical fixes still need to be maintained for the version you're
running (else you are in fact kind of forced to upgrade).
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
> [1]: http://goo.gl/YtJLq (CHANGES.txt)
Contains merge markers.
>>>>>>> .merge-right.r1176712
0.8.6
--
/ Peter Schuller (@scode on twitter)
> http://mail-archives.apache.org/mod_mbox/cassandra-dev/201109.mbox/%3CCAKkz8Q307TaOfw=7tpkaooal_a+ry_gewnyo-vwnugoenv3...@mail.gmail.com%3E
Oops, I'm sorry. I did actually search my mailbox first, but obviously I failed.
--
/ Peter Schuller (@scode on twitter)
As came up in a thread on user@, I would suggest that
CASSANDRA-3166[1] is enough reason to release 0.8.6. Asking people to
build from source and patch to perform a rolling upgrade isn't good.
[1] https://issues.apache.org/jira/browse/CASSANDRA-3166
--
/ Peter Schuller (@scode on twitter)
probably be updated to reflect that it is slated for 0.8.2?
--
/ Peter Schuller
ster was built with non-released code
> (sporting a different message version).
I believe it is expected in this case due to
https://issues.apache.org/jira/browse/CASSANDRA-2280
--
/ Peter Schuller
rg/jira/browse/CASSANDRA-2420
--
/ Peter Schuller
being sensitive, gossip delays, bootstrapping multiple
nodes at once, etc).
I'm not sure how to get there. It's not like I'm *so* motivated and
have *so* much time that if people agree I'll sit down and write 500
pages of Cassandra handbook. So the question is how to achieve
something incrementally that is yet more organized than the wiki.
Thoughts?
--
/ Peter Schuller
> Please unsubscribe gary.mo...@xerox.com from this email list.
http://wiki.apache.org/cassandra/FAQ#unsubscribe
--
/ Peter Schuller
e mailing lists and JIRA, adjusting the
release engineering a bit seems like a high-priority change towards
that goal.
--
/ Peter Schuller
uction cluster, except:
(7a) New nodes being brought in as seeds
(7b) During the very first initial cluster setup with no data
(7) The above is intended and on purpose, and it would be correct to
operate under these assumptions when updating/improving documentation.
--
/ Peter Schuller
entation is
worse than the user having to read two versions to get a sense of
differences, it seems to make sense.
--
/ Peter Schuller
e for each version?
--
/ Peter Schuller
s/0.6/operations/tuning - but without more
information it's difficult to know what specifically it is that you're
hitting. Are you seriously saying you're running for 15-20 days with
only 2 MB of live data?
--
/ Peter Schuller
> Sorry for spam again. :-)
No, thanks a lot for tracking that down and reporting details!
Presumably a significant number of users are on that version of Ubuntu
running with openjdk.
--
/ Peter Schuller
ally. I may be interested in
helping out trying to maintain one, but I'm not sure I have sufficient
maven fu yet to be effective (but I'm getting there).)
Regardless, the Riptano maven repository is greatly appreciated as it
appears already.
--
/ Peter Schuller
e queue) if there is no read-ahead until the first successive
access.
I have not checked what actually does happen, nor have I benchmarked
for comparison. But I'd be interested in hearing if people have
already addressed this in the past.
--
/ Peter Schuller
d trying to find places on the wiki that links to the thrift
API page and re-consider whether (or at least how) to link, etc.
--
/ Peter Schuller
> It would be good to document this, or, since the
> correct-even-for-remove logic is not much more complicated, switch to
> that.
Submitted:
https://issues.apache.org/jira/browse/CASSANDRA-1559
--
/ Peter Schuller
rds, I do not believe the remove() code path should ever
be taken concurrently with insertions (by design, and not by
accident).
Anyone care to confirm/deny?
--
/ Peter Schuller
uld make checking the bloom
> filters unnecessary in most cases for me, but I'm not sure it's worth the
> effort.
Write-through row caching seems like a more direct approach to me
personally, off hand. Also to the extent that you're worried about
false positive rates, larger bloom filters may still be an option (not
currently configurable; would require source changes).
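
To make the point about larger bloom filters concrete, here is a minimal sketch of the standard false positive approximation for a bloom filter, p ~ (1 - e^(-k*n/m))^k. It is written in Python purely for illustration and is not Cassandra's implementation; the function names and the example sizes are hypothetical.

```python
import math


def bloom_false_positive_rate(n_items, m_bits, k_hashes):
    """Approximate false positive rate for n_items inserted into
    a filter of m_bits bits using k_hashes hash functions."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def optimal_k(n_items, m_bits):
    """Number of hash functions that minimizes the rate: (m/n) * ln 2."""
    return max(1, round(m_bits / n_items * math.log(2)))


if __name__ == "__main__":
    n = 1_000_000  # hypothetical number of keys in one sstable
    for bits_per_key in (10, 15, 20):
        m = n * bits_per_key
        k = optimal_k(n, m)
        p = bloom_false_positive_rate(n, m, k)
        print(f"{bits_per_key} bits/key, k={k}: p ~ {p:.6f}")
```

Under this approximation, each additional bit per key cuts the false positive rate noticeably, which is why a larger filter is a plausible option if false positives are the concern.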
--
/ Peter Schuller
t write-through or not though.
--
/ Peter Schuller
suggestion; might be 1
gig).
(e) log_m(n) will never be large enough for it to be a scaling problem
that you have one thread per "level"
Thoughts?
--
/ Peter Schuller
es, but I may have missed them.
The *.Data.db files are indeed sstables.
--
/ Peter Schuller
eams are closed.
Are the deleted files indeed sstables, or was that a bad assumption on my part?
--
/ Peter Schuller
rge amounts
of data is not what you want (there are any number of practical
situations where this has been an issue for me, if nothing else). But
if I'm overlooking something that would mean that this optimization,
trying to avoid eviction, is useless with Cassandra please do explain
it to me :)
ds though. We'll see.
I'll try to make time for trying this out.
--
/ Peter Schuller
simple rate limiter might help significantly - albeit be
something that has to be tweaked very specifically for the
situation/hardware rather than being auto-tuned.
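
As a rough illustration of what a simple, manually tuned rate limiter could look like, here is a minimal sketch in Python. It is an assumption-laden example, not a proposal for the actual code: the class name, the `bytes_per_sec` parameter, and the injectable `clock`/`sleep` hooks are all hypothetical.

```python
import time


class SimpleRateLimiter:
    """Throttle I/O to roughly bytes_per_sec, averaged since creation."""

    def __init__(self, bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        if bytes_per_sec <= 0:
            raise ValueError("bytes_per_sec must be positive")
        self.bytes_per_sec = bytes_per_sec
        self._clock = clock
        self._sleep = sleep
        self._start = clock()
        self._consumed = 0

    def acquire(self, nbytes):
        """Account for nbytes of I/O and sleep if we are ahead of budget.

        Returns the number of seconds slept (0.0 if none)."""
        self._consumed += nbytes
        # Earliest time at which this many bytes fits within the budget.
        earliest = self._start + self._consumed / self.bytes_per_sec
        delay = earliest - self._clock()
        if delay > 0:
            self._sleep(delay)
            return delay
        return 0.0
```

A background writer would call `acquire(len(chunk))` after each chunk. Note that this version averages over the limiter's whole lifetime, so it permits a burst after an idle period; the rate itself has to be picked by hand for the hardware, as noted above.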
If I have the time I may look into posix_fadvise() to begin with (but
I'm not promising anything).
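
For reference, the idea behind using posix_fadvise() here can be sketched as follows. This is Python for illustration only (Cassandra is Java and would need a native binding), it assumes a platform where `os.posix_fadvise` exists, and the function name and chunk size are hypothetical. The advice is only a hint; the kernel is free to ignore it.

```python
import os


def read_without_caching(path, chunk_size=1 << 20):
    """Read a file sequentially, hinting the kernel that each chunk
    will not be needed again so it can be dropped from the page cache.

    Returns the total number of bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
            # Skip the hint on platforms without posix_fadvise.
            if hasattr(os, "posix_fadvise"):
                os.posix_fadvise(fd, offset, len(chunk),
                                 os.POSIX_FADV_DONTNEED)
            offset += len(chunk)
    finally:
        os.close(fd)
    return total
```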
Thanks for the input!
--
/ Peter Schuller
pect it to potentially work pretty well without
separation, if you do have such a setup).
--
/ Peter Schuller
h the goal?
--
/ Peter Schuller
cking writes to the commit log for example (are you running with
periodic fsync or batch wise fsync?).
--
/ Peter Schuller
:s such that generated javadocs are easier to
navigate in terms of the overall structure and the roles of packages.
--
/ Peter Schuller
ines as an
easier-to-accomplish goal for the Cassandra developers, yet providing
high payoff to users.
--
/ Peter Schuller
ases is
not an expected use case - beyond some hints in the documentation that
would indicate it's meant for smaller databases.)
--
/ Peter Schuller