The Thrift IF predates vnodes. I agree that's a reasonable alternative.
On Apr 2, 2014 12:47 PM, "Clint Kelly" wrote:
Hi all,
FWIW the HBase Hadoop InputFormat does not even do this kind of estimation
of data density over various ranges; it just creates one split for every
region between the start and stop keys of the scan. I'll probably just do
something similar by combining token ranges for virtual nodes that
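As a rough sketch of that approach (assuming the goal is to merge contiguous
vnode token ranges owned by the same replica into a single input split; the
TokenRange class below is a simplified illustration, not the driver's or the
InputFormat's actual type), it might look something like this:

import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a vnode token range and its primary replica.
// (Illustrative only; not a Thrift or Java-driver type.)
class TokenRange {
    final long start;     // exclusive start token
    final long end;       // inclusive end token
    final String replica; // endpoint that owns this range

    TokenRange(long start, long end, String replica) {
        this.start = start;
        this.end = end;
        this.replica = replica;
    }
}

class SplitCombiner {
    // Merge contiguous vnode ranges that share a replica into one range, so
    // the number of Hadoop splits roughly tracks nodes rather than vnodes
    // (analogous to HBase's one-split-per-region). Assumes the input list is
    // sorted by start token and does not wrap around the ring.
    static List<TokenRange> combine(List<TokenRange> sorted) {
        List<TokenRange> combined = new ArrayList<TokenRange>();
        TokenRange current = null;
        for (TokenRange r : sorted) {
            if (current != null
                    && current.replica.equals(r.replica)
                    && current.end == r.start) {
                // Extend the current split instead of starting a new one.
                current = new TokenRange(current.start, r.end, current.replica);
            } else {
                if (current != null) {
                    combined.add(current);
                }
                current = r;
            }
        }
        if (current != null) {
            combined.add(current);
        }
        return combined;
    }
}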
This doesn’t belong in CQL-the-language.
However, this could be implemented as a virtual system column family - sooner
or later we’d need something like this anyway.
Then you’d just run SELECTs against it as if it were a regular column family.
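Purely as a hypothetical sketch of what querying such a table might look like
from the Java driver (the table name, columns, and keyspace below are invented
for illustration; nothing like this exists today):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SplitSizeQuery {
    public static void main(String[] args) {
        // "127.0.0.1" is just a placeholder contact point.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        try {
            // Hypothetical virtual column family and columns, invented here
            // purely for illustration.
            ResultSet rs = session.execute(
                "SELECT range_start, range_end, partitions_count "
                + "FROM system.range_estimates WHERE keyspace_name = 'my_ks'");
            for (Row row : rs) {
                System.out.printf("(%s, %s] ~ %d partitions%n",
                    row.getString("range_start"),
                    row.getString("range_end"),
                    row.getLong("partitions_count"));
            }
        } finally {
            cluster.close();
        }
    }
}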
--
AY
On Wednesday, April 2, 2014 at 00:03 AM
Split calculation can't be done client-side because it requires key
sampling (which requires reading the index summary). This would have to be
added to CQL.
Since I can't see any alternatives and this is required for good Hadoop
support, would you mind opening a ticket to add support for this?
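For context, a minimal sketch of the arithmetic such an estimate usually boils
down to, assuming one sampled key is kept per index_interval index entries
(the class and method names are just for illustration):

// Rough sketch: if the index summary keeps one key per index_interval
// entries, the rows in a token range are roughly samples-in-range times the
// interval. Illustrative arithmetic only, not Cassandra's actual code.
class RowCountEstimate {
    static long estimateRows(long indexSamplesInRange, int indexInterval) {
        return indexSamplesInRange * (long) indexInterval;
    }

    public static void main(String[] args) {
        // e.g. 300 sampled keys in the range with the default interval of 128
        System.out.println(estimateRows(300, 128)); // ~38400 rows
    }
}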
Hi Shao-Chuan,
I understand everything you said above except for how we can estimate the
number of rows using the index interval. My understanding is that the index
interval is a setting that controls how often samples from an SSTable index
are stored in memory, correct? I was under the impression that
Hi Shao-Chuan,
That sounds like a good idea, thanks for your response. I think I may have
missed the e-mail from Tyler that you reference --- I'll go back and look.
FWIW the code that I have written so far is here:
https://github.com/wibiclint/cassandra2-hadoop2
It is in rough shape now be
Tyler mentioned that client.describe_ring(myKeyspace) can be replaced by a
query of the system.peers table, which has the ring information. The challenge
here is describe_splits_ex, which needs to estimate the number of rows
in each sub-token range (as you mentioned).
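For the system.peers part, a minimal sketch with the DataStax Java driver
(the contact point is a placeholder; tokens are stored as a set of strings in
system.local and system.peers):

import java.util.Set;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class RingFromSystemTables {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        try {
            // Tokens owned by the node we are connected to.
            Row local = session.execute(
                "SELECT tokens FROM system.local WHERE key = 'local'").one();
            Set<String> localTokens = local.getSet("tokens", String.class);
            System.out.println("local: " + localTokens.size() + " tokens");

            // Tokens owned by every other node in the ring.
            for (Row peer : session.execute("SELECT peer, tokens FROM system.peers")) {
                Set<String> tokens = peer.getSet("tokens", String.class);
                System.out.println(peer.getInet("peer") + ": "
                    + tokens.size() + " tokens");
            }
        } finally {
            cluster.close();
        }
    }
}

This only gives the token ownership; the row-count estimate per sub-range is
the part that still has no CQL equivalent, as discussed above.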
From what I understand and tr
I just saw this question about Thrift in the Hadoop / Cassandra integration
come up in the discussion on the user list about freezing Thrift. I have been
working on a project to integrate Hadoop 2 and Cassandra 2 and have been
trying to move all of the way over to the Java driver and away from Thrift.
I