Just a follow-up on this issue. Due to the cost of shuffle, we decided
not to do it. Recently, we added a new node and ended up with a poorly
balanced cluster:
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Token
Edward,
Thanks for the response. This is what I thought. The only reason why I am
doing it like this is that I don't know these partition keys in advance
(otherwise I would design this differently). So when I need to insert data,
it looks like I need to insert to both the data table and the table
Thank you for the answer.
My apologies. I should have been clearer with my question.
Say, for example, I have 1000 partition keys and 1 rows per partition
key. I am trying to avoid bringing back 10 million rows to find the 1000
partition keys. I assume I cannot avoid bringing back the 10 mill
You can 'list' or 'select *' the column family and you get the rows back in
pseudo-random order. When you say subset, it implies you might want a
specific range, which is something this schema cannot do.
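To illustrate why the order looks pseudo-random, here is a minimal sketch. It assumes a hash-based partitioner and uses MD5 from Python's hashlib as a stand-in; the token() helper and the sample keys are hypothetical, and this is not Cassandra's actual token calculation. Rows come back sorted by the hash of the key, not by the key itself, so a range over key values does not map to a contiguous slice.

```python
import hashlib


def token(key: str) -> int:
    """Hypothetical stand-in for a partitioner: hash the key to a big integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


# Sample partition keys in their natural (lexical) order.
keys = [f"key{i:03d}" for i in range(10)]

# A full scan returns partitions in token order, i.e. sorted by hash value.
by_token = sorted(keys, key=token)

print("natural order:", keys)
print("token order:  ", by_token)
```

The same keys come back either way, but their position in the token order says nothing about where they fall in the natural order.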
On Sat, Apr 13, 2013 at 2:05 AM, Gareth Collins
wrote:
> Hello,
>
> If I have a cql3 table like th
With your example you can do an equality search on surname and city and
then use "IN" on country.
E.g. SELECT * FROM yourtable WHERE surname='blah' AND city='blah blah' AND
country IN ('country1', 'country2')
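If the list of countries is only known at run time, the statement can be assembled programmatically. The following is a rough sketch only: build_query() and cql_literal() are hypothetical helpers, and it assumes surname, city and country are all text columns of yourtable, with country as the last key column so that IN is allowed on it. With a real driver, bound parameters would usually be preferable to string building.

```python
def cql_literal(value: str) -> str:
    """Quote a string as a CQL text literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_query(surname: str, city: str, countries: list) -> str:
    """Build the SELECT with equality on surname/city and IN on country."""
    in_list = ", ".join(cql_literal(c) for c in countries)
    return (
        "SELECT * FROM yourtable "
        f"WHERE surname = {cql_literal(surname)} "
        f"AND city = {cql_literal(city)} "
        f"AND country IN ({in_list})"
    )


print(build_query("blah", "blah blah", ["country1", "country2"]))
```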
Hope that helps
Jabbar Azam
On 13 Apr 2013 07:06, "Gareth Collins" wrote:
> Hello,
Hi,
Does anyone have any experience with running a MapReduce job directly
against a CF's SSTable files?
I have a use case where this seems to be an option. I want to export all
data from a CF to a flat file format for statistical analysis.
Some factors that make it (more) doable in my case:
-The Cas