Re: Problems with shuffle

2013-04-13 Thread Rustam Aliyev
Just a follow-up on this issue. Due to the cost of shuffle, we decided not to do it. Recently, we added a new node and ended up with a poorly balanced cluster:
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Token
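One way to quantify "not well balanced" is to compute each node's share of the token ring from the tokens it holds. The sketch below is hypothetical (it is not nodetool's implementation) and assumes Murmur3Partitioner's token space of [-2**63, 2**63 - 1]; with vnodes, pass every token a node owns.

```python
# Hypothetical helper: estimate each node's share of the token ring.
# Assumes Murmur3Partitioner's token space, [-2**63, 2**63 - 1].
RING_SIZE = 2 ** 64


def ownership(tokens_by_node, ring_size=RING_SIZE):
    """Return {node: fraction of ring owned}, given {node: [tokens]}.

    Each token owns the range (previous token, this token]; the smallest
    token wraps around and owns the span after the largest token.
    """
    ring = sorted(
        (tok, node) for node, toks in tokens_by_node.items() for tok in toks
    )
    if len(ring) == 1:
        # A single token owns the whole ring.
        return {ring[0][1]: 1.0}
    owned = {node: 0 for node in tokens_by_node}
    for i, (tok, node) in enumerate(ring):
        prev_tok = ring[i - 1][0]  # for i == 0 this is the largest token
        owned[node] += (tok - prev_tok) % ring_size
    return {node: span / ring_size for node, span in owned.items()}


# Two evenly spaced nodes, then a third added without rebalancing.
print(ownership({"a": [-2**63], "b": [0]}))
print(ownership({"a": [-2**63], "b": [0], "c": [2**62]}))
```

In the second call, node `b` keeps half the ring while `a` and `c` get a quarter each, which is the kind of skew a single new token can introduce.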

Re: Anyway To Query Just The Partition Key?

2013-04-13 Thread Gareth Collins
Edward, thanks for the response. This is what I thought. The only reason I am doing it this way is that I don't know these partition keys in advance (otherwise I would design this differently). So when I insert data, it looks like I need to insert into both the data table and the table
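The dual write described here can be grouped into a single CQL batch. The sketch below only builds the statement text; the table names `data_table` and `partition_keys` and their columns are hypothetical placeholders, and a real client would normally use its driver's prepared statements with bound values instead of string building.

```python
def cql_literal(value: str) -> str:
    """Quote a string as a CQL literal; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def dual_write_batch(pk: str, ck: str, value: str) -> str:
    """Build one batch that writes the row and records its partition key.

    Table and column names are hypothetical placeholders.
    """
    return "\n".join([
        "BEGIN BATCH",
        "  INSERT INTO data_table (pk, ck, value) VALUES "
        f"({cql_literal(pk)}, {cql_literal(ck)}, {cql_literal(value)});",
        f"  INSERT INTO partition_keys (pk) VALUES ({cql_literal(pk)});",
        "APPLY BATCH;",
    ])


print(dual_write_batch("k1", "c1", "hello"))
```

Because Cassandra inserts are upserts, writing the same key to `partition_keys` on every insert is harmless; the index table simply ends up with one row per distinct key.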

Re: Anyway To Query Just The Partition Key?

2013-04-13 Thread Gareth Collins
Thank you for the answer, and my apologies; I should have been clearer with my question. Say, for example, I have 1000 partition keys and 10,000 rows per partition key. I am trying to avoid bringing back 10 million rows just to find the 1000 partition keys. I assume I cannot avoid bringing back the 10 mill
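The brute-force approach Gareth wants to avoid looks roughly like this on the client. It is a hypothetical sketch: rows are plain tuples whose first element is the partition key, standing in for paging through a driver result set. The point is that every row is read, so the cost tracks the total row count (10 million in the message's figures), not the 1000 distinct keys.

```python
def distinct_partition_keys(rows):
    """Collect distinct partition keys from an iterable of (pk, ...) tuples.

    Every row must still be read, so the work grows with the number of
    rows, not with the number of distinct keys.
    """
    seen = set()
    rows_read = 0
    for row in rows:
        rows_read += 1
        seen.add(row[0])
    return seen, rows_read


# Small-scale stand-in: 1000 keys with 10 rows each.
rows = ((f"key{k}", r) for k in range(1000) for r in range(10))
keys, rows_read = distinct_partition_keys(rows)
print(len(keys), rows_read)
```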

Re: Anyway To Query Just The Partition Key?

2013-04-13 Thread Edward Capriolo
You can 'list' or 'select *' the column family, and you get the keys back in pseudo-random order. When you say subset, it implies you might want a specific range, which is something this schema cannot do. On Sat, Apr 13, 2013 at 2:05 AM, Gareth Collins wrote: > Hello, > > If I have a cql3 table like th
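The "pseudo random order" comes from the partitioner: rows are stored and scanned in token order, and the token is a hash of the key. The sketch below is modeled on RandomPartitioner, whose token is the absolute value of the key's MD5 digest read as a signed 128-bit integer; Cassandra 1.2 defaults to Murmur3Partitioner, which uses a different hash but has the same effect on ordering.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Token modeled on RandomPartitioner: abs() of the MD5 digest
    interpreted as a signed big-endian 128-bit integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


keys = [b"alice", b"bob", b"carol", b"dave"]
# A full scan returns rows in token order, which is unrelated to key order,
# so a contiguous range of keys cannot be read as a contiguous slice.
scan_order = sorted(keys, key=random_partitioner_token)
print(scan_order)
```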

Re: Anyway To Query Just The Partition Key?

2013-04-13 Thread Jabbar Azam
With your example, you can do an equality search on surname and city and then use IN on country, e.g. SELECT * FROM yourtable WHERE surname = 'blah' AND city = 'blah blah' AND country IN ('country1', 'country2'); Hope that helps. Jabbar Azam On 13 Apr 2013 07:06, "Gareth Collins" wrote: > Hello,
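The query above can be generated from a list of countries. This is a hypothetical string-building sketch reusing the `yourtable`, `surname`, `city` and `country` names from the example; note that CQL string literals use single quotes (double quotes denote identifiers), and that whether IN is accepted on `country` depends on the table's primary key, which the truncated quote does not show. A real client would normally bind values through its driver rather than build strings.

```python
def cql_literal(value: str) -> str:
    """Quote a string as a CQL literal; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def select_by_countries(surname: str, city: str, countries: list) -> str:
    """Build an equality-plus-IN query against the example table."""
    if not countries:
        raise ValueError("IN needs at least one value")
    in_list = ", ".join(cql_literal(c) for c in countries)
    return (
        f"SELECT * FROM yourtable WHERE surname = {cql_literal(surname)} "
        f"AND city = {cql_literal(city)} AND country IN ({in_list});"
    )


print(select_by_countries("blah", "blah blah", ["country1", "country2"]))
```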

Extracting data from SSTable files with MapReduce

2013-04-13 Thread Jasper K.
Hi, does anyone have experience running a MapReduce job directly against a CF's SSTable files? I have a use case where this seems to be an option: I want to export all data from a CF to a flat file format for statistical analysis. Some factors that make it (more) doable in my case: -The Cas
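Whatever reads the SSTables, the export step itself is simple: flatten each row into one line with a fixed set of columns. The sketch below is hypothetical and covers only that output side; it assumes rows have already been decoded into `(row_key, {column: value})` pairs (SSTable decoding is not shown) and writes CSV, filling columns missing from sparse rows with empty fields. In a MapReduce job, this could correspond to what each mapper emits.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Write (row_key, {column: value}) pairs as CSV with a fixed header.

    Columns absent from a row become empty fields, since CF rows can be sparse.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["row_key", *columns])
    for row_key, cols in rows:
        writer.writerow([row_key, *(cols.get(c, "") for c in columns)])
    return buf.getvalue()


print(rows_to_csv([("k1", {"a": "1", "b": "2"}), ("k2", {"b": "3"})], ["a", "b"]))
```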