Unfortunately, I've just tried a new cluster with RandomPartitioner and it 
doesn't work any better.

It may come from the hadoop/pig modifications:

18:02:53|elia:hadoop cyril$ git diff --stat cassandra-1.1.5..cassandra-1.2.1 .
 .../apache/cassandra/hadoop/BulkOutputFormat.java  |   27 +--
 .../apache/cassandra/hadoop/BulkRecordWriter.java  |   55 +++---
 .../cassandra/hadoop/ColumnFamilyInputFormat.java  |  102 ++++++----
 .../cassandra/hadoop/ColumnFamilyOutputFormat.java |   31 ++--
 .../cassandra/hadoop/ColumnFamilyRecordReader.java |   76 ++++----
 .../cassandra/hadoop/ColumnFamilyRecordWriter.java |   24 +--
 .../apache/cassandra/hadoop/ColumnFamilySplit.java |   32 ++--
 .../org/apache/cassandra/hadoop/ConfigHelper.java  |   73 ++++++--
 .../cassandra/hadoop/pig/CassandraStorage.java     |  214 +++++++++++++-------
 9 files changed, 380 insertions(+), 254 deletions(-)

Can anyone help with getting more mappers running? Maybe we should open a bug 
report?

On May 5, 2013, at 8:45 AM, Shamim <sre...@yandex.ru> wrote:

  We also came across this issue in our dev environment when we upgraded 
Cassandra from 1.1.5 to 1.2.1. I have mentioned this issue a few times 
in this forum but haven't got any answer yet. As a quick workaround you can set 
pig.splitCombination to false in your Pig script to avoid this issue, but it will 
leave one of your tasks with a very large amount of data. I can't figure out why 
this is happening in the newer version of Cassandra; I strongly suspect something goes 
wrong in Cassandra's implementation of LoadFunc or in Murmur3Partitioner (it's my 
Here is my earlier post

Any comment from the authors will be highly appreciated.
P.S. Please keep me posted on any solution or hints.

Best regards
  Shamim A.
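
For readers trying the pig.splitCombination workaround mentioned above, a minimal sketch of how it might look in a Pig script follows. The keyspace and column family names (MyKeyspace, MyColumnFamily) are hypothetical placeholders, and the rest of the script is assumed rather than taken from this thread; only the property name and the CassandraStorage class come from the messages above.

```pig
-- Disable Pig's combining of small input splits into one, so that each
-- Cassandra input split can be handed to its own mapper.
SET pig.splitCombination 'false';

-- Hypothetical keyspace and column family; replace with your own.
rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily'
       USING org.apache.cassandra.hadoop.pig.CassandraStorage();
```

As noted above, this is only a workaround: one task may still end up with a very large share of the data.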

03.05.2013, 19:25, "cscetbon....@orange.com" <cscetbon....@orange.com>:
I'm using Pig to calculate the sum of a column from a column family (scan of 
all rows) and I've read that input data locality is supported at 
However, when I execute my Pig script, Hadoop assigns only one mapper to the task 
and not one mapper on each node (replication factor = 1). FYI, I have 8 mappers 
available (2 per node).
Is there anything that can disable the data locality feature?


