Thanks a lot for the help on this!
From what I can tell, that looks like a good solution. I created
https://issues.apache.org/jira/browse/CASSANDRA-2184 to make that
change.
On Thu, Feb 17, 2011 at 11:52 AM, Matt Kennedy wrote:
I have a resolution for how I'm dealing with this problem for my particular
situation and I'd like to throw it out there to see if you think it should
be integrated into the core Cassandra code.
Just to repeat, the immediate workaround for this is to set
-Dpig.splitCombination=false when you launc
Sorry it has taken me a while to get back to this. I'm still trying to get
to the bottom of this to find where the disconnect is between the column
family input format code and the Pig optimizer.
I suspected that the problem was line 365 of:
http://svn.apache.org/viewvc/pig/tags/release-0.8.0/src
On Fri, Feb 4, 2011 at 9:47 PM, Matt Kennedy wrote:
Found the culprit. There is a new feature in Pig 0.8 that will try to reduce
the number of splits used to speed up the whole job. Since the
ColumnFamilyInputFormat lists the input size as zero, this feature eliminates
all of the splits except for one.
The workaround is to disable this featu
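
To illustrate why splits that report a size of zero collapse into a single split, here is a minimal Python sketch of a greedy, size-based combiner. This is not Pig's actual code: the function name combine_splits and the 64 MB threshold are assumptions made up for the illustration.

```python
from typing import List


def combine_splits(lengths: List[int], max_combined_size: int) -> List[List[int]]:
    """Greedily group splits so each group's reported size stays within the limit.

    Hypothetical model of a split-combining optimizer, not Pig's implementation.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for length in lengths:
        # A split that already meets the limit is never merged with others.
        if length >= max_combined_size:
            groups.append([length])
            continue
        # Close the current group if this split would push it over the limit.
        if current and current_size + length > max_combined_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(length)
        current_size += length
    if current:
        groups.append(current)
    return groups


LIMIT = 64 * 1024 * 1024  # assumed threshold for the illustration

# Eight splits that all report length 0 never reach the limit,
# so they all land in the same group.
zero_length = combine_splits([0] * 8, LIMIT)

# Eight splits that each report a full-size length stay separate.
full_length = combine_splits([LIMIT] * 8, LIMIT)
```

In this model the number of map tasks equals the number of groups, so an input format that reports zero for every split ends up with one task no matter how many splits it produced.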
I noticed in the jobtracker log that when the pig job kicks off, I get the
following info message:
2011-02-02 09:13:07,269 INFO org.apache.hadoop.mapred.JobInProgress: Input size
for job job_201101241634_0193 = 0. Number of splits = 1
So I looked at the job.split file that is created for the P
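
A quick way to spot other jobs hit by the same symptom is to scan the jobtracker log for this message. The following is a rough Python sketch: the regular expression is derived only from the single log line quoted above, and the helper name parse_job_line is made up for the example.

```python
import re
from typing import Optional, Tuple

# Pattern based on the one JobInProgress message quoted above; other Hadoop
# versions may word this message differently.
JOB_LINE = re.compile(
    r"Input\s+size\s+for\s+job\s+(job_\d+_\d+)\s+=\s+(\d+)\."
    r"\s+Number\s+of\s+splits\s+=\s+(\d+)"
)


def parse_job_line(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (job_id, input_size, num_splits), or None if the line does not match."""
    match = JOB_LINE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


sample = (
    "2011-02-02 09:13:07,269 INFO org.apache.hadoop.mapred.JobInProgress: "
    "Input size for job job_201101241634_0193 = 0. Number of splits = 1"
)
parsed = parse_job_line(sample)
```

A job whose parsed input size is 0 and whose split count is 1 matches the behaviour described in this thread.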
I'm running Cassandra 0.7 and I'm trying to get Pig integration to work
correctly. I'm using Pig 0.8 running against Hadoop 0.20.2; I've also tried
this running against CDH2.
I can log into the grunt shell, and execute scripts, but when they run, they
don't read all of the data from Cassandra.