Created https://issues.apache.org/jira/browse/CASSANDRA-2855
On Jul 1, 2011, at 9:09 PM, Jeremy Hanna wrote:

> We think we're running into a situation where we've deleted all the columns
> on several thousand rows, but they still show up in the results of our Pig
> scripts. We think that's a product of range ghosts, because
> ColumnFamilyRecordReader uses getRangeSlices. So that might be a problem for
> people, and I think we have something that might address it.
>
> What if we were to have a Hadoop-job-specific option to have the CFRR filter
> out returned rows that don't contain any columns? It's true that core
> Cassandra used to do that, and the feature was removed because of the
> performance penalty. However, with Hadoop-type loads, latency isn't as big of
> a deal, and it could be a job-specific option. Also, CFRR takes a
> SlicePredicate. In addition to suppressing range ghosts, the option could
> also skip rows that have no data for that SlicePredicate, which would also be
> a nice feature - since such rows might have similar undesirable
> consequences. True, the person writing the MapReduce job or the Pig script
> could deal with it at that level. However, this is core enough, and it could
> be optional, so that people wouldn't have to check for keys without any
> columns all over the place.
>
> Would such an option be okay to add to the Hadoop config and to the CFRR?
>
> Jeremy
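The proposed filtering could be sketched roughly as below. This is a minimal, hypothetical illustration, not Cassandra's actual ColumnFamilyRecordReader code; the class and method names (GhostRowFilter, liveKeys, skipEmptyRows) are invented for the example, with rows modeled as a simple key-to-columns map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GhostRowFilter {
    // Hypothetical sketch: when skipEmptyRows is set (the proposed
    // job-specific option), drop "range ghost" rows -- keys returned by
    // getRangeSlices whose column list is empty because every column
    // was deleted -- before handing rows to the map task.
    public static List<String> liveKeys(Map<String, List<String>> rows,
                                        boolean skipEmptyRows) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : rows.entrySet()) {
            if (skipEmptyRows && e.getValue().isEmpty()) {
                continue; // range ghost: key present, no live columns
            }
            result.add(e.getKey());
        }
        return result;
    }
}
```

With the option off, the reader would behave as today and ghosts would still reach the job; with it on, the per-row emptiness check is the only added cost, which is why the latency tolerance of Hadoop-style loads matters.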