Created https://issues.apache.org/jira/browse/CASSANDRA-2855
On Jul 1, 2011, at 9:09 PM, Jeremy Hanna wrote:

> We think we're running into a situation where we've deleted all the columns
> on several thousand rows, but they still show up in the results of our Pig
> scripts. We think that's a product of range ghosts, because
> ColumnFamilyRecordReader uses getRangeSlices. So that might be a problem for
> people, and I think we have something that might address it.
>
> What if we were to have a Hadoop-job-specific option to have the CFRR filter
> out returned rows that don't contain any columns? It's true that core
> Cassandra used to do that, and the feature was removed because of the
> performance penalty. However, with Hadoop-type loads, latency isn't as big of
> a deal, and it could be a job-specific option. Also, CFRR takes a
> SlicePredicate. In addition to suppressing range ghosts, the option could
> also skip rows that have no data for that SlicePredicate, which would also be
> a nice feature - since such rows might have similar undesirable
> consequences. True, the person writing the MapReduce job or the Pig script
> could deal with it at that level. However, this is core enough, and it could
> be optional, so that people wouldn't have to check for keys without any
> columns all over the place.
>
> Would such an option be okay to add to the Hadoop config and to the CFRR?
>
> Jeremy
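The proposed filtering could be sketched roughly as below. This is a minimal, hypothetical illustration, not Cassandra's actual ColumnFamilyRecordReader code; the class and method names (GhostRowFilter, liveKeys, skipEmptyRows) are invented for the example, with rows modeled as a simple key-to-columns map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GhostRowFilter {
    // Hypothetical sketch: when skipEmptyRows is set (the proposed
    // job-specific option), drop "range ghost" rows -- keys returned by
    // getRangeSlices whose column list is empty because every column
    // was deleted -- before handing rows to the map task.
    public static List<String> liveKeys(Map<String, List<String>> rows,
                                        boolean skipEmptyRows) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : rows.entrySet()) {
            if (skipEmptyRows && e.getValue().isEmpty()) {
                continue; // range ghost: key present, no live columns
            }
            result.add(e.getKey());
        }
        return result;
    }
}
```

With the option off, the reader would behave as today and ghosts would still reach the job; with it on, the per-row emptiness check is the only added cost, which is why the latency tolerance of Hadoop-style loads matters.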