If you don't mind me asking, how are you handling the fact that pre-widerow you are only getting a static number of columns per key (default 1024)? Or am I not understanding the "limit" concept?
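To make the question concrete, here is how I picture the limit behaving. This is only a toy Python model, not the actual CassandraStorage code. The function names and the dict-based rows are made up for illustration, and I am assuming the limit simply keeps the first N columns of each key. Only the 1024 default comes from the discussion.

```python
# Toy model only: plain dicts stand in for a column family.
# No Cassandra or Pig API is used; all names here are hypothetical.
DEFAULT_LIMIT = 1024  # the default per-key column limit mentioned above


def load_with_limit(rows, limit=DEFAULT_LIMIT):
    """Return at most `limit` columns per key (assumed truncation behaviour)."""
    return {key: columns[:limit] for key, columns in rows.items()}


def truncated_keys(rows, limit=DEFAULT_LIMIT):
    """Return the keys that have more columns than the limit allows."""
    return sorted(key for key, columns in rows.items() if len(columns) > limit)


rows = {
    "narrow": [("c0", "v0")],
    "wide": [(f"c{i}", f"v{i}") for i in range(3000)],
}

loaded = load_with_limit(rows)
print({key: len(columns) for key, columns in loaded.items()})  # {'narrow': 1, 'wide': 1024}
print(truncated_keys(rows))  # ['wide']
```

If that model is right, any key with more than `limit` columns silently loses data unless the limit is raised to at least the widest row.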

On Thu, Oct 11, 2012 at 11:25 AM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:

> The Dachis Group (where I just came from, now at DataStax) uses pig with
> cassandra for a lot of things. However, we weren't using the widerow
> implementation yet since wide row support is new to 1.1.x and we were on
> 0.7, then 0.8, then 1.0.x.
>
> I think since it's new to 1.1's hadoop support, it sounds like there are
> some rough edges like you say. But issues that are reproducible on tickets
> for any problems are much appreciated and they will get addressed.
>
> On Oct 11, 2012, at 10:43 AM, William Oberman <ober...@civicscience.com>
> wrote:
>
> > I'm wondering how many people are using cassandra + pig out there? I
> > recently went through the effort of validating things at a much higher
> > level than I previously did(*), and found a few issues:
> > https://issues.apache.org/jira/browse/CASSANDRA-4748
> > https://issues.apache.org/jira/browse/CASSANDRA-4749
> > https://issues.apache.org/jira/browse/CASSANDRA-4789
> >
> > In general, it seems like the widerow implementation still has rough
> > edges. I'm concerned I'm not understanding why other people aren't using
> > the feature, and thus finding these problems. Is everyone else just
> > setting a high static limit? E.g. LOAD 'cassandra://KEYSPACE/CF?limit=X'
> > where X >= the max size of any key? Is everyone else using data models
> > that result in keys with # columns always less than 1024? Do newer
> > versions of hadoop consume the cassandra API in a way that works around
> > these issues? I'm using CDH3 == hadoop 0.20.2, pig 0.8.1.
> >
> > (*) I took a random subsample of 50,000 keys of my production data
> > (approx 1M total key/value pairs, some keys having only a single value
> > and some having 1000's). I then wrote both a pig script and a simple
> > procedural version of the pig script. Then I compared the results.
> > Obviously I started with differences, though after locally patching my
> > code to fix the above 3 bugs (though, really only two issues), I now
> > (finally) get the same results.