Re: pig counting question

2011-03-25 Thread Brandon Williams
On Fri, Mar 25, 2011 at 1:41 PM, Jeffrey Wang wrote:
> I don't think it's Pig running out of memory, but rather Cassandra itself
> (the data doesn't even make it to Pig). get_range_slices() is called with a
> row batch size of 4096, the default, and it's fetching all of the columns in
> each ro

Re: pig counting question

2011-03-25 Thread Jeremy Hanna
ffrey Wang [mailto:jw...@palantir.com]
> Sent: Friday, March 25, 2011 11:42 AM
> To: user@cassandra.apache.org
> Subject: RE: pig counting question
>
> I don't think it's Pig running out of memory, but rather Cassandra itself
> (the data doesn't even make it to Pi

RE: pig counting question

2011-03-25 Thread Jeffrey Wang
ssage-
From: Jeffrey Wang [mailto:jw...@palantir.com]
Sent: Friday, March 25, 2011 11:42 AM
To: user@cassandra.apache.org
Subject: RE: pig counting question

I don't think it's Pig running out of memory, but rather Cassandra itself (the data doesn't even make it to Pig). get_range_sli

RE: pig counting question

2011-03-25 Thread Jeffrey Wang
Friday, March 25, 2011 11:06 AM
To: user@cassandra.apache.org
Subject: Re: pig counting question

One thing I wonder though - if your columns are the thing that are increasing your heap size and eating up a lot of memory, and you're reading the data structure out as a bag of columns, why isn't

Re: pig counting question

2011-03-25 Thread Jeremy Hanna
rey
>
> -----Original Message-----
> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
> Sent: Thursday, March 24, 2011 11:34 AM
> To: user@cassandra.apache.org
> Subject: Re: pig counting question
>
> The limit defaults to 1024 but you can set it when you use CassandraSto

Re: pig counting question

2011-03-24 Thread Jeremy Hanna
t this is because of the get_range_slices() call from
>> ColumnFamilyRecordReader. Are there plans to make this streaming/paged?
>>
>> -Jeffrey
>>
>> -----Original Message-----
>> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
>> Sent: Thursd

Re: pig counting question

2011-03-24 Thread Jeremy Hanna
r. Are there plans to make this streaming/paged?
>
> -Jeffrey
>
> -----Original Message-----
> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
> Sent: Thursday, March 24, 2011 11:34 AM
> To: user@cassandra.apache.org
> Subject: Re: pig counting question
>
> Th

RE: pig counting question

2011-03-24 Thread Jeffrey Wang
this streaming/paged?

-Jeffrey

-----Original Message-----
From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
Sent: Thursday, March 24, 2011 11:34 AM
To: user@cassandra.apache.org
Subject: Re: pig counting question

The limit defaults to 1024 but you can set it when you use CassandraStorage in

Re: pig counting question

2011-03-24 Thread Jeremy Hanna
The limit defaults to 1024 but you can set it when you use CassandraStorage in Pig, like so:

rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(4096);

or whatever value you wish. Give that a try and see if it gives you more of what you're looking for.

On Mar 24, 2011, at 1:16
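
For readers following the counting use case, here is a minimal sketch of how the LOAD statement above might feed a row count. The relation names (all_rows, row_count) are illustrative and do not come from the thread, Keyspace/ColumnFamily are the same placeholders used above, and the script assumes the Cassandra and Pig jars are already registered in your environment.

```pig
-- Load rows with the limit raised from its default, as suggested in the thread.
rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(4096);

-- Collect every row into a single group so one COUNT covers the whole relation.
all_rows = GROUP rows ALL;

-- Emit the total number of rows that were loaded.
row_count = FOREACH all_rows GENERATE COUNT(rows);

DUMP row_count;
```

This counts rows only. Whether the limit affects what you are counting depends on how CassandraStorage applies it, so compare results at a couple of limit values before relying on the number.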