Re: pig counting question

2011-03-25 Thread Brandon Williams
On Fri, Mar 25, 2011 at 1:41 PM, Jeffrey Wang wrote:
> I don't think it's Pig running out of memory, but rather Cassandra itself
> (the data doesn't even make it to Pig). get_range_slices() is called with a
> row batch size of 4096, the default, and it's fetching all of the columns in
> each ro
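The memory concern in the message above comes down to simple arithmetic: one batch can carry up to (rows per batch) x (columns per row) columns. The following is only a rough sketch in Python; the function name and the example column counts are hypothetical, and only the batch size of 4096 comes from the thread.

```python
def columns_per_batch(row_batch_size: int, columns_per_row: int) -> int:
    """Upper bound on the columns returned by one batch when every
    column of every row in the batch is fetched."""
    if row_batch_size < 0 or columns_per_row < 0:
        raise ValueError("sizes must be non-negative")
    return row_batch_size * columns_per_row


if __name__ == "__main__":
    # 4096 is the row batch size quoted in the thread; the per-row
    # column counts below are made-up values for illustration only.
    for cols in (10, 1_000, 100_000):
        print(cols, columns_per_batch(4096, cols))
```

The bound grows linearly with row width, so wide rows are what make a fixed row batch size expensive.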

Re: pig counting question

2011-03-25 Thread Jeremy Hanna
ffrey Wang [mailto:jw...@palantir.com]
> Sent: Friday, March 25, 2011 11:42 AM
> To: user@cassandra.apache.org
> Subject: RE: pig counting question
>
> I don't think it's Pig running out of memory, but rather Cassandra itself
> (the data doesn't even make it to Pi

RE: pig counting question

2011-03-25 Thread Jeffrey Wang
ssage- From: Jeffrey Wang [mailto:jw...@palantir.com] Sent: Friday, March 25, 2011 11:42 AM To: user@cassandra.apache.org Subject: RE: pig counting question I don't think it's Pig running out of memory, but rather Cassandra itself (the data doesn't even make it to Pig). get_range_sli

RE: pig counting question

2011-03-25 Thread Jeffrey Wang
Friday, March 25, 2011 11:06 AM To: user@cassandra.apache.org Subject: Re: pig counting question One thing I wonder though - if your columns are the thing that are increasing your heap size and eating up a lot of memory, and you're reading the data structure out as a bag of columns, why isn't

Re: pig counting question

2011-03-25 Thread Jeremy Hanna
rey
>
> -Original Message-
> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
> Sent: Thursday, March 24, 2011 11:34 AM
> To: user@cassandra.apache.org
> Subject: Re: pig counting question
>
> The limit defaults to 1024 but you can set it when you use CassandraSto

Re: pig counting question

2011-03-24 Thread Jeremy Hanna
t this is because of the get_range_slices() call from
>> ColumnFamilyRecordReader. Are there plans to make this streaming/paged?
>>
>> -Jeffrey
>>
>> -Original Message-
>> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
>> Sent: Thursd

Re: pig counting question

2011-03-24 Thread Jeremy Hanna
r. Are there plans to make this streaming/paged?
>
> -Jeffrey
>
> -Original Message-
> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
> Sent: Thursday, March 24, 2011 11:34 AM
> To: user@cassandra.apache.org
> Subject: Re: pig counting question
>
> Th

RE: pig counting question

2011-03-24 Thread Jeffrey Wang
this streaming/paged? -Jeffrey -Original Message- From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] Sent: Thursday, March 24, 2011 11:34 AM To: user@cassandra.apache.org Subject: Re: pig counting question The limit defaults to 1024 but you can set it when you use CassandraStorage in
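As an illustration of what "streaming/paged" reading of a key range can look like in general, here is a minimal Python sketch. It is not Cassandra's API or the ColumnFamilyRecordReader implementation: `fetch_page`, `iter_rows`, and the in-memory dictionary standing in for a column family are all hypothetical. Each page starts at the last key already seen, so the repeated first row of every later page is dropped.

```python
from bisect import bisect_left
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]


def fetch_page(store: Dict[str, Row], sorted_keys: List[str],
               start_key: str, count: int) -> List[Tuple[str, Row]]:
    """Hypothetical stand-in for a range query: return up to `count`
    rows whose key is >= start_key, in key order."""
    i = bisect_left(sorted_keys, start_key)
    return [(k, store[k]) for k in sorted_keys[i:i + count]]


def iter_rows(store: Dict[str, Row], page_size: int) -> Iterator[Tuple[str, Row]]:
    """Yield every row while holding at most one page at a time."""
    if page_size < 2:
        # Each later page repeats one row, so a page must hold at least two.
        raise ValueError("page_size must be at least 2")
    sorted_keys = sorted(store)
    start_key = ""  # the empty string sorts before every other key
    first_page = True
    while True:
        page = fetch_page(store, sorted_keys, start_key, page_size)
        if not first_page:
            page = page[1:]  # drop the row already yielded by the previous page
        if not page:
            return
        yield from page
        start_key = page[-1][0]
        first_page = False


if __name__ == "__main__":
    demo = {chr(ord("a") + i): {"col": str(i)} for i in range(10)}
    print(sum(1 for _ in iter_rows(demo, page_size=3)))
```

A row count built on such an iterator only ever needs one page of rows in memory, regardless of how many rows the range contains.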

Re: pig counting question

2011-03-24 Thread Jeremy Hanna
The limit defaults to 1024 but you can set it when you use CassandraStorage in pig, like so:

rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(4096);

or whatever value you wish. Give that a try and see if it gives you more of what you're looking for.

On Mar 24, 2011, at 1:16

pig counting question

2011-03-24 Thread Jeffrey Wang
Hey all,

I'm trying to run a very simple Pig script against my Cassandra cluster (5 nodes, 0.7.3). I've gotten it all set up and working, but the script is giving me some strange results. Here is my script:

rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage();
rowct = FOREAC