And if you download the 0.7 branch and build the cassandra_storage.jar in the 
contrib/pig directory with that update, you should be able to use it with your 
0.7.3 cluster.  Changes to CassandraStorage are typically independent of the 
Cassandra version.
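
As a rough sketch of what that might look like with the template quoted below 
(untested; the jar path, the keyspace and column family names, and the slice 
bounds 'a' and 'm' are placeholders I made up, and any other jars your setup 
needs are left out):

```pig
-- Hypothetical path: point this at the jar built from the 0.7 branch.
REGISTER /path/to/cassandra_storage.jar;

-- One "page" of each row: columns from 'a' through 'm', at most 1024 of them.
-- The slice bounds and limit are placeholder values, not recommendations.
rows = LOAD 'cassandra://Keyspace/ColumnFamily?slice_start=a&slice_end=m&limit=1024'
       USING CassandraStorage();
rowct = FOREACH rows GENERATE $0, COUNT($1);
DUMP rowct;
```

To count a whole wide row you would presumably repeat the LOAD with the next 
slice range and add up the per-slice counts, which is the painful part.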

On Mar 24, 2011, at 5:49 PM, Jeremy Hanna wrote:

> Hmmm, for wide rows, I believe you can page through the columns using some 
> changes that recently made it into the 0.7 branch as part of 
> https://issues.apache.org/jira/browse/CASSANDRA-1618.  Specifically, 
> using the 0.7 branch version of CassandraStorage, you can specify a slice 
> with this basic template:
> cassandra://<keyspace>/<columnfamily>[?slice_start=<start>&slice_end=<end>[&reversed=true][&limit=1]]
> That goes in your Pig LOAD statement.
> So I would imagine it's a pain to do what you're doing, but paging is 
> possible with the latest on the 0.7 branch.
> 
> On Mar 24, 2011, at 3:57 PM, Jeffrey Wang wrote:
> 
>> It looks like this functionality is not in the 0.7.3 version of 
>> CassandraStorage. I tried to add the constructor which takes the limit to 
>> the class, but I ran into some Pig parsing errors, so I had to make the 
>> parameter a string. How did you get around this for the version of 
>> CassandraStorage in trunk? I'm running Pig 0.8.0.
>> 
>> Also, when I bump the limit up very high (e.g. 1M columns), my Cassandra 
>> starts eating up huge amounts of memory, maxing out my 16GB heap size. I 
>> suspect this is because of the get_range_slices() call from 
>> ColumnFamilyRecordReader. Are there plans to make this streaming/paged?
>> 
>> -Jeffrey
>> 
>> -----Original Message-----
>> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] 
>> Sent: Thursday, March 24, 2011 11:34 AM
>> To: user@cassandra.apache.org
>> Subject: Re: pig counting question
>> 
>> The limit defaults to 1024 but you can set it when you use CassandraStorage 
>> in pig, like so:
>> rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(4096);
>> or whatever value you wish.
>> 
>> Give that a try and see if it gives you more of what you're looking for.
>> 
>> On Mar 24, 2011, at 1:16 PM, Jeffrey Wang wrote:
>> 
>>> Hey all,
>>> 
>>> I'm trying to run a very simple Pig script against my Cassandra cluster (5 
>>> nodes, 0.7.3). I've gotten it all set up and working, but the script is 
>>> giving me some strange results. Here is my script:
>>> 
>>> rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage();
>>> rowct = FOREACH rows GENERATE $0, COUNT($1);
>>> dump rowct;
>>> 
>>> If I understand Pig correctly, this should output (row name, column count) 
>>> tuples, but I'm always seeing 1024 for the column count even though the 
>>> rows have a highly variable number of columns. Am I missing something? Thanks.
>>> 
>>> -Jeffrey
>>> 
>> 
> 
