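You're trying to use FromCqlColumn on a tuple that has already been flattened. The schema still claims {title: chararray}, but after FLATTEN the tuple has become two separate values, (title, QA), so FromCqlColumn receives a String where it expects a Tuple. That is the ClassCastException in your stack trace. I don't know of a way to retrieve the data values once the tuple has been flattened.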
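The failure can be reproduced outside Pig with a minimal sketch in plain Python. It only models the data shapes shown in this thread: `from_cql_column` and `flatten` are hypothetical stand-ins, not the real UDF or Pig's FLATTEN, and the row literal is copied from the ILLUSTRATE output quoted below.

```python
# Plain-Python model of the shapes in this thread; not Pig and not the real UDF.

def from_cql_column(field):
    """Hypothetical stand-in for FromCqlColumn: takes a (name, value) pair, returns the value."""
    if not isinstance(field, tuple):
        # Plays the role of "String cannot be cast to Tuple" in the stack trace.
        raise TypeError(f"{type(field).__name__} cannot be used as a tuple")
    _name, value = field
    return value


def flatten(field):
    """Hypothetical stand-in for FLATTEN on a tuple: its elements become top-level fields."""
    return list(field)


# One row as DUMP shows it: every column is a (name, value) pair.
row = {"id": ("id", "5"), "age": ("age", 30), "title": ("title", "QA")}

# Applying the extractor to the original column works.
title = from_cql_column(row["title"])

# After flattening, the field in the first position is the plain string 'title'.
flat = flatten(row["title"])
try:
    from_cql_column(flat[0])
    failed = False
except TypeError:
    failed = True
```

In this model `title` is `'QA'`, `flat` is `['title', 'QA']`, and `failed` is `True`, which is the same shape of failure as the quoted job.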
Your code will work correctly if you do this:

values3 = FOREACH rows GENERATE FromCqlColumn(title) AS title;
dump values3;
describe values3;

(Use FromCqlColumn on the original data, not the flattened data.)

Chad


On Mon, Sep 2, 2013 at 8:45 AM, Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com> wrote:

> Hi
>
> 1.-
>
> May be?
>
> -- Register the UDF
> REGISTER /path/to/cqlstorageudf-1.0-SNAPSHOT
>
> -- FromCqlColumn will convert chararray, int, long, float, double
> DEFINE FromCqlColumn com.megatome.pig.piggybank.tuple.FromCqlColumn();
>
> -- Load data as normal
> data_raw = LOAD 'cql://bookcrossing/books' USING CqlStorage();
>
> -- Use the UDF
> data = FOREACH data_raw GENERATE
>     FromCqlColumn(isbn) AS ISBN,
>     FromCqlColumn(bookauthor) AS BookAuthor,
>     FromCqlColumn(booktitle) AS BookTitle,
>     FromCqlColumn(publisher) AS Publisher,
>     FromCqlColumn(yearofpublication) AS YearOfPublication;
>
> and 2.:
>
> with the data in cql cassandra 1.2.8, pig 0.11.11 and cql3:
>
> CREATE KEYSPACE keyspace1
>   WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
>   AND durable_writes = true;
>
> use keyspace2;
>
> CREATE TABLE test (
>   id text PRIMARY KEY,
>   title text,
>   age int
> ) WITH COMPACT STORAGE;
>
> insert into test (id, title, age) values('1', 'child', 21);
> insert into test (id, title, age) values('2', 'support', 21);
> insert into test (id, title, age) values('3', 'manager', 31);
> insert into test (id, title, age) values('4', 'QA', 41);
> insert into test (id, title, age) values('5', 'QA', 30);
> insert into test (id, title, age) values('6', 'QA', 30);
>
> and script:
>
> register './libs/cqlstorageudf-1.0-SNAPSHOT.jar';
> DEFINE FromCqlColumn com.megatome.pig.piggybank.tuple.FromCqlColumn();
> rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
> dump rows;
> ILLUSTRATE rows;
> describe rows;
> A = FOREACH rows GENERATE FLATTEN(title);
> dump A;
> values3 = FOREACH A GENERATE FromCqlColumn(title) AS title;
> dump values3;
> describe values3;
>
> --
>
> I have this error:
>
> ....
>
> -------------------------------------------------------------
> | rows     | id:chararray   | age:int   | title:chararray   |
> -------------------------------------------------------------
> |          | (id, 5)        | (age, 30) | (title, QA)       |
> -------------------------------------------------------------
>
> rows: {id: chararray,age: int,title: chararray}
>
> ...
>
> (title,QA)
> (title,QA)
> ..
> 2013-09-02 16:40:52,454 [Thread-11] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
> java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
>     at com.megatome.pig.piggybank.tuple.ColumnBase.exec(ColumnBase.java:32)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:434)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2013-09-02 16:40:52,832 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
>
> 8-|
>
> Regards
>
> ...
>
> Miguel Angel Martín Junquera
> Analyst Engineer.
> miguelangel.mar...@brainsins.com
>
> 2013/9/2 Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com>
>
>> hi all:
>>
>> More info:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-5941
>>
>> I tried this (and built cassandra 1.2.9), but it does not work for me:
>>
>> git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
>> cd cassandra
>> git checkout cassandra-1.2
>> patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
>> ant
>>
>> Miguel Angel Martín Junquera
>> Analyst Engineer.
>> miguelangel.mar...@brainsins.com
>>
>> 2013/9/2 Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com>
>>
>>> *good/nice job !!!*
>>>
>>> *I had been testing with a UDF that only handles the string schema type; this is better and more elaborate work.*
>>>
>>> *Regards*
>>>
>>> Miguel Angel Martín Junquera
>>> Analyst Engineer.
>>> miguelangel.mar...@brainsins.com
>>>
>>> 2013/8/31 Chad Johnston <cjohns...@megatome.com>
>>>
>>>> I threw together a quick UDF to work around this issue. It just extracts the value portion of the tuple while taking advantage of the CqlStorage generated schema to keep the type correct.
>>>>
>>>> You can get it here: https://github.com/iamthechad/cqlstorage-udf
>>>>
>>>> I'll see if I can find more useful information and open a defect, since that's what this seems to be.
>>>>
>>>> Chad
>>>>
>>>> On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com> wrote:
>>>>
>>>>> I tried this:
>>>>>
>>>>> rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
>>>>> dump rows;
>>>>> ILLUSTRATE rows;
>>>>> describe rows;
>>>>>
>>>>> values2= FOREACH rows GENERATE TOTUPLE (id) as (mycolumn:tuple(name,value));
>>>>> dump values2;
>>>>> describe values2;
>>>>>
>>>>> But I get these results:
>>>>>
>>>>> -------------------------------------------------------------
>>>>> | rows     | id:chararray   | age:int   | title:chararray   |
>>>>> -------------------------------------------------------------
>>>>> |          | (id, 6)        | (age, 30) | (title, QA)       |
>>>>> -------------------------------------------------------------
>>>>>
>>>>> rows: {id: chararray,age: int,title: chararray}
>>>>> 2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: left is "tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is "org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)"
>>>>>
>>>>> or
>>>>>
>>>>> ....
>>>>>
>>>>> values2= FOREACH rows GENERATE TOTUPLE (id) ;
>>>>> dump values2;
>>>>> describe values2;
>>>>>
>>>>> and the results are:
>>>>>
>>>>> ...
>>>>> (((id,6)))
>>>>> (((id,5)))
>>>>> values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}
>>>>>
>>>>> Aggg!!!!!
>>>>>
>>>>> Miguel Angel Martín Junquera
>>>>> Analyst Engineer.
>>>>> miguelangel.mar...@brainsins.com
>>>>>
>>>>> 2013/8/26 Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com>
>>>>>
>>>>>> hi Chad.
>>>>>>
>>>>>> I have this issue too.
>>>>>>
>>>>>> I sent a mail to the pig-user list, but I still can't resolve this and I cannot access the column values.
>>>>>> In that mail I wrote down some things that I tried without results, plus information about this issue.
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E
>>>>>>
>>>>>> I hope someone replies with a comment, idea or solution about this issue or bug.
>>>>>>
>>>>>> I have reviewed the CqlStorage class in the cassandra 1.2.8 code, but I have not configured the environment to debug and trace this issue.
>>>>>>
>>>>>> I only found some comments like this one, but I do not understand them at all.
>>>>>>
>>>>>> /**
>>>>>>  * A LoadStoreFunc for retrieving data from and storing data to Cassandra
>>>>>>  *
>>>>>>  * A row from a standard CF will be returned as nested tuples:
>>>>>>  * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
>>>>>>  */
>>>>>>
>>>>>> If you find some idea or solution, please post it.
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> 2013/8/23 Chad Johnston <cjohns...@megatome.com>
>>>>>>
>>>>>>> (I'm using Cassandra 1.2.8 and Pig 0.11.1)
>>>>>>>
>>>>>>> I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a Pig schema based on the Cassandra schema, but it seems to be wrong.
>>>>>>>
>>>>>>> If I do:
>>>>>>>
>>>>>>> data = LOAD 'cql://bookdata/books' USING CqlStorage();
>>>>>>> DESCRIBE data;
>>>>>>>
>>>>>>> I get this:
>>>>>>>
>>>>>>> data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}
>>>>>>>
>>>>>>> However, if I DUMP data, I get results like these:
>>>>>>>
>>>>>>> ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
>>>>>>>
>>>>>>> Clearly the results from Cassandra are key/value pairs, as would be expected. I don't know why the schema generated by CqlStorage() would be so different.
>>>>>>>
>>>>>>> This is really causing me problems trying to access the column values. I tried a naive approach of FLATTENing each tuple, then trying to access the values that way:
>>>>>>>
>>>>>>> flattened = FOREACH data GENERATE
>>>>>>>     FLATTEN(isbn),
>>>>>>>     FLATTEN(booktitle),
>>>>>>>     ...
>>>>>>> values = FOREACH flattened GENERATE
>>>>>>>     $1 AS ISBN,
>>>>>>>     $3 AS BookTitle,
>>>>>>>     ...
>>>>>>>
>>>>>>> As soon as I try to access field $5, Pig complains about the index being out of bounds.
>>>>>>>
>>>>>>> Is there a way to solve the schema/reality mismatch? Am I doing something wrong, or have I stumbled across a defect?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Chad
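
The out-of-bounds error on $5 in the original message is consistent with positional references being checked against the declared schema (five columns) rather than against the data actually returned (ten values once every pair is flattened). The following is a minimal sketch in plain Python under that assumption. It is not Pig: `position_is_valid` is a hypothetical helper, the thread does not confirm how Pig performs this check, and the sketch flattens all five columns even though the quoted script elides part of its column list.

```python
# Plain-Python model; ASSUMES positional references are validated against
# the declared schema. This is an illustration, not Pig's implementation.

# Schema reported by DESCRIBE: five flat columns.
declared_schema = ["isbn", "bookauthor", "booktitle", "publisher", "yearofpublication"]

# Row reported by DUMP: five (name, value) pairs.
dumped_row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)

# Flattening every pair yields ten values at runtime.
flattened_row = [item for pair in dumped_row for item in pair]


def position_is_valid(index, schema):
    """Hypothetical check: a positional reference must fit inside the declared schema."""
    return 0 <= index < len(schema)


# Positions that pass the check, out of the ten that exist in the data.
valid_positions = [
    i for i in range(len(flattened_row)) if position_is_valid(i, declared_schema)
]
```

In this model positions 1 and 3 fit inside the five declared columns while position 5 does not, even though the flattened row holds a value there.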