Hi
1. Maybe like this?

    -- Register the UDF
    REGISTER /path/to/cqlstorageudf-1.0-SNAPSHOT.jar;

    -- FromCqlColumn will convert chararray, int, long, float, double
    DEFINE FromCqlColumn com.megatome.pig.piggybank.tuple.FromCqlColumn();

    -- Load data as normal
    data_raw = LOAD 'cql://bookcrossing/books' USING CqlStorage();

    -- Use the UDF
    data = FOREACH data_raw GENERATE
        FromCqlColumn(isbn) AS ISBN,
        FromCqlColumn(bookauthor) AS BookAuthor,
        FromCqlColumn(booktitle) AS BookTitle,
        FromCqlColumn(publisher) AS Publisher,
        FromCqlColumn(yearofpublication) AS YearOfPublication;

2. With this data in CQL (Cassandra 1.2.8, Pig 0.11.1 and CQL3):

    CREATE KEYSPACE keyspace1
      WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
      AND durable_writes = true;

    use keyspace1;

    CREATE TABLE test (
      id text PRIMARY KEY,
      title text,
      age int
    ) WITH COMPACT STORAGE;

    insert into test (id, title, age) values('1', 'child', 21);
    insert into test (id, title, age) values('2', 'support', 21);
    insert into test (id, title, age) values('3', 'manager', 31);
    insert into test (id, title, age) values('4', 'QA', 41);
    insert into test (id, title, age) values('5', 'QA', 30);
    insert into test (id, title, age) values('6', 'QA', 30);

and this script:

    register './libs/cqlstorageudf-1.0-SNAPSHOT.jar';
    DEFINE FromCqlColumn com.megatome.pig.piggybank.tuple.FromCqlColumn();
    rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
    dump rows;
    ILLUSTRATE rows;
    describe rows;
    A = FOREACH rows GENERATE FLATTEN(title);
    dump A;
    values3 = FOREACH A GENERATE FromCqlColumn(title) AS title;
    dump values3;
    describe values3;

I get this error:

    ....
    -------------------------------------------------------------
    | rows     | id:chararray    | age:int    | title:chararray    |
    -------------------------------------------------------------
    |          | (id, 5)         | (age, 30)  | (title, QA)        |
    -------------------------------------------------------------

    rows: {id: chararray,age: int,title: chararray}
    ...
    (title,QA)
    (title,QA)
    ..
    2013-09-02 16:40:52,454 [Thread-11] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
    java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
        at com.megatome.pig.piggybank.tuple.ColumnBase.exec(ColumnBase.java:32)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:434)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
    2013-09-02 16:40:52,832 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003

8-|

Regards
Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com


2013/9/2 Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com>

> Hi all,
>
> More info:
>
> https://issues.apache.org/jira/browse/CASSANDRA-5941
>
> I tried this (and built Cassandra 1.2.9), but it does not work for me:
>
>     git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
>     cd cassandra
>     git checkout cassandra-1.2
>     patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
>     ant
>
>
> 2013/9/2 Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com>
>
>> Good, nice job!!!
>>
>> I had been testing with a UDF that only handles the string schema type;
>> this is better and more elaborate work.
>>
>> Regards
>>
>>
>> 2013/8/31 Chad Johnston <cjohns...@megatome.com>
>>
>>> I threw together a quick UDF to work around this issue. It just extracts
>>> the value portion of the tuple while taking advantage of the CqlStorage
>>> generated schema to keep the type correct.
>>>
>>> You can get it here: https://github.com/iamthechad/cqlstorage-udf
>>>
>>> I'll see if I can find more useful information and open a defect, since
>>> that's what this seems to be.
>>>
>>> Chad
>>>
>>>
>>> On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera <
>>> mianmarjun.mailingl...@gmail.com> wrote:
>>>
>>>> I tried this:
>>>>
>>>>     rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
>>>>     dump rows;
>>>>     ILLUSTRATE rows;
>>>>     describe rows;
>>>>
>>>>     values2 = FOREACH rows GENERATE TOTUPLE (id) as (mycolumn:tuple(name,value));
>>>>     dump values2;
>>>>     describe values2;
>>>>
>>>> But I get these results:
>>>>
>>>>     -------------------------------------------------------------
>>>>     | rows     | id:chararray    | age:int    | title:chararray    |
>>>>     -------------------------------------------------------------
>>>>     |          | (id, 6)         | (age, 30)  | (title, QA)        |
>>>>     -------------------------------------------------------------
>>>>
>>>>     rows: {id: chararray,age: int,title: chararray}
>>>>     2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>>     ERROR 1031: Incompatable field schema: left is
>>>>     "tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is
>>>>     "org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)"
>>>>
>>>> Or, with:
>>>>
>>>>     ....
>>>>     values2 = FOREACH rows GENERATE TOTUPLE (id);
>>>>     dump values2;
>>>>     describe values2;
>>>>
>>>> the results are:
>>>>
>>>>     ...
>>>>     (((id,6)))
>>>>     (((id,5)))
>>>>     values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}
>>>>
>>>> Aggg!!!!!
>>>>
>>>>
>>>> 2013/8/26 Miguel Angel Martin junquera <
>>>> mianmarjun.mailingl...@gmail.com>
>>>>
>>>>> Hi Chad,
>>>>>
>>>>> I have this issue too.
>>>>>
>>>>> I sent a mail to the pig-user list, but I still cannot resolve it and
>>>>> I cannot access the column values.
>>>>> In that mail I describe some things I tried without success, along
>>>>> with more information about the issue:
>>>>>
>>>>> http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E
>>>>>
>>>>> I hope someone replies with a comment, idea or solution for this
>>>>> issue or bug.
>>>>>
>>>>> I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but
>>>>> I do not have an environment configured to debug and trace this issue.
>>>>>
>>>>> I only found comments like this one, which I do not fully understand:
>>>>>
>>>>>     /**
>>>>>      * A LoadStoreFunc for retrieving data from and storing data to Cassandra
>>>>>      *
>>>>>      * A row from a standard CF will be returned as nested tuples:
>>>>>      * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
>>>>>      */
>>>>>
>>>>> If you find an idea or solution, please post it.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> 2013/8/23 Chad Johnston <cjohns...@megatome.com>
>>>>>
>>>>>> (I'm using Cassandra 1.2.8 and Pig 0.11.1)
>>>>>>
>>>>>> I'm loading some simple data from Cassandra into Pig using
>>>>>> CqlStorage. The CqlStorage loader defines a Pig schema based on the
>>>>>> Cassandra schema, but it seems to be wrong.
>>>>>>
>>>>>> If I do:
>>>>>>
>>>>>>     data = LOAD 'cql://bookdata/books' USING CqlStorage();
>>>>>>     DESCRIBE data;
>>>>>>
>>>>>> I get this:
>>>>>>
>>>>>>     data: {isbn: chararray,bookauthor: chararray,booktitle:
>>>>>>     chararray,publisher: chararray,yearofpublication: int}
>>>>>>
>>>>>> However, if I DUMP data, I get results like these:
>>>>>>
>>>>>>     ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in
>>>>>>     the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
>>>>>>
>>>>>> Clearly the results from Cassandra are key/value pairs, as would be
>>>>>> expected. I don't know why the schema generated by CqlStorage() would
>>>>>> be so different.
>>>>>>
>>>>>> This is really causing me problems trying to access the column
>>>>>> values. I tried a naive approach of FLATTENing each tuple, then trying
>>>>>> to access the values that way:
>>>>>>
>>>>>>     flattened = FOREACH data GENERATE
>>>>>>         FLATTEN(isbn),
>>>>>>         FLATTEN(booktitle),
>>>>>>         ...
>>>>>>     values = FOREACH flattened GENERATE
>>>>>>         $1 AS ISBN,
>>>>>>         $3 AS BookTitle,
>>>>>>         ...
>>>>>>
>>>>>> As soon as I try to access field $5, Pig complains about the index
>>>>>> being out of bounds.
>>>>>>
>>>>>> Is there a way to solve the schema/reality mismatch? Am I doing
>>>>>> something wrong, or have I stumbled across a defect?
>>>>>>
>>>>>> Thanks,
>>>>>> Chad
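
A possible reading of the ClassCastException reported in the newest message of this thread (the one at the top): the output of `dump A` there is (title,QA), which suggests that FLATTEN(title) has already split the (name, value) tuple into two plain strings, so FromCqlColumn receives a String where it expects a Tuple. The following is only a minimal sketch, assuming the UDF is meant to take the column straight from the loaded relation, as in the data_raw usage example in that same message. It reuses the jar path, keyspace and table from that message and has not been run against a cluster.

```pig
-- Sketch only: same jar path, keyspace and table as the failing script.
REGISTER './libs/cqlstorageudf-1.0-SNAPSHOT.jar';
DEFINE FromCqlColumn com.megatome.pig.piggybank.tuple.FromCqlColumn();

rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();

-- Assumption: no FLATTEN step. The column is passed while it is still
-- the (name, value) tuple that CqlStorage returns at runtime.
values3 = FOREACH rows GENERATE FromCqlColumn(title) AS title;

DUMP values3;
DESCRIBE values3;
```

The only change from the failing script is that the intermediate relation A is dropped, so the UDF sees the tuple rather than its flattened fields. Whether this resolves the error depends on how ColumnBase.exec handles its input, which the thread does not show.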