Good/nice job!!!

I had been testing with a UDF that only handled the string schema type; this is better and more elaborate work.

Regards,
Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com

2013/8/31 Chad Johnston <cjohns...@megatome.com>

> I threw together a quick UDF to work around this issue. It just extracts
> the value portion of the tuple while taking advantage of the CqlStorage
> generated schema to keep the type correct.
>
> You can get it here: https://github.com/iamthechad/cqlstorage-udf
>
> I'll see if I can find more useful information and open a defect, since
> that's what this seems to be.
>
> Chad
>
>
> On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera <
> mianmarjun.mailingl...@gmail.com> wrote:
>
>> I tried this:
>>
>> rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
>> dump rows;
>> ILLUSTRATE rows;
>> describe rows;
>>
>> values2 = FOREACH rows GENERATE TOTUPLE (id) as (mycolumn:tuple(name,value));
>> dump values2;
>> describe values2;
>>
>> But I get these results:
>>
>> -----------------------------------------------------
>> | rows | id:chararray | age:int   | title:chararray |
>> -----------------------------------------------------
>> |      | (id, 6)      | (age, 30) | (title, QA)     |
>> -----------------------------------------------------
>>
>> rows: {id: chararray,age: int,title: chararray}
>> 2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1031: Incompatable field schema: left is
>> "tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is
>> "org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)"
>>
>> or
>>
>> ....
>> values2 = FOREACH rows GENERATE TOTUPLE (id);
>> dump values2;
>> describe values2;
>>
>> and the results are:
>>
>> ...
>> (((id,6)))
>> (((id,5)))
>> values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}
>>
>> Aggg!!!!!
>>
>> Miguel Angel Martín Junquera
>> Analyst Engineer.
>> miguelangel.mar...@brainsins.com
>>
>> 2013/8/26 Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com>
>>
>>> Hi Chad,
>>>
>>> I have this issue too.
>>>
>>> I sent a mail to the user-pig list and I still can't resolve this; I
>>> cannot access the column values.
>>> In that mail I describe some things I tried without results, along with
>>> information about this issue:
>>>
>>> http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E
>>>
>>> I hope someone replies with a comment, idea, or solution for this issue
>>> or bug.
>>>
>>> I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I
>>> have not configured the environment to debug and trace this issue.
>>>
>>> I only found some comments like this one, which I do not fully understand:
>>>
>>> /**
>>>  * A LoadStoreFunc for retrieving data from and storing data to Cassandra
>>>  *
>>>  * A row from a standard CF will be returned as nested tuples:
>>>  * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
>>>  */
>>>
>>> If you find some idea or solution, please post it.
>>>
>>> Thanks
>>>
>>> 2013/8/23 Chad Johnston <cjohns...@megatome.com>
>>>
>>>> (I'm using Cassandra 1.2.8 and Pig 0.11.1)
>>>>
>>>> I'm loading some simple data from Cassandra into Pig using CqlStorage.
>>>> The CqlStorage loader defines a Pig schema based on the Cassandra schema,
>>>> but it seems to be wrong.
>>>>
>>>> If I do:
>>>>
>>>> data = LOAD 'cql://bookdata/books' USING CqlStorage();
>>>> DESCRIBE data;
>>>>
>>>> I get this:
>>>>
>>>> data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}
>>>>
>>>> However, if I DUMP data, I get results like these:
>>>>
>>>> ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
>>>>
>>>> Clearly the results from Cassandra are key/value pairs, as would be
>>>> expected. I don't know why the schema generated by CqlStorage() would be so
>>>> different.
>>>>
>>>> This is really causing me problems trying to access the column values.
>>>> I tried a naive approach of FLATTENing each tuple, then trying to access
>>>> the values that way:
>>>>
>>>> flattened = FOREACH data GENERATE
>>>>     FLATTEN(isbn),
>>>>     FLATTEN(booktitle),
>>>>     ...
>>>> values = FOREACH flattened GENERATE
>>>>     $1 AS ISBN,
>>>>     $3 AS BookTitle,
>>>>     ...
>>>>
>>>> As soon as I try to access field $5, Pig complains about the index
>>>> being out of bounds.
>>>>
>>>> Is there a way to solve the schema/reality mismatch? Am I doing
>>>> something wrong, or have I stumbled across a defect?
>>>>
>>>> Thanks,
>>>> Chad
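Chad's linked UDF is the workaround offered in this thread. For comparison, here is a minimal sketch of the same idea (keep only the value half of each (name, value) pair) written as a Pig Jython UDF. It is hypothetical: the file name `cql_udfs.py` and the function name `pair_value` are made up for illustration. It assumes Pig hands each mis-typed column to the UDF as a Python tuple, as the DUMP output above suggests. Unlike Chad's UDF, it does not reuse the CqlStorage-generated schema, so every value comes back as chararray.

```python
# cql_udfs.py -- hypothetical Jython UDF file for Pig (name is illustrative).

try:
    outputSchema  # provided by Pig's Jython runtime when the file is REGISTERed
except NameError:
    # Outside Pig (e.g. in a unit test), fall back to a no-op decorator.
    def outputSchema(schema):
        def decorate(func):
            return func
        return decorate


def _as_text(value):
    # Strings pass through untouched (avoids re-encoding unicode on Jython 2.x);
    # anything else, such as an int column, is rendered with str().
    if isinstance(value, (type(u""), type(""))):
        return value
    return str(value)


@outputSchema("value:chararray")
def pair_value(pair):
    """Return the value half of a (name, value) column tuple as chararray.

    Assumes each CqlStorage column arrives as a 2-field tuple such as
    ('isbn', '0425093387'), matching the DUMP output in the thread.
    """
    if pair is None:
        return None
    if isinstance(pair, tuple):
        # Missing or null value half: return null rather than the column name.
        if len(pair) < 2 or pair[1] is None:
            return None
        return _as_text(pair[1])
    # Already a plain scalar (schema and data agree): pass it through.
    return _as_text(pair)
```

Untested usage, assuming the file sits next to the Pig script: `REGISTER 'cql_udfs.py' USING jython AS cqlfn;` followed by `books = FOREACH data GENERATE cqlfn.pair_value(isbn) AS isbn, (int) cqlfn.pair_value(yearofpublication) AS yearofpublication;`. The explicit `(int)` cast restores the numeric type that this sketch flattens to chararray.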