On Wed, Dec 12, 2012 at 8:04 AM, <[email protected]> wrote:

> Hello Bill,
>
> The bug didn't block me or waste any time. Regarding the cast, I can't
> regenerate the bug right now because I'm running a script, but I can answer
> your questions:
>
> 1) describe of the relation passed to store returns the generated schema
> name for the tuple, as described in:
> http://bb10.com/java-hadoop-pig-devel/2011-07/msg00237.html
When you do TOTUPLE, try being explicit with the schema with an AS statement.

> 2) I want to store all the values as a tuple under one key because I want
> to minimize the repetitions of the row and column keys. I didn't specify
> the caster, so I'm using the default whatever it is (I hope it is the
> binary one not the UTF8 one)

Default caster is UTF8, which is what you want.

> 3) The class cast exception says that DataByteArray cannot be cast to Tuple

This is a result of something in your relations before the STORE, not
HBaseStorage. It takes what's given to it, so if it's seeing DataByteArrays,
something is producing them, possibly a UDF.

> Regards!
>
> -- Younos
>
> Quoting Bill Graham <[email protected]>:
>
>> Thanks Younos for catching that and sorry that you got bit by it. That is
>> in fact a javadoc bug. I've just opened a JIRA for it:
>>
>> https://issues.apache.org/jira/browse/PIG-3092
>> http://pig.apache.org/docs/r0.10.0/basic.html#store
>>
>> Regarding the casting, what does describe look like of the relation you
>> pass to the STORE statement and what do your class cast exceptions look
>> like? Which caster are you using?
>>
>> The relation you pass to STORE should be a flat relation of values, unless
>> you want to store the toString of a tuple as a single column in HBase.
>>
>> On Tue, Dec 11, 2012 at 9:37 AM, <[email protected]> wrote:
>>
>>> Hi Bill,
>>>
>>> Thanks for your reply. Since this is the case then JavaDocs of the class
>>> needs to be fixed (see
>>> http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
>>> ).
>>>
>>> Also, I faced a bug that I worked around by explicit casting. For some
>>> reason all the objects passed to putNext are of type DataByteArray, while
>>> the schema reports their correct types (tuple(string, int, int), long).
>>> This causes a lot of ClassCastExceptions because DataByteArray cannot be
>>> cast to any other type. I worked around this by passing everything to the
>>> STORE as a DataByteArray.
>>>
>>> Cheers!
>>> Younos
>>>
>>> Quoting Bill Graham <[email protected]>:
>>>
>>>> The STORE command doesn't take the AS clause, that's to define the schema
>>>> at LOAD time. When storing, just prepare your relation with the desired
>>>> schema and then STORE it without the AS.
>>>>
>>>> You can do all the transformations you need to before the STORE and Pig
>>>> will combine them all into as few logical processing steps as possible, so
>>>> no need to worry about specifying many transformation statements.
>>>>
>>>> On Mon, Dec 10, 2012 at 7:31 PM, <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm using HBaseStorage and I want to change the layout of the schema
>>>>> before storage. Specifically I want to group some values into a tuple
>>>>> (thus reducing the number of repetitions of the row and column keys).
>>>>>
>>>>> Even though the JavaDoc gives an example that uses AS schema, Grunt
>>>>> complains that it is not parsable. Here's what I am trying:
>>>>>
>>>>> STORE dataToStore INTO 'hbase://tableName' USING
>>>>> HBaseStorage('cf:tuple, cf:date') AS TOTUPLE(val1, val2, val3), date;
>>>>>
>>>>> Is this possible? Or do I have to do the transformation in a separate
>>>>> step:
>>>>>
>>>>> dataTransformed = FOREACH dataToStore GENERATE TOTUPLE(val1, val2,
>>>>> val3), date;
>>>>>
>>>>> In case of the latter, can Pig be told to merge this step with the next
>>>>> one?
>>>>> I tried a nested FOREACH where I can have an assignment operation, but
>>>>> I quickly found out that STORE is not supported within the FOREACH..
>>>>> what was I thinking :).
>>>>>
>>>>> Thanks!
>>>>>
>>>>> -- Younos
>>>>
>>>> --
>>>> *Note that I'm no longer using my Yahoo! email address. Please email me
>>>> at [email protected] going forward.*
>>>
>>> Best regards,
>>> Younos Aboulnaga
>>>
>>> Masters candidate
>>> David Cheriton school of computer science
>>> University of Waterloo
>>> http://cs.uwaterloo.ca
>>>
>>> E-Mail: [email protected]
>>> Mobile: +1 (519) 497-5669
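
For reference, the approach suggested in this thread (build the tuple in a separate FOREACH, name it explicitly with AS, then STORE without an AS clause) might look roughly like the sketch below. It is untested and rests on several assumptions that are not in the thread: the input path, the rowkey field, the field types, and the alias names vals, v1, v2 and v3 are all hypothetical. The column list is written space-delimited, as in the HBaseStorage javadoc examples.

```pig
-- Hypothetical input: the path, the rowkey field and all types are assumptions.
dataToStore = LOAD 'input' AS (rowkey:chararray, val1:chararray, val2:int,
                               val3:int, date:long);

-- Do the transformation in its own statement and give the tuple an explicit
-- schema with AS, so DESCRIBE does not report a generated name.
dataTransformed = FOREACH dataToStore GENERATE
    rowkey,
    TOTUPLE(val1, val2, val3) AS vals:tuple(v1:chararray, v2:int, v3:int),
    date;

-- Check the schema before storing.
DESCRIBE dataTransformed;

-- STORE takes no AS clause. HBaseStorage uses the first field as the row key
-- and maps the remaining fields to the listed columns in order.
STORE dataTransformed INTO 'hbase://tableName'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:tuple cf:date');
```

As noted earlier in the thread, with the default UTF8 caster the tuple ends up in HBase as its toString in a single column, and Pig combines the FOREACH and the STORE into as few processing steps as it can, so the extra statement should not cost an extra pass.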
