On Wed, Dec 12, 2012 at 8:04 AM, <[email protected]> wrote:

> Hello Bill,
>
> The bug didn't block me or waste any time. Regarding the cast, I can't
> regenerate the bug right now because I'm running a script, but I can answer
> your questions:
>
> 1) describe of the relation passed to store returns the generated schema
> name for the tuple, as described in:
> http://bb10.com/java-hadoop-pig-devel/2011-07/msg00237.html


When you call TOTUPLE, try being explicit about the schema by adding an AS
clause.
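
A minimal sketch of that suggestion, reusing the relation and field names
from earlier in the thread. The rowKey field, the alias vals, and the inner
names v1..v3 are assumptions made for illustration, and the types follow the
(string, int, int) tuple you mentioned, with chararray being Pig's string
type:

```pig
-- rowKey is a hypothetical first field: HBaseStorage uses the first
-- field of the relation as the HBase row key.
dataTransformed = FOREACH dataToStore GENERATE
    rowKey,
    TOTUPLE(val1, val2, val3) AS vals:tuple(v1:chararray, v2:int, v3:int),
    date;

-- Check that the schema is the one declared above before storing.
DESCRIBE dataTransformed;

-- STORE takes no AS clause; the schema comes from the relation itself.
STORE dataTransformed INTO 'hbase://tableName'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:tuple cf:date');
```

Note that the tuple still ends up as the toString of the tuple in the single
column cf:tuple.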


>
>
> 2) I want to store all the values as a tuple under one key because I want
> to minimize the repetitions of the row and column keys. I didn't specify
> the caster, so I'm using the default whatever it is (I hope it is the
> binary one not the UTF8 one)
>

Default caster is UTF8, which is what you want.
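
If you want to make the choice explicit rather than rely on the default,
the caster can be passed in the options string. A sketch, assuming the
dataTransformed relation from your earlier message and space-delimited
column names:

```pig
-- Same as the default: values are written as UTF-8 strings.
STORE dataTransformed INTO 'hbase://tableName'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'cf:tuple cf:date', '-caster Utf8StorageConverter');

-- Binary alternative, shown only for comparison:
-- USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
--     'cf:tuple cf:date', '-caster HBaseBinaryConverter');
```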


>
> 3) The class cast exception says that DataByteArray cannot be cast to Tuple
>

This is a result of something in your relations before the STORE, not
HBaseStorage. It takes what's given to it, so if it's seeing
DataByteArrays, something is producing them, possibly a UDF.


>
> Regards!
>
> -- Younos
>
> Quoting Bill Graham <[email protected]>:
>
>  Thanks Younos for catching that and sorry that you got bit by it. That is
>> in fact a javadoc bug. I've just opened a JIRA for it:
>>
>> https://issues.apache.org/jira/browse/PIG-3092
>> http://pig.apache.org/docs/r0.10.0/basic.html#store
>>
>> Regarding the casting, what does DESCRIBE show for the relation you
>> pass to the STORE statement, and what do your class cast exceptions look
>> like? Which caster are you using?
>>
>> The relation you pass to STORE should be a flat relation of values, unless
>> you want to store the toString of a tuple as a single column in HBase.
>>
>>
>> On Tue, Dec 11, 2012 at 9:37 AM, <[email protected]> wrote:
>>
>>  Hi Bill,
>>>
>>> Thanks for your reply. Since this is the case, the JavaDocs of the class
>>> need to be fixed (see
>>> http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
>>> ).
>>>
>>> Also, I faced a bug that I worked around by explicit casting. For some
>>> reason all the objects passed to putNext are of type DataByteArray, while
>>> the schema reports their correct types (tuple(string, int, int), long).
>>> This causes a lot of ClassCastExceptions because DataByteArray cannot be
>>> cast to any other type. I worked around this by passing everything to the
>>> STORE as a DataByteArray.
>>>
>>> Cheers!
>>> Younos
>>>
>>> Quoting Bill Graham <[email protected]>:
>>>
>>>> The STORE command doesn't take the AS clause; that's for defining the
>>>> schema at LOAD time. When storing, just prepare your relation with the
>>>> desired schema and then STORE it without the AS.
>>>>
>>>> You can do all the transformations you need to before the STORE and Pig
>>>> will combine them all into as few logical processing steps as possible,
>>>> so no need to worry about specifying many transformation statements.
>>>>
>>>>
>>>> On Mon, Dec 10, 2012 at 7:31 PM, <[email protected]> wrote:
>>>>
>>>>  Hello,
>>>>
>>>>>
>>>>> I'm using HBaseStorage and I want to change the layout of the schema
>>>>> before storage. Specifically I want to group some values into a tuple
>>>>> (thus
>>>>> reducing the number of repetitions of the row and column keys).
>>>>>
>>>>> Even though the JavaDoc gives an example that uses AS schema Grunt
>>>>> complains that it is not parsable. Here's what I am trying:
>>>>>
>>>>> STORE dataToStore INTO 'hbase://tableName' USING
>>>>> HBaseStorage('cf:tuple,
>>>>> cf:date') AS TOTUPLE(val1, val2, val3), date;
>>>>>
>>>>> Is this possible? Or do I have to do the transformation in a separate
>>>>> step:
>>>>>
>>>>> dataTransformed = FOREACH dataToStore GENERATE TOTUPLE(val1, val2,
>>>>> val3),
>>>>> date;
>>>>>
>>>>> In case of the latter, can Pig be told to merge this step with the next
>>>>> one? I tried a nested FOREACH where I can have an assignment operation,
>>>>> but
>>>>> I quickly found out that STORE is not supported within the FOREACH..
>>>>> what
>>>>> was I thinking :).
>>>>>
>>>>> Thanks!
>>>>>
>>>>> -- Younos
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> --
>>>> *Note that I'm no longer using my Yahoo! email address. Please email me
>>>> at
>>>> [email protected] going forward.*
>>>>
>>>>
>>>>
>>>
>>> Best regards,
>>> Younos Aboulnaga
>>>
>>> Masters candidate
>>> David Cheriton school of computer science
>>> University of Waterloo
>>> http://cs.uwaterloo.ca
>>>
>>> E-Mail: [email protected]
>>> Mobile: +1 (519) 497-5669
>>>
>>>
>>>
>>>
>>>
>>
>
>


-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
[email protected] going forward.*
