Rajesh,
Thanks so much for your answers. However, I am struggling to get the right 
information. 

As you mentioned, keys and values are present in ReduceSinkOperator.java, but
I am having a hard time printing their content.

For the key, I am trying to print it in ReduceSinkOperator.java -> toHiveKey(),
right after:

    BinaryComparable key =
        (BinaryComparable) keySerializer.serialize(obj, keyObjectInspector);

by doing something like:

    StructObjectInspector soi = (StructObjectInspector) keyObjectInspector;
    for (Object element : soi.getStructFieldsDataAsList(obj)) {
      LOG.info("key is: " + String.valueOf(element));
    }
For the value, in ReduceSinkOperator.java -> process(), after the value is
computed as

    BytesWritable value = makeValueWritable(row);

I am trying to apply the same mechanism as before:

    StructObjectInspector soi = (StructObjectInspector) valueObjectInspector;
    for (Object element : soi.getStructFieldsDataAsList(value)) {
      LOG.info("value is: " + String.valueOf(element));
    }

but there is a cast problem here with the BytesWritable value (it is not the
struct that the StructObjectInspector expects).
I also tried to print the value in the makeValueWritable() method, but there the
value content seems to be the same as the key content.
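A minimal, self-contained sketch of why the value loop fails, assuming the value inspector behaves like Hive's standard struct inspector, which (as far as I understand) reads a struct only from a List or an Object[]. The classes below are hypothetical stand-ins, not Hive APIs: SerializedValue plays the role of BytesWritable, and getStructFieldsDataAsList only mimics the inspector method of the same name. The inspector can read the row before serialization, but not the serialized bytes:

```java
import java.util.Arrays;
import java.util.List;

final class StructInspectDemo {

    // Hypothetical stand-in for a serialized value such as BytesWritable:
    // a byte buffer plus the number of valid bytes.
    static final class SerializedValue {
        final byte[] bytes;
        final int length;

        SerializedValue(byte[] bytes, int length) {
            this.bytes = bytes;
            this.length = length;
        }
    }

    // Hypothetical stand-in for a standard struct inspector. It assumes the
    // data is the unserialized struct (a List or an Object[]); anything else
    // fails the cast, which is the kind of error seen when serialized bytes
    // are passed in.
    static List<Object> getStructFieldsDataAsList(Object data) {
        if (data == null) {
            return null;
        }
        if (data instanceof List) {
            @SuppressWarnings("unchecked")
            List<Object> list = (List<Object>) data;
            return list;
        }
        return Arrays.asList((Object[]) data); // ClassCastException for other types
    }

    public static void main(String[] args) {
        // The value row as it exists before serialization (evaluated columns).
        Object[] row = {42, "AAAAAAAABAAAAAAA", 3.5};
        for (Object element : getStructFieldsDataAsList(row)) {
            System.out.println("value is: " + element);
        }

        // After serialization the row is only bytes; the inspector cannot read it.
        SerializedValue serialized = new SerializedValue(new byte[] {1, 2, 3}, 3);
        try {
            getStructFieldsDataAsList(serialized);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

If that assumption holds, the value loop would presumably need the unserialized value row (the evaluated value columns that makeValueWritable() hands to its serializer) rather than the resulting BytesWritable.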

Do you have any idea whether I am doing the right thing, or what else I could do
to extract the proper content of both the key and the value in ReduceSinkOperator?
Thanks again for your help,
Robert
 

    On Sunday, December 4, 2016 8:15 PM, Rajesh Balamohan 
<rbalamo...@apache.org> wrote:
 

Hi Robert,
Tez deals with bytes and does not know whether the data is coming from
Hive/Pig/Cascading etc. So if you print the content of a Hive job from within
Tez, you would mostly get binary data. For Hive, the key would be
org.apache.hadoop.hive.ql.io.HiveKey and the value would be
org.apache.hadoop.io.BytesWritable; printing these would just churn out binary
contents. You can print them from the following locations in Tez.
Writing key/values:
https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java#L375
Reading key/values:
https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/ValuesIterator.java#L186
https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/ValuesIterator.java#L213
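When printing from the Tez locations above, a hex dump is usually easier to read than raw bytes. A minimal sketch with a hypothetical helper name, dump: it takes a byte array plus a valid length, matching the getBytes()/getLength() pair that BytesWritable (and therefore HiveKey) exposes, because the backing array can be longer than the valid data:

```java
import java.nio.charset.StandardCharsets;

final class KeyValueDump {

    // Renders the first `length` bytes of a buffer as hex, followed by a
    // printable-ASCII column. Bytes past `length` are ignored, since a
    // BytesWritable's backing array may have unused trailing capacity.
    static String dump(byte[] bytes, int length) {
        StringBuilder hex = new StringBuilder();
        StringBuilder ascii = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int b = bytes[i] & 0xff;
            if (i > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", b));
            // Replace non-printable bytes with '.' so the line stays readable.
            ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
        }
        return hex + " |" + ascii + "|";
    }

    public static void main(String[] args) {
        // Hypothetical serialized key: a marker byte followed by UTF-8 text,
        // inside a larger buffer whose unused tail must not be printed.
        byte[] buffer = new byte[16];
        buffer[0] = 0x01;
        byte[] text = "abc".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(text, 0, buffer, 1, text.length);
        System.out.println(dump(buffer, 1 + text.length)); // prints "01 61 62 63 |.abc|"
    }
}
```

Assuming the key at those locations is a BytesWritable, a call such as dump(key.getBytes(), key.getLength()) would give one readable line per record; it is still the serialized form, not the original columns.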
If you are interested in knowing the real key/value details, you may want to
print them from the Hive side; this may be best answered on the Hive community
mailing list. But at a very high level, in Hive the key gets converted to a
HiveKey, which is a wrapper around BytesWritable. You may want to print the
key/value details using the relevant object inspector in Hive, e.g.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L526.
In this case, you may want to get the relevant object inspector and print out
the contents. This is just an example.
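To illustrate why a serialized key/value becomes readable again only through the matching deserializer (or by inspecting the object before it is serialized), here is a hypothetical stand-in SerDe that writes an (int, String) row with DataOutputStream. Hive's real key and value serializers use their own binary formats, so this is only a sketch of the idea, not Hive code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class RoundTripDemo {

    // Hypothetical stand-in serializer: writes the fields in a fixed order.
    static byte[] serialize(int id, String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);    // 4 bytes, big-endian
            out.writeUTF(name);  // 2-byte length prefix + modified UTF-8
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Matching stand-in deserializer: must know the schema (field order and types).
    static Object[] deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            String name = in.readUTF();
            return new Object[] {id, name};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] serialized = serialize(7, "store_sales");
        // Printing the bytes directly shows numbers, not the row.
        System.out.println(Arrays.toString(serialized));
        // Deserializing with the matching schema recovers the fields.
        System.out.println(Arrays.toString(deserialize(serialized))); // prints "[7, store_sales]"
    }
}
```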
~Rajesh.B

On Mon, Dec 5, 2016 at 5:43 AM, Robert Grandl <rgra...@yahoo.com> wrote:

Hi guys,
I am running Hive atop Tez and run several TPC-DS / TPC-H queries. I am trying 
to print the Key/Value pairs received as input by each vertex and generated as 
output accordingly. 

However, looking at the Hive / Tez code, it seems they are converted to the
Object type and passed along in their serialized form. I would like to print the
original content of the <Key, Value> pairs both when they are generated and when
they are received by a vertex (just for the purpose of understanding).

Could you please give me some hints on how I can do that?
Thank you,
Robert





   
