[jira] [Commented] (HIVE-12025) refactor bucketId generating code

Elliot West (JIRA) Thu, 08 Oct 2015 04:23:44 -0700

    [ https://issues.apache.org/jira/browse/HIVE-12025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948498#comment-14948498 ]


Elliot West commented on HIVE-12025:
------------------------------------

This is definitely an improvement. It has broken a test, but I suspect that is
because the {{bucket_id}} calculation in {{BucketIdResolverImpl}} is currently
incorrect, so the test's expectations are incorrect too. That in itself
underlines why this refactoring is desirable.

May I suggest the following patch to fix the failing test:

{code}
diff --git hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
index f81373e..5297c5d 100644
--- hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
+++ hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
@@ -23,7 +23,7 @@
   public void testAttachBucketIdToRecord() {
     MutableRecord record = new MutableRecord(1, "hello");
     capturingBucketIdResolver.attachBucketIdToRecord(record);
-    assertThat(record.rowId, is(new RecordIdentifier(-1L, 8, -1L)));
+    assertThat(record.rowId, is(new RecordIdentifier(-1L, 1, -1L)));
     assertThat(record.id, is(1));
     assertThat(record.msg.toString(), is("hello"));
   }
{code}
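
For context, here is a rough sketch of the kind of computation this issue wants
to centralise: combine the hash codes of the bucketing fields, then map the
result onto a bucket. It assumes the usual "31 * h + fieldHash" combination and
a mask-then-modulo step. The class and method names are hypothetical, and plain
Java {{hashCode()}} stands in for Hive's ObjectInspector-based hashing, so the
values it produces are illustrative only and need not match Hive's.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for illustration; this is not Hive's code. */
public final class BucketIdSketch {

    private BucketIdSketch() {
    }

    /**
     * Folds the hash codes of the bucketing fields into a single int.
     * Assumption: Java's hashCode() replaces ObjectInspector-based hashing.
     */
    public static int bucketHashCode(List<?> bucketFields) {
        int hashCode = 0;
        for (Object field : bucketFields) {
            int fieldHash = (field == null) ? 0 : field.hashCode();
            hashCode = 31 * hashCode + fieldHash;
        }
        return hashCode;
    }

    /**
     * Maps a hash code onto the range [0, totalBuckets).
     * Masking with Integer.MAX_VALUE clears the sign bit, so a negative
     * hash code cannot yield a negative bucket number.
     */
    public static int bucketNumber(int hashCode, int totalBuckets) {
        if (totalBuckets <= 0) {
            throw new IllegalArgumentException("totalBuckets must be positive");
        }
        return (hashCode & Integer.MAX_VALUE) % totalBuckets;
    }

    public static void main(String[] args) {
        // Bucket a record on a single integer field; 4 buckets is an arbitrary choice.
        int hash = bucketHashCode(Arrays.asList(1));
        System.out.println(bucketNumber(hash, 4));
    }
}
```

If every caller goes through one pair of methods like this, a test expectation
such as the one patched above only has to agree with a single implementation.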

> refactor bucketId generating code
> ---------------------------------
>
>                 Key: HIVE-12025
>                 URL: https://issues.apache.org/jira/browse/HIVE-12025
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.0.1
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>         Attachments: HIVE-12025.2.patch, HIVE-12025.patch
>
>
> HIVE-11983 adds ObjectInspectorUtils.getBucketHashCode() and 
> getBucketNumber().
> There are at least several places in Hive that perform this computation:
> # ReduceSinkOperator.computeBucketNumber
> # ReduceSinkOperator.computeHashCode
> # BucketIdResolverImpl - only in 2.0.0 ASF line
> # FileSinkOperator.findWriterOffset
> # GenericUDFHash
> We should refactor these so that they all call the methods in
> ObjectInspectorUtils.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
