[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused

Sushanth Sowmyan (JIRA) Mon, 29 Jun 2015 10:42:41 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605959#comment-14605959
 ]


Sushanth Sowmyan commented on HIVE-10983:
-----------------------------------------

Not a problem! As part of the release process, I'm required to unset the fix 
version on all jiras still marked for older, already-released versions, and 
that's what I was doing. :)

To expand further, the idea is that Fix Version tracks which branches a patch 
has actually been committed to, so it should not be set until the patch has 
been committed to those branches. For example, if this patch is committed to 
branch-1.2 (which tracks 1.2.x), its fix version becomes 1.2.2 at that point. 
Setting it to 1.2.0 would mean that the patch was included in the 1.2.0 
release, which it wasn't. So whichever committer commits the patch for this 
bug to branch-1.2 should then set the fix version to 1.2.2.


> SerDeUtils bug  ,when Text is reused 
> -------------------------------------
>
>                 Key: HIVE-10983
>                 URL: https://issues.apache.org/jira/browse/HIVE-10983
>             Project: Hive
>          Issue Type: Bug
>          Components: API, CLI
>    Affects Versions: 0.14.0, 1.0.0, 1.2.0
>         Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>            Reporter: xiaowei wang
>            Assignee: xiaowei wang
>              Labels: patch
>             Fix For: 2.0.0
>
>         Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, 
> HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt
>
>
> {noformat}
> The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they 
> call the wrong method of Text, getBytes().
> Text.getBytes() returns the raw backing bytes; however, only data up to 
> Text.getLength() is valid. A better way is to use copyBytes() if you need the 
> returned array to be precisely the length of the data.
> However, copyBytes() was only added after Hadoop 1. 
> {noformat}
> When I query data from an LZO table, I find that in the results the length 
> of the current row is always larger than that of the previous row, and 
> sometimes the current row contains the contents of the previous row. For 
> example, I execute this SQL:
> {code:sql}
> select *   from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains 
> content from the first row.
> {noformat}
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> {noformat}
> The content of the original LZO file is shown below; it has just 2 rows.
> {noformat}
> INFO [03:00:05.635] <b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb> 
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> {noformat}
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is: 
> {code:sql}
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U0000'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>           OUTPUTFORMAT 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
> {code}
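
To illustrate the failure mode described in the report, here is a minimal sketch in plain Java. It is not the Hive patch and does not use Hadoop: `ReusableText` is a hypothetical stand-in that mimics how `org.apache.hadoop.io.Text` keeps its backing array when a shorter value is set, and the helper names `decodeWholeBuffer` and `decodeValidRange` are invented for this example. US-ASCII is used instead of GBK only to keep the sample readable.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReusedBufferDemo {

    /** Hypothetical stand-in for a reusable Text-like buffer (assumption, not Hadoop code). */
    public static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Copies src into the buffer, growing it only when src does not fit. */
        public void set(byte[] src) {
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the raw backing array; only the first getLength() bytes are valid. */
        public byte[] getBytes() { return bytes; }

        public int getLength() { return length; }

        /** Returns a copy that is exactly getLength() bytes long. */
        public byte[] copyBytes() { return Arrays.copyOf(bytes, length); }
    }

    /** Buggy pattern: decodes the entire backing array, including stale bytes. */
    public static String decodeWholeBuffer(ReusableText t, Charset cs) {
        return new String(t.getBytes(), cs);
    }

    /** Safe pattern: decodes only the valid range of the backing array. */
    public static String decodeValidRange(ReusableText t, Charset cs) {
        return new String(t.getBytes(), 0, t.getLength(), cs);
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.US_ASCII;
        ReusableText t = new ReusableText();
        t.set("session=3151,thread=254".getBytes(cs)); // first (longer) row
        t.set("thread=223".getBytes(cs));              // second (shorter) row reuses the buffer

        System.out.println(decodeWholeBuffer(t, cs));  // second row followed by the stale tail of the first
        System.out.println(decodeValidRange(t, cs));   // only the second row
    }
}
```

Decoding with an explicit offset and length avoids the extra array copy that `copyBytes()` makes and does not depend on `copyBytes()` being available, which matters given the report's note that it is missing from Hadoop 1.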



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
