[ 
https://issues.apache.org/jira/browse/HIVE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273482#comment-13273482
 ] 

Phabricator commented on HIVE-1719:
-----------------------------------

cwsteinbach has commented on the revision "HIVE-1719 [jira] Move RegexSerDe out 
of hive-contrib and over to hive-serde".

INLINE COMMENTS
  contrib/src/test/queries/clientnegative/serde_regex.q:24 Since the first 
CREATE TABLE statement is expected to fail, we should probably remove the rest 
of this code.
  contrib/src/test/queries/clientpositive/serde_regex.q:3 This isn't necessary. 
QTestUtil.clearTestSideEffects() automatically drops any tables/dbs that aren't 
defined in QTestUtil.srcTables. (Also, take a look at QTestUtil.createSources() 
... I remember you asked about this yesterday).
  contrib/src/test/queries/clientpositive/serde_regex.q:43 Please remove.
  contrib/src/test/queries/clientnegative/serde_regex.q:41 Please refer to a 
system variable (set in one of the Ant build files) instead of using a relative 
path, e.g. "${system:test.src.data.dir}/files/apache.access.log"
  contrib/src/test/queries/clientpositive/serde_regex.q:38 Replace relative 
paths.
  ql/src/test/queries/clientnegative/serde_regex.q:2 Please remove.
  ql/src/test/queries/clientnegative/serde_regex.q:23 Remove the rest of this 
code if we're never going to hit it.
  ql/src/test/queries/clientnegative/serde_regex2.q:2 Please remove.
  ql/src/test/queries/clientnegative/serde_regex2.q:3 Same test as 
clientnegative/serde_regex.q?
  ql/src/test/queries/clientnegative/serde_regex2.q:20 Remove dead code.
  ql/src/test/queries/clientnegative/serde_regex3.q:1 Remove. Replace with 'USE 
default;' if you're running into problems placing a comment on the first line 
of the file.
  ql/src/test/queries/clientpositive/serde_regex.q:1 Remove.
  ql/src/test/queries/clientpositive/serde_regex.q:4 I think we should either 
roll the clientpositive tests into a single file (my preference), or modify the 
qfile file names so that they give some hint as to what is being tested. Please 
take a look at clientpositive/database.q for a good example of the former 
approach.
  ql/src/test/queries/clientpositive/serde_regex2.q:27 Please add a newline. 
You might want to configure your editor to add these automatically.
  ql/src/test/queries/clientpositive/serde_regex.q:39 It's a little dangerous 
using 'SELECT *' in tests because it's not always possible to easily determine 
from the output which column contains which field. In addition to 'SELECT *', 
please also include a query which selects a single column in the middle in of 
the table, and a query which selects a subset of two or more columns from the 
table.
  ql/src/test/queries/clientpositive/serde_regex2.q:4 I thought about this some 
 more and think we should probably throw a runtime exception in the 
deserialize() method if this condition is detected. Otherwise, we're basically 
signing up to support this behavior in the future, which is not what we want to 
do.

  Please file a JIRA for this issue (i.e. that we want to catch this condition 
at compile-time instead of runtime), and then move this test case over to the 
clientnegative directory and cite the JIRA in a comment. Thanks.

  ql/src/test/queries/clientpositive/serde_regex3.q:3 This comment implies that 
output.format.string actually did something in the past, which isn't true. 
Also, the warning message doesn't show up in the q.out file, so I think we 
should probably just skip referencing this property in the tests.
  serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:42 Need to 
update this javadoc. RegexSerDe only provides deserialization capabilities.
  serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 Formatting: 
Braces always go on their own line. This project basically uses the Sun Java 
Coding standard (http://www.oracle.com/technetwork/java/codeconv-138413.html) 
with a 100 char line limit and 2 character indent.
  serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 "Not support 
yet" is inaccurate since we never plan to support this feature. Please change 
this to "RegexSerDe does not support the serialize() method"
  serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:188 It's 
possible that this will get triggered for every row in the input file, which 
will quickly overflow the logs. We should log this at most once. Same thing 
goes for the log message below.

  Currently SerDes don't really have a good way of logging error information 
like this. We should talk offline about ways of improving this.

REVISION DETAIL
  https://reviews.facebook.net/D3141

To: JIRA, cwsteinbach
Cc: shreepadma

                
> Move RegexSerDe out of hive-contrib and over to hive-serde
> ----------------------------------------------------------
>
>                 Key: HIVE-1719
>                 URL: https://issues.apache.org/jira/browse/HIVE-1719
>             Project: Hive
>          Issue Type: Task
>          Components: Serializers/Deserializers
>            Reporter: Carl Steinbach
>            Assignee: Shreepadma Venugopalan
>         Attachments: HIVE-1719.3.patch.txt, HIVE-1719.D3051.1.patch, 
> HIVE-1719.D3051.2.patch, HIVE-1719.D3141.1.patch, HIVE-1719.patch
>
>
> RegexSerDe is as much a part of the standard Hive distribution as the other 
> SerDes
> currently in hive-serde. I think we should move it over to the hive-serde 
> module so that
> users don't have to go to the added effort of manually registering the 
> contrib jar before
> using it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to