[ https://issues.apache.org/jira/browse/HIVE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273482#comment-13273482 ]
Phabricator commented on HIVE-1719: ----------------------------------- cwsteinbach has commented on the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". INLINE COMMENTS contrib/src/test/queries/clientnegative/serde_regex.q:24 Since the first CREATE TABLE statement is expected to fail, we should probably remove the rest of this code. contrib/src/test/queries/clientpositive/serde_regex.q:3 This isn't necessary. QTestUtil.clearTestSideEffects() automatically drops any tables/dbs that aren't defined in QTestUtil.srcTables. (Also, take a look at QTestUtil.createSources() ... I remember you asked about this yesterday). contrib/src/test/queries/clientpositive/serde_regex.q:43 Please remove. contrib/src/test/queries/clientnegative/serde_regex.q:41 Please refer to a system variable (set in one of the Ant build files) instead of using a relative path, e.g. "${system:test.src.data.dir}/files/apache.access.log" contrib/src/test/queries/clientpositive/serde_regex.q:38 Replace relative paths. ql/src/test/queries/clientnegative/serde_regex.q:2 Please remove. ql/src/test/queries/clientnegative/serde_regex.q:23 Remove the rest of this code if we're never going to hit it. ql/src/test/queries/clientnegative/serde_regex2.q:2 Please remove. ql/src/test/queries/clientnegative/serde_regex2.q:3 Same test as clientnegative/serde_regex.q? ql/src/test/queries/clientnegative/serde_regex2.q:20 Remove dead code. ql/src/test/queries/clientnegative/serde_regex3.q:1 Remove. Replace with 'USE default;' if you're running into problems placing a comment on the first line of the file. ql/src/test/queries/clientpositive/serde_regex.q:1 Remove. ql/src/test/queries/clientpositive/serde_regex.q:4 I think we should either roll the clientpositive tests into a single file (my preference), or modify the qfile file names so that they give some hint as to what is being tested. Please take a look at clientpositive/database.q for a good example of the former approach. ql/src/test/queries/clientpositive/serde_regex2.q:27 Please add a newline. You might want to configure your editor to add these automatically. ql/src/test/queries/clientpositive/serde_regex.q:39 It's a little dangerous using 'SELECT *' in tests because it's not always possible to easily determine from the output which column contains which field. In addition to 'SELECT *', please also include a query which selects a single column in the middle in of the table, and a query which selects a subset of two or more columns from the table. ql/src/test/queries/clientpositive/serde_regex2.q:4 I thought about this some more and think we should probably throw a runtime exception in the deserialize() method if this condition is detected. Otherwise, we're basically signing up to support this behavior in the future, which is not what we want to do. Please file a JIRA for this issue (i.e. that we want to catch this condition at compile-time instead of runtime), and then move this test case over to the clientnegative directory and cite the JIRA in a comment. Thanks. ql/src/test/queries/clientpositive/serde_regex3.q:3 This comment implies that output.format.string actually did something in the past, which isn't true. Also, the warning message doesn't show up in the q.out file, so I think we should probably just skip referencing this property in the tests. serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:42 Need to update this javadoc. RegexSerDe only provides deserialization capabilities. serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 Formatting: Braces always go on their own line. This project basically uses the Sun Java Coding standard (http://www.oracle.com/technetwork/java/codeconv-138413.html) with a 100 char line limit and 2 character indent. serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 "Not support yet" is inaccurate since we never plan to support this feature. Please change this to "RegexSerDe does not support the serialize() method" serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:188 It's possible that this will get triggered for every row in the input file, which will quickly overflow the logs. We should log this at most once. Same thing goes for the log message below. Currently SerDes don't really have a good way of logging error information like this. We should talk offline about ways of improving this. REVISION DETAIL https://reviews.facebook.net/D3141 To: JIRA, cwsteinbach Cc: shreepadma > Move RegexSerDe out of hive-contrib and over to hive-serde > ---------------------------------------------------------- > > Key: HIVE-1719 > URL: https://issues.apache.org/jira/browse/HIVE-1719 > Project: Hive > Issue Type: Task > Components: Serializers/Deserializers > Reporter: Carl Steinbach > Assignee: Shreepadma Venugopalan > Attachments: HIVE-1719.3.patch.txt, HIVE-1719.D3051.1.patch, > HIVE-1719.D3051.2.patch, HIVE-1719.D3141.1.patch, HIVE-1719.patch > > > RegexSerDe is as much a part of the standard Hive distribution as the other > SerDes > currently in hive-serde. I think we should move it over to the hive-serde > module so that > users don't have to go to the added effort of manually registering the > contrib jar before > using it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira