[ https://issues.apache.org/jira/browse/HIVE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853254#comment-15853254 ]
Hive QA commented on HIVE-664:
------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851083/HIVE-664.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10226 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=153)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption (batchId=277)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3387/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3387/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3387/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12851083 - PreCommit-HIVE-Build

> optimize UDF split
> ------------------
>
>          Key: HIVE-664
>          URL: https://issues.apache.org/jira/browse/HIVE-664
>      Project: Hive
>   Issue Type: Bug
>   Components: UDF
>     Reporter: Namit Jain
>     Assignee: Teddy Choi
>       Labels: optimization
>  Attachments: HIVE-664.1.patch.txt, HIVE-664.2.patch.txt,
>               HIVE-664.3.patch.txt, HIVE-664.4.patch
>
>
> Min Zhou added a comment - 21/Jul/09 07:34 AM
> It's very useful for us.
> Some comments:
> 1. Can you implement it directly with Text? Avoiding string decoding and
> encoding would be faster. Of course, that trick may lead to another problem,
> as String.split uses a regular expression for splitting.
> 2. getDisplayString() always returns a string in lowercase.
>
> Namit Jain added a comment - 21/Jul/09 09:22 AM
> Committed. Thanks Emil
>
> Emil Ibrishimov added a comment - 21/Jul/09 10:48 AM
> There are some easy (compromise) ways to optimize split:
> 1. Check if the regex argument actually contains some "regex specific
> characters" and, if it doesn't, do a straightforward split without converting
> to strings.
> 2. Assume some default value for the second argument (for example,
> split(str) being equivalent to split(str, ' ')) and optimize for this value.
> 3. Have two separate split functions: one that does regex and one that
> splits around plain text.
> I think that 1 is a good choice and can be done rather quickly.
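
Option 1 in Emil's comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from any of the attached patches: the class name LiteralSplitSketch, its method names, and the metacharacter list are hypothetical, and the sketch works on java.lang.String for brevity, whereas the point of the proposal is to apply the same check and then scan the raw bytes of a Hadoop Text value without decoding it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: choose a plain-text split when the delimiter has no regex syntax.
final class LiteralSplitSketch {
    // Characters that have special meaning in a java.util.regex pattern (default flags).
    private static final String REGEX_META = "\\[]{}()*+?.^$|";

    // True when the delimiter is non-empty and contains no regex metacharacters.
    public static boolean isPlainLiteral(String delimiter) {
        if (delimiter.isEmpty()) {
            return false;
        }
        for (int i = 0; i < delimiter.length(); i++) {
            if (REGEX_META.indexOf(delimiter.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static List<String> split(String input, String delimiter) {
        List<String> parts = new ArrayList<>();
        if (!isPlainLiteral(delimiter)) {
            // Slow path: delegate to the regex engine. A limit of -1 keeps
            // trailing empty strings so both paths return the same pieces.
            for (String piece : input.split(delimiter, -1)) {
                parts.add(piece);
            }
            return parts;
        }
        // Fast path: scan for the literal delimiter with indexOf.
        int start = 0;
        int hit;
        while ((hit = input.indexOf(delimiter, start)) >= 0) {
            parts.add(input.substring(start, hit));
            start = hit + delimiter.length();
        }
        parts.add(input.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("a,b,,c", ","));      // literal delimiter, fast path
        System.out.println(split("a1b22c", "[0-9]+")); // real regex, slow path
    }
}
```

The metacharacter test is deliberately conservative: an escaped delimiter such as "\\." is still sent down the regex path, which costs speed but never changes the result.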

--
This message was sent by Atlassian JIRA (v6.3.15#6346)