[ https://issues.apache.org/jira/browse/HIVE-20377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582298#comment-16582298 ]
Hive QA commented on HIVE-20377:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12935771/HIVE-20377.8.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13264/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13264/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13264/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-08-16 09:56:40.678
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-13264/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-08-16 09:56:40.683
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 109439c HIVE-20393 : Semijoin Reduction : markSemiJoinForDPP behaves inconsistently (Deepak Jaiswal, reviewed by Ashutosh Chauhan)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 109439c HIVE-20393 : Semijoin Reduction : markSemiJoinForDPP behaves inconsistently (Deepak Jaiswal, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-08-16 09:56:42.010
+ rm -rf ../yetus_PreCommit-HIVE-Build-13264
+ mkdir ../yetus_PreCommit-HIVE-Build-13264
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-13264
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-13264/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:3991: trailing whitespace.
__time timestamp from deserializer
/data/hiveptest/working/scratch/build.patch:3992: trailing whitespace.
page string from deserializer
/data/hiveptest/working/scratch/build.patch:3993: trailing whitespace.
user string from deserializer
/data/hiveptest/working/scratch/build.patch:3994: trailing whitespace.
language string from deserializer
/data/hiveptest/working/scratch/build.patch:3995: trailing whitespace.
country string from deserializer
warning: squelched 13 whitespace errors
warning: 18 lines add whitespace errors.
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven
protoc-jar: executing: [/tmp/protoc6423815627383744741.exe, --version]
libprotoc 2.5.0
protoc-jar: executing: [/tmp/protoc6423815627383744741.exe, -I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore, --java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources, /data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
ANTLR Parser Generator  Version 3.5.2
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (process-resource-bundles) on project hive-shims: Execution process-resource-bundles of goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process failed. ConcurrentModificationException -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hive-shims
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-13264
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12935771 - PreCommit-HIVE-Build


> Hive Kafka Storage Handler
> --------------------------
>
>                 Key: HIVE-20377
>                 URL: https://issues.apache.org/jira/browse/HIVE-20377
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 4.0.0
>            Reporter: slim bouguerra
>            Assignee: slim bouguerra
>            Priority: Major
>         Attachments: HIVE-20377.4.patch, HIVE-20377.5.patch, HIVE-20377.6.patch, HIVE-20377.8.patch, HIVE-20377.8.patch, HIVE-20377.patch
>
>
> h1. Goal
> * Read streaming data from a Kafka topic as an external table.
> * Allow streaming navigation by pushing down filters on the Kafka record partition id, offset, and timestamp.
> * Insert streaming data from Kafka into an actual Hive internal table using a CTAS statement (a sketch follows the create-table example below).
> h1. Example
> h2. Create the external table
> {code}
> CREATE EXTERNAL TABLE kafka_table (
>   `timestamp` timestamp, page string, `user` string, language string,
>   added int, deleted int, flags string, comment string, namespace string)
> STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
> TBLPROPERTIES (
>   "kafka.topic" = "wikipedia",
>   "kafka.bootstrap.servers" = "brokeraddress:9092",
>   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.JsonSerDe");
> {code}
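> h2. Insert into a Hive internal table
> A minimal sketch of the CTAS path listed in the goals; the target table name, storage format, and partition/offset bounds below are illustrative only, not part of the patch:
> {code}
> -- Sketch: snapshot a slice of the stream into a managed ORC table.
> -- wiki_snapshot and the __partition/__offset bounds are hypothetical.
> CREATE TABLE wiki_snapshot STORED AS ORC AS
> SELECT `timestamp`, page, `user`, language, added, deleted
> FROM kafka_table
> WHERE `__partition` = 0 AND `__offset` < 100000;
> {code}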
> h2. Kafka Metadata
> In order to keep track of Kafka records, the storage handler automatically adds the Kafka row metadata, e.g. the partition id, the record offset, and the record timestamp:
> {code}
> DESCRIBE EXTENDED kafka_table
> timestamp           timestamp   from deserializer
> page                string      from deserializer
> user                string      from deserializer
> language            string      from deserializer
> country             string      from deserializer
> continent           string      from deserializer
> namespace           string      from deserializer
> newpage             boolean     from deserializer
> unpatrolled         boolean     from deserializer
> anonymous           boolean     from deserializer
> robot               boolean     from deserializer
> added               int         from deserializer
> deleted             int         from deserializer
> delta               bigint      from deserializer
> __partition         int         from deserializer
> __offset            bigint      from deserializer
> __timestamp         bigint      from deserializer
> {code}
> h2. Filter push down
> Newer Kafka consumers (0.11.0 and higher) allow seeking on the stream to a given offset. The proposed storage handler leverages this API by pushing down filters over the metadata columns, namely __partition (int), __offset (long), and __timestamp (long).
> For instance, a query like
> {code}
> select `__offset` from kafka_table
> where (`__offset` < 10 and `__offset` > 3 and `__partition` = 0)
> or (`__partition` = 0 and `__offset` < 105 and `__offset` > 99)
> or (`__offset` = 109);
> {code}
> will result in a scan of partition 0 only, reading only the records between offsets 4 and 109.
> h2. Timestamp seeks
> Seeking based on the internal Kafka timestamps allows the handler to read only recently arrived data, for example:
> {code}
> select count(*) from kafka_table
> where `__timestamp` > 1000 * to_unix_timestamp(CURRENT_TIMESTAMP - interval '20' hours);
> {code}
> This also allows implicit relationships between event timestamps and Kafka timestamps to be expressed in queries (e.g. the event timestamp is always less than the Kafka __timestamp, and the Kafka __timestamp is never more than 15 minutes later than the event timestamp).
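> A minimal sketch of such a rewrite, assuming (illustratively) that the table's `timestamp` column carries the event time and that records reach Kafka within 15 minutes of it; the event-time filter is widened into a bound on __timestamp that the handler can push down:
> {code}
> -- Sketch only: if events land in Kafka at most 15 minutes after their event
> -- time, an event-time filter over the last hour can be widened into a
> -- pushable __timestamp bound covering the last 75 minutes.
> select count(*) from kafka_table
> where `timestamp` > CURRENT_TIMESTAMP - interval '1' hours
> and `__timestamp` > 1000 * to_unix_timestamp(CURRENT_TIMESTAMP - interval '75' minutes);
> {code}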