[ https://issues.apache.org/jira/browse/HIVE-20377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582298#comment-16582298 ]
Hive QA commented on HIVE-20377:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12935771/HIVE-20377.8.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13264/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13264/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13264/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-08-16 09:56:40.678
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-13264/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-08-16 09:56:40.683
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 109439c HIVE-20393 : Semijoin Reduction : markSemiJoinForDPP behaves inconsistently (Deepak Jaiswal, reviewed by Ashutosh Chauhan)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 109439c HIVE-20393 : Semijoin Reduction : markSemiJoinForDPP behaves inconsistently (Deepak Jaiswal, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-08-16 09:56:42.010
+ rm -rf ../yetus_PreCommit-HIVE-Build-13264
+ mkdir ../yetus_PreCommit-HIVE-Build-13264
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-13264
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-13264/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:3991: trailing whitespace.
__time timestamp from deserializer
/data/hiveptest/working/scratch/build.patch:3992: trailing whitespace.
page string from deserializer
/data/hiveptest/working/scratch/build.patch:3993: trailing whitespace.
user string from deserializer
/data/hiveptest/working/scratch/build.patch:3994: trailing whitespace.
language string from deserializer
/data/hiveptest/working/scratch/build.patch:3995: trailing whitespace.
country string from deserializer
warning: squelched 13 whitespace errors
warning: 18 lines add whitespace errors.
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven
protoc-jar: executing: [/tmp/protoc6423815627383744741.exe, --version]
libprotoc 2.5.0
protoc-jar: executing: [/tmp/protoc6423815627383744741.exe, -I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore, --java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources, /data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
ANTLR Parser Generator  Version 3.5.2
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (process-resource-bundles) on project hive-shims: Execution process-resource-bundles of goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process failed. ConcurrentModificationException -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hive-shims
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-13264
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12935771 - PreCommit-HIVE-Build


> Hive Kafka Storage Handler
> --------------------------
>
>                 Key: HIVE-20377
>                 URL: https://issues.apache.org/jira/browse/HIVE-20377
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 4.0.0
>            Reporter: slim bouguerra
>            Assignee: slim bouguerra
>            Priority: Major
>         Attachments: HIVE-20377.4.patch, HIVE-20377.5.patch, HIVE-20377.6.patch, HIVE-20377.8.patch, HIVE-20377.8.patch, HIVE-20377.patch
>
>
> h1. Goal
> * Read streaming data from a Kafka topic as an external table.
> * Allow streaming navigation by pushing down filters on the Kafka record partition id, offset, and timestamp.
> * Insert streaming data from Kafka into an actual Hive internal table using a CTAS statement (a sketch follows the create-table example below).
> h1. Example
> h2. Create the external table
> {code}
> CREATE EXTERNAL TABLE kafka_table (
>   `timestamp` timestamp, page string, `user` string, language string,
>   added int, deleted int, flags string, comment string, namespace string)
> STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
> TBLPROPERTIES (
>   "kafka.topic" = "wikipedia",
>   "kafka.bootstrap.servers" = "brokeraddress:9092",
>   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.JsonSerDe");
> {code}
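> h2. Insert into a Hive internal table
> A minimal sketch of the CTAS path listed in the goals; the target table name, storage format, and partition/offset bounds below are illustrative only, not part of the patch:
> {code}
> -- Sketch: snapshot a slice of the stream into a managed ORC table.
> -- wiki_snapshot and the __partition/__offset bounds are hypothetical.
> CREATE TABLE wiki_snapshot STORED AS ORC AS
> SELECT `timestamp`, page, `user`, language, added, deleted
> FROM kafka_table
> WHERE `__partition` = 0 AND `__offset` < 100000;
> {code}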
> h2. Kafka Metadata
> In order to keep track of Kafka records, the storage handler automatically adds the Kafka row metadata, e.g. the partition id, the record offset, and the record timestamp:
> {code}
> DESCRIBE EXTENDED kafka_table
> timestamp           timestamp   from deserializer
> page                string      from deserializer
> user                string      from deserializer
> language            string      from deserializer
> country             string      from deserializer
> continent           string      from deserializer
> namespace           string      from deserializer
> newpage             boolean     from deserializer
> unpatrolled         boolean     from deserializer
> anonymous           boolean     from deserializer
> robot               boolean     from deserializer
> added               int         from deserializer
> deleted             int         from deserializer
> delta               bigint      from deserializer
> __partition         int         from deserializer
> __offset            bigint      from deserializer
> __timestamp         bigint      from deserializer
> {code}
> h2. Filter push down
> Newer Kafka consumers (0.11.0 and higher) allow seeking on the stream to a given offset. The proposed storage handler leverages this API by pushing down filters over the metadata columns, namely __partition (int), __offset (long), and __timestamp (long).
> For instance, a query like
> {code}
> select `__offset` from kafka_table
> where (`__offset` < 10 and `__offset` > 3 and `__partition` = 0)
> or (`__partition` = 0 and `__offset` < 105 and `__offset` > 99)
> or (`__offset` = 109);
> {code}
> will result in a scan of partition 0 only, reading only the records between offsets 4 and 109.
> h2. Timestamp seeks
> Seeking based on the internal Kafka timestamps allows the handler to read only recently arrived data, for example:
> {code}
> select count(*) from kafka_table
> where `__timestamp` > 1000 * to_unix_timestamp(CURRENT_TIMESTAMP - interval '20' hours);
> {code}
> This also allows implicit relationships between event timestamps and Kafka timestamps to be expressed in queries (e.g. the event timestamp is always less than the Kafka __timestamp, and the Kafka __timestamp is never more than 15 minutes later than the event timestamp).
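> A minimal sketch of such a rewrite, assuming (illustratively) that the table's `timestamp` column carries the event time and that records reach Kafka within 15 minutes of it; the event-time filter is widened into a bound on __timestamp that the handler can push down:
> {code}
> -- Sketch only: if events land in Kafka at most 15 minutes after their event
> -- time, an event-time filter over the last hour can be widened into a
> -- pushable __timestamp bound covering the last 75 minutes.
> select count(*) from kafka_table
> where `timestamp` > CURRENT_TIMESTAMP - interval '1' hours
> and `__timestamp` > 1000 * to_unix_timestamp(CURRENT_TIMESTAMP - interval '75' minutes);
> {code}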