This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris-spark-connector.git

commit d1299981bcae2258dbef12fa71d038842f0afb70
Author: Mingyu Chen <morningman....@gmail.com>
AuthorDate: Tue May 19 14:20:21 2020 +0800

    [Spark on Doris] Shade and provide the thrift lib in spark-doris-connector (#3631)
    
    Main changes:
    1. Shade and provide the thrift lib in spark-doris-connector
    2. Add a `build.sh` for spark-doris-connector
    3. Move the README.md of spark-doris-connector to `docs/`
    4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`
---
 README.md | 150 --------------------------------------------------------------
 build.sh  |  59 ++++++++++++++++++++++++
 pom.xml   |  59 +++++++++++++++++++++---
 3 files changed, 112 insertions(+), 156 deletions(-)
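
Note for local builds: build.sh sources ${DORIS_HOME}/custom_env.sh when that
file exists and runs CUSTOM_MVN in place of mvn, while pom.xml activates the
custom-env profile when CUSTOM_MAVEN_REPO is set in the environment. A minimal
sketch of such a file is shown here; the Maven path and the repository URL are
placeholders chosen for illustration, not real locations.

```shell
# custom_env.sh: sourced by build.sh when present in ${DORIS_HOME}.
# Both values are hypothetical placeholders; replace them with your own.

# Maven binary that build.sh should run instead of the default `mvn`.
export CUSTOM_MVN=/opt/maven/bin/mvn

# Internal repository URL. It must be exported so that Maven sees it as
# env.CUSTOM_MAVEN_REPO and activates the custom-env profile.
export CUSTOM_MAVEN_REPO=https://nexus.example.com/repository/maven-public

# Same selection build.sh performs: fall back to `mvn` when CUSTOM_MVN is empty.
MVN_CMD=${CUSTOM_MVN:-mvn}
echo "build.sh would run: ${MVN_CMD} clean package"
```

When CUSTOM_MAVEN_REPO is unset, the general-env profile applies and
dependencies resolve from https://repo.maven.apache.org/maven2.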

diff --git a/README.md b/README.md
deleted file mode 100644
index 3c41b93..0000000
--- a/README.md
+++ /dev/null
@@ -1,150 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-#  Spark-Doris-Connector
-
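-## Features
-
-- The current version only supports reading data from `Doris`.
-- A `Doris` table can be mapped to a `DataFrame` or an `RDD`; `DataFrame` is recommended.
-- Data filtering can be done on the `Doris` side to reduce the amount of data transferred.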
-
-##  Version Compatibility
-
-| Connector | Spark | Doris  | Java | Scala |
-| --------- | ----- | ------ | ---- | ----- |
-| 1.0.0     | 2.x   | master | 8    | 2.11  |
-
-
-
-## Building
-
-```bash
-mvn clean package
-```
-
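-After a successful build, the file `doris-spark-1.0.0-SNAPSHOT.jar` is generated in the `target` directory. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put the file in the `jars` folder; for `Spark` running in `Yarn` cluster mode, put the file in the pre-deployment package.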
-
-## QuickStart
-
-### SQL
-
-```sql
-CREATE TEMPORARY VIEW spark_doris
-USING doris
-OPTIONS(
-  "table.identifier"="$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME",
-  "fenodes"="$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT",
-  "user"="$YOUR_DORIS_USERNAME",
-  "password"="$YOUR_DORIS_PASSWORD"
-);
-
-SELECT * FROM spark_doris;
-```
-
-### DataFrame
-
-```scala
-val dorisSparkDF = spark.read.format("doris")
-  .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
-  .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
-  .option("user", "$YOUR_DORIS_USERNAME")
-  .option("password", "$YOUR_DORIS_PASSWORD")
-  .load()
-
-dorisSparkDF.show(5)
-```
-
-### RDD
-
-```scala
-import org.apache.doris.spark._
-val dorisSparkRDD = sc.dorisRDD(
-  tableIdentifier = Some("$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME"),
-  cfg = Some(Map(
-    "doris.fenodes" -> "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT",
-    "doris.request.auth.user" -> "$YOUR_DORIS_USERNAME",
-    "doris.request.auth.password" -> "$YOUR_DORIS_PASSWORD"
-  ))
-)
-
-dorisSparkRDD.collect()
-```
-
-## Configuration
-
-### General
-
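-| Key                              | Default Value     | Comment |
-| -------------------------------- | ----------------- | ------------------------------------------------------------ |
-| doris.fenodes                    | --                | Doris RESTful interface address; multiple addresses are supported, separated by commas |
-| doris.table.identifier           | --                | Name of the Doris table that the DataFrame/RDD maps to |
-| doris.request.retries            | 3                 | Number of retries for requests sent to Doris |
-| doris.request.connect.timeout.ms | 30000             | Connection timeout for requests sent to Doris |
-| doris.request.read.timeout.ms    | 30000             | Read timeout for requests sent to Doris |
-| doris.request.query.timeout.s    | 3600              | Timeout for queries against Doris; the default is 1 hour, and -1 means no timeout |
-| doris.request.tablet.size        | Integer.MAX_VALUE | Number of Doris tablets per RDD partition.<br />The smaller this value, the more partitions are generated,<br />which increases parallelism on the Spark side but also puts more pressure on Doris. |
-| doris.batch.size                 | 1024              | Maximum number of rows read from a BE at a time.<br />Increasing this value reduces the number of connections established between Spark and Doris,<br />which lowers the extra time overhead caused by network latency. |
-| doris.exec.mem.limit             | 2147483648        | Memory limit for a single query. The default is 2GB, in bytes |
-| doris.deserialize.arrow.async    | false             | Whether to asynchronously convert the Arrow format into the RowBatch that spark-doris-connector iterates over |
-| doris.deserialize.queue.size     | 64                | Size of the internal processing queue for asynchronous Arrow conversion; takes effect when doris.deserialize.arrow.async is true |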
-### SQL and DataFrame Only
-
-| Key                             | Default Value | Comment |
-| ------------------------------- | ------------- | ------------------------------------------------------------ |
-| user                            | --            | Username for accessing Doris |
-| password                        | --            | Password for accessing Doris |
-| doris.filter.query.in.max.count | 100           | Maximum number of elements in the value list of an `in` expression for predicate pushdown.<br />Beyond this number, the `in` filter condition is processed on the Spark side. |
-
-### RDD Only
-
-| Key                         | Default Value | Comment |
-| --------------------------- | ------------- | ------------------------------------------------------------ |
-| doris.request.auth.user     | --            | Username for accessing Doris |
-| doris.request.auth.password | --            | Password for accessing Doris |
-| doris.read.field            | --            | List of column names to read from the Doris table, separated by commas |
-| doris.filter.query          | --            | Expression for filtering the data to read; it is passed through to Doris.<br />Doris uses this expression to filter data at the source. |
-
-
-
-## Doris Data Type - Spark Data Type Mapping
-
-| Doris Type | Spark Type                       |
-| ---------- | -------------------------------- |
-| NULL_TYPE  | DataTypes.NullType               |
-| BOOLEAN    | DataTypes.BooleanType            |
-| TINYINT    | DataTypes.ByteType               |
-| SMALLINT   | DataTypes.ShortType              |
-| INT        | DataTypes.IntegerType            |
-| BIGINT     | DataTypes.LongType               |
-| FLOAT      | DataTypes.FloatType              |
-| DOUBLE     | DataTypes.DoubleType             |
-| DATE       | DataTypes.StringType<sup>1</sup> |
-| DATETIME   | DataTypes.StringType<sup>1</sup> |
-| BINARY     | DataTypes.BinaryType             |
-| DECIMAL    | DecimalType                      |
-| CHAR       | DataTypes.StringType             |
-| LARGEINT   | DataTypes.StringType             |
-| VARCHAR    | DataTypes.StringType             |
-| DECIMALV2  | DecimalType                      |
-| TIME       | DataTypes.DoubleType             |
-| HLL        | Unsupported datatype             |
-
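-<sup>1</sup>: The connector maps `DATE` and `DATETIME` to `String`. Because of the processing logic of the underlying `Doris` storage engine, the time range covered when using time types directly does not meet the requirements, so the `String` type is used to return the human-readable text of the time directly.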
\ No newline at end of file
diff --git a/build.sh b/build.sh
new file mode 100755
index 0000000..9119841
--- /dev/null
+++ b/build.sh
@@ -0,0 +1,59 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+##############################################################
+# This script is used to compile Spark-Doris-Connector
+# Usage:
+#    bash build.sh
+#
+##############################################################
+
+set -eo pipefail
+
+ROOT=`dirname "$0"`
+ROOT=`cd "$ROOT"; pwd`
+
+export DORIS_HOME=${ROOT}/../../
+
+# include custom environment variables
+if [[ -f ${DORIS_HOME}/custom_env.sh ]]; then
+    . ${DORIS_HOME}/custom_env.sh
+fi
+
+# check maven
+MVN_CMD=mvn
+if [[ ! -z ${CUSTOM_MVN} ]]; then
+    MVN_CMD=${CUSTOM_MVN}
+fi
+if ! ${MVN_CMD} --version; then
+    echo "Error: mvn is not found"
+    exit 1
+fi
+export MVN_CMD
+
+${MVN_CMD} clean package
+
+
+mkdir -p output/
+cp target/doris-spark-1.0.0-SNAPSHOT.jar ./output/
+
+echo "*****************************************"
+echo "Successfully built Spark-Doris-Connector"
+echo "*****************************************"
+
+exit 0
diff --git a/pom.xml b/pom.xml
index 35986ad..cdf1055 100644
--- a/pom.xml
+++ b/pom.xml
@@ -36,6 +36,50 @@
         <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     </properties>
 
+    <profiles>
+        <!-- for custom internal repository -->
+        <profile>
+            <id>custom-env</id>
+            <activation>
+                <property>
+                    <name>env.CUSTOM_MAVEN_REPO</name>
+                </property>
+            </activation>
+
+            <repositories>
+                <repository>
+                    <id>custom-nexus</id>
+                    <url>${env.CUSTOM_MAVEN_REPO}</url>
+                </repository>
+            </repositories>
+
+            <pluginRepositories>
+                <pluginRepository>
+                    <id>custom-nexus</id>
+                    <url>${env.CUSTOM_MAVEN_REPO}</url>
+                </pluginRepository>
+            </pluginRepositories>
+        </profile>
+
+        <!-- for general repository -->
+        <profile>
+            <id>general-env</id>
+            <activation>
+                <property>
+                    <name>!env.CUSTOM_MAVEN_REPO</name>
+                </property>
+            </activation>
+
+            <repositories>
+                <repository>
+                    <id>central</id>
+                    <name>central maven repo https</name>
+                    <url>https://repo.maven.apache.org/maven2</url>
+                </repository>
+            </repositories>
+        </profile>
+    </profiles>
+
     <dependencies>
         <dependency>
             <groupId>org.apache.spark</groupId>
@@ -53,7 +97,6 @@
             <groupId>org.apache.thrift</groupId>
             <artifactId>libthrift</artifactId>
             <version>${libthrift.version}</version>
-            <scope>provided</scope>
         </dependency>
         <dependency>
             <groupId>org.apache.arrow</groupId>
@@ -150,23 +193,27 @@
                     <relocations>
                         <relocation>
                             <pattern>org.apache.arrow</pattern>
-                            <shadedPattern>org.apache.doris.arrow</shadedPattern>
+                            <shadedPattern>org.apache.doris.shaded.org.apache.arrow</shadedPattern>
                         </relocation>
                         <relocation>
                             <pattern>io.netty</pattern>
-                            <shadedPattern>org.apache.doris.netty</shadedPattern>
+                            <shadedPattern>org.apache.doris.shaded.io.netty</shadedPattern>
                         </relocation>
                         <relocation>
                             <pattern>com.fasterxml.jackson</pattern>
-                            <shadedPattern>org.apache.doris.jackson</shadedPattern>
+                            <shadedPattern>org.apache.doris.shaded.com.fasterxml.jackson</shadedPattern>
                         </relocation>
                         <relocation>
                             <pattern>org.apache.commons.codec</pattern>
-                            <shadedPattern>org.apache.doris.commons.codec</shadedPattern>
+                            <shadedPattern>org.apache.doris.shaded.org.apache.commons.codec</shadedPattern>
                         </relocation>
                         <relocation>
                             <pattern>com.google.flatbuffers</pattern>
-                            <shadedPattern>org.apache.doris.flatbuffers</shadedPattern>
+                            <shadedPattern>org.apache.doris.shaded.com.google.flatbuffers</shadedPattern>
+                        </relocation>
+                        <relocation>
+                            <pattern>org.apache.thrift</pattern>
+                            <shadedPattern>org.apache.doris.shaded.org.apache.thrift</shadedPattern>
                         </relocation>
                     </relocations>
                 </configuration>
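
One way to check the effect of the relocations above is to confirm that the
packaged jar carries the thrift classes under the new
org.apache.doris.shaded prefix. The helper below is only a sketch: the
function names shaded_path and has_relocated are made up for illustration,
and the usage line assumes a JDK `jar` tool on the PATH and a jar already
produced by build.sh.

```shell
# Map a Java package pattern to the directory its classes should occupy in
# the shaded jar, following the org.apache.doris.shaded.<pattern> scheme.
shaded_path() {
    printf '%s\n' "org.apache.doris.shaded.$1" | tr '.' '/'
}

# Read a jar listing (one entry per line) on stdin and succeed only if some
# entry lives under the relocated directory for the given package.
has_relocated() {
    grep -q "$(shaded_path "$1")/"
}

# Usage against a real build:
#   jar tf output/doris-spark-1.0.0-SNAPSHOT.jar | has_relocated org.apache.thrift \
#       && echo "thrift is shaded"
```

The same check works for the other relocated packages, for example
io.netty or com.fasterxml.jackson.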
