vinson0526 commented on a change in pull request #2228: add spark-doris-connector extension
URL: https://github.com/apache/incubator-doris/pull/2228#discussion_r348324265
 
 

 ##########
 File path: extension/spark-doris-connector/README.md
 ##########
 @@ -0,0 +1,126 @@
+#  Spark-Doris-Connector
+
+## Features
+
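+- The current version only supports reading data from `Doris`.
+- A `Doris` table can be mapped to a `DataFrame` or an `RDD`; `DataFrame` is recommended.
+- Data filtering can be performed on the `Doris` side, which reduces the amount of data transferred.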
+
+##  Version Compatibility
+
+| Connector | Spark | Doris  | Java | Scala |
+| --------- | ----- | ------ | ---- | ----- |
+| 1.0.0     | 2.x   | master | 8    | 2.11  |
+
+
+
+## Building
+
+```bash
+mvn clean package
+```
+
+
+
+## QuickStart
+
+### SQL
+
+```sql
+CREATE TEMPORARY VIEW spark_doris
+USING doris
+OPTIONS(
+  "table.identifier"="$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME",
+  "fenodes"="$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT",
+  "user"="$YOUR_DORIS_USERNAME",
+  "password"="$YOUR_DORIS_PASSWORD"
+);
+
+SELECT * FROM spark_doris;
+```
+
+### DataFrame
+
+```scala
+val dorisSparkDF = spark.read.format("doris")
+  .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
+  .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
+  .option("user", "$YOUR_DORIS_USERNAME")
+  .option("password", "$YOUR_DORIS_PASSWORD")
+  .load() // turn the configured DataFrameReader into a DataFrame
+
+dorisSparkDF.show(5)
+```
+
+### RDD
+
+```scala
+import org.apache.doris.spark._
+val dorisSparkRDD = sc.dorisRDD(
+  tableIdentifier = Some("$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME"),
+  cfg = Some(Map(
+    "doris.fenodes" -> "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT",
+    "doris.request.auth.user" -> "$YOUR_DORIS_USERNAME",
+    "doris.request.auth.password" -> "$YOUR_DORIS_PASSWORD"
+  ))
+)
+
+dorisSparkRDD.collect()
+```
+
+## Configuration
+
+### General
+
+| Key                              | Default Value     | Comment                                                                       |
+| -------------------------------- | ----------------- | ----------------------------------------------------------------------------- |
+| doris.fenodes                    | --                | Doris RESTful interface address; multiple addresses are supported, comma-separated |
+| doris.table.identifier           | --                | Name of the Doris table that the DataFrame/RDD maps to                        |
+| doris.request.retries            | 3                 | Number of retries for requests sent to Doris                                  |
+| doris.request.connect.timeout.ms | 30000             | Connection timeout, in milliseconds, for requests sent to Doris               |
+| doris.request.read.timeout.ms    | 30000             | Read timeout, in milliseconds, for requests sent to Doris                     |
+| doris.request.tablet.size        | Integer.MAX_VALUE | Number of Doris tablets per RDD partition                                     |
+| doris.batch.size                 | 1024              | Maximum number of rows read from a BE at a time                               |
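+
+A minimal sketch of how the keys above might be supplied together, assuming a `spark-shell` session (where `spark` is predefined) with the connector on the classpath. The tuning values are illustrative only, and it is an assumption that the connector accepts every key in this table as a `DataFrame` read option.
+
+```scala
+// Illustrative settings; replace the placeholders with your own values.
+// Assumption: all keys from the table are accepted as read options.
+val dorisOptions = Map(
+  "doris.table.identifier" -> "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME",
+  "doris.fenodes" -> "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT",
+  "user" -> "$YOUR_DORIS_USERNAME",
+  "password" -> "$YOUR_DORIS_PASSWORD",
+  "doris.request.retries" -> "5",               // default is 3
+  "doris.request.connect.timeout.ms" -> "10000", // default is 30000
+  "doris.batch.size" -> "4096"                   // default is 1024
+)
+
+// DataFrameReader.options takes a Map[String, String], so every value is a string.
+val tunedDF = spark.read.format("doris")
+  .options(dorisOptions)
+  .load()
+
+tunedDF.show(5)
+```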
 
 Review comment:
   done

