wuchong commented on code in PR #20719: URL: https://github.com/apache/flink/pull/20719#discussion_r969396810
########## docs/content/docs/dev/table/hiveCompatibility/hiveserver2.md: ########## @@ -0,0 +1,306 @@ +--- +title: HiveServer2 Endpoint +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveserver2.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# HiveServer2 Endpoint + +HiveServer2 Endpoint is compatible with HiveServer2 API and allows users to submit SQL in Hive style. Review Comment: ```suggestion [Flink SQL Gateway](add a link) supports deploying as a HiveServer2 Endpoint which is compatible with the [HiveServer2](https://cwiki.apache.org/confluence/display/hive/hiveserver2+overview) wire protocol and allows users to interact with Flink SQL Gateway (e.g. submit Hive SQL) through existing Hive clients, such as Hive JDBC, Beeline, DBeaver, Apache Superset and so on. ``` ########## docs/content/docs/dev/table/hiveCompatibility/hiveserver2.md: ########## @@ -0,0 +1,306 @@ +--- +title: HiveServer2 Endpoint +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveserver2.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. 
The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# HiveServer2 Endpoint + +HiveServer2 Endpoint is compatible with HiveServer2 API and allows users to submit SQL in Hive style. + +Setting Up +---------------- +Before starting the SQL Gateway with the HiveServer2 Endpoint, please prepare the required [dependencies]({{< ref "docs/connectors/table/hive/overview#dependencies" >}}). + +### Configure HiveServer2 Endpoint + +The HiveServer2 endpoint is not the default endpoint for the SQL Gateway. You can configure the SQL Gateway to use the HiveServer2 endpoint by calling +```bash +$ ./bin/sql-gateway.sh start -Dsql-gateway.endpoint.type=hiveserver2 -Dsql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir=<path to hive conf> +``` + +or modify the valid [Flink configuration]({{< ref "docs/dev/table/config" >}}) entry. + +### Connecting to HiveServer2 + +After starting the SQL Gateway, you can submit SQL with Apache Hive Beeline. + +```bash +$ ./beeline +SLF4J: Class path contains multiple SLF4J bindings. +SLF4J: Found binding in [jar:file:/Users/ohmeatball/Work/hive-related/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] +SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/3.2.1_1/libexec/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] +SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
+SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] +Beeline version 2.3.9 by Apache Hive +beeline> !connect jdbc:hive2://localhost:10000/default;auth=noSasl +Connecting to jdbc:hive2://localhost:10000/default;auth=noSasl +Enter username for jdbc:hive2://localhost:10000/default: +Enter password for jdbc:hive2://localhost:10000/default: +Connected to: Apache Flink (version 1.16) +Driver: Hive JDBC (version 2.3.9) +Transaction isolation: TRANSACTION_REPEATABLE_READ +0: jdbc:hive2://localhost:10000/default> CREATE TABLE Source ( +. . . . . . . . . . . . . . . . . . . .> a INT, +. . . . . . . . . . . . . . . . . . . .> b STRING +. . . . . . . . . . . . . . . . . . . .> ); ++---------+ +| result | ++---------+ +| OK | ++---------+ +0: jdbc:hive2://localhost:10000/default> CREATE TABLE Sink ( +. . . . . . . . . . . . . . . . . . . .> a INT, +. . . . . . . . . . . . . . . . . . . .> b STRING +. . . . . . . . . . . . . . . . . . . .> ); ++---------+ +| result | ++---------+ +| OK | ++---------+ +0: jdbc:hive2://localhost:10000/default> INSERT INTO Sink SELECT * FROM Source; ++-----------------------------------+ +| job id | ++-----------------------------------+ +| 55ff290b57829998ea6e9acc240a0676 | ++-----------------------------------+ +1 row selected (2.427 seconds) +``` + +Endpoint Options +---------------- + +Below are the options supported when creating a HiveServer2 endpoint instance with a YAML file or DDL. 
+ +<table class="configuration table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 20%">Key</th> + <th class="text-center" style="width: 8%">Required</th> + <th class="text-left" style="width: 7%">Default</th> + <th class="text-left" style="width: 10%">Type</th> + <th class="text-left" style="width: 55%">Description</th> + </tr> + </thead> + <tbody> + <tr> + <td><h5>sql-gateway.endpoint.type</h5></td> + <td>required</td> + <td style="word-wrap: break-word;">"rest"</td> + <td>List&lt;String&gt;</td> + <td>Specify which endpoint to use; here it should be 'hiveserver2'.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir</h5></td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>URI to your Hive conf dir containing hive-site.xml. The URI needs to be supported by Hadoop FileSystem. If the URI is relative, i.e. without a scheme, the local file system is assumed. If the option is not specified, hive-site.xml is searched for in the class path.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.catalog.default-database</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">"default"</td> + <td>String</td> + <td>The default database to use when the catalog is set as the current catalog.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.catalog.name</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">"hive"</td> + <td>String</td> + <td>Name for the pre-registered Hive catalog.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.module.name</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">"hive"</td> + <td>String</td> + <td>Name for the pre-registered Hive module.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.exponential.backoff.slot.length</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">100 ms</td> + <td>Duration</td> + <td>Binary exponential backoff slot time for Thrift 
clients during login to HiveServer2, for retries until hitting the Thrift client timeout.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.host</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>The server address of the HiveServer2 host to be used for communication. Default is empty, which means it binds to localhost. This is only necessary if the host has multiple network addresses.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.login.timeout</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">20 s</td> + <td>Duration</td> + <td>Timeout for Thrift clients during login to HiveServer2.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.max.message.size</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">104857600</td> + <td>Long</td> + <td>Maximum message size in bytes a HS2 server will accept.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.port</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">10000</td> + <td>Integer</td> + <td>The port of the HiveServer2 endpoint.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.keepalive-time</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">1 min</td> + <td>Duration</td> + <td>Keepalive time for an idle worker thread. 
When the number of workers exceeds min workers, excessive threads are killed after this time interval.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.threads.max</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">512</td> + <td>Integer</td> + <td>The maximum number of Thrift worker threads.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.threads.min</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">5</td> + <td>Integer</td> + <td>The minimum number of Thrift worker threads.</td> + </tr> + </tbody> +</table> + +Features +---------------- + +### Built-in HiveCatalog and Hive Dialect + +The SQL Gateway with HiveServer2 endpoint aims to provide the same experience as HiveServer2. When users connect to the HiveServer2, +the HiveServer2 endpoint creates the Hive Catalog as the default catalog, switches to the Hive dialect, and executes the SQL in batch mode for the session. +Users can submit the Hive SQL in Hive style but execute it in the Flink environment. + +### Integrate into the Hive Ecosystem + +The HiveServer2 endpoint extends the HiveServer2 API. Therefore, the tools that manage Hive SQL also work for +the SQL Gateway with the HiveServer2 endpoint. Currently, Hive JDBC, Hive Beeline, and DBeaver can connect to the Review Comment: ```suggestion the SQL Gateway with the HiveServer2 endpoint. Currently, Hive JDBC, Hive Beeline, DBeaver, Apache Superset and so on are tested to be able to connect to the ``` ########## docs/content/docs/dev/table/sql-gateway/rest.md: ########## @@ -0,0 +1,119 @@ +--- +title: REST Endpoint +weight: 2 +type: docs +aliases: +- /dev/table/sql-gateway/rest.html Review Comment: Please add a "HiveServer2 Endpoint" page, but soft link to `/dev/table/hiveCompatibility/hiveserver2.html`. 
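The `sql-gateway.endpoint.hiveserver2.thrift.exponential.backoff.slot.length` option in the table above follows the classic binary exponential backoff scheme: on the n-th retry the client waits a random number of slots, drawn from [0, 2^n - 1], each slot being the configured length. A minimal sketch of that scheme (hypothetical helper code, not part of Flink or Thrift):

```java
import java.util.Random;

public class BackoffSketch {
    // Binary exponential backoff: on attempt n, wait a random multiple of the
    // slot length, with the multiplier drawn uniformly from [0, 2^n - 1].
    static long backoffMillis(int attempt, long slotMillis, Random rng) {
        int maxSlots = (1 << Math.min(attempt, 16)) - 1; // cap the exponent to avoid overflow
        return maxSlots == 0 ? 0 : slotMillis * rng.nextInt(maxSlots + 1);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // With the default 100 ms slot length, attempt 3 waits between 0 and 700 ms.
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.println("attempt " + attempt + ": wait "
                    + backoffMillis(attempt, 100, rng) + " ms");
        }
    }
}
```

Retries with growing upper bounds continue until the login timeout (`sql-gateway.endpoint.hiveserver2.thrift.login.timeout`) is exhausted.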
########## docs/content/docs/dev/table/sql-gateway/overview.md: ########## @@ -0,0 +1,236 @@ +--- +title: Overview +weight: 1 +type: docs +aliases: +- /dev/table/sql-gateway.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Introduction +---------------- + +The SQL Gateway is a service that enables multiple remote clients to execute SQL concurrently. It provides +an easy way to submit Flink jobs, look up metadata, and analyze data online. + +The SQL Gateway is composed of pluggable endpoints and the `SqlGatewayService`. The `SqlGatewayService` is a processor that is +reused by the endpoints to handle the requests. The endpoint is an entry point that allows users to connect. Depending on the +endpoint type, users can use different tools to connect. + +{{< img width="80%" src="/fig/sql-gateway-architecture.png" alt="SQL Gateway Architecture" >}} Review Comment: Update the picture with more tools, such as Hive Beeline CLI, DolphinScheduler, Zeppelin. 
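The composition described in the quoted overview — pluggable endpoints that all delegate to one shared `SqlGatewayService` — can be sketched as follows. The interfaces and method names here are illustrative only, not the real Flink API:

```java
// Sketch of the pluggable-endpoint design: each endpoint speaks its own wire
// protocol (REST, HiveServer2 Thrift, ...) but reuses the same service to do
// the actual work. All names below are hypothetical.
interface SqlGatewayService {
    String executeStatement(String sessionId, String statement);
}

interface Endpoint {
    void start(SqlGatewayService service);
}

class RestEndpoint implements Endpoint {
    SqlGatewayService service;
    public void start(SqlGatewayService service) { this.service = service; }
    String handleHttpRequest(String body) {       // protocol-specific entry point
        return service.executeStatement("rest-session", body);
    }
}

class HiveServer2Endpoint implements Endpoint {
    SqlGatewayService service;
    public void start(SqlGatewayService service) { this.service = service; }
    String handleThriftCall(String hiveSql) {     // same service, different wire protocol
        return service.executeStatement("hs2-session", hiveSql);
    }
}

public class GatewaySketch {
    public static void main(String[] args) {
        // One shared service instance, two endpoints reusing it.
        SqlGatewayService service = (session, stmt) -> session + " -> " + stmt;
        RestEndpoint rest = new RestEndpoint();
        HiveServer2Endpoint hs2 = new HiveServer2Endpoint();
        rest.start(service);
        hs2.start(service);
        System.out.println(rest.handleHttpRequest("SELECT 1"));
        System.out.println(hs2.handleThriftCall("SHOW TABLES"));
    }
}
```

The point of the split is that adding a new protocol means writing only the translation layer; session handling and statement execution stay in the shared service.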
########## docs/content/docs/dev/table/hiveCompatibility/hiveserver2.md: ########## +or modify the valid [Flink configuration]({{< ref "docs/dev/table/config" >}}) entry. Review Comment: I think the "docs/dev/table/config" doesn't help much for users, because there's no SQL Gateway configuration there. 
########## docs/content/docs/dev/table/hiveCompatibility/hiveserver2.md: ########## @@ -0,0 +1,306 @@ +--- +title: HiveServer2 Endpoint +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveserver2.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# HiveServer2 Endpoint + +HiveServer2 Endpoint is compatible with HiveServer2 API and allows users to submit SQL in Hive style. + +Setting Up +---------------- +Before the trip of the SQL Gateway with the HiveServer2 Endpoint, please prepare the required [dependencies]({{< ref "docs/connectors/table/hive/overview#dependencies" >}}). + +### Configure HiveServer2 Endpoint + +The HiveServer2 endpoint is not the default endpoint for the SQL Gateway. You can configure to use the HiveServer2 endpoint by calling +```bash +$ ./bin/sql-gateway.sh start -Dsql-gateway.endpoint.type=hiveserver2 -Dsql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir=<path to hive conf> +``` + +or modify the valid [Flink configuration]({{< ref "docs/dev/table/config" >}}) entry. Review Comment: ```suggestion or add the following configuration into `conf/flink-conf.yaml` (please replace the `<path to hive conf>` with your hive conf path). 
```yaml sql-gateway.endpoint.type: hiveserver2 sql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir: <path to hive conf> ``` ``` ########## docs/content/docs/dev/table/hiveCompatibility/hiveserver2.md: ########## @@ -0,0 +1,306 @@ +--- +title: HiveServer2 Endpoint +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveserver2.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# HiveServer2 Endpoint + +HiveServer2 Endpoint is compatible with HiveServer2 API and allows users to submit SQL in Hive style. + +Setting Up +---------------- +Before the trip of the SQL Gateway with the HiveServer2 Endpoint, please prepare the required [dependencies]({{< ref "docs/connectors/table/hive/overview#dependencies" >}}). + +### Configure HiveServer2 Endpoint + +The HiveServer2 endpoint is not the default endpoint for the SQL Gateway. You can configure to use the HiveServer2 endpoint by calling +```bash +$ ./bin/sql-gateway.sh start -Dsql-gateway.endpoint.type=hiveserver2 -Dsql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir=<path to hive conf> +``` + +or modify the valid [Flink configuration]({{< ref "docs/dev/table/config" >}}) entry. 
+ +### Connecting to HiveServer2 + +After starting the SQL Gateway, you are able to submit SQL with Apache Hive Beeline. + +```bash +$ ./beeline +SLF4J: Class path contains multiple SLF4J bindings. +SLF4J: Found binding in [jar:file:/Users/ohmeatball/Work/hive-related/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] +SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/3.2.1_1/libexec/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] +SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. +SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] +Beeline version 2.3.9 by Apache Hive +beeline> !connect jdbc:hive2://localhost:10000/default;auth=noSasl +Connecting to jdbc:hive2://localhost:10000/default;auth=noSasl +Enter username for jdbc:hive2://localhost:10000/default: +Enter password for jdbc:hive2://localhost:10000/default: +Connected to: Apache Flink (version 1.16) +Driver: Hive JDBC (version 2.3.9) +Transaction isolation: TRANSACTION_REPEATABLE_READ +0: jdbc:hive2://localhost:10000/default> CREATE TABLE Source ( +. . . . . . . . . . . . . . . . . . . .> a INT, +. . . . . . . . . . . . . . . . . . . .> b STRING +. . . . . . . . . . . . . . . . . . . .> ); ++---------+ +| result | ++---------+ +| OK | ++---------+ +0: jdbc:hive2://localhost:10000/default> CREATE TABLE Sink ( +. . . . . . . . . . . . . . . . . . . .> a INT, +. . . . . . . . . . . . . . . . . . . .> b STRING +. . . . . . . . . . . . . . . . . . . 
.> ); ++---------+ +| result | ++---------+ +| OK | ++---------+ +0: jdbc:hive2://localhost:10000/default> INSERT INTO Sink SELECT * FROM Source; ++-----------------------------------+ +| job id | ++-----------------------------------+ +| 55ff290b57829998ea6e9acc240a0676 | ++-----------------------------------+ +1 row selected (2.427 seconds) +``` + +Endpoint Options +---------------- + +Below are the options supported when creating a HiveServer2 endpoint instance with YAML file or DDL. + +<table class="configuration table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 20%">Key</th> + <th class="text-center" style="width: 8%">Required</th> + <th class="text-left" style="width: 7%">Default</th> + <th class="text-left" style="width: 10%">Type</th> + <th class="text-left" style="width: 55%">Description</th> + </tr> + </thead> + <tbody> + <tr> + <td><h5>sql-gateway.endpoint.type</h5></td> + <td>required</td> + <td style="word-wrap: break-word;">"rest"</td> + <td>List<String></td> + <td>Specify which endpoint to use, here should be 'hiveserver2'.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir</h5></td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>URI to your Hive conf dir containing hive-site.xml. The URI needs to be supported by Hadoop FileSystem. If the URI is relative, i.e. without a scheme, local file system is assumed. 
If the option is not specified, hive-site.xml is searched in class path.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.catalog.default-database</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">"default"</td> + <td>String</td> + <td>The default database to use when the catalog is set as the current catalog.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.catalog.name</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">"hive"</td> + <td>String</td> + <td>Name for the pre-registered hive catalog.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.module.name</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">"hive"</td> + <td>String</td> + <td>Name for the pre-registered hive module.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.exponential.backoff.slot.length</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">100 ms</td> + <td>Duration</td> + <td>Binary exponential backoff slot time for Thrift clients during login to HiveServer2,for retries until hitting Thrift client timeout</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.host</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>The server address of HiveServer2 host to be used for communication.Default is empty, which means the to bind to the localhost. 
This is only necessary if the host has multiple network addresses.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.login.timeout</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">20 s</td> + <td>Duration</td> + <td>Timeout for Thrift clients during login to HiveServer2</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.max.message.size</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">104857600</td> + <td>Long</td> + <td>Maximum message size in bytes a HS2 server will accept.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.port</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">10000</td> + <td>Integer</td> + <td>The port of the HiveServer2 endpoint.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.keepalive-time</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">1 min</td> + <td>Duration</td> + <td>Keepalive time for an idle worker thread. When the number of workers exceeds min workers, excessive threads are killed after this time interval.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.threads.max</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">512</td> + <td>Integer</td> + <td>The maximum number of Thrift worker threads</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.threads.min</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">5</td> + <td>Integer</td> + <td>The minimum number of Thrift worker threads</td> + </tr> + </tbody> +</table> + +Features +---------------- + +### Built-in HiveCatalog and Hive Dialect + +The SQL Gateway with HiveServer2 endpoint aims to provide the same experience compared to the HiveServer2. When users connect to the HiveServer2, +the HiveServer2 endpoint creates the Hive Catalog as the default catalog, switches to the Hive dialect, and executes the SQL in batch mode for the session. 
+Users can submit the Hive SQL in Hive style but execute it in the Flink environment. + +### Integrate into the Hive Ecosystem Review Comment: Rename the section name to "Clients" and use H2 titles to display the supported clients in TOC. ########## docs/content/docs/dev/table/hiveCompatibility/hiveserver2.md: ########## @@ -0,0 +1,306 @@ +--- +title: HiveServer2 Endpoint +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveserver2.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# HiveServer2 Endpoint + +HiveServer2 Endpoint is compatible with HiveServer2 API and allows users to submit SQL in Hive style. + +Setting Up +---------------- +Before the trip of the SQL Gateway with the HiveServer2 Endpoint, please prepare the required [dependencies]({{< ref "docs/connectors/table/hive/overview#dependencies" >}}). + +### Configure HiveServer2 Endpoint + +The HiveServer2 endpoint is not the default endpoint for the SQL Gateway. 
You can configure to use the HiveServer2 endpoint by calling +```bash +$ ./bin/sql-gateway.sh start -Dsql-gateway.endpoint.type=hiveserver2 -Dsql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir=<path to hive conf> +``` + +or modify the valid [Flink configuration]({{< ref "docs/dev/table/config" >}}) entry. + +### Connecting to HiveServer2 + +After starting the SQL Gateway, you are able to submit SQL with Apache Hive Beeline. + +```bash +$ ./beeline +SLF4J: Class path contains multiple SLF4J bindings. +SLF4J: Found binding in [jar:file:/Users/ohmeatball/Work/hive-related/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] +SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/3.2.1_1/libexec/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] +SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. +SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] +Beeline version 2.3.9 by Apache Hive +beeline> !connect jdbc:hive2://localhost:10000/default;auth=noSasl +Connecting to jdbc:hive2://localhost:10000/default;auth=noSasl +Enter username for jdbc:hive2://localhost:10000/default: +Enter password for jdbc:hive2://localhost:10000/default: +Connected to: Apache Flink (version 1.16) +Driver: Hive JDBC (version 2.3.9) +Transaction isolation: TRANSACTION_REPEATABLE_READ +0: jdbc:hive2://localhost:10000/default> CREATE TABLE Source ( +. . . . . . . . . . . . . . . . . . . .> a INT, +. . . . . . . . . . . . . . . . . . . .> b STRING +. . . . . . . . . . . . . . . . . . . .> ); ++---------+ +| result | ++---------+ +| OK | ++---------+ +0: jdbc:hive2://localhost:10000/default> CREATE TABLE Sink ( +. . . . . . . . . . . . . . . . . . . .> a INT, +. . . . . . . . . . . . . . . . . . . .> b STRING +. . . . . . . . . . . . . . . . . . . 
.> ); ++---------+ +| result | ++---------+ +| OK | ++---------+ +0: jdbc:hive2://localhost:10000/default> INSERT INTO Sink SELECT * FROM Source; ++-----------------------------------+ +| job id | ++-----------------------------------+ +| 55ff290b57829998ea6e9acc240a0676 | ++-----------------------------------+ +1 row selected (2.427 seconds) +``` + +Endpoint Options +---------------- + +Below are the options supported when creating a HiveServer2 endpoint instance with YAML file or DDL. + +<table class="configuration table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 20%">Key</th> + <th class="text-center" style="width: 8%">Required</th> + <th class="text-left" style="width: 7%">Default</th> + <th class="text-left" style="width: 10%">Type</th> + <th class="text-left" style="width: 55%">Description</th> + </tr> + </thead> + <tbody> + <tr> + <td><h5>sql-gateway.endpoint.type</h5></td> + <td>required</td> + <td style="word-wrap: break-word;">"rest"</td> + <td>List<String></td> + <td>Specify which endpoint to use, here should be 'hiveserver2'.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir</h5></td> + <td>required</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>URI to your Hive conf dir containing hive-site.xml. The URI needs to be supported by Hadoop FileSystem. If the URI is relative, i.e. without a scheme, local file system is assumed. 
If the option is not specified, hive-site.xml is searched in class path.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.catalog.default-database</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">"default"</td> + <td>String</td> + <td>The default database to use when the catalog is set as the current catalog.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.catalog.name</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">"hive"</td> + <td>String</td> + <td>Name for the pre-registered hive catalog.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.module.name</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">"hive"</td> + <td>String</td> + <td>Name for the pre-registered hive module.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.exponential.backoff.slot.length</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">100 ms</td> + <td>Duration</td> + <td>Binary exponential backoff slot time for Thrift clients during login to HiveServer2,for retries until hitting Thrift client timeout</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.host</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">(none)</td> + <td>String</td> + <td>The server address of HiveServer2 host to be used for communication.Default is empty, which means the to bind to the localhost. 
This is only necessary if the host has multiple network addresses.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.login.timeout</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">20 s</td> + <td>Duration</td> + <td>Timeout for Thrift clients during login to HiveServer2</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.max.message.size</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">104857600</td> + <td>Long</td> + <td>Maximum message size in bytes a HS2 server will accept.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.port</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">10000</td> + <td>Integer</td> + <td>The port of the HiveServer2 endpoint.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.keepalive-time</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">1 min</td> + <td>Duration</td> + <td>Keepalive time for an idle worker thread. When the number of workers exceeds min workers, excessive threads are killed after this time interval.</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.threads.max</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">512</td> + <td>Integer</td> + <td>The maximum number of Thrift worker threads</td> + </tr> + <tr> + <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.threads.min</h5></td> + <td>optional</td> + <td style="word-wrap: break-word;">5</td> + <td>Integer</td> + <td>The minimum number of Thrift worker threads</td> + </tr> + </tbody> +</table> + +Features +---------------- + +### Built-in HiveCatalog and Hive Dialect + +The SQL Gateway with HiveServer2 endpoint aims to provide the same experience compared to the HiveServer2. When users connect to the HiveServer2, +the HiveServer2 endpoint creates the Hive Catalog as the default catalog, switches to the Hive dialect, and executes the SQL in batch mode for the session. 
+Users can submit SQL in Hive syntax and have it executed in the Flink environment.

Review Comment:
We can remove the "Features" section and make this an H2 title. Please mention all the changed default configuration behavior in this section, e.g., dialect, batch mode, dml-sync, tableau result.

##########
docs/content/docs/dev/table/hiveCompatibility/hiveserver2.md:
##########
@@ -0,0 +1,306 @@
+---
+title: HiveServer2 Endpoint
+weight: 1
+type: docs
+aliases:
+- /dev/table/hiveCompatibility/hiveserver2.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# HiveServer2 Endpoint
+
+The HiveServer2 Endpoint is compatible with the HiveServer2 API and allows users to submit SQL in Hive style.
+
+Setting Up
+----------------
+Before starting the SQL Gateway with the HiveServer2 Endpoint, please prepare the required [dependencies]({{< ref "docs/connectors/table/hive/overview#dependencies" >}}).
+
+### Configure HiveServer2 Endpoint
+
+The HiveServer2 endpoint is not the default endpoint for the SQL Gateway.
You can configure the SQL Gateway to use the HiveServer2 endpoint by calling
+```bash
+$ ./bin/sql-gateway.sh start -Dsql-gateway.endpoint.type=hiveserver2 -Dsql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir=<path to hive conf>
+```
+
+or by modifying the corresponding [Flink configuration]({{< ref "docs/dev/table/config" >}}) entries.
+
+### Connecting to HiveServer2
+
+After starting the SQL Gateway, you are able to submit SQL with Apache Hive Beeline.
+
+```bash
+$ ./beeline
+SLF4J: Class path contains multiple SLF4J bindings.
+SLF4J: Found binding in [jar:file:/Users/ohmeatball/Work/hive-related/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
+SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/3.2.1_1/libexec/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
+SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
+SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
+Beeline version 2.3.9 by Apache Hive
+beeline> !connect jdbc:hive2://localhost:10000/default;auth=noSasl
+Connecting to jdbc:hive2://localhost:10000/default;auth=noSasl
+Enter username for jdbc:hive2://localhost:10000/default:
+Enter password for jdbc:hive2://localhost:10000/default:
+Connected to: Apache Flink (version 1.16)
+Driver: Hive JDBC (version 2.3.9)
+Transaction isolation: TRANSACTION_REPEATABLE_READ
+0: jdbc:hive2://localhost:10000/default> CREATE TABLE Source (
+. . . . . . . . . . . . . . . . . . . .> a INT,
+. . . . . . . . . . . . . . . . . . . .> b STRING
+. . . . . . . . . . . . . . . . . . . .> );
++---------+
+| result  |
++---------+
+| OK      |
++---------+
+0: jdbc:hive2://localhost:10000/default> CREATE TABLE Sink (
+. . . . . . . . . . . . . . . . . . . .> a INT,
+. . . . . . . . . . . . . . . . . . . .> b STRING
+. . . . . . . . . . . . . . . . . . . .> );
++---------+
+| result  |
++---------+
+| OK      |
++---------+
+0: jdbc:hive2://localhost:10000/default> INSERT INTO Sink SELECT * FROM Source;
++----------------------------------+
+|              job id              |
++----------------------------------+
+| 55ff290b57829998ea6e9acc240a0676 |
++----------------------------------+
+1 row selected (2.427 seconds)
+```
+
+Endpoint Options
+----------------
+
+Below are the options supported when creating a HiveServer2 endpoint instance with a YAML file or DDL.
+
+<table class="configuration table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Key</th>
+      <th class="text-center" style="width: 8%">Required</th>
+      <th class="text-left" style="width: 7%">Default</th>
+      <th class="text-left" style="width: 10%">Type</th>
+      <th class="text-left" style="width: 55%">Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><h5>sql-gateway.endpoint.type</h5></td>
+      <td>required</td>
+      <td style="word-wrap: break-word;">"rest"</td>
+      <td>List&lt;String&gt;</td>
+      <td>Specify which endpoint to use; here it should be 'hiveserver2'.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir</h5></td>
+      <td>required</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>URI to your Hive conf dir containing hive-site.xml. The URI needs to be supported by Hadoop FileSystem. If the URI is relative, i.e. without a scheme, the local file system is assumed.
If the option is not specified, hive-site.xml is searched in class path.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.catalog.default-database</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">"default"</td>
+      <td>String</td>
+      <td>The default database to use when the catalog is set as the current catalog.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.catalog.name</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">"hive"</td>
+      <td>String</td>
+      <td>Name for the pre-registered hive catalog.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.module.name</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">"hive"</td>
+      <td>String</td>
+      <td>Name for the pre-registered hive module.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.thrift.exponential.backoff.slot.length</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">100 ms</td>
+      <td>Duration</td>
+      <td>Binary exponential backoff slot time for Thrift clients during login to HiveServer2, for retries until hitting the Thrift client timeout.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.thrift.host</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>The server address of the HiveServer2 host to be used for communication. Default is empty, which means binding to the localhost. This is only necessary if the host has multiple network addresses.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.thrift.login.timeout</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">20 s</td>
+      <td>Duration</td>
+      <td>Timeout for Thrift clients during login to HiveServer2.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.thrift.max.message.size</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">104857600</td>
+      <td>Long</td>
+      <td>Maximum message size in bytes an HS2 server will accept.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.thrift.port</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">10000</td>
+      <td>Integer</td>
+      <td>The port of the HiveServer2 endpoint.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.keepalive-time</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">1 min</td>
+      <td>Duration</td>
+      <td>Keepalive time for an idle worker thread. When the number of workers exceeds the minimum, excess threads are killed after this time interval.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.threads.max</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">512</td>
+      <td>Integer</td>
+      <td>The maximum number of Thrift worker threads.</td>
+    </tr>
+    <tr>
+      <td><h5>sql-gateway.endpoint.hiveserver2.thrift.worker.threads.min</h5></td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">5</td>
+      <td>Integer</td>
+      <td>The minimum number of Thrift worker threads.</td>
+    </tr>
+  </tbody>
+</table>
+
+Features
+----------------
+
+### Built-in HiveCatalog and Hive Dialect
+
+The SQL Gateway with the HiveServer2 endpoint aims to provide the same experience as HiveServer2. When users connect to the HiveServer2 endpoint,
+it creates the Hive Catalog as the default catalog, switches to the Hive dialect, and executes SQL in batch mode for the session.
+Users can submit SQL in Hive syntax and have it executed in the Flink environment.
+
+### Integrate into the Hive Ecosystem
+
+The HiveServer2 endpoint extends the HiveServer2 API. Therefore, the tools that manage the Hive SQL also work for

Review Comment:
```suggestion
The HiveServer2 Endpoint is compatible with the HiveServer2 wire protocol. Therefore, the tools that manage the Hive SQL also work for
```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
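As a side note for readers wiring their own Hive clients up to the gateway: the Beeline session quoted earlier in this thread connects with a standard HiveServer2 JDBC URL. The sketch below is a hypothetical helper (not part of Flink or Hive) that assembles such a URL from the endpoint defaults documented above (port 10000, database `default`); the `no_sasl` flag mirrors the `;auth=noSasl` session parameter used in the example.

```python
def hiveserver2_jdbc_url(host="localhost", port=10000, database="default", no_sasl=True):
    """Assemble a HiveServer2 JDBC URL like the one used in the Beeline example.

    Defaults mirror the sql-gateway.endpoint.hiveserver2.* option defaults
    (thrift.port 10000, catalog.default-database 'default').
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if no_sasl:
        # Matches the ';auth=noSasl' suffix from the Beeline session above.
        url += ";auth=noSasl"
    return url

print(hiveserver2_jdbc_url())
# jdbc:hive2://localhost:10000/default;auth=noSasl
```

Any client that speaks the HiveServer2 wire protocol (Hive JDBC, Beeline, DBeaver, and so on) can then use this URL against the running gateway.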