morsapaes commented on a change in pull request #14003: URL: https://github.com/apache/flink/pull/14003#discussion_r532193631
##########
File path: docs/dev/table/sql/gettingStarted.md
##########
@@ -0,0 +1,226 @@
---
title: "Getting Started - Flink SQL"
nav-parent_id: sql
nav-pos: 0
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

* This will be replaced by the TOC
{:toc}

Flink SQL enables SQL developers to design and develop batch or streaming applications without writing code in Java, Scala, or any other programming language. It provides a unified API for both stream and batch processing and lets you express powerful transformations in plain SQL. Flink's SQL support is based on [Apache Calcite](https://calcite.apache.org/), which implements the SQL standard.

In addition to the SQL API, Flink also has a Table API with semantics similar to SQL. The Table API is a language-integrated API: users write queries by calling API methods in their programming language. A typical job creates a table environment, reads a table, applies transformations and aggregations, and writes the results back to another table. The Table API is available in Java, Scala, and Python.

Flink SQL and the Table API are just two different ways of writing queries that run on the same Flink runtime, and all queries are optimized for efficient execution. The SQL API is the more declarative of the two, using the well-known SQL standard, e.g. `SELECT * FROM Table`. Table API queries, in contrast, start from a table and chain operations such as filters, joins, and a final projection, e.g. `Table.filter(...).select(...)`. Standard SQL is quick to learn, even for users with no programming background. This guide focuses on the Flink SQL API; details on the Table API can be found [here]({{ site.baseurl }}/dev/table/).

### Prerequisites
You only need basic knowledge of SQL to follow along. You will not need to write Java or Scala code or use an IDE.

### Installation
There are various ways to [install]({{ site.baseurl }}/ops/deployment/) Flink. For experimentation, the easiest option is to download the binaries and run them locally. We assume a [local installation]({{ site.baseurl }}/try-flink/local_installation.html) for the rest of this tutorial. You can start a local cluster with the following command from the installation folder:

{% highlight bash %}
./bin/start-cluster.sh
{% endhighlight %}

Once the cluster is started, it also serves a web UI on [localhost:8081](http://localhost:8081) for managing settings and monitoring the running jobs.

### SQL Client
The SQL Client is an interactive client for submitting SQL queries to Flink and visualizing the results. It is similar to the query editor of any other database management system: you write queries using standard SQL. You can start the SQL Client from the installation folder as follows:

{% highlight bash %}
./bin/sql-client.sh embedded
{% endhighlight %}

### Hello World query

Once the SQL Client, our query editor, is up and running, it's time to start writing queries. These queries are submitted to the Flink cluster for computation, and the results are returned to the SQL Client UI. Let's start by printing 'Hello World' with the following simple query:

{% highlight sql %}
SELECT 'Hello World';
{% endhighlight %}

The `HELP;` command lists the supported DDL (Data Definition Language) and other commands. Flink SQL also ships with a large set of built-in functions. The following query shows all built-in and user-defined functions:

{% highlight sql %}
SHOW FUNCTIONS;
{% endhighlight %}

Flink SQL provides a set of [built-in functions]({{ site.baseurl }}/dev/table/functions/systemFunctions.html) for data transformations. For example, the following query prints the current timestamp using the `CURRENT_TIMESTAMP` function:

{% highlight sql %}
SELECT CURRENT_TIMESTAMP;
{% endhighlight %}
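Built-in functions can also be combined into larger expressions. As a simple illustration (the literal values here are arbitrary), the following query concatenates two strings with the `||` operator and upper-cases the result with `UPPER`:

{% highlight sql %}
SELECT UPPER('hello' || ' world');
{% endhighlight %}

As before, the expression is evaluated by the cluster and the SQL Client displays the single value `HELLO WORLD`.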
---------------

{% top %}

## Setting up tables
Real-world queries run against SQL tables. Although Flink is a stream processing engine, users can define tables on top of streaming data. Generally, a Flink data processing pipeline has three components: source, compute, and sink.

The source is the input that data is read from, e.g. a text file or a Kafka topic. The compute step defines the computations to perform on the input data. Finally, the sink defines what to do with the output, or where to store the results: a console log, another output file, or a Kafka topic. This is similar to a database query that reads data from a table, performs some computation on it, and then displays the results.

In Flink SQL, sources and sinks are tables, but Flink is not a storage engine and cannot store the data itself. Flink tables therefore need to be backed by a [storage connector]({{ site.baseurl }}/dev/table/connect.html) such as the [file system]({{ site.baseurl }}/dev/table/connect.html#file-system-connector), [Kafka]({{ site.baseurl }}/dev/table/connect.html#kafka-connector), or [MySQL]({{ site.baseurl }}/dev/table/connect.html#jdbc-connector) connector. When creating these tables, the storage connector type, the [format]({{ site.baseurl }}/dev/table/connect.html#table-formats), and the schema of each table need to be defined.

### Input Source Tables
The SQL Client environment is configured via [YAML](https://yaml.org) configuration files. When the SQL Client starts, it reads the default configuration from `/conf/sql-client-defaults.yaml`, which can be overridden by a user-defined configuration file. These files define different parts of the environment, including table sources and sinks, [catalogs]({{ site.baseurl }}/dev/table/catalogs.html), and [user-defined functions]({{ site.baseurl }}/dev/table/functions/udfs.html).

Tables can be defined either in the environment configuration file or through the SQL Client, which supports [SQL DDL commands]({{ site.baseurl }}/dev/table/sql) similar to traditional SQL databases. Standard SQL DDL is used to [create]({{ site.baseurl }}/dev/table/sql/create.html), [alter]({{ site.baseurl }}/dev/table/sql/alter.html), and [drop]({{ site.baseurl }}/dev/table/sql/drop.html) tables.
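As a minimal sketch of what such a DDL statement looks like, the following creates a table backed by Flink's built-in `datagen` connector, which produces randomly generated rows and therefore needs no external storage (the table and column names are made up for illustration):

{% highlight sql %}
CREATE TABLE RandomNumbers (
  id      BIGINT,   -- columns are filled with randomly generated values
  payload DOUBLE
) WITH (
  'connector' = 'datagen'
);
{% endhighlight %}

Running `SELECT * FROM RandomNumbers;` in the SQL Client then continuously returns generated rows, which is a convenient way to experiment before wiring up a real connector such as a file system or Kafka table.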
Flink has a support for few [formats]({{ site.baseurl }}/dev/table/connectors/formats/) that can be used with tables.
Following is an example to define a source table on [csv file]({{ site.baseurl }}/dev/table/connectors/formats/csv.html) using DDL with `EmpId, EmpName, DeptId` as header.

Review comment:
```suggestion
Flink has support for a few [formats]({% link dev/table/connectors/formats.md %}) that can be used with tables. 
Following is an example to define a source table on [csv file]({% link dev/table/connectors/formats/csv.md %}) using DDL with `EmpId`, `EmpName`, `DeptId` as header.
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org