Shammon created FLINK-29317:
-------------------------------

             Summary: Add WebSocket in Dispatcher to support olap query 
submission and push results in session cluster
                 Key: FLINK-29317
                 URL: https://issues.apache.org/jira/browse/FLINK-29317
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination, Runtime / REST
    Affects Versions: 1.15.2, 1.14.5
            Reporter: Shammon


Currently client submit olap query to flink session cluster via http rest api, 
and pull the results through interval polling. The sink task in TaskManager 
creates socket server for each query, when the JobManager receives the pull 
request from client, it requests query results from the socket server. The 
process is as follows
Job submission path:
client -> http rest -> JobManager -> Sink Socket Server
Result acquisition path:
client <- http rest <- JobManager <- Sink Socket Server

This leads to two problems
1. There will be some performance loss when submitting jobs through http rest, 
for example, temporary files will be created for each job
2. The client pulls the result data at a certain time interval, which is a 
fixed cost. The larger interval leads to increase query latency, the smaller 
interval will increase the pressure of Dispatcher.
3. Each sink task initializes a socket server, it will increase the query 
latency, on the other hand, it wastes resources.

For the Flink OLAP scenario, we propose to add websocket protocol in session 
cluster to support submitting jobs and returning results. The client creates 
and manage a connection with websocket server, submits olap query to session 
cluster. The TaskManagers create and manage connection to websocket server too, 
and sink task sends results to the server in stream. When the JobManager 
receives the results from sink task, it pushes the result data to the client 
through the connection between them.

We implemented this feature in the internal Flink version of ByteDance. On 
average, the latency of each query can be reduced by about 100ms, it's a big 
optimization for OLAP queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to