Re: Use Flink for OLAP

Caizhi Weng Thu, 04 Nov 2021 22:16:32 -0700

Hi!

Flink is a distributed, stateful stream-processing (dataflow) engine (with
optimizations for batch and OLAP jobs as well), and it does not currently ship
with its own storage system. It needs to be combined with external storage or
computation systems such as HDFS, Hive, Kafka and Iceberg to build a data
warehouse [1] or data lake [2].

Most use cases [3] of Flink involve continuous streaming jobs or long-running
batch analytical jobs (which cover every layer of a data warehouse, including
ETL), so I can't say Flink is specialized for either ETL or OLAP. As for OLAP,
a few companies are currently using Flink as their OLAP execution engine. If
you're interested, keep an eye on Flink Forward Asia this year: two of its
talks are about using Flink as an OLAP execution engine in production (search
for "OLAP" in [4] for details).

[1] https://www.alibabacloud.com/blog/flink-is-attempting-to-build-a-data-warehouse-simply-by-using-a-set-of-sql-statements_596346
[2] https://www.alibabacloud.com/blog/building-an-enterprise-level-real-time-data-lake-based-on-flink-and-iceberg_597755
[3] https://flink.apache.org/usecases.html
[4] https://flink-forward.org.cn/#agenda

Ww J <junww2...@gmail.com> wrote on Fri, Nov 5, 2021 at 12:49 PM:

> Thanks. Can Flink replace the popular OLAP databases, for example, AWS
> Redshift?
> It seems to me that Flink is generally used as the ETL layer for OLAP.
>
> On Nov 4, 2021, at 9:33 PM, Caizhi Weng <tsreape...@gmail.com> wrote:
>
> Hi!
>
> Yes, you can. Note that it is recommended to run Flink in session cluster
> mode (instead of per-job mode) to minimize distribution and scheduling time
> for each OLAP query.
>
> Ww J <junww2...@gmail.com> wrote on Fri, Nov 5, 2021 at 12:30 PM:
>
>> Hi,
>>
>> Can Flink be used for OLAP queries?
>>
>> Thanks,
>>
>> Jack
>>
>
>
