Re: specify number of TM; how stream app use state of batch app; orc / parquet file format have different impact on tpcds performance benchmark.

Yangze Guo Thu, 01 Jul 2021 01:16:31 -0700

> 1.  how to specify the number of TaskManager?
>     In batch mode, I tried to use (Max Parallelism / (cores per tm)), but it 
> does not work. Number of TaskManager is much larger than (Max Parallelism / 
> cores per tm).


It is not the number of cores per TM that matters here, but the number of
slots per TM. Please refer to taskmanager.numberOfTaskSlots [1].

[1] 
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#taskmanager-numberoftaskslots
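
As a rough illustration: when TaskManagers are started on demand (for
example on YARN or Kubernetes), the number of TaskManagers is approximately
the number of slots the job requests divided by
taskmanager.numberOfTaskSlots, rounded up. The sketch below is only
back-of-the-envelope arithmetic in Python, not a Flink API. The function
name and the value of 4 slots per TM are assumptions for illustration; 500
is the parallelism from the question. A job can also request more slots
than its max parallelism, for instance when operators do not share slots,
so treat the result as a lower-bound estimate.

```python
def estimate_task_managers(required_slots: int, slots_per_tm: int) -> int:
    """Estimate how many TaskManagers are needed to provide the given slots.

    required_slots: total slots the job requests (assumed here to equal the
                    job's max parallelism, which only holds with slot sharing).
    slots_per_tm:   value of taskmanager.numberOfTaskSlots.
    """
    if required_slots < 0:
        raise ValueError("required_slots must be non-negative")
    if slots_per_tm <= 0:
        raise ValueError("slots_per_tm must be positive")
    # Integer ceiling division: a partially used TaskManager still counts as one.
    return (required_slots + slots_per_tm - 1) // slots_per_tm


if __name__ == "__main__":
    # Hypothetical setting: parallelism 500, 4 slots per TaskManager.
    print(estimate_task_managers(500, 4))
```

Under these assumptions, 500 required slots with 4 slots per TM gives 125
TaskManagers, regardless of how many cores each TM has.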


Best,
Yangze Guo


On Thu, Jul 1, 2021 at 3:57 PM vtygoss <vtyg...@126.com> wrote:
>
> Hi,
>
>
> I have some questions.
>
>
> 1.  how to specify the number of TaskManager?
>      In batch mode, I tried to use (Max Parallelism / (cores per tm)), but it 
> does not work. Number of TaskManager is much larger than (Max Parallelism / 
> cores per tm).
>
> 2.  In my scenario, there is a lot of cumulative data plus streaming 
> incremental data. Is there a way to compute the result over the cumulative 
> data, save the state, and then continue computing over the incremental data 
> using that saved state?
>
> 3.  In the Flink 3 TB TPC-DS benchmark, I found a strange problem: the file 
> format (ORC vs. Parquet) has a significant impact on performance. Am I doing 
> something wrong?
>
>      tpcds query1, table: store_returns, num records: 833,763,236, bytes: 
> 80GB+.  Flink task parallelism=500
>
>     - using ORC+SNAPPY, it took 10 seconds to read.   picture below
>
>     - using PARQUET+SNAPPY, it took 5 min 32 seconds to read.  picture below
>
>
>
>
> There is no special configuration for Parquet in 
> $FLINK_HOME/conf/hive-site.xml other than the entries below, and 
> hive-site.xml is attached.
>
>
>
> ```
>
> [hive-site.xml]
>
>    parquet.memory.pool.ratio=0.5
>
>    hive.parquet.timestamp.skip.conversion=true
>
> ```
>
>
> I would appreciate any suggestions. Thank you very much!
>
> Best Regards!
