> 1. How to specify the number of TaskManagers?
> In batch mode, I tried to use (Max Parallelism / (cores per tm)), but it
> does not work. The number of TaskManagers is much larger than
> (Max Parallelism / cores per tm).

It is not the number of cores per TM that matters here, but the number of slots per TM. Please refer to taskmanager.numberOfTaskSlots [1].

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#taskmanager-numberoftaskslots

Best,
Yangze Guo

On Thu, Jul 1, 2021 at 3:57 PM vtygoss <vtyg...@126.com> wrote:
>
> Hi,
>
> I have some questions.
>
> 1. How to specify the number of TaskManagers?
> In batch mode, I tried to use (Max Parallelism / (cores per tm)), but it
> does not work. The number of TaskManagers is much larger than
> (Max Parallelism / cores per tm).
>
> 2. In my scenario, there is a lot of cumulative data plus streaming
> incremental data. Is there a way to compute the result from the cumulative
> data and save the state, then continue to compute the incremental data
> using the computed state?
>
> 3. In the Flink 3 TB TPC-DS benchmark, I found a strange problem: the
> ORC / Parquet file format has a significant impact on performance. Am I
> doing something wrong?
>
> TPC-DS query1, table: store_returns, num records: 833,763,236, bytes:
> 80GB+. Flink task parallelism=500
>
> - using ORC+SNAPPY, it took 10 seconds to read. picture below
>
> - using PARQUET+SNAPPY, it took 5 min 32 seconds to read. picture below
>
> There is no special configuration for Parquet in
> $FLINK_HOME/conf/hive-site.xml, and hive-site.xml is in the attachment.
>
> ```
> [hive-site.xml]
> parquet.memory.pool.ratio=0.5
> hive.parquet.timestamp.skip.conversion=true
> ```
>
> It would be a pleasure to get some suggestions from you. Thank you very much!
>
> Best Regards!
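
As a rough illustration of the slot-based sizing in the reply above, here is a minimal Python sketch. It is only an estimate under stated assumptions: it assumes the job needs as many slots as its maximum parallelism (i.e. all operators share slots), which may not hold in batch mode. The function name `estimate_task_managers` and the slots-per-TM values 4 and 8 are hypothetical and not part of Flink; only the parallelism of 500 comes from the thread.

```python
def estimate_task_managers(required_slots: int, slots_per_tm: int) -> int:
    """Estimate how many TaskManagers are needed to provide `required_slots`.

    `slots_per_tm` corresponds to the value of taskmanager.numberOfTaskSlots.
    Assumption: `required_slots` equals the job's maximum parallelism, which
    is only true when all operators can share slots.
    """
    if required_slots < 0:
        raise ValueError("required_slots must be non-negative")
    if slots_per_tm <= 0:
        raise ValueError("slots_per_tm must be positive")
    # Ceiling division with integers: a partially used TaskManager still counts.
    return (required_slots + slots_per_tm - 1) // slots_per_tm


# Parallelism 500 is taken from the thread; 4 and 8 slots per TM are
# hypothetical settings chosen only for illustration.
tms_with_4_slots = estimate_task_managers(500, 4)
tms_with_8_slots = estimate_task_managers(500, 8)
print(tms_with_4_slots, tms_with_8_slots)
```

If the observed number of TaskManagers is still larger than this estimate, the job probably needs more slots than its maximum parallelism, so the slot count rather than the core count is the figure to check.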