[ 
https://issues.apache.org/jira/browse/FLINK-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

swy updated FLINK-9597:
-----------------------
    Description: 
Hi, we found that our Flink application, which has simple logic and uses a 
process function, does not scale beyond a parallelism of 8 even though 
sufficient resources are available. The results below show throughput capped 
at ~250k TPS. No matter how we tune the parallelism of the operators, it does 
not scale; increasing the source parallelism does not help either.

Please refer to "scaleNotWork.png",
1. fixed source parallelism 4, other operators parallelism 8
2. fixed source parallelism 4, other operators parallelism 16
3. fixed source parallelism 4, other operators parallelism 32
4. fixed source parallelism 6, other operators parallelism 8
5. fixed source parallelism 6, other operators parallelism 16
6. fixed source parallelism 6, other operators parallelism 32
7. fixed source parallelism 6, other operators parallelism 64 (performance is 
worse than with parallelism 32)
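
To make the plateau concrete: if total throughput stays near ~250k TPS while 
operator parallelism grows, the average rate handled by each subtask shrinks 
in proportion. The sketch below only does that arithmetic. The class and 
method names are hypothetical, the 250,000 figure is the approximate cap 
reported above, and an even split of records across subtasks is an assumption 
(run 7 was reported as worse than the cap, so its real per-subtask rate would 
be lower still).

```java
// Hypothetical helper for illustration; not part of the attached program.
final class PerSubtaskRate {
    // Average records per second per subtask, assuming an even split.
    static double averageRate(double totalRecordsPerSecond, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        return totalRecordsPerSecond / parallelism;
    }

    public static void main(String[] args) {
        double cap = 250_000; // approximate cap reported in this issue
        int[] testedParallelism = {8, 16, 32, 64};
        for (int p : testedParallelism) {
            System.out.println("parallelism " + p + ": "
                    + averageRate(cap, p) + " records/s per subtask");
        }
    }
}
```

At parallelism 8 this gives 31,250 records/s per subtask and at parallelism 
32 only 7,812.5, which suggests the limit lies somewhere other than in the 
operators being scaled.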

Sample source code is attached (flink_app_parser_git.zip). It is a simple 
program that parses each JSON record into an object and passes it to a Flink 
process function with empty logic. RocksDB is in use, and the source data is 
generated by the program itself, so this can be reproduced easily. 

We chose Flink because of its scalability, but that is not what we are seeing 
now. We would appreciate any help, as this is impacting our projects. Thank you.

To run the program, use parameters such as the following:

"aggrinterval=6000000 loop=7500000 statsd=1 psrc=4 pJ2R=32 pAggr=72 
URL=do36.comptel.com:8127"

* aggrinterval: time in ms before the timer triggers
* loop: number of rows of data to feed
* statsd: send results to statsd
* psrc: source parallelism
* pJ2R: parallelism of the map operator (JsonRecTranslator)
* pAggr: parallelism of the process+timer operator (AggregationDuration) 
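
The sample parameter string above is a whitespace-separated list of key=value 
pairs. A minimal sketch of how such a string could be split into a map is 
shown below. The class name RunParams and its parse method are hypothetical 
and are not taken from flink_app_parser_git.zip, whose actual argument 
handling may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the attached program.
final class RunParams {
    // Splits "k1=v1 k2=v2 ..." on whitespace, then on the first '=' of each token.
    static Map<String, String> parse(String line) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty tokens and tokens without a key
            }
            params.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse(
                "aggrinterval=6000000 loop=7500000 statsd=1 psrc=4 pJ2R=32 "
                + "pAggr=72 URL=do36.comptel.com:8127");
        int aggrParallelism = Integer.parseInt(p.get("pAggr"));
        System.out.println("pAggr = " + aggrParallelism);
        System.out.println("URL   = " + p.get("URL"));
    }
}
```

Splitting on the first '=' only keeps values such as the URL (which contains 
a ':' and could contain further '=' characters) intact.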

We are running on VMware, with 5 Task Managers that each have 32 slots.
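
As a rough check of the "sufficient resources" claim, the slot budget can be 
compared with what the sample parameters ask for. The sketch below assumes 
only the three operators named in the parameter list (psrc=4, pJ2R=32, 
pAggr=72) and relies on Flink's default slot sharing, under which a job needs 
as many slots as its highest operator parallelism. The class and method names 
are hypothetical.

```java
// Back-of-the-envelope slot check; not part of the attached program.
final class SlotBudget {
    // Total slots offered by the cluster.
    static int availableSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // With all operators in one slot sharing group (Flink's default),
    // the job needs as many slots as its highest operator parallelism.
    static int slotsWithSharing(int... parallelisms) {
        int max = 0;
        for (int p : parallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    // Upper bound if no two operators shared a slot.
    static int slotsWithoutSharing(int... parallelisms) {
        int sum = 0;
        for (int p : parallelisms) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        int available = availableSlots(5, 32);            // 5 Task Managers, 32 slots each
        int shared = slotsWithSharing(4, 32, 72);         // psrc, pJ2R, pAggr
        int unshared = slotsWithoutSharing(4, 32, 72);
        System.out.println("available=" + available
                + " neededWithSharing=" + shared
                + " neededWithoutSharing=" + unshared);
    }
}
```

Under these assumptions the cluster offers 160 slots, while the sample run 
needs 72 with slot sharing and at most 108 without it, so slot count alone 
does not explain the cap.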

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             32
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
Stepping:              2
CPU MHz:               2593.993
BogoMIPS:              5187.98
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm 
constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc 
aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt 
aes xsave avx f16c rdrand hypervisor lahf_lm epb fsgsbase smep dtherm ida arat 
pln pts

              total        used        free      shared  buff/cache   available
Mem:             98          24          72           0           1          72
Swap:             3           0           3


Please refer to TM.png and JM.png for further details.


  was:
Hi, we found that our Flink application, which has simple logic and uses a 
process function, does not scale beyond a parallelism of 8 even though 
sufficient resources are available. The results below show throughput capped 
at ~250k TPS. No matter how we tune the parallelism of the operators, it does 
not scale; increasing the source parallelism does not help either.

Please refer to "scaleNotWork.png",
1. fixed source parallelism 4, other operators parallelism 8
2. fixed source parallelism 4, other operators parallelism 16
3. fixed source parallelism 4, other operators parallelism 32
4. fixed source parallelism 6, other operators parallelism 8
5. fixed source parallelism 6, other operators parallelism 16
6. fixed source parallelism 6, other operators parallelism 32
7. fixed source parallelism 6, other operators parallelism 64 (performance is 
worse than with parallelism 32)

Sample source code is attached (flink_app_parser_git.zip). It is a simple 
program that parses each JSON record into an object and passes it to a Flink 
process function with empty logic. RocksDB is in use, and the source data is 
generated by the program itself, so this can be reproduced easily. 

We chose Flink because of its scalability, but that is not what we are seeing 
now. We would appreciate any help, as this is impacting our projects. Thank you.

To run the program, use parameters such as the following:

"aggrinterval=6000000 loop=7500000 statsd=1 psrc=4 pJ2R=32 pAggr=72 
URL=do36.comptel.com:8127"

* aggrinterval: time in ms before the timer triggers
* loop: number of rows of data to feed
* statsd: send results to statsd
* psrc: source parallelism
* pJ2R: parallelism of the map operator (JsonRecTranslator)
* pAggr: parallelism of the process+timer operator (AggregationDuration) 

We are running on VMware, with 5 Task Managers that each have 16 slots.

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             32
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
Stepping:              2
CPU MHz:               2593.993
BogoMIPS:              5187.98
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm 
constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc 
aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt 
aes xsave avx f16c rdrand hypervisor lahf_lm epb fsgsbase smep dtherm ida arat 
pln pts

              total        used        free      shared  buff/cache   available
Mem:             98          24          72           0           1          72
Swap:             3           0           3


Please refer to TM.png and JM.png for further details.



> Flink fail to scale!
> --------------------
>
>                 Key: FLINK-9597
>                 URL: https://issues.apache.org/jira/browse/FLINK-9597
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.5.0
>            Reporter: swy
>            Priority: Major
>         Attachments: JM.png, TM.png, flink_app_parser_git.zip, 
> scaleNotWork.png
>
>


