Re: GCS/Object Storage Rate Limiting

2021-12-08 Thread Seth Wiesman
Not sure if you've seen this, but Flinks file systems do support connection limiting. https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/common/#connection-limiting Seth On Wed, Dec 8, 2021 at 12:18 PM Kevin Lam wrote: > Hey David, > > Thanks for the response. The

Re: GCS/Object Storage Rate Limiting

2021-12-08 Thread Kevin Lam
Hey David, Thanks for the response. The retry eventually succeeds, but I was wondering if there was anything that people in the community have done to avoid GCS/S3 rate-limiting issues. The retries do result in it taking longer for all the task managers to recover and register. On Mon, Dec 6, 202

Re: GCS/Object Storage Rate Limiting

2021-12-06 Thread David Morávek
Hi Kevin, Flink comes with two schedulers for streaming: - Default - Adaptive (opt-in) Adaptive is still in experimental phase and doesn't support local recover. You're most likely using the first one, so you should be OK. Can you elaborate on this a bit? We aren't changing the parallelism when

Re: GCS/Object Storage Rate Limiting

2021-12-02 Thread Kevin Lam
HI David, Thanks for your response. What's the DefaultScheduler you're referring to? Is that available in Flink 1.13.1 (the version we are using)? How large is the state you're restoring from / how many TMs does the job > consume / what is the parallelism? Our checkpoint is about 900GB, and we

Re: GCS/Object Storage Rate Limiting

2021-12-02 Thread David Morávek
Hi Kevin, this happens only when the pipeline is started up from savepoint / retained checkpoint right? Guessing from the "path" you've shared it seems like a RockDB based retained checkpoint. In this case all task managers need to pull state files from the object storage in order to restore. This

GCS/Object Storage Rate Limiting

2021-12-02 Thread Kevin Lam
Hi all, We're running a large (256 task managers with 4 task slots each) Flink Cluster with High Availability enabled, on Kubernetes, and use Google Cloud Storage (GCS) as our object storage for the HA metadata. In addition, our Flink application writes out to GCS from one of its sinks via streami