SocketException: Too many open files

2020-09-25 Thread mars
Hi, I have a simple Flink job which is reading the data from Kafka topic and generating minute aggregations and writing them to Elastic Search. I am running the Flink Job (Flink Yarn Session) on EMR Cluster and the Job runs for an hour fine and then it is getting stopped and when i checked th

Re: SocketException: Too many open files

2020-09-25 Thread Ken Krugler
Hi Mars, A few questions.. 1. What version of Flink are you using? 2. Are you using the default ES sink, or did you write your own? 3. What class of EC2 slave are you using? 4. What’s the parallelism of the ES sink? 5. To verify the actual open file limit, you need to… * scp your private ke

Re: Better way to share large data across task managers

2020-09-25 Thread Kostas Kloudas
Hi Dongwon, Yes, you are right that I assume that broadcasting occurs once. This is what I meant by "If you know the data in advance". Sorry for not being clear. If you need to periodically broadcast new versions of the data, then I cannot find a better solution than the one you propose with the s

Flink 1.10 classpath

2020-09-25 Thread Richard Moorhead
We're submitting jobs to a flink session running on YARN. Our yarn-site.xml does not have our mapreduce libs in yarn.application.classpath. Our application code has some provided dependencies from hadoop-mapreduce-client-core. Without editing our yarn-site, or adding those libraries to the flink li

Re: Apache Qpid connector.

2020-09-25 Thread Austin Cawley-Edwards
Hey (Master) Parag, I don't know anything about Apache Qpid, but from the homepage[1], it looks like the protocol is just AMQP? Are there more specifics than that? If it is just AMQP would the RabbitMQ connector[2] work for you? Best, Austin [1]: https://qpid.apache.org/ [2]: https://ci.apache.o

Flink being used in other open source projects?

2020-09-25 Thread vinuthomas2008
Hi All, Very new to Flink. Are there any open source projects using Flink? I would like to be involved in a project that uses Flink!! Thanks VT

Apache Qpid connector.

2020-09-25 Thread Master Yoda
Hello, Is there a flink source and sink from/to Apache Qpid. ? I searched around a bit but could not find one. Would I need to write one if there isn't one already ? thanks, Parag

Re: [External Sender] Debugging "Container is running beyond physical memory limits" on YARN for a long running streaming job

2020-09-25 Thread Kye Bae
Not sure about Flink 1.10.x. Can share a few things up to Flink 1.9.x: 1. If your Flink cluster runs only one job, avoid using dynamic classloader for your job: start it from one of the Flink class paths. As of Flink 1.9.x, using the dynamic classloader results in the same classes getting loaded e

Re: Flink stateful functions and Event Driven microservices

2020-09-25 Thread vinuthomas2008
Hi All, Very new to Flink. Are there any open source projects using Flink? I would like to be involved in a project that uses Flink!! Thanks VT On Fri, Sep 25, 2020 at 7:29 PM Igal Shilman wrote: > Hi Mazen, > > What are the differences between Flink stateful functions and Event driven >> mic

Re: Flink stateful functions and Event Driven microservices

2020-09-25 Thread Igal Shilman
Hi Mazen, What are the differences between Flink stateful functions and Event driven > microservices are they almost the same concept > You can think of Stateful Functions as an API and a runtime that helps building event driven microservices. It addresses some of the hardest parts of composing s

Re: Best way to resolve bottlenecks with Flink?

2020-09-25 Thread Khachatryan Roman
The closest thing is the backpressure status which you mentioned. >From there, you can troubleshoot specific subtasks by inspecting their metrics. There is no health summary in Flink at the moment. Regards, Roman On Fri, Sep 25, 2020 at 5:35 AM Dan Hill wrote: > My job has very slow throughput

Reading from HDFS and publishing to Kafka

2020-09-25 Thread Damien Hawes
Hi folks, I've got the following use case, where I need to read data from HDFS and publish the data to Kafka, such that it can be reprocessed by another job. I've searched the web and read the docs. This has turned up no and concrete examples or information of how this is achieved, or even if it'

Re: How can I drop events which are late by more than X hours/days?

2020-09-25 Thread Arvid Heise
Apparently, I haven't made that clear enough in my first mail, so thanks for clarifying that Theo. As Matthias wrote, the general solution is to use (Keyed)ProcessFunction [1]. However, if OP uses watermarks, chances are high that OP uses them for windows, so I wanted to point out the intended way

Re: Back pressure with multiple joins

2020-09-25 Thread Timo Walther
Hi Dan, could you share the plan with us using `TableEnvironment.explainSql()` for both queries? In general, views should not have an impact on the performance. They are a logical concept that gives a bunch of operations a name. The contained operations are inlined into the bigger query duri

Re: How can I drop events which are late by more than X hours/days?

2020-09-25 Thread Theo Diefenthal
Hi Arvid, be aware that allowedLateness will only be applied when your job has some windowing in use. If you have late events and you only apply mapFunctions like enrichment, as far as I know, the event's won't be filtered out automatically . Best regards Theo Von: "Arvid Heise" An: "Ma