Re: Using flink-connector-kafka-1.9.1 with flink-core-1.7.2

2020-05-06 Thread Arvid Heise
Hi Nick, all Flink dependencies are only compatible with the same major version. You can work around it by checking out the code [1], manually setting the dependency of the respective module to your flink-core version, and reverting all changes that do not compile. But there is no guarantee that thi
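
For illustration only, the kind of change involved, assuming a Maven checkout of the connector module (a sketch; the real Flink build wires module versions together, so expect to revert code by hand as Arvid notes):

    <!-- flink-connector-kafka/pom.xml: pin the core dependency to the
         older release and rebuild -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-core</artifactId>
      <version>1.7.2</version>
    </dependency>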

Re: Shared Checkpoint Cleanup and S3 Lifecycle Policy

2020-05-06 Thread Trystan
Thanks Congxian! To make sure I'm understanding correctly, if I retain 3 incremental checkpoints (say every minute), and I've just completed checkpoint 10, then anything in shared is from checkpoints 8 and 9 only. So anything older than ~3 minutes can safely be deleted? The state from checkpoint 5 d

Re: Shared Checkpoint Cleanup and S3 Lifecycle Policy

2020-05-06 Thread Congxian Qiu
Hi, for the rate limit, could you please try entropy injection [1]? As for checkpoints, Flink handles the file lifecycle itself (it will delete a file once it can never be used again). A file in a checkpoint stays around as long as the corresponding checkpoint is still valid. [1] https://ci.apache.org
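
For reference, entropy injection is configured in flink-conf.yaml; a minimal sketch, assuming an S3 checkpoint bucket (bucket name and entropy key are placeholders):

    # flink-conf.yaml
    state.checkpoints.dir: s3://my-bucket/checkpoints/_entropy_/
    s3.entropy.key: _entropy_
    s3.entropy.length: 4

With this, Flink replaces the _entropy_ marker with random characters in checkpoint data paths, spreading the load across S3 prefixes.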

Re: Flink pipeline;

2020-05-06 Thread Leonard Xu
Hi Aissa, looks like your requirement is to enrich a real-time stream (from Kafka) with dimension data (in your case something like: {sensor_id, equipment_id, workshop_id, factory_id}). You can achieve your purpose with the Flink DataStream API or just use Flink SQL. I think using pure SQL will be easier if you
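
A minimal sketch of the SQL approach, assuming a sensor stream and an equipment dimension table are already registered in the catalog (all table and column names are hypothetical, and the exact join flavor depends on the Flink version):

    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
    Table enriched = tEnv.sqlQuery(
        "SELECT s.sensor_id, d.equipment_id, d.workshop_id, d.factory_id, s.reading " +
        "FROM sensor_readings AS s " +
        "JOIN equipment_dim AS d ON s.sensor_id = d.sensor_id");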

Using flink-connector-kafka-1.9.1 with flink-core-1.7.2

2020-05-06 Thread Nick Bendtner
Hi guys, I am using Flink version 1.7.2. I have to deserialize data from Kafka into consumer records, so I decided to update flink-connector-kafka to 1.9.1, which provides support for ConsumerRecord. We use child-first class loading. However, it seems like I have a compatibility issue, as I g
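
For context, what the newer connector offers is KafkaDeserializationSchema, which hands you the whole ConsumerRecord; a minimal sketch (MyEvent and its parse method are hypothetical):

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
    import org.apache.kafka.clients.consumer.ConsumerRecord;

    public class MyEventSchema implements KafkaDeserializationSchema<MyEvent> {
        @Override
        public boolean isEndOfStream(MyEvent nextElement) {
            return false;
        }

        @Override
        public MyEvent deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
            // full access to key, value, topic, partition and offset
            return MyEvent.parse(record.key(), record.value());
        }

        @Override
        public TypeInformation<MyEvent> getProducedType() {
            return TypeInformation.of(MyEvent.class);
        }
    }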

Re: Window processing in Stateful Functions

2020-05-06 Thread Oytun Tez
Oops – will follow the thread 🙊 -- Oytun Tez M O T A W O R D | CTO & Co-Founder oy...@motaword.com On Wed, May 6, 2020 at 5:37 PM m@xi wrote: > Hello Tez, > > With all the respect, I doubt your answer is related to the question. > > *Jus

Statefun 2.0 questions

2020-05-06 Thread Wouter Zorgdrager
Hi all, I've been using Flink for quite some time now, and for a university project I'm planning to experiment with Statefun. During the walkthrough I've run into some issues that I hope you can help me with. 1) Is it correct that the Docker image of statefun is not yet published? I couldn't find it a

Re: Window processing in Stateful Functions

2020-05-06 Thread m@xi
Hello Tez, With all the respect, I doubt your answer is related to the question. *Just to re-phrase a bit*: Assuming we use SF for our application, how can we apply window logic when a function does some processing? *Is there a proper way?* @*Igal*: we would very much appreciate your answer. :) Bes
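
Absent a built-in window primitive, one pattern (a sketch only, against the Statefun 2.0 Java SDK; EventBuffer, WindowTrigger and the window length are all illustrative) is to buffer events in persisted state and use a delayed message to yourself as the window trigger:

    import java.time.Duration;
    import org.apache.flink.statefun.sdk.Context;
    import org.apache.flink.statefun.sdk.StatefulFunction;
    import org.apache.flink.statefun.sdk.annotations.Persisted;
    import org.apache.flink.statefun.sdk.state.PersistedValue;

    public class WindowedFn implements StatefulFunction {
        @Persisted
        private final PersistedValue<EventBuffer> buffer =
            PersistedValue.of("buffer", EventBuffer.class);

        @Override
        public void invoke(Context context, Object input) {
            if (input instanceof WindowTrigger) {
                // the delayed self-message fired: evaluate and clear the window
                evaluate(buffer.getOrDefault(new EventBuffer()));
                buffer.clear();
                return;
            }
            EventBuffer b = buffer.getOrDefault(new EventBuffer());
            if (b.isEmpty()) {
                // first event of this window: arm the trigger
                context.sendAfter(Duration.ofMinutes(5), context.self(), new WindowTrigger());
            }
            b.add(input);
            buffer.set(b);
        }

        private void evaluate(EventBuffer b) { /* window logic here */ }
    }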

Re: Rich Function Thread Safety

2020-05-06 Thread Joey Echeverria
I’ve seen a few mailing list posts (including this one) that say Flink guarantees there is no concurrent access to operator methods (e.g. flatMap, snapshotState, etc.) and thus synchronization isn’t needed when writing operators that support checkpointing. I was trying to find a place in the of
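
For what it's worth, the one place the locking model does surface in user-facing API is the legacy source interface, where user code has to take the checkpoint lock itself; a minimal sketch illustrating the guarantee in question:

    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    public class CountingSource implements SourceFunction<Long> {
        private volatile boolean running = true;
        private long counter = 0;

        @Override
        public void run(SourceContext<Long> ctx) throws Exception {
            while (running) {
                // emission and state updates happen under the checkpoint
                // lock, which is what serializes them against snapshotState
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(counter++);
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }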

Re: Window processing in Stateful Functions

2020-05-06 Thread Oytun Tez
I think this is also related to my question about CEP in Statefun. @Annemarie Burger , I was looking into using Siddhi library within the Function's context. -- Oytun Tez M O T A W O R D | CTO & Co-Founder oy...@motaword.com On Wed,

Window processing in Stateful Functions

2020-05-06 Thread Annemarie Burger
Hi, I want to do windowed processing in each function when using Stateful Functions. Is this possible? Some pseudo code would be very helpful! Some more context: I have a stream of edges as input. I want to window these and save the graph representation (either as edge list, adjacency list, o

Shared Checkpoint Cleanup and S3 Lifecycle Policy

2020-05-06 Thread Trystan
Hello! Recently we ran into an issue when checkpointing to S3. Because S3 rate-limits based on prefix, the /shared directory would get slammed and cause S3 throttling. There is no solution for this, because /job/checkpoint/:id/shared is all part of the prefix, and is limited to 3,500 PUT/COPY/POST/

Assertion failed: (last_ref), function ~ColumnFamilySet, file db/column_family.cc, line 1238

2020-05-06 Thread Luis Fernando Amaral
Hello, I'm looking for a way to modify state inside an operator in Flink. I'm following the State Processor API guide: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/libs/state_processor_api.html#modifying-savepoints
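
The guide linked above reads and rewrites state roughly like this (a sketch against the 1.9 State Processor API; the uid, paths, KeyedItem and the reader/bootstrapper functions are placeholders):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.state.api.BootstrapTransformation;
    import org.apache.flink.state.api.ExistingSavepoint;
    import org.apache.flink.state.api.OperatorTransformation;
    import org.apache.flink.state.api.Savepoint;

    ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();
    ExistingSavepoint savepoint =
        Savepoint.load(bEnv, "s3://bucket/savepoints/savepoint-xyz", new MemoryStateBackend());

    // read the keyed state of one operator
    DataSet<KeyedItem> state =
        savepoint.readKeyedState("my-operator-uid", new MyKeyedStateReader());

    // modify it, bootstrap a replacement operator, and write a new savepoint
    BootstrapTransformation<KeyedItem> fixed = OperatorTransformation
        .bootstrapWith(state.map(KeyedItem::repair))
        .keyBy(item -> item.key)
        .transform(new MyStateBootstrapper());

    savepoint
        .removeOperator("my-operator-uid")
        .withOperator("my-operator-uid", fixed)
        .write("s3://bucket/savepoints/savepoint-fixed");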

Re: Casting from org.apache.flink.api.java.tuple.Tuple2 to scala.Product; using java and scala in flink job

2020-05-06 Thread Aljoscha Krettek
No, I think that should be all right. On 06.05.20 16:57, Vishwas Siravara wrote: Thanks, I figured that would be the case. I'm using the Flink tuple type in the map functions, so there is no casting required now. Can you think of any downsides of using Flink tuples in Scala code, especially since

Re: Export user metrics with Flink Prometheus endpoint

2020-05-06 Thread Eleanore Jin
Hi Aljoscha, Thanks for the response. Yes, the Prometheus reporter is already enabled and I am able to get the Flink metrics from Prometheus. My question is more that currently I am using the OpenCensus library to collect application metrics, so
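
One way to surface application metrics through the same endpoint is to register them with Flink's own metric group, which every enabled reporter (including Prometheus) picks up; a minimal sketch (Event and the metric name are illustrative):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;

    public class CountingMapper extends RichMapFunction<Event, Event> {
        private transient Counter processed;

        @Override
        public void open(Configuration parameters) {
            // registered metrics are exported by every enabled reporter
            processed = getRuntimeContext().getMetricGroup().counter("eventsProcessed");
        }

        @Override
        public Event map(Event value) {
            processed.inc();
            return value;
        }
    }

A value collected through another library could similarly be exposed by registering a gauge on the same metric group.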

Re: Casting from org.apache.flink.api.java.tuple.Tuple2 to scala.Product; using java and scala in flink job

2020-05-06 Thread Aljoscha Krettek
Hi, Flink will not do any casting between types. You either need to output the correct (Scala) tuple type from the deserialization schema or insert a step (say a map function) that converts between the two types. The Tuple2 type and the Scala tuple type, i.e. (foo, bar), have nothing in common
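
As a concrete example of that conversion step (a Scala sketch; the field types are illustrative):

    // javaStream: DataStream[org.apache.flink.api.java.tuple.Tuple2[String, Long]]
    val scalaStream: DataStream[(String, Long)] =
      javaStream.map(t => (t.f0, t.f1))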

Re: MongoDB as a Sink;

2020-05-06 Thread Aljoscha Krettek
Hi, yes, that is correct. You need to implement a SinkFunction. For getting started you can take a look at the Elasticsearch connector because Elasticsearch and MongoDB are roughly similar in terms of how you work with them, i.e. they are both key-value stores. Best, Aljoscha On 06.05.20 02
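
A bare-bones shape of such a sink, as a sketch only (assumes the MongoDB Java sync driver; the connection string, database and collection names are placeholders):

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.bson.Document;

    public class MongoSink extends RichSinkFunction<Document> {
        private transient MongoClient client;
        private transient MongoCollection<Document> collection;

        @Override
        public void open(Configuration parameters) {
            client = MongoClients.create("mongodb://localhost:27017");
            collection = client.getDatabase("mydb").getCollection("events");
        }

        @Override
        public void invoke(Document value, Context context) {
            collection.insertOne(value);
        }

        @Override
        public void close() {
            if (client != null) {
                client.close();
            }
        }
    }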

Re: Export user metrics with Flink Prometheus endpoint

2020-05-06 Thread Aljoscha Krettek
Hi, that should be possible. Did you have a look at the documentation for setting up a Prometheus metrics reporter: https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter Best, Aljoscha On 06.05.20 03:02, El
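
The linked section boils down to a couple of flink-conf.yaml entries; a minimal sketch (the port range is a placeholder):

    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: 9250-9260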

Re: Autoscaling vs backpressure

2020-05-06 Thread Aljoscha Krettek
I'd say the two can't be considered equivalent, because the back pressure does not "reach" back into the source system. It only goes as far back as the Flink source. So if the outside system produces data too fast into the queue from which Flink is reading, this input would keep piling up. Best,

Re: Flink pipeline;

2020-05-06 Thread hemant singh
You will have to enrich the data coming in, e.g. { "equipment-id" : "1-234", "sensor-id" : "1-vcy", . } . Since you will most likely have a keyed stream based on equipment-id+sensor-id or equipment-id, you can have a control stream with data about the equipment-to-workshop/factory mapping, somethin
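
One way to wire that up is the broadcast state pattern; a sketch under the assumption that sensorStream and equipmentUpdates already exist (all type and field names are hypothetical):

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    MapStateDescriptor<String, EquipmentInfo> mappingDesc =
        new MapStateDescriptor<>("equipment-mapping", String.class, EquipmentInfo.class);

    BroadcastStream<EquipmentInfo> control = equipmentUpdates.broadcast(mappingDesc);

    sensorStream
        .keyBy(e -> e.equipmentId)
        .connect(control)
        .process(new KeyedBroadcastProcessFunction<String, SensorEvent, EquipmentInfo, EnrichedEvent>() {
            @Override
            public void processElement(SensorEvent event, ReadOnlyContext ctx,
                                       Collector<EnrichedEvent> out) throws Exception {
                // may be null until the first mapping update arrives
                EquipmentInfo info = ctx.getBroadcastState(mappingDesc).get(event.equipmentId);
                out.collect(new EnrichedEvent(event, info));
            }

            @Override
            public void processBroadcastElement(EquipmentInfo info, Context ctx,
                                                Collector<EnrichedEvent> out) throws Exception {
                ctx.getBroadcastState(mappingDesc).put(info.equipmentId, info);
            }
        });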

Re: checkpointing opening too many file

2020-05-06 Thread Congxian Qiu
Hi, yes, for your use case, if you do not have a large state size, you can try to use FsStateBackend. Best, Congxian ysnakie wrote on Mon, Apr 27, 2020 at 3:42 PM: > Hi > If I use FsStateBackend instead of RocksdbFsStateBackend, will the open > files decrease significantly? I dont have large state size. > > tha
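
Switching is a one-liner on the execution environment (a sketch; the checkpoint path is a placeholder):

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // working state stays on the heap; only checkpoint data goes to the
    // filesystem, so far fewer files are held open than with RocksDB
    env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));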

Re: Flink: For terabytes of keyed state.

2020-05-06 Thread Congxian Qiu
Hi Gowri, please let us know if you run into any problems~ Best, Congxian Gowri Sundaram wrote on Wed, May 6, 2020 at 1:53 PM: > Hi Congxian, > Thank you so much for your response! We will go ahead and do a POC to test > out how Flink performs at scale. > > Regards, > - Gowri > > On Wed, May 6, 2020 at 8:34 AM Co

Re: What is the RocksDB local directory in flink checkpointing?

2020-05-06 Thread Till Rohrmann
Hi LakeShen, `state.backend.rocksdb.localdir` defines the directory in which RocksDB will store its local files. Local files are RocksDB's SST and metadata files, for example. This directory does not need to be persisted. If the config option is not configured, then it will use the node's temporary
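
For reference, in flink-conf.yaml (the path is a placeholder; it should point at a local, ideally fast, disk on each TaskManager node):

    state.backend.rocksdb.localdir: /mnt/fast-local-disk/flink/rocksdb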

Re: upgrade flink from 1.9.1 to 1.10.0 on EMR

2020-05-06 Thread Yang Wang
Hi aj, from the logs you have provided, the hadoop version is still 2.4.1. Could you check whether the user jar (i.e. events-processor-1.0-SNAPSHOT.jar) contains some hadoop classes? If it does, you need to exclude the hadoop dependency. Best, Yang aj wrote on Wed, May 6, 2020 at 3:38 PM: > Hello, > > Please help me upgr
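
If the hadoop classes are indeed coming in through the user jar, a Maven exclusion along these lines keeps them out of the fat jar (a sketch; the offending dependency is hypothetical):

    <dependency>
      <groupId>com.example</groupId>
      <artifactId>some-library</artifactId>
      <version>1.0</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>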

Re: MongoDB sink;

2020-05-06 Thread Jingsong Li
Hi, my impression is that MongoDB's API is not complicated, so you can implement a MongoDB sink. Something like:

    @Override
    public void invoke(Row value, Context context) throws Exception {
        Map map = new HashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            map.put(fieldNames

Re: upgrade flink from 1.9.1 to 1.10.0 on EMR

2020-05-06 Thread aj
Hello, please help me upgrade to 1.10 in AWS EMR. On Fri, May 1, 2020 at 4:05 PM aj wrote: > Hi Yang, > > I am attaching the logs for your reference; please help me understand what I am doing > wrong. > > Thanks, > Anuj > > On Wed, Apr 29, 2020 at 9:06 AM Yang Wang wrote: > >> Hi Anuj, >> >> I think the