Re: Best way to compute the difference between 2 datasets

2019-09-20 Thread Fabian Hueske
Hi Juan, Both the local execution environment and the remote execution environment run the same code to execute the program. The implementation of the sortPartition operator was designed to scale to data sizes that exceed the memory. Internally, it serializes all records into byte arrays and sort

Re: Add Bucket File System Table Sink

2019-09-20 Thread Fabian Hueske
Hi Jun, Thank you very much for your contribution. I think a Bucketing File System Table Sink would be a great addition. Our code contribution guidelines [1] recommend discussing the design with the community before opening a PR. First of all, this ensures that the design is aligned with Flink's

Re: Window metadata removal

2019-09-20 Thread Fabian Hueske
at least 64 bytes. > If we have 200,000,000 per day and the allowed lateness is > set to 7 days: > 200,000,000 * 64 * 7 = ~83GB > > *For the scenario above the window metadata is useless*. > Is there a possibility to *keep using window API*, *set allowed lateness* > and *not

Re: changing flink/kafka configs for stateful flink streaming applications

2019-09-20 Thread Fabian Hueske
Hi, It depends. There are many things that can be changed. A savepoint in Flink contains only the state of the application and not the configuration of the system. So an application can be migrated to another cluster that runs with a different configuration. There are some exceptions like the con

Re: Best way to compute the difference between 2 datasets

2019-09-20 Thread Fabian Hueske
Btw. there is a set difference or minus operator in the Table API [1] that might be helpful. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/tableApi.html#set-operations Am Fr., 20. Sept. 2019 um 15:30 Uhr schrieb Fabian Hueske : > Hi Juan, > > Both,
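For reference, a minimal sketch of the minus operator mentioned above; the example data and field name are made up for illustration:

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;

public class SetDifferenceExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

        // Two example tables, each with a single String column named "word".
        Table a = tEnv.fromDataSet(env.fromElements("a", "b", "c"), "word");
        Table b = tEnv.fromDataSet(env.fromElements("b", "c"), "word");

        // minus() returns the records of A that do not exist in B (duplicates removed).
        Table diff = a.minus(b);

        tEnv.toDataSet(diff, String.class).print();
    }
}
```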

Re: Time Window Flink SQL join

2019-09-20 Thread Fabian Hueske
rty(ProducerConfig.RETRIES_CONFIG, "3"); > > ObjectMapper mapper = new ObjectMapper(); > DataStream sinkStreamMaliciousData = outStreamMalicious > .map(new MapFunction,String>() { > private static final long serialVersionUID = -6347120202L; > @Override > public St

Re: How to use thin JAR instead of fat JAR when submitting Flink job?

2019-09-24 Thread Fabian Hueske
Hi, To expand on Dian's answer. You should not add Flink's core libraries (APIs, core, runtime, etc.) to your fat JAR. However, connector dependencies (like Kafka, Cassandra, etc.) should be added. If all your jobs require the same dependencies, you can also add JAR files to the ./lib folder of y

Re: Question about reading ORC file in Flink

2019-09-24 Thread Fabian Hueske
Hi QiShu, It might be that Flink's OrcInputFormat has a bug. Can you open a Jira issue to report the problem? In order to be able to fix this, we need as much information as possible. It would be great if you could create a minimal example of an ORC file and a program that reproduces the issue. If

Re: Approach to match join streams to create unique streams.

2019-09-24 Thread Fabian Hueske
Hi, AFAIK, Flink SQL Temporal table function joins are only supported as inner equality joins. An extension to left outer joins would be great, but is not on the immediate roadmap AFAIK. If you need the inverse, I'd recommend implementing the logic in a DataStream program with a KeyedCoProcessFun

Re: How do I create a temporal function using Flink Client SQL?

2019-09-24 Thread Fabian Hueske
Hi, It's not possible to create a temporal table function from SQL, but you can define it in the config.yaml of the SQL client as described in the documentation [1]. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#temporal-tables Am Di., 24.

Re: Question about reading ORC file in Flink

2019-09-25 Thread Fabian Hueske
atch the column, so it can read the fields. > > Thanks for your help! > > Qi Shu > > > On Sep 24, 2019, at 4:36 PM, Fabian Hueske wrote: > > Hi QiShu, > > It might be that Flink's OrcInputFormat has a bug. > Can you open a Jira issue to report the problem? > In order

Re: Joins Usage Clarification

2019-09-25 Thread Fabian Hueske
Hi Nishant, To answer your questions: 1) yes, the SQL time-windowed join and the DataStream API Interval Join are the same (with different implementations though) 2) DataStream Session-window joins are not directly supported in SQL. You can play some tricks to make it work, but it wouldn't be eleg

Re: Flink job manager doesn't remove stale checkmarks

2019-09-25 Thread Fabian Hueske
Hi, You enabled incremental checkpoints. This means that parts of older checkpoints that did not change since the last checkpoint are not removed because they are still referenced by the incremental checkpoints. Flink will automatically remove them once they are not needed anymore. Are you sure t

Re: Flink- Heap Space running out

2019-09-26 Thread Fabian Hueske
Hi, I don't think that the memory configuration is the issue. The problem is the join query. The join does not have any temporal boundaries. Therefore, both tables are completely stored in memory and never released. You can configure a memory eviction strategy via idle state retention [1] but you
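A minimal sketch of the idle state retention configuration referenced in [1], assuming the Flink 1.9-era query config API (class and method names as documented for that version):

```java
import org.apache.flink.api.common.time.Time;
import org.apache.flink.table.api.StreamQueryConfig;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class IdleStateRetentionExample {
    public static void configure(StreamTableEnvironment tEnv, Table joined) {
        StreamQueryConfig qConfig = tEnv.queryConfig();
        // State of keys that were idle for more than 12 hours may be cleaned up;
        // it is kept for at least 12 and at most 24 hours.
        qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24));

        // The query config is passed when the Table is converted into a DataStream.
        tEnv.toRetractStream(joined, Row.class, qConfig);
    }
}
```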

Re: Broadcast state

2019-10-02 Thread Fabian Hueske
Hi, State is always associated with a single task in Flink. The state of a task cannot be accessed by other tasks of the same operator or tasks of other operators. This is true for every type of state, including broadcast state. Best, Fabian Am Di., 1. Okt. 2019 um 08:22 Uhr schrieb Navneeth Kr

Re: Increasing trend for state size of keyed stream using ProcessWindowFunction with ProcessingTimeSessionWindows

2019-10-02 Thread Fabian Hueske
Hi Oliwer, I think you are right. There seems to be something going wrong. Just to clarify, you are sure that the growing state size is caused by the window operator? From your description I assume that the state size does not depend (solely) on the number of distinct keys. Otherwise, the state

Re: Fencing token exceptions from Job Manager High Availability mode

2019-10-02 Thread Fabian Hueske
Hi Bruce, I haven't seen such an exception yet, but maybe Till (in CC) can help. Best, Fabian Am Di., 1. Okt. 2019 um 05:51 Uhr schrieb Hanson, Bruce < bruce.han...@here.com>: > Hi all, > > > > We are running some of our Flink jobs with Job Manager High Availability. > Occasionally we get a clu

Re: Increasing number of task slots in the task manager

2019-10-02 Thread Fabian Hueske
Hi Vishwas, First of all, 8 GB for 60 cores is not a lot. You might not be able to utilize all cores when running Flink. However, the memory usage depends on several things. Assuming you are using Flink for stream processing, the type of the state backend is important. If you use the FSStateBack

Re: [Problem] Unable to do join on TumblingEventTimeWindows using SQL

2019-10-25 Thread Fabian Hueske
Hi, the exception says: "Rowtime attributes must not be in the input rows of a regular join. As a workaround you can cast the time attributes of input tables to TIMESTAMP before.". The problem is that your query first joins the two tables without a temporal condition and then wants to do a window
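A small sketch of the workaround suggested by the error message; the table and column names below are made up:

```java
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class RegularJoinWorkaround {
    // Cast the rowtime attribute to TIMESTAMP before the non-temporal join,
    // so the join result no longer carries a time attribute.
    public static Table join(StreamTableEnvironment tableEnv) {
        return tableEnv.sqlQuery(
            "SELECT o.id, o.amount, CAST(o.rowtime AS TIMESTAMP) AS order_ts " +
            "FROM Orders o JOIN Customers c ON o.customer_id = c.id");
    }
}
```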

Re: Issue with writeAsText() to S3 bucket

2019-10-25 Thread Fabian Hueske
Hi Michael, One reason might be that S3's file listing command is only eventually consistent. It might take some time until the file appears and is listed. Best, Fabian Am Mi., 23. Okt. 2019 um 22:41 Uhr schrieb Nguyen, Michael < michael.nguye...@t-mobile.com>: > Hello all, > > > > I am running

Re: Flink 1.9 measuring time taken by each operator in DataStream API

2019-10-25 Thread Fabian Hueske
Hi Komal, Measuring latency is always a challenge. The problem here is that your functions are chained, meaning that the result of a function is directly passed on to the next function and only when the last function emits the result, the first function is called with a new record. This makes meas

Re: JDBCInputFormat does not support json type

2019-10-25 Thread Fabian Hueske
Hi Fanbin, One approach would be to ingest the field as a VARCHAR / String and implement a Scalar UDF to convert it into a nested tuple. The UDF could use the code of the flink-json module. AFAIK, there is some work underway to add built-in JSON functions. Best, Fabian Am Do., 24. Okt. 2019 u
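A minimal sketch of such a scalar UDF, assuming Jackson for the JSON parsing; the function name and the registration shown in the trailing comments are illustrative only:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.ScalarFunction;

public class JsonField extends ScalarFunction {

    private transient ObjectMapper mapper;

    @Override
    public void open(FunctionContext context) {
        mapper = new ObjectMapper();
    }

    // Returns the value of a top-level field, or null if the field is missing
    // or the input is not valid JSON.
    public String eval(String json, String field) {
        try {
            JsonNode node = mapper.readTree(json).get(field);
            return node == null ? null : node.asText();
        } catch (Exception e) {
            return null;
        }
    }
}

// Registration and use (Flink 1.9-style API, names are hypothetical):
//   tableEnv.registerFunction("jsonField", new JsonField());
//   tableEnv.sqlQuery("SELECT jsonField(payload, 'user') FROM events");
```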

Re: Can a Flink query outputs nested json?

2019-10-25 Thread Fabian Hueske
Hi, I did not understand what you are trying to achieve. Which field of the input table do you want to write to the output table? Flink SQL> insert into nestedSink select nested from nestedJsonStream; [INFO] Submitting SQL update statement to the cluster... [ERROR] Could not execute SQL statement

Re: Using STSAssumeRoleSessionCredentialsProvider for cross account access

2019-10-25 Thread Fabian Hueske
Hi Vinay, Maybe Gordon (in CC) has an idea about this issue. Best, Fabian Am Do., 24. Okt. 2019 um 14:50 Uhr schrieb Vinay Patil < vinay18.pa...@gmail.com>: > Hi, > > Can someone pls help here , facing issues in Prod . I see the following > ticket in unresolved state. > > https://issues.apache.

Re: Flink 1.5+ performance in a Java standalone environment

2019-10-25 Thread Fabian Hueske
Hi Jakub, I had a look at the changes of Flink 1.5 [1] and didn't find anything obvious. Something that might cause a different behavior is the new deployment and process model (FLIP-6). In Flink 1.5, there is a switch to disable it and use the previous deployment mechanism. You could try to disa

Re: Guarantee of event-time order in FlinkKafkaConsumer

2019-10-25 Thread Fabian Hueske
Hi Wojciech, I posted an answer on StackOverflow. Best, Fabian Am Do., 24. Okt. 2019 um 13:03 Uhr schrieb Wojciech Indyk < wojciechin...@gmail.com>: > Hi! > I use Flink 1.8.0 with Kafka 2.2.1. I need to guarantee of correct order > of events by event timestamp. I generate periodic watermarks ev

Re: Are Dynamic tables backed by rocksdb?

2019-10-31 Thread Fabian Hueske
Hi, Dynamic tables might not be persisted at all but only when it is necessary for the computation of a query. For example, a simple "SELECT * FROM t WHERE a = 1" query on an append only table t does not require persisting t. However, there are a bunch of operations that require storing some part

Re: Issue with writeAsText() to S3 bucket

2019-11-06 Thread Fabian Hueske
tream with various filters applied to it. I usually see > around 6-7 of my datastreams successfully list the JSON file in my S3 > bucket upon cancelling my Flink job. > > > > Even in my situation, would this still be an issue with S3’s file listing > command? > > > &g

Flink Forward North America 2020 - Call for Presentations open until January 12th, 2020

2019-11-20 Thread Fabian Hueske
Hi all, Flink Forward North America returns to San Francisco on March 23-25, 2020. For the first time in North America, the conference will feature two days of talks and one day of training. We are happy to announce that the Call for Presentations is open! If you'd like to give a talk and share

Re: Row arity of from does not match serializers.

2019-12-06 Thread Fabian Hueske
Hi, The inline lambda MapFunction produces a Row with 12 String fields (12 calls to String.join()). You use RowTypeInfo rowTypeDNS to declare the return type of the lambda MapFunction. However, rowTypeDNS is defined with many more String fields. The exception tells you that the number of fields r
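To illustrate the arity rule, a minimal sketch with a hypothetical field layout: the number of fields in the RowTypeInfo passed via returns() must equal the number of fields of every Row the function emits.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.types.Row;

public class RowArityExample {
    public static DataStream<Row> toRows(DataStream<String> lines) {
        // Declares a Row type with exactly 3 fields.
        RowTypeInfo rowType = new RowTypeInfo(Types.STRING, Types.STRING, Types.INT);
        return lines
            .map(line -> {
                String[] parts = line.split(",");
                // Row.of() creates a Row with 3 fields -- this must match rowType above.
                return Row.of(parts[0], parts[1], Integer.parseInt(parts[2]));
            })
            .returns(rowType);
    }
}
```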

Re: Joining multiple temporal tables

2019-12-06 Thread Fabian Hueske
Hi Chris, Your query looks OK to me. Moreover, you should get a SQLParseException (or something similar) if it wouldn't be valid SQL. Hence, I assume you are running in a bug in one of the optimizer rules. I tried to reproduce the problem on the SQL training environment and couldn't write a query

Re: Joining multiple temporal tables

2019-12-06 Thread Fabian Hueske
you suggested, > https://issues.apache.org/jira/browse/FLINK-15112 > > Many thanks, > Chris > > > ------ Original Message -- > From: "Fabian Hueske" > To: "Chris Miller" > Cc: "user@flink.apache.org" > Sent: 06/12/2019 14:52:16 > S

Re: [ANNOUNCE] Zhu Zhu becomes a Flink committer

2019-12-13 Thread Fabian Hueske
Congrats Zhu Zhu and welcome on board! Best, Fabian Am Fr., 13. Dez. 2019 um 17:51 Uhr schrieb Till Rohrmann < trohrm...@apache.org>: > Hi everyone, > > I'm very happy to announce that Zhu Zhu accepted the offer of the Flink PMC > to become a committer of the Flink project. > > Zhu Zhu has been

[ANNOUNCE] Flink Forward SF Call for Presentation closing soon!

2020-01-06 Thread Fabian Hueske
Hi all, First of all, Happy New Year to everyone! Many of you probably didn't spend the holidays thinking a lot about Flink. Now, however, is the right time to focus again and decide which talk(s) to submit for Flink Forward San Francisco because the Call for Presentations is closing this Sunday,

[ANNOUNCE] Flink Forward San Francisco 2020 Call for Presentation extended!

2020-01-13 Thread Fabian Hueske
Hi everyone, We know some of you only came back from holidays last week. To give you more time to submit a talk, we decided to extend the Call for Presentations for Flink Forward San Francisco 2020 until Sunday January 19th. The conference takes place on March 23-25 with two days of talks and one

Re: Why would indefinitely growing state an issue for Flink while doing stream to stream joins?

2020-01-17 Thread Fabian Hueske
Hi, Large state is mainly an issue for Flink's fault tolerance mechanism, which is based on periodic checkpoints: the state is copied to a remote storage system at regular intervals. In case of a failure, the state copy needs to be loaded, which takes more time with growing state si

Re: Filter with large key set

2020-01-17 Thread Fabian Hueske
Hi Eleanore, A dynamic filter like the one you need is essentially a join operation. There are two ways to do this: * partitioning the key set and the messages on the attribute. This would be done with a KeyedCoProcessFunction. * broadcasting the key set and just locally forwarding the messages. T
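A minimal sketch of the first (partitioning) approach, with all names and types made up: both streams are keyed on the filter attribute and a KeyedCoProcessFunction keeps a per-key flag.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class KeyedFilter extends KeyedCoProcessFunction<String, String, String, String> {

    private transient ValueState<Boolean> allowed;

    @Override
    public void open(Configuration parameters) {
        allowed = getRuntimeContext().getState(
            new ValueStateDescriptor<>("allowed", Boolean.class));
    }

    // Stream 1: updates of the key set ("this key is allowed").
    @Override
    public void processElement1(String key, Context ctx, Collector<String> out) throws Exception {
        allowed.update(true);
    }

    // Stream 2: messages; forward only if the current key is in the key set.
    @Override
    public void processElement2(String message, Context ctx, Collector<String> out) throws Exception {
        if (Boolean.TRUE.equals(allowed.value())) {
            out.collect(message);
        }
    }
}

// Hypothetical usage: keyUpdates.connect(messages)
//     .keyBy(k -> k, m -> extractKey(m))
//     .process(new KeyedFilter());
```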

Re: How to declare the Row object schema

2020-01-17 Thread Fabian Hueske
Hi, Which version are you using? I can't find the error message in the current code base. When writing data to a JDBC database, all Flink types must be correctly matched to a JDBC type. The problem is probably that Flink cannot match the 8th field of your Row to a JDBC type. What's the type of th

Re: PostgreSQL JDBC connection drops after inserting some records

2020-01-28 Thread Fabian Hueske
Hi, The exception is thrown by Postgres. I'd start investigating there what the problem is. Maybe you need to tweak your Postgres configuration, but it might also be that the Flink connector needs to be differently configured. If the necessary config option is missing, it would be good to add. H

[ANNOUNCE] Community Discounts for Flink Forward SF 2020 Registrations

2020-01-30 Thread Fabian Hueske
Hi everyone, The registration for Flink Forward SF 2020 is open now! Flink Forward San Francisco 2020 will take place from March 23rd to 25th. The conference will start with one day of training and continue with two days of keynotes and talks. We would like to invite you to join the Apache Flink

Re: Flink solution for having shared variable between task managers

2020-02-03 Thread Fabian Hueske
Hi, I think you are looking for BroadcastState [1]. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html Am Fr., 17. Jan. 2020 um 14:50 Uhr schrieb Soheil Pourbafrani < soheil.i...@gmail.com>: > Hi, > > According to the processing logic,
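A minimal sketch of broadcast state along the lines of [1]; the rule and event types are plain Strings here and the matching logic is made up.

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastStateExample {

    static final MapStateDescriptor<String, String> RULES =
        new MapStateDescriptor<>("rules", Types.STRING, Types.STRING);

    public static DataStream<String> apply(DataStream<String> events, DataStream<String> rules) {
        // The rule stream is replicated to every parallel instance of the downstream operator.
        BroadcastStream<String> broadcastRules = rules.broadcast(RULES);

        return events.connect(broadcastRules).process(
            new BroadcastProcessFunction<String, String, String>() {

                @Override
                public void processElement(String event, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                    // Read-only access to the broadcast state on the event side.
                    if (ctx.getBroadcastState(RULES).contains(event)) {
                        out.collect(event);
                    }
                }

                @Override
                public void processBroadcastElement(String rule, Context ctx, Collector<String> out) throws Exception {
                    // Only the broadcast side may update the state.
                    ctx.getBroadcastState(RULES).put(rule, rule);
                }
            });
    }
}
```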

Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Fabian Hueske
Congrats team and a big thank you to the release managers! Am Mi., 12. Feb. 2020 um 16:33 Uhr schrieb Timo Walther : > Congratulations everyone! Great stuff :-) > > Regards, > Timo > > > On 12.02.20 16:05, Leonard Xu wrote: > > Great news! > > Thanks everyone involved ! > > Thanks Gary and Yu fo

[ANNOUNCE] Flink Forward San Francisco 2020 Program is Live

2020-02-14 Thread Fabian Hueske
Hi everyone, We announced the program of Flink Forward San Francisco 2020. The conference takes place at the Hyatt Regency in San Francisco from March 23rd to 25th. On the first day we offer four training sessions [1]: * Apache Flink Developer Training * Apache Flink Runtime & Operations Training

Re: [ANNOUNCE] Flink Forward San Francisco 2020 Program is Live

2020-02-14 Thread Fabian Hueske
Fr., 14. Feb. 2020 um 17:48 Uhr schrieb Fabian Hueske : > Hi everyone, > > We announced the program of Flink Forward San Francisco 2020. > The conference takes place at the Hyatt Regency in San Francisco from > March 23rd to 25th. > > On the first day we offer four

Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-21 Thread Fabian Hueske
Congrats Jingsong! Cheers, Fabian Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong : > Congratulations Jingsong!! > > Cheers, > Rong > > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li wrote: > > > Congrats, Jingsong! > > > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann > > wrote: > > > >> Congr

Re: Late data before window end is even close

2018-06-08 Thread Fabian Hueske
the actual timestamps of their input data. For me it was helpful to make > this change in my Flink job: for late data output, include both processing > time (DateTime.now()) along with the event time (original timestamp). > > On Mon, May 14, 2018 at 12:42 PM, Fabian Hueske wrote: >

Re: Flink multiple windows

2018-06-10 Thread Fabian Hueske
Hi Antonio, Cascading window aggregations as done in your example are a good idea and are preferable if the aggregation function is combinable, which is true for sum (count can be done as sum of 1s). Best, Fabian 2018-06-09 4:00 GMT+02:00 antonio saldivar : > Hello > > Has anyone worked this way? I
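A minimal sketch of such a cascade under assumed Tuple2<key, value> records: the 1-minute per-key sums are combinable, so they can be summed again into 5-minute results.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class CascadingWindows {
    public static DataStream<Tuple2<String, Long>> aggregate(DataStream<Tuple2<String, Long>> input) {
        // First level: per-key sums over 1-minute windows (f0 = key, f1 = value).
        DataStream<Tuple2<String, Long>> perMinute = input
            .keyBy(0)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .sum(1);

        // Second level: the 1-minute partial sums are combined into 5-minute sums.
        return perMinute
            .keyBy(0)
            .window(TumblingEventTimeWindows.of(Time.minutes(5)))
            .sum(1);
    }
}
```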

Re: How to submit two Flink jobs to the same Flink cluster?

2018-06-12 Thread Fabian Hueske
Hi Angelica, The Flink cluster needs to provide a sufficient number of slots to process the tasks of all submitted jobs. Besides that, there is no limit. However, if you run a very large number of jobs, you might need to tune a few configuration parameters. Best, Fabian 2018-06-12 8:46 GMT+02:00 Sampath Bhat

Re: IoT Use Case, Problem and Thoughts

2018-06-14 Thread Fabian Hueske
ike the issue is caused by the fact > that Memory states are large as it is throwing error states are larger than > certain size. So solution of (1) will possibly solve (2) as well. > > Thanks again, > > Ashish > > > On Jun 7, 2018, at 4:25 PM, Fabian Hueske wrote: >

[ANNOUNCE] Registration for Flink Forward Berlin is open

2018-06-14 Thread Fabian Hueske
Hi everyone, *Flink Forward Berlin 2018 will take place from September 3rd to 5th.* The conference will start with one day of training and continue with two days of keynotes and talks. *The registration for Flink Forward Berlin 2018 is now open!* A limited amount of early-bird passes is available

Re: Restore state from save point with add new flink sql

2018-06-15 Thread Fabian Hueske
Hi, At the moment (Flink 1.5.0), the operator UIDs depend on the overall application and not only on the query. Hence, changing the application by adding another query might change the existing UIDs. In general, you can only expect savepoint restarts to work if you don't change the application an

Re: Flink application does not scale as expected, please help!

2018-06-18 Thread Fabian Hueske
Hi, Which Flink version are you using? Did you try to analyze the bottleneck of the application, i.e., is it CPU, disk IO, or network bound? Regarding the task scheduling. AFAIK, before version 1.5.0, Flink tried to schedule tasks on the same machine to reduce the amount of network transfer. Henc

Re: Stream Join With Early firings

2018-06-18 Thread Fabian Hueske
Hi Johannes, EventTimeSessionWindows [1] use the EventTimeTrigger [2] as default trigger (see EventTimeSessionWindows.getDefaultTrigger()). I would take the EventTimeTrigger and extend it with early firing functionality. However, there are a few things to consider * you need to be aware that sess

Re: Flink application does not scale as expected, please help!

2018-06-18 Thread Fabian Hueske
fectly 1-2-4-8-16 because all happens in same TM. When > scale to 32 the performance drop, not even in par with case of parallelism > 16. Is this something expected? Thank you. > > Regards, > Yow > > -- > *From:* Fabian Hueske > *Sent:* Mon

Re: Flink application does not scale as expected, please help!

2018-06-19 Thread Fabian Hueske
h > more TM in play. > > @Ovidiu question is interesting to know too. @Till do you mind to share > your thoughts? > > Thank you guys! > > -- > *From:* Ovidiu-Cristian MARCU > *Sent:* Monday, June 18, 2018 6:28 PM > *To:* Fabian Hueske >

Re: # of active session windows of a streaming job

2018-06-19 Thread Fabian Hueske
intain an operator state inside a trigger. > TriggerContext only allows to interact with state that is scoped to the > window and the key of the current trigger invocation (as shown in > Trigger#TriggerContext) > > Now I've come to a conclusion that it might not be possible using

Re: Stream Join With Early firings

2018-06-19 Thread Fabian Hueske
approach wasn't driven by the requirements but by operational > aspects (state size), so using a concept like idle state retention time > would be a more natural fit. > > Thanks, > > Johannes > > On Mon, Jun 18, 2018 at 9:57 AM Fabian Hueske wrote: > >> Hi

Re: DataStreamCalcRule$1802" grows beyond 64 KB when execute long sql.

2018-06-19 Thread Fabian Hueske
Hi, Which version are you using? We fixed a similar issue for Flink 1.5.0. If you can't upgrade yet, you can also implement a user-defined function that evaluates the big CASE WHEN statement. Best, Fabian 2018-06-19 16:27 GMT+02:00 zhangminglei <18717838...@163.com>: > Hi, friends. > > When I e

Re: DataStreamCalcRule$1802" grows beyond 64 KB when execute long sql.

2018-06-19 Thread Fabian Hueske
nt. Is it hard to implement? I am new to the Flink Table API & SQL. > > Best Minglei. > > On Jun 19, 2018, at 10:36 PM, Fabian Hueske wrote: > > Hi, > > Which version are you using? We fixed a similar issue for Flink 1.5.0. > If you can't upgrade yet, you can also implement

Re: # of active session windows of a streaming job

2018-06-20 Thread Fabian Hueske
0, we'll > see an incorrect value from a dashboard. > This is the biggest concern of mine at this point. > > Best, > > - Dongwon > > > On Tue, Jun 19, 2018 at 7:14 PM, Fabian Hueske wrote: > >> Hi Dongwon, >> >> Do you need to n

Re: Passing records between two jobs

2018-06-20 Thread Fabian Hueske
Hi Avihai, Rafi pointed out the two common approaches to deal with this situation. Let me expand a bit on those. 1) Transactional producing into queues: There are two approaches to accomplish exactly-once producing into queues, 1) using a system with transactional support such as Kafka or 2) mai
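As an illustration of option 1, a sketch using the Kafka 0.11 producer of that time, which supported transactional (exactly-once) writes; broker address and topic name are placeholders.

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class ExactlyOnceKafkaSink {
    public static FlinkKafkaProducer011<String> create() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");

        // With Semantic.EXACTLY_ONCE, records are written within Kafka transactions
        // that are committed when a Flink checkpoint completes (checkpointing must
        // be enabled in the job).
        return new FlinkKafkaProducer011<>(
            "handover-topic",
            new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
            props,
            FlinkKafkaProducer011.Semantic.EXACTLY_ONCE);
    }
}
```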

Re: A question about Kryo and Window State

2018-06-21 Thread Fabian Hueske
Hi Vishal, In general, Kryo serializers are not very upgrade-friendly. Serializer compatibility [1] might be the right approach here, but Gordon (in CC) might know more about this. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/custom_serialization.html

Re: Control insert database with dataset

2018-06-21 Thread Fabian Hueske
Hi Dulce, This functionality is not supported by the JDBCOutputFormat. Some database systems (AFAIK, MySQL) support Upsert writes, i.e., writes that insert if the primary key is not present or update the row if the PK exists. Not sure if that would meet your requirements. If you don't want to go

Re: Flink 1.5 Yarn Connection unexpectedly closed

2018-06-21 Thread Fabian Hueske
Hi Garrett, I agree, there seems to be an issue and increasing the timeout should not be the right approach to solve it. Are you running streaming or batch jobs, i.e., do some of the tasks finish much earlier than others? I'm adding Till to this thread who's very familiar with scheduling and proc

Re: Debug job execution from savepoint

2018-06-21 Thread Fabian Hueske
Hi Manuel, I had a look and couldn't find a way to do it. However, this sounds like a very useful feature to me. Would you mind creating a Jira issue [1] for that? Thanks, Fabian [1] https://issues.apache.org/jira/projects/FLINK 2018-06-18 16:23 GMT+02:00 Haddadi Manuel : > Hi all, > > > I wo

Re: # of active session windows of a streaming job

2018-06-21 Thread Fabian Hueske
re right. It is Trigger.clear(), not > Trigger.onClose(). > > Best, > - Dongwon > > > On Wed, Jun 20, 2018 at 7:30 PM, Chesnay Schepler > wrote: > >> Checkpointing of metrics is a manual process. >> The operator must write the current value into state, retrie

Re: How to use broadcast variables in data stream

2018-06-21 Thread Fabian Hueske
Hi, if the list is static and not too large, you can pass it as a parameter to the function. Function objects are serialized (using Java's default serialization) and shipped to the workers for execution. If the data is dynamic, you might want to have a look at Broadcast state [1]. Best, Fabian
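A minimal sketch of the first option (a small, static list passed via the function's constructor, serialized with the function object and shipped to the workers); the function and field names are made up.

```java
import java.util.List;
import org.apache.flink.api.common.functions.FilterFunction;

public class WhitelistFilter implements FilterFunction<String> {

    // Must be serializable and reasonably small; it travels with the function object.
    private final List<String> whitelist;

    public WhitelistFilter(List<String> whitelist) {
        this.whitelist = whitelist;
    }

    @Override
    public boolean filter(String value) {
        return whitelist.contains(value);
    }
}

// Hypothetical usage: stream.filter(new WhitelistFilter(Arrays.asList("a", "b", "c")));
```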

Re: How to use broadcast variables in data stream

2018-06-21 Thread Fabian Hueske
state seems not supported in Flink-1.3 . > I found this in Flink-1.3: > Broadcasting > DataStream → DataStream > > Broadcasts elements to every partition. > > dataStream.broadcast(); > > But I don’t know how to convert it to list and get it in stream context . > > 在 20

Re: SQL Do Not Support Custom Trigger

2018-06-22 Thread Fabian Hueske
Hi, Although this solution looks straight-forward, custom triggers cannot be added that easily. The problem is that a window operator with a Trigger that emit early results produces updates, i.e., results that have been emitted might be updated later. The default Trigger only emits the final resu

Re: Strictly use TLSv1.2

2018-06-22 Thread Fabian Hueske
Hi Vinay, This looks like a bug. Would you mind creating a Jira ticket [1] for this issue? Thank you very much, Fabian [1] https://issues.apache.org/jira/projects/FLINK 2018-06-21 9:25 GMT+02:00 Vinay Patil : > Hi, > > I have deployed Flink 1.3.2 and enabled SSL settings. From the ssl debug >

Re: Strictly use TLSv1.2

2018-06-22 Thread Fabian Hueske
Great, thank you! 2018-06-22 10:16 GMT+02:00 Vinay Patil : > Hi Fabian, > > Created a JIRA ticket : https://issues.apache.org/jira/browse/FLINK-9643 > > Regards, > Vinay Patil > > > On Fri, Jun 22, 2018 at 1:25 PM Fabian Hueske wrote: > >> Hi Vinay, >&

Re: Custom Watermarks with Flink

2018-06-25 Thread Fabian Hueske
Hi, I would not encode this information in watermarks. Watermarks are rather an internal mechanism to reason about event-time. Flink also generates watermarks internally. This makes the behavior less predictable. You could either inject special meta data records (which Flink handles just like othe

Re: How to partition within same physical node in Flink

2018-06-25 Thread Fabian Hueske
Hi, Flink distributes task instances to slots and does not expose physical machines. Records are partitioned to task instances by hash partitioning. It is also not possible to guarantee that the records in two different operators are send to the same slot. Sharing information by side-passing it (e

Re: Few question about upgrade from 1.4 to 1.5 flink ( some very basic )

2018-06-25 Thread Fabian Hueske
Hi Vishal, 1. I don't think a rolling update is possible. Flink 1.5.0 changed the process orchestration and how they communicate. IMO, the way to go is to start a Flink 1.5.0 cluster, take a savepoint on the running job, start from the savepoint on the new cluster and shut the old job down. 2. Sav

Re: How to partition within same physical node in Flink

2018-06-26 Thread Fabian Hueske
different >> slots/threads on the same Task Manager instance(aka cam1 partition) using >> keyBy(seq#) & setParallelism() ? Can *forward* Strategy be used to >> achieve this ? >> >> TIA >> >> >> On Mon, Jun 25, 2018 at 1:03 AM Fabian Hueske wrote:

Re: Measure Latency from source to sink

2018-06-26 Thread Fabian Hueske
Hi, Measuring latency is tricky and you have to be careful about what you measure. Aggregations like window operators make things even more difficult because you need to decide which timestamp(s) to forward (smallest?, largest?, all?) Depending on the operation, the measurement code might even add

Re: [Flink-9407] Question about proposed ORC Sink !

2018-06-27 Thread Fabian Hueske
Hi Sagar, That's more a question for the ORC community, but AFAIK, the top-level type is always a struct because it needs to wrap the fields, e.g., struct(name:string, age:int) Best, Fabian 2018-06-26 22:38 GMT+02:00 sagar loke : > @zhangminglei, > > Question about the schema for ORC format: >

Re: env.setStateBackend deprecated in 1.5 for FsStateBackend

2018-06-27 Thread Fabian Hueske
Hi, You can just add a cast to StateBackend to get rid of the deprecation warning: env.setStateBackend((StateBackend) new FsStateBackend("hdfs://myhdfsmachine:9000/flink/checkpoints")); Best, Fabian 2018-06-27 5:47 GMT+02:00 Rong Rong : > Hmm. > > If you have a wrapper function like this, i

Re: high-availability.storageDir clean up?

2018-06-27 Thread Fabian Hueske
Hi Elias, Till (in CC) is familiar with Flink's HA implementation. He might be able to answer your question. Thanks, Fabian 2018-06-25 23:24 GMT+02:00 Elias Levy : > I noticed in one of our cluster that they are relatively old > submittedJobGraph* and completedCheckpoint* files. I was wonderin

Re: Over Window Not Processing Messages

2018-06-27 Thread Fabian Hueske
Hi, The OVER window operator can only emit result when the watermark is advanced, due to SQL semantics which define that all records with the same timestamp need to be processed together. Can you check if the watermarks make sufficient progress? Btw. did you observe state size or IO issues? The O

Re: Over Window Not Processing Messages

2018-06-28 Thread Fabian Hueske
a full day's worth of data is >>>> loaded into the system before the watermark advances. At that point the >>>> checkpoints stall indefinitely with a couple of the tasks in the 'over' >>>> operator never acknowledging. Any thoughts on what wou

Re: DataSet with Multiple reduce Actions

2018-06-28 Thread Fabian Hueske
Hi Osh, You can certainly apply multiple reduce functions on a DataSet; however, you should make sure that the data is only partitioned and sorted once. Moreover, you would end up with multiple data sets that you need to join afterwards. I think the easier approach is to wrap your functions in a s

Re: How to partition within same physical node in Flink

2018-06-28 Thread Fabian Hueske
ipeline. > > I guess I might have to use a ThreadPool within each Slot(cam partition) > to work on each seq# ?? > > TIA > > On Tue, Jun 26, 2018 at 1:06 AM Fabian Hueske wrote: > >> Hi, >> >> keyBy() does not work hierarchically. Each keyBy() overrides the prev

Re: How to partition within same physical node in Flink

2018-07-02 Thread Fabian Hueske
t;> have physical partitioning in a way where physical partiotioning happens >> first by parent key and localize grouping by child key, is there a need to >> using custom partitioner? Obviously we can keyBy twice but was wondering if >> we can minimize the re-partition stress. >

Re: The program didn't contain a Flink job

2018-07-03 Thread Fabian Hueske
Hi, Let me summarize: 1) Sometimes you get the error message "org.apache.flink.client.program.ProgramMissingJobException: The program didn't contain a Flink job.". when submitting a program through the YarnClusterClient 2) The logs and the dashboard state that the job ran successful 3) The job pe

Re: java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.assign

2018-07-03 Thread Fabian Hueske
Hi Mich, FlinkKafkaConsumer09 is the connector for Kafka 0.9.x. Have you tried to use FlinkKafkaConsumer011 instead of FlinkKafkaConsumer09? Best, Fabian 2018-07-02 22:57 GMT+02:00 Mich Talebzadeh : > This is becoming very tedious. > > As suggested I changed the kafka dependency from > > ibra

Re: Kafka Avro Table Source

2018-07-03 Thread Fabian Hueske
Hi Will, The community is currently working on improving the Kafka Avro integration for Flink SQL. There's a PR [1]. If you like, you could try it out and give some feedback. Timo (in CC) has been working on Kafka Avro and should be able to help with any specific questions. Best, Fabian [1] https:

Re: Regarding external metastore like HIVE

2018-07-03 Thread Fabian Hueske
Hi, The docs explain that the ExternalCatalog interface *can* be used to implement a catalog for HCatalog or Metastore. However, there is no such implementation in Flink yet. You would need to implement such a catalog connector yourself. I think there would be quite a few people interested in su

Re: Let BucketingSink roll file on each checkpoint

2018-07-03 Thread Fabian Hueske
Hi Xilang, Let me try to summarize your requirements. If I understood you correctly, you are not only concerned about the exactly-once guarantees but also need a consistent view of the data. The data in all files that are finalized needs to originate from a prefix of the stream, i.e., all records w

Re: Passing type information to JDBCAppendTableSink

2018-07-03 Thread Fabian Hueske
Hi, In addition to what Rong said: - The types look OK. - You can also use Types.STRING, and Types.LONG instead of BasicTypeInfo.xxx - Beware that in the failure case, you might have multiple entries in the database table. Some databases support an upsert syntax which (together with key or unique
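A small sketch of how the parameter types can be declared with Types.STRING / Types.LONG on the JDBCAppendTableSink builder; the driver, URL, and query are placeholders.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.io.jdbc.JDBCAppendTableSink;

public class JdbcSinkExample {
    public static JDBCAppendTableSink create() {
        return JDBCAppendTableSink.builder()
            .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
            .setDBUrl("jdbc:derby:memory:flinkdb")
            .setQuery("INSERT INTO results (name, cnt) VALUES (?, ?)")
            // One type per '?' placeholder, in order.
            .setParameterTypes(Types.STRING, Types.LONG)
            .build();
    }
}
```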

Re: Passing type information to JDBCAppendTableSink

2018-07-04 Thread Fabian Hueske
d example source code that does that. > Thanks again, > Chris > > > On Tue, Jul 3, 2018 at 5:24 AM, Fabian Hueske wrote: > >> Hi, >> >> In addition to what Rong said: >> >> - The types look OK. >> - You can also use Types.STRING, and Types.LON

Re: Passing type information to JDBCAppendTableSink

2018-07-04 Thread Fabian Hueske
There is also the SQL:2003 MERGE statement that can be used to implement UPSERT logic. It is a bit verbose but supported by Derby [1]. Best, Fabian [1] https://issues.apache.org/jira/browse/DERBY-3155 2018-07-04 10:10 GMT+02:00 Fabian Hueske : > Hi Chris, > > MySQL (and maybe othe

Re: run time error java.lang.NoClassDefFoundError: org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer09

2018-07-04 Thread Fabian Hueske
Looking at the other threads, I assume you solved this issue. The problem should have been that FlinkKafkaConsumer09 is not included in the flink-connector-kafka-0.11 module, because it is the connector for Kafka 0.9 and not Kafka 0.11. Best, Fabian 2018-07-02 11:20 GMT+02:00 Mich Talebzadeh :

Re: How to partition within same physical node in Flink

2018-07-04 Thread Fabian Hueske
2018-07-02 15:37 GMT+02:00 ashish pok : > Thanks Fabian! It sounds like KeyGroup will do the trick if that can be > made publicly accessible. > > On Monday, July 2, 2018, 5:43:33 AM EDT, Fabian Hueske > wrote: > > > Hi Ashish, hi Vijay, > > Flink does not disting

Re: Description of Flink event time processing

2018-07-04 Thread Fabian Hueske
Hi Elias, I agree, the docs lack a coherent discussion of event time features. Thank you for this write up! I just skimmed your document and will provide more detailed feedback later. It would be great to add such a page to the documentation. Best, Fabian 2018-07-03 3:07 GMT+02:00 Elias Levy :

Re: Why should we use an evictor operator in flink window

2018-07-04 Thread Fabian Hueske
Hi, The Evictor is useful if you want to remove some elements from the window state but not all. This also implies that a window is evaluated multiple times because otherwise you could just filter in the user function (as you suggested) and purge the whole window afterwards. Evictors are commo
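A minimal sketch of an evictor that removes some but not all elements, here keeping only the most recent 100 elements per window evaluation; the key and field positions are hypothetical (f0 = key, f1 = value).

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.evictors.CountEvictor;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EvictorExample {
    public static DataStream<Tuple2<String, Long>> apply(DataStream<Tuple2<String, Long>> input) {
        return input
            .keyBy(0)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            // Before the window function runs, all but the last 100 elements are evicted.
            .evictor(CountEvictor.of(100))
            .sum(1);
    }
}
```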

Re: Flink memory management in table api

2018-07-04 Thread Fabian Hueske
Hi Amol, The memory consumption depends on the query/operation that you are doing. Time-based operations like group-window-aggregations, over-window-aggregations, or window-joins can automatically clean up their state once data is no longer needed. Operations such as non-windowed aggregations

Re: [Table API/SQL] Finding missing spots to resolve why 'no watermark' is presented in Flink UI

2018-07-04 Thread Fabian Hueske
Hi Jungtaek, Flink 1.5.0 features a TableSource for Kafka and JSON [1], incl. timestamp & watermark generation [2]. It would be great if you could let us know if that addresses your use case and, if not, what's missing or not working. So far Table API / SQL does not have support for late-data side

Re: Flink memory management in table api

2018-07-04 Thread Fabian Hueske
ording to above conversation flink will persist state forever for non > windowed operations. I want to know how flink persists the state i.e. > Database or file system or in memory etc. > > On Wed, 4 Jul 2018 at 2:12 PM, Fabian Hueske wrote: > >> Hi Amol, >> >> The memory

Re: How to implement Multi-tenancy in Flink

2018-07-04 Thread Fabian Hueske
Hi Ahmad, Some tricks that might help to bring down the effort per tenant if you run one job per tenant (or key per tenant): - Pre-aggregate records in a 5 minute Tumbling window. However, pre-aggregation does not work for FoldFunctions. - Implement the window as a custom ProcessFunction that mai
