Re: Flink + Avro GenericRecord - first field value overwrites all other fields

2016-05-12 Thread Fabian Hueske
Hi Tarandeep, the AvroInputFormat was recently extended to support GenericRecords. [1] You could also try to run the latest SNAPSHOT version and see if it works for you. Cheers, Fabian [1] https://issues.apache.org/jira/browse/FLINK-3691 2016-05-12 10:05 GMT+02:00 Tarandeep Singh : > I think I

Re: Flink recovery

2016-05-13 Thread Fabian Hueske
Hi, Flink's exactly-once semantics do not mean that events are processed exactly-once but that events will contribute exactly-once to the state of an operator such as a counter. Roughly, the mechanism works as follows: - Flink periodically injects checkpoint markers into the data stream. This happe

Re: Flink recovery

2016-05-13 Thread Fabian Hueske
master/apis/streaming/fault_tolerance.html#fault-tolerance-guarantees-of-data-sources-and-sinks > > If not what other Sinks can I use to have the exactly once output since > getting exactly once output is critical for our use case. > > > > Thanks, > Naveen > > From: Fabian Hueske >

Re: Flink recovery

2016-05-14 Thread Fabian Hueske
> pipeline. > > > > Thanks, > Naveen > > From: Fabian Hueske > Reply-To: "user@flink.apache.org" > Date: Friday, May 13, 2016 at 4:26 PM > > To: "user@flink.apache.org" > Subject: Re: Flink recovery > > Hi Naveen, > > the Ro

Re: Flink Kafka Streaming - Source [Bytes/records Received] and Sink [Bytes/records sent] show zero messages

2016-05-16 Thread Fabian Hueske
Hi Prateek, the missing numbers are an artifact of how the stats are collected. ATM, Flink only collects these metrics for data which is sent over connections *between* Flink operators. Since sources and sinks connect to external systems (and not Flink operators), the dashboard does not sho

Re: Flink recovery

2016-05-17 Thread Fabian Hueske
".valid-length" file. >>> >>> The fix you mentioned is part of later Flink releases (like 1.0.3) >>> >>> Stephan >>> >>> >>> On Mon, May 16, 2016 at 11:46 PM, Madhire, Naveen < >>> naveen.madh...@capitalone

Re: Performing Reduce on a group of datasets

2016-05-18 Thread Fabian Hueske
I think union is what you are looking for. Note that all data sets must be of the same type. 2016-05-18 16:15 GMT+02:00 Ritesh Kumar Singh : > Hi, > > How can I perform a reduce operation on a group of datasets using Flink? > Let's say my map function gives out n datasets: d1, d2, ... dN > Now I

Re: Performing Reduce on a group of datasets

2016-05-19 Thread Fabian Hueske
I think that sentence is misleading and refers to the internals of Flink. It should be removed, IMO. You can only union two DataSets. If you want to union more, you have to do it one by one. Btw. union does not cause additional processing overhead. Cheers, Fabian 2016-05-19 14:44 GMT+02:00 Rites
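
A minimal sketch of unioning more than two data sets one by one. To stay runnable without Flink, it uses plain Java lists as stand-ins; `unionAll` and `concat` are hypothetical helpers, not Flink APIs. With Flink, the same fold could be applied by passing `DataSet::union` as the operator, assuming all inputs have the same type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class UnionAll {
    // Hypothetical helper: folds a non-empty list of inputs with a binary union,
    // applying one union per step.
    public static <T> T unionAll(List<T> inputs, BinaryOperator<T> union) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("need at least one input");
        }
        T result = inputs.get(0);
        for (int i = 1; i < inputs.size(); i++) {
            result = union.apply(result, inputs.get(i));
        }
        return result;
    }

    // Stand-in for a two-input union: concatenates and keeps duplicates.
    public static List<Integer> concat(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> dataSets = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
        System.out.println(unionAll(dataSets, UnionAll::concat));
    }
}
```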

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-20 Thread Fabian Hueske
The problem seems to occur quite often. Did you update your Flink version recently? If so, could you try to downgrade and see if the problem disappears? Is it otherwise possible that it is caused by faulty hardware? 2016-05-20 18:05 GMT+02:00 Flavio Pompermaier : > This time (Europed instead of E

Re: keyBy on a collection of Pojos

2016-05-23 Thread Fabian Hueske
Actually, the program works correctly (according to the DataStream API). Let me explain what happens: 1) You do not initialize the count variable, so it will be 0 (summing 0s results in 0). 2) DataStreams are considered to be unbounded (have an infinite size). KeyBy does not group the records because

Re: writeAsCSV with partitionBy

2016-05-23 Thread Fabian Hueske
Hi Kirsti, I'm not aware of anybody working on this issue. Would you like to create a JIRA issue for it? Best, Fabian 2016-05-23 16:56 GMT+02:00 KirstiLaurila : > Is there any plans to implement this kind of feature (possibility to write > to > data specified partitions) in the near future? > >

Re: Apache Beam and Flink

2016-05-26 Thread Fabian Hueske
No, that is not supported yet. Beam provides a common API but the Flink runner translates Beam programs against batch sources into DataSet API programs and Beam programs against streaming sources into DataStream programs. It is not possible to mix both. 2016-05-26 10:00 GMT+02:00 Ashutosh Kumar : >

Re: WindowedStream aggregation methods pre-aggregate?

2016-05-27 Thread Fabian Hueske
Hi Elias, yes, reduce, fold, and the aggregation functions (sum, min, max, minBy, maxBy) on WindowedStream perform eager aggregation, i.e., the functions are applied for each value that enters the window and the state of the window will consist of a single value. In case you need access to the Windo
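
The eager aggregation described above can be sketched in plain Java. This is an illustration only, not Flink's implementation: `EagerWindowState` is a hypothetical class that applies the reduce function on every incoming value and keeps a single value as window state.

```java
import java.util.function.BinaryOperator;

public class EagerWindowState {
    private final BinaryOperator<Long> reduce;
    // The only state kept for the window; null until the first value arrives.
    private Long state;

    public EagerWindowState(BinaryOperator<Long> reduce) {
        this.reduce = reduce;
    }

    // Called for every value that enters the window.
    public void add(long value) {
        state = (state == null) ? value : reduce.apply(state, value);
    }

    // Called when the window fires; returns null if no value was added.
    public Long result() {
        return state;
    }

    public static void main(String[] args) {
        EagerWindowState sum = new EagerWindowState(Long::sum);
        for (long v : new long[] {3, 5, 7}) {
            sum.add(v);
        }
        System.out.println(sum.result());
    }
}
```

Because only one value is stored, the individual elements are no longer available when the window fires.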

Re: [SQL DDL] How to extract timestamps from Kafka message's metadata

2020-08-11 Thread Fabian Hueske
Hi Dongwon, Maybe you can add your use case to the FLIP-107 discussion thread [1] and thereby support the proposal (after checking that it would solve your problem). It's always helpful to learn about the requirements of users when designing new features. It also helps to prioritize which feature

Re: coordination of sinks

2020-08-17 Thread Fabian Hueske
Hi Marco, You cannot really synchronize data that is being emitted via different streams (without bringing them together in an operator). I see two options: 1) emit the event to create the partition and the data to be written into the partition to the same stream. Flink guarantees that records d

Re: Configure vvp 2.3 with file blob storage

2020-11-03 Thread Fabian Hueske
Hi Laurent, Thanks for trying out Ververica platform! However, please note that this is the mailing list of the Apache Flink project. Please post further questions using the "Community Edition Feedback" button on this page: https://ververica.zendesk.com/hc/en-us We are working on setting up a bett

Re: Flink job getting killed

2020-04-06 Thread Fabian Hueske
Hi Giriraj, This looks like the deserialization of a String failed. Can you isolate the problem to a pair of sending and receiving tasks? Best, Fabian On Sun, Apr 5, 2020 at 20:18, Giriraj Chauhan < graj.chau...@gmail.com> wrote: > Hi, > > We are submitting a flink(1.9.1) job for data pro

Re: Storing Operator state in RocksDb during runtime - plans

2020-04-06 Thread Fabian Hueske
Hi Kristoff, I'm not aware of any concrete plans for such a feature. Best, Fabian On Sun, Apr 5, 2020 at 22:33, KristoffSC < krzysiek.chmielew...@gmail.com> wrote: > Hi, > according to [1] operator state and broadcast state (which is a "special" > type of operator state) are not stored in

Re: how to hold a stream until another stream is drained?

2020-04-06 Thread Fabian Hueske
Hi, With Flink streaming operators However, these parts are currently being reworked to enable a better integration of batch and streaming use cases (or hybrid use cases such as yours). A while back, we wrote a blog post about these plans [1]: > *"Unified Stream Operators:* Blink extends the Fli

Re: FlinkKafakaProducer with Confluent SchemaRegistry and KafkaSerializationSchema

2020-04-20 Thread Fabian Hueske
Hi Anil, Here's a pointer to Flink's end-2-end test that's checking the integration with schema registry [1]. It was recently updated so I hope it works the same way in Flink 1.9. Best, Fabian [1] https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-confluent-schema-registry/

Re: Problem getting watermark right with event time

2020-04-20 Thread Fabian Hueske
Hi Sudan, I noticed a few issues with your code: 1) Please check the computation of timestamps. Your code public long extractAscendingTimestamp(Eventi.Event element) { return element.getEventTime().getSeconds() * 1000; } only seems to look at the seconds of a timestamp. Typically, you wou
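
A minimal sketch of a millisecond conversion that does not drop the sub-second part. It assumes the event time is a seconds-plus-nanos pair (for example a protobuf-style timestamp exposing a nanos field next to `getSeconds()`); the snippet above does not confirm that, and `toEpochMillis` is a hypothetical helper.

```java
public class EventTimeMillis {
    // Converts a seconds + nanos pair to epoch milliseconds.
    // Assumes nanos is the non-negative fraction of a second (0..999_999_999).
    public static long toEpochMillis(long seconds, int nanos) {
        return seconds * 1000L + nanos / 1_000_000;
    }

    public static void main(String[] args) {
        // Whole seconds plus a 250 ms fraction expressed in nanoseconds.
        System.out.println(toEpochMillis(1_587_384_000L, 250_000_000));
    }
}
```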

Re: FlinkKafakaProducer with Confluent SchemaRegistry and KafkaSerializationSchema

2020-04-21 Thread Fabian Hueske
> } > > Then used a new object of GenericSerializer in the FlinkKafkaProducer > > FlinkKafkaProducer producer = > new FlinkKafkaProducer<>(topic, new GenericSerializer(topic, schema, > schemaRegistryUrl), kafkaConfig, Semantic.AT_LEAST_ONCE); > > Thanks , Anil. > >

Re: multiple joins in one job

2020-05-04 Thread Fabian Hueske
Hi, If the interval join emits the time attributes of both its inputs, you can use either of them as a time attribute in a following operator because the join ensures that the watermark will be aligned with both of them. Best, Fabian On Mon, May 4, 2020 at 00:48, lec ssmi wrote: > Thanks

Re: multiple joins in one job

2020-05-05 Thread Fabian Hueske
ement to make it . Can it be possible? > > Fabian Hueske wrote on Mon, May 4, 2020 at 4:04 PM: > >> Hi, >> >> If the interval join emits the time attributes of both its inputs, you >> can use either of them as a time attribute in a following operator because >> the join ensur

Re: table.show() in Flink

2020-05-05 Thread Fabian Hueske
There's also the Table API approach if you want to avoid typing a "full" SQL query: Table t = tEnv.from("myTable"); Cheers, Fabian On Tue, May 5, 2020 at 16:34, Őrhidi Mátyás < matyas.orh...@gmail.com> wrote: > Thanks guys for the prompt answers! > > On Tue, May 5, 2020 at 2:49 PM Kurt You

Re: multiple joins in one job

2020-05-05 Thread Fabian Hueske
one? >>>> >>>> Benchao Li wrote on Tue, May 5, 2020 at 17:26: >>>> >>>>> Hi lec, >>>>> >>>>> You don't need to specify time attribute again like `TUMBLE_ROWTIME`, >>>>> you just select the time attribute field

Re: Adaptive Watermarks Generator

2020-05-22 Thread Fabian Hueske
Hi, The code of the implementation is linked in the paper: https://github.com/DataSystemsGroupUT/Adaptive-Watermarks Since this is a prototype for a research paper, I'm doubtful that the project is maintained. I also didn't find an open-source license attached to the code. Hence adding the project

Re: Flink 1.8.3 Kubernetes POD OOM

2020-05-22 Thread Fabian Hueske
Hi Josson, I don't have much experience setting memory bounds in Kubernetes myself, but my colleague Andrey (in CC) reworked Flink's memory configuration for the last release to ease the configuration in container envs. He might be able to help. Best, Fabian Am Do., 21. Mai 2020 um 18:43 Uhr sch

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-17 Thread Fabian Hueske
Congrats Yu! Cheers, Fabian On Wed, Jun 17, 2020 at 10:20, Till Rohrmann < trohrm...@apache.org> wrote: > Congratulations Yu! > > Cheers, > Till > > On Wed, Jun 17, 2020 at 7:53 AM Jingsong Li > wrote: > > > Congratulations Yu, well deserved! > > > > Best, > > Jingsong > > > > On Wed, Jun

Re: How to ensure that job is restored from savepoint when using Flink SQL

2020-07-07 Thread Fabian Hueske
Hi Jie Feng, As you said, Flink translates SQL queries into streaming programs with auto-generated operator IDs. In order to start a SQL query from a savepoint, the operator IDs in the savepoint must match the IDs in the newly translated program. Right now this can only be guaranteed if you transl

Re: How to ensure that job is restored from savepoint when using Flink SQL

2020-07-08 Thread Fabian Hueske
> > <https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=shadowell&uid=shadowell%40126.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22shadowell%40126.com%22%5D> > Signature by NetEase Mail Master <https://mail.1

Re: Custom metrics output

2020-07-21 Thread Fabian Hueske
Hi Joris, I don't think that the approach of "add methods in operator class code that can be called from the main Flink program" will work. The most efficient approach would be implementing a ProcessFunction that counts in 1-min time buckets (using event-time semantics) and updates the metrics. I
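
The bucketing part of this suggestion can be sketched in plain Java. This is only an illustration under assumptions: in an actual ProcessFunction the counts would live in Flink state and be emitted by event-time timers, and `MinuteBuckets` with its methods is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

public class MinuteBuckets {
    static final long BUCKET_MS = 60_000L;

    // Start of the one-minute bucket that an event timestamp (in ms) falls into.
    public static long bucketStart(long eventTimeMs) {
        return eventTimeMs - Math.floorMod(eventTimeMs, BUCKET_MS);
    }

    // Counts events per one-minute bucket, keyed by bucket start.
    public static Map<Long, Long> countPerMinute(long[] eventTimesMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : eventTimesMs) {
            counts.merge(bucketStart(ts), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] events = {0L, 59_999L, 60_000L, 125_000L};
        System.out.println(countPerMinute(events));
    }
}
```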

Re: Pravega connector cannot recover from the checkpoint due to "Failure to finalize checkpoint"

2020-07-21 Thread Fabian Hueske
Hi Brian, AFAIK, Arvid and Piotr (both in CC) have been working on the threading model of the checkpoint coordinator. Maybe they can help with this question. Best, Fabian Am Mo., 20. Juli 2020 um 03:36 Uhr schrieb : > Anyone can help us on this issue? > > > > Best Regards, > > Brian > > > > *Fr

Re: Flink rest api cancel job

2020-07-21 Thread Fabian Hueske
Hi White, Can you describe your problem in more detail? * What is your Flink version? * How do you deploy the job (application / session cluster), (Kubernetes, Docker, YARN, ...) * What kind of job are you running (DataStream, Table/SQL, DataSet)? Best, Fabian Am Mo., 20. Juli 2020 um 08:42 Uhr

Re: Simple MDC logs don't show up

2020-07-21 Thread Fabian Hueske
Hi, When running your code in the IDE, everything runs in the same local JVM. When you run the job on Kubernetes, the situation is very different. Your code runs in multiple JVM processes distributed in a cluster. Flink provides a metrics collection system that you should use to collect metrics f

Re: Support priority of the Flink YARN application in Flink 1.9

2019-08-02 Thread Fabian Hueske
Hi Boxiu, This sounds like a good feature. Please have a look at our contribution guidelines [1]. To propose a feature, you should open a Jira issue [2] and start a discussion there. Please note that the feature freeze for the Flink 1.9 release happened a few weeks ago. The community is currentl

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Fabian Hueske
. 2019 at 20:00, Ahmad Hassan < ahmad.has...@gmail.com> wrote: > > Hi Fabian, > > > On 4 Jul 2018, at 11:39, Fabian Hueske wrote: > > > > - Pre-aggregate records in a 5 minute Tumbling window. However, > pre-aggregation does not work for FoldFunctions. >

Re: From Kafka Stream to Flink

2019-08-02 Thread Fabian Hueske
Hi, Flink does not distinguish between streams and tables. For the Table API / SQL, there are only tables that are changing over time, i.e., dynamic tables. A Stream in the Kafka Streams or KSQL sense, is in Flink a Table with append-only changes, i.e., records are only inserted and never deleted

Re: Best pattern to signal a watermark > t across all tasks?

2019-08-02 Thread Fabian Hueske
Hi, Regarding step 3, it is sufficient to check that you got one message from each parallel task of the previous operator. That's because a task processes the timers of all keys before moving forward. Timers are always processed per key, but you could deduplicate on the parallel task id and check t
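
A minimal sketch of the deduplication idea, written without Flink. It assumes each message carries the index of the upstream parallel task that sent it and that the upstream parallelism is known; `UpstreamTaskTracker` is a hypothetical helper.

```java
import java.util.HashSet;
import java.util.Set;

public class UpstreamTaskTracker {
    private final int upstreamParallelism;
    private final Set<Integer> seenTaskIds = new HashSet<>();

    public UpstreamTaskTracker(int upstreamParallelism) {
        this.upstreamParallelism = upstreamParallelism;
    }

    // Records a message from an upstream task and returns true once
    // every parallel task has been seen at least once.
    public boolean onMessage(int taskId) {
        if (taskId < 0 || taskId >= upstreamParallelism) {
            throw new IllegalArgumentException("unknown task id: " + taskId);
        }
        seenTaskIds.add(taskId); // repeated messages from the same task are ignored
        return seenTaskIds.size() == upstreamParallelism;
    }

    public static void main(String[] args) {
        UpstreamTaskTracker tracker = new UpstreamTaskTracker(3);
        for (int taskId : new int[] {0, 0, 2, 1}) {
            System.out.println("task " + taskId + " -> complete: " + tracker.onMessage(taskId));
        }
    }
}
```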

Re: Apache Flink and additional fileformats (Excel, blockchains)

2019-08-02 Thread Fabian Hueske
Hi Joern, Thanks for sharing your connectors! The Flink community is currently working on a website that collects and lists externally maintained connectors and libraries for Flink. We are still figuring out some details, but hope that it can go live soon. Would be great to have your repositories

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Fabian Hueske
> > > However, with your proposed solution, how would we be able to achieve this > sliding window mechanism of emitting 24 hour window every 5 minute using > processfunction ? > > > Best, > > > On Fri, 2 Aug 2019 at 09:48, Fabian Hueske wrote: > >> Hi Ahmad, >>

Re: Flink Table API schema doesn't support nested object in ObjectArray

2019-08-02 Thread Fabian Hueske
Thanks for the bug report Jacky! Would you mind opening a Jira issue, preferably with a code snippet that reproduces the bug? Thank you, Fabian On Fri, Aug 2, 2019 at 16:01, Jacky Du wrote: > Hi, All > > Just find that Flink Table API have some issue if define nested object in > an objec

Re: [DataStream API] Best practice to handle custom session window - time gap based but handles event for ending session

2019-08-05 Thread Fabian Hueske
Hi Jungtaek, I would recommend implementing the logic in a ProcessFunction and avoiding Flink's windowing API. IMO, the windowing API is difficult to use, because there are many pieces like WindowAssigner, Window, Trigger, Evictor, WindowFunction that are orchestrated by Flink. This makes it very har

Re: From Kafka Stream to Flink

2019-08-07 Thread Fabian Hueske
>> how much state the query will need to maintain. >> >> >> I am not sure to understand the problem. If i have to append-only table >> and perform some join on it, what's the issue ? >> >> >> On Tue, Aug 6, 2019 at 8:03 PM Maatary Okouya

Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread Fabian Hueske
Congratulations Hequn! On Wed, Aug 7, 2019 at 14:50, Robert Metzger < rmetz...@apache.org> wrote: > Congratulations! > > On Wed, Aug 7, 2019 at 1:09 PM highfei2...@126.com > wrote: > > > Congrats Hequn! > > > > Best, > > Jeff Yang > > > > > > Original Message > > Subject:

Re: how to get the code produced by Flink Code Generator

2019-08-07 Thread Fabian Hueske
Hi Vincent, I don't think there is such a flag in Flink. However, this sounds like a really good idea. Would you mind creating a Jira ticket for this? Thank you, Fabian On Tue, Aug 6, 2019 at 17:53, Vincent Cai < caidezhi...@foxmail.com> wrote: > Hi Users, > In Spark, we can invoke Data

Re: Flink Table API schema doesn't support nested object in ObjectArray

2019-08-09 Thread Fabian Hueske
x this issue could be > pretty simple . > > Thanks > Jacky Du > > Fabian Hueske wrote on Fri, Aug 2, 2019 at 12:07 PM: > >> Thanks for the bug report Jacky! >> >> Would you mind opening a Jira issue, preferably with a code snippet that >> reproduces the bug? >

Re: Apache flink 1.7.2 security issues

2019-08-13 Thread Fabian Hueske
Thanks for reporting this issue. It is already discussed on Flink's dev mailing list in this thread: -> https://lists.apache.org/thread.html/10f0f3aefd51444d1198c65f44ffdf2d78ca3359423dbc1c168c9731@%3Cdev.flink.apache.org%3E Please continue the discussion there. Thanks, Fabian Am Di., 13. Aug.

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-15 Thread Fabian Hueske
Congrats Andrey! On Thu, Aug 15, 2019 at 07:58, Gary Yao wrote: > Congratulations Andrey, well deserved! > > Best, > Gary > > On Thu, Aug 15, 2019 at 7:50 AM Bowen Li wrote: > > > Congratulations Andrey! > > > > On Wed, Aug 14, 2019 at 10:18 PM Rong Rong wrote: > > > >> Congratulations A

Re: Implementing a low level join

2019-08-15 Thread Fabian Hueske
Hi, Just to clarify. You cannot dynamically switch the join strategy while a job is running. What Hequn suggested was to have a util method Util.joinDynamically(ds1, ds2) that chooses the join strategy when the program is generated (before it is submitted for execution). The problem is that distr

Re: Implementing a low level join

2019-08-15 Thread Fabian Hueske
ess a > checkpoint I can change the join strategy. > > and if you do, do you have any toy example of this? > > Thanks, > Felipe > *--* > *-- Felipe Gutierrez* > > *-- skype: felipe.o.gutierrez* > *--* *https://felipeogutierrez.blogspot.com > <https://felipe

Re: [External] Re: From Kafka Stream to Flink

2019-08-16 Thread Fabian Hueske
sult of that queries taking into account only the last > values of each row. The result is inserted/updated in a in-memory K-V > database for fast access. > > > > Thanks in advance! > > > > Best > > > > *De: *Fabian Hueske > *Fecha: *miércoles, 7 de agost

Re: End of Window Marker

2019-08-16 Thread Fabian Hueske
Hi Padarn, What you describe is essentially publishing Flink's watermarks to an outside system. Flink processes time windows, by waiting for a watermark that's past the window end time. When it receives such a WM it processes and emits all ended windows and forwards the watermark. When a sink rece
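
A minimal sketch of how a watermark drives window processing, in plain Java. It is a simplification under assumptions: windows are identified only by their end time, and a window is treated as ended once the watermark is at or past that end, which may differ from Flink's exact boundary check. `WindowFiring` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class WindowFiring {
    // End times of windows that have received data but have not fired yet.
    private final TreeSet<Long> pendingWindowEnds = new TreeSet<>();

    public void register(long windowEnd) {
        pendingWindowEnds.add(windowEnd);
    }

    // Returns, in ascending order, the ends of all windows the watermark
    // has reached, and removes them from the pending set.
    public List<Long> onWatermark(long watermark) {
        List<Long> fired = new ArrayList<>(pendingWindowEnds.headSet(watermark, true));
        pendingWindowEnds.removeAll(fired);
        return fired;
    }

    public static void main(String[] args) {
        WindowFiring windows = new WindowFiring();
        windows.register(60_000L);
        windows.register(120_000L);
        windows.register(180_000L);
        System.out.println(windows.onWatermark(130_000L));
    }
}
```

A sink that wants an end-of-window signal could react to the same watermark after the ended windows have been emitted.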

Re: I'm not able to make a stream-stream Time windows JOIN in Flink SQL

2019-08-16 Thread Fabian Hueske
Hi Theo, The main problem is that the semantics of your join (Join all events that happened on the same day) are not well-supported by Flink yet. In terms of true streaming joins, Flink supports the time-windowed join (with the BETWEEN predicate) and the time-versioned table join (which does not

Re: Kafka producer failed with InvalidTxnStateException when performing commit transaction

2019-08-16 Thread Fabian Hueske
Hi Tony, I'm sorry I cannot help you with this issue, but Becket (in CC) might have an idea what went wrong here. Best, Fabian On Wed, Aug 14, 2019 at 07:00, Tony Wei wrote: > Hi, > > Currently, I was trying to update our kafka cluster with larger ` > transaction.max.timeout.ms`. The > o

Re: How to implement Multi-tenancy in Flink

2019-08-16 Thread Fabian Hueske
dows. > > Thanks. > > Best, > > On 2 Aug 2019, at 12:49, Fabian Hueske wrote: > > Ok, I won't go into the implementation detail. > > The idea is to track all products that were observed in the last five > minutes (i.e., unique product ids) in a five minute tumb

Re: How to implement Multi-tenancy in Flink

2019-08-19 Thread Fabian Hueske
Great! Thanks for the feedback. Cheers, Fabian On Mon, Aug 19, 2019 at 17:12, Ahmad Hassan < ahmad.has...@gmail.com> wrote: > > Thank you Fabian. This works really well. > > Best Regards, > > On Fri, 16 Aug 2019 at 09:22, Fabian Hueske wrote: > >> Hi

Re: combineGroup get false results

2019-08-22 Thread Fabian Hueske
Hi Anissa, Are you using combineGroup or reduceGroup? Your question refers to combineGroup, but the code only shows reduceGroup. combineGroup is non-deterministic by design to enable efficient partial results without network and disk IO. reduceGroup is deterministic given a deterministic key extr

Re: Error while sinking results to Cassandra using Flink Cassandra Connector

2019-08-22 Thread Fabian Hueske
Hi Manvi, A NoSuchMethodError typically indicates a version mismatch. I would check if the Flink versions of your program, the client, and the cluster are the same. Best, Fabian On Tue, Aug 20, 2019 at 21:09, manvmali wrote: > Hi, I am facing the issue of writing the data stream result i

Re: How to shorten MATCH_RECOGNIZE's DEFINE clause

2019-08-22 Thread Fabian Hueske
Hi Dongwon, I'm not super familiar with Flink's MATCH_RECOGNIZE support, but Dawid (in CC) might have some ideas about it. Best, Fabian On Wed, Aug 21, 2019 at 07:23, Dongwon Kim < eastcirc...@gmail.com> wrote: > Hi, > > Flink relational apis with MATCH_RECOGNITION looks very attractive a

Re: Maximal watermark when two streams are connected

2019-08-22 Thread Fabian Hueske
Hi Sung, There is no switch to configure the WM to be the max of both streams and it would also in fact violate the core principles of the mechanism. Watermarks are used to track the progress of event time in streams. The implementations of operators rely on the fact that (almost) all records tha

Re: combineGroup get false results

2019-08-22 Thread Fabian Hueske
Hi Anissa, This looks strange. If I understand your code correctly, your GroupReduce function is summing up a field. Looking at the results that you posted, it seems as if there is some data missing (the total sum does not seem to match). For groupReduce it is important that the grouping keys are

Re: combineGroup get false results

2019-08-22 Thread Fabian Hueske
> My key fields is array of multiple type, in this case is string and long. > The result that i'm posting is just represents sampling of output dataset. > > Thank you in advance ! > > Anissa > > On Thu, Aug 22, 2019 at 11:24, Fabian Hueske wrote: > >> H

Re: tumbling event time window , parallel

2019-08-26 Thread Fabian Hueske
Hi, Can you share a few more details about the data source? Are you continuously ingesting files from a folder? You are correct that the parallelism should not affect the results, but there are a few things that can affect that: 1) non-deterministic keys 2) out-of-order data with inappropriate wa

Re: OVER operator filtering out records

2019-08-26 Thread Fabian Hueske
Hi Vinod, This sounds like a watermark issue to me. The commonly used watermark strategies (like bounded out-of-order) are only advancing when there is a new record. Moreover, the current watermark is the minimum of the current watermarks of all input partitions. So, the watermark only moves forwa
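
The minimum rule mentioned above can be sketched in a few lines of plain Java. `MinWatermark.combined` is a hypothetical helper and not Flink code; it only shows why one partition that receives no new records holds the overall watermark back.

```java
public class MinWatermark {
    // The current watermark is the minimum over the watermarks of all input partitions.
    // For an empty array this sketch returns Long.MAX_VALUE.
    public static long combined(long[] partitionWatermarks) {
        long min = Long.MAX_VALUE;
        for (long wm : partitionWatermarks) {
            min = Math.min(min, wm);
        }
        return min;
    }

    public static void main(String[] args) {
        // The third partition has not advanced, so the combined watermark stays at its value.
        System.out.println(combined(new long[] {1_000L, 5_000L, 200L}));
        System.out.println(combined(new long[] {9_000L, 9_500L, 200L}));
    }
}
```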

Re: tumbling event time window , parallel

2019-08-26 Thread Fabian Hueske
ler. The latest timestamp > will be handled first ? > > > > > > BTW I tried to use a ContinuousEventTimeTrigger to make sure the window > is calculated ? and got the processing to trigger multiple times so I’m > not sure exactly how this type of trigger works.. > > > > Tha

Re: [DataStream API] Best practice to handle custom session window - time gap based but handles event for ending session

2019-08-26 Thread Fabian Hueske
I'd like to thank, I'm learning Flink with the new book "Stream > Processing with Apache Flink". :) Thanks for your amazing efforts on > publishing nice book! > > Thanks, > Jungtaek Lim (HeartSaVioR) > > > On Mon, Aug 5, 2019 at 10:21 PM Fabian Hueske wr

Re: tumbling event time window , parallel

2019-08-26 Thread Fabian Hueske
ng on Flink’s monitoring page - for the watermarks I see > different vales even after all my files were processed. Which is > something I would not expect > I would expect that eventually the WM will be the highest EVENT_TIME on > my set of files.. > > > > > > than

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-27 Thread Fabian Hueske
Hi all, Flink 1.9 Docker images are available at Docker Hub [1] now. Due to some configuration issue, there are only Scala 2.11 images at the moment but this was fixed [2]. Flink 1.9 Scala 2.12 images should be available soon. Cheers, Fabian [1] https://hub.docker.com/_/flink [2] https://github.

Re: Are there any news on custom trigger support for SQL/Table API?

2019-08-27 Thread Fabian Hueske
Hi Theo, The work on custom triggers has been put on hold due to some major refactorings (splitting the modules, porting Scala code to Java, new type system, new catalog interfaces, integration of the Blink planner). It's also not on the near-term roadmap AFAIK. To be honest, I'm not sure how much

Re: I'm not able to make a stream-stream Time windows JOIN in Flink SQL

2019-08-27 Thread Fabian Hueske
't output a result any more when testing all of those > combinations. Now the second attempt works but isn't really what I wanted > to query (as the "same day"-predicate is still missing). > > Best regards > Theo > > -- > *From: *"

Re: End of Window Marker

2019-08-27 Thread Fabian Hueske
eems to have some quirks). > > I think ideally each partition of the kafka topic would have some regular > information about watermarks. Perhaps the kafka producer can be modified to > support this. > > Padarn > > On Fri, Aug 16, 2019 at 3:50 PM Fabian Hueske wrote: > >

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-27 Thread Fabian Hueske
D* > The World's Fastest Human Translation Platform. > oy...@motaword.com — www.motaword.com > > > On Tue, Aug 27, 2019 at 8:11 AM Fabian Hueske wrote: > >> Hi all, >> >> Flink 1.9 Docker images are available at Docker Hub [1] now. >> Due to some configur

Re: checkpoint failure in forever loop suddenly even state size less than 1 mb

2019-09-02 Thread Fabian Hueske
Hi Sushant, It's hard to tell what's going on. Maybe the thread pool of the async io operator is too small for the ingested data rate? This could cause the backpressure on the source and eventually also the failing checkpoints. Which Flink version are you using? Best, Fabian Am Do., 29. Aug. 2

Re: End of Window Marker

2019-09-02 Thread Fabian Hueske
up with a partition >> containing element out of ‘window order’. >> >> I was also thinking this problem is very similar to that of checkpoint >> barriers. I intended to dig into the details of the exactly once Kafka sink >> for some inspiration. >> >> Padarn >

Re: tumbling event time window , parallel

2019-09-02 Thread Fabian Hueske
9 Uhr schrieb Hanan Yehudai < hanan.yehu...@radcom.com>: > Im not sure what you mean by use process function and not window process > function , as the window operator takes in a windowprocess function.. > > > > *From:* Fabian Hueske > *Sent:* Monday, August 26, 20

[ANNOUNCE] Flink Forward training registration closes on September 30th

2019-09-05 Thread Fabian Hueske
Hi all, The registration for the Flink Forward Europe training sessions closes in four weeks. The training takes place in Berlin on October 7th and is followed by two days of talks by speakers from companies like Airbus, Goldman Sachs, Netflix, Pinterest, and Workday [1]. The following four train

Re: Join with slow changing dimensions/ streams

2019-09-05 Thread Fabian Hueske
Hi, Flink does not have good support for mixing bounded and unbounded streams in its DataStream API yet. If the dimension table is static (and small enough), I'd use a RichMapFunction and load the table in the open() method into the heap. In this case, you'd probably need to restart the job (can b

Re: Window metadata removal

2019-09-05 Thread Fabian Hueske
Hi, A window needs to keep the data as long as it expects new data. This is clearly the case before the end time of the window was reached. If my window ends at 12:30, I want to wait (at least) until 12:30 before I remove any data, right? In case you expect some data to be late, you can configure

Re: understanding task manager logs

2019-09-05 Thread Fabian Hueske
Hi Vishwas, This is a log statement from Kafka [1]. Not sure when AppInfoParser is created (the log message is written by the constructor). For Kafka versions > 1.0, I'd recommend the universal connector [2]. Not sure how well it works if producers and consumers have different versions. Mayb

Re: error in my job

2019-09-05 Thread Fabian Hueske
Hi, Are you getting this error repeatedly or was this a single time? If it's just a one-time error, it's probably caused by a task manager process that died for some reason (as suggested by the error message). You should have a look at the TM logs to see whether you can find something that would exp

Re: TABLE API + DataStream outsourcing schema or Pojo?

2019-09-05 Thread Fabian Hueske
Hi Steve, Maybe you could implement a custom TableSource that queries the data from the rest API and converts the JSON directly into a Row data type. This would also avoid going through the DataStream API just for ingesting the data. Best, Fabian Am Mi., 4. Sept. 2019 um 15:57 Uhr schrieb Steve

Re: Exception when trying to change StreamingFileSink S3 bucket

2019-09-05 Thread Fabian Hueske
Hi, Kostas (in CC) might be able to help. Best, Fabian On Wed, Sep 4, 2019 at 22:59, sidhartha saurav < sidsau...@gmail.com> wrote: > Hi, > > Can someone suggest a workaround so that we do not get this issue while > changing the S3 bucket ? > > On Thu, Aug 22, 2019 at 4:24 PM sidhartha s

[ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-06 Thread Fabian Hueske
Hi everyone, I'm very happy to announce that Kostas Kloudas is joining the Flink PMC. Kostas has been contributing to Flink for many years and puts a lot of effort into helping our users and growing the Flink community. Please join me in congratulating Kostas! Cheers, Fabian

Re: TABLE API + DataStream outsourcing schema or Pojo?

2019-09-06 Thread Fabian Hueske
String key = iterator.next(); > row.setField(pos, jsonNode.get(key).asText()); > pos++; > } > return row; > } > }).returns(convert); > > Table tableA = tEnv.fromDataStream(dataStreamRow); > > > Le jeu. 5 sept. 2019 à 13:23, Fabian Hueske a écr

Re: Implementing CheckpointableInputFormat

2019-09-06 Thread Fabian Hueske
Hi, CheckpointableInputFormat is only relevant if you plan to use the InputFormat in a MonitoringFileSource, i.e., in a streaming application. If you plan to use it in a DataSet (batch) program, InputFormat is fine. Btw. the latest release Flink 1.9.0 has major improvements for the recovery of ba

Re: Flink SQL: How to tag a column as 'rowtime' from a Avro based DataStream?

2019-09-10 Thread Fabian Hueske
Hi Niels, I think (not 100% sure) you could also cast the event time attribute to TIMESTAMP before you emit the table. This should remove the event time property (and thereby the TimeIndicatorTypeInfo) and you wouldn't need to fiddle with the output types. Best, Fabian On Wed, Aug 21, 2019 at 1

Re: Join with slow changing dimensions/ streams

2019-09-10 Thread Fabian Hueske
database systems have to deal with. Best, Fabian On Thu, Sep 5, 2019 at 13:37, Hanan Yehudai < hanan.yehu...@radcom.com> wrote: > Thanks Fabian. > > > Is there any advantage to using broadcast state vs. using just a CoMap function > on 2 connected streams? > > > >

Re: Flink SQL: How to tag a column as 'rowtime' from a Avro based DataStream?

2019-09-10 Thread Fabian Hueske
Hi, that would be regular SQL cast syntax: SELECT a, b, c, CAST(eventTime AS TIMESTAMP) FROM ... On Tue, Sep 10, 2019 at 18:07, Niels Basjes wrote: > Hi. > > Can you give me an example of the actual syntax of such a cast? > > On Tue, 10 Sep 2019, 16:30 Fabian Hueske,

Re: Filter events based on future events

2019-09-11 Thread Fabian Hueske
Hi Theo, I would implement this with a KeyedProcessFunction. These are the important points to consider: 1) partition the output of the Kafka source by Kafka partition (or the attribute that determines the partition). This will ensure that the data stay in order (per partition). 2) The KeyedProce

Re:

2019-09-11 Thread Fabian Hueske
Hi, This is clearly a Scala version issue. You need to make sure that all Flink dependencies have the same version and are compiled for Scala 2.11. The "_2.11" suffix in the dependency name indicates that it is a Scala 2.11 dependency ("_2.12" indicates Scala 2.12 compatibility). Best, Fabian On

Re: Checkpointing is not performing well

2019-09-11 Thread Fabian Hueske
Hi, There is no upper limit for state size in Flink. There are applications with 10+ TB state. However, it is natural that checkpointing time increases with state size as more data needs to be serialized (in case of FsStateBackend) and written to stable storage. (The same is btw true for recovery

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Fabian Hueske
Congrats Zili Chen :-) Cheers, Fabian On Wed, Sep 11, 2019 at 12:48, Biao Liu wrote: > Congrats Zili! > > Thanks, > Biao /'bɪ.aʊ/ > > > > On Wed, 11 Sep 2019 at 18:43, Oytun Tez wrote: > >> Congratulations! >> >> --- >> Oytun Tez >> >> *M O T A W O R D* >> The World's Fastest Human Tran

Re: How to handle avro BYTES type in flink

2019-09-13 Thread Fabian Hueske
Thanks for reporting back, Catlyn! On Thu, Sep 12, 2019 at 19:40, Catlyn Kong wrote: > Turns out there was some other deserialization problem unrelated to this. > > On Mon, Sep 9, 2019 at 11:15 AM Catlyn Kong wrote: > >> Hi fellow streamers, >> >> I'm trying to support Avro BYTES type in

Re: Uncertain result when using group by in stream sql

2019-09-13 Thread Fabian Hueske
Hi, A GROUP BY query on a streaming table requires that the result is continuously updated. Updates are propagated as a retraction stream (see tEnv.toRetractStream(table, Row.class).print(); in your code). A retraction stream encodes the type of the update as a boolean flag, the "true" and "false
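To make the retraction encoding concrete, here is a minimal plain-Java sketch (no Flink dependency) that replays such a stream and keeps only the current result. It assumes the usual convention of toRetractStream, where a true flag is an add message and a false flag retracts a previously emitted row. The class RetractionReplay, the Change type, and the word/count columns are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: replays a retraction stream in memory.
class RetractionReplay {

    // One message of the retraction stream: the boolean flag plus the row
    // (modelled here as a word and its count, as in SELECT word, COUNT(*) ... GROUP BY word).
    static final class Change {
        final boolean add;   // true = add message, false = retract message
        final String word;
        final long count;

        Change(boolean add, String word, long count) {
            this.add = add;
            this.word = word;
            this.count = count;
        }
    }

    // Applies the messages in arrival order and returns the current result per word.
    static Map<String, Long> materialize(List<Change> changes) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Change c : changes) {
            if (c.add) {
                result.put(c.word, c.count);
            } else {
                // Remove the row only if it matches the value being retracted.
                result.remove(c.word, Long.valueOf(c.count));
            }
        }
        return result;
    }

    // Messages that a counting query could emit for the input words a, b, a.
    static List<Change> sample() {
        List<Change> changes = new ArrayList<>();
        changes.add(new Change(true, "a", 1L));   // first "a"
        changes.add(new Change(true, "b", 1L));   // first "b"
        changes.add(new Change(false, "a", 1L));  // second "a": retract the old count ...
        changes.add(new Change(true, "a", 2L));   // ... and add the updated count
        return changes;
    }

    public static void main(String[] args) {
        System.out.println(materialize(sample()));
    }
}
```

Printing the raw stream shows all four messages, which can look like an uncertain result; applying them in order leaves exactly one current row per key.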

Re: Compound Keys Using Temporal Tables

2019-09-16 Thread Fabian Hueske
Hi, No, this is not possible at the moment. You can only pass a single expression as primary key. A workaround might be to put the two fields in a nested field (haven't tried if this works) or combine them in a single attribute, for example by casting them to VARCHAR and concatenating them. Best, Fa
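The concatenation workaround has one pitfall worth sketching: plain concatenation can map different field pairs to the same key. The following plain-Java sketch illustrates the idea only; in a query the same logic would be expressed with casts to VARCHAR and string concatenation. The class CompositeKey, the separator, and the length prefix are assumptions made for this illustration and are not taken from the thread.

```java
// Hypothetical helper, not part of Flink: builds a single string key from two fields.
class CompositeKey {

    // Separator chosen for this sketch; any character works because of the length prefix.
    private static final char SEP = '|';

    // Naive variant: ("ab", "c") and ("a", "bc") both become "abc".
    static String naive(String first, String second) {
        return first + second;
    }

    // Safer variant: prefixing the length of the first field keeps distinct
    // pairs distinct, even if the values themselves contain the separator.
    static String of(String first, String second) {
        return first.length() + String.valueOf(SEP) + first + SEP + second;
    }

    public static void main(String[] args) {
        System.out.println(naive("ab", "c").equals(naive("a", "bc")));
        System.out.println(of("ab", "c").equals(of("a", "bc")));
    }
}
```

Whether the extra prefix is needed depends on the data; if one of the fields has a fixed width or cannot contain the separator, a simple separator is enough.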

Re: Is Idle state retention time in SQL client possible?

2019-09-17 Thread Fabian Hueske
Hi, This can be set via the environment file. Please have a look at the documentation [1] (see the "execution: min-idle-state-retention: " and "execution: max-idle-state-retention: " keys). Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#environment-files A

Re: Time Window Flink SQL join

2019-09-18 Thread Fabian Hueske
Hi, The query that you wrote is not a time-windowed join. INSERT INTO sourceKafkaMalicious SELECT sourceKafka.* FROM sourceKafka INNER JOIN badips ON sourceKafka.`source.ip`=badips.ip WHERE sourceKafka.timestamp_received BETWEEN CURRENT_TIMESTAMP - INTERVAL '15' MINUTE AND CURRENT_TIMESTAMP; The

Re: Time Window Flink SQL join

2019-09-18 Thread Fabian Hueske
But with that, the 60 GB of memory runs out > > So I used the query below. > Can you please guide me in this regard > > On Wed, 18 Sep 2019 at 5:53 PM, Fabian Hueske wrote: > >> Hi, >> >> The query that you wrote is not a time-windowed join. >> >> INSERT IN

Re: Batch mode with Flink 1.8 unstable?

2019-09-19 Thread Fabian Hueske
Hi Ken, Changing the parallelism can affect the generation of input splits. I had a look at BinaryInputFormat, and it adds a bunch of empty input splits if the number of generated splits is less than the minimum number of splits (which is equal to the parallelism). See --> https://github.com/apac
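The padding behaviour described above can be modelled in a few lines of plain Java. This is a simplified sketch of the idea, not the actual BinaryInputFormat code; the class SplitPadding and the Split type are hypothetical stand-ins for Flink's input split classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Flink code: pads a split list with empty splits.
class SplitPadding {

    // Simplified stand-in for a file input split: index, start offset, length in bytes.
    static final class Split {
        final int index;
        final long start;
        final long length;

        Split(int index, long start, long length) {
            this.index = index;
            this.start = start;
            this.length = length;
        }
    }

    // Returns the generated splits, followed by zero-length splits until
    // at least minNumSplits splits exist (minNumSplits equals the parallelism).
    static List<Split> pad(List<Split> generated, int minNumSplits) {
        List<Split> result = new ArrayList<>(generated);
        while (result.size() < minNumSplits) {
            result.add(new Split(result.size(), 0L, 0L));
        }
        return result;
    }

    // Builds n non-empty splits of 64 bytes each for the example.
    static List<Split> generate(int n) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            splits.add(new Split(i, i * 64L, 64L));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Two real splits at parallelism 4: two empty splits are appended.
        System.out.println(pad(generate(2), 4).size());
    }
}
```

In this model, raising the parallelism above the number of real splits only adds empty splits, so some subtasks receive no data; lowering it leaves the split list unchanged.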
