Re: count(DISTINCT) in flink SQL

2019-06-11 Thread Fabian Hueske
t;> ) >> >> GROUP BY user_id, event_month, event_year >> >> >> >> We are also using idle state retention time to clean up unused state, but >> that is much longer (a week or month depending on the usecase). We will >> switch to count(DISTINCT) as soo

Re: Avro serde classes in Flink

2019-06-11 Thread Fabian Hueske
Hi Debasish, No, I don't think there's a particular reason. There a few Jira issues that propose adding an Avro Serialization Schema for Confluent Schema Registry [1] [2]. Please check them out and add a new one if they don't describe what you are looking for. Cheers, Fabian [1] https://issues.a

Re: About SerializationSchema/DeserializationSchema's concurrency safety

2019-06-11 Thread Fabian Hueske
Hi, Yes, multiple instances of the same De/SerializationSchema can be executed in the same JVM. Regarding 2. I'm not 100%, but would suspect that one De/SerializationSchema instance handles multiple partitions. Gordon (in CC) should know this for sure. Best, Fabian Am Mo., 10. Juni 2019 um 05:25

Re: Building flink from source---error of resolving dependencies

2019-06-13 Thread Fabian Hueske
Hi Syed, The build fails because Maven could not download the required dependency com.mapr.hadoop:maprfs:jar:5.2.1-mapr. The dependency is hosted on MapR's Maven repository. Maybe the service was not available for some time. I checked it right now and it seems to be working. I'd suggest to try it

Re: Deletion of topic from a pattern in KafkaConsumer if auto-create is true will always recreate the topic

2019-06-14 Thread Fabian Hueske
Hi Visha, If I remember correctly, the behavior of the Kafka consumer was changed in Flink 1.8 to account for such situations. Please check the release notes [1] and the corresponding Jira issue [2]. If this is not the behavior you need, please feel free to create a new Jira issue and start a dis

Re: Flink Kafka consumer with low latency requirement

2019-06-24 Thread Fabian Hueske
Hi Ben, Flink's Kafka consumers track their progress independent of any worker. They keep track of the reading offset for themselves (committing progress to Kafka is optional and only necessary to have progress monitoring in Kafka's metrics). As soon as a consumer reads and forwards an event, it i

Re: Unexpected behavior from interval join in Flink

2019-06-24 Thread Fabian Hueske
Hi Wouter, Not sure what is going wrong there, but something that you could try is to use a custom watemark assigner and always return a watermark of 0. When the source finished serving the watermarks, it emits a final Long.MAX_VALUE watermark. Hence the join should consume all events and store th

Re: Unexpected behavior from interval join in Flink

2019-06-24 Thread Fabian Hueske
d up using xStream as a 'base' while > I needed to use yStream as a 'base'. I.e. yStream.element - 60 min <= > xStream.element <= yStream.element + 30 min. Interchanging both datastreams > fixed this issue. > > Thanks anyways. > > Cheers, Wouter >

Re: Flink Kafka consumer with low latency requirement

2019-06-24 Thread Fabian Hueske
Hi, What kind of function do you use to implement the operator that has the blocking call? Did you have a look at the AsyncIO operator? It was designed for exactly such use cases. It issues multiple asynchronous requests to an external service and waits for the response. Best, Fabian Am Mo., 24.

Re: How does Flink recovers uncommited Kafka offset in AsyncIO?

2019-07-08 Thread Fabian Hueske
Hi, Kafka offsets are only managed by the Flink Kafka Consumer. All following operators do not care whether the events were read from Kafka, files, Kinesis or whichever source. It is the responsibility of the source to include its reading position (in case of Kafka the partition offsets) in a chec

Re: Error checkpointing to S3 like FS (EMC ECS)

2019-07-08 Thread Fabian Hueske
Hi Vishwas, Sorry for the late response. Are you still facing the issue? I have no experience with EMC ECS, but the exception suggests an issue with the host name: 1378 Caused by: java.net.UnknownHostException: aip-featuretoolkit.SU73ECSG1P1d.***.COM 1379 at java.net.InetAddress.getAl

Re: Disk full problem faced due to the Flink tmp directory contents

2019-07-10 Thread Fabian Hueske
Hi, AFAIK Flink should remove temporary files automatically when they are not needed anymore. However, I'm not 100% sure that there are not corner cases when a TM crashes. In general it is a good idea to properly configure the directories that Flink uses for spilling, logging, blob storage, etc.

Re: How are kafka consumer offsets handled if sink fails?

2019-07-11 Thread Fabian Hueske
Hi John, let's say Flink performed a checkpoint after the 2nd record (by injecting a checkpoint marker into the data flow) and the sink fails on the 5th record. When Flink restarts the application, it resets the offset after the 2nd record (it will read the 3rd record first). Hence, the 3rd and 4t

Re: Cannot catch exception throws by kafka consumer with JSONKeyValueDeserializationSchema

2019-07-11 Thread Fabian Hueske
Hi, I'd suggest to implement your own custom deserialization schema for example by extending JSONKeyValueDeserializationSchema. Then you can implement whatever logic you need to handle incorrectly formatted messages. Best, Fabian Am Mi., 10. Juli 2019 um 04:29 Uhr schrieb Zhechao Ma < mazhechaom

Re: Apache Flink - Relation between stream time characteristic and timer triggers

2019-07-11 Thread Fabian Hueske
Hi Mans, IngestionTime is uses the same internal mechanisms as EventTime (record timestamps and watermarks). The difference is that instead of extracting a timestamp from the record (using a custom timestamp extractor & wm assigner), Flink will assign timestamps based on the machine clock of the

[ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread Fabian Hueske
Hi everyone, I'm very happy to announce that Rong Rong accepted the offer of the Flink PMC to become a committer of the Flink project. Rong has been contributing to Flink for many years, mainly working on SQL and Yarn security features. He's also frequently helping out on the user@f.a.o mailing l

Re: Apache Flink - Relation between stream time characteristic and timer triggers

2019-07-11 Thread Fabian Hueske
ics, if a > processor or window trigger registers with a ProcessingTime and EventTime > timers - they will all fire when the appropriate watermarks arrive. > > Thanks again. > > On Thursday, July 11, 2019, 05:41:54 AM EDT, Fabian Hueske < > fhue...@gmail.com> wrote: > >

Re: GroupBy result delay

2019-07-23 Thread Fabian Hueske
Hi Fanbin, The delay is most likely caused by the watermark delay. A window is computed when the watermark passes the end of the window. If you configured the watermark to be 10 minutes before the current max timestamp (probably to account for out of order data), then the window will be computed w

Re: MiniClusterResource class not found using AbstractTestBase

2019-07-23 Thread Fabian Hueske
Hi Juan, Which Flink version do you use? Best, Fabian Am Di., 23. Juli 2019 um 06:49 Uhr schrieb Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.com>: > Hi, > > I'm trying to use AbstractTestBase in a test in order to use the mini > cluster. I'm using specs2 with Scala, so I cannot extend

Re: timeout exception when consuming from kafka

2019-07-23 Thread Fabian Hueske
Hi Yitzchak, Thanks for reaching out. I'm not an expert on the Kafka consumer, but I think the number of partitions and the number of source tasks might be interesting to know. Maybe Gordon (in CC) has an idea of what's going wrong here. Best, Fabian Am Di., 23. Juli 2019 um 08:50 Uhr schrieb Y

Re: Union of streams performance issue (10x)

2019-07-23 Thread Fabian Hueske
Hi Peter, The performance drops probably be due to de/serialization. When tasks are chained, records are simply forwarded as Java objects via method calls. When a task chain in broken into multiple operators, the records (Java objects) are serialized by the sending task, possibly shipped over the

Re: Does Flink support raw generic types in a merged stream?

2019-07-23 Thread Fabian Hueske
Hi John, You could implement your own n-ary Either type. It's a bit of work because you'd need also a custom TypeInfo & Serializer but rather straightforward if you follow the implementation of Either. Best, Fabian Am Mi., 17. Juli 2019 um 16:28 Uhr schrieb John Tipper < john_tip...@hotmail.com>

Re: Flink Zookeeper HA: FileNotFoundException blob - Jobmanager not starting up

2019-07-23 Thread Fabian Hueske
Hi Richard, I hope you could resolve the problem in the meantime. Nonetheless, maybe Till (in CC) has an idea what could have gone wrong. Best, Fabian Am Mi., 17. Juli 2019 um 19:50 Uhr schrieb Richard Deurwaarder < rich...@xeli.eu>: > Hello, > > I've got a problem with our flink cluster where

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Fabian Hueske
Hi Flavio, Not sure I understood the requirements correctly. Couldn't you just collect and bundle all records with a regular window operator and forward one record for each key-window to an AsyncIO operator? Best, Fabian Am Do., 18. Juli 2019 um 12:20 Uhr schrieb Flavio Pompermaier < pomperma...

Re: Flink SinkFunction for WebSockets

2019-07-23 Thread Fabian Hueske
Hi Tim, One thing that might be interesting is that Flink might emit results more than once when a job recovers from a failure. It is up to the receiver to deal with that. Depending on the type of results this might be easy (idempotent updates) or impossible. Best, Fabian Am Fr., 19. Juli 2019

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-23 Thread Fabian Hueske
Hi Dongwon, regarding the question about the conversion: If you keep using the Row type and not adding/removing fields, the conversion is pretty much for free right now. It will be a MapFunction (sometimes even not function at all) that should be chained with the other operators. Hence, it should

Re: Use batch and stream environment in a single pipeline

2019-07-23 Thread Fabian Hueske
Hi, Right now it is not possible to mix batch and streaming environments in a job. You would need to implement the batch logic via the streaming API which is not always straightforward. However, the Flink community is spending a lot of effort on unifying batch and stream processing. So this will

Re: Flink Zookeeper HA: FileNotFoundException blob - Jobmanager not starting up

2019-07-23 Thread Fabian Hueske
our production clustered crippled like this. > > Richard > > On Tue, Jul 23, 2019 at 12:47 PM Fabian Hueske wrote: > >> Hi Richard, >> >> I hope you could resolve the problem in the meantime. >> >> Nonetheless, maybe Till (in CC) has an idea what could h

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Fabian Hueske
AsyncDataStream should provide a first-class support to keyed streams > (and thus perform a single call per key and window..). What do you think? > > On Tue, Jul 23, 2019 at 12:56 PM Fabian Hueske wrote: > >> Hi Flavio, >> >> Not sure I understood the requirements c

Re: CEP Pattern limit

2019-07-23 Thread Fabian Hueske
Hi Pedro, each pattern gets translated into one or more Flink operators. Hence, your Flink program becomes *very* large and requires much more time to be deployed. Hence, the timeout. I'd try to limit the size your job by grouping your patterns and creating an own job for each group. You can also

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Fabian Hueske
Sure: /--> AsyncIO --\ STREAM --> ProcessFunc -- -- Union -- WindowFunc \--/ ProcessFunc keeps track of the unique keys per window duration and emits each

[ANNOUNCE] The Program of Flink Forward EU 2019 is live

2019-07-24 Thread Fabian Hueske
Hi everyone, I'm happy to announce the program of the Flink Forward EU 2019 conference. The conference takes place in the Berlin Congress Center (bcc) from October 7th to 9th. On the first day, we'll have four training sessions [1]: * Apache Flink Developer Training * Apache Flink Operations Trai

Re: LEFT JOIN issue SQL API

2019-07-29 Thread Fabian Hueske
If you need an outer join, the only solution is to convert the table into a retraction stream and correctly handle the retraction messages. Btw. even then this might not perform as you would like it to be. The query will store all input tables completely in state. So you might run out of space soon

Re: Extending REST API with new endpoints

2019-07-29 Thread Fabian Hueske
Hi Oytun, Thanks for your input and feature request! The right way to propose a feature and contribute it is described here [1]. Basically, you should open a Jira issue and start a discussion about the feature there. If it is a bigger features, you should also bring it to the dev@f.a.o mailing li

Re: Is Queryable State not working in standalone cluster with 1 task manager?

2019-07-30 Thread Fabian Hueske
Hi Oytun, Is QS enabled in your Docker image or did you enable QS by copying/moving flink-queryable-state-runtime_2.11-1.8.0.jar from ./opt to ./lib [1]? Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html#activating-queryable-state Am M

Re: NotSerializableException

2016-06-09 Thread Fabian Hueske
Hi Tarandeep, the exception suggests that Flink tries to serialize RecordsFilterer as a user function (this happens via Java Serialization). I said suggests because the code that uses RecordsFilterer is not included. To me it looks like RecordsFilterer should not be used as a user function. It is

Re: FlinkKafkaProducer API

2016-06-09 Thread Fabian Hueske
Hi Elias, thanks for your feedback. I think those are good observations and suggestions to improve the Kafka producers. The best place to discuss such improvements is the dev mailing list. Would like to repost your mail there or open JIRAs where the discussion about these changes can continue? T

Re: How to maintain the state of a variable in a map transformation.

2016-06-09 Thread Fabian Hueske
Hi Ravikumar, I'll try to answer your questions: 1) If you set the parallelism of a map function to 1, there will be only a single instance of that function regardless whether it is execution locally or remotely in a cluster. 2) Flink does also support aggregations, (reduce, groupReduce, combine,

Re: Strange behavior of DataStream.countWindow

2016-06-09 Thread Fabian Hueske
Hi Yukun, the problem is that the KeySelector is internally invoked multiple times. Hence it must be deterministic, i.e., it must extract the same key for the same object if invoked multiple times. The documentation is not discussing this aspect and should be extended. Thanks for pointing out thi

Re: Maxby() and KeyBy() question

2016-06-09 Thread Fabian Hueske
Hi, you are computing a running aggregate, i.e., you're getting one output record for each input record and the output record is the record with the largest value observed so far. If the record with the largest value is the first, the record is sent out another time. This is what happened with Mat

Re: HBase reads and back pressure

2016-06-09 Thread Fabian Hueske
Hi Christophe, where does the backpressure appear? In front of the sink operator or before the window operator? In any case, I think you can improve your WindowFunction if you convert parts of it into a FoldFunction. The FoldFunction would take care of the statistics computation and the WindowFun

Re: How to maintain the state of a variable in a map transformation.

2016-06-09 Thread Fabian Hueske
ion Cleared. > > 4) My question was can I use same ExecutionEnvironment for all flink > programs in a module. > > 5) Question Cleared. > > > Regards > Ravikumar > > > > On 9 June 2016 at 17:58, Fabian Hueske wrote: > >> Hi Ravikumar, >> >> I

Re: Data Source Generator emits 4 instances of the same tuple

2016-06-09 Thread Fabian Hueske
We solved this problem yesterday at the Flink Hackathon. The issue was that the source function was started with parallelism 4 and each function read the whole file. Cheers, Fabian 2016-06-06 16:53 GMT+02:00 Biplob Biswas : > Hi, > > I tried streaming the source data 2 ways > > 1. Is a simple st

Re: FlinkKafkaProducer API

2016-06-09 Thread Fabian Hueske
Great, thank you! 2016-06-09 17:38 GMT+02:00 Elias Levy : > > On Thu, Jun 9, 2016 at 5:16 AM, Fabian Hueske wrote: > >> thanks for your feedback. I think those are good observations and >> suggestions to improve the Kafka producers. >> The best place to discuss s

Re: HBase reads and back pressure

2016-06-09 Thread Fabian Hueske
rwyck < christophe.salperw...@gmail.com>: > Hi Fabian, > > Thanks for the help, I will try that. The backpressure was on the source > (HBase). > > Christophe > > 2016-06-09 16:38 GMT+02:00 Fabian Hueske : > >> Hi Christophe, >> >> where does the backpre

Re: Strange behavior of DataStream.countWindow

2016-06-13 Thread Fabian Hueske
necessary since > I don't really care about keys. > > On 9 June 2016 at 22:00, Fabian Hueske wrote: > >> Hi Yukun, >> >> the problem is that the KeySelector is internally invoked multiple times. >> Hence it must be deterministic, i.e., it must extract the

Re: HBase reads and back pressure

2016-06-13 Thread Fabian Hueske
nk 1.x release line. >> >> >> >> Cheers, >> >> Aljoscha >> >> >> >> On Mon, 13 Jun 2016 at 11:04 Christophe Salperwyck >> >> wrote: >> >>> >> >>> Thanks for the feedback and sorry that I can't try

Re: Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-17 Thread Fabian Hueske
Hi Josh, I assume that you build the SNAPSHOT version yourself. I had similar version conflicts for Apache HttpCore with Flink SNAPSHOT versions on EMR. The problem is cause by a changed behavior in Maven 3.3 and following versions. Due to these changes, the dependency shading is not working corre

Re: Optimizations not performed - please confirm

2016-06-29 Thread Fabian Hueske
Yes, that was my fault. I'm used to auto reply-all on my desktop machine, but my phone just did a simple reply. Sorry for the confusion, Fabian 2016-06-29 19:24 GMT+02:00 Ovidiu-Cristian MARCU < ovidiu-cristian.ma...@inria.fr>: > Thank you, Aljoscha! > I received a similar update from Fabian, o

Re: [ANNOUNCE] Flink 1.1.0 Released

2016-08-09 Thread Fabian Hueske
Thanks Ufuk and everybody who contributed to the release! Cheers, Fabian 2016-08-08 20:41 GMT+02:00 Henry Saputra : > Great work all. Great Thanks to Ufuk as RE :) > > On Monday, August 8, 2016, Stephan Ewen wrote: > > > Great work indeed, and big thanks, Ufuk! > > > > On Mon, Aug 8, 2016 at 6:

Re: Complex batch workflow needs (too) much time to create executionPlan

2016-08-22 Thread Fabian Hueske
Hi Markus, you might be right, that a lot of time is spend in optimization. The optimizer enumerates all alternatives and chooses the plan with the least estimated cost. The degrees of freedom of the optimizer are rather restricted (execution strategies and the used partitioning & sorting keys. Th

Re: Batch jobs with a very large number of input splits

2016-08-23 Thread Fabian Hueske
Hi Niels, yes, in YARN mode, the default parallelism is the number of available slots. You can change the default task parallelism like this: 1) Use the -p parameter when submitting a job via the CLI client [1] 2) Set a parallelism on the execution environment: env.setParallelism() Best, Fabian

Re: Joda exclude in java quickstart maven archetype

2016-08-29 Thread Fabian Hueske
Hi Flavio, yes, Joda should not be excluded. This will be fixed in Flink 1.1.2. Cheers, Fabian 2016-08-29 11:00 GMT+02:00 Flavio Pompermaier : > Hi to all, > I've tried to upgrade from Flink 1.0.2 to 1.1.1 so I've copied the > excludes of the maven shade plugin from the java quickstart pom bu

Re: CountTrigger FIRE or FIRE_AND_PURGE

2016-08-29 Thread Fabian Hueske
Hi Paul, This blog post [1] includes an example of an early trigger that should pretty much do what you are looking for. This one [2] explains the windowing mechanics of Flink (window assigner, trigger, function, etc). Hope this helps, Fabian [1] https://www.mapr.com/blog/essential-guide-streami

Re: NoClassDefFoundError with ElasticsearchSink on Yarn

2016-09-01 Thread Fabian Hueske
Hi Steffen, this looks like a Guava version mismatch to me. Are you running exactly the same program on your local machine or did you add dependencies to run it on the cluster (e.g. Kinesis). Maybe Kinesis and Elasticsearch are using different Guava versions? Best, Fabian 2016-09-01 10:45 GMT+02

Re: CountTrigger FIRE or FIRE_AND_PURGE

2016-09-01 Thread Fabian Hueske
us windows will not purge, is that correct? > > final DataStream alertingMsgs = keyedStream > .window(TumblingEventTimeWindows.of(Time.minutes(1))) > .trigger(CountTrigger.of(1)) > .apply(new MyWindowProcessor()); > > Paul &

Re: Streaming - memory management

2016-09-01 Thread Fabian Hueske
state >> backend since the state is not gone after checkpointing ? >> >> P.S I have kept the watermark behind by 1500 secs just to be safe on >> handling late elements but to tackle edge case scenarios like the one >> mentioned above we are having a backup plan of using Ca

Re: Streaming - memory management

2016-09-01 Thread Fabian Hueske
the data to local TM disk, > the retrieval will be faster here than Cassandra , right ? > > What do you think ? > > > Regards, > Vinay Patil > > On Thu, Sep 1, 2016 at 10:19 AM, Fabian Hueske-2 [via Apache Flink User > Mailing List archive.] <[hidden email] > <

Re: Streaming - memory management

2016-09-01 Thread Fabian Hueske
ng and > flatmap to take care of rest logic. > > Or are you suggestion to use Co-FlatMapFunction after the outer-join > operation (I mean after doing the window and > getting matchingAndNonMatching stream )? > > Regards, > Vinay Patil > > On Thu, Sep 1, 2016 at 1

Re: Windows and Watermarks Clarification

2016-09-01 Thread Fabian Hueske
Hi Paul, BoundedOutOfOrdernessTimestampExtractor implements the AssignerWithPeriodicWatermarks interface. This means, Flink will ask the assigner in regular intervals (configurable via StreamExecutionEnvironment.getConfig().setAutoWatermarkInterval()) for the current watermark. The watermark will

Re: Streaming - memory management

2016-09-01 Thread Fabian Hueske
rm in storing the > DTO ? > > I think the documentation should specify the point that the state will be > maintained for user-defined operators to avoid confusion. > > Regards, > Vinay Patil > > On Thu, Sep 1, 2016 at 1:12 PM, Fabian Hueske-2 [via Apache Flink User >

Re: Windows and Watermarks Clarification

2016-09-01 Thread Fabian Hueske
entTime - lastWaterMarkTime. So if (maxEventTime > - lastWaterMarkTime) > x * 1000 then the window is evaluated? > > > Paul > ------ > *From:* Fabian Hueske > *Sent:* Thursday, September 1, 2016 1:25:55 PM > *To:* user@flink.apache.org > *Subje

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Fabian Hueske
Thanks for the suggestion Vishnu! Stackoverflow documentation looks great. I like the easy contribution and versioning features. However, I am a bit skeptical. IMO, Flink's primary documentation must be hosted by Apache. Out-sourcing such an important aspect of a project to an external service is

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Fabian Hueske
Hi, Flink does not provide shared state. However, you can broadcast a stream to CoFlatMapFunction, such that each operator has its own local copy of the state. If that does not work for you because the state is too large and if it is possible to partition the state (and both streams), you can als

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Fabian Hueske
r. > Is there a way? > > Best Regards > CVP > > On Wed, Sep 7, 2016 at 5:00 PM, Fabian Hueske wrote: > >> Hi, >> >> Flink does not provide shared state. >> However, you can broadcast a stream to CoFlatMapFunction, such that each >> operator h

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Fabian Hueske
s job1 ? > > > > On Wed, Sep 7, 2016 at 6:33 PM, Fabian Hueske wrote: > >> Is writing DataStream2 to a Kafka topic and reading it from the other job >> an option? >> >> 2016-09-07 19:07 GMT+02:00 Chakravarthy varaga >> : >> >>> Hi Fabian,

Re: Sharing Java Collections within Flink Cluster

2016-09-07 Thread Fabian Hueske
keys' from DS2 and DS2 could shrink/expand in terms of the no., of > keys will the key-value shard work in this case? > > On Wed, Sep 7, 2016 at 7:44 PM, Fabian Hueske wrote: > >> Operator state is always local in Flink. However, with key-value state, >> you can h

Re: Sharing Java Collections within Flink Cluster

2016-09-08 Thread Fabian Hueske
Hi Pushpendra, 1. Queryable state is an upcoming feature and not part of an official release yet. With queryable state you can query operator state from outside the application. 2. Have you had a look at the CoFlatMap operator? This operator "connects" two streams and allows to have state which i

Re: assignTimestamp after keyBy

2016-09-08 Thread Fabian Hueske
I would assign timestamps directly at the source. Timestamps are not striped of by operators. Reassigning timestamps somewhere in the middle of a job can cause very unexpected results. 2016-09-08 9:32 GMT+02:00 Dong-iL, Kim : > Thanks for replying. pushpendra. > The assignTimestamp method return

Re: scala version of flink mongodb example

2016-09-08 Thread Fabian Hueske
Hi Frank, input should be of DataSet[(BSONWritable, BSONWritable)], so a Tuple2[BSONWritable, BSONWritable], right? Something like this should work: input.map( pair => pair._1.toString ) Pair is a Tuple2[BSONWritable, BSONWritable], and pair._1 accesses the key of the pair. Alternatively you c

Re: scala version of flink mongodb example

2016-09-08 Thread Fabian Hueske
scala.reflect.ClassTag[R])org.apache.flink.api.scala. > DataSet[R] > match expected type ? > > Thanks! > Frank > > > On Thu, Sep 8, 2016 at 6:56 PM, Fabian Hueske wrote: > >> Hi Frank, >> >> input should be of DataSet[(BSONWritable, BSONWritable)], so

Re: Sharing Java Collections within Flink Cluster

2016-09-08 Thread Fabian Hueske
g and > generates key1:, key2:, key3: keyN: > > Now, > I wish to map elementKeyStream with look ups within (key1, > key2...keyN) where key1, key2.. keyN and their respective values should be > available across the cluster... > > Thanks a million ! > CVP > >

Re: NoClassDefFoundError with ElasticsearchSink on Yarn

2016-09-09 Thread Fabian Hueske
+1 I ran into that issue as well. Would be great to have that in the docs! 2016-09-09 11:49 GMT+02:00 Robert Metzger : > Hi Steffen, > > I think it would be good to add it to the documentation. > Would you like to open a pull request? > > > Regards, > Robert > > > On Mon, Sep 5, 2016 at 10:26 PM,

Re: Why tuples are not ignored after watermark?

2016-09-15 Thread Fabian Hueske
No, this is not possible unless you use an external service such as a database. The assigners might run on different machines and Flink does not provide utilities for r/w shared state. Best, Fabian 2016-09-15 20:17 GMT+02:00 Saiph Kappa : > And is it possible to share state across parallel insta

Re: Streaming - memory management

2016-09-15 Thread Fabian Hueske
maintained in local disk even after checkpointing. > > Or I am not getting it correclty :) > > Regards, > Vinay Patil > > On Thu, Sep 1, 2016 at 1:38 PM, Fabian Hueske-2 [via Apache Flink User > Mailing List archive.] <[hidden email] > <http:///user/SendEmail.jtp?type=node&a

Re: SQL / Tuple question

2016-09-19 Thread Fabian Hueske
Hi Radu, you can pass the TypeInfo directly without accessing the TypeClass. Have you tried this? TypeInformation> tpinf = new TypeHint>(){}.getTypeInfo(); .toDataStream( , tpinf ) Best, Fabian 2016-09-19 17:53 GMT+02:00 Radu Tudoran : > Hi, > > > > I am trying to create an sql statement th

Re: Job Stuck in cancel state

2016-09-19 Thread Fabian Hueske
Hi Janardhan, to sure what's going wrong here. Maybe Till (in CC) has an idea? Best, Fabian 2016-09-19 19:45 GMT+02:00 Janardhan Reddy : > HI, > > I cancelled a restarting job from flink UI and the job is stuck in > cancelling state. (Fixed delay restart strategy was configured for the > job).

Re: Problem with CEPPatternOperator when taskmanager is killed

2016-09-19 Thread Fabian Hueske
Thanks for looking into this Frank! I opened FLINK-4636 [1] to track the issue. Would you or Jaxbihani like to contribute a patch for this bug? [1] https://issues.apache.org/jira/browse/FLINK-4636 2016-09-17 21:15 GMT+02:00 Frank Dekervel : > Hello, > > looks like a bug ... when a PriorityQueu

Re: flink run throws NPE, JobSubmissionResult is null when interactive and not isDetached()

2016-09-19 Thread Fabian Hueske
Hi Luis, this looks like a bug. Can you open a JIRA [1] issue and provide a more detailed description of what you do (Environment, DataStream / DataSet, how do you submit the program, maybe add a small program that reproduce the problem on your setup)? Thanks, Fabian 2016-09-19 17:30 GMT+02:00 L

Re: Serialization problem for Guava's TreeMultimap

2016-09-20 Thread Fabian Hueske
Hi Yukun, I debugged this issue and found that this is a bug in the serialization of the StateDescriptor. I have created FLINK-4640 [1] to resolve the issue. Thanks for reporting the issue. Best, Fabian [1] https://issues.apache.org/jira/browse/FLINK-4640 2016-09-20 10:35 GMT+02:00 Yukun Guo :

Re: Flink JDBCOutputFormat logs wrong WARN message

2016-09-20 Thread Fabian Hueske
Yes, the condition needs to be fixed. @Swapnil, would you like to create a JIRA issue and open a pull request to fix it? Thanks, Fabian 2016-09-20 11:22 GMT+02:00 Chesnay Schepler : > I would agree that the condition should be changed. > > > On 20.09.2016 10:52, Swapnil Chougule wrote: > >> I c

Re: Simple batch job hangs if run twice

2016-09-23 Thread Fabian Hueske
Hi Yassine, can you share a stacktrace of the job when it got stuck? Thanks, Fabian 2016-09-22 14:03 GMT+02:00 Yassine MARZOUGUI : > The input splits are correctly assgined. I noticed that whenever the job > is stuck, that is because the task *Combine (GroupReduce at > first(DataSet.java:573)) *

Re: Simple batch job hangs if run twice

2016-09-23 Thread Fabian Hueske
:584) > at java.lang.Thread.run(Thread.java:745) > > Best, > Yassine > > > 2016-09-23 11:28 GMT+02:00 Yassine MARZOUGUI : > >> Hi Fabian, >> >> Is it different from the output I already sent? (see attached file). If >> yes, how can I obtain t

Re: Flink Checkpoint runs slow for low load stream

2016-09-23 Thread Fabian Hueske
Hi CVP, I'm not so much familiar with the internals of the checkpointing system, but maybe Stephan (in CC) has an idea what's going on here. Best, Fabian 2016-09-23 11:33 GMT+02:00 Chakravarthy varaga : > Hi Aljoscha & Fabian, > > I have a stream application that has 2 stream source as belo

Re: Complex batch workflow needs (too) much time to create executionPlan

2016-09-26 Thread Fabian Hueske
Hi Markus, thanks for the stacktraces! The client is indeed stuck in the optimizer. I have to look a bit more into this. Did you try to set JoinHints in your plan? That should reduce the plan space that is enumerated and therefore reduce the optimization time (maybe enough to run your application

Re: TaskManager & task slots

2016-09-26 Thread Fabian Hueske
Hi Buvana, A TaskManager runs as a single JVM process. A TaskManager provides a certain number of processing slots. Slots do not guard CPU time, IO, or JVM memory. At the moment they only isolate managed memory which is only used for batch processing. For streaming applications their only purpose

Re: Flink-HBase connectivity through Kerberos keytab | Simple get/put sample implementation

2016-09-29 Thread Fabian Hueske
Hi Anchit, Flink does not yet have a streaming sink connector for HBase. Some members of the community are working on this though [1]. I think we resolved a similar issue for the Kafka connector recently [2]. Maybe the related commits contain some relevant code for your problem. Best, Fabian [1]

Re: Iterations vs. combo source/sink

2016-09-29 Thread Fabian Hueske
Hi Ken, you can certainly have partitioned sources and sinks. You can control the parallelism by calling .setParallelism() method. If you need a partitioned sink, you can call .keyBy() to hash partition. I did not completely understand the requirements of your program. Can you maybe provide pseud

Re: AW: Problem with CEPPatternOperator when taskmanager is killed

2016-09-29 Thread Fabian Hueske
Great, thanks! I gave you contributor permissions in JIRA. You can now also assign issues to yourself if you decide to continue to contribute. Best, Fabian 2016-09-29 16:48 GMT+02:00 jaxbihani : > Hi Fabian > > My JIRA user is: jaxbihani > I have created a pull request for the fix : > https://gi

Re: Events B33/35 in Parallel Streams Diagram

2016-09-29 Thread Fabian Hueske
Hi Neil, "B" only refers to the key-part of the record, the number is the timestamp (as you assumed out). The payload of the record is not displayed in the figure. So B35 and B31 are two different records with identical key. The keyBy() operation sends all records with the same key to the same sub

Re: Events B33/35 in Parallel Streams Diagram

2016-09-29 Thread Fabian Hueske
n fact the key of > the stream and not the id of the event? Probably seems trivial, but I > struggled with this one. haha. I’ll submit a PR for the docs if there’s > interest. > > Neil > > On Sep 29, 2016, at 11:36 AM, Fabian Hueske wrote: > > Hi Neil, > > "B&

Re: Counting latest state of stateful entities in streaming

2016-09-30 Thread Fabian Hueske
Hi Simone, I think I have a solution for your problem: val s: DataStream[(Long, Int, ts)] = ??? // (id, state, time) val stateChanges: DataStream[(Int, Int)] = s // (state, cntUpdate) .keyBy(_._1) // key by id .flatMap(new StateUpdater) // StateUpdater is a stateful FlatMapFunction. It has a

Re: Counting latest state of stateful entities in streaming

2016-09-30 Thread Fabian Hueske
, this is slightly different from what I need. > > 2016-09-30 10:04 GMT+02:00 Fabian Hueske : > >> Hi Simone, >> >> I think I have a solution for your problem: >> >> val s: DataStream[(Long, Int, ts)] = ??? // (id, state, time) >> >> val st

Re: Using Flink and Cassandra with Scala

2016-10-04 Thread Fabian Hueske
FYI: FLINK-4497 [1] requests Scala tuple and case class support for the Cassandra sink and was opened about a month ago. [1] https://issues.apache.org/jira/browse/FLINK-4497 2016-09-30 23:14 GMT+02:00 Stephan Ewen : > How hard would it be to add case class support? > > Internally, tuples and cas

Re: Enriching a tuple mapped from a datastream with data coming from a JDBC source

2016-10-04 Thread Fabian Hueske
Hi Philipp, If I got your requirements right you would like to: 1) load an initial hashmap via JDBC 2) update the hashmap from a stream 3) use the hashmap to enrich another stream. You can use a CoFlatMap to do this: stream1.connect(stream2).flatMap(new YourCoFlatMapFunction). YourCoFlatMapFunc

Re: Presented Flink use case in Japan

2016-10-04 Thread Fabian Hueske
Thanks Hironori for sharing these excellent news! Do you think it would be possible to add your use case to Flink's Powered-By wiki page [1] ? Thanks, Fabian [1] https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink 2016-10-04 14:04 GMT+02:00 Hironori Ogibayashi : > Hello, > > Just

Re: Handling decompression exceptions

2016-10-04 Thread Fabian Hueske
Hi Yassine, AFAIK, there is no built-in way to ignore corrupted compressed files. You could try to implement a FileInputFormat that wraps the CsvInputFormat and forwards all calls to the wrapped CsvIF. The wrapper would also catch and ignore the EOFException. If you do that, you would not be able

Re: Processing events through web socket

2016-10-05 Thread Fabian Hueske
Hi, the TextSocketSink is rather meant for demo purposes than to be used in an actual applications. I am not aware of any other built-in source that would provide what you are looking for. You can implement a custom SourceFunction that does what you need. Best, Fabian 2016-10-05 9:48 GMT+02:00 A

Re: Presented Flink use case in Japan

2016-10-05 Thread Fabian Hueske
PM, Hironori Ogibayashi < > ogibaya...@gmail.com> > > wrote: > >> > >> Thank you for the response. > >> Regarding adding to the page, I will check with our PR department. > >> > >> Regards, > >> Hironori > >> > >&g

Re: readCsvFile

2016-10-06 Thread Fabian Hueske
Hi Alberto, if you want to read a single column you have to wrap it in a Tuple1: val text4 = env.readCsvFile[Tuple1[String]]("file:data.csv" ,includedFields = Array(1)) Best, Fabian 2016-10-06 20:59 GMT+02:00 Alberto Ramón : > I'm learning readCsvFile > (I discover if the file ends on "/n", yo

<    3   4   5   6   7   8   9   10   11   12   >