Re: Classes missing from jar

2018-01-12 Thread Tzu-Li (Gordon) Tai
Hi Jason, You actually should not be adding the flink-dist jar as a dependency in your application. It seems like you are not using a build tool for your application, but adding dependencies manually. In general, I would recommend build management tools like Maven / Gradle for building Java app

Re: Hadoop compatibility and HBase bulk loading

2018-01-12 Thread Flavio Pompermaier
Any progress on this Fabian? HBase bulk loading is a common task for us and it's very annoying and uncomfortable to run a separate YARN job to accomplish it... On 10 Apr 2015 12:26, "Flavio Pompermaier" wrote: Great! That will be awesome. Thank you Fabian On Fri, Apr 10, 2015 at 12:14 PM, Fabia

Parallel stream consumption

2018-01-12 Thread Jason Kania
Hi, I have a question that I have not resolved via the documentation, looking in the "Parallel Execution", "Streaming" and the "Connectors" sections. If I retrieve a Kafka stream and then call the process function against it in parallel, as follows, does it consume in some round robin fashion b

Re: Aggregation using event timestamp than clock window

2018-01-12 Thread Rohan Thimmappa
Hi Gary, This is perfect. I am able to get the window working on the message timestamp rather than the clock window, and also to stream the data that arrive late. I also have one edge case. For example, I get my last report at 4.57 and I never get a 5.00+ hour report *ever*. I would like to wait for some time. My reporting int

Re: Classes missing from jar

2018-01-12 Thread Jason Kania
Thanks. That resolved it. Also had to pull in the kafka 10 and 9 versions of the connector jars. Once the base jar is in the mvn repository, this won't be as problematic. On Friday, January 12, 2018, 9:46:22 AM EST, Tzu-Li (Gordon) Tai wrote: Hi Jason, The KeyedDeserializationSchema

Unrecoverable job failure after Json parse error?

2018-01-12 Thread Adrian Vasiliu
Hello, When using FlinkKafkaConsumer011 with JSONKeyValueDeserializationSchema, if an invalid, non-parsable message is sent to the Kafka topic, the consumer expectedly fails with JsonParseException. So far so good, but this leads to the following loop: the job switches to FAILED then attempts to re

Re: Java types

2018-01-12 Thread Timo Walther
Could you send us the definition of the class or even better a small code example on GitHub to reproduce your error? If you are implementing a Flink job in Java you should not have any org.apache.flink...scala import in your class file. Regards, Timo Hi Timo "You don't need to specify

Re: Restoring 1.3.1 savepoint in 1.4.0 fails in TimerService

2018-01-12 Thread Tzu-Li (Gordon) Tai
Thanks a lot for looking into this with so much detail, Juho! It was super helpful. Shortly put: This is indeed a bug with Flink. The HeapInternalTimerService should be performing compatibility checks on the restored / provided serializers and reconfigure serializers if possible, instead of jus

Re: CEP issue in 1.3.2. Does 1.4 fix this ?

2018-01-12 Thread Vishal Santoshi
Thanks. We will. When is 1.4.1 scheduled for release? On Fri, Jan 12, 2018 at 3:24 AM, Dawid Wysakowicz <wysakowicz.da...@gmail.com> wrote: > Hi Vishal, > I think it might be due to this bug: https://issues.apache.org/jira/browse/FLINK-8226 > It was merged for 1.4.1 and 1.5.0. Could you

Re: CaseClassTypeInfo fails to deserialize in flink 1.4 with Parent First Class Loading

2018-01-12 Thread Seth Wiesman
Here is the stack trace: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot instantiate user function. at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:235) at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(Oper

Re: Restoring 1.3.1 savepoint in 1.4.0 fails in TimerService

2018-01-12 Thread Juho Autio
Thanks, the window operator is just: .timeWindow(Time.seconds(10)) We haven't changed key types. I tried debugging this issue in IDE and found the root cause to be this: !this.keyDeserializer.equals(keySerializer) -> true => throw new IllegalStateException("Tried to initialize restored TimerS

Re: class loader issues when closing streams

2018-01-12 Thread Tzu-Li (Gordon) Tai
Hi Jared, I currently don't have a solid idea of what may be happening, but from the stack dump you provided, it seems like the client connection you are using in the Elasticsearch API call bridge is stuck, even after the cleanup. Do you think there could be some issue with closing the client you

Re: Classes missing from jar

2018-01-12 Thread Tzu-Li (Gordon) Tai
Hi Jason, The KeyedDeserializationSchema is located in the flink-connector-kafka-base module, so you'll need to include the jar for that too [1]. Cheers, Gordon [1] https://repo1.maven.org/maven2/org/apache/flink/flink-connector-kafka-base_2.11/1.4.0/ -- Sent from: http://apache-flink-user-

Re: How can I count the element in datastream

2018-01-12 Thread Tzu-Li (Gordon) Tai
Hi! Do you mean that you want to count all elements across all partitions of a DataStream? To do that, you'll need to transform the DataStream with an operator of parallelism 1, e.g. DataStream stream = ...; stream.map(new CountingMap<>()).setParallelism(1); Cheers, Gordon -- Sent from: http
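
For illustration only, here is a minimal plain-Java sketch (no Flink dependency) of why the counting operator needs parallelism 1. Each parallel instance of a counting map keeps its own counter, so with more than one instance you only get per-instance counts. The names ParallelCountSketch, CountingInstance and finalCounts are hypothetical, and the round-robin distribution is an assumption made for the demo, not a statement about how Flink partitions records.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, Flink-free illustration of per-instance counters. */
final class ParallelCountSketch {

    /** Stand-in for one parallel instance of a counting map; the counter is local to it. */
    static final class CountingInstance {
        private long count;

        long map(String element) {
            // Emit the running count seen by THIS instance only.
            return ++count;
        }
    }

    /**
     * Distributes the elements round-robin (an assumption for this demo) over
     * the given number of instances and returns the last count each one emitted.
     */
    static long[] finalCounts(List<String> elements, int parallelism) {
        CountingInstance[] instances = new CountingInstance[parallelism];
        for (int i = 0; i < parallelism; i++) {
            instances[i] = new CountingInstance();
        }
        long[] last = new long[parallelism];
        for (int i = 0; i < elements.size(); i++) {
            int target = i % parallelism;
            last[target] = instances[target].map(elements.get(i));
        }
        return last;
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("a", "b", "c", "d", "e");
        // One instance: a single counter sees every element.
        System.out.println(Arrays.toString(finalCounts(elements, 1)));
        // Two instances: each counter sees only its own share.
        System.out.println(Arrays.toString(finalCounts(elements, 2)));
    }
}
```

With a single instance the last emitted value is the total number of elements; with two instances no single counter ever reaches the total, which is the situation setParallelism(1) avoids.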

Re: SideOutput doesn't receive anything if filter is applied after the process function

2018-01-12 Thread Juho Autio
Maybe I could express it in a slightly different way: if adding the .filter() after .process() causes the side output to be somehow totally "lost", then I believe the .getSideOutput() could be aware that there is no such side output to be listened to from upstream, and throw an exception. I mean,

Re: CaseClassTypeInfo fails to deserialize in flink 1.4 with Parent First Class Loading

2018-01-12 Thread Tzu-Li (Gordon) Tai
Hi Seth, Thanks a lot for the report! I think your observation is expected behaviour, if there really is a binary incompatible change between Scala minor releases. And yes, the type information macro in the Scala API is very sensitive to the exact Scala version used. I had in the past also observ

Re: can we recover job use latest checkpointed state instead of savepoint, and how?

2018-01-12 Thread Tzu-Li (Gordon) Tai
Hi, Externalized checkpoints [1] seem to be exactly what you are looking for. Checkpoints are by default not persisted, unless configured otherwise to be externalized so that they are not automatically cleaned up when the job fails. They can be used to resume the job. On the other hand, it woul

Re: Restoring 1.3.1 savepoint in 1.4.0 fails in TimerService

2018-01-12 Thread Tzu-Li (Gordon) Tai
Hi Juho, Could your key type have possibly changed / been modified across the upgrade? Also, from the error trace, it seems like the failing restore is of a window operator. What window type are you using? That exception is a result of either mismatching key serializers or namespace serializers (

Re: SideOutput doesn't receive anything if filter is applied after the process function

2018-01-12 Thread Tzu-Li (Gordon) Tai
Hi Juho, Now that I think of it this seems like a bug to me: why does the call to getSideOutput succeed if it doesn't provide _any_ input? With the way side outputs work, I don’t think this is possible (or would make sense). An operator does not know whether or not it would ever emit some elem

Restoring 1.3.1 savepoint in 1.4.0 fails in TimerService

2018-01-12 Thread Juho Autio
I'm trying to restore savepoints that were made with Flink 1.3.1 but getting this exception. The few code changes that had to be done to switch to 1.4.0 don't seem to be related to this, and it seems like an internal issue of Flink. Is 1.4.0 supposed to be able to restore a savepoint that was made

Re: Single Source of Truth for States among Multiple Process Functions

2018-01-12 Thread Sendoh
Hi, Isn't an accumulator what fits your use case? Accumulators are shared. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: CEP issue in 1.3.2. Does 1.4 fix this ?

2018-01-12 Thread Dawid Wysakowicz
Hi Vishal, I think it might be due to this bug: https://issues.apache.org/jira/browse/FLINK-8226 It was merged for 1.4.1 and 1.5.0. Could you check with these changes applied? Would be really helpful. If the error still persists could you file a jira? Regards Dawid > On 11 Jan 2018, at 19:49, Vi

SideOutput doesn't receive anything if filter is applied after the process function

2018-01-12 Thread Juho Autio
When I run the code below (Flink 1.4.0 or 1.3.1), only "a" is printed. If I switch the position of .process() & .filter() (i.e. filter first, then process), both "a" & "b" are printed, as expected. I guess it's a bit hard to say what the side output should include in this case: the stream before fi

Re: Aggregation using event timestamp than clock window

2018-01-12 Thread Gary Yao
Hi Rohan, Your ReportTimestampExtractor assigns timestamps to the stream records correctly but uses the wall clock to emit Watermarks (System.currentTimeMillis). In Flink, Watermarks are the mechanism to advance the event time. Hence, you should emit Watermarks according to the time that you extrac
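
As a rough sketch of the idea in the reply above, the plain-Java class below (no Flink dependency) derives the watermark from the timestamps extracted from the records instead of from System.currentTimeMillis. The class name EventTimeWatermarkSketch, its methods, and the fixed out-of-orderness bound are all hypothetical and chosen for illustration; this is not the ReportTimestampExtractor from the thread, and in a real job the same logic would sit inside Flink's timestamp and watermark assigner.

```java
/** Hypothetical, Flink-free sketch of watermarks driven by event timestamps. */
final class EventTimeWatermarkSketch {

    private final long maxOutOfOrdernessMillis;
    private long maxSeenTimestamp;
    private boolean seenAnyEvent;

    EventTimeWatermarkSketch(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    /** Records the timestamp extracted from an event and returns it unchanged. */
    long onEvent(long eventTimestampMillis) {
        if (!seenAnyEvent || eventTimestampMillis > maxSeenTimestamp) {
            maxSeenTimestamp = eventTimestampMillis;
        }
        seenAnyEvent = true;
        return eventTimestampMillis;
    }

    /**
     * The watermark trails the highest event timestamp seen so far by the
     * allowed out-of-orderness. It never consults the wall clock, so event
     * time only advances when records with later timestamps arrive.
     */
    long currentWatermark() {
        if (!seenAnyEvent) {
            return Long.MIN_VALUE;
        }
        return maxSeenTimestamp - maxOutOfOrdernessMillis;
    }

    public static void main(String[] args) {
        EventTimeWatermarkSketch tracker = new EventTimeWatermarkSketch(5_000L);
        tracker.onEvent(10_000L);
        tracker.onEvent(8_000L);  // late record: the watermark does not move back
        tracker.onEvent(20_000L);
        System.out.println(tracker.currentWatermark());
    }
}
```

One consequence of this design is that the watermark stalls when no further records arrive, so the last window stays open until a record with a later timestamp shows up.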