...streams, where some events might get buffered while later ones make
it through?
More concretely, when the Kafka consumer persists an offset to ZooKeeper
upon receiving a checkpoint trigger, can I trust that no events from
before that offset are still held in intermediate windowing state?
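For context, a minimal sketch of the kind of setup being asked about,
assuming the 0.8 connector (topic, group, and addresses are placeholders);
with checkpointing enabled, this consumer commits offsets to ZooKeeper as
part of checkpoint completion:
```
import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Offsets go to ZooKeeper only when a checkpoint completes, so the
// checkpoint interval bounds how far committed offsets can lag.
env.enableCheckpointing(60000)

val kafkaProps = new Properties()
kafkaProps.setProperty("zookeeper.connect", "localhost:2181") // placeholder
kafkaProps.setProperty("group.id", "my-group")                // placeholder

val events = env.addSource(
  new FlinkKafkaConsumer08[String]("events", new SimpleStringSchema(), kafkaProps))
```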
My Flink job does aggregations on top of event-time windowing across Kafka
partitions. As I have been developing and restarting it, the state for the
catch-up periods has become unreliable -- lots of duplicate emits for time
windows already seen before, which I have to discard since my sink cannot
handle duplicates.
Things make more sense after coming across
http://mail-archives.apache.org/mod_mbox/flink-user/201512.mbox/%3CCANC1h_vVUT3BkFFck5wJA2ju_sSenxmE=Fiizr=ds6tbasy...@mail.gmail.com%3E
I need to ensure the parallelism is at least the number of partitions. This
seems like a gotcha that could be better documented.
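Concretely, that means pinning the source parallelism to at least the
topic's partition count -- a sketch, with the partition count hard-coded
for illustration and env/kafkaProps as in a typical setup:
```
val numPartitions = 8 // must be >= the topic's Kafka partition count
env
  .addSource(new FlinkKafkaConsumer08[String]("events", new SimpleStringSchema(), kafkaProps))
  .setParallelism(numPartitions) // no source task multiplexes 2+ partitions
```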
Stephan explained in that thread that the min watermark is picked when an
operation joins streams from multiple sources. But with an m:n
partition-to-source assignment where m>n, a source task that multiplexes
several partitions ends up emitting the max watermark across them. Keeping
m<=n ensures that the lowest watermark wins.
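To make that concrete, a plain-Scala illustration (numbers invented) of why
multiplexing loses the lagging partition's watermark:
```
// p0 is caught up, p1 is lagging behind on event time.
val p0 = Seq(100L, 200L, 300L) // timestamps read from partition 0
val p1 = Seq(10L, 20L, 30L)    // timestamps read from partition 1

// m > n: one source task reads both partitions, so its watermark chases
// the max timestamp it has seen -- p1's events now trail the watermark.
val multiplexedWatermark = (p0 ++ p1).max          // 300

// m <= n: one task per partition; downstream takes the min across tasks,
// which holds the watermark back for the lagging partition.
val downstreamWatermark = math.min(p0.max, p1.max) // 30
```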
I am assigning timestamps using a threshold-based extractor
<https://gist.github.com/shikhar/2d9306e2ebd8ca89728c> -- the static delta
from the last timestamp is probably sufficient, and the PriorityQueue for
allowing outliers is not necessary; that is something I added while figuring
out what was going on.
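The "static delta" variant amounts to something like this -- a sketch
against the AssignerWithPeriodicWatermarks interface, not the gist verbatim
(maxDelayMs is the threshold):
```
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.watermark.Watermark

// Watermark trails the max timestamp seen so far by a fixed delta, so
// events up to maxDelayMs out of order still count as on time.
class ThresholdAssigner[T](maxDelayMs: Long, extractTs: T => Long)
    extends AssignerWithPeriodicWatermarks[T] {

  private var maxTs = Long.MinValue + maxDelayMs // avoid underflow before first element

  override def extractTimestamp(element: T, previousTs: Long): Long = {
    val ts = extractTs(element)
    if (ts > maxTs) maxTs = ts
    ts
  }

  override def getCurrentWatermark: Watermark = new Watermark(maxTs - maxDelayMs)
}
```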
Hi Fabian,
Sorry, I should have been clearer. What I meant (or now know!) by duplicate
emits: because a source task multiplexes more than one partition, the
watermark progresses more rapidly than the offsets on some of those
partitions. When messages from the lagging partitions finally come through,
they are already behind the watermark, and windows that were emitted before
fire again.
Yes, that approach seems perfect, Stephan -- thanks for creating the JIRA!
It is not only when resetting to smallest: I have observed uneven progress
across partitions skewing the watermark any time the source is not caught up
to the head of each partition it is handling, e.g. after stopping the job
for a few minutes.
Repro at https://github.com/shikhar/flink-sbt-fatjar-troubles, run `sbt
assembly`
A fat jar seems like the best way to provide jobs for Flink to execute.
I am declaring deps like:
{noformat}
"org.apache.flink" %% "flink-clients" % "1.0-SNAPSHOT" % "pro
Stephan Ewen wrote:
> Do you know why you are getting conflicts on the FastHashMap class, even
> though the core Flink dependencies are "provided"? Does adding the Kafka
> connector pull in all the core Flink dependencies?
Yes, the core Flink dependencies are being pulled in transitively from the
Kafka connector.
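As an aside, if exclusions don't fully resolve it, sbt-assembly can also be
told how to merge that specific clash -- FastHashMap ships in both
commons-beanutils and commons-collections. An illustrative build.sbt
fragment (sbt-assembly 0.14.x syntax, not from this thread):
```
// Keep the first copy of the duplicated commons-collections classes
// instead of failing the assembly on the conflict.
assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "commons", "collections", _*) => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```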
This seems to work to generate the assembly, hopefully not missing any
required transitive deps:
```
"org.apache.flink" %% "flink-clients" % flinkVersion % "provided",
"org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
"org.apache.kafka" %% "kafka" % "0.8.2.2",
("org.apache.flink" %% "flink-connector-kafka" % flinkVersion)
  .excludeAll(ExclusionRule(organization = "org.apache.flink"))
```
Hi Till,
Thanks so much for sorting this out!
One suggestion: could the Flink template depend on a connector (Kafka?) --
this would verify that assembly works smoothly for a very common use case
where you need to include connector JARs.
Cheers,
Shikhar
Hi Till,
I just tried creating an assembly with RC4:
```
"org.apache.flink" %% "flink-clients" % flinkVersion % "provided",
"org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
"org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion,
```
It actually succeeds
I am trying to have my job also run a periodic action by using a custom
source that emits a dummy element periodically and a sink that executes the
callback, as shown in the code below. However, as soon as I start the job
and check the state in the JobManager UI, this particular source->sink combo
is
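A minimal sketch of the pattern being described -- a source emitting a dummy
element on an interval and a sink running the callback -- assuming Flink's
SourceFunction/SinkFunction interfaces, with runPeriodicAction as a
hypothetical stand-in for the actual callback:
```
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.functions.sink.SinkFunction

// Emits the current time on a fixed interval until cancelled.
class PeriodicSource(intervalMs: Long) extends SourceFunction[Long] {
  @volatile private var running = true
  override def run(ctx: SourceFunction.SourceContext[Long]): Unit =
    while (running) {
      ctx.collect(System.currentTimeMillis())
      Thread.sleep(intervalMs)
    }
  override def cancel(): Unit = running = false
}

env.addSource(new PeriodicSource(60 * 1000))
  .addSink(new SinkFunction[Long] {
    // runPeriodicAction stands in for the actual callback.
    override def invoke(ts: Long): Unit = runPeriodicAction(ts)
  })
```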
In case this helps, this is a Scala helper I am using to filter out late
data on a KeyedStream. The last-timestamp state is maintained at the key
level.
```
implicit class StrictlyAscendingByTime[T, K](stream: KeyedStream[T, K]) {
  // keep an element only if its timestamp strictly exceeds the last one
  // seen for its key; the last timestamp lives in keyed state
  def filterStrictlyAscendingTime(timestampExtractor: T => Long): DataStream[T] =
    stream.filterWithState[Long] { (elem, lastTs) =>
      val ts = timestampExtractor(elem)
      if (lastTs.exists(ts <= _)) (false, lastTs) else (true, Some(ts))
    }
}
```
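Usage would look something like this (the event type and its fields are
invented for illustration):
```
case class Event(key: String, timestamp: Long, value: Double) // hypothetical

val deduped = events            // events: DataStream[Event], assumed
  .keyBy(_.key)
  .filterStrictlyAscendingTime(_.timestamp) // drops per-key non-increasing timestamps
```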
Thanks Till. I can confirm that things are looking good with RC5.
sbt-assembly works well with the flink-kafka connector dependency not marked
as "provided".
Wow, that's embarrassing :D That was indeed the issue.