Starting out with Flink from a Scala background, I would like to use Typesafe-style
configuration, e.g. https://github.com/pureconfig/pureconfig;
however,
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/best_practices.html
recommends setting up:
env.getConfig().setGlobalJobParameters(...)
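For illustration, a minimal sketch of registering a loaded Typesafe Config as global job
parameters (the wrapper class and key handling are assumptions, not taken from the docs):

import java.util.{HashMap => JHashMap, Map => JMap}
import com.typesafe.config.ConfigFactory
import org.apache.flink.api.common.ExecutionConfig
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import scala.collection.JavaConverters._

// Hypothetical wrapper exposing flattened Typesafe Config entries as GlobalJobParameters.
class TypesafeConfigParameters(entries: Map[String, String])
    extends ExecutionConfig.GlobalJobParameters {
  override def toMap: JMap[String, String] = {
    val m = new JHashMap[String, String]()
    entries.foreach { case (k, v) => m.put(k, v) }
    m
  }
}

object RegisterConfig {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Load application.conf from the classpath and flatten it to string pairs.
    val flat = ConfigFactory.load().entrySet().asScala
      .map(e => e.getKey -> e.getValue.unwrapped().toString)
      .toMap
    // Make the values visible to all operators (and the web UI) via the execution config.
    env.getConfig.setGlobalJobParameters(new TypesafeConfigParameters(flat))
  }
}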
Hello everyone!
I want to implement a streaming algorithm like Misra-Gries or Space Saving in
Flink. The goal is to maintain the heavy hitters for my (possibly unbounded)
input streams for the whole time my app runs. More precisely, I want to
have a non-stop running task that runs the Space Sa
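For what it's worth, a minimal, non-Flink-specific sketch of the Misra-Gries summary
(the parameter k and the class name are just illustrative) could look like:

import scala.collection.mutable

// Misra-Gries summary keeping at most k-1 counters; counts are under-estimates.
class MisraGries[T](k: Int) {
  private val counters = mutable.Map.empty[T, Long]

  def add(item: T): Unit = {
    if (counters.contains(item)) {
      counters(item) += 1
    } else if (counters.size < k - 1) {
      counters(item) = 1L
    } else {
      // No free counter: decrement everything and drop counters that reach zero.
      counters.keys.toList.foreach { key =>
        counters(key) -= 1
        if (counters(key) == 0L) counters.remove(key)
      }
    }
  }

  // Candidate heavy hitters with their estimated counts.
  def candidates: Map[T, Long] = counters.toMap
}

Inside a long-running Flink job such a summary would typically live in a
RichFlatMapFunction (checkpointed via operator or keyed state), emitting the
candidates periodically.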
Hi,
I am running a Beam pipeline on Flink 1.2 and facing an issue in
restoring a job from a checkpoint. If I modify my Beam pipeline to add a
new operator and try to restore from the externalized checkpoint, I get
the error
java.lang.IllegalStateException: Invalid number of operator
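One thing that usually matters when restoring a modified pipeline is that operators have
stable IDs. A minimal Flink-level sketch (the Beam runner generates these IDs itself, so
this only illustrates the underlying mechanism; the source and names are made up):

import org.apache.flink.streaming.api.scala._

object UidExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env.socketTextStream("localhost", 9999)  // hypothetical source
      .uid("raw-input")                      // stable id used to match state on restore
      .map(_.toUpperCase)
      .uid("to-upper")
      .print()

    env.execute("uid example")
  }
}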
Would you suggest https://github.com/ottogroup/flink-spector for unit
tests?
Georg Heiler wrote on Wed., 29 Nov 2017 at 22:33:
> Thanks, this sounds like a good idea - can you recommend such a project?
>
> Jörn Franke wrote on Wed., 29 Nov 2017 at 22:30:
>
>> If you want to rea
Thanks, this sounds like a good idea - can you recommend such a project?
Jörn Franke wrote on Wed., 29 Nov 2017 at 22:30:
> If you really want to learn, then I recommend starting with a Flink
> project that contains unit tests and integration tests (maybe augmented
> with https://wiki.
If you really want to learn, then I recommend starting with a Flink project
that contains unit tests and integration tests (maybe augmented with
https://wiki.apache.org/hadoop/HowToDevelopUnitTests to simulate an HDFS cluster
during unit tests). It should also include coverage reporting. These
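As a rough sketch of what such tests could look like (assuming ScalaTest; names and
data are made up):

import org.apache.flink.api.scala._
import org.scalatest.{FlatSpec, Matchers}

class WordCountSpec extends FlatSpec with Matchers {

  // Keep the logic in a named function so it can be unit-tested without a cluster.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").filter(_.nonEmpty)

  "tokenize" should "split and normalize words" in {
    tokenize("Hello, Flink!") shouldBe Seq("hello", "flink")
  }

  "the batch job" should "count words when run on a local environment" in {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val counts = env.fromElements("a b", "a")
      .flatMap(tokenize(_))
      .map((_, 1))
      .groupBy(0)
      .sum(1)
      .collect()
    counts.toSet shouldBe Set(("a", 2), ("b", 1))
  }
}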
Getting started with Flink / scala, I wonder whether the scala base library
should be excluded as a best practice:
https://github.com/tillrohrmann/flink-project/blob/master/build.sbt#L32
// exclude Scala library from assembly
assemblyOption in assembly := (assemblyOption in
  assembly).value.copy(includeScala = false)
Hi Ebru,
the count() operator is a very simple utility function that calls
execute() internally. If you want to have a more complex pipeline you
can take a look at how our WordCount [0] example works. The general
concept is to emit a 1 for every record and sum the ones in parallel. If
you ne
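A minimal sketch of that pattern on the DataSet API (data and output paths are
placeholders), so several counts can share one program and a single env.execute():

import org.apache.flink.api.scala._

object ManualCount {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val words       = env.fromElements("a", "b", "a")
    val evenLengths = words.filter(_.length % 2 == 0)

    // Emit a 1 per record and sum instead of calling count(), which would execute immediately.
    val totalCount = words.map(_ => 1L).reduce(_ + _)
    val evenCount  = evenLengths.map(_ => 1L).reduce(_ + _)

    totalCount.writeAsText("/tmp/total-count")  // placeholder output paths
    evenCount.writeAsText("/tmp/even-count")
    env.execute("two counts in one job")
  }
}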
This issue sounds strikingly similar to FLINK-6965.
TL;DR: You must place classes that are loaded during serialization by the Kafka
connector under /lib.
On 29.11.2017 16:15, Timo Walther wrote:
Hi Bart,
usually, this error means that your Maven project configuration is not
correct. Is your custom
Hi Bart,
usually, this error means that your Maven project configuration is not
correct. Is your custom class included in the jar file that you submit
to the cluster?
It might make sense to share your pom.xml with us.
Regards,
Timo
On 11/29/17 at 2:44 PM, Bart Kastermans wrote:
I have a
We also saw issues in the failure detection/quarantining with some Hadoop
versions because of a subtle runtime Netty version conflict. Flink 1.4
shades Flink's / Akka's Netty; in Flink 1.3 you may need to explicitly exclude
the Netty dependency pulled in through Hadoop.
Also, Hadoop version mismatc
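In an sbt build, such an exclusion could look roughly like this (the artifact names
and version are illustrative; check your actual dependency tree first):

// build.sbt sketch: drop the Netty that a Hadoop dependency pulls in transitively.
libraryDependencies += ("org.apache.hadoop" % "hadoop-client" % "2.7.3")
  .exclude("io.netty", "netty")
  .exclude("io.netty", "netty-all")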
Hi all,
We are trying to use more than one count operator on a DataSet, but it executes the
first count and skips the other operations. We also call env.execute().
How can we solve this problem?
-Ebru
Thanks Fabian and Tony for the info. It's very helpful.
Looks like the general approach is to implement a job topology containing
parameterized (CoXXXMapFunction) operators. The user defined parameters
will be ingested using the extra input that the CoXXXMapFunction takes.
Ken
On Wed, Nov 29, 2017 at
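A minimal sketch of that pattern with a CoFlatMapFunction (the streams, types and
threshold semantics are made up for illustration):

import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// Control stream carries parameter updates, data stream carries events.
class ThresholdFilter extends CoFlatMapFunction[Int, Int, Int] {
  // Plain field for brevity; a real job would keep this in Flink state.
  private var threshold: Int = 0

  override def flatMap1(event: Int, out: Collector[Int]): Unit =
    if (event >= threshold) out.collect(event)

  override def flatMap2(newThreshold: Int, out: Collector[Int]): Unit =
    threshold = newThreshold
}

object ParameterizedJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val events   = env.fromElements(1, 5, 10, 20)  // hypothetical data stream
    val controls = env.fromElements(8)             // hypothetical parameter stream

    // Broadcast the control stream so every parallel instance sees the updates.
    events.connect(controls.broadcast).flatMap(new ThresholdFilter).print()
    env.execute("parameterized operator sketch")
  }
}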
I have a custom serializer for writing to/reading from Kafka. I am setting
this up in main with code as follows:
val kafkaConsumerProps = new Properties()
kafkaConsumerProps.setProperty("bootstrap.servers", kafka_bootstrap)
kafkaConsumerProps.setProperty("group.id",s"normalize-call-even
Hi,
you could also try increasing the heartbeat timeout via
`akka.watch.heartbeat.pause`. Maybe this helps to overcome the GC pauses.
Cheers,
Till
On Wed, Nov 29, 2017 at 12:41 PM, T Obi wrote:
> Warnings from the Datanode did not appear in all timeout cases. They seem
> to be raised only by timeou
Hi all,
I've just come across a FlinkKafkaProducer misconfiguration issue: when a
FlinkKafkaProducer is created without specifying a Kafka partitioner, a
FlinkFixedPartitioner instance is used, and all messages end up in a
single Kafka partition (in case I have a single task manage
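A minimal sketch of a custom partitioner that spreads records instead of the fixed
mapping (the hashing choice is just an example, and the exact producer constructor to
pass it to depends on the connector version):

import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner

// Hash the record across all available partitions instead of pinning each
// sink subtask to a single partition like FlinkFixedPartitioner does.
class HashPartitioner[T] extends FlinkKafkaPartitioner[T] {
  override def partition(record: T, key: Array[Byte], value: Array[Byte],
                         targetTopic: String, partitions: Array[Int]): Int = {
    val idx = math.abs(record.hashCode % partitions.length)
    partitions(idx)
  }
}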
Warnings from the Datanode did not appear in all timeout cases. They seem
to be raised only by timeouts while snapshotting.
We output GC logs on the taskmanagers and found that something kicks off
System.gc() every hour.
So a full GC runs every hour, and it takes about a minute or more
in our cases...
When
Hi Wangsan,
I opened an issue to document the behavior properly in the future
(https://issues.apache.org/jira/browse/FLINK-8169). Basically, both your
event-time and processing-time timestamps should be GMT. We plan to
support offsets for windows in the future
(https://issues.apache.org/jira/
Hi Oriol,
That estimate is not accurate and the whole plan is a bit outdated.
It was based on a time-based release model that the community tried,
but it did not yield the expected results,
so we changed it.
You can follow the release voting for 1.4 in the dev mailing list. And the
archived
Thanks, it helped a lot!
But I've seen on
https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+and+Feature+Plan#FlinkReleaseandFeaturePlan-Flink1.4
that they estimated releasing 1.4 in September. Do you know if it will be
released this year, or might we have to wait longer?
Thanks a
Another example is King's RBEA platform [1] which was built on Flink.
In a nutshell, RBEA runs a single large Flink job, to which users can add
queries that should be computed.
Of course, the query language is restricted because the queries must match
on the structure of the running job.
Hope thi
The monitoring REST interface provides detailed stats about a job, its
tasks, and processing vertices including their start and end time [1].
However, it is not trivial to make sense of the execution times because
Flink uses pipelined shuffles by default.
That means that the execution of multiple
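For example, the per-vertex timings can be fetched roughly like this (host, port and
job id are placeholders):

import scala.io.Source

object JobTimings {
  def main(args: Array[String]): Unit = {
    val jobId = "<job-id>"  // placeholder
    val json  = Source.fromURL(s"http://localhost:8081/jobs/$jobId").mkString
    // Each entry of the "vertices" array carries "start-time", "end-time" and "duration".
    println(json)
  }
}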
Hi Oriol,
As you may have seen from the mailing list, we are currently in the process of
releasing Flink 1.4. This is going
to be a Hadoop-free distribution, which means that it should work with any
Hadoop version, including Hadoop 2.9.0.
Given this, I would recommend trying out the release cand
Hi Timo,
What I am doing is extracting a timestamp field (maybe in string format such as
“2017-11-28 11:00:00”, or a long value based on my current timezone) as the
event-time attribute. So in timestampAndWatermarkAssigner, for the string format
I should parse the date-time string using GMT, and for the long value
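A minimal sketch of such an assigner for the string case (the Event case class, field
names and out-of-orderness bound are assumptions):

import java.text.SimpleDateFormat
import java.util.TimeZone
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time

case class Event(eventTime: String, payload: String)

class GmtTimestampExtractor
    extends BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(10)) {

  // One formatter per parallel extractor instance, pinned to GMT as suggested above.
  @transient private lazy val fmt = {
    val f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    f.setTimeZone(TimeZone.getTimeZone("GMT"))
    f
  }

  override def extractTimestamp(e: Event): Long = fmt.parse(e.eventTime).getTime
}

It would be attached with stream.assignTimestampsAndWatermarks(new GmtTimestampExtractor).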
Hi, I'm currently working on designing a data-processing cluster, and one of
the distributed processing tools we want to use is Flink.
As we're building our cluster from scratch, without relying on any Hadoop
distribution such as Hortonworks or Cloudera, we want to use Flink with Hadoop
2.9
Hi Wangsan,
currently the timestamps in Flink SQL do not depend on a timezone. All
calculations happen on the UTC timestamp. This also guarantees that an
input with Timestamp.valueOf("XXX") remains consistent when parsing and
outputting it with toString().
Regards,
Timo
On 11/29/17 at 3:43
Hey Dominik,
yes, we should definitely add this to the docs.
@Nico: You recently updated the Flink S3 setup docs. Would you mind
adding these hints for eu-central-1 from Steve? I think that would be
super helpful!
Best,
Ufuk
On Tue, Nov 28, 2017 at 10:00 PM, Dominik Bruhn wrote:
> Hey Stephan