Hi,
Dawid, great, thanks for answering.
Jeff,
flink 1.8 with default settings, standalone cluster, one job node and three
task managers nodes.
zeppelin 0.9 config
checked "Connect to existing cluster"
host: 10.219.179.16
port: 6123
created a simple notebook:
%flink
ExecutionEnvironment env = Execut
Hi,
Dawid, great, thanks!
Any plans to make it stable? 1.9?
Regards,
Sergey
From: Dawid Wysakowicz [mailto:dwysakow...@apache.org]
Sent: Thursday, April 25, 2019 10:54 AM
To: Smirnov Sergey Vladimirovich (39833) ; Ken Krugler
Cc: user@flink.apache.org; d...@flink.apache.org
Subject: Re: kafka
Hi Rong,
thanks for your insights. I agree with the three points that you made. My
plan is to implement an operator that computes the Count-min sketch, and
developers can assign functions to improve the estimate of the sketch
(adding more/different functions makes the sketch more precise, hence
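The Count-min sketch mentioned above can be illustrated in a few lines. Below is a minimal, self-contained sketch of the data structure itself (the class name, seed choice, and table sizes are my own for illustration; this is not the proposed Flink operator):

```java
import java.util.Random;

/** Minimal Count-min sketch: approximate frequency counts with bounded overestimation. */
public class CountMinSketch {
    private final long[][] table;   // depth rows of width counters
    private final int[] seeds;      // one hash seed per row
    private final int width;

    public CountMinSketch(int depth, int width) {
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        this.width = width;
        Random rnd = new Random(42);          // fixed seed, illustrative only
        for (int i = 0; i < depth; i++) seeds[i] = rnd.nextInt();
    }

    // Derive a per-row bucket index from the item's hash code and the row seed.
    private int bucket(Object item, int row) {
        int h = item.hashCode() ^ seeds[row];
        h ^= (h >>> 16);                      // spread high bits into low bits
        return Math.floorMod(h, width);
    }

    /** Increment every row's counter for this item. */
    public void add(Object item) {
        for (int row = 0; row < table.length; row++) {
            table[row][bucket(item, row)]++;
        }
    }

    /** Estimated count: the minimum over rows. Never underestimates the true count. */
    public long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < table.length; row++) {
            min = Math.min(min, table[row][bucket(item, row)]);
        }
        return min;
    }
}
```

Adding more rows (hash functions) lowers the probability of a large overestimate, which is the point made above about extra functions making the sketch more precise.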
Hi Sergey,
The reason why this is flagged as beta is actually less about stability and more
about the fact that this is supposed to be more of a "power user" feature,
because bad things can happen if your data is not 100% correctly partitioned in
the same way as Flink would partition it. This is w
Hi Avi,
did you have a look at the .connect() and .broadcast() API
functionalities? They allow you to broadcast a control stream to all
operators. Maybe this example [1] or other examples in this repository
can help you.
Regards,
Timo
[1]
https://github.com/ververica/flink-training-exercis
Hi Juan,
as far as I know, we do not provide any concurrency guarantees for
count() or collect(). Those methods need to be used with caution anyway,
as the result size must not exceed a certain threshold. I will loop in
Fabian, who might know more about the internals of the execution.
Regards,
Good day,
I have a case-class defined like this:
case class MyClass(ts: Long, s1: String, s2: String, i1: Integer, i2: Integer)

object MyClass {
  val EMPTY = MyClass(0L, null, null, 0, 0)
  def apply(): MyClass = EMPTY
}
My code has been running fine (I was not aware of
Hi Dawid,
I just tried to change from CoProcessFunction with onTimer() to
ProcessWindowFunction with Trigger and TumblingWindow. So I can key my
stream by (id) instead of (id, eventTime). With this, I can use
reinterpretAsKeyedStream, and hope that it would give better performance.
I can also us
At present, all our YARN clusters run a Java 7 environment.
While trying to use Flink to deploy stream-processing business scenarios, we
found that Flink on YARN mode always failed to run a session task. The YARN
application log shows "Unsupported major.minor version 52.0" (class file
version 52.0 corresponds to Java 8, so the classes were compiled for a newer
JVM than the one on the cluster).
Hi Averell,
the reason for this lies in the internal serializer implementation. In
general, the composite/wrapping type serializer is responsible for
encoding nulls. The case class serializer does not support nulls, because
Scala discourages the use of nulls and promotes `Option`. Some
seriali
Thank you Timo.
In terms of performance, does the use of Option[] have a performance impact? I
guess it does, because there will be one more layer of object handling,
won't there?
I am also confused about choosing between primitive types (Int, Long) vs
object types (Integer, JLong). I have seen ma
Currently, tuples and case classes are the most efficient data types
because they avoid the need for special null handling. Everything else
is hard to estimate. You might need to perform micro benchmarks with the
serializers you want to use if you have a very performance critical use
case. Obje
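As a rough illustration of the object-handling overhead discussed above, here is a tiny micro-benchmark sketch comparing primitive and boxed values (class and method names are my own; timings are machine- and JIT-dependent, so treat the numbers as indicative only):

```java
public class BoxingBench {
    static long sumPrimitive(long[] xs) {
        long s = 0;
        for (long x : xs) s += x;   // plain values, no per-element allocation
        return s;
    }

    static long sumBoxed(Long[] xs) {
        long s = 0;
        for (Long x : xs) s += x;   // each element is a heap object; unboxed per access
        return s;
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        long[] prim = new long[n];
        Long[] boxed = new Long[n];
        for (int i = 0; i < n; i++) { prim[i] = i; boxed[i] = (long) i; }

        long t0 = System.nanoTime();
        long s1 = sumPrimitive(prim);
        long t1 = System.nanoTime();
        long s2 = sumBoxed(boxed);
        long t2 = System.nanoTime();

        System.out.printf("primitive: %d ms, boxed: %d ms, sums equal: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, s1 == s2);
    }
}
```

This is the kind of micro benchmark suggested above; for serializer choices the effect also depends on the serialization format, so measure with the actual serializers in question.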
Hi David,
Thank you!
Yes, fair enough, but take for instance the BucketingSink class [1]: it is a
RichFunction which employs a timer service to execute time-based logic that is
not directly associated with the event flow, like for example closing files
every n minutes, etc. In an AsyncFunction I int
Hi Timo,
Thanks for your answer. I was surprised to have problems calling those
methods concurrently, because I thought data sets were immutable. Now I
understand calling count or collect mutates the data set, not its contents
but some kind of execution plan included in the data set.
I suggest add
Hi Steve,
(1)
The CLI action you are looking for is called "modify" [1]. However, we want
to temporarily disable this feature beginning from Flink 1.9 due to some
caveats with it [2]. If you have objections, it would be appreciated if you
could comment on the respective thread on
Hi Timo,
I definitely did. But when broadcasting a command and trying to access the
persisted state (I mean the state of the data stream, not the
broadcast one), you get the exception that I wrote
(java.lang.NullPointerException: No key set. This method should not be
called outside of a keyed contex
Given a directory with input files of the following format:
/data/shard1/file1.json
/data/shard1/file2.json
/data/shard1/file3.json
/data/shard2/file1.json
/data/shard2/file2.json
/data/shard2/file3.json
Is there a way to make FileInputFormat with parallelism 2 split processing
by "shard" (folder
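Whether FileInputFormat can be configured to split exactly this way I cannot tell from the snippet, but the grouping step itself is easy to express outside Flink: collect the file paths and group them by parent directory, so each parallel reader can be handed one shard. A small sketch (paths and the helper name are illustrative):

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ShardGrouping {
    /** Groups file paths by their parent directory (the "shard" folder). */
    static Map<Path, List<Path>> groupByShard(List<Path> files) {
        return files.stream().collect(Collectors.groupingBy(Path::getParent));
    }

    public static void main(String[] args) {
        List<Path> files = List.of(
                Paths.get("/data/shard1/file1.json"),
                Paths.get("/data/shard1/file2.json"),
                Paths.get("/data/shard2/file1.json"));
        // Each key is one shard directory; each group could be assigned
        // to one parallel source instance.
        Map<Path, List<Path>> shards = groupByShard(files);
        shards.forEach((dir, fs) -> System.out.println(dir + " -> " + fs.size() + " files"));
    }
}
```

Inside Flink, the analogous place to apply such a grouping would be the input-split assignment logic, but that depends on the Flink version and connector in use.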
I'm not sure how to express my logic simply where early triggers are a
necessity.
My application has large windows (2 weeks~) where early triggering is
absolutely required. But, also, my application has mostly relatively simple
logic which can be expressed in SQL. There's a ton of duplication, lik