Hi all,
I'm currently using a keyed process function, and I see serialization
happening when I collect the object / update the object to RocksDB. For
me, serialization seems to be the performance bottleneck.
By default the POJO serializer is used, and the time cost of collect /
update to
Dear community,
this is the weekly community update thread #33. Please post any news and
updates you want to share with the community to this thread.
# Flink 1.6.0 has been released
The community released Flink 1.6.0 [1]. Thanks to everyone who made this
release possible.
# Improving tutorials
Thank you Vino & Xingcan.
@Vino: could you help explain in more detail how the DBMS would be used?
Would that be via the Table API, or did you mean reading the DBMS data
directly inside the ProcessFunction?
@Xingcan: I don't know what the benefits of using CoProcess over
RichCoFlatMap are in this case.
Regarding usin
Hi Averell,
What I mean is that if you store the stream_c data in an RDBMS, you can
access the RDBMS directly in the CoFlatMapFunction instead of using the
Table API. This is somewhat similar to a join between a stream and a
dimension table (a sketch of the lookup pattern follows below).
Of course, the premise of adopting this option is that the amount of data
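For illustration, a minimal sketch of that lookup pattern, using a plain
RichFlatMapFunction for brevity (the same open()/query approach applies
inside a CoFlatMapFunction); the JDBC URL, table, and column names are
all hypothetical:

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

// Enriches each (id, value) element with a column looked up from the
// RDBMS that holds the stream_c data. One connection per parallel instance.
class RdbmsLookup extends RichFlatMapFunction[(String, String), (String, String, String)] {

  @transient private var conn: Connection = _
  @transient private var stmt: PreparedStatement = _

  override def open(parameters: Configuration): Unit = {
    conn = DriverManager.getConnection("jdbc:postgresql://host:5432/db", "user", "pass")
    stmt = conn.prepareStatement("SELECT c_value FROM stream_c WHERE id = ?")
  }

  override def flatMap(in: (String, String), out: Collector[(String, String, String)]): Unit = {
    stmt.setString(1, in._1)
    val rs = stmt.executeQuery()
    if (rs.next()) {
      out.collect((in._1, in._2, rs.getString(1)))
    }
    rs.close()
  }

  override def close(): Unit = {
    if (stmt != null) stmt.close()
    if (conn != null) conn.close()
  }
}

A per-record query is the simplest form; in practice you would likely add
a local cache or use async I/O to keep the lookup off the hot path.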
Thanks Gary..
What could be blocking the RPC threads? Slow checkpointing?
In production we're still using a self-built Flink package 1.5-SNAPSHOT,
flink commit 8395508b0401353ed07375e22882e7581d46ac0e, and the jobs are
stable.
Now with 1.5.2 the same jobs are failing due to heartbeat timeouts ev
Vishal, from which version did you upgrade to 1.5.1? Maybe from the 1.5.0
release? Knowing that might help narrow down the source of this.
On Wed, Aug 15, 2018 at 11:38 AM Juho Autio wrote:
> Thanks Gary..
>
> What could be blocking the RPC threads? Slow checkpointing?
>
> In production we're s
Gary, I found another mail thread about similar issue:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Testing-on-Flink-1-5-tp19565p19647.html
Specifically I found this:
> we are observing Akka.ask.timeout error for few of our jobs (JM's
logs[2]), we tried to increase this pa
Hi John,
Watermarks cannot make progress if you have stream partitions that do not
carry any data.
What kind of source are you using?
Best,
Fabian
2018-08-15 4:25 GMT+02:00 vino yang :
> Hi Johe,
>
> In local mode, it should also work.
> When you debug, you can set a breakpoint in the getCurren
Hi Darshan,
This looks like a file system configuration issue to me.
Flink supports different file systems for S3 and there are also a few
tuning knobs.
Did you have a look at the docs for file system configuration [1]?
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.
I did some more testing.
Below is a pseudo version of my setup.
kafkaConsumer ->
assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor) ->
process(print1: ctx.timerService().currentWatermark()) ->
keyBy(_.someProp) ->
process(print2: ctx.timerService().currentWatermark()) ->
I am man
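For reference, a self-contained Scala sketch of what that pseudo pipeline
could look like (Flink 1.5/1.6-era APIs; the Kafka 0.11 connector, topic
name, and "someProp,timestamp" event format are assumptions):

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011
import org.apache.flink.util.Collector

// Hypothetical event parsed from "someProp,timestampMillis" lines.
case class Event(someProp: String, ts: Long)

// Prints the current watermark at a given stage, then forwards the element.
class PrintWatermark(tag: String) extends ProcessFunction[Event, Event] {
  override def processElement(e: Event, ctx: ProcessFunction[Event, Event]#Context,
                              out: Collector[Event]): Unit = {
    println(s"$tag watermark: ${ctx.timerService().currentWatermark()}")
    out.collect(e)
  }
}

object WatermarkDebug {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "watermark-debug")

    env.addSource(new FlinkKafkaConsumer011[String]("topic", new SimpleStringSchema(), props))
      .map { line => val f = line.split(","); Event(f(0), f(1).toLong) }
      .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(10)) {
          override def extractTimestamp(e: Event): Long = e.ts
        })
      .process(new PrintWatermark("print1"))
      .keyBy(_.someProp)
      .process(new PrintWatermark("print2"))
      .print()

    env.execute("watermark-debug")
  }
}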
Hey,
The problem is that your command does start the Job Manager container, but it
does not start the Task Manager. That is why you have 0 slots. Currently, the
default numberOfTaskSlots is set to the number of CPUs available on the machine.
So, you generally can do 2 things:
1) Start Job Mana
Hi Juho,
the main thread of the RPC endpoint should never be blocked. Blocking on
that
thread is considered an implementation error. Unfortunately, without logs it
is difficult to tell what the exact problem is. If you are able to reproduce
heartbeat timeouts on your test staging environment, can
Hi Mingliang,
first of all, the POJO serializer is not very performant; Tuple or Row
are better. If you want to improve the performance of a collect()
between operators, you could also enable object reuse (see the sketch
below). You can read more about this here [1] (section "Issue 2: Object
Reuse"), but make sure
y
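For illustration, a minimal sketch of those two suggestions; the stream
itself is a made-up placeholder, but enableObjectReuse() is the real
execution-config switch:

import org.apache.flink.streaming.api.scala._

object ReuseAndTuples {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Object reuse lets Flink hand the same instance from one chained
    // operator to the next instead of copying it. Only safe if your
    // functions neither cache nor mutate input objects after emitting.
    env.getConfig.enableObjectReuse()

    // Tuples use a fixed, non-reflective serializer, which is typically
    // cheaper than the POJO serializer on hot paths.
    val stream: DataStream[(String, Long)] = env.fromElements(("a", 1L), ("b", 2L))
    stream.print()

    env.execute("reuse-example")
  }
}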
Hi John,
I guess the local source data are different from the cluster's. And as
Fabian said, it is probably that some partitions don't carry data.
As a quick check, you can set the job parallelism to 1 and compare the
result (see the sketch below).
Best, Hequn
On Wed, Aug 15, 2018 at 5:22 PM, John O wrote:
> I did some more t
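A minimal sketch of that check (assuming the rest of the pipeline stays
as before):

import org.apache.flink.streaming.api.scala._

object SingleParallelismCheck {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // With parallelism 1, a single source subtask reads every partition,
    // so an empty partition can no longer hold the watermark back on its own.
    env.setParallelism(1)
    // ... build the same pipeline as before and compare the watermarks.
  }
}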
Hello!
I'm wondering if there is anywhere I can see Flink's roadmap for Scala 2.12
support. The last email I can find on the list about this was back in
January, and FLINK-7811 [0], the ticket asking for Scala 2.12 support,
hasn't been updated in a few months.
Recently Spark fixed the ClosureCle
Hi Averell,
With the CoProcessFunction, you could get access to the time-related
services, which may be useful when maintaining the elements in Stream_C,
and you could get rid of the type casting that comes with the Either
class (a rough sketch follows below the quote).
Best,
Xingcan
> On Aug 15, 2018, at 3:27 PM, Averell wrote:
>
> Thank you Vino
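For illustration, a minimal sketch of such a CoProcessFunction (the
Animal/Rule/Output types are made-up stand-ins for the two connected
streams):

import org.apache.flink.streaming.api.functions.co.CoProcessFunction
import org.apache.flink.util.Collector

case class Animal(name: String)
case class Rule(name: String, factor: Int)
case class Output(name: String)

class Transformer extends CoProcessFunction[Animal, Rule, Output] {

  // Each input keeps its own type: no Either wrapper, no casting.
  override def processElement1(animal: Animal,
      ctx: CoProcessFunction[Animal, Rule, Output]#Context,
      out: Collector[Output]): Unit = {
    out.collect(Output(animal.name))
  }

  override def processElement2(rule: Rule,
      ctx: CoProcessFunction[Animal, Rule, Output]#Context,
      out: Collector[Output]): Unit = {
    // Time-related services are available here, e.g. to expire buffered
    // elements on a keyed stream:
    // ctx.timerService().registerEventTimeTimer(...)
  }
}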
Thanks Dominik, I will try that.
On Wed, Aug 15, 2018 at 3:10 AM, Dominik Wosiński wrote:
> Hey,
> The problem is that your command does start the Job Manager container, but
> it does not start the Task Manager. That is why you have 0 slots. Currently,
> the default *numberOfTaskSlots* is set to th
You can also, instead of defining 2 services (taskmanager and taskmanager1),
set the scale parameter on taskmanager to the number of desired slots.
Something like this:
taskmanager:
  image: "${FLINK_DOCKER_IMAGE:-flink:1.5.2}"
  scale: 2
  expose:
    - "6121"
    - "6122"
    - "8081"
On Wed, A
Hi there,
I am trying to run a single Flink job on YARN in detached mode. As per
the docs for Flink 1.4.2, I am using -yd to do that.
The problem I am having is that the flink bash script doesn't terminate
and return until I press Control + C. In detached mode, I would expect
the flink C
Hi all,
It looks to me like the OperatorSubtaskState returned from
OneInputStreamOperatorTestHarness.snapshot fails to include any timers that had
been registered via registerProcessingTimeTimer but had not yet fired when the
snapshot was saved.
Is this a known limitation of OneInputStreamOper
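For context, a minimal sketch that reproduces the described scenario
(assuming Flink 1.6 APIs and the flink-streaming-java test-jar on the
classpath; the function and input are made up):

import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.api.java.functions.KeySelector
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.operators.KeyedProcessOperator
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
import org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness
import org.apache.flink.util.Collector

// Registers a processing-time timer for every element it sees.
class TimerFn extends KeyedProcessFunction[String, String, String] {
  override def processElement(value: String,
      ctx: KeyedProcessFunction[String, String, String]#Context,
      out: Collector[String]): Unit = {
    ctx.timerService().registerProcessingTimeTimer(
      ctx.timerService().currentProcessingTime() + 1000L)
  }
}

object TimerSnapshotRepro {
  def main(args: Array[String]): Unit = {
    val harness = new KeyedOneInputStreamOperatorTestHarness[String, String, String](
      new KeyedProcessOperator[String, String, String](new TimerFn),
      new KeySelector[String, String] { override def getKey(in: String): String = in },
      BasicTypeInfo.STRING_TYPE_INFO)

    harness.open()
    harness.processElement(new StreamRecord[String]("a", 0L))

    // The timer is registered but has not fired yet; the question is
    // whether it is contained in this snapshot.
    val snapshot = harness.snapshot(0L, 0L)
    println(snapshot)

    harness.close()
  }
}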
Hi Madhav,
> ./flink-1.4.2/bin/flink run -m yarn-cluster -yd -yn 2 -yqu "default"
> -ytm 2048 myjar.jar
Modified to: ./flink-1.4.2/bin/flink run -m yarn-cluster -d -yn 2 -yqu
"default" -ytm 2048 myjar.jar
On Thu, Aug 16, 2018 at 5:01 AM, madhav Kelkar wrote:
> Hi there,
>
>
I don't think your exception / code was attached.
In general, this largely depends on your setup. Are you trying to set up
a long-running YARN session cluster, or are you trying to submit a job
directly to the YARN cluster? [1].
We have an open-sourced project [2] with a similar requirement, submitti
Thank you Xingcan.
Regarding that Either, I still see the need to do type casting / case
class matching. Could you please take a look?
val transformed = dog
  .union(cat)
  .connect(transformer)
  .keyBy(r => r.name, r2 => r2.name)
  .process(new Transfo
Hi, Flink guys,
You really made a quick release, it's fantastic!
I've got a situation:
window 1 is time driven, slice is 1 min, trigger is 1 count
window 2 is count driven, slice is 3 counts, trigger is 1 count
1. Then an element leaves window 1 and goes right into window 2.
For example, if there is o
Hi Marvin777,
You are wrong: it uses the Flink on YARN single-job mode and should use
the "-yd" parameter.
Hi Madhav,
I seem to have found the problem; the source code that produces your log
is here [1].
It is based on a check in the method "isUsingInteractiveMode".
The source code for this method is here [2],