Hi Kant,
The stream-stream outer joins (left/right/full) are a work in progress now
and will probably be ready before the end of this month. You can check the
progress at [1].
Best, Hequn
[1] https://issues.apache.org/jira/browse/FLINK-5878
On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui wrote:
>
Hi Gordon,
The issue really is that we are trying to avoid checkpointing, as the datasets are really
heavy and all of the state is really transient in a few of our apps (flushed
within a few seconds). So the high volume/velocity and the transient nature of the state make
those apps good candidates to just not have ch
Hi Ashish,
Could you elaborate a bit more on why you think the restart of all operators
leads to data loss?
When restart occurs, Flink will restart the job from the latest complete
checkpoint.
All operator states will be reloaded with state written in that checkpoint,
and the position of the input
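(To make the recovery behaviour above concrete, here is a minimal sketch -- the
checkpoint interval, restart strategy, and socket source are placeholders, not
taken from the original job:)

import org.apache.flink.api.common.restartstrategy.RestartStrategies
import org.apache.flink.streaming.api.scala._

object CheckpointedJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Take a checkpoint every 10 s; on failure all operators are restarted and
    // their state is reloaded from the latest completed checkpoint (replayable
    // sources such as Kafka also rewind to the offsets recorded in it).
    env.enableCheckpointing(10000)
    // Retry the job up to 3 times, waiting 10 s between attempts.
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10000))

    env.socketTextStream("localhost", 9999)
      .map(_.toUpperCase)
      .print()

    env.execute("checkpointed-job")
  }
}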
Hi Filipe,
What Gordon mentioned is correct. Did you manage to fix the issue?
From your code snippet, it looks like the `Schema` field may not be
serializable. Could you double-check that?
Cheers,
Gordon
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hi Mu,
You mentioned that the job stopped after the "n/a" topic error and that it then
failed to recover.
What exception did you encounter in the restart executions? Was it the same
error?
This would verify if we actually should be removing more than one of these
special MARKER partition states.
On t
Hi Kant,
I suppose you are referring to the stream join in SQL/Table API, since the outer join
for windowed streams can always be achieved with the `JoinFunction` in the
DataStream API.
There are two kinds of stream joins, namely the time-windowed join and the
non-windowed join, in Flink SQL/Table API [1,
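(For illustration only -- not from the original thread -- a time-windowed join in
SQL could look like the sketch below; the `Orders` and `Shipments` tables and
their rowtime attributes are assumed to be registered already:)

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

// Joins orders with shipments that happen within one hour after the order.
// Orders(id, o_rowtime) and Shipments(order_id, ship_id, s_rowtime) are
// assumed to be registered tables with rowtime attributes.
val result = tEnv.sqlQuery(
  """
    |SELECT o.id, s.ship_id
    |FROM Orders o, Shipments s
    |WHERE o.id = s.order_id
    |  AND s.s_rowtime BETWEEN o.o_rowtime AND o.o_rowtime + INTERVAL '1' HOUR
  """.stripMargin)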
Hi Pawel,
The data transfer process on the sender side works in the following way: operator
collects a record --> serialize to a Flink buffer --> copy to a Netty buffer --> flush
to the socket.
On the receiver side: socket --> Netty --> Flink buffer --> deserialize to a record
--> operator processing.
On the receiver side, if the
Hi Aljoscha and Robert,
You guys are right.
I resubmitted the application with # session window tasks equal to # Kafka sink
tasks.
I never thought that multiple different Kafka tasks could write to the same
partition.
Initially, I did not set the default parallelism and I explicitly set #
partitions
Hi guys,
I am using Flink 1.4.1 and I am working with the REST API.
If I run a Flink job through the command line (./bin/flink run job.jar), is it
uploaded to the folder set in the variable jobmanager.web.upload.dir? It seems not.
So, through the REST API I can cancel the job creating a savepoint, but
Hi Nico,
Thanks for your prompt response.
I'm using Flink 1.3.0 for this job.
Please let me know if you need more information.
Best regards,
Mu
On Tue, Mar 6, 2018 at 10:17 PM, Nico Kruber wrote:
> Hi Mu,
> which version of flink are you using? I checked the latest branches for
> 1.2 - 1.5 t
Hi there
I have a CSV file with the timestamp deconstructed into 3 fields, and I was
wondering what is the best way to specify that those 3 fields are the event
time. Should I extend CsvTableSource and do the preprocessing, or can
CsvTableSource.builder() handle it? Or is there a better
Can you explain how back pressure affects the source in Flink? I read the
great article
https://data-artisans.com/blog/how-flink-handles-backpressure and got the
idea but I would like to know more details. Let's consider
org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
Related source code:
https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/start-cluster.sh#L40
On Wed, Mar 7, 2018 at 2:11 AM, Yesheng Ma wrote:
> Hi Nico,
>
> Thanks for your reply. My major concern is actually the `-l` argument.
> The command I executed is: `nohup /bin
Hi Nico,
Thanks for your reply. My major concern is actually the `-l` argument.
The command I executed is: `nohup /bin/bash -x -l
"/state/partition1/ysma/flink-1.4.1/bin/jobmanager.sh" start cluster
dell-01.epcc 8091`, with and without the `-l` argument (the script in
Flink's bin directory uses th
The 010 consumer extends 09, so I'd guess whatever code is reporting sees
the FlinkKafkaConsumer010 as its superclass.
I've seen this error a bunch, and it's because MyDeserializationSchema
isn't serializable, or likely one of its fields is not serializable, or one
of the fields of its fields - yo
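(A sketch of the usual fix -- the `NonSerializableParser` below is hypothetical
and simply stands in for whatever field is not serializable; package names may
differ between Flink versions:)

import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.streaming.util.serialization.DeserializationSchema

// Stand-in for any non-serializable helper, e.g. an Avro Schema or a parser.
class NonSerializableParser {
  def parse(bytes: Array[Byte]): String = new String(bytes, "UTF-8")
}

class MyDeserializationSchema extends DeserializationSchema[String] {
  // @transient + lazy: the helper is not shipped with the serialized schema;
  // it is re-created on each task manager the first time it is used.
  @transient private lazy val parser = new NonSerializableParser

  override def deserialize(message: Array[Byte]): String = parser.parse(message)
  override def isEndOfStream(nextElement: String): Boolean = false
  override def getProducedType: TypeInformation[String] =
    BasicTypeInfo.STRING_TYPE_INFO
}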
Hi All,
Does Flink support stream-stream outer joins in the latest version?
Thanks!
Hi Lukas,
Those are Akka-internal names that you don't have to worry about.
It looks like your TaskManager cannot reach the JobManager.
Is 'jobmanager.rpc.address' configured correctly on the TaskManager? And
is it reachable by this name? Is port 6123 allowed through the firewall?
Are you sure th
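(For reference, a sketch of the relevant flink-conf.yaml entries on the
TaskManager side; the host name is a placeholder:)

jobmanager.rpc.address: jobmanager-host
jobmanager.rpc.port: 6123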
Hi Yesheng,
`nohup /bin/bash -l bin/jobmanager.sh start cluster ...` looks a bit
strange since it should (imho) be an absolute path towards flink.
What you could do to diagnose further, is to try to run the ssh command
manually, i.e. figure out what is being executed by calling
bash -x ./bin/start
Thanks for the reply, Nico.
I've been testing with OffsetCommitMode.ON_CHECKPOINTS, and I can confirm
that this fixes the issue -- even if a single commit times out when
communicating with Kafka, subsequent offset commits are still successful.
--
Sent from: http://apache-flink-user-mailing-list-
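(A minimal sketch of wiring offset commits to checkpoints -- topic name,
consumer properties, and intervals are placeholders, and package names may
differ between Flink versions:)

import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

val env = StreamExecutionEnvironment.getExecutionEnvironment
// With checkpointing enabled, the consumer runs in OffsetCommitMode.ON_CHECKPOINTS:
// offsets are committed back to Kafka only when a checkpoint completes.
env.enableCheckpointing(60000)

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("group.id", "my-group")

val consumer = new FlinkKafkaConsumer010[String]("my-topic", new SimpleStringSchema(), props)
consumer.setCommitOffsetsOnCheckpoints(true) // the default; shown here for clarity

env.addSource(consumer).print()
env.execute("kafka-offsets-on-checkpoints")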
Hi Edward,
looking through the Kafka code, I do see a path where they deliberately
do not want recursive retries, i.e. if the coordinator is unknown. It
seems like you are getting into this scenario.
I'm no expert on Kafka and therefore I'm not sure on the implications or
ways to circumvent/fix th
That depends on your job and the setup.
Remember that all operators will write their checkpoint data into that file
system.
If the state grows very large and you only have an NFS with low write
performance, it might be a problem. But the same would apply to HDFS as
well.
2018-03-06 2:51 GMT-08:00 J
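(A minimal sketch of pointing the state backend at a shared NFS mount -- the
path is a placeholder:)

import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Every operator writes its checkpoint data to this shared file system, so its
// write throughput matters once the state grows large.
env.setStateBackend(new FsStateBackend("file:///mnt/nfs/flink-checkpoints"))
env.enableCheckpointing(60000)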
Hi Miki,
I'm no expert on the Kubernetes part, but could that be related to
https://github.com/kubernetes/kubernetes/issues/6667?
I'm not sure this is an Akka issue: if Akka cannot communicate with some
address, it basically blocks that address from further connection attempts for a
given time (here 5 seconds)
Hi Dongwon,
I think there is currently no way of ensuring that tasks are spread out across
different machines because the scheduling logic does not take into account what
machine a slot is on. I currently see two workarounds:
- Let all operations have the same parallelism and only have 8 slots
Hi Mu,
which version of flink are you using? I checked the latest branches for
1.2 - 1.5 to look for findLeaderForPartitions at line 205 in
Kafka08Fetcher but they did not match. From what I can see in the code,
there is a MARKER partition state with topic "n/a" but that is
explicitly removed from
Hi,
I have encountered a weird problem.
After the job had been running for several days, Flink gave me the following error:
java.lang.RuntimeException: Unable to find a leader for partitions:
[Partition: KafkaTopicPartition{topic='n/a', partition=-1},
KafkaPartitionHandle=[n/a,-1], offset=(not set)]
Thanks Fabian.
Will there be any performance issues if I use NFS as the shared filesystem
(as compared to HDFS or S3)?
Jayant Ameta
On Mon, Mar 5, 2018 at 10:31 PM, Fabian Hueske wrote:
> Yes, that is correct.
>
> 2018-03-05 8:57 GMT-08:00 Jayant Ameta :
>
>> Oh! Somehow I missed while reading
Hi,
I'm running Flink jobs on Kubernetes. After a day or so,
the task manager and job manager lose their connection and I have to
restart everything.
I'm assuming that one of the pods crashed, and when the new pod starts it can't
find the job manager?
Also, I saw that this is an Akka issue... and it will be fi
There are still some upcoming changes for the network stack, but most of
the heavy stuff is already through - you may track this under
https://issues.apache.org/jira/browse/FLINK-8581
FLIP-6 is somewhat similar and currently only undergoes some stability
improvements/bug fixing. The architectural
Hey,
Indeed, checkpointing iterations and dealing with closed sources are orthogonal
issues; that is why the latter is not part of FLIP-15. Though, you kinda need
both to have meaningful checkpoints for jobs with iterations.
One has to do with correctness (checkpointing strongly connected compone
Hi Ken,
sorry, I was misled by the fact that you are using iterations and those
were only documented for the DataSet API.
Running checkpoints with closed sources sounds like a more general thing
than being part of the iterations rework of FLIP-15. I couldn't dig up
anything on jira regarding this
Hi
Thank you, it worked, but now there is another problem in the same example.
How to use .filter():
val table = tEnv
  .scan("customers")
  .filter('name.isNotNull && 'last_update > "2016-01-01 00:00:00".toTimestamp)
  .select('id, 'name.lowerCase(), 'prefs)
Error in compiling: "Value > is not member o
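(One common cause of that kind of compile error -- offered as a guess, not a
confirmed diagnosis -- is that the Table API expression DSL is not in scope; it
is brought in by the Scala implicits:)

import org.apache.flink.api.scala._
import org.apache.flink.table.api.scala._ // enables 'field symbols, .toTimestamp, && and > on expressions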