Hi Curt,
Could you check whether it works after reducing python.fn-execution.bundle.size to
1000 or 100?
Regards,
Dian
On Thu, Oct 14, 2021 at 2:47 AM Curt Buechter wrote:
Hi guys,
I'm still running into this problem. I checked the logs, and there is no
evidence that the python process crashed. I checked the process IDs and
they are still active after the error. No `killed process` messages in
/var/log/messages.
I don't think it's necessarily related to checkpointing.
Guess my last reply didn't go through, so here goes again...
Possibly, but I don't think so. Since I submitted this, I have done some
more testing. It works fine with file system or memory state backends, but
not with rocksdb. I will try again and check the logs, though.
I've also tested rocksdb c
PS: there is more information about this configuration at
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/python/python_config/#python-fn-execution-bundle-size
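For concreteness, a minimal flink-conf.yaml sketch of that setting (the value shown is just the smaller size suggested above, not a recommendation):

```yaml
# flink-conf.yaml — cap how many elements are buffered per bundle before
# they are handed to the Python worker; smaller bundles mean less data
# buffered per checkpoint, at some throughput cost
python.fn-execution.bundle.size: 1000
```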
> On Sep 24, 2021 at 10:07 AM, Dian Fu wrote:
I agree with Roman that it seems that the Python process has crashed.
Besides the suggestions from Roman, I guess you could also try configuring a
smaller bundle size via “python.fn-execution.bundle.size”.
Regards,
Dian
> On Sep 24, 2021 at 3:48 AM, Roman Khachatryan wrote:
Hi,
Is it possible that the python process crashed or hung (perhaps while
performing a snapshot)?
Could you validate this by checking the OS logs for OOM killer
messages or process status?
Regards,
Roman
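The OS-log check Roman suggests can be scripted; here is a small sketch (the helper name and the sample log line are illustrative, not from this thread — on a real host you would pass in the contents of /var/log/messages or dmesg output):

```python
import re

# Lines the kernel OOM killer typically writes; a heuristic, not exhaustive.
OOM_PATTERN = re.compile(r"(out of memory|killed process)", re.IGNORECASE)

def find_oom_kills(log_text):
    """Return the lines of log_text that look like OOM-killer activity."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]

# Illustrative sample input (not a real log from this job):
sample = (
    "Sep 22 16:18:01 host kernel: Out of memory: Killed process 4321 (python)\n"
    "Sep 22 16:18:02 host systemd[1]: Started Session 12 of user flink.\n"
)
print(find_oom_kills(sample))
```

An empty result is consistent with Curt's observation below that the process IDs were still alive after the error.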
On Wed, Sep 22, 2021 at 6:30 PM Curt Buechter wrote:
Hi,
I'm getting an error after enabling checkpointing in my pyflink application
that uses a keyed stream and rocksdb state.
Here is the error message:
2021-09-22 16:18:14,408 INFO
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend [] -
Closed RocksDB State Backend. Cleaning up RocksDB
3 not 1.11.1.
>
> [1] https://issues.apache.org/jira/browse/FLINK-16753
>
> Best
> Yun Tang
> --
> *From:* Dan Hill
> *Sent:* Tuesday, April 27, 2021 7:50
> *To:* Yun Tang
> *Cc:* Robert Metzger ; user
> *Subject:* Re: Checkpoint error - "The job has failed"
To: Yun Tang
Cc: Robert Metzger; user
Subject: Re: Checkpoint error - "The job has failed"
Hey Yun and Robert,
I'm using Flink v1.11.1.
Robert, I'll send you a separate email with the logs.
On Mon, Apr 26, 2021 at 12:46 AM Yun Tang wrote:
Hi Dan,
Flink-1.10.3.
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-16753
>
> Best
> Yun Tang
> --
> *From:* Robert Metzger
> *Sent:* Monday, April 26, 2021 14:46
> *To:* Dan Hill
> *Cc:* user
> *Subject:* Re: Checkpoint error - "The job has failed"
To: Dan Hill
Cc: user
Subject: Re: Checkpoint error - "The job has failed"
Hi Dan,
can you provide me with the JobManager logs to take a look as well? (This will
also tell me which Flink version you are using)
On Mon, Apr 26, 2021 at 7:20 AM Dan Hill wrote:
My Flink job failed to checkpoint with a "The job has failed" error. The
logs contained no other recent errors. I keep hitting the error even if I
cancel the jobs and restart them. When I restarted my jobmanager and
taskmanager, the error went away.
What error am I hitting? It looks like there
>
> Also, have you enabled concurrent checkpoint?
>
> Best,
> Yun
>
>
> --Original Mail --
> *Sender:*Navneeth Krishnan
> *Send Date:*Mon Mar 8 13:10:46 2021
> *Recipients:*Yun Gao
> *CC:*user
> *Subject:*Re: Re: Checkpoint Error
Hi Yun,
Thanks for the response. I checked the mounts and only the JMs and TMs are
mounted with this EFS. Not sure how to debug this.
Thanks
On Sun, Mar 7, 2021 at 8:29 PM Yun Gao wrote:
Hi Navneeth,
It seems
--Original Mail --
Sender:Navneeth Krishnan
Send Date:Sun Mar 7 15:44:59 2021
Recipients:user
Subject:Re: Checkpoint Error
Hi All,
Any suggestions?
Thanks
On Mon, Jan 18, 2021 at 7:38 PM Navneeth Krishnan
wrote:
Hi All,
We are running our streaming job on flink 1.7.2 and we are noticing the
below error. Not sure what's causing it; any pointers would help. We have
10 TMs checkpointing to AWS EFS.
AsynchronousException{java.lang.Exception: Could not materialize
checkpoint 11 for operator Processor -> Sink
Thanks for opening the ticket. I've asked a committer who knows the
streaming sink well to take a look at the ticket.
On Fri, Apr 24, 2020 at 6:47 AM Lu Niu wrote:
Hi, Robert
BTW, I did some field study and I think it's possible to support the streaming
sink using the presto s3 filesystem. I think that would help users use the
presto s3 fs for all access to s3. I created this jira ticket:
https://issues.apache.org/jira/browse/FLINK-17364 . What do you think?
Best
Lu
Cool, thanks!
On Tue, Apr 21, 2020 at 4:51 AM Robert Metzger wrote:
I'm not aware of anything. I think the presto s3 file system is generally
the recommended S3 FS implementation.
On Mon, Apr 13, 2020 at 11:46 PM Lu Niu wrote:
Thank you both. Given the debug overhead, I might just try out presto s3
file system then. Besides that presto s3 file system doesn't support
streaming sink, is there anything else I need to keep in mind? Thanks!
Best
Lu
On Thu, Apr 9, 2020 at 12:29 AM Robert Metzger wrote:
Hey,
Others have experienced this as well, yes:
https://lists.apache.org/thread.html/5cfb48b36e2aa2b91b2102398ddf561877c28fdbabfdb59313965f0a%40%3Cuser.flink.apache.org%3E
I have also notified the Hadoop project about this issue:
https://issues.apache.org/jira/browse/HADOOP-15915
Hi Lu,
I'm not familiar with S3 file system, maybe others in Flink community can
help you in this case, or maybe you can also reach out to s3
teams/community for help.
Best,
Congxian
On Wed, Apr 8, 2020 at 11:05 AM, Lu Niu wrote:
Hi, Congxiao
Thanks for replying. Yeah, I also found those references. However, as I
mentioned in the original post, there is enough capacity on all disks. Also,
when I switch to the presto file system, the problem goes away. Wondering
whether others have encountered a similar issue.
Best
Lu
On Tue, Apr 7, 2020 a
Hi
From the stack, it seems the problem is "org.apache.flink.fs.shaded.
hadoop3.org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
find any valid local directory for s3ablock-0001-". I googled the
exception and found a related page[1]; could you please make sure
there
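For context on that exception: the Hadoop allocator behind it stages s3a upload blocks under fs.s3a.buffer.dir, so one thing worth verifying (the property name is real in hadoop-aws; the path below is a placeholder) is that it points at a writable directory with free space on every TaskManager, e.g. in core-site.xml:

```xml
<!-- core-site.xml: s3a buffers upload blocks under fs.s3a.buffer.dir
     (falling back to hadoop.tmp.dir); make sure it resolves to a
     writable directory with free space on every TaskManager host -->
<property>
  <name>fs.s3a.buffer.dir</name>
  <value>/data/tmp/s3a</value>
</property>
```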
Hi, flink users
Did anyone encounter such error? The error comes from S3AFileSystem. But
there is no capacity issue on any disk. we are using hadoop 2.7.1.
```
Caused by: java.util.concurrent.ExecutionException:
java.io.IOException: Could not open output stream for state backend
at java.u
```
Hi
The root cause is a checkpoint failure due to failing to send data to Kafka during
'preCommit'. The right solution is to avoid the failed sends to Kafka in the
first place, which might be in Kafka's scope.
If you cannot ensure the status of Kafka with its client and do not require
exactly-once, yo
Hi
I have the same issue.
BR
Jose
On Thu, 9 Jan 2020 at 10:28, ouywl wrote:
> Hi all:
> When I use flink 1.9.1 and produce data to Kafka 1.1.1, the error happens
> as *log-1*. The code is:
>
> input.addSink(
>     new FlinkKafkaProducer(
>         parameterTool.getRequired("bootstrap.servers"),
>         parameterTool.getRequired("output-topic"),
Thanks for the tip! I did change the jobGraph this time.
Hao Sun
Team Lead
1019 Market St. 7F
San Francisco, CA 94103
On Thu, Dec 6, 2018 at 2:47 AM Till Rohrmann wrote:
Hi Hao,
if Flink tries to recover from a checkpoint, then the JobGraph should not
be modified and the system should be able to restore the state.
Have you changed the JobGraph and are you now trying to recover from the
latest checkpoint which is stored in ZooKeeper? If so, then you can also
start
Till, Flink is automatically trying to recover from a checkpoint not
savepoint. How can I get allowNonRestoredState applied in this case?
Hao Sun
Team Lead
1019 Market St. 7F
San Francisco, CA 94103
On Wed, Dec 5, 2018 at 10:09 AM Till Rohrmann wrote:
Hi Hao,
I think you need to provide a savepoint file via --fromSavepoint to resume
from in order to specify --allowNonRestoredState. Otherwise this option
will be ignored because it only works if you resume from a savepoint.
Cheers,
Till
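A sketch of the command shape this implies (bucket, savepoint id, class, and jar names are placeholders):

```shell
# Resume from an explicit savepoint so that --allowNonRestoredState takes
# effect; without -s/--fromSavepoint the flag is ignored.
flink run \
  -s s3://my-bucket/savepoints/savepoint-abc123 \
  --allowNonRestoredState \
  -c com.example.MyJob my-job.jar
```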
On Wed, Dec 5, 2018 at 12:29 AM Hao Sun wrote:
I am using 1.7 and job cluster on k8s.
Here is how I start my job
docker-entrypoint.sh job-cluster -j
com.zendesk.fraud_prevention.examples.ConnectedStreams
--allowNonRestoredState
*Seems like --allowNonRestoredState is not honored*
=== Logs ===
java","line":"1041","message":"Restoring
Ah yes, if you used a local filesystem for backups this certainly was the
source of the problem.
On Sun, 29 May 2016 at 17:57 arpit srivastava wrote:
I think the problem was that I was using a local filesystem in a cluster. Now
I have switched to hdfs.
Thanks,
Arpit
On Sun, May 29, 2016 at 12:57 PM, Aljoscha Krettek
wrote:
Hi,
could you please provide the code of your user function that has the
Checkpointed interface and is keeping state? This might give people a
chance of understanding what is going on.
Cheers,
Aljoscha
On Sat, 28 May 2016 at 20:55 arpit srivastava wrote:
Hi,
I am using Flink on a yarn cluster. My job had been running for 2-3 days when
it failed with two errors:
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
Error at remote task manager 'ip-xx.xx.xx.xxx'.
at
org.apache.flink.runtime.io.network.netty.PartitionR