That should work as well.
On 20/08/2020 22:46, Vishwas Siravara wrote:
Thank you Chesnay.
Yes, but I could also change the staging directory by adding
-Djava.io.tmpdir=/data/flink-1.7.2/tmp to env.java.opts in the
flink-conf.yaml file. Do you see any problem with that?
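To spell it out, the full entry in flink-conf.yaml would look like this (just writing out what I described above; the path is the one on our hosts and would have to exist on every TaskManager):
env.java.opts: -Djava.io.tmpdir=/data/flink-1.7.2/tmp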
Best,
Vishwas
On Thu, Aug 20, 2020 at 2:01 PM Chesnay Schepler <ches...@apache.org> wrote:
Could you try adding this to your flink-conf.yaml?
s3.staging-directory: /usr/mware/flink/tmp
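As far as I know, options prefixed with s3. in flink-conf.yaml are forwarded to the underlying S3 filesystem, so this should redirect where the Presto S3 writer stages its temporary files. Just make sure the directory exists and is writable on every TaskManager, for example:
mkdir -p /usr/mware/flink/tmp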
On 20/08/2020 20:50, Vishwas Siravara wrote:
Hi Piotr,
I did some analysis and realised that the temp files for s3
checkpoints are staged in /tmp although io.tmp.dirs is set
to a different directory.
ls -lrth
drwxr-xr-x. 2 was was 32 Aug 20 17:52 hsperfdata_was
-rw-------. 1 was was 505M Aug 20 18:45 presto-s3-8158855975833379228.tmp
-rw-------. 1 was was 505M Aug 20 18:45 presto-s3-7048419193714606532.tmp
drwxr-xr--. 2 root root 6 Aug 20 18:46 hsperfdata_root
[was@sl73rspapd031 tmp]$
flink-conf.yaml configuration
io.tmp.dirs: /usr/mware/flink/tmp
/tmp has only 2 GB; is it possible to change the staging
directory for s3 checkpoints?
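A quick way to confirm this on the TaskManager host is, for example:
df -h /tmp
The two staged presto-s3-*.tmp files in the listing above already add up to roughly 1 GB, so the partition fills up quickly.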
Best,
Vishwas
On Thu, Aug 20, 2020 at 10:27 AM Vishwas Siravara
<vsirav...@gmail.com> wrote:
Hi Piotr,
Thank you for your suggestion. I will try that. Are the
temporary files created in the directory set in
io.tmp.dirs in the flink-conf.yaml? Would these files be
the same size as the checkpoints?
Thanks,
Vishwas
On Thu, Aug 20, 2020 at 8:35 AM Piotr Nowojski
<pnowoj...@apache.org> wrote:
Hi,
As far as I know, when uploading a file to S3 the writer
needs to first create some temporary files on the local
disks. I would suggest double-checking all of the
partitions on the local machine and monitoring the available
disk space continuously while the job is running. If you
are just checking the free space manually, you can
easily miss the point in time when those temporary
files grow large and approach the available disk
space, as I'm pretty sure those temporary files are
cleaned up immediately after throwing the exception that
you see.
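For example, something as simple as the following on each TaskManager host would catch a short-lived spike (just a rough sketch; watch every local mount, not only the one you expect the files to land on):
watch -n 1 df -h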
Piotrek
On Thu, Aug 20, 2020 at 00:56 Vishwas Siravara
<vsirav...@gmail.com> wrote:
Hi guys,
I have a deduplication job that runs on Flink 1.7,
with some state that uses the FsStateBackend. My TM
heap size is 16 GB. I see the error below while
trying to checkpoint a state of size 2 GB. There is
enough space available in s3; I tried to upload
larger files and they were all successful. There is
also enough disk space in the local file system, and the
disk utility tool does not show anything suspicious.
Whenever I try to start my job from the last
successful checkpoint, it runs into the same error.
Can someone tell me what the cause of this issue is?
Many thanks.
Note: This error goes away when I delete io.tmp.dirs
and restart the job from the last checkpoint, but the
disk utility tool does not show much usage before
deletion, so I am not able to figure out what
the problem is.
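For context, the checkpointing part of our configuration looks roughly like this (a rough sketch; the bucket and prefix are simply the ones that show up in the checkpoint path in the log below):
state.backend: filesystem
state.checkpoints.dir: s3p://featuretoolkit.checkpoints/dev_dedup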
2020-08-19 21:12:01,909 WARN org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory - Could not close the state stream for s3p://featuretoolkit.checkpoints/dev_dedup/9b64aafadcd6d367cfedef84706abcba/chk-189/f8e668dd-8019-4830-ab12-d48940ff5353.
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
    at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3OutputStream.close(PrestoS3FileSystem.java:986)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
    at org.apache.flink.fs.s3.common.hadoop.HadoopDataOutputStream.close(HadoopDataOutputStream.java:52)
    at org.apache.flink.core.fs.ClosingFSDataOutputStream.close(ClosingFSDataOutputStream.java:64)
    at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.close(FsCheckpointStreamFactory.java:269)
    at org.apache.flink.runtime.state.CheckpointStreamWithResultProvider.close(CheckpointStreamWithResultProvider.java:58)
    at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:263)
    at org.apache.flink.util.IOUtils.closeAllQuietly(IOUtils.java:250)
    at org.apache.flink.util.AbstractCloseableRegistry.close(AbstractCloseableRegistry.java:122)
    at org.apache.flink.runtime.state.AsyncSnapshotCallable.closeSnapshotIO(AsyncSnapshotCallable.java:185)
    at org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:84)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
    at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:47)
    at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        ... 21 more
Thanks,
Vishwas