Hi,
You need to enable checkpointing for your job. Flink uses the ".pending"
extension to mark part files that have been completely written but are not
yet included in a checkpoint.
Once you enable checkpointing, the .pending extension is removed from
those files whenever a checkpoint completes.
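
As a rough sketch of what that could look like with the sink from your
mail: the 60-second interval, the class name, the job name and the socket
source are placeholders I made up for illustration (your job reads from
Twitter instead), and the snippet needs the Flink streaming and filesystem
connector dependencies on the classpath.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class CheckpointedHdfsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 60 seconds (assumed value, tune as needed).
        // Without this call no checkpoint ever completes, so the part files
        // keep their .pending extension.
        env.enableCheckpointing(60_000L);

        // Placeholder source for illustration only; replace it with the
        // Twitter source used in the real job.
        DataStream<String> tweets = env.socketTextStream("localhost", 9999);

        // Sink and bucketer exactly as configured in the original question.
        BucketingSink<String> sink =
                new BucketingSink<>("hdfs://localhost:8020/flinktwitter/");
        sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HHmm"));

        tweets.addSink(sink);
        env.execute("twitter-to-hdfs");
    }
}
```

With a setup along these lines, files should lose the .pending extension
shortly after the next checkpoint completes, so a shorter interval makes
them visible sooner at the cost of more frequent checkpoints.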
Regards,
Urs
On 02.09.2017 02:46, Krishnanand Khambadkone wrote:
> BTW, I am using a BucketingSink and a DateTimeBucketer. Do I need to
> set any other property to move the files out of the .pending state?
>
> BucketingSink<String> sink = new BucketingSink<String>("hdfs://localhost:8020/flinktwitter/");
> sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HHmm"));
>
> On Friday, September 1, 2017, 5:03:46 PM PDT, Krishnanand Khambadkone
> <[email protected]> wrote:
>
> Hi, I have written a small program that uses a Twitter input stream and
> an HDFS output sink. When the files are written to HDFS, each part file
> in the directory has a .pending extension. I am able to cat the file
> and see the tweet text. Is it normal for the part files to have a
> .pending extension?
>
> -rw-r--r-- 3 user supergroup 46399 2017-09-01 16:35 /flinktwitter/2017-09-01--1635/_part-0-95.pending
> -rw-r--r-- 3 user supergroup 54861 2017-09-01 16:35 /flinktwitter/2017-09-01--1635/_part-0-96.pending
> -rw-r--r-- 3 user supergroup 41878 2017-09-01 16:35 /flinktwitter/2017-09-01--1635/_part-0-97.pending
> -rw-r--r-- 3 user supergroup 42813 2017-09-01 16:35 /flinktwitter/2017-09-01--1635/_part-0-98.pending
> -rw-r--r-- 3 user supergroup 42887 2017-09-01 16:35 /flinktwitter/2017-09-01--1635/_part-0-99.pending
>
--
Urs Schönenberger - [email protected]
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Dr. Robert Dahlke, Gerhard Müller
Sitz: Unterföhring * Amtsgericht München * HRB 135082