Re: Parquet S3 Sink Part files are not rolling over with checkpoint

2020-04-09 Thread Kostas Kloudas
I would say so, yes. Also, could you set the paths where you want to use Presto to "s3p", as described in [1], just to be sure that there is no ambiguity? You could also make use of [2]. And thanks for looking into it! Cheers, Kostas [1] https://ci.apache.org/projects/flink/flink-docs-stable/d

Re: Parquet S3 Sink Part files are not rolling over with checkpoint

2020-04-09 Thread Roshan Punnoose
Btw, I ran the same exact code on a local Flink cluster run with `./bin/start-cluster.sh` on my local machine. With `s3a` it did not work, the part files do not roll over; however, with the local filesystem it works perfectly. Should I be looking at the S3Committer in Flink to see if there is somet

Re: Parquet S3 Sink Part files are not rolling over with checkpoint

2020-04-09 Thread Roshan Punnoose
Nope, just the s3a. I'll keep looking around to see if there is anything else I can see. If you think of anything else to try, let me know.
On Thu, Apr 9, 2020, 7:41 AM Kostas Kloudas wrote:
> It should not be a problem because from what you posted, you are using
> "s3a" as the scheme for s3.
> A

Re: Parquet S3 Sink Part files are not rolling over with checkpoint

2020-04-09 Thread Kostas Kloudas
It should not be a problem because from what you posted, you are using "s3a" as the scheme for s3. Are you using "s3p" for Presto? This should also be done in order for Flink to understand where to use the one or the other.
On Thu, Apr 9, 2020 at 1:30 PM Roshan Punnoose wrote:
>
> Lastly, could i

Re: Parquet S3 Sink Part files are not rolling over with checkpoint

2020-04-09 Thread Roshan Punnoose
Lastly, could it be the way I built the Flink image for kube? I added both the Presto and Hadoop plugins.
On Thu, Apr 9, 2020, 7:29 AM Roshan Punnoose wrote:
> Sorry realized this came off the user list by mistake. Adding the thread
> back in.
>
> On Thu, Apr 9, 2020, 7:26 AM Roshan Punnoose wro

Re: Parquet S3 Sink Part files are not rolling over with checkpoint

2020-04-09 Thread Roshan Punnoose
Sorry, I realized this came off the user list by mistake. Adding the thread back in.
On Thu, Apr 9, 2020, 7:26 AM Roshan Punnoose wrote:
> Yes sorry, no errors on the task manager. However, I am new to flink so
> don't know all the places to look for the logs. Been looking at the task
> manager log

Re: Parquet S3 Sink Part files are not rolling over with checkpoint

2020-04-09 Thread Kostas Kloudas
Hi Roshan, Your logs refer to a simple run without any failures or re-running from a savepoint, right? I am asking because I am trying to reproduce it by running a modified ParquetStreamingFileSinkITCase [1] and so far I cannot. The ITCase runs against the local filesystem, and not S3, but I adde

Parquet S3 Sink Part files are not rolling over with checkpoint

2020-04-08 Thread Roshan Punnoose
Hi, I am trying to get the parquet writer to write to s3; however, the files do not seem to be rolling over. The same file "part-0-0.parquet" is being created each time. Like the "partCounter" is not being updated? Maybe the Bucket is being recreated each time? I don't really know... Here are some