Re: AvroParquetWriter issues writing to S3

2020-04-17 Thread Arvid Heise
Hi Diogo, I have seen similar issues before. The root cause is always that users do not use any Flink-specific classes and instead go to Hadoop's Parquet writer directly. As you can see in your stack trace, there is not a single reference to any Flink class. The solution usually is to use the respective F

Re: AvroParquetWriter issues writing to S3

2020-04-16 Thread Diogo Santos
Hi Till, it definitely seems to be a strange issue. The first time the job is loaded (with a clean instance of the cluster) it runs fine, but if it is canceled or started again the issue appears. I built an example here: https://github.com/congd123/flink-s3-example You can generate the artifact o

Re: AvroParquetWriter issues writing to S3

2020-04-16 Thread Till Rohrmann
For future reference, here is the stack trace in an easier-to-read format:

Caused by: java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeParserBucket
    at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:825)
    at com.amazonaws.util.DateUtils.parseRFC822Date(DateUtil

Re: AvroParquetWriter issues writing to S3

2020-04-16 Thread Till Rohrmann
Hi Diogo, thanks for reporting this issue. It looks quite strange to be honest. flink-s3-fs-hadoop-1.10.0.jar contains the DateTimeParserBucket class. So either this class wasn't loaded when starting the application from scratch or there could be a problem with the plugin mechanism on restarts. I'
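One way to narrow down the two possibilities above is to check, from inside the job, whether the class is visible to a given classloader and which jar it comes from. The following is only a minimal sketch using plain JDK calls; the class name `ClassVisibilityCheck` and the method `describe` are hypothetical helpers, not part of Flink, and where to call them from (for example, just before the writer is created) is an assumption.

```java
import java.security.CodeSource;

public class ClassVisibilityCheck {

    // Reports whether className can be resolved through the given loader and,
    // if so, which code source (jar or directory) and classloader provided it.
    static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "bootstrap/unknown"
                    : src.getLocation().toString();
            return "FOUND " + className + " from " + location + " via " + c.getClassLoader();
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is reported here as well
            return "MISSING " + className + " (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // The class named in the stack trace, checked against the thread's context classloader
        ClassLoader contextLoader = Thread.currentThread().getContextClassLoader();
        System.out.println(describe("org.joda.time.format.DateTimeParserBucket", contextLoader));
    }
}
```

Logging this once on a fresh cluster and once after a cancel and restart would show whether the class resolves from the same jar in both cases.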

AvroParquetWriter issues writing to S3

2020-04-15 Thread Diogo Santos
Hi guys, I'm using AvroParquetWriter to write Parquet files to S3. When I set up the cluster (starting fresh jobmanager/taskmanager instances, etc.), the scheduled job starts executing without problems and writes the files to S3, but if the job is canceled and started again, the job throws t