Re: Issues while writing data to a parquet sink

2021-05-17 Thread Till Rohrmann
to write data to a parquet sink. I am getting the following stack trace for some reason (though I am not using a decimal logical type anywhere): java.lang.IncompatibleClassChangeError: class org.apache.avro.LogicalTypes$Decimal has interface org.apache.avro.LogicalT
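An `IncompatibleClassChangeError` on an Avro class usually means two different Avro versions ended up on the classpath (for example, one pulled in transitively by a Parquet/Avro format dependency and another by the job jar). A common remedy is to pin a single Avro version via Maven's dependency management; the version number below is a placeholder, not a recommendation from the thread:

```xml
<!-- Sketch only: pin one Avro version for the whole build so that classes
     like LogicalTypes$Decimal are loaded from a single, consistent jar.
     Pick the version that matches your Flink/Parquet dependencies. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.10.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running `mvn dependency:tree -Dincludes=org.apache.avro` shows which modules drag in which Avro version.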

Re: Flink 1.11 test Parquet sink

2020-07-15 Thread Flavio Pompermaier
I've just opened a ticket on JIRA: https://issues.apache.org/jira/browse/FLINK-18608 On Wed, Jul 15, 2020 at 10:10 AM Dawid Wysakowicz wrote: Hi, Unfortunately this is a bug. The problem is in CustomizedConvertRule#convertCast as it drops the requested nullability. It was fixed in ma

Re: Flink 1.11 test Parquet sink

2020-07-15 Thread Dawid Wysakowicz
Hi, Unfortunately this is a bug. The problem is in CustomizedConvertRule#convertCast as it drops the requested nullability. It was fixed in master as part of FLINK-13784 [1]. Therefore the example works on master. Could you create a JIRA issue for the 1.11 version? We could backport the corresponding

Re: Flink 1.11 test Parquet sink

2020-07-15 Thread Flavio Pompermaier
If I use tableEnv.fromValues(Row.of(1L, "Hello"), ...) things work if I also change the query to "INSERT INTO `out` SELECT CAST(f0 AS STRING), f1 FROM ParquetDataset". If there is still a bug I'll file a proper JIRA ticket with the exact description of the problem. Just to conclude this thread there a

Re: Flink 1.11 test Parquet sink

2020-07-14 Thread Jark Wu
I think this might be a bug in `tableEnv.fromValues`. Could you try to remove the DataType parameter, and let the framework derive the types?

final Table inputTable = tableEnv.fromValues(
    Row.of(1L, "Hello"),
    Row.of(2L, "Hello"),
    Row.of(3L, ""),
    Row.of(4L,

Re: Flink 1.11 test Parquet sink

2020-07-14 Thread Leonard Xu
Hi Flavio, I reproduced your issue, and I think it should be a bug. But I'm not sure whether it comes from Calcite or from the Calcite shaded into the Flink Table Planner module. Maybe Danny can help explain more. CC: Danny. Best, Leonard Xu. On 14 Jul 2020, at 23:06, Flavio Pompermaier wrote: If I

Re: Flink 1.11 test Parquet sink

2020-07-14 Thread Flavio Pompermaier
If I use

final Table inputTable = tableEnv.fromValues(
    DataTypes.ROW(
        DataTypes.FIELD("col1", DataTypes.STRING().notNull()),
        DataTypes.FIELD("col2", DataTypes.STRING().notNull())
    ), ..

tableEnv.executeSql(
    "CREATE TABLE `out` (" +
    "co

Re: Flink 1.11 test Parquet sink

2020-07-14 Thread Flavio Pompermaier
Sorry, obviously " 'format' = 'parquet'" + is without comment :D On Tue, Jul 14, 2020 at 4:48 PM Flavio Pompermaier wrote: Hi to all, I'm trying to test writing to parquet using the following code but I have an error: final TableEnvironment tableEnv = DatalinksExecutionEnvironment.ge

Flink 1.11 test Parquet sink

2020-07-14 Thread Flavio Pompermaier
Hi to all, I'm trying to test writing to parquet using the following code but I have an error:

final TableEnvironment tableEnv = DatalinksExecutionEnvironment.getBatchTableEnv();
final Table inputTable = tableEnv.fromValues(
    DataTypes.ROW(
        DataTypes.FIELD("col1", DataTyp
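For context, the DDL being discussed in this thread follows the Flink 1.11 filesystem connector pattern. A minimal sketch (table name, columns, and path are placeholders, not the poster's actual definition):

```sql
-- Hedged sketch of a Parquet filesystem sink table in Flink SQL (1.11+).
-- 'path' and column names are illustrative placeholders.
CREATE TABLE `out` (
  `col1` STRING,
  `col2` STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/parquet-out',
  'format' = 'parquet'
);
```

The nullability mismatch surfaces when the types produced by `fromValues` (nullable by default) do not match the sink schema, which is why the `CAST(... AS STRING)` workaround mentioned above helps.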

Re: Flink Stream job to parquet sink

2020-06-29 Thread aj
…ent to several internal sinks. Then you could simply add a new sink whenever a new event occurs. The first option is the easiest and the last option the most versatile (could even have different sink types mixed).

Re: Flink Stream job to parquet sink

2020-06-25 Thread Arvid Heise
…t sink types mixed). On Tue, Jun 23, 2020 at 5:34 AM aj wrote: I am stuck on this. Please give some suggestions. On Tue, Jun 9, 2020, 21:40 aj wrote: please help with this. Any

Re: Flink Stream job to parquet sink

2020-06-25 Thread aj
…Hello All, I am receiving a set of events in Avro format on different topics. I want to consume these and write to s3 in parquet format. I have written the below job that creates a different stream for each

Re: Flink Stream job to parquet sink

2020-06-25 Thread Rafi Aroch
Sat, Jun 6, 2020 at 12:20 PM aj wrote: Hello All, I am receiving a set of events in Avro format on different topics. I want to consume these and write to s3 in parquet format. I have written the below jo

Re: Flink Stream job to parquet sink

2020-06-24 Thread Arvid Heise
I am receiving a set of events in Avro format on different topics. I want to consume these and write to s3 in parquet format. I have written the below job that creates a different stream for each event and fetches its schema from the confluent schema registry to creat

Re: Flink Stream job to parquet sink

2020-06-22 Thread aj
…cs. I want to consume these and write to s3 in parquet format. I have written the below job that creates a different stream for each event and fetches its schema from the confluent schema registry to create a parquet sink for an event. This is working fine

Re: Flink Stream job to parquet sink

2020-06-09 Thread aj
…rent stream for each event and fetches its schema from the confluent schema registry to create a parquet sink for an event. This is working fine but the only problem I am facing is that whenever a new event starts coming I have to change the YAML config and restart the job every t

Flink Stream job to parquet sink

2020-06-05 Thread aj
Hello All, I am receiving a set of events in Avro format on different topics. I want to consume these and write to s3 in parquet format. I have written the below job, which creates a different stream for each event and fetches its schema from the confluent schema registry to create a parquet sink for
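The pattern discussed in this thread (one Parquet sink per event type, with the Avro schema fetched from the registry) can be sketched with Flink's `StreamingFileSink` bulk-format API. This is an untested sketch, not the poster's code; the bucket path and the way the `Schema` is obtained are placeholders:

```java
// Sketch only: assumes the flink-parquet dependency and an Avro Schema
// already fetched elsewhere (e.g. via the Confluent schema registry client).
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class ParquetSinkSketch {
    // Builds one Parquet sink for a given event type; "my-bucket" is a placeholder.
    public static StreamingFileSink<GenericRecord> sinkFor(Schema schema, String eventName) {
        return StreamingFileSink
                .forBulkFormat(
                        new Path("s3://my-bucket/events/" + eventName),
                        ParquetAvroWriters.forGenericRecord(schema))
                .build();
    }
}
```

Note that with bulk formats the sink rolls files on every checkpoint, so checkpointing must be enabled for data to become visible in S3.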

Re: S3 parquet sink - failed with S3 connection exception

2019-03-14 Thread Averell
Hi Kostas and everyone, I tried to change setFailOnCheckpointingErrors from true to false, and got the following trace in the Flink GUI when the checkpoint/upload failed. Not sure whether it would be of any help in identifying the issue. BTW, could you please tell me where to find the log file t

Re: S3 parquet sink - failed with S3 connection exception

2019-03-10 Thread Averell
Hi Kostas, and everyone, Just an update on my issue: I have tried to: * change the s3-related configuration in Hadoop as suggested by the Hadoop documentation [1]: increased fs.s3a.threads.max from 10 to 100, and fs.s3a.connection.maximum from 15 to 120. For reference, I have only 3 S3 sinks, wi
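The two S3A settings mentioned in the message are Hadoop configuration keys; on a plain Hadoop/EMR setup they can be sketched in core-site.xml like this (values are the ones from the thread):

```xml
<!-- S3A connection-pool tuning referred to above; where these keys go
     (core-site.xml vs. the Flink configuration) depends on the setup. -->
<property>
  <name>fs.s3a.threads.max</name>
  <value>100</value>
</property>
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>120</value>
</property>
```

Raising these limits targets connection-pool exhaustion, which is one common cause of S3 upload failures during checkpoints with multiple parallel sinks.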

Re: S3 parquet sink - failed with S3 connection exception

2019-03-05 Thread Averell
Hello Kostas, Thanks for your time. I started the job fresh, with the checkpoint interval set to 15 minutes. It completed the first 13 checkpoints successfully and only started failing from the 14th. I waited for about 20 more checkpoints, but all failed. Then I cancelled the job, restored from the las

Re: S3 parquet sink - failed with S3 connection exception

2019-03-05 Thread Kostas Kloudas
Hi Averell, Did you have other failures before (from which you managed to resume successfully)? Can you share a bit more details about your job and potentially the TM/JM logs? The only thing I found about this is here https://forums.aws.amazon.com/thread.jspa?threadID=130172 but Flink does not di

S3 parquet sink - failed with S3 connection exception

2019-03-04 Thread Averell
Hello everyone, I have a job which is writing some streams into parquet files in S3. I use Flink 1.7.2 on EMR 5.21. My job had been running well, but suddenly it failed to make a checkpoint with the full stack trace mentioned below. After that failure, the job restarted from the last successful ch

Parquet sink

2016-08-29 Thread Egor Mateshuk
Hi, Does anyone have an example of a working Parquet sink for the Flink DataStream API? In Flink 1.1 a new RollingSink with custom writers was released. According to [FLINK-3637] it was done to allow working with ORC and Parquet formats. Maybe someone has already used it for Parquet? I’ve found