Re: Beam with Flink runner - Issues when writing to S3 in Parquet Format

2021-09-15 Thread Kathula, Sandeep
…to S3. But as we are writing to S3 in Parquet format, 5 files once every 5 minutes, it's compressed, and we estimate each file to be around 100-150 MB. We even tried with 6 pods, each with 4 CPU and 64 GB of memory (32 GB going to off-heap for RocksDB), but are still not able…

Re: Beam with Flink runner - Issues when writing to S3 in Parquet Format

2021-09-14 Thread David Morávek
…have a fixed window of 5 minutes after conversion to PCollection and then write to S3. We have around 320 columns in our data. Our intention is to write large files of 128 MB or more so that we won't have a small-file problem when reading back from Hive. But from what we observe…

Re: Beam with Flink runner - Issues when writing to S3 in Parquet Format

2021-09-14 Thread Jan Lukavský
…application which reads from Kafka, converts to Parquet, and writes to S3 with the Flink runner (Beam 2.29 and Flink 1.12). We have a fixed window of 5 minutes after conversion to PCollection and then write to S3. We have around 320 columns in our data. Our intention is to write large files of size 128 MB…

Beam with Flink runner - Issues when writing to S3 in Parquet Format

2021-09-14 Thread Kathula, Sandeep
Hi, We have a simple Beam application which reads from Kafka, converts to Parquet, and writes to S3 with the Flink runner (Beam 2.29 and Flink 1.12). We have a fixed window of 5 minutes after conversion to PCollection and then write to S3. We have around 320 columns in our data. Our intention is…
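A minimal sketch of the write path described in this thread (assumptions, not the poster's code: Beam 2.29 Java SDK, records already converted to Avro GenericRecord with an illustrative schema SCHEMA and bucket path):

    import org.apache.avro.generic.GenericRecord;
    import org.apache.beam.sdk.io.FileIO;
    import org.apache.beam.sdk.io.parquet.ParquetIO;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    PCollection<GenericRecord> records = ...;  // output of the Kafka read + conversion steps
    records
        .apply(Window.<GenericRecord>into(FixedWindows.of(Duration.standardMinutes(5))))
        .apply(FileIO.<GenericRecord>write()
            .via(ParquetIO.sink(SCHEMA))   // from beam-sdks-java-io-parquet
            .to("s3://bucket/output/")     // illustrative path
            .withNumShards(5));            // 5 files per 5-minute window, as described above

ParquetIO.sink(...) expects the collection's coder to be AvroCoder.of(SCHEMA) so Beam can serialize the GenericRecords.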

Re: Writing to S3 parquet files in Blink batch mode. Flink 1.10

2020-06-17 Thread Dmytro Dragan
…Hi Dmytro, yes, batch mode must disable checkpointing, so StreamingFileSink cannot be used in batch mode (StreamingFileSink requires checkpoints regardless of format). We are refactoring it to be more generic so it can…

Re: Writing to S3 parquet files in Blink batch mode. Flink 1.10

2020-06-17 Thread Jingsong Li
Hi Dmytro, yes, batch mode must disable checkpointing, so StreamingFileSink cannot be used in batch mode (StreamingFileSink requires checkpoints regardless of format). We are refactoring it to be more generic so it can be used in batch mode, but this is a future topic. Currently, in batch mode, for the sink, we…

Writing to S3 parquet files in Blink batch mode. Flink 1.10

2020-06-16 Thread Dmytro Dragan
Hi guys, in our use case we are considering writing data to AWS S3 in Parquet format using Blink batch mode. As far as I can see, the valid approach for writing Parquet files is to use StreamingFileSink with the Parquet bulk-encoded format, but based on the documentation and tests it works only with OnCheckpointRollingPolicy…
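For reference, the checkpoint-bound setup being discussed (a sketch assuming Avro GenericRecord input; identifiers like "schema" and the bucket path are illustrative):

    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(new Path("s3://bucket/out"),
                       ParquetAvroWriters.forGenericRecord(schema))
        .build();
    stream.addSink(sink);  // stream: DataStream<GenericRecord>

Bulk-encoded formats like Parquet can only roll files on checkpoint (OnCheckpointRollingPolicy), which is why this sink cannot be used when checkpointing is disabled, as in batch mode.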

Re: AvroParquetWriter issues writing to S3

2020-04-17 Thread Arvid Heise
Hi Diogo, I have seen similar issues before. The root cause is usually that users are not using anything Flink-specific at all, but are going to Hadoop's Parquet writer directly. As you can see in your stack trace, there is not a single reference to any Flink class. The solution usually is to use the respective Flink…

Re: AvroParquetWriter issues writing to S3

2020-04-16 Thread Diogo Santos
Hi Till, it definitely seems to be a strange issue. The first time the job is loaded (with a clean instance of the cluster) it runs fine, but if it is canceled or started again the issue comes back. I built an example here: https://github.com/congd123/flink-s3-example You can generate the artifact o…

Re: AvroParquetWriter issues writing to S3

2020-04-16 Thread Till Rohrmann
For future reference, here is the stack trace in an easier-to-read format:

    Caused by: java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeParserBucket
        at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:825)
        at com.amazonaws.util.DateUtils.parseRFC822Date(DateUtils…

Re: AvroParquetWriter issues writing to S3

2020-04-16 Thread Till Rohrmann
Hi Diogo, thanks for reporting this issue. It looks quite strange, to be honest. flink-s3-fs-hadoop-1.10.0.jar contains the DateTimeParserBucket class. So either this class wasn't loaded when starting the application from scratch, or there could be a problem with the plugin mechanism on restarts. I'…
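For reference, the plugin layout documented for Flink 1.10 (a sketch of the documented setup, not a confirmed fix for this particular issue): the filesystem jar must live in its own directory under plugins/, not on the main classpath, e.g.

    mkdir -p $FLINK_HOME/plugins/s3-fs-hadoop
    cp $FLINK_HOME/opt/flink-s3-fs-hadoop-1.10.0.jar $FLINK_HOME/plugins/s3-fs-hadoop/

Plugins are loaded through isolated classloaders, which is what makes restart-only class-loading failures plausible when the jar is placed on the classpath instead.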

AvroParquetWriter issues writing to S3

2020-04-15 Thread Diogo Santos
Hi guys, I'm using AvroParquetWriter to write Parquet files into S3, and when I set up the cluster (starting fresh jobmanager/taskmanager instances, etc.) the scheduled job starts executing without problems and can write the files into S3, but if the job is canceled and started again the job throws t…

Re: Writing to S3

2018-11-15 Thread Steve Bistline
Hi Ken, Thank you for the link... I had just found this, and when I removed the Hadoop dependencies (not used in this project anyway) things worked fine. Now just trying to figure out the credentials. Thanks, Steve
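On the credentials question: with the self-contained flink-s3-fs-hadoop / flink-s3-fs-presto filesystems, one documented option (a sketch with placeholder values, not Steve's eventual solution) is to put the keys into flink-conf.yaml:

    s3.access-key: YOUR_ACCESS_KEY
    s3.secret-key: YOUR_SECRET_KEY

When running on AWS itself, IAM roles avoid embedding keys in configuration at all.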

Re: Writing to S3

2018-11-15 Thread Ken Krugler
Hi Steve, This looks similar to https://stackoverflow.com/questions/52009823/flink-shaded-hadoop-s3-filesystems-still-requires-hdfs-default-and-hdfs-site-con I see th…

Writing to S3

2018-11-15 Thread Steve Bistline
I am trying to write out to S3 from Flink with the following code and am getting the error below. Tried adding the parser as a dependency, etc. Any help would be appreciated. Table result = tableEnv.sql("SELECT 'Alert ', t_sKey, t_deviceID, t_sValue FROM SENSORS WHERE t_sKey='TEMPERATURE' AND t_sValue…

Adding headers to tuples before writing to S3

2017-10-23 Thread ShB
Hi, I'm working with Flink for data analytics and reporting. The use case is that, when a user requests a report, a Flink cluster does some computations on the data, generates the final report (a DataSet of tuples), and uploads the report to S3, after which an email is sent to the corresponding email…
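One way to prepend a header (a sketch, assuming the tuples have already been formatted as CSV strings; HeaderTextOutputFormat is an illustrative name, not a Flink class) is an output format that writes the header line when each output file is opened:

    import org.apache.flink.api.java.io.TextOutputFormat;
    import org.apache.flink.core.fs.Path;
    import java.io.IOException;

    public class HeaderTextOutputFormat extends TextOutputFormat<String> {
        private final String header;

        public HeaderTextOutputFormat(Path outputPath, String header) {
            super(outputPath);
            this.header = header;
        }

        @Override
        public void open(int taskNumber, int numTasks) throws IOException {
            super.open(taskNumber, numTasks);
            writeRecord(header);  // header becomes the first line of each file
        }
    }

    // usage:
    // csvLines.output(new HeaderTextOutputFormat(new Path("s3://bucket/report.csv"), "col1,col2"))
    //         .setParallelism(1);

With parallelism greater than 1 each output file gets its own header, so for a single report file the sink is typically run with setParallelism(1).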

Re: NPE while writing to s3://

2017-03-02 Thread Till Rohrmann
Hi Sathi, which version of Flink are you using? Since Flink 1.2 the RollingSink is deprecated; it is now recommended to use the BucketingSink. Maybe this problem is resolved with the newer sink. Cheers, Till

NPE while writing to s3://

2017-03-02 Thread Sathi Chowdhury
I get the NPE from the code below. I am running this from my Mac in a local Flink cluster. RollingSink s3Sink = new RollingSink("s3://sc-sink1/"); s3Sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd--HHmm")); s3Sink.setWriter(new StringWriter()); s3Sink.setBatchSize(200); s3Sink.setPendingPrefix(…
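For reference, the equivalent setup with the BucketingSink that Till recommends above (a sketch mirroring the names in the snippet, untested against this NPE):

    import org.apache.flink.streaming.connectors.fs.StringWriter;
    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
    import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

    BucketingSink<String> s3Sink = new BucketingSink<>("s3://sc-sink1/");
    s3Sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HHmm"));
    s3Sink.setWriter(new StringWriter<>());
    s3Sink.setBatchSize(200);

Note that the BucketingSink variants live in org.apache.flink.streaming.connectors.fs.bucketing, whereas RollingSink's DateTimeBucketer is in org.apache.flink.streaming.connectors.fs.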

Re: Reading and Writing to S3

2017-01-11 Thread M. Dale
…site.xml. I have also tried saving the core-site.xml in the src/main/resources folder but get the same errors. I want to know if my core-site.xml file is configured correctly for using s3a, and how to have IntelliJ read the core-site.xml file? Also, are the core-site.xml configurations different…

Re: Reading and Writing to S3

2017-01-11 Thread Samra Kasim
…wrote: Hi, I am new to Flink and I've written two small test projects: 1) to read data from S3 and 2) to push data to S3. However, I am getting two different errors for the projects relating to, I think, how the core-site.xml file is being read. I am…

Re: Reading and Writing to S3

2017-01-11 Thread M. Dale
…read the core-site.xml file? Also, are the core-site.xml configurations different for reading versus writing to S3? This is my code for reading data from S3: public class DesktopWriter { public static void main(String[] args) throws Exception { ExecutionEnvironment env = ExecutionEnvironment…

Re: Reading and Writing to S3

2017-01-11 Thread Samra Kasim
…have the environment variable in run configurations set to HADOOP_HOME=path/to/dir-with-core-site.xml. I have also tried saving the core-site.xml in the src/main/resources folder but get the same errors. I want to know if my core-site.xml file is configured correctly for using s3a…

Re: Reading and Writing to S3

2017-01-10 Thread M. Dale
…file is configured correctly for using s3a and how to have IntelliJ read the core-site.xml file? Also, are the core-site.xml configurations different for reading versus writing to S3? This is my code for reading data from S3: public class DesktopWriter { public static void main(String[] args) t…

Reading and Writing to S3

2017-01-10 Thread Samra Kasim
…IntelliJ read the core-site.xml file? Also, are the core-site.xml configurations different for reading versus writing to S3? This is my code for reading data from S3: public class DesktopWriter { public static void main(String[] args) throws Exception { ExecutionEnvironment…
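For reference, a minimal core-site.xml for s3a (a sketch with placeholder credentials, not the poster's file):

    <configuration>
      <property>
        <name>fs.s3a.impl</name>
        <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
      </property>
      <property>
        <name>fs.s3a.access.key</name>
        <value>YOUR_ACCESS_KEY</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>YOUR_SECRET_KEY</value>
      </property>
    </configuration>

Flink of that era located the Hadoop configuration via fs.hdfs.hadoopconf in flink-conf.yaml (pointing at the directory containing core-site.xml) rather than via files on the application classpath, which would explain why src/main/resources is not picked up.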