Re: Reading csv-files in parallel

2018-05-09 Thread Fabian Hueske
Hi, the Table API / SQL and the DataSet API can be used

RE: Reading csv-files in parallel

2018-05-09 Thread Esa Heikkinen
Hi, sorry for the stupid question, but how do I connect readTextFile (or readCsvFile), a MapFunction, and SQL together in Scala code? Best, Esa
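The wiring being asked about here — read lines as plain text, map each line into a structured record, then run SQL over the records — can be illustrated outside Flink with Python's standard library (an in-memory sqlite3 table stands in for the Table API / SQL layer; the sample lines, column names, and the `readings` table are hypothetical):

```python
import csv
import sqlite3

# Hypothetical sample data standing in for a CSV file read line by line
# (in Flink this would come from readTextFile / readCsvFile).
lines = [
    "2018-05-08 10:00:00,sensor1,21.5",
    "2018-05-08 10:00:05,sensor2,19.0",
    "2018-05-08 10:00:10,sensor1,22.0",
]

# "MapFunction" step: parse each text line into a structured record.
def parse(line):
    ts, sensor, temp = next(csv.reader([line]))
    return (ts, sensor, float(temp))

records = [parse(l) for l in lines]

# "SQL" step: register the records as a table and query them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, sensor TEXT, temp REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
rows = conn.execute(
    "SELECT sensor, MAX(temp) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)  # [('sensor1', 22.0), ('sensor2', 19.0)]
```

In Flink the same three stages would be a source, a MapFunction, and a table registration, but the data flow — text line, parsed tuple, SQL query — is the same.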

Re: Reading csv-files in parallel

2018-05-08 Thread Fabian Hueske
> I did mean: if I want to read many csv-files and I have a certain consecutive reading order for them, is that possible, and how? Actually, I want to implement upper-level (state-machine-based) logic for reading csv-files in a certain order. Esa

RE: Reading csv-files in parallel

2018-05-08 Thread Esa Heikkinen
(state-machine-based) logic for reading csv-files in a certain order. Esa > Hi, the easiest approach is to read the CSV files linewise as regular text files
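The "state-machine-based logic for reading csv-files in a certain order" can be sketched, independently of Flink, as a driver whose state is the index of the file currently being read: it fully drains one file before transitioning to the next. The file names and contents below are hypothetical placeholders:

```python
import csv
import os
import tempfile

def read_in_order(paths):
    """Yield parsed CSV records from each file, fully draining one
    file before advancing to the next -- a minimal 'state machine'
    whose state is which file is currently being read."""
    for path in paths:                 # state transition: next file
        with open(path, newline="") as f:
            for row in csv.reader(f):  # drain the current file completely
                yield row

# Hypothetical files standing in for the csv inputs.
tmp = tempfile.mkdtemp()
order = []
for name, body in [("a.csv", "1,x\n2,y\n"), ("b.csv", "3,z\n")]:
    path = os.path.join(tmp, name)
    with open(path, "w") as f:
        f.write(body)
    order.append(path)

rows = list(read_in_order(order))
print(rows)  # [['1', 'x'], ['2', 'y'], ['3', 'z']]
```

A richer state machine could decide the next file dynamically (e.g., based on the records just seen) instead of following a fixed list.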

Re: Reading csv-files in parallel

2018-05-08 Thread Fabian Hueske
t; > > > Best, Esa > > > > *From:* Fabian Hueske > *Sent:* Monday, May 7, 2018 3:48 PM > *To:* Esa Heikkinen > *Cc:* user@flink.apache.org > *Subject:* Re: Reading csv-files in parallel > > > > Hi Esa, > > you can certainly read CSV files in p

RE: Reading csv-files in parallel

2018-05-08 Thread Esa Heikkinen
> Hi Esa, you can certainly read CSV files in parallel. This works very well in a batch query. For streaming queries that expect data to be ingested in timestamp order, this is much more challenging, because you

Re: Reading csv-files in parallel

2018-05-07 Thread Fabian Hueske
Hi Esa, you can certainly read CSV files in parallel. This works very well in a batch query. For streaming queries that expect data to be ingested in timestamp order, this is much more challenging, because you need to 1) read the files in the right order and 2) avoid splitting files (unless you guarantee
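The constraint described here — each file is internally sorted by timestamp and must not be split, yet the overall stream must be in increasing timestamp order — is the classic k-way merge problem. A Flink-independent sketch using Python's `heapq.merge` (the per-file record lists are hypothetical):

```python
import heapq

# Hypothetical per-file record streams, each already sorted by the
# timestamp in the first field (each file is read whole, never split).
file_a = [(1, "a1"), (4, "a2"), (9, "a3")]
file_b = [(2, "b1"), (3, "b2"), (8, "b3")]
file_c = [(5, "c1"), (7, "c2")]

# Merge the sorted streams into one globally timestamp-ordered stream.
merged = list(heapq.merge(file_a, file_b, file_c, key=lambda r: r[0]))
timestamps = [ts for ts, _ in merged]
print(timestamps)  # [1, 2, 3, 4, 5, 7, 8, 9]
```

The merge only holds one record per input in memory, so it scales to many large files as long as each file is individually sorted.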

Reading csv-files in parallel

2018-05-07 Thread Esa Heikkinen
Hi, I would like to read many different types of csv-files (time series data) in parallel using CsvTableSource. Is that possible in a Flink application? If yes, do examples of that exist? If not, do you have any advice on how to do it? Should I combine all csv-files into one csv-

Re: Reading csv-files

2018-03-01 Thread Fabian Hueske
Hi Esa, IMO, the easiest approach would be to implement a custom source function that reads the CSV files linewise

RE: Reading csv-files

2018-03-01 Thread Esa Heikkinen
Hi, should the custom source function be written in Java rather than Scala, like in that RideCleansing exercise? Best, Esa

Re: Reading csv-files

2018-03-01 Thread Fabian Hueske
Yes, that is mostly correct. You can of course read files in parallel, assign watermarks, and obtain a DataStream with correct timestamps and watermarks.

RE: Reading csv-files

2018-02-28 Thread Esa Heikkinen
do not know better. I also tried Spark, but it had its own problems; for example, CEP is not as good in Spark as it is in Flink. Best, Esa

Re: Reading csv-files

2018-02-27 Thread Fabian Hueske
Yes, that is mostly correct. You can of course read files in parallel, assign watermarks, and obtain a DataStream with correct timestamps and watermarks. If you do that, you should ensure that each parallel source task reads the files in the order of increasing timestamps. As I said before, you ca

Re: Reading csv-files

2018-02-27 Thread Esa Heikkinen
Hi, thanks for the answer. All csv-files are already present and they will not change during the processing. Because Flink can read many streams in parallel, I think it is also possible to read many csv-files in parallel. From what I understand, it is possible to convert csv-files to

Re: Reading csv-files

2018-02-27 Thread Fabian Hueske
Hi Esa, reading records from files with timestamps that need watermarks can be tricky. If you are aware of Flink's watermark mechanism, you know that records should be ingested in (roughly) increasing timestamp order. This means that files usually cannot be split (i.e., need to be read by a single
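"(Roughly) increasing timestamp order" is commonly handled with a bounded-out-of-orderness watermark: the watermark trails the largest timestamp seen so far by a fixed allowed delay, so mildly late records still arrive before the watermark passes them. A Flink-independent sketch (the delay value and event timestamps are hypothetical):

```python
class BoundedOutOfOrderness:
    """Watermark generator: the watermark trails the largest timestamp
    seen so far by a fixed allowed delay (mirrors the common Flink
    strategy for 'roughly increasing' event times)."""
    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.max_ts = float("-inf")

    def on_event(self, ts):
        self.max_ts = max(self.max_ts, ts)

    def watermark(self):
        return self.max_ts - self.max_delay

gen = BoundedOutOfOrderness(max_delay=5)
for ts in [10, 12, 11, 20, 18]:  # roughly increasing, with a little jitter
    gen.on_event(ts)
print(gen.watermark())  # 15
```

A larger delay tolerates more disorder but delays results; records older than the current watermark are treated as late.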

Reading csv-files

2018-02-27 Thread Esa Heikkinen
I'd like to read csv-files which include time series data, where one column is a timestamp. Is it better to use addSource() (like in the data Artisans RideCleansing exercise) or CsvTableSource()? I am not sure CsvTableSource() can understand timestamps. I have not found good examples of that.

Re: Apache Flink Reading CSV Files, Transform and Writing Back to CSV Using Parallelism

2017-08-25 Thread Lokesh Gowda
Hi Robert, my question was: if I need to read and write a csv file whose size will be in GB, how can I distribute the data sink to write into files of exactly 1 GB? Since I am new to Flink, I am not sure about this. Regards, Lokesh.r

Re: Apache Flink Reading CSV Files, Transform and Writing Back to CSV Using Parallelism

2017-08-25 Thread Robert Metzger
Hi Lokesh, I'm not sure if I fully understood your question, but you cannot write the result to a single file from multiple writers. If you want to process the data fully distributed, you'll also have to write it distributed.
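The "write it distributed" pattern means each parallel sink task owns exactly one output file, typically named `part-N`, rather than several writers appending to one file. A Flink-independent sketch of that layout (directory, partitioning scheme, and records are hypothetical):

```python
import os
import tempfile

def write_distributed(records, out_dir, parallelism):
    """Partition records across `parallelism` writers; each writer
    owns exactly one part file (no two writers share a file)."""
    os.makedirs(out_dir, exist_ok=True)
    partitions = [[] for _ in range(parallelism)]
    for i, rec in enumerate(records):
        partitions[i % parallelism].append(rec)  # round-robin split
    paths = []
    for task, part in enumerate(partitions):
        path = os.path.join(out_dir, f"part-{task:05d}.csv")
        with open(path, "w") as f:
            f.write("\n".join(part) + "\n")
        paths.append(path)
    return paths

out = tempfile.mkdtemp()
paths = write_distributed(["a;1", "b;2", "c;3", "d;4"], out, parallelism=2)
print([os.path.basename(p) for p in paths])  # ['part-00000.csv', 'part-00001.csv']
```

Exact 1 GB part files are not something a sink can promise directly; in practice you either roll files at a size threshold or concatenate the part files afterwards.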

Apache Flink Reading CSV Files, Transform and Writing Back to CSV Using Parallelism

2017-08-23 Thread Lokesh R
Hi Team, I am using Apache Flink with Java for the problem statement below: 1. read a csv file with the field delimiter character ';' 2. transform the fields 3. write the data back to csv. My doubts are as below: 1. if I need to read a csv file of size above 50 GB, what would be the app