Hi

Sorry for the stupid question, but how do I connect readTextFile (or readCsvFile), a MapFunction, and SQL together in Scala code?

Best, Esa

> From: Fabian Hueske
> Sent: Tuesday, May 8, 2018 10:26 PM
> To: Esa Heikkinen
> Cc: user@flink.apache.org
> Subject: Re: Reading csv-files in parallel
>
> Hi,
>
> the Table API / SQL and the DataSet API can be used …
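A minimal sketch of how those three pieces could fit together, assuming the 2018-era Flink Scala APIs. The parsing step is plain Scala and runs on its own; the Flink wiring around it is shown as a comment, and the file path, field names, and `Event` type are made up for illustration:

```scala
// The record type the CSV lines are parsed into (illustrative).
case class Event(ts: Long, device: String, value: Double)

// The "MapFunction" part: turn one CSV line into a typed record.
def parseLine(line: String): Event = {
  val parts = line.split(",").map(_.trim)
  Event(parts(0).toLong, parts(1), parts(2).toDouble)
}

// Hypothetical Flink wiring (needs flink-streaming-scala and
// flink-table on the classpath; path and table name are made up):
//
//   val env  = StreamExecutionEnvironment.getExecutionEnvironment
//   val tEnv = TableEnvironment.getTableEnvironment(env)
//   val events = env.readTextFile("data/events.csv").map(parseLine _)
//   tEnv.registerDataStream("events", events, 'ts, 'device, 'value)
//   val result = tEnv.sqlQuery(
//     "SELECT device, COUNT(*) FROM events GROUP BY device")
```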
… (state-machine-based) logic for reading csv-files in a certain order.

Esa

> From: Fabian Hueske
> Sent: Tuesday, May 8, 2018 2:00 PM
> To: Esa Heikkinen
> Cc: user@flink.apache.org
> Subject: Re: Reading csv-files in parallel
>
> Hi,
>
> the easiest approach is to read the CSV files linewise as regular text files …
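One way to sketch the "read linewise, in a certain order" idea in plain Scala: assuming each individual file is already sorted by its first column (the timestamp — field layout made up for illustration), a k-way merge over per-file iterators always emits the globally smallest timestamp next:

```scala
// Timestamp is assumed to be the first CSV column (illustrative layout).
def parseTs(line: String): Long = line.split(",")(0).trim.toLong

// K-way merge over per-file line iterators: repeatedly pick the file
// whose next line has the smallest timestamp. Each file is read
// linewise and never split.
def mergeByTimestamp(files: Seq[Seq[String]]): Seq[String] = {
  val its = files.map(_.iterator.buffered)
  val out = scala.collection.mutable.ArrayBuffer.empty[String]
  while (its.exists(_.hasNext)) {
    val next = its.filter(_.hasNext).minBy(it => parseTs(it.head))
    out += next.next()
  }
  out.toSeq
}
```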
Best, Esa

> From: Fabian Hueske
> Sent: Monday, May 7, 2018 3:48 PM
> To: Esa Heikkinen
> Cc: user@flink.apache.org
> Subject: Re: Reading csv-files in parallel
>
> Hi Esa,
>
> you can certainly read CSV files in parallel. This works very well in a batch query.
> For streaming queries that expect data to be ingested in timestamp order, this is much more challenging, because you need to 1) read the files in the right order and 2) avoid splitting files (unless you guarantee …
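The two constraints above can be sketched in plain Scala: whole files (never split) are distributed across parallel reader tasks, and each task processes its own files in increasing order of their start timestamp. File names, timestamps, and the hash-based assignment are made up for illustration:

```scala
// A file plus the timestamp of its first record (illustrative).
case class CsvFile(path: String, startTs: Long)

// Assign each whole file to exactly one of `parallelism` tasks, and
// sort every task's files by start timestamp so each task reads its
// files in the right order.
def assignToTasks(files: Seq[CsvFile], parallelism: Int): Map[Int, Seq[CsvFile]] =
  files
    .groupBy(f => Math.floorMod(f.path.hashCode, parallelism)) // file -> one task
    .map { case (task, fs) => task -> fs.sortBy(_.startTs) }   // per-task order
```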
Hi

Should the custom source function be written in Java rather than Scala, like in that RideCleansing exercise?

Best, Esa

> From: Fabian Hueske [mailto:fhue...@gmail.com]
> Sent: Thursday, March 1, 2018 11:23 AM
> To: Esa Heikkinen
> Cc: user@flink.apache.org
> Subject: Re: Reading csv-files
>
> Hi Esa,
>
> IMO, the easiest approach would be to implement a custom source function that reads the CSV files linewise …
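A custom source function can be written in Scala just as well as Java. Below, Flink's interface is emulated with a plain callback so the sketch runs without Flink on the classpath; in real code `collect` would be `SourceContext.collect` and the class would extend `SourceFunction[Row]` (the `Row` type and file layout here are made up):

```scala
// Illustrative record: timestamp plus the raw CSV line.
case class Row(ts: Long, line: String)

// Emulated source: reads each file linewise, in file order, and emits
// parsed rows through `collect` until cancelled. `running` mirrors the
// usual cancellation flag of a Flink SourceFunction.
class CsvFilesSource(files: Seq[Seq[String]]) {
  @volatile private var running = true
  def cancel(): Unit = running = false
  def run(collect: Row => Unit): Unit =
    for (file <- files; line <- file if running)
      collect(Row(line.split(",")(0).trim.toLong, line))
}
```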
… do not know better. I also tried Spark, but it also had its own problems. For example, CEP is not as good in Spark as it is in Flink.

Best, Esa

> From: Fabian Hueske [mailto:fhue...@gmail.com]
> Sent: Tuesday, February 27, 2018 11:27 PM
> To: Esa Heikkinen
> Cc: user@flink.apache.org
> Subject: Re: Reading csv-files
>
> Yes, that is mostly correct.
> You can of course read files in parallel, assign watermarks, and obtain a DataStream with correct timestamps and watermarks.
> If you do that, you should ensure that each parallel source task reads the files in the order of increasing timestamps.
> As I said before, you can …
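The watermark-assignment step Fabian mentions usually follows the bounded-out-of-orderness idea: the watermark trails the highest timestamp seen so far by a fixed bound, so records may be at most `bound` ms late. In Flink this is what `BoundedOutOfOrdernessTimestampExtractor` implements; here is the same logic as a plain-Scala sketch:

```scala
// Tracks the maximum timestamp seen and derives the watermark from it.
// `bound` is the tolerated out-of-orderness in the timestamp's unit.
class BoundedLatenessWatermark(bound: Long) {
  private var maxTs = Long.MinValue
  // Feed one record's timestamp; returns the current watermark.
  def onRecord(ts: Long): Long = {
    maxTs = math.max(maxTs, ts)
    maxTs - bound
  }
}
```

Note that the watermark never moves backwards: a late record (below `maxTs`) leaves it unchanged, which is why the source tasks must read files in roughly increasing timestamp order.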
Hi

Thanks for the answer. All csv-files are already present and they will not change during the processing.

Because Flink can read many streams in parallel, I think it is also possible to read many csv-files in parallel.

From what I have understood, it is possible to convert csv-files to …
Hi Esa,

Reading records from files with timestamps that need watermarks can be tricky.

If you are aware of Flink's watermark mechanism, you know that records should be ingested in (roughly) increasing timestamp order.

This means that files usually cannot be split (i.e., they need to be read by a single …