I have the same situation. If CSV is splittable, we could use SDF.
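For what it's worth, the "splittable" property discussed below boils down to this: a reader can start at an arbitrary byte offset and resynchronize on the next record boundary, so each worker processes only its assigned byte range. Here is a minimal stdlib-only sketch of that idea for a plain-text CSV (illustrative only — this is not Beam's actual TextIO implementation; the class and method names are made up for the example):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplittableCsvDemo {

    // Return the records of a newline-delimited file that fall in [start, end).
    // Convention: a record belongs to the split where it STARTS, so splitting
    // at an arbitrary byte boundary never loses or duplicates a record.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start > 0) {
            // We may have landed mid-record: skip forward to the next newline.
            while (pos < data.length && data[pos - 1] != '\n') pos++;
        }
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            records.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step over the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] csv = "id,val\n1,a\n2,b\n3,c\n".getBytes(StandardCharsets.UTF_8);
        // Cut the file at byte 9 (mid-record): each split still yields whole records.
        System.out.println(readSplit(csv, 0, 9));          // [id,val, 1,a]
        System.out.println(readSplit(csv, 9, csv.length)); // [2,b, 3,c]
    }
}
```

This is exactly why a gzipped CSV is not splittable: you cannot start decompressing from a mid-file byte offset, so the whole file has to be consumed as one unit by one reader.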

On Mon, Jul 23, 2018 at 1:38 PM Raghu Angadi <[email protected]> wrote:

> It might be simpler to discuss if you replicate the question here.
>
> Are your CSV files splittable? If they are not, the Flink/Dataflow runners
> would also load the entire file into memory. This is a streaming application, right?
> MatchAll in FileIO.java is used in TextIO, AvroIO etc to read files
> continuously in streaming applications. It is built on SDF and allows
> reading smaller chunks of the file (as long as the file is splittable).
>
> Raghu.
>
>
> On Mon, Jul 23, 2018 at 7:16 AM Andrew Pilloud <[email protected]>
> wrote:
>
>> Hi Kelsey,
>>
>> I posted a reply on stackoverflow. It sounds like you might be using the
>> DirectRunner, which isn't meant to handle datasets that are too big to fit
>> into memory. If that is the case, have you tried the Flink local runner or
>> the Dataflow runner?
>>
>> Andrew
>>
>> On Mon, Jul 23, 2018 at 4:06 AM Kelsey RIDER <
>> [email protected]> wrote:
>>
>>> Hello,
>>>
>>>
>>>
>>> SO question here:
>>> https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam
>>>
>>> Anybody have any ideas? Am I missing something?
>>>
>>>
>>>
>>> Thanks
>>> Following changes to working-time regulations, if you receive this email
>>> before 7:00 AM, in the evening, during the weekend, or during your leave,
>>> please do not process it or reply to it immediately, except in cases of
>>> exceptional urgency.
>>>
>>
