Are you trying to read a growing file? I don't think this scenario is well
supported. You can use FileIO.MatchAll.continuously() if you want to read a
growing list of files (where new files get added to a given directory).

If you are reading a large but fixed set of files, then what you need is a
bounded source, not an unbounded source. We do not have a predefined source
for reading CSV files with multi-line records (unless you can identify a
record delimiter and use TextIO with the withDelimiter() option). So I'd
suggest using FileIO.match() or FileIO.matchAll() and a custom ParDo to
read records.
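
As a rough illustration of the record-reading part of such a ParDo, here is
a minimal sketch in plain Java. The class and method names
(MultiLineCsvRecords, nextRecord) are hypothetical and not part of Beam. It
assumes that fields containing line breaks are wrapped in double quotes and
that embedded quotes are escaped by doubling them. The Beam wiring itself is
left out.

```java
import java.io.BufferedReader;
import java.io.IOException;

/** Hypothetical helper (not a Beam class): streams multi-line CSV records. */
final class MultiLineCsvRecords {

    private MultiLineCsvRecords() {}

    /**
     * Returns the next logical record without its trailing line break, or
     * null at end of input. Only the current record is held in memory.
     * Assumption: a line break inside a record only occurs within a
     * double-quoted field, and a literal quote is written as "".
     */
    static String nextRecord(BufferedReader in) throws IOException {
        StringBuilder record = new StringBuilder();
        boolean inQuotes = false;
        boolean first = true;
        String line;
        while ((line = in.readLine()) != null) {
            if (!first) {
                // readLine() strips the line break, so put one back
                // (this normalizes \r\n inside quoted fields to \n).
                record.append('\n');
            }
            first = false;
            record.append(line);
            // Every quote character toggles the state; an escaped quote ("")
            // toggles twice and therefore leaves the state unchanged.
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '"') {
                    inQuotes = !inQuotes;
                }
            }
            if (!inQuotes) {
                return record.toString();
            }
        }
        // End of input: return a trailing unterminated record, if any.
        return first ? null : record.toString();
    }
}
```

Inside the DoFn you would probably wrap the channel returned by
FileIO.ReadableFile.open() in a BufferedReader (for example via
java.nio.channels.Channels.newInputStream) and output one element per
call until nextRecord returns null. That way the file is read
sequentially and is never loaded into memory as a whole.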

Thanks,
Cham



On Mon, Jul 23, 2018 at 11:28 PM Kai Jiang <[email protected]> wrote:

> I have the same situation. If CSV is splittable, we could use SDF.
>
> On Mon, Jul 23, 2018 at 1:38 PM Raghu Angadi <[email protected]> wrote:
>
>> It might be simpler to discuss if you replicate the question here.
>>
>> Are your CSV files splittable? Either way, Flink/Dataflow runners would not
>> load the entire file into memory. This is a streaming application, right?
>> MatchAll in FileIO.java is used in TextIO, AvroIO, etc. to read files
>> continuously in streaming applications. It is built on SDF and allows
>> reading smaller chunks of the file (as long as the file is splittable).
>>
>> Raghu.
>>
>>
>> On Mon, Jul 23, 2018 at 7:16 AM Andrew Pilloud <[email protected]>
>> wrote:
>>
>>> Hi Kelsey,
>>>
>>> I posted a reply on Stack Overflow. It sounds like you might be using the
>>> DirectRunner, which isn't meant to handle datasets that are too big to fit
>>> into memory. If that is the case, have you tried the Flink local runner or
>>> the Dataflow runner?
>>>
>>> Andrew
>>>
>>> On Mon, Jul 23, 2018 at 4:06 AM Kelsey RIDER <
>>> [email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>>
>>>>
>>>> SO question here:
>>>> https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam
>>>>
>>>> Anybody have any ideas? Am I missing something?
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Following changes in labor regulations, if you receive this email
>>>> before 7:00 a.m., in the evening, over the weekend, or during your
>>>> vacation, please do not process or reply to it immediately, except in
>>>> an exceptional emergency.
>>>>
>>>
