I have the same situation. If CSV is splittable, we could use SDF.

On Mon, Jul 23, 2018 at 1:38 PM Raghu Angadi <[email protected]> wrote:
> It might be simpler to discuss if you replicate the question here.
>
> Are your CSV files splittable? Otherwise, Flink/Dataflow runners would have to load the entire file into memory. This is a streaming application, right? MatchAll in FileIO.java is used in TextIO, AvroIO, etc. to read files continuously in streaming applications. It is built on SDF and allows reading smaller chunks of the file (as long as the file is splittable).
>
> Raghu.
>
> On Mon, Jul 23, 2018 at 7:16 AM Andrew Pilloud <[email protected]> wrote:
>
>> Hi Kelsey,
>>
>> I posted a reply on Stack Overflow. It sounds like you might be using the DirectRunner, which isn't meant to handle datasets that are too big to fit into memory. If that is the case, have you tried the Flink local runner or the Dataflow runner?
>>
>> Andrew
>>
>> On Mon, Jul 23, 2018 at 4:06 AM Kelsey RIDER <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> SO question here: https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam
>>>
>>> Anybody have any ideas? Am I missing something?
>>>
>>> Thanks
>>>
>>> Following changes to working-time regulations, if you receive this email before 7:00 AM, in the evening, during the weekend, or during your holidays, please do not process or reply to it immediately, except in cases of exceptional urgency.
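For readers following the thread: "splittable" here means the file can be processed as independent byte ranges, which is how SDF-based readers (TextIO, the FileIO.MatchAll-based readers mentioned above) avoid loading the whole file into memory. The sketch below is plain Python, not Beam; the file, split size, and function name are made up for illustration. It shows the standard trick for splitting a newline-delimited file: a reader skips the (possibly partial) line its range starts in, and reads past the end of its range to finish the last line it started, so every line is emitted by exactly one split.

```python
import os
import tempfile

def read_split(path, start, end):
    """Return the lines of `path` owned by the byte range (start, end]."""
    lines = []
    with open(path, "rb") as f:
        f.seek(start)
        if start != 0:
            f.readline()  # skip the line containing `start`; the previous split owns it
        # Read every line whose first byte is at or before `end`;
        # the last line may extend past `end`, which is fine.
        while f.tell() <= end:
            line = f.readline()
            if not line:
                break  # end of file
            lines.append(line.decode("utf-8").rstrip("\n"))
    return lines

# Demo: carve a small CSV into fixed-size byte ranges and check that the
# union of all splits reproduces the file with no line read twice.
rows = [f"id{i},value{i}" for i in range(100)]
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("\n".join(rows) + "\n")
    path = tmp.name

size = os.path.getsize(path)
split_size = 257  # deliberately not aligned to line boundaries
all_lines = []
for start in range(0, size, split_size):
    all_lines.extend(read_split(path, start, min(start + split_size, size)))

assert all_lines == rows  # every row recovered exactly once, in order
os.unlink(path)
```

This only works when records are delimited by newlines and the format is not compressed with a non-splittable codec (e.g. gzip); a gzipped CSV cannot be split this way, which is one reason a runner may fall back to reading the whole file.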
