Hi guys,

TextIO can handle tar.gz-type double-compressed files. See the test code below.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;

    // Inside main(String[] args):
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline p = Pipeline.create(options);

    // The .tar.gz file is passed to TextIO directly.
    p.apply("ReadLines", TextIO.read().from("/dataset.tar.gz"))
        // Identity DoFn: passes each line through unchanged.
        .apply(ParDo.of(new DoFn<String, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                c.output(c.element());
            }
        }))
        .apply(TextIO.write().to("/tmp/filout/outputfile"));

    p.run().waitUntilFinish();

Thanks
/Saj

On 16 March 2018 at 04:29, Pablo Estrada <pabl...@google.com> wrote:

> Hi!
> Quick questions:
> - Which SDK are you using?
> - Is this batch or streaming?
>
> As JB mentioned, TextIO is able to work with compressed files that contain
> text. Nothing currently handles the double decompression that I believe
> you're looking for.
> TextIO for Java is also able to "watch" a directory for new files. If
> you're able to (outside of your pipeline) decompress your first zip file
> into a directory that your pipeline is watching, you may be able to use
> that as a workaround. Does that sound like a good thing?
> Finally, if you want to implement a transform that does all your logic,
> well then that sounds like SplittableDoFn material; and in that case,
> someone who knows SDF better can give you guidance (or clarify if my
> suggestions are not correct).
> Best
> -P.
>
> On Thu, Mar 15, 2018, 8:09 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi
>>
>> TextIO supports compressed files. Do you want to read the files as text?
>>
>> Can you describe the use case in a bit more detail?
>>
>> Thanks
>> Regards
>> JB
>> On 15 March 2018 at 18:28, Shirish Jamthe <sjam...@google.com> wrote:
>>>
>>> Hi,
>>>
>>> My input is a tar.gz or .zip file which contains thousands of tar.gz
>>> files and other files.
>>> I would like to extract the tar.gz files from the tar.
>>>
>>> Is there a transform that can do that? I couldn't find one.
>>> If not, is it in the works? Any pointers to start work on it?
>>>
>>> thanks
>>>
>> --
> Got feedback? go/pabloem-feedback
>
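
For the .zip case in the original question, the pre-extraction step Pablo describes (unpacking the outer archive outside the pipeline into a directory the pipeline can read or watch) might look roughly like the sketch below. It uses only the JDK's java.util.zip classes. The class name InnerArchiveExtractor, the method extractInnerTarGz, and the paths in main are hypothetical names chosen for illustration, not part of Beam. An outer tar.gz would need a tar reader from a third-party library, which is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not a Beam API: copies the inner .tar.gz entries
// of an outer .zip archive into a target directory.
final class InnerArchiveExtractor {

    static List<Path> extractInnerTarGz(Path zipFile, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        List<Path> written = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Skip directories and the "other files" that are not tar.gz.
                if (entry.isDirectory() || !entry.getName().endsWith(".tar.gz")) {
                    continue;
                }
                // Keep only the file name so an entry cannot escape targetDir.
                // Assumption: inner file names are unique; duplicates overwrite.
                String fileName = Paths.get(entry.getName()).getFileName().toString();
                Path out = targetDir.resolve(fileName);
                // Files.copy reads until the end of the current zip entry.
                Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                written.add(out);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths, for illustration only.
        List<Path> files = extractInnerTarGz(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Extracted " + files.size() + " inner archives");
    }
}
```

Each extracted inner file is still a gzip-compressed tar, so whether reading it with TextIO gives usable lines depends on what the tar contains.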