date:20221206

Re: [PySpark] Reader/Writer for bgzipped data

2022-12-06 Thread Oliver Ruebenacker

Thank you, yes, it would be great if this could be extended to use an index. In our case, we're reading files from Amazon S3. S3 does offer the option to request only a chunk out of a file, and any efficient solution would need to use this rather than downloading the file multiple times. On
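
To illustrate the chunked-read idea, here is a minimal sketch of fetching one byte range of an S3 object instead of the whole file. The helper names `byte_range` and `fetch_chunk` are hypothetical. The client is assumed to be a boto3 S3 client, whose `get_object` call accepts a `Range` argument in HTTP Range syntax. Where the offsets come from (for example, an index of block starts) is not covered here.

```python
def byte_range(start: int, length: int) -> str:
    """Build an HTTP Range header value for `length` bytes beginning at `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={start}-{start + length - 1}"


def fetch_chunk(s3_client, bucket: str, key: str, start: int, length: int) -> bytes:
    """Download only the requested slice of an object.

    `s3_client` is assumed to behave like boto3.client("s3").
    """
    response = s3_client.get_object(
        Bucket=bucket,
        Key=key,
        Range=byte_range(start, length),
    )
    return response["Body"].read()
```

With a real client this would be called as `fetch_chunk(boto3.client("s3"), bucket, key, start, length)`, so each task pulls only the bytes it needs.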

Re: [PySpark] Reader/Writer for bgzipped data

2022-12-06 Thread Holden Karau

Take a look at https://github.com/nielsbasjes/splittablegzip :D

On Tue, Dec 6, 2022 at 7:46 AM Oliver Ruebenacker <oliv...@broadinstitute.org> wrote:
> Hello Holden,
>
> Thank you for the response, but what is "splittable gzip"?
>
> Best, Oliver
>
> On Tue, Dec 6, 2022 at 9:22 AM H

Re: [PySpark] Reader/Writer for bgzipped data

2022-12-06 Thread Oliver Ruebenacker

Hello Holden,

Thank you for the response, but what is "splittable gzip"?

Best, Oliver

On Tue, Dec 6, 2022 at 9:22 AM Holden Karau wrote:
> There is the splittable gzip Hadoop input format; maybe someone could
> extend that to support bgzip?
>
> On Tue, Dec 6, 2022 at 1:43 PM Ol

Re: [PySpark] Reader/Writer for bgzipped data

2022-12-06 Thread Holden Karau

There is the splittable gzip Hadoop input format; maybe someone could extend that to support bgzip?

On Tue, Dec 6, 2022 at 1:43 PM Oliver Ruebenacker <oliv...@broadinstitute.org> wrote:
> Hello Chris,
>
> Yes, you can use gunzip/gzip to uncompress a file created by bgzip, but
> to s

Re: [PySpark] Reader/Writer for bgzipped data

2022-12-06 Thread Oliver Ruebenacker

Hello Chris,

Yes, you can use gunzip/gzip to uncompress a file created by bgzip, but to start reading from somewhere other than the beginning of the file, you would need to use an index to tell you where the blocks start. Originally, a Tabix index was used and is still the popular choice, a
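
The point about block starts can be sketched with plain gzip members, which stand in here for bgzip blocks. This is a simplification: real bgzip blocks carry extra header fields, and a Tabix index is a binary format, not a Python list. The `block_starts` list below is a hypothetical stand-in for such an index.

```python
import gzip

# Three records, each compressed as its own gzip member ("block").
records = [b"chr1\t100\tA\n", b"chr1\t200\tC\n", b"chr2\t50\tG\n"]
blocks = [gzip.compress(record) for record in records]

# Concatenated members still form a valid gzip stream.
bgz_like = b"".join(blocks)

# Stand-in for an index: the byte offset at which each block begins.
block_starts = []
offset = 0
for block in blocks:
    block_starts.append(offset)
    offset += len(block)

# Plain gzip can read the whole stream from the beginning.
whole = gzip.decompress(bgz_like)

# With a known block start, reading can begin in the middle of the file.
from_second = gzip.decompress(bgz_like[block_starts[1]:])
only_third = gzip.decompress(bgz_like[block_starts[2]:])
```

Without the offsets there is no cheap way to know where a block begins, which is why reading from the middle of the file depends on an index.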

