Hi,

Yes, sorry, I should have specified that I had already checked the original fastq files: they are paired and sorted, with the same number of lines and the same starting and ending IDs. That narrows the issue down to a problem with split.

~ Heather
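A possible follow-up check, sketched under the assumption that the chunks sit in the current directory under the DHT_R1_ and DHT_R2_ prefixes from the split commands: compare line counts chunk by chunk, so the first chunk where R1 and R2 diverge stands out. The helper names chunk_line_counts and compare_chunks are hypothetical, not part of split or coreutils.

```shell
#!/bin/sh
# Print "<suffix><TAB><line count>" for every chunk sharing a prefix.
chunk_line_counts() {
    for f in "$1"*; do
        [ -f "$f" ] || continue          # skip when nothing matches the prefix
        printf '%s\t%s\n' "${f#"$1"}" "$(wc -l < "$f" | tr -d ' ')"
    done
}

# Diff the per-chunk counts of two prefixes; exit status 0 means they agree.
compare_chunks() {
    r1=$(mktemp) && r2=$(mktemp) || return 2
    chunk_line_counts "$1" > "$r1"
    chunk_line_counts "$2" > "$r2"
    diff "$r1" "$r2"
    status=$?
    rm -f "$r1" "$r2"
    return $status
}

# Prefixes taken from the split commands in the original message.
compare_chunks DHT_R1_ DHT_R2_
```

If the two sets of chunks match, nothing is printed; otherwise diff lists the suffixes whose counts differ or that exist on only one side.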
(base) [hwick@zappalogin ~]$ zcat MH2_R2.fastq.gz | wc -l
3778103832
(base) [hwick@zappalogin ~]$ zcat MH2_R1.fastq.gz | wc -l
3778103832
(base) [hwick@zappalogin test_2019]$ zcat MH2_R1.fastq.gz | head -n8 | grep ^@
@A00197:48:HF2GWDMXX:1:1101:1741:1000 1:N:0:GATCAG+TCTTTCCC
@A00197:48:HF2GWDMXX:1:1101:2754:1000 1:N:0:GATCAG+TCTTTCCC
(base) [hwick@zappalogin test_2019]$ zcat MH2_R2.fastq.gz | head -n8 | grep ^@
@A00197:48:HF2GWDMXX:1:1101:1741:1000 2:N:0:GATCAG+TCTTTCCC
@A00197:48:HF2GWDMXX:1:1101:2754:1000 2:N:0:GATCAG+TCTTTCCC
(base) [hwick@zappalogin test_2019]$ zcat MH2_R1.fastq.gz | tail -n8 | grep ^@
@E00489:288:HMFWCCCXY:2:2224:29305:73106 1:N:0:GATCAG
@E00489:288:HMFWCCCXY:2:2224:29325:73106 1:N:0:GATCAG
(base) [hwick@zappalogin test_2019]$ zcat MH2_R2.fastq.gz | tail -n8 | grep ^@
@E00489:288:HMFWCCCXY:2:2224:29305:73106 2:N:0:GATCAG
@E00489:288:HMFWCCCXY:2:2224:29325:73106 2:N:0:GATCAG

On Fri, Jun 7, 2019 at 9:29 PM Assaf Gordon <assafgor...@gmail.com> wrote:

> Hello,
>
> On Fri, Jun 07, 2019 at 02:23:15PM -0400, Heather Wick wrote:
> > I am using split to split up some large, paired fastq files [...]:
> >
> > zcat MH1_R1.fastq.gz | split - -l 40000000 DHT_R1_
> > zcat MH1_R2.fastq.gz | split - -l 40000000 DHT_R2_
> >
> > This creates 96 chunks for the R1 and 95 chunks for R2, even though the
> > original fastq files have the same number of reads.
> >
> > Do you have any suggestions for how to proceed? Perhaps zcatting and piping
> > the files is not the best way to call split?
>
> To help diagnose the issue better, please run the following commands
> and tell us what the results are:
>
> 1. Number of lines in each file:
>
>    zcat MH1_R1.fastq.gz | wc -l
>    zcat MH1_R2.fastq.gz | wc -l
>
> 2. The first two sequence IDs:
>
>    zcat MH1_R1.fastq.gz | head -n8 | grep ^@
>    zcat MH1_R2.fastq.gz | head -n8 | grep ^@
>
> 3. The last two sequence IDs:
>
>    zcat MH1_R1.fastq.gz | tail -n8 | grep ^@
>    zcat MH1_R2.fastq.gz | tail -n8 | grep ^@
>
> These will just verify the FASTQ files are indeed paired with no
> surprises. The files should have the same number of lines,
> and matching sequence IDs in the first and last lines.
>
> regards,
> - assaf
>

--
Heather Wick
PhD Candidate, Human Genetics
Labs of Sarah Wheelan and Vasan Yegnasubramanian
Institute of Genetic Medicine
Johns Hopkins University School of Medicine
hwi...@jhmi.edu