bug#36130: split bug

2019-06-07 Thread Assaf Gordon
Hello,

On Fri, Jun 07, 2019 at 09:48:44PM -0400, Heather Wick wrote:
> Yes, sorry, I should have specified that I already checked that the
> original fastq files are indeed paired and sorted with the same number of
> lines and same starting/ending IDs, narrowing down the issue to a problem
> with

bug#36130: split bug

2019-06-07 Thread Heather Wick
Hi,

Yes, sorry, I should have specified that I already checked that the original fastq files are indeed paired and sorted with the same number of lines and same starting/ending IDs, narrowing down the issue to a problem with split.

~ Heather

(base) [hwick@zappalogin ~]$ zcat MH2_R2.fastq.gz | wc

bug#36130: split bug

2019-06-07 Thread Assaf Gordon
Hello,

On Fri, Jun 07, 2019 at 02:23:15PM -0400, Heather Wick wrote:
> I am using split to split up some large, paired fastq files [...]:
>
> zcat MH1_R1.fastq.gz | split - -l 4000 DHT_R1_
> zcat MH1_R2.fastq.gz | split - -l 4000 DHT_R2_
>
> This creates 96 chunks for the R1 and 95 chu
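One way to cross-check a report like the one quoted above is to count the lines in each compressed input and compute how many chunks a line-based split should produce. The following is a minimal sketch, not part of the thread: the helper names `count_lines` and `expected_chunks` are hypothetical, and it assumes the inputs are ordinary gzip files.

```python
import gzip


def count_lines(path):
    """Count lines in a gzip-compressed file.

    A final line without a trailing newline is counted as well.
    """
    total = 0
    with gzip.open(path, "rb") as handle:
        for _ in handle:
            total += 1
    return total


def expected_chunks(total_lines, lines_per_chunk):
    """Number of output files a split by line count should yield.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    return (total_lines + lines_per_chunk - 1) // lines_per_chunk
```

Comparing `expected_chunks(count_lines(path), 4000)` for both files against the number of output files actually observed is one way to narrow down where a mismatch arises.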

bug#35291: [PATCH] split: fix incorrect suffix length computation

2019-06-07 Thread Johannes Altmanninger
Does anyone have time to review this? I think it's an evident bug. I can try to improve the clarity of the patch if needed.

On Mon, Apr 15, 2019 at 08:05:34PM +0200, Johannes Altmanninger wrote:
> * src/split.c (set_suffix_length): suffix_needed is now computed
> to be the equivalent of ceil(log(n
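The quoted changelog entry is cut off, so the exact expression used in the patch is not visible here. As a rough illustration of the general idea only (the smallest suffix length that can name a given number of output files), here is a sketch with hypothetical names; it is not the code from src/split.c. It uses an integer loop rather than a floating-point logarithm to sidestep rounding at exact powers of the alphabet size.

```python
def suffix_length_needed(n_files, alphabet_size=26):
    """Return the smallest L such that alphabet_size ** L >= n_files.

    Illustrative only: the default of 26 assumes a lowercase
    alphabetic suffix alphabet (a-z). The result is always at least 1.
    """
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    length = 1
    capacity = alphabet_size  # distinct names available with `length` characters
    while capacity < n_files:
        capacity *= alphabet_size
        length += 1
    return length
```

Under these assumptions, 26 files fit in one-character suffixes while 27 need two, which is the kind of boundary where an off-by-one in a logarithm-based computation would show up.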

bug#36130: split bug

2019-06-07 Thread Heather Wick
Hello,

I am using split to split up some large, paired fastq files (nearly 4 billion lines each). I am using the -l flag to split into files of 10 million reads (40 million lines) each and though the fastq files have matched and sorted reads, split is creating different numbers of split files for t