Re: File divide to feed parallel

2014-03-27 Thread Ole Tange
On Wed, Mar 26, 2014 at 9:32 PM, David wrote:
> ETL programs like Ab Initio know how to tell parallel processes to split up
> big files and process each part separately, even when the files are linefeed
> delimited (they all agree to search up (or down) for the dividing linefeed
> closest to N byt…
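[Editor's note: the splitting trick described above can be sketched in a few lines: seek to an approximate byte offset, then scan forward for the next linefeed so every worker agrees on record boundaries. Later versions of GNU parallel ship this idea as `--pipepart` with `--block`. The sketch below is illustrative, not Ab Initio's or GNU parallel's actual code; the names `find_boundary` and `chunk_ranges` are invented here.]

```python
import os

def find_boundary(path, approx_offset):
    """Return the offset just past the first newline at or after
    approx_offset, so a chunk starting there begins on a full record.
    Returns the file size if no newline is found past the offset."""
    size = os.path.getsize(path)
    if approx_offset == 0:
        return 0                         # first chunk starts at the file start
    with open(path, "rb") as f:
        f.seek(approx_offset)            # O(1) seek, no sequential read
        chunk = f.read(1 << 16)          # scan a window for the delimiter
        while chunk:
            nl = chunk.find(b"\n")
            if nl != -1:
                return f.tell() - len(chunk) + nl + 1
            chunk = f.read(1 << 16)
    return size

def chunk_ranges(path, n):
    """Split `path` into up to n (start, end) byte ranges aligned on
    newline boundaries; each worker can then seek to its own `start`."""
    size = os.path.getsize(path)
    cuts = [find_boundary(path, i * size // n) for i in range(n)] + [size]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if a < b]
```

Because every worker applies the same rule ("my chunk starts just past the first newline at or after my nominal offset"), the chunks tile the file exactly, with no shared serial reader.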

Re: Another: Running >10 jobs in parallel on the same remote machine

2014-03-27 Thread Ole Tange
On Wed, Mar 26, 2014 at 8:25 AM, phoebus phoebus wrote:
> Hello,
>
> Just to continue a previous thread: 'Running >10 jobs in parallel on
> the same remote machine'.
> My environment: OS (CentOS 6.4 - 64 bits, GNU parallel 20140322).
>
> Sorry for this long and boring email, but it can help…
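[Editor's note: a likely reason exactly 10 jobs succeed on one remote host is OpenSSH's server-side defaults, both of which are 10: `MaxSessions` (sessions per multiplexed connection) and the first field of `MaxStartups` (concurrent unauthenticated connections). A sketch of the relevant lines in `sshd_config` on the remote machine; the values 50 are illustrative, not a recommendation.]

```
# /etc/ssh/sshd_config on the remote host (defaults shown in comments)
MaxSessions 50          # default 10: sessions per multiplexed ssh connection
MaxStartups 50:30:100   # default 10:30:100: concurrent unauthenticated connections
```

Reload sshd after editing; which limit bites depends on whether the client multiplexes connections (ControlMaster) or opens one connection per job.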

Re: File divide to feed parallel

2014-03-27 Thread David
Ole,

Yes, the idea is to level the parallel loads without the single-point bottleneck of a serial reader. In the world of big data, you want the parallel processes to use their logical id to seek to the desired position in the desired starting file, find the first new record, and read through th…
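[Editor's note: the worker scheme described above can be sketched as follows, assuming each worker receives a logical id and the total worker count; the function name `worker` and line counting as the stand-in for real record processing are illustrative. The boundary convention is: a record that straddles a chunk boundary belongs to the worker whose chunk it starts in.]

```python
import os
from multiprocessing import Pool

def worker(args):
    """Worker `wid` of `nworkers`: seek to its share of the file, skip
    the partial record (owned by the previous worker), then process
    every record that starts before its end boundary."""
    path, wid, nworkers = args
    size = os.path.getsize(path)
    start = wid * size // nworkers
    end = (wid + 1) * size // nworkers
    count = 0
    with open(path, "rb") as f:
        if start > 0:
            f.seek(start - 1)   # back up one byte so a boundary that lands
            f.readline()        # exactly on a record start is handled right
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            count += 1          # stand-in for real record processing
    return count

def count_lines_parallel(path, nworkers=4):
    """Drive the workers with a process pool; no serial reader anywhere."""
    jobs = [(path, i, nworkers) for i in range(nworkers)]
    with Pool(nworkers) as pool:
        return sum(pool.map(worker, jobs))
```

Each worker touches only its own byte range (plus at most one record of overlap scan), so throughput scales with the number of workers on storage that supports parallel reads.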

Re: File divide to feed parallel

2014-03-27 Thread Ole Tange
On Thu, Mar 27, 2014 at 2:32 PM, David wrote:
> Ole,
>
> Yes, the idea is to level the parallel loads without the single-point
> bottleneck of a serial reader. In the world of big data, you want the
> parallel processes to use their logical id to seek to the desired position
> in the desired starti…

Re: File divide to feed parallel

2014-03-27 Thread David
Ole,

Well, I suspect that, again, 'dd skip=XXX file' would read sequentially to find record delimiters. The speed and the parallelism support come from seeking first, then looking for the delimiter, and starting after it. I am not sure there is currently a general UNIX tool that does that. If read…
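[Editor's note: a minimal version of the tool described above, which seeks (an O(1) `lseek`, not a sequential read) to an offset and only then scans for the delimiter, can be sketched as a tiny chunk-cat. The name `chunkcat` and its interface are invented for illustration; adjacent (start, end) ranges reassemble the file exactly, so its output can feed one parallel job per chunk.]

```python
import os
import sys

def chunkcat(path, start, end, out):
    """Write the records of `path` that begin in [start, end):
    seek to `start` without reading the preceding bytes, skip the
    partial record (it belongs to the previous chunk), then copy
    records until one begins at or past `end`."""
    with open(path, "rb") as f:
        if start > 0:
            f.seek(start - 1)   # lseek, not a read of the first `start` bytes
            f.readline()        # finish the record straddling the boundary
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            out.write(line)

if __name__ == "__main__" and len(sys.argv) == 4:
    # usage sketch: python chunkcat.py bigfile START END
    chunkcat(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]), sys.stdout.buffer)
```

(As an aside: GNU dd with `skip=` does lseek rather than read on a seekable input file, but it still splits on fixed byte counts, not on record delimiters, which is the gap this sketch fills.)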