On Wed, Mar 26, 2014 at 9:32 PM, David wrote:
> ETL programs like Ab Initio know how to tell parallel processes to split up
> big files and process each part separately, even when the files are
> linefeed-delimited (they all agree to search up (or down) for the dividing
> linefeed closest to N bytes).
On Wed, Mar 26, 2014 at 8:25 AM, phoebus phoebus wrote:
> Hello,
>
> Just to continue a previous thread: 'Running >10 jobs in parallel on
> the same remote machine'.
> My environment: CentOS 6.4 (64-bit), GNU parallel 20140322.
>
> Sorry for this long and boring email, but it can help.
Ole,
Yes, the idea is to level the parallel loads without the single-point
bottleneck of a serial reader. In the world of big data, you want the
parallel processes to use their logical id to seek to the desired position
in the desired starting file, find the first new record, and read through
to the start of the next process's chunk.
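For example, a minimal sketch of that scheme (the script name chunkcat.py,
the chunk-numbering convention, and the rule that a record belongs to the
chunk holding its first byte are illustrative, not from any existing tool):

    #!/usr/bin/env python
    # chunkcat.py -- print the records of chunk i (of n) from a
    # newline-delimited file, seeking instead of reading serially.
    import os
    import sys

    def chunkcat(path, i, n):
        size = os.path.getsize(path)
        start = size * i // n
        end = size * (i + 1) // n     # first byte of the next chunk
        with open(path, 'rb') as f:
            if i > 0:
                f.seek(start - 1)     # back up one byte so a record that
                f.readline()          # begins exactly at `start` is kept
            while f.tell() < end:     # a record belongs to the chunk
                line = f.readline()   # that holds its first byte
                if not line:
                    break
                sys.stdout.buffer.write(line)

    if __name__ == '__main__':
        chunkcat(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))

Each worker touches only its own slice of the file plus at most one record
spilling over the boundary, so no serial pre-pass is needed.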
On Thu, Mar 27, 2014 at 2:32 PM, David wrote:
> Ole,
>
> Yes, the idea is to level the parallel loads without the single-point
> bottleneck of a serial reader. In the world of big data, you want the
> parallel processes to use their logical id to seek to the desired position
> in the desired starting file, find the first new record, and read through
> to the start of the next process's chunk.
Ole,
Well, I suspect that again, 'dd if=file skip=XXX' would read sequentially to
find record delimiters. The speed and the support for parallelism come from
seeking first and only then scanning for the delimiter, starting just after
it. I am not sure there is currently a general UNIX tool that does that. If
read
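Until such a tool exists, a script like the chunkcat.py sketch above can be
driven by GNU parallel itself, one seeking reader per chunk (the file name
and job count here are made up for the example):

    seq 0 7 | parallel -j8 "python chunkcat.py bigfile {} 8 | wc -l"

Each job seeks to its own offset, so the eight readers run with no serial
splitter in front of them.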