On Sat, Oct 30, 2010 at 1:06 AM, Shrinivasan T <[email protected]> wrote:
> I have to work on some huge text files that are around 2GB - 10GB.
> Have to read some content on those files and have to rewrite them.
> Mostly have to cut/copy/paste/edit the contents in random areas.
>
> sed, awk kind of utilities can not be used as the data is not under
> regex and the manual operation/verification is must.

(Can't help wondering how long it would take someone to interactively
edit 2GB text files!)

Without knowing anything about the layout of your text files, the simplest
suggestion would be to split the files into smaller chunks, perform
your edits, and join them at the end.
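For instance, the split/edit/join cycle can be sketched with the standard
split(1) and cat(1) utilities (file names here are made up for
illustration, and a tiny stand-in file replaces the real multi-GB one):

```shell
# Tiny stand-in for the real multi-GB file, just to show the cycle.
printf 'line 1\nline 2\nline 3\nline 4\n' > bigfile.txt

# Split into pieces of at most 500MB each without breaking any line in
# half (-C = --line-bytes); chunks are named chunk_aa, chunk_ab, ...
split -C 500M bigfile.txt chunk_

# ... edit the chunk_* files interactively here ...

# Shell glob order matches split's naming order, so concatenating the
# chunks restores the original order (byte-for-byte if left unedited).
cat chunk_* > bigfile.edited.txt
cmp bigfile.txt bigfile.edited.txt && echo "round-trip OK"
```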


> I have a 2GB RAM machines.
> Opening in vim takes much time and every action takes 20-30 min.

Vim does a lot of work in the background -- maintaining the undo
history, syntax highlighting, automatic indenting, etc.  To edit large
files in vim, see this script:
http://www.vim.org/scripts/script.php?script_id=1506


> How can I create a cluster of RAMs from those idle machines, so that I
> can work on those huge files easily?

That will not help.  You can't really cluster RAM unless you have
custom interconnect hardware, as on supercomputers (e.g. NUMA, RDMA).

With commodity hardware, the latency of moving multiple GB of data
between your machines will defeat any expected speed improvement.


The main bottleneck is writing these files back out to disk.  In Unix
you can only extend a file at its end; if you want to insert data at the
beginning or middle of a file, you have to rewrite all data following
the insertion point.  Splitting large files into smaller pieces helps
here too.
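A concrete sketch of what that rewrite costs (demo file names are made
up): to insert one line after the first line, the head, the new data,
and the entire tail all have to be copied into a new file.

```shell
# A small stand-in file with two lines.
printf 'AAAA\nCCCC\n' > demo.txt

# "Insert" BBBB after line 1: copy the head, the new line, and then the
# whole remainder of the file -- on a 10GB file this tail copy is the
# expensive part, no matter how small the insertion is.
head -n 1  demo.txt >  demo.new
printf 'BBBB\n'     >> demo.new
tail -n +2 demo.txt >> demo.new
mv demo.new demo.txt

cat demo.txt    # prints AAAA, BBBB, CCCC on three lines
```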

- Raja
_______________________________________________
ILUGC Mailing List:
http://www.ae.iitm.ac.in/mailman/listinfo/ilugc