On Fri, Apr 24, 2009 at 08:33:59AM -0700, ron minnich wrote:
> 
> [snipped clarifications about some of my notes]
>
> Not sure what you're getting at here, but you've barely scratched the surface.

I am not a native English speaker, which does not help, and my wording
may have come across as rude.
I did not mean to say that clusters are useless or anything of the kind;
that would be ridiculous. And I do expect that solutions which are indeed
hard to get right are still used, and are not chosen or thrown away
depending on the mood.

My point is simply this. I do not work in this area, but I want at least
a rough idea of it: if not to write programs that run efficiently out of
the box on such beasts, then at least to avoid mistakes that make a
program a nightmare to use with such tools (or that constrain the design
of the solutions, which then have to deal with spaghetti code). For
somebody like me, especially in the "open source" world, following what
appears on the surface is disappointing: things appear and disappear, and
the hype around some solutions is not always a clear indicator of their
real value, or it emphasizes something that is not the crux. In my area,
the "watershed" computation on a grid (raster) of geographical data is a
heavy process, and processing huge datasets calls for solutions both on
the algorithmic side (including the implementation) and on the
processing-power side. So even if I'm not a specialist, and don't plan to
be one (assuming I could even understand the basics), I feel compelled to
have at least some idea of the problems.

For this kind of work, the Plan 9 organization has given me at least
some principles, some hard facts and some tools: separate the
representation (the terminal) from the processing. Remember that
processing is about data that may not be served by the same instance
of the OS, i.e. that the locking of data "ensured" during processing
is, on some OSes and depending on the file server or filesystem, only
"advisory". So perhaps think differently about rights and locking. And,
no, this cannot work in just any environment or with just any
filesystem and file server. And adding Plan 9 to POSIX as a target, and
seeing the differences, is a great help in organizing the sources:
separating, for example, what is guaranteed by C from what is system
dependent.
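
To make the "advisory" point concrete, here is a minimal sketch in
Python, assuming a Unix-like system where fcntl.flock() is available
and a local temporary directory. The function name and the three file
handles (holder, polite, careless) are only illustrative. It shows that
a lock stops only the programs that ask for it:

```python
import fcntl
import os
import tempfile

def demo_advisory_lock():
    """Return (second_lock_refused, content) for a file locked with flock()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as holder:
            # The "careful" program takes an exclusive lock.
            fcntl.flock(holder, fcntl.LOCK_EX)

            # A cooperating program asks for the lock first and is refused.
            with open(path, "a") as polite:
                try:
                    fcntl.flock(polite, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    refused = False
                except OSError:
                    refused = True

            # A program that never asks is not stopped by the lock at all.
            with open(path, "a") as careless:
                careless.write("written without asking\n")

            fcntl.flock(holder, fcntl.LOCK_UN)
        with open(path) as f:
            content = f.read()
        return refused, content
    finally:
        os.remove(path)
```

On such a system the second lock request is refused, yet the write
that never asked goes through. Whether the same holds across a network
file server depends on that server, which is exactly the trap.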

After that, my only guidelines are these: if some limited,
computation-intensive sub-tasks can be done in parallel but the whole is
interdependent, one can think about multiple threads sharing the
same address space.

But if I can design my data formats to allow independent processing of
chunks (locality in geometrical stuff is rather obvious), and finally
sew all the chunks together afterwards, even with some processing on
the edges of the chunks, I can imagine processes (tasks) distributed
among distinct CPUs. In this case, an OS can also launch the tasks on
the same CPU with multiple cores. At the moment, I think more in terms
of multiple tasks than threads.
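
The chunk idea can be sketched as follows, in Python and under loud
assumptions: the grid is a list of rows, the chunks are horizontal
bands, and the per-cell operation is a 3x3 neighbourhood minimum. That
operation is only a stand-in for a local computation; it is not a
watershed algorithm, whose flows can cross many chunks. All function
names are hypothetical. Each band carries one extra row of its
neighbours (a "halo") so that its edges can be processed without
looking at the rest of the grid:

```python
def local_min(grid):
    """3x3 neighbourhood minimum of every cell (window clipped at the borders)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            row.append(min(grid[rr][cc]
                           for rr in range(max(r - 1, 0), min(r + 2, rows))
                           for cc in range(max(c - 1, 0), min(c + 2, cols))))
        out.append(row)
    return out

def split_bands(grid, band_rows, halo=1):
    """Cut the grid into horizontal bands, each padded with halo rows."""
    bands = []
    for start in range(0, len(grid), band_rows):
        stop = min(start + band_rows, len(grid))
        lo = max(start - halo, 0)
        hi = min(stop + halo, len(grid))
        # Keep the padded rows, the number of halo rows to drop at the
        # top, and the number of rows that really belong to this band.
        bands.append((grid[lo:hi], start - lo, stop - start))
    return bands

def process_band(band):
    """Process one padded band on its own, then strip the halo rows."""
    rows, skip, keep = band
    return local_min(rows)[skip:skip + keep]

def process_grid(grid, band_rows, map_fn=map):
    """Process all bands with map_fn and sew the results back together."""
    result = []
    for part in map_fn(process_band, split_bands(grid, band_rows)):
        result.extend(part)
    return result
```

With the default map_fn the bands are processed one after the other.
Since process_band() needs nothing but its own band, map_fn is the place
where a pool of processes could plug in, for instance the map method of
a concurrent.futures.ProcessPoolExecutor, and the result should be the
same as processing the whole grid at once.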

But that's vague. I know what to avoid doing, but I'm not sure that what
I do should not itself be added to the list of "don't do that" things.
-- 
Thierry Laronde (Alceste) <tlaronde +AT+ polynum +dot+ com>
                 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
