What about xcpu?

On Wed, Mar 4, 2009 at 12:33 PM, hugo rivera <uai...@gmail.com> wrote:
> you are right. I was totally confused at the beginning.
> Thanks a lot.
>
> 2009/3/4, Vincent Schut <sc...@sarvision.nl>:
>> hugo rivera wrote:
>>
>> > The cluster has Torque installed as the resource manager. I think it
>> > runs on top of PBS (an older project).
>> > As far as I know, I just have to run the qsub command to submit my
>> > jobs to a queue; then the resource manager allocates a processor in
>> > the cluster for my process to run until it is finished.
>> >
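The one-qsub-per-job workflow described above could be scripted roughly like this. It is a minimal sketch, assuming a Torque/PBS `qsub` on the PATH; the worker script `process_chunk.py`, the input file names and the one-core resource request are hypothetical placeholders, not anything from the actual cluster. By default it only prints the qsub command it would run.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical per-job worker script; replace with your own Python program.
WORKER = "process_chunk.py"

# A one-task job script: request one core, cd to the submission directory,
# then run the worker on a single input file.
JOB_TEMPLATE = """#!/bin/sh
#PBS -l nodes=1:ppn=1
cd "$PBS_O_WORKDIR"
python {worker} {input_file}
"""


def write_job_script(input_file, job_dir):
    """Write a PBS/Torque job script for `input_file` and return its path."""
    job_dir = Path(job_dir)
    job_dir.mkdir(parents=True, exist_ok=True)
    script = job_dir / f"job_{Path(input_file).stem}.sh"
    script.write_text(JOB_TEMPLATE.format(worker=WORKER, input_file=input_file))
    return script


def qsub_command(script, job_name):
    # `-N` sets the job name; the script path is the last argument.
    return ["qsub", "-N", job_name, str(script)]


def submit(script, job_name, dry_run=True):
    cmd = qsub_command(script, job_name)
    if dry_run:
        return " ".join(cmd)  # only show what would be submitted
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()  # qsub reports the new job id on stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        script = write_job_script("data_0001.dat", d)
        print(submit(script, "chunk0001"))
```

Looping `write_job_script` and `submit(..., dry_run=False)` over all input files would queue one job per file, and the resource manager then schedules them onto free processors.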
>>
>>  Well, I know neither Torque nor PBS, but I'm guessing that when you
>> submit a job, this job will be some program or script that is run on the
>> allocated processor? If so, your initial question of forking vs threading is
>> bogus. Your cluster manager will run (exec) your job, which, if it is a
>> python script, will start a python interpreter for each job. I guess that's
>> the overhead you get when running a flexible cluster system, flexible
>> meaning that it can run any type of job (shell script, binary executable,
>> python script, perl, etc.).
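To see what that per-job interpreter start actually costs on a given node, one could time a fresh `python -c pass` launch a few times. This is a rough sketch using only the standard library; the absolute numbers depend entirely on the machine, and the first run is usually slower because nothing is cached yet.

```python
import statistics
import subprocess
import sys
import time


def time_interpreter_startup(runs=5):
    """Time launching a fresh Python interpreter that exits immediately.

    Each run approximates the cost a cluster manager pays when it execs
    a new python process for a job.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    t = time_interpreter_startup()
    print(f"first: {t[0]:.3f} s, median of the rest: {statistics.median(t[1:]):.3f} s")
```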
>>  However, your overhead of starting new python processes each time may seem
>> significant when viewed in absolute terms, but if each job processes lots of
>> data and takes, as you said, 5 min to run on a decent processor, don't you
>> think the startup time for the python process would become insignificant?
>> For example, on a decent machine here, the first time python takes 0.224
>> secs to start and shut down immediately, and consecutive starts take only
>> about 0.009 secs because everything is still in memory. Let's take the 0.224
>> secs for a worst case scenario. That would be approx 0.075 percent of your
>> job execution time. Now let's say you have 6 machines with 8 cores each and
>> perfect scaling, all your jobs would take 6000 / (6*8) *5min = 625 minutes
>> (10 hours 25 mins) without python starting each time, and 625 minutes and 28
>> seconds with python starting anew each job. Don't you think you could just
>> live with these 28 seconds more? Just reading this message might already
>> have taken you more than those 28 seconds...
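The estimate above can be re-derived in a few lines. This sketch only plugs in the figures from this thread (6000 jobs, 5 minutes each, 6 machines x 8 cores, 0.224 s worst-case startup) and, like the estimate, assumes perfect scaling.

```python
def startup_overhead(n_jobs=6000, job_minutes=5.0, machines=6,
                     cores_per_machine=8, startup_s=0.224):
    """Return (compute minutes, extra startup seconds, per-job overhead fraction).

    Assumes perfect scaling: jobs run in back-to-back "waves" that each
    occupy every core once, so each core pays the startup cost once per wave.
    """
    slots = machines * cores_per_machine       # concurrent jobs (48 here)
    waves = n_jobs / slots                     # sequential rounds per core (125)
    compute_minutes = waves * job_minutes      # wall-clock time without startup
    extra_seconds = waves * startup_s          # wall-clock time added by startup
    fraction = startup_s / (job_minutes * 60)  # overhead relative to one job
    return compute_minutes, extra_seconds, fraction


if __name__ == "__main__":
    minutes, extra, frac = startup_overhead()
    # prints: 625 min + 28 s startup (0.075% per job)
    print(f"{minutes:.0f} min + {extra:.0f} s startup ({frac:.3%} per job)")
```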
>>
>>  Vincent.
>>
>>
>>
>> > And I am not really sure if I have access to all the nodes so that I can
>> > install pp on each one of them.
>> >
>> > 2009/3/4, Vincent Schut <sc...@sarvision.nl>:
>> >
>> > > hugo rivera wrote:
>> > >
>> > >
>> > > > Thanks for the advice.
>> > > > Nevertheless I am in no position to decide what pieces of software the
>> > > > cluster will run, I just have to deal with what I have, but anyway I
>> > > > can suggest other possibilities.
>> > > >
>> > > >
>> > >  Well, that depends on how you define 'software the cluster will run'. Do
>> > > you mean cluster management software, or really any program, script or
>> > > python module that needs to be installed on each node? Because for pp, you
>> > > won't need any cluster software. pp is just a python module and some
>> > > helper scripts. You *do* need to install this (pure python) module on each
>> > > node, yes, but that's it; nothing else is needed.
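Since pp has to be importable on every node, a tiny check like the following could be run on each node (for instance submitted as a small job) to see where it is missing. A sketch only: how it gets run on each node depends on the cluster setup.

```python
import importlib.util
import socket


def module_available(name):
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The hostname makes it easy to spot which nodes lack the module.
    status = "found" if module_available("pp") else "MISSING"
    print(f"{socket.gethostname()}: pp {status}")
```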
>> > >  Btw, you said 'it's a small cluster, about 6 machines'. Now I'm not an
>> > > expert, but I don't think you can do threading/forking from one machine to
>> > > another (on linux). So I suppose there already is some cluster management
>> > > software involved? And while you appear to be "in no position to decide
>> > > what pieces of software the cluster will run", you might want to enlighten
>> > > us on what this cluster /will/ run? Your best solution might depend on
>> > > that...
>> > >
>> > >  Cheers,
>> > >  Vincent.
>
>
> --
> Hugo
>
>
