Re: [gridengine users] Strange SGE PE issue (threaded PE with 999 slots but scheduler thinks the value is 0)

2020-06-11 Thread Chris Dagdigian
e but resolved now. Regards Chris Reuti wrote on 6/11/20 4:17 PM: Hi, Any consumables in place like memory or other resource requests? Any output of `qalter -w v …` or "-w p"? -- Reuti On 11.06.2020 at 20:32, Chris Dagdigian wrote: Hi folks, Got a bewildering situation I'

[gridengine users] Strange SGE PE issue (threaded PE with 999 slots but scheduler thinks the value is 0)

2020-06-11 Thread Chris Dagdigian
Hi folks, Got a bewildering situation I've never seen before with simple SMP/threaded PE techniques I made a brand new PE called threaded: $ qconf -sp threaded pe_name    threaded slots  999 user_lists NONE xuser_lists    NONE start_proc_args    NONE stop_proc_

Re: [gridengine users] Alternatives to Son of GridEngine

2018-11-12 Thread Chris Dagdigian
My $.02 The commercial version of GE from univa is excellent. I'm working with it now. New features and excellent support as always For non-GE options the trend seems to be moving towards Slurm -- at least from what I can see in my particular industry niche Chris Taras Shapovalov

Re: [gridengine users] Son of GridEngine succession?

2018-05-12 Thread Chris Dagdigian
+1 for both the idea as well as the DOGE name. "Daughter of Grid Engine" is pretty awesome. Simon Matthews May 11, 2018 at 10:14 PM You could call it "the DOGE" (Daughter of Grid Engine). Simon ___ users mailing li

Re: [gridengine users] Scheduler node rebooted - what happens to running jobs?

2018-04-20 Thread Chris Dagdigian
Running jobs continue to run. The only jobs that would be affected are those running on the master node itself, or those that need a critical dependency hosted on it, like an NFS file share -- but if the master node simply bounces and nothing on that host is required

Re: [gridengine users] SGE 8.1.9. email notification

2018-03-20 Thread Chris Dagdigian
Does /bin/mail exist? On a lot of systems it may actually be /usr/bin/mail Regards, Chris /* Sent via phone - apologies for typos & terseness */ > On Mar 20, 2018, at 11:31 AM, Peter Sigl wrote: > > Hi All, > > I am trying to receive email notification for job start and job completion > wi

Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE

2017-08-10 Thread Chris Dagdigian
Simply put, the system can't find your gromacs binary; that is what "gmx: command not found" means. Just edit your submit script to pass the full path to gmx and you should be fine. -Chris Subashini K August 10, 2017 at 7:59 AM Hi sun grid engine users,

Re: [gridengine users] new error I've never seen before! ("sge_shepherd won't run -- dynamic library missing?")

2017-08-09 Thread Chris Dagdigian
To answer my own question: /opt/sge/bin/lx-amd64/sge_shepherd /opt/sge/bin/lx-amd64/sge_shepherd: error while loading shared libraries: libhwloc.so.5: cannot open shared object file: No such file or directory The answer is the hwloc-1.5-3.el6_5.x86_64 RPM ... Chris Chris Dagdigian
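
For anyone chasing the same symptom, scanning `ldd` output for unresolved libraries narrows it down quickly. A minimal sketch only: the helper name `missing_libs` is made up for illustration, and the sample text imitates typical `ldd` output rather than anything captured from the original host.

```python
def missing_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libhwloc.so.5 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


# Hypothetical sample resembling `ldd /opt/sge/bin/lx-amd64/sge_shepherd`
sample = (
    "\tlibhwloc.so.5 => not found\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)\n"
)
print(missing_libs(sample))
```

In practice the text would come from running `ldd` against the binary on the affected exec host.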

[gridengine users] new error I've never seen before! ("sge_shepherd won't run -- dynamic library missing?")

2017-08-09 Thread Chris Dagdigian
Sorry this is exciting, I've been using SGE forever and rarely see something new. However I see this now on AWS trying to run a new exechost on a centos box that I bound to an aws cfncluster grid: Starting execution daemon. Please wait ... sge_shepherd won't run -- dynamic library missing?

Re: [gridengine users] (resend) dealing with AD usernames that contain "@" character

2017-08-02 Thread Chris Dagdigian
Yeah short names are guaranteed unique in my environment. The new patch for SSSD allows one to define an AD domain search/preference order and I think the implication there is that if a dupe shortname is detected it will assume that the shortname belongs to the 1st domain listed in the order

Re: [gridengine users] (resend) dealing with AD usernames that contain "@" character

2017-08-02 Thread Chris Dagdigian
Thanks Reuti! I can't use the trick in that tip because we have more than one AD domain to support and that "default_ad_domain_suffix=" setting only works for one AD domain The real solution is for us to wait for the next SSSD patch to come out - they've added features that should allow uni

[gridengine users] (resend) dealing with AD usernames that contain "@" character

2017-08-01 Thread Chris Dagdigian
oops. Sent last email in HTML format which likely got stripped. Resending Hi folks, Has anyone used FreeIPA or RHEL IDM to integrate an SGE cluster into a complex active directory environment? I've got an issue where the AD integration is working fine across a pretty complex set of

[gridengine users] Dealing with AD integration and usernames that contain "@" ...

2017-08-01 Thread Chris Dagdigian
Hi folks, Has anyone used FreeIPA or RHEL IDM to integrate an SGE cluster into a complex active directory environment? I've got an issue where the AD integration is working fine across a pretty complex set of Active Directory domains and transitive trusts but the structure of our AD

Re: [gridengine users] move SoGE from berkeleydb to classic

2017-07-19 Thread Chris Dagdigian
Changing the spooling method is usually a "destroy and rebuild" operation in my experience. Roberto Nunnari July 19, 2017 at 10:56 AM Hello. A couple of months ago I installed SoGE-8.1.9 building it with -spool-berkeleydb Now I would like to move to -spoo

Re: [gridengine users] Fwd: eqw for qsub jobs

2016-09-28 Thread Chris Dagdigian
I think the "queue instance dropped because ... full" is not related to your user/job problem. The dropped message is a sign from the job placement process that the queue instance was skipped during the active host select-and-job-dispatch round because it had no more job slots free to take ne

Re: [gridengine users] Hardware thoughts?

2016-07-20 Thread Chris Dagdigian
In environments where you do tens of thousands of jobs per day or tons of really short jobs or a constant flow of jobs always active you may need a master node that is somewhat beefy. If you've never seen your head node get slammed then you can downsize. If there is a chance that your workloa

[gridengine users] prolog execution location and behavior?

2016-06-08 Thread Chris Dagdigian
Hey folks -- need my brain refreshed on prolog behavior ... Trying to figure out if a prolog script would be suitable for dramatically changing the execution environment -- doing things like NFS filesystem unmounts or chroot actions so that an incoming job would execute in the changed environ

Re: [gridengine users] docker under GE

2016-05-30 Thread Chris Dagdigian
This may not be a Univa mailing list but it is also not your mailing list. Univa folk are welcome here, as always. It's a tough line for Univa to walk between their commercial interests and the open source forks of GE that we know and love. By my view Univa has done all the right things on th

Re: [gridengine users] All queues dropped because of overload or full

2016-05-25 Thread Chris Dagdigian
Something is fundamentally broken with Grid Engine. An empty "qconf -sql" means that SGE is unaware of *any* cluster queues -- at the very least you should see the default all.q show up And also this is clear via blank "qstat -f" output -- SGE simply does not think that any compute nodes or

Re: [gridengine users] All queues dropped because of overload or full

2016-05-25 Thread Chris Dagdigian
I'd be willing to bet the output of "qstat -f -u '*' " shows that all your compute nodes are in 'au' state If there is no sge_execd process running on each compute node then Grid Engine won't work and it can't dispatch "work" to those nodes. The errors you see and the jobs pending forever
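
A rough sketch of picking the 'au' queue instances out of `qstat -f` output. The function name and the sample text are illustrative assumptions; the sample follows the usual column layout (queue instance, type, slots, load, arch, optional states) and is not real output from the cluster in question.

```python
def queue_instances_in_au(qstat_output):
    """Return queue instances whose state column contains both 'a' and 'u'."""
    flagged = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Queue-instance rows start with "queue@host"; the state column is
        # the optional sixth field, so healthy rows have only five fields.
        if len(fields) >= 6 and "@" in fields[0]:
            states = fields[5]
            if "a" in states and "u" in states:
                flagged.append(fields[0])
    return flagged


# Assumed sample in the usual `qstat -f` layout
sample = (
    "queuename        qtype resv/used/tot. load_avg arch     states\n"
    "all.q@node1      BIP   0/0/8          -NA-     lx-amd64 au\n"
    "all.q@node2      BIP   0/2/8          0.15     lx-amd64\n"
)
print(queue_instances_in_au(sample))
```

Any host that shows up here is worth checking for a running sge_execd process.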

Re: [gridengine users] GE 2011.11p1: got no connection within 60 seconds

2015-12-16 Thread Chris Dagdigian
This looks and feels like an MPI job launching failure Especially as it fails exactly when it tries to cross the threshold from single chassis to multiple boxes The #1 debugging advice in this scenario is this: -- Can you definitively run on more than 12 cores OUTSIDE of grid engine? My ex

Re: [gridengine users] Queue configurations still stored in text files?

2015-11-03 Thread Chris Dagdigian
This is true but only when classic mode spooling is in use. I've been able to resurrect totally busted environments via simple text edits in the past. The useful/fixable config data in text form is why I still tend to config a majority of SGE clusters with classic spooling, even on large clu

Re: [gridengine users] At what point does the network overhead of adding additional nodes to a queue offset the benefit?

2015-09-24 Thread Chris Dagdigian
SGE is fine on 1Gb fabrics and I don't know of anyone who uses 10Gb for SGE unless it's a combined network fabric that is carrying storage and application traffic along with SGE traffic on the same links. Or if you are running all new stuff with 10Gb for everything and maybe a 1Gb NIC held ba

[gridengine users] quick question re formatting of complex_values in global exec host

2015-08-11 Thread Chris Dagdigian
I've got a consumable resource called "foo" that I manage via an entry in the global exechost object ("qconf -me global; set foo=100") But now due to a networking issue only 50% of my compute nodes are capable of running jobs that request this resource I'd rather not pin the complex entry t

Re: [gridengine users] command runs in grid engine but does not complete.

2015-06-08 Thread Chris Dagdigian
Most common scenario when "it works from the command line" but "it does not work in grid engine" is usually: - Different shell environment between command-line and batch execution (especially if SGE is running in POSIX_COMPLIANT mode) - Different ENV variables between CLI and batch environme
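
One way to test the environment theory is to dump `env` once from the interactive shell and once from inside a batch job, then compare the two. The sketch below assumes both dumps are plain `NAME=value` lines; the function names and sample values are hypothetical.

```python
def parse_env(text):
    """Parse `env` output (one NAME=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without '=' (e.g. continuation of multi-line values)
            env[name] = value
    return env


def env_diff(cli_text, batch_text):
    """Map each differing variable to a (command-line value, batch value) pair."""
    cli, batch = parse_env(cli_text), parse_env(batch_text)
    return {
        name: (cli.get(name), batch.get(name))
        for name in sorted(set(cli) | set(batch))
        if cli.get(name) != batch.get(name)
    }


# Hypothetical dumps for illustration
cli_dump = "PATH=/usr/local/bin:/usr/bin\nHOME=/home/dag\nLD_LIBRARY_PATH=/opt/lib\n"
batch_dump = "PATH=/usr/bin\nHOME=/home/dag\n"
for name, (cli_value, batch_value) in env_diff(cli_dump, batch_dump).items():
    print(name, cli_value, batch_value)
```

A value of `None` on one side means the variable is missing from that environment entirely. Multi-line values are not handled by this simple parser.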

Re: [gridengine users] sanity check on usage of "-p" priority value: per-user effect or global across waitlist?

2015-04-30 Thread Chris Dagdigian
Yep I think we are going to try this and monitor it. Gut feeling is that the fairshare by user policy has so much more impact/weight on the global job waitlist that if we just have a single user doing stuff with "-p -10" and "-p -1" to distinguish between her own jobs it might actually do cl

[gridengine users] sanity check on usage of "-p" priority value: per-user effect or global across waitlist?

2015-04-30 Thread Chris Dagdigian
GE and UGE man pages are not clear about the scope of "-p" priority values when a user uses it. It's been a long time since I needed this and I wanted to confirm the scope of the behavior .. Use case: - I need to submit 100 personal jobs as "dag" with 10 jobs being slightly more important

Re: [gridengine users] load grpah

2015-04-17 Thread Chris Dagdigian
There are tons of systems for measuring and displaying load on a grid. Ganglia would give you pretty graphs of CPU usage etc while tools like php-qstat would be able to show/display info about SGE usage, queue state and pending jobs etc. Jacques Foucry

[gridengine users] Anyone have scripts for detecting users who bypass grid engine?

2015-04-09 Thread Chris Dagdigian
I'm one of the people who has been arguing for years that technological methods for stopping abuse of GE systems never work in the long term because motivated users always have more time and interest than overworked admins so it's kind of embarrassing to ask this but ... Does anyone have a s

Re: [gridengine users] best way to start new exechost in disabled (d) state during template driven install?

2015-04-08 Thread Chris Dagdigian
THANK YOU! That is where I remember initial_state from -- the queue config and not the autoInstall template -dag Reuti wrote: What is the value of "initial_state" in the queue definition right now? -- Reuti ___ users mailing list users@gridengin

[gridengine users] best way to start new exechost in disabled (d) state during template driven install?

2015-04-08 Thread Chris Dagdigian
Been a while since I needed this so I'm being lazy and asking the list first, heh ... We are doing some elastic grid engine building in Amazon using methods other than the StarCluster suite due to some unique scaling and security requirements. I recall from memory that there was a way to c

Re: [gridengine users] Anyone using S-GAE reporting app with Univa grid engine?

2015-04-08 Thread Chris Dagdigian
Fantastic timing! Thank you very much. I'm going to install and try it out right now. Regards, Chris RDlab April 8, 2015 at 6:06 AM Hello, My name is Gabriel and I am the IT manager at RDlab, we are the S-GAE guys :) A new version of S-GAE compliant and tested

Re: [gridengine users] Anyone using S-GAE reporting app with Univa grid engine?

2015-03-03 Thread Chris Dagdigian
I'll give some impressions of S-GAE since I have it installed in a lot of places ... - It's a good basic reporting tool for monthly metrics. - I don't use all of the features; mainly the full cluster "view" - In the full cluster view there are 4-6 PNG graphics that I just generate and copy/em

Re: [gridengine users] Anyone using S-GAE reporting app with Univa grid engine?

2015-03-02 Thread Chris Dagdigian
ooh the various MoD ("metrics on demand") look pretty interesting. Would love to chat about how people have made XDMoD and other variants work with Grid Engine(s) -- can we get a little thread going on best practices and recommendations for 3rd party reporting/metrics tools? Suspect there

[gridengine users] Anyone using S-GAE reporting app with Univa grid engine?

2015-03-02 Thread Chris Dagdigian
Hey folks, I'm a big fan of the php based S-GAE grid engine reporting tool that the fine folks over at http://rdlab.cs.upc.edu/index.php/en/services/s-gae.html have put together However it looks like S-GAE is falling over on a cluster where we recently converted from open source grid engine

Re: [gridengine users] Can SGE handle job dependencies?

2015-02-22 Thread Chris Dagdigian
Yes SGE can handle dependencies between jobs and even dependencies between tasks in a job array. The job dependency syntax depends on job naming in the most common use case, here is a simple example: qsub -N DataStagerTask ./my-SGE-job.sh qsub -hold_jid DataStagerTask ./my-analytic-job
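
The same name-based pattern can be driven from a script. This is a sketch under assumptions: the helper name and the script paths are hypothetical, and it only builds the two `qsub` command lines (using the `-N` and `-hold_jid` options from the example above) without running them.

```python
def dependent_submission(stage_name, stage_script, followup_script):
    """Build two qsub command lines where the second waits on the first by job name."""
    first = ["qsub", "-N", stage_name, stage_script]
    # -hold_jid keeps the follow-up job pending until the named job finishes
    second = ["qsub", "-hold_jid", stage_name, followup_script]
    return first, second


# Hypothetical script names, for illustration only
first, second = dependent_submission("DataStagerTask", "./stage.sh", "./analyze.sh")
print(" ".join(first))
print(" ".join(second))
```

On a submit host, each list could be passed to `subprocess.run` in order.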

Re: [gridengine users] suggestions on setting up queues

2015-01-16 Thread Chris Dagdigian
Queues are just a piece of the puzzle when it comes to handling resource allocation on a multi user system, what (if any) scheduling policies and resource quotas are you currently using? That said you are using the queue methods in a good way. There are certain things that can only be reall

Re: [gridengine users] SGE and NFS

2014-11-12 Thread Chris Dagdigian
my $.02 SGE can run 100% local without NFS - the main thing (in my experience) that you lose in this config is the easy troubleshooting ability of going into a central $SGE_ROOT/$SGE_CELL/ and seeing all of the various node spool and message files. It's annoying but not a dealbreaker especially

[gridengine users] wiki is back, now hopefully with far more resiliency

2014-07-25 Thread Chris Dagdigian
Hi folks, http://wiki.gridengine.info/wiki/index.php/Main_Page ... is back online although it's running an old version of mediawiki that needs some attention eventually so it may go down for upgrades at some point. That site has been on and offline randomly for quite some time mostly due to it r

Re: [gridengine users] pbsdsh

2014-07-02 Thread Chris Dagdigian
HUMMEL Michel wrote: > I wonder if there is, in OGS, an equivalent of the pbsdsh command from torque. > This command spawns a program on all nodes allocated to the PBS job. The spawns take place concurrently - all execute at (about) the same time. > Not within OGS - most people independently ins

[gridengine users] [administrative] test of users mailing list after DNS provider swap

2014-06-03 Thread Chris Dagdigian
Sorry for the admin note; we've effectively been offline since our DNS provider (zoneedit.com) had 2 nameservers go down for multi-day periods. I've got 30+ domains managed with them and only gridengine.org had the bad luck to have its primary and secondary nameservers assigned to the two zoneedi

Re: [gridengine users] Configurations during kickstart

2014-05-01 Thread Chris Dagdigian
Random ideas: 1. try disabling the log redirects to see if anything ends up in the standard kickstart log? 2. SGE is unusually sensitive to hostname and DNS resolution. Is your kickstart environment giving the node the same IP address during provisioning as it has when running? Does your kicksta

Re: [gridengine users] Gridengine on MAC

2014-04-06 Thread Chris Dagdigian
Run linux as a virtual machine on your mac. It will be easier all around. SGE usually builds and compiles under OS X without too much hassle but dealing with all of the "mac stuff" like switching the startup scripts over to the OS X Launchd() framework files is a pain in the arse. My $.02 of cour

Re: [gridengine users] sge_qmaster uses too much memory and becomes unresponsive

2014-04-02 Thread Chris Dagdigian
Same symptoms seen at one of my clients just yesterday. Programmatic scripts that send a small number of jobs into qsub that all use a threaded PE or similar. Our cluster routinely runs much larger workloads all the time. Our sge_qmaster ran the master node out of memory and was killed hard by th

[gridengine users] "SAS Grid" and "SAS VA" integrated with gridengine?

2014-03-20 Thread Chris Dagdigian
Hi folks, Looking for pointers, documentation URLs or even just personal anecdotes regarding integrating a few SAS products within grid engine environments. In this case I'm looking info/tips on two separate SAS products: SAS Grid SAS Visual Analytics I have a few potential projects with t

Re: [gridengine users] Please help with installation URGENT

2014-03-07 Thread Chris Dagdigian
Have you set a $JAVA_HOME environment variable? That is how I believe the installer finds your java environment The only other thing I see in your error logs is a lot of effort spent generating ssl certificates and otherwise prepping for "Secure Mode" which is a mode that almost nobody runs Grid E

[gridengine users] OGS 2011.11 , Ubuntu 12.04 LTS and NFSv4

2013-10-10 Thread Chris Dagdigian
Hey folks, Got a cluster running OGS 2011.11 via the dropbox download courtesy binaries that is having trouble when the NFSv4 share is getting hammered by file access. I'm 99% certain that this is an NFSv4/kernel/driver/Ubuntu 12.04 LTS issue but wanted to check in to see if anyone has any awar

Re: [gridengine users] Cluster Data Management

2012-12-19 Thread Chris Dagdigian
My own random thoughts about the storage pod and random ideas. (1) The backblaze pod we built (Rayson has my bioteam.net URL in his post below) cost roughly $12K for 100 terabytes of usable storage. Even with all the downsides and negatives to this particular hardware rig the "$12,000 for 100

[gridengine users] will changes to a hard limit in a queue config roll down into running jobs?

2012-11-15 Thread Chris Dagdigian
Quick question ... I've got a job with a user running in a queue that has a 48 hour hard wallclock limit. The user is prepared to move into a long.q but his job is *almost* complete and will not go much past the 48h limit. Trying to see if I can preserve the job and not lose 48 hours of compu

Re: [gridengine users] Fwd: Gridengine and Hadoop

2012-03-30 Thread Chris Dagdigian
I'm registering my interest here. Reuti -- if you could pass my email along to Ralph I'd appreciate it. I have several consulting customers using EMC Isilon storage on Grid Engine HPC clusters and we've been getting pinged from EMC/Greenplum sales reps pushing to show off the combination of n

Re: [gridengine users] Build on OS X

2012-02-07 Thread Chris Dagdigian
: What am I missing? What am I supposed to do to get it to compile? Thanks, ___ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users Chris Dagdigian Principal Consultant, BioTeam Inc. http://

Re: [gridengine users] installing qmaster and exec on Solaris 11

2012-01-30 Thread Chris Dagdigian
rking spooling setup in the $SGE_CELL directory for the remainder of the install to proceed. What happens when you try "classic" spooling mode? Regards, Chris   Chris Dagdigian Principal Consultant, BioTeam Inc. http://b

[gridengine users] best way to instrument/troubleshoot a segfaulting sge_qmaster?

2012-01-24 Thread Chris Dagdigian
Hi folks, I've got a fresh set of GE2011 binaries where the sge_qmaster segfaults almost instantly on startup. Looking for quick tips on instrumenting or dialing up the debug data to the point where I can get useful error data. Is the best method still to try strace or are there other optio

Re: [gridengine users] deciding spool directory location

2012-01-13 Thread Chris Dagdigian
That's an awesome epilog script Reuti! I might modify it so that a user can trigger a request for the archive but it's disabled by default. That would be a pretty excellent debug tool... Thanks again! -dag Reuti wrote: On 13.01.2012 at 17:33, Chris Dagdigian wrote: Whoa. If

Re: [gridengine users] deciding spool directory location

2012-01-13 Thread Chris Dagdigian
Whoa. If there is a tool out there that gives users access to debug and info from the spool area I'd love to hear about it and get it out into the community. One of the downsides to spool locations is that they are usually only accessible to admins. One of my minor gripes about Grid Engine

[gridengine users] My notes on building Open GridScheduler 2011.11 on RedHat/CentOS 6.x based systems

2012-01-12 Thread Chris Dagdigian
Tried to reverse engineer my crusty old build environment into something that I (or even others) can actually replicate or follow. Going to try similar for 32bit binaries as well as document the process for RHEL/CentOS 5.x based systems in the near future... Short link: http://biote.am/6y

Re: [gridengine users] deciding spool directory location

2012-01-12 Thread Chris Dagdigian
Hi Dale, We are trying to determine where the spool directory should reside based on performance Versus ease of administration. Can somebody explain how ease of administration would be made easier? Here is a short answer: When the spool directory is shared it is far easier for an admini

Re: [gridengine users] More Univa FUD???

2012-01-11 Thread Chris Dagdigian
Rayson Ho wrote: And finally, thanks Chris for not selling gridengine.org. We started telling people to subscribe to this list since late last year on the Open Grid Scheduler homepage, and hopefully gridengine.org will not be sold in the foreseeable future. History time! -- I bought gridengine

Re: [gridengine users] Restoring SGE accounting file after re-build

2012-01-04 Thread Chris Dagdigian
Almost! I'm not near an SGE install but there is one other file you need to worry about. It's a text file that contains a simple integer value for the "next" SGE job ID. The file is called "jobseqnum" and it's found at spool/qmaster/jobseqnum You don't have to restore it from backup, just find
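
One way to pick a value for that file, assuming the goal is simply that new job IDs never collide with IDs already recorded in a restored accounting file, is to take the highest job number seen there and add one. This sketch relies on the colon-separated accounting format, where the sixth field is the job number; the function name and the sample records are made up.

```python
def next_job_id(accounting_lines):
    """Return one more than the highest job number found in accounting records."""
    highest = 0
    for line in accounting_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split(":")
        # Field 6 (index 5) of an accounting record is job_number
        if len(fields) > 5 and fields[5].isdigit():
            highest = max(highest, int(fields[5]))
    return highest + 1


# Made-up records, truncated after the job_number field for brevity
records = [
    "# header comment",
    "all.q:node1:users:dag:sleep:1041:sge",
    "all.q:node2:users:dag:blast:1043:sge",
]
print(next_job_id(records))
```

The qmaster should be stopped before the file is edited by hand.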

Re: [gridengine users] More Univa FUD???

2011-12-16 Thread Chris Dagdigian
> First they closed the > source code, now they are taking over the 'Grid Engine' name. What will > they do next?? my $.02 ... Mark if you wanna see a textbook example of infantile FUD in action all you need to do is read your own blog at http://gridenginetruth.blogspot.com/ gridengine.co

Re: [gridengine users] SGE (univa 8.0.1) - anyone running SGE with Centrify active directory integration?

2011-11-23 Thread Chris Dagdigian
William Hay wrote: > As others have pointed out community support for closed source > versions is necessarily limited but nothing stops us from having a go. > As Univa and Oracle diverge from the open source versions this will > become harder though. Just wanted to mention on the list and in p

Re: [gridengine users] SGE (univa 8.0.1) - anyone running SGE with Centrify active directory integration?

2011-11-22 Thread Chris Dagdigian
us to do useful things. If you are using Univa Grid Engine, then you are paying a Univa customer (since it is commercial only). Univa has support engineers and they are the people who are hired to support Univa customers. Now Chris, tell me why my (as well as Reuti's) original response was not

Re: [gridengine users] SGE (univa 8.0.1) - anyone running SGE with Centrify active directory integration?

2011-11-22 Thread Chris Dagdigian
mer? You can send an email to supp...@univa.com or login to the support portal http://www.univa.com/support and we can help. Regards, Bill. On 2011-11-22, at 3:32 PM, Reuti wrote: Hi Chris, On 22.11.2011 at 21:05, Chris Dagdigian wrote: I'm hands-on with a shiny new cluster running U

[gridengine users] SGE (univa 8.0.1) - anyone running SGE with Centrify active directory integration?

2011-11-22 Thread Chris Dagdigian
Hi folks, I'm hands-on with a shiny new cluster running Univa's 8.0.1 release and am having some issues running jobs as a non-root user via an account that lives in Active Directory. The cluster is the standard sort of RHEL 5.7 based system but we are using Centrify and in particular the Ce

Re: [gridengine users] cannot run in PE ... because it only offers 0 slots

2011-11-18 Thread Chris Dagdigian
Check the value of "pe_list" in your queue configuration. The MPI PE you are trying to use is not listed in the pe_list parameter for the queue you are submitting to. The queue you show only has "make" as a supported PE. -Chris Gerard Henry wrote: hello all, i got trouble to configure a
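
The pe_list check can be scripted against saved `qconf -sq` output. A minimal sketch, assuming the simple one-line form of the `pe_list` attribute; it ignores per-host overrides in square brackets and backslash-continued lines. The function names and sample configuration are illustrative.

```python
def queue_pe_list(qconf_sq_output):
    """Return the PE names from the pe_list line of `qconf -sq` output."""
    for line in qconf_sq_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "pe_list":
            return [pe for pe in fields[1:] if pe != "NONE"]
    return []


def queue_offers_pe(qconf_sq_output, pe_name):
    """True if the queue configuration lists the given parallel environment."""
    return pe_name in queue_pe_list(qconf_sq_output)


# Illustrative queue configuration fragment
sample = "qname      all.q\nhostlist   @allhosts\npe_list    make\nslots      8\n"
print(queue_offers_pe(sample, "make"), queue_offers_pe(sample, "mpi"))
```

If the PE is missing, adding it to the queue's pe_list is the usual fix for the "only offers 0 slots" message in this situation.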

Re: [gridengine users] Re??? `cloud' nodes

2011-11-07 Thread Chris Dagdigian
itching to Chef. But configuration management is real and it can cut down a lot of IT infrastructure maintenance. Rayson On Mon, Oct 10, 2011 at 7:01 PM, Kristen Eisenberg wrote: Chris Dagdigian writes: By FAR the best way to run standalone Grid Engine clusters on the Amazon Cloud tod

Re: [gridengine users] OT: IBM to acquire Platform Computing!

2011-10-11 Thread Chris Dagdigian
On a related note I was talking to a former Platform person who I'm sure many of us know on this list and he mentioned that the stripped down older variant of Platform LSF that platform produced back in the day ("lava") has a new open source home and developer group: http://openlava.net/ -

Re: [gridengine users] differentiating queues/hosts in a heterogenous system

2011-10-05 Thread Chris Dagdigian
That's the right way to do it but you don't need to do it at the queue level if you don't want. You can assign attributes to the nodes themselves and then request them like... qsub -hard -l resourceX=TRUE ./path-to-my-job.script That will run on any queue and only on hosts where the boolea

Re: [gridengine users] oracle's online course for gridengine, any good?

2011-09-29 Thread Chris Dagdigian
The biggest complaint about Sun's SGE training classes was that the instructors were professional trainers rather than people who had actually used Grid Engine. That matters a lot in technical training - the war stories & "mistakes I've made" anecdotes are pretty important. Not sure about th

Re: [gridengine users] Which Grid Engine?

2011-09-08 Thread Chris Dagdigian
I recommend Univa all the time to environments where local SGE expertise may be limited or if commercial support looks like it will be needed or desired. I also maintain handbuilt binaries of the open source forks and I've used Dave L's 'son of gridengine' codebase on two different client c

Re: [gridengine users] Rocks 5.4: Terminate Non-SGE Jobs on Compute Nodes by Normal Users

2011-08-19 Thread Chris Dagdigian
I think I learned this trick from Reuti: - Any legit job running under Grid Engine will be a child process of an sge_execd daemon. A nice little trick is a cronjob that does a "kill -9" on any user process that is not a child of sge_execd -- that will quickly send a message to the people by
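
A sketch of the detection half of that trick, reporting only and killing nothing. It assumes a process table has already been collected (for example from `ps`) as a mapping of pid to (parent pid, owner, command); the function name, the table layout and the sample data are all hypothetical.

```python
def outside_grid_engine(procs, users, execd_name="sge_execd"):
    """Return pids owned by `users` that have no sge_execd ancestor.

    procs maps pid -> (ppid, owner, command).
    """
    execd_pids = {pid for pid, (_, _, cmd) in procs.items() if cmd == execd_name}

    def under_execd(pid):
        seen = set()
        # Walk up the parent chain until it ends or an sge_execd is found
        while pid in procs and pid not in seen:
            if pid in execd_pids:
                return True
            seen.add(pid)
            pid = procs[pid][0]
        return False

    return sorted(
        pid for pid, (_, owner, _) in procs.items()
        if owner in users and not under_execd(pid)
    )


# Hypothetical process table: pid -> (ppid, owner, command)
table = {
    1: (0, "root", "init"),
    100: (1, "sgeadmin", "sge_execd"),
    200: (100, "root", "sge_shepherd"),
    201: (200, "alice", "job.sh"),
    300: (1, "root", "sshd"),
    301: (300, "bob", "bash"),
    302: (301, "bob", "blast"),
}
print(outside_grid_engine(table, users={"alice", "bob"}))
```

Restricting the check to ordinary user accounts keeps system daemons out of the report. Whether to kill, log or just email the offenders is a policy call.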

Re: [gridengine users] tight openmpi integration - how to alter hostnames for selected exechosts?

2011-08-17 Thread Chris Dagdigian
Thanks Joe & Reuti - [cdagdigian@master ib-mpi-tests]$ ompi_info | grep grid MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.4.3) [cdagdigian@master ib-mpi-tests]$ ompi_info | egrep '(rdma|openib)' MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.3)

[gridengine users] tight openmpi integration - how to alter hostnames for selected exechosts?

2011-08-17 Thread Chris Dagdigian
Hi folks, I'm sorta stymied by the magic of effortless openmpi tight integration with SGE and am wondering how best to proceed... Here is my situation: - Cluster has nodes named "node1 ... nodeN" - Cluster also has IB NICs in each node - Cluster hosts file declares the IB interfaces as "inode

[gridengine users] 3rd party 'qtcsh' build failing on son-of-gridengine?

2011-08-11 Thread Chris Dagdigian
Hi folks, One of the nicer outcomes of the post-Oracle world is how easy it's starting to become to actually build SGE source ... I'm currently trying to build an x86_64 version of the latest son-of-gridengine and am running into an issue with 'qtcsh' All of the normally flaky java, qmon,

Re: [gridengine users] Queue and Parallell Environments

2011-08-08 Thread Chris Dagdigian
ITY s_vmem INFINITY h_vmem INFINITY Here is the output of On Mon, Aug 8, 2011 at 4:05 PM, Chris Dagdigian <d...@sonsorol.org> wrote: If you post the output of 'qconf -sq ' we can provide more targeted advice. qstat -f -q output might be

Re: [gridengine users] Queue and Parallell Environments

2011-08-08 Thread Chris Dagdigian
INFINITY s_rss INFINITY h_rss INFINITY s_vmem INFINITY h_vmem INFINITY Here is the output of On Mon, Aug 8, 2011 at 4:05 PM, Chris Dagdigian <d...@sonsorol.org> wrote: If you post the output of 'qconf -sq ' we

Re: [gridengine users] Queue and Parallell Environments

2011-08-08 Thread Chris Dagdigian
If you post the output of 'qconf -sq ' we can provide more targeted advice. qstat -f -q output might be useful as well just so we can be sure your nodes are actually up and not in error state It sounds as though you have a cluster queue set up without any available hosts configured within

Re: [gridengine users] any plans for a fall GE users/developers meeting?

2011-07-27 Thread Chris Dagdigian
If we can't get a standalone meeting going, it might be reasonable to try to get something together for SC11 in Seattle: http://sc11.supercomputing.org/ -Chris Brooks Davis wrote: In the past there have been meetings of GE users and developers many Octobers. I've usually found about them t

Re: [gridengine users] hedeby install howto?

2011-06-29 Thread Chris Dagdigian
My overall advice for people trying to run Grid Engine on the Amazon Cloud is this: (1) If you just want to run Grid Engine in standalone mode on the Amazon Cloud then you should be using StarCluster http://web.mit.edu/stardev/cluster/ -- those folks made a fantastic and free system for ela

Re: [gridengine users] Web based forums

2011-06-07 Thread Chris Dagdigian
Hi Rich, The most active people on the list who provide the most support almost unanimously hate web forums and having to use a web browser to communicate so I don't think forums are in the future ... We don't expect users to download and search .gzip files, my preferred method is to use ex

Re: [gridengine users] qmaster startup due to communication errors

2011-05-26 Thread Chris Dagdigian
Compare the contents of $SGE_ROOT/$SGE_CELL/common/act_qmaster to what you have in /etc/hosts -- the act_qmaster file contains the hostname for what SGE believes is the qmaster. That hostname needs to resolve perfectly in DNS or in your /etc/hosts file. You can also experiment with the $SGE_ROOT/ut
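
The act_qmaster comparison can be sketched as a lookup of the recorded hostname in the hosts file. This covers only the /etc/hosts half of the check (DNS is not consulted); the function names and sample contents are illustrative assumptions.

```python
def hosts_names(hosts_text):
    """Map every hostname and alias in /etc/hosts-style text to its address."""
    names = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2:
            for name in fields[1:]:
                names.setdefault(name, fields[0])
    return names


def qmaster_address(act_qmaster_text, hosts_text):
    """Return the hosts-file address for the act_qmaster hostname, or None."""
    return hosts_names(hosts_text).get(act_qmaster_text.strip())


# Illustrative file contents
act_qmaster = "master.cluster.local\n"
hosts = "127.0.0.1 localhost\n10.0.0.1 master.cluster.local master # head node\n"
print(qmaster_address(act_qmaster, hosts))
```

A result of `None` means the name in act_qmaster does not appear in the hosts file, so it would have to resolve through DNS instead.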

Re: [gridengine users] write my own accounting log parser..

2011-05-05 Thread Chris Dagdigian
I've written perl scripts to scrape the accounting log and throw the entries into a mysql database - mainly so we could write our own simple queries and text based reporting tools. Never wrote the web app though at one time when I was entranced with ruby-on-rails I thought it would be a cool
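
For anyone starting a parser from scratch, here is a minimal sketch of the same idea. It swaps in Python and an in-memory SQLite database in place of the perl and mysql described above, and it keeps only four of the colon-separated accounting fields (owner, job name, job number, ru_wallclock). The table layout, function names and sample records are assumptions made for illustration.

```python
import sqlite3


def load_accounting(lines):
    """Load accounting records into an in-memory SQLite table; return the connection."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs (owner TEXT, job_name TEXT, job_number INTEGER, wallclock INTEGER)"
    )
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        f = line.rstrip("\n").split(":")
        if len(f) < 14:
            continue  # not a complete record
        # Indexes: 3 = owner, 4 = job_name, 5 = job_number, 13 = ru_wallclock
        db.execute(
            "INSERT INTO jobs VALUES (?, ?, ?, ?)",
            (f[3], f[4], int(f[5]), int(float(f[13]))),
        )
    return db


def usage_by_owner(db):
    """Return (owner, job count, total wallclock seconds) rows, sorted by owner."""
    return db.execute(
        "SELECT owner, COUNT(*), SUM(wallclock) FROM jobs GROUP BY owner ORDER BY owner"
    ).fetchall()


# Made-up records, cut off after the ru_wallclock field
records = [
    "all.q:node1:users:dag:sleep:1041:sge:0:1304600000:1304600010:1304600070:0:0:60",
    "all.q:node2:users:dag:blast:1042:sge:0:1304600100:1304600110:1304600230:0:0:120",
    "all.q:node1:users:amy:sleep:1043:sge:0:1304600200:1304600210:1304600240:0:0:30",
]
print(usage_by_owner(load_accounting(records)))
```

Job names containing a colon would break this naive split, so a production parser needs more care.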

Re: [gridengine users] `cloud' nodes

2011-04-29 Thread Chris Dagdigian
I'm buried in work and biz travel so apologies if this quick reply is not on topic ... By FAR the best way to run standalone Grid Engine clusters on the Amazon Cloud today is to simply use MIT Starcluster : http://web.mit.edu/stardev/cluster/index.html The people behind starcluster basical

Re: [gridengine users] Green Computing (power control)

2011-04-29 Thread Chris Dagdigian
I have absolutely seen this done with very real results. The most important thing is to have the system generate emails to senior management saying things like "... I saved $12,000 in electricity last quarter ..." -- I can't overstate enough the importance of making sure that you have the PR stuff

Re: [gridengine users] Berkeley DB (was building RHEL5)

2011-04-08 Thread Chris Dagdigian
I was lucky enough to have a Panasas PAS12 ("fastest HPC storage system in the world!") chassis in my home office for a few weeks. Suffice to say I don't think it will have any troubles handling classic spooling at all, heh. -Chris Mark Suhovecky wrote: Our current installation uses AFS

Re: [gridengine users] Berkeley DB (was building RHEL5)

2011-04-08 Thread Chris Dagdigian
Rayson answered the ARCO question - spooling does not matter since the only ARCO-related files that get scraped are the accounting and reporting files. Classic vs. berkeley is always an interesting question. I also am firmly in the classic spooling camp but we sometimes use berkeley spooli

Re: [gridengine users] Contribs and GridGraph

2011-04-07 Thread Chris Dagdigian
or links on gridengine.org or gridengine.info. (If people have suggestions, let us know -- at least I and Chris Dagdigian can edit it, and possibly others.) Thanks for posting. For what it's worth, it looks as if the code would need parametrizing for general use, and I guess it could fall fou

Re: [gridengine users] building RHEL5

2011-04-07 Thread Chris Dagdigian
I could be wrong but ... Even though Univa (and others?) expect to deprecate the use of RPC based spooling to a remote berkeley DB server, the current SGE codebase and aimk build scripts still expect to see a berkeley installation that has a ready to go "rpc_server" binary or whatever ... S

[gridengine users] great blog post w/ deep dive into SGE priority calculations

2011-04-06 Thread Chris Dagdigian
Jiri forwarded me the URL to his post and I found it fascinating: "Calculating GE Job Priorities" http://olwynion.blogspot.com/2011/04/calculating-ge-job-priorities.html I've always felt that one of the strengths of GE (unlimited number of knobs that you can alter) is also one of its biggest

[gridengine users] building jgdi on Mac OS X 10.6

2011-03-30 Thread Chris Dagdigian
Hi folks, After very smooth Linux builds I figured I'd test my luck with OS X ... Hitting an error now in the "./aimk -only-core" step, due to an arch mismatch error: "libjvm.dylib, file was built for i386 which is not the architecture being linked (x86_64)" It looks like this may be ca

Re: [gridengine users] [gridengine dev] building on centos 5.5 64 bit?

2011-03-27 Thread Chris Dagdigian
needed to build one of the visualization panes. I'm going to do some testing of those binaries this week and also work on OS X builds. The goal on our end (bioteam) is to have courtesy binaries that we don't mind sharing with others. -Chris Dave Love wrote: Chris Dagdigian w

Re: [gridengine users] [gridengine dev] building on centos 5.5 64 bit?

2011-03-24 Thread Chris Dagdigian
I just did a 'git clone' of the current source over at Github and was able to build (I think) 100% of the code. This is a 64 bit system running CentOS 5.5 Everything built from source without too much hassle ... including the java stuff classes, the GUI installer and the hadoop herd classes

[gridengine users] posted some user-centric training materials online

2011-03-10 Thread Chris Dagdigian
FYI, Following up on the 2009 posting of some Admin-centric training materials I threw some PDFs on the bioteam blog that represent a quick and dirty introduction to Grid Engine usage and workflows -- the materials are simple and aimed at an audience of users rather than admins. Shortened l

Re: [gridengine users] does anyone have workshop proceedings archived?

2011-02-28 Thread Chris Dagdigian
The 2007 workshop proceedings are here: http://gridengine.org/assets/static/ws2007/ And I hacked through the index HTML page, I think I have all the links working off of this index/contents page: http://gridengine.org/assets/static/ws2007/SGEWorkshop2007.htm -Chris __

[gridengine users] 6.2u5 courtesy binaries for OSX/Darwin ?

2011-02-22 Thread Chris Dagdigian
... anyone have a copied/saved version of the OS X binaries for the last open Sun/Oracle SGE release? Regards, Chris

[gridengine users] SGE and Matlab Distributed Computing Server integration?

2011-02-22 Thread Chris Dagdigian
Has anyone done any real world integration with MDCS and modern versions of grid engine? A quick google search pulls up this old URL: http://www.mathworks.com/support/solutions/en/data/1-2MC1RY/?solution=1-2MC1RY .. and from that the method looks pretty straightforward. Any real world

Re: [gridengine users] SGE Benchmark Tools

2011-02-16 Thread Chris Dagdigian
What exactly are you trying to benchmark? Job types and workflows are far too variable to produce a usable generic reference. The real benchmark is "does it do what I need?" and there are many people on this list who can help you zero in on answering that question. SGE is used on anything fr