[slurm-users] Adding gres usage to the accounting database

2024-12-13 Thread Mark Dixon via slurm-users
Hello! Our jobs can ask for dedicated per-node disk space, e.g. "--gres=tmp:1G", where an ephemeral directory is managed by the site prolog/epilog and usage is capped using an xfs project quota. This works well, although we really need to look at job_container/tmpfs. I note that slurm alread

[lustre-discuss] Lustre 2.15 LTS EOL?

2024-09-23 Thread Mark Dixon
Hi there, I was wondering if anyone knew what the expected lifetime of 2.15.x might be, please? With LTS releases being tied to a RHEL major version, having some idea of this would help plan server installs/upgrades. Thanks! Mark ___ lustre-discu

Re: [Alpine-info] Examining contents of alpine's passfile?

2024-09-12 Thread Mark Dixon via Alpine-info
Hi Jason, Thanks, that's a big help! Best, Mark On Wed, 11 Sep 2024, jason-alpine-i...@shalott.net wrote: I'd like to take a loo

[Bug 2056495] [NEW] Ubiquity crashes a few seconds into the install process

2024-03-07 Thread Mark Dixon
Public bug reported: I'm following the instructions at: https://mutschler.dev/linux/ubuntu-btrfs-20-04/#create-filesystems-for-root-and-efi-system-partitions. All goes well until I attempt to work with the installer ("ubiquity --no-bootloader" command). I can select language (English), keyboard

[Bacula-users] Limit number of concurrent verify jobs

2023-10-30 Thread Mark Dixon
Hi all, We find that our bacula catalog database can handle more concurrent backup jobs than it can verify catalog jobs. We'd therefore like to have a lower concurrency limit for verify catalog jobs, but I cannot see an obvious way to do this. One way would be to run two director/catalog pai

Re: [lustre-discuss] Ongoing issues with quota

2023-10-04 Thread Mark Dixon via lustre-discuss
uld be a simple way to reconcile or regenerate what quota has recorded vs what is actually on disk, which I have verified two different ways. -- Dan On Wed, 2023-10-04 at 15:01 +0100, Mark Dixon wrote: Hi Dan, I think it gets corrected when you umount and fsck the OST's themselves (not lfsck)

Re: [lustre-discuss] Ongoing issues with quota

2023-10-04 Thread Mark Dixon via lustre-discuss
Hi Dan, I think it gets corrected when you umount and fsck the OST's themselves (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12. Best, Mark On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote: No combination of lfsck runs has helped with t

Re: [lustre-discuss] Rocky 9.2/lustre 2.15.3 client questions

2023-06-26 Thread Mark Dixon via lustre-discuss
Hi Christopher, Not an exact match, but we've seen problems running Vasp on a 2.15.x client against 2.12.6 servers. It can get in quite a tangle, to the point that other clients cannot "ls -l" the Vasp working directory. Don't know (yet) if it's also true of 2.12.9. Best, Mark On Fri, 23 J

Re: [Shorewall-users] Rule matching with USER?

2022-12-16 Thread Mark Dixon
On Fri, 16 Dec 2022, Justin Pryzby wrote: ... That limitation isn't due to shorewall but rather networking in general. You can't know and certainly couldn't trust the username from a remote system (like "identd"), and loopback has the same limitation. Any ideas on how to handle this in shorewal

[Shorewall-users] Rule matching with USER?

2022-12-16 Thread Mark Dixon
Hi all, I'm having a play with shorewall rules, specifically using the USER column to restrict access to a local port. If I have a rule like this... DROP:info fw fw tcp 1332 - - - !foo - - - - - - ...then only local user foo can connect to 1332/tcp on the server's normal IP address. Howeve

[jira] [Commented] (DIRSERVER-2371) Inclusion of 'top' objectclass in searches yields no results

2022-11-28 Thread Mark Dixon (Jira)
[ https://issues.apache.org/jira/browse/DIRSERVER-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640055#comment-17640055 ] Mark Dixon commented on DIRSERVER-2371: --- I think I'm s

Re: [slurm-users] temporary SLURM directories

2022-05-25 Thread Mark Dixon
In addition to the other suggestions, there's this: https://slurm.schedmd.com/faq.html#tmpfs_jobcontainer https://slurm.schedmd.com/job_container.conf.html I would be interested in hearing how well it works - it's so buried in the documentation that unfortunately I didn't see it until after I r

Re: [slurm-users] Slurm 21.08.8-2 upgrade

2022-05-09 Thread Mark Dixon
On Thu, 5 May 2022, Legato, John (NIH/NHLBI) [E] wrote: ... We are in the process of upgrading from Slurm 21.08.6 to Slurm 21.08.8-2. We’ve upgraded the controller and a few partitions worth of nodes. We notice the nodes are losing contact with the controller but slurmd is still up. We thought

Re: [slurm-users] SLURM: reconfig

2022-05-06 Thread Mark Dixon
On Thu, 5 May 2022, Ole Holm Nielsen wrote: ... You're right, probably the correct order for Configless must be: * stop slurmctld * edit slurm.conf etc. * start slurmctld * restart the slurmd nodes to pick up new slurm.conf See also slides 29-34 in https://slurm.schedmd.com/SLUG21/Field_Notes_5
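The order quoted in this message can be written down as a small dry-run sketch. It only prints the steps, so it is safe to read through before adapting. The systemd unit names `slurmctld` and `slurmd`, the use of `ssh`, and the node names passed in are assumptions about a typical site and are not stated in the thread.

```shell
#!/bin/sh
# Dry-run sketch of the configless update order quoted above.
# Prints each step instead of executing it. Unit names and the use of ssh
# are assumptions; adjust for the site before running anything for real.
configless_update_plan() {
    # $1: space-separated list of compute node names (hypothetical)
    echo "systemctl stop slurmctld"
    echo "# edit slurm.conf etc. on the controller"
    echo "systemctl start slurmctld"
    for node in $1; do
        echo "ssh $node systemctl restart slurmd"
    done
}
```

Calling `configless_update_plan "cn1 cn2"` lists the controller steps first and then one `slurmd` restart per node, which mirrors the sequence in the quoted text.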

Re: [slurm-users] SLURM: reconfig

2022-05-05 Thread Mark Dixon
On Thu, 5 May 2022, Ole Holm Nielsen wrote: ... That is correct. Just do "scontrol reconfig" on the slurmctld server. If all your slurmd's are truly running Configless[1], they will pick up the new config and reconfigure without restarting. Details are summarized in https://wiki.fysik.dtu.dk/n

Re: [Bacula-users] Maximum Reload Requests

2022-02-02 Thread Mark Dixon
We're on a more recent version of bacula, but we reload our config quite frequently and have never needed to restart bacula-dir to get it to work. Best wishes, Mark On Wed, 2 Feb 2022, Shaligram Bhagat, Yateen (Nokia - IN/Bangalore) wrote: Hi all, In the Bacula 9.4.4 main r

Re: [Bacula-users] Verify jobs and "Warning: sql_get.c:186 More than one Filename!"

2021-12-21 Thread Mark Dixon
On Fri, 17 Dec 2021, Phil Stracchino wrote: ... I've been running my Bacula jobs in parallel for literally decades now and have never seen it result in a duplicate Path or Filename warning. And they ARE only warnings. I would not worry about it as long as it is only a warning. You could manuall

[Bacula-users] Verify jobs and "Warning: sql_get.c:186 More than one Filename!"

2021-12-17 Thread Mark Dixon
Hi all, I find the verify job feature of bacula really useful to see what has changed since the last verifyinit job, but the job output is becoming difficult to read as it's cluttered with lots and lots of messages of the form: 17-Dec 18:28 foo-dir JobId 67663: Warning: sql_get.c:186 More th

Re: [slurm-users] srun fails with "srun: error: Security violation, slurm message from uid" if delay in job starting

2021-12-14 Thread Mark Dixon
Hi all, Sorry for the noise, this was down to a problem with our configless setup. Really must start running slurmd everywhere and get rid of the compute-only version of slurm.conf... Cheers, Mark On Mon, 13 Dec 2021, Mark Dixon wrote: Hi all, Just wondering if anyone

[slurm-users] srun fails with "srun: error: Security violation, slurm message from uid" if delay in job starting

2021-12-13 Thread Mark Dixon
Hi all, Just wondering if anyone else had seen this. Running slurm 21.08.2, we're seeing srun work normally if it is able to run immediately. However, if there is a delay in job start, for example after a wait for another job to end, srun fails. e.g. [test@foo ~]$ srun -p test --pty bash

Re: [slurm-users] Per-job TMPDIR: how to lookup gres allocation in prolog?

2021-11-17 Thread Mark Dixon
On Wed, 17 Nov 2021, Bjørn-Helge Mevik wrote: ... We are using basically the same setup, and have not found any other way than running "scontrol show job ..." in the prolog (even though it is not recommended). I have yet to see any problems arising from it, but YMMV. If you find a different way
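The "scontrol show job ..." approach mentioned in this reply could look roughly like the sketch below, which extracts a `tmp` gres size from the job record text. It assumes the record carries a `TresPerNode=` field that lists the gres; the field name and layout differ between Slurm versions, so the pattern is illustrative and would need checking against real output. The function name `tmp_gres_size` is hypothetical.

```shell
#!/bin/sh
# Sketch: read `scontrol show job` text on stdin and print the size of a
# "tmp" gres, e.g. "10G". Prints nothing if no tmp gres is found.
# Assumption: the gres appears in a TresPerNode= field; verify on your version.
tmp_gres_size() {
    sed -n 's/.*TresPerNode=[^ ]*tmp:\([0-9][0-9]*[KMGT]\{0,1\}\).*/\1/p' | head -n 1
}
```

In a prolog this might be used as `scontrol show job "$SLURM_JOB_ID" | tmp_gres_size`, bearing in mind the caveat in the quoted message that calling scontrol from the prolog is not recommended.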

[slurm-users] Per-job TMPDIR: how to lookup gres allocation in prolog?

2021-11-16 Thread Mark Dixon
Hi everyone, I'd like to configure slurm such that users can request an amount of disk space for TMPDIR... and for that request to be reserved and quota'd via commands like "sbatch --gres tmp:10G jobscript.sh". Probably reinventing someone's wheel, but I'm almost there. I have: - created a

Re: [slurm-users] enable_configless, srun and DNS vs. hosts file

2021-11-16 Thread Mark Dixon
rly UCNS) -----Original Message- From: slurm-users On Behalf Of Mark Dixon Sent: Wednesday, November 10, 2021 10:14 To: slurm-users@lists.schedmd.com Subject: [slurm-users] enable_configless, srun and DNS vs. hosts file Hi, I'm using the "

[slurm-users] enable_configless, srun and DNS vs. hosts file

2021-11-10 Thread Mark Dixon
Hi, I'm using the "enable_configless" mode to avoid the need for a shared slurm.conf file, and am having similar trouble to others when running "srun", e.g. srun: error: fwd_tree_thread: can't find address for host cn120, check slurm.conf srun: error: Task launch for StepId=113.0 failed

Re: [Puppet Users] hiera merge lookup/alias of array defined at multiple levels

2021-09-03 Thread Mark Dixon
UTC+1 dhei...@opentext.com wrote: > On Thursday, 02.09.2021 at 08:46 -0700, Mark Dixon wrote: > > I'd like to do a simple merge lookup of an array within a hiera yaml file. > At the moment I seem to be getting the array from the hiera location that > "wins",

[Puppet Users] hiera merge lookup/alias of array defined at multiple levels

2021-09-02 Thread Mark Dixon
Hi there, I'd like to do a simple merge lookup of an array within a hiera yaml file. At the moment I seem to be getting the array from the hiera location that "wins", and not a merged version. I guess it's not possible, perhaps due to possible circular dependencies, etc. Is that right, please

Re: [chirp_users] Download problem

2021-06-07 Thread Mark Dixon
a poor connection - try squeezing the cable tightly into the side of the radio and maybe even holding it in while attempting the upload again (being careful not to squeeze PTT at the same time). Cheers, Mark Dixon On 7/06/2021 11:01 am, Gerald Stein wrote: > I’m new to Chirp software.

Re: [slurm-users] Drain node from TaskProlog / TaskEpilog

2021-05-25 Thread Mark Dixon
ay to continue the job while still 'fixing' the issue. That could be done in the TaskEpilog script (assuming your daemon user has permissions to do so). On 5/24/2021 8:56 AM, Mark Dixon wrote: Hi Brian, Thanks for replying. On our hardware, GPUs allocated to a job by cgroup sometimes

Re: [slurm-users] Drain node from TaskProlog / TaskEpilog

2021-05-24 Thread Mark Dixon
ittle more. On 5/24/2021 3:02 AM, Mark Dixon wrote: Hi all, Sometimes our compute nodes get into a failed state which we can only detect from inside the job environment. I can see that TaskProlog / TaskEpilog allows us to run our detection test; however, unlike Epilog and Prolog, they do n

[slurm-users] Drain node from TaskProlog / TaskEpilog

2021-05-24 Thread Mark Dixon
Hi all, Sometimes our compute nodes get into a failed state which we can only detect from inside the job environment. I can see that TaskProlog / TaskEpilog allows us to run our detection test; however, unlike Epilog and Prolog, they do not drain a node if they exit with a non-zero exit code

Re: [Puppet Users] Using the puppet gem on an unsupported platform

2021-03-19 Thread Mark Dixon
gents. > > You need the module on the agent if you want to run puppet apply only. > > Hth, > Martin > > > On 18. Mar 2021, at 17:47, Mark Dixon wrote: > > Thanks for that, it's showing just how weird things are getting! > > puppet 7.4.1, puppetserver 7.0

Re: [Puppet Users] Using the puppet gem on an unsupported platform

2021-03-18 Thread Mark Dixon
M UTC Martin Alfke wrote: > Hi Mark, > > You can check if a type is available by running puppet describe -l > This will print out all available puppet custom types. > > Best, > Martin > > > On 11. Mar 2021, at 18:11, Mark Dixon wrote: > > Hi Martin, > >

Re: [Puppet Users] Using the puppet gem on an unsupported platform

2021-03-11 Thread Mark Dixon
please check module path using 'puppet config print modulepath' and > install the required core modules into one of the mentioned folders: > puppet module install puppetlabs-mount_core --target-dir > > This should make the mount resource type available. > > Best, > Marti
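As a rough illustration of the advice quoted here, the sketch below takes the colon-separated value printed by `puppet config print modulepath`, picks its first directory, and prints (rather than runs) the corresponding install command. Choosing the first entry is an assumption made for the example; the original message leaves the target directory unspecified. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: pick the first directory from a colon-separated modulepath.
first_module_dir() {
    # $1: value as printed by `puppet config print modulepath`
    printf '%s\n' "${1%%:*}"
}

# Prints the install command for review rather than running it.
# Using the first modulepath entry as the target is an assumption.
mount_core_install_cmd() {
    echo "puppet module install puppetlabs-mount_core --target-dir $(first_module_dir "$1")"
}
```

A typical call would be `mount_core_install_cmd "$(puppet config print modulepath)"`, after which the printed command can be checked and run by hand.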

[Puppet Users] Using the puppet gem on an unsupported platform

2021-03-10 Thread Mark Dixon
Hi there, Following on from the conversation about the availability of a puppet agent RPM on el8 for the ppc64le architecture, I'm trying to use agent in the version of puppet made available as a ruby gem. It largely works just by doing this, giving me puppet 7.4.1: yum install ruby gem i

Re: [Puppet Users] Re: open source puppet-agent for ppc64le / power9 on rhel 8

2021-03-03 Thread Mark Dixon
/puppet/el/8/ http://yum.puppetlabs.com/puppet6/el/8/ http://yum.puppetlabs.com/puppet7/el/8/ Best, Mark On Tuesday, March 2, 2021 at 6:40:43 PM UTC Justin Stoller wrote: > On Tue, Mar 2, 2021 at 9:05 AM Mark Dixon wrote: > >> Hi Nick, >> >> That's great news, for

[Puppet Users] Re: open source puppet-agent for ppc64le / power9 on rhel 8

2021-03-02 Thread Mark Dixon
Hi Nick, That's great news, for a moment there I was worried :) It's a new deployment so I'm fairly relaxed about puppet 6 vs. 7, but specifically I'm feeling the lack of any version at all for rhel8. Best wishes, Mark On Tuesday, March 2, 2021 at 4:48:21 PM UTC Nick Walker wrote: > Hi Mark,

[Puppet Users] open source puppet-agent for ppc64le / power9 on rhel 8

2021-03-01 Thread Mark Dixon
Hi all, Just been looking at yum.puppetlabs.com for a copy of the puppet agent for the ppc64le architecture on rhel8 and couldn't find one. I can see a rhel7 version of puppet6 (but not puppet7) and nothing at all for rhel8. Has open source puppet dropped support for IBM POWER9 clients, please?

[chirp_users] Support for Yaesu FT-991A?

2021-03-01 Thread Mark Dixon
I think this has been asked before, in March 2019, but I'm wondering if there is any further news on Chirp support for the Yaesu FT-991A? Is there perhaps a different model which uses much the same communication protocol that does have Chirp support? Kind regards, Mark Dixon,

Re: [chirp_users] Chirp no work - on Windows, try Linux if possible

2021-01-05 Thread Mark Dixon
even with a basic after-market cable. Kind regards, Mark Dixon, VK6EZ. On 6/01/2021 2:06 am, Billy Joe Higginbotham Jr via chirp_users wrote: > I can’t get my radio to link with my computer. I have the uv-5r and run > Windows 7 on my computer. I keep getting the message (An error has occurred

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-12-02 Thread Mark Dixon via users
Hi Mark, Thanks so much for this - yes, applying that pull request against ompi 4.0.5 allows hdf5 1.10.7's parallel tests to pass on our Lustre filesystem. I'll certainly be applying it on our local clusters! Best wishes, Mark On Tue, 1 Dec 2020, Mark Allen via users wrote: At least for t

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-30 Thread Mark Dixon via users
On Fri, 27 Nov 2020, Dave Love wrote: ... It's less dramatic in the case I ran, but there's clearly something badly wrong which needs profiling. It's probably useful to know how many ranks that's with, and whether it's the default striping. (I assume with default ompio fs parameters.) Hi Da

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-26 Thread Mark Dixon via users
-Original Message- From: users On Behalf Of Mark Dixon via users Sent: Thursday, November 26, 2020 9:38 AM To: Dave Love via users Cc: Mark Dixon ; Dave Love Subject: Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO? On Wed, 25 Nov 2020, Dave Love via users wrote: The perf test says romio

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-26 Thread Mark Dixon via users
On Wed, 25 Nov 2020, Dave Love via users wrote: The perf test says romio performs a bit better. Also -- from overall time -- it's faster on IMB-IO (which I haven't looked at in detail, and ran with suboptimal striping). I take that back. I can't reproduce a significant difference for total I

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-17 Thread Mark Dixon via users
Hi Edgar, Pity, that would have been nice! But thanks for looking. Checking through the ompi github issues, I now realise I logged exactly the same issue over a year ago (completely forgot - I've moved jobs since then), including a script to reproduce the issue on a Lustre system. Unfortunate

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-16 Thread Mark Dixon via users
There was a bug fix in the Open MPI to ROMIO integration layer sometime in the 4.0 series that fixed a datatype problem, which caused some problems in the HDF5 tests. You might be hitting that problem. Thanks Edgar -Original Message- From: users On Behalf Of Mark Dixon via users Sent: Monday, N

[OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-16 Thread Mark Dixon via users
Hi all, I'm confused about how openmpi supports mpi-io on Lustre these days, and am hoping that someone can help. Back in the openmpi 2.0.0 release notes, it said that OMPIO is the default MPI-IO implementation on everything apart from Lustre, where ROMIO is used. Those release notes are pre

Re: [Bacula-users] Autoprune not getting rid of Verify jobs

2020-11-05 Thread Mark Dixon
Hi Martin, Thanks for the reply, glad to know that it's not just me. They're InitCatalog Verify jobs. Going to SQL feels a bit hairy, as we'd have to be careful to cascade the delete to at least the File and Log tables. As prune (unfortunately) only lets you work on a single client at a time,

[Bacula-users] Autoprune not getting rid of Verify jobs

2020-11-04 Thread Mark Dixon
Hi all, I use bacula to backup my data, but also find verify jobs to be extremely useful. However, after checking my bacula database, I noticed that none of the old verify jobs have been pruned, which is making the catalog quite big. Actual backup jobs are happily being automatically pruned.

Re: [slurm-users] Fair share per partition

2020-09-17 Thread Mark Dixon
d for it, but both sets of hardware were busy. In this case, I think I can safely move back to scaling down the partition charge :) Cheers, Mark -- Mark Dixon Tel: +44(0)191 33 41383 Advanced Research Computing (ARC), Durham University, UK

Re: [slurm-users] Fair share per partition

2020-09-17 Thread Mark Dixon
On Thu, 17 Sep 2020, Paul Edmon wrote: So the way we handle it is that we give a blanket fairshare to everyone but then dial in our TRES charge back on a per partition basis based on hardware.  Our fairshare doc has a fuller explanation: https://docs.rc.fas.harvard.edu/kb/fairshare/ -Paul Ed

[slurm-users] Fair share per partition

2020-09-17 Thread Mark Dixon
d managing users, particularly as I've not figured out how to define a group of users in one place (say, a unix group) that I can then use multiple times. Is there a better way, please? Thanks, Mark -- Mark Dixon Tel: +44(0)191 33 41383 Advanced Research Computing (ARC), Durham University, UK

Re: [slurm-users] getting started with job_submit_lua

2020-09-16 Thread Mark Dixon
On Wed, 16 Sep 2020, Niels Carl W. Hansen wrote: If you explicitly specify the account, for example 'sbatch -A myaccount', then 'slurm.log_info("submit -- account %s", job_desc.account)' works. Great, thanks - that's working! Of course I have other problems... :

Re: [slurm-users] getting started with job_submit_lua

2020-09-16 Thread Mark Dixon
On Wed, 16 Sep 2020, Diego Zuccato wrote: ... From the source it seems these fields are available: account comment direct_set_prio gres job_id Always nil ? Maybe no JobID yet? job_state licenses max_cpus max_nodes min_

[slurm-users] getting started with job_submit_lua

2020-09-15 Thread Mark Dixon
Hi all, I'm trying to get started with the lua job_submit feature and I have a really dumb question. This job_submit Lua script: function slurm_job_submit( job_desc, part_list, submit_uid ) slurm.log_info("submit called lua plugin") for k,v in pairs(job_desc) do slurm.log_

Re: [Bacula-users] Multithreaded backups?

2020-04-29 Thread Mark Dixon
Hi Josh, Many thanks for your advice, good points all - I'll continue to play. Best wishes, Mark On Tue, 28 Apr 2020, Josh Fisher wrote: ... Yes. I believe the client is multi-threaded in that multiple commands can be issued and they will each be handled in a separately spawned thread. Howeve

Re: [Bacula-users] Multithreaded backups?

2020-04-28 Thread Mark Dixon
p from the client. On Mon, Apr 27, 2020 at 9:34 AM Mark Dixon wrote: Hi all, Am I right in thinking that a single bacula job can only back up each file in its fileset sequentially - there's no multithreading available to back up multiple files at the same time in order to leverage the clie

[Bacula-users] Multithreaded backups?

2020-04-27 Thread Mark Dixon
Hi all, Am I right in thinking that a single bacula job can only back up each file in its fileset sequentially - there's no multithreading available to back up multiple files at the same time in order to leverage the client CPU? I'm a relatively long-term user of bacula (thanks!) who has been

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon
efault qos for foo's jobs: "sacctmgr modify user foo set qos=drain defaultqos=drain" And then update the qos on all of foo's waiting jobs. I'll be using David's GrpSubmitJobs=0 suggestion instead. Thanks for everyone's help, Mark On Wed, 1 Apr 2020, Mark Dixon

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon
un to completion and pending jobs won't start Antony On Wed, 1 Apr 2020 at 10:57, Mark Dixon wrote: Hi all, I'm a slurm newbie who has inherited a working slurm 16.05.10 cluster. I'd like to stop user foo from submitting new jobs but allow their existing jobs to run. We hav

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon
done wrong. Best, Mark On Wed, 1 Apr 2020, mercan wrote: Hi; If you have a working job_submit.lua script, you can block new jobs from a specific user: if job_desc.user_name == "baduser" then return 2045 end That's all! Regards; Ahmet M. 1.04.2020 16:22 tarih

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon
we set the GrpSubmit jobs on an account to 0 which allowed in-flight jobs to continue but no new work to be submitted. HTH, David On Wed, Apr 1, 2020 at 5:57 AM Mark Dixon wrote: Hi all, I'm a slurm newbie who has inherited a working slurm 16.05.10 cluster. I'd like to stop user foo
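The GrpSubmitJobs suggestion in this reply can be sketched as a pair of dry-run helpers that print the `sacctmgr` commands instead of running them. The quoted message sets the limit on an account; applying it to a single user's association, the `-i` (commit without prompting) flag, and the helper names are assumptions made for this example. Setting the limit to `-1` is how sacctmgr clears a previously set value.

```shell
#!/bin/sh
# Dry-run sketch: print sacctmgr commands that stop new submissions while
# leaving running and queued jobs alone, and that lift the block again.
# $1: entity type ("account" or "user"), $2: its name (both hypothetical).
block_submit_cmd() {
    echo "sacctmgr -i modify $1 where name=$2 set GrpSubmitJobs=0"
}

unblock_submit_cmd() {
    # -1 clears a previously set limit
    echo "sacctmgr -i modify $1 where name=$2 set GrpSubmitJobs=-1"
}
```

For instance, `block_submit_cmd user foo` prints the command that would stop user foo submitting, and `unblock_submit_cmd user foo` prints the one that removes the limit.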

[slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon
Hi all, I'm a slurm newbie who has inherited a working slurm 16.05.10 cluster. I'd like to stop user foo from submitting new jobs but allow their existing jobs to run. We have several partitions, each with its own qos and MaxSubmitJobs typically set to some value. These qos are stopping a "sa

[gridengine users] Thanks Gridengine!

2019-08-16 Thread Mark Dixon
Hi all, I've hung up my cape - giving up my superpowers on the University of Leeds supercomputers - and before I went, had a bit of fun looking at our gridengine accounting logs. Thought I'd share it, in case anyone found it interesting: https://arc.leeds.ac.uk/very-nearly-almost-14-years-of-a-

[SGE-discuss] Thanks Gridengine!

2019-08-16 Thread Mark Dixon
Hi all, I've hung up my cape - giving up my superpowers on the University of Leeds supercomputers - and before I went, had a bit of fun looking at our gridengine accounting logs. Thought I'd share it, in case anyone found it interesting: https://arc.leeds.ac.uk/very-nearly-almost-14-years-of-a-

[OMPI users] OMPI 4.0.1 + PHDF5 1.8.21 tests fail on Lustre

2019-08-05 Thread Mark Dixon via users
Hi, I’ve built parallel HDF5 1.8.21 against OpenMPI 4.0.1 on CentOS 7 and a Lustre 2.12 filesystem using the OS-provided GCC 4.8.5 and am trying to run the testsuite. I’m failing the testphdf5 test: could anyone help, please? I’ve successfully used the same method to pass tests when building H

Re: [gridengine users] [SGE-discuss] A Virtual GridEngine Cluster in a cluster

2019-03-08 Thread Mark Dixon
On Fri, 8 Mar 2019, Reuti wrote: ... > We got access to a SLURM equipped cluster where one always gets complete > nodes and is asked to avoid single serial jobs or to pack them by > scripting to fill the nodes. With the additional need for a workflow > application (kinda DRMAA) and array job dep

Re: [SGE-discuss] A Virtual GridEngine Cluster in a cluster

2019-03-08 Thread Mark Dixon
On Fri, 8 Mar 2019, Reuti wrote: ... > We got access to a SLURM equipped cluster where one always gets complete > nodes and is asked to avoid single serial jobs or to pack them by > scripting to fill the nodes. With the additional need for a workflow > application (kinda DRMAA) and array job dep

Re: [gridengine users] Support for Open Source Grid Engine

2019-03-01 Thread Mark Dixon
Hi Fritz, That's one step closer to my dream of being able to pay for a commercially maintained (bugfixes, features, etc.), open source version of gridengine. So near and yet so far! The best of luck :) Mark On Fri, 1 Mar 2019, Friedrich Ferstl wrote: > Hi, > > I hope people do not take offe

Re: [gridengine users] Fair share policy

2019-02-28 Thread Mark Dixon
e was more than one >> issue with Sharetree and arrays that we saw but it didn’t happen in the >> default sharetree configuration. I will have to check. >> >> Regards >> >> Bill >> >> Sent from my iPhone >> >>> On Feb 28, 2019, at 4:32 AM,

Re: [gridengine users] Fair share policy

2019-02-28 Thread Mark Dixon
Hi Bill, I fixed that share-tree-array-jobs priority problem some time ago, unless you're thinking of a different one? https://arc.liv.ac.uk/trac/SGE/ticket/435 https://arc.liv.ac.uk/trac/SGE/changeset/4840/sge We use share tree and array jobs all the time with no problems. It made it into a S

Re: [gridengine users] $TMPDIR With MPI Jobs

2018-12-07 Thread Mark Dixon
On Thu, 6 Dec 2018, Reuti wrote: ... > One can create persistent scratch directories e.g. in a job prolog (just > make the list of nodes unique and issue `qrsh -inherit ...` for each > node `mkdir $TMPDIR-persistent` Curly braces are optional here, as the > dash can't be a character in an envi

Experiences with 12-BETA3

2018-11-11 Thread Mark Dixon
Hi, I thought I'd share some difficulties I've had upgrading from 11 to 12. * Java 1.8 seems unstable - I've been doing a bit of Scala dev today, and the JVM seems somewhat unstable. SBT is crashing with the following fairly regularly. I'm pretty sure I've rebuilt it since the upgrade. # # A fat

Re: FreeBSD 12.0-BETA2 Now Available

2018-10-29 Thread Mark Dixon
On Mon, 29 Oct 2018 at 13:24, Lars Engels wrote: > On Mon, Oct 29, 2018 at 07:59:32AM +0000, Mark Dixon wrote: > > Same, FreeBSD update doesn't seem to be working: > > > > $ sudo freebsd-update upgrade -r 12.0-BETA2 > > IIRC freebsd-update only works for RCs, not

Re: FreeBSD 12.0-BETA2 Now Available

2018-10-29 Thread Mark Dixon
Same, FreeBSD update doesn't seem to be working: $ sudo freebsd-update upgrade -r 12.0-BETA2 Mark ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-uns

Re: [gridengine users] cpu usage calculation

2018-09-05 Thread Mark Dixon
On Fri, 31 Aug 2018, Daniel Povey wrote: ... This gets back to the issue of who is going to maintain GridEngine. Dave Love briefly resurfaced (enough to dissuade me from forming a group to maintain it, we were going to make this its home https://github.com/son-of-gridengine/sge) but seems to have

Re: [lustre-discuss] Jira? no access / disappeared?

2018-06-22 Thread Mark Dixon
Does that mean the public git repository (and anything else?) has also moved? $ git clone git://git.hpdd.intel.com/fs/lustre-release.git Cloning into 'lustre-release'... fatal: remote error: access denied or repository not exported: /fs/lustre-release.git Tried replacing hpdd.intel.com w

Re: [gridengine users] Scheduling maintenance and using advance reservation

2018-06-07 Thread Mark Dixon
projects from queues' configuration. Ilya. On Wed, Jun 6, 2018 at 2:41 AM, Mark Dixon wrote: On Tue, 5 Jun 2018, Ilya M wrote: ... Is there a way to submit AR when there are projects attached to queues? I am using SGE 6.2u5. ... Hi Ilya, I've run into this, too: I'm afraid that

Re: [gridengine users] Scheduling maintenance and using advance reservation

2018-06-06 Thread Mark Dixon
On Tue, 5 Jun 2018, Ilya M wrote: ... Is there a way to submit AR when there are projects attached to queues? I am using SGE 6.2u5. ... Hi Ilya, I've run into this, too: I'm afraid that there isn't. I logged it here: https://arc.liv.ac.uk/trac/SGE/ticket/1466 I started to fix it but ran out

Re: [gridengine users] Possible opportunity for development work

2018-05-14 Thread Mark Dixon
Hi Daniel, Well done on wanting to work on gridengine, it's really good to see people interested. Although the topmost layers have clearly suffered from years of applying patches on top of patches on top of patches and so are in sore need of a bit of refactoring, there are some really nice b

Re: [gridengine users] Jobs sitting in queue despite suitable slots and resources available

2018-04-19 Thread Mark Dixon
On Tue, 17 Apr 2018, Joshua Baker-LePain wrote: As an alternative to fixing our current setup, I'd be most interested to hear if/how other folks are handling GPUs in their SoGE setups. I was considering changing the slot count in gpu.q to match the number of GPUs in a host (rather than CPU core

Re: [gridengine users] Corrupt user config?

2018-04-16 Thread Mark Dixon
On Mon, 16 Apr 2018, William Hay wrote: ... I don't think that can be right given that the qmaster complains about multiple user files on start up. If it gave up after the first then presumably it wouldn't complain about the others. All I know is that, when we had this sort of problem, most o

Re: [gridengine users] Corrupt user config?

2018-04-16 Thread Mark Dixon
Hi William, I've seen this before back in the SGE 6.2u5 days when it used to write out core binding options it couldn't subsequently read back in. IIRC, users are read from disk at startup in turn and then the files are only written to from then on - so this sort of thing only tends to be no

Re: [gridengine users] Problems with quotas

2018-03-23 Thread Mark Dixon
Hi Jakub, That's right: if you need to cut down the logging, one option is to add the redirection in the start script. You're looking for the line starting "sge_qmaster", and you might want to try adding a ">/dev/null" after it. You'll lose all syslog messages from sge_qmaster though (normal

Re: [gridengine users] Problems with quotas

2018-03-22 Thread Mark Dixon
Hi, It's this bit that's doing it: "SGE_ND=true". It's there so that the qmaster doesn't daemonise, in order to play nicely with systemd. Unfortunately, as it was originally put in to aid debugging, it also enables some debug messages. If too much is being generated, I'd suggest either redir

Re: Chrome and noto

2018-03-07 Thread Mark Dixon
d of > noto>0:x11-fonts/noto > as it does now, and rebuilt it (which should work [tm]). > > > > mfg Tobias > >> On 6 March 2018 at 09:13, Mark Dixon wrote: >> Hi all, >> >> Chrome seems to have added a dependency on x11-fonts/noto which seems

Chrome and noto

2018-03-06 Thread Mark Dixon
Hi all, Chrome seems to have added a dependency on x11-fonts/noto which seems to be conflicting with KDE's x11-fonts/noto-lite. Is there anything we can do about this? Not sure if depending on the full 800Mb noto is a great move on their part. Mark

Re: [OMPI users] Failed to register memory (openmpi 2.0.2)

2017-11-13 Thread Mark Dixon
t] could not find endpoint with port: 1, lid: 69, msg_type: 100 On Thu, 19 Oct 2017, Mark Dixon wrote: Thanks Ralph, will do. Cheers, Mark On Wed, 18 Oct 2017, r...@open-mpi.org wrote: Put “oob=tcp” in your default MCA param file On Oct 18, 2017, at 9:00 AM, Mark Dixon wrote: Hi, We

Re: [gmx-users] Gromacs 2016.4 - the Intel compiler and 'make check'

2017-10-26 Thread Mark Dixon
lation are probably also going to affect which compiler implementation works fastest for you. Mark On Thu, 26 Oct 2017 17:10 Mark Dixon wrote: Hi Mark, Many thanks for the reply. Am I going against the flow by using the Intel compiler with GROMACS? I've been using it so far because of -

Re: [gmx-users] Gromacs 2016.4 - the Intel compiler and 'make check'

2017-10-26 Thread Mark Dixon
d the fact that gcc passes fine suggests you should be confident in the code and icc. Mark On Thu, 26 Oct 2017 13:56 Mark Dixon wrote: Hi there, Is there a recommended compiler for GROMACS, please? I'm trying to validate my install on a CentOS 7.4 Intel Broadwell system by running the

[gmx-users] Gromacs 2016.4 - the Intel compiler and 'make check'

2017-10-26 Thread Mark Dixon
Hi there, Is there a recommended compiler for GROMACS, please? I'm trying to validate my install on a CentOS 7.4 Intel Broadwell system by running the tests shipped in the GROMACS source tar ball (and the separate regression tests). If I use GCC (4.8.5 or 7.2.0), everything passes but, if I

Re: [OMPI users] Failed to register memory (openmpi 2.0.2)

2017-10-19 Thread Mark Dixon
Thanks Ralph, will do. Cheers, Mark On Wed, 18 Oct 2017, r...@open-mpi.org wrote: Put “oob=tcp” in your default MCA param file On Oct 18, 2017, at 9:00 AM, Mark Dixon wrote: Hi, We're intermittently seeing messages (below) about failing to register memory with openmpi 2.0.2 on ce

[OMPI users] Failed to register memory (openmpi 2.0.2)

2017-10-18 Thread Mark Dixon
Hi, We're intermittently seeing messages (below) about failing to register memory with openmpi 2.0.2 on centos7 / Mellanox FDR Connect-X 3 and the vanilla IB stack as shipped by centos. We're not using any mlx4_core module tweaks at the moment. On earlier machines we used to set registered m

Re: Baloo performance

2017-08-18 Thread Mark Dixon
ly include > "simple" subfolders of ~, but it fails when trying to index the whole ~. > > > mfg Tobias > > >> On 11 August 2017 at 11:10, Mark Dixon wrote: >> I've stopped it now, would have to set it going again. CPU is not the issue >

Re: Baloo performance

2017-08-12 Thread Mark Dixon
retically :D > > mfg Tobias > >> On 10 August 2017 at 22:12, Mark Dixon wrote: >> Hi all, >> >> I'm having problems with Baloo again. I'm not sure exactly what the >> requirements are for it, but I must be doing something wrong. It essentially

Baloo performance

2017-08-11 Thread Mark Dixon
Hi all, I'm having problems with Baloo again. I'm not sure exactly what the requirements are for it, but I must be doing something wrong. It essentially renders the system unusable. I finally caught it in this state - CPU is minimal, and memory isn't too bad (for this box) - baloo_file was using

11.1-RC2 won't mount zfs

2017-07-08 Thread Mark Dixon
I did a freebsd-update to RC2 yesterday evening. On rebooting with the new kernel, my zfs mounts were not present. In single user mode, the zfs command was core dumping. The fix was to copy /boot/kernel.old to /boot/kernel. The backtrace is not super useful, I think: (lldb) bt * thread #1, name = 'zfs',

Re: [SGE-discuss] Qmaster unresponsive, process status "disk sleep"

2017-06-29 Thread Mark Dixon
On Tue, 27 Jun 2017, juanesteban.jime...@mdc-berlin.de wrote: Never mind. One of my users submitted a job with 139k subjobs. ... Hi, I don't think I have all the messages from this thread for some reason. No doubt I'm going to repeat things someone else has suggested - apologies in advance

Re: kwin_x11 crashes immediately after starting kde session

2017-06-05 Thread Mark Dixon
> > When starting a KDE session - and wow, it starts very fast - I see > background picture and panel. > > But a few seconds later kwin crashes, restarts, crashes, restarts and > so on, and then I get the offer to change to openbox because of these > crashes. > > Yes, I get exactly the same here, also

KWin (built from github) keeps crashing

2017-06-04 Thread Mark Dixon
Hi, I'm playing with plasma5 from github, but kwin crashes pretty much immediately. I managed to produce this backtrace: [mark@markspc /usr/home/mark]$ lldb /usr/local/bin/kwin_x11 -c kwin_x11.core (lldb) target create "/usr/local/bin/kwin_x11" --core "kwin_x11.core" Core file '/usr/home/mark/kwi

Re: The state of KDE on FreeBSD / root vulnerability...

2017-05-21 Thread Mark Dixon
ould that affect area51? They still want an > up-to-date KDE... > > Cheers, > DMK > tcberner at freebsd.org > On May 20, 2017 7:04 AM, Mark Dixon wrote: > > I suspect area51 svn isn't coming back. PC-BSD is a defunct project, > > they've all moved on t
