[slurm-users] Re: Doubt with SelectTypeParameters in slurm.conf

2025-03-28 Thread Laura Hild via slurm-users
> After running a simple “helloworld” test, I have noticed that if SelectTypeParameters=CR_Core, the system always reserves me an even number of CPUs (during “pending” time, I can see the real number I have requested, but when the job starts “running”, that number is increased to the next even number)…
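A quick way to see the rounding, assuming a node with ThreadsPerCore=2 (the job ID and counts are illustrative):

    # request an odd CPU count; CR_Core allocates whole cores, so with two
    # hardware threads per core the allocation rounds up to an even number
    sbatch -n 3 --wrap hostname
    scontrol show job 12345 | grep -o 'NumCPUs=[0-9]*'   # NumCPUs=4, not 3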

[slurm-users] Re: Using more cores/CPUs than requested with…

2025-03-26 Thread Laura Hild via slurm-users
In addition to checking under /sys/fs/cgroup like Tim said, if this is just to convince yourself that you got the CPU restriction working, you could also open `top` on the host running the job and observe that %CPU is now being held to 200.0 or lower (or if it's multiple processes per job, summing…
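For example (the username and header offset are illustrative; column 9 is %CPU in top's default batch layout):

    # sum %CPU across one user's processes on the compute node
    top -b -n 1 -u someuser | awk 'NR>7 { s += $9 } END { print s "% total" }'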

[slurm-users] Re: slurmrestd equivalent to "srun -n 10 echo HELLO"

2025-03-24 Thread Laura Hild via slurm-users
> When I run something like `srun -n 10 echo HELLO`, I get HELLO returned to my console/stdout 10x. When I submit this command as a script to the /jobs/submit route, I get success/200, but I cannot determine how to get the console output of HELLO 10x in any form. It's not in my stdout log…
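The batch job's output lands in the file named by standard_output rather than in the REST response. A hedged sketch (the endpoint version, host, paths, and exact payload fields vary by slurmrestd API version and are assumptions here):

    curl -s -X POST http://slurmctld:6820/slurm/v0.0.39/job/submit \
      -H "X-SLURM-USER-NAME: $USER" -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
      -H "Content-Type: application/json" \
      -d '{"script": "#!/bin/bash\nsrun -n 10 echo HELLO",
           "job": {"current_working_directory": "/tmp",
                   "environment": ["PATH=/usr/bin:/bin"],
                   "standard_output": "/tmp/hello-%j.out"}}'
    # the ten HELLOs then appear in /tmp/hello-<jobid>.out, not in the HTTP reply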

Re: [lustre-discuss] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker

2025-03-17 Thread Laura Hild via lustre-discuss
…Friday, 14 March 2025 23:06 To: Laura Hild Cc: lustre-discuss Subject: Re: [lustre-discuss] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker Thank you for your advice. A user named Oyvind replied on the us...@clusterlabs.org mailing list: You need the systemd drop-in fu…
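For anyone unfamiliar with the mechanism: a systemd drop-in overrides part of a unit file without editing it. A generic sketch only; the unit and directive here are illustrative, not the exact fix from the clusterlabs thread:

    # /etc/systemd/system/pacemaker.service.d/override.conf
    [Unit]
    After=network-online.target
    # then: systemctl daemon-reload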

[slurm-users] Re: Assistance with Burst Buffer Configuration in slurm

2025-03-12 Thread Laura Hild via slurm-users
Hi Manisha. Does your Slurm build/installation have burst_buffer_lua.so?
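For instance (the plugin directory varies by build and distribution):

    find /usr/lib64/slurm /usr/lib/slurm -name burst_buffer_lua.so 2>/dev/null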

Re: [lustre-discuss] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker

2025-03-05 Thread Laura Hild via lustre-discuss
I'm not sure what to say about how Pacemaker *should* behave, but I *can* say I virtually never try to (cleanly) reboot a host from which I have not already evacuated all resources, e.g. with `pcs node standby` or by putting Pacemaker in maintenance mode and unmounting/exporting everything manually…
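For example (node name illustrative):

    pcs node standby node1      # migrate resources off node1
    pcs status                  # confirm nothing is left running there
    # ...reboot node1...
    pcs node unstandby node1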

Re: [lustre-discuss] mkfs.lustre for ZFS draid

2025-02-06 Thread Laura Hild
> But how can I address them when running mkfs.lustre, if I still want to get 7 OSTs out of it? When we started using draid, we went from having six ten-disk OSTs per OSS to one sixty-disk OST per OSS. I'm unsure why one would want seven OSTs on a single pool, being that if the pool becomes…
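A sketch of the one-big-OST layout; the draid geometry, device names, fsname, and NID are all illustrative assumptions:

    # 60 children, 4 distributed spares, 6 data + 2 parity per group
    zpool create ostpool draid2:6d:60c:4s /dev/mapper/d{1..60}
    mkfs.lustre --ost --backfstype=zfs --fsname=testfs --index=0 \
                --mgsnode=10.0.0.1@tcp ostpool/ost0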

[slurm-users] Re: Issue running slurm commands as normal account but work as root.

2025-02-05 Thread Laura Hild via slurm-users
Is your regular user unable to read the slurm.conf? How is the cluster set up to get the hostname of the Slurm controller?
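Two quick checks (username and config path illustrative):

    ls -l /etc/slurm/slurm.conf           # readable by ordinary users?
    sudo -u someuser scontrol ping        # can a normal user reach slurmctld?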

Re: [lustre-discuss] 2.16.1 ptlrpcd infinite loop when machine runs out of RAM

2025-02-05 Thread Laura Hild
I want to say 2.15 added those messages (the obd_memory ones, not the spinning ptlrpcd) to every OoM. I remember seeing them when we first had 2.15 clients and looking them up. I take it you're not getting a corresponding OoM for each, though? It is typical for a host to struggle if OoM conditions…

Re: [lustre-discuss] Failed to Load ko2iblnd with Lustre Version lustre-2.15.6-1

2025-01-10 Thread Laura Hild
Hi Yasir. Did you install https://downloads.whamcloud.com/public/lustre/lustre-2.15.6/el8.10/server/RPMS/x86_64/kmod-lustre-2.15.6-1.el8.x86_64.rpm, your own build, or something else? Ordinarily the MOFED-compatible builds are in the -ib directories, and I haven't seen one since 2.15.2. If you…

[slurm-users] Re: jobs dropping

2024-11-12 Thread Laura Hild via slurm-users
Who is uid=64030? What is in the slurmctld log for the same timeframe? How does `sacct -j 1079` say the job ended?
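Concretely:

    getent passwd 64030                       # who is this uid?
    sacct -j 1079 --format=JobID,State,ExitCode,Elapsed,NodeList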

[slurm-users] Re: Why AllowAccounts not work in slurm-23.11.6

2024-10-30 Thread Laura Hild via slurm-users
…concerning AllowAccounts, "This list is also hierarchical, meaning subaccounts are included automatically." From: shaobo liu Sent: Tuesday, 29 October 2024 20:49 To: Laura Hild Cc: slurm-users Subject: Re: [slurm-users] Re: Why AllowAccounts not work

[slurm-users] loss of "unchangeable" node features

2024-10-23 Thread Laura Hild via slurm-users
Has anyone else noticed, somewhere between versions 22.05.11 and 23.11.9, losing fixed Features defined for a node in slurm.conf, and instead now just having those controlled by a NodeFeaturesPlugin like node_features/knl_generic?

[slurm-users] Re: Why AllowAccounts not work in slurm-23.11.6

2024-10-19 Thread Laura Hild via slurm-users
What do you have for `sacctmgr list account`? If "root" is your top-level (Slurm) (bank) account, AllowAccounts=root may just end up meaning any account. To have AllowAccounts limit what users can submit, you'd need to name a lower-level Slurm (bank) account that only some users have an Association…
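To inspect the hierarchy (output columns illustrative):

    sacctmgr show account withassoc format=Account,User
    sacctmgr show assoc tree format=Account,User    # indented account tree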

[slurm-users] Re: Dependency jobs

2024-10-16 Thread Laura Hild via slurm-users
> I know you can show job info and find what dependency a job is waiting for, but more after checking if there are jobs waiting on the current job to complete using the job ID. You mean something like `squeue -o%i,%E | grep SOME_JOBID`? Although I guess that won't catch a matching `singleton`…
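Spelled out (job ID illustrative; note a `singleton` dependency names no job ID, so the grep misses it):

    squeue -o '%i,%E' | grep -w 12345     # jobs whose dependency mentions 12345
    squeue --states=PD -o '%i,%j,%E'      # all pending jobs with dependencies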

[slurm-users] Re: Randomly draining nodes

2024-10-15 Thread Laura Hild via slurm-users
Your slurm.conf should be the same on all machines (is it? You don't have Prolog configured on some but not others?), but no, it is not mandatory to use a prolog. I am simply surprised that you could get a "Prolog error" without having a prolog configured, since an error in the prolog program…

[slurm-users] Re: Randomly draining nodes

2024-10-08 Thread Laura Hild via slurm-users
Apologies if I'm missing this in your post, but do you in fact have a Prolog configured in your slurm.conf?
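One way to check what the controller actually has configured:

    scontrol show config | grep -i prolog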

[slurm-users] Re: Best practices for tracking jobs started across multiple clusters for accounting purposes.

2024-08-30 Thread Laura Hild via slurm-users
Can whatever is running those sbatch commands add a --comment with a shared identifier that AccountingStoreFlags=job_comment would make available in sacct?
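A sketch (the identifier and script name are illustrative; requires AccountingStoreFlags=job_comment in slurm.conf):

    sbatch --comment=campaign-42 job.sh
    sacct --name=job.sh --format=JobID,Comment%30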

[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-02 Thread Laura Hild via slurm-users
My read is that Henrique wants to specify a job to require a variable number of CPUs on one node, so that when the job is at the front of the queue, it will run opportunistically on however many happen to be available on a single node, as long as there are at least five. I don't personally know…

[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-01 Thread Laura Hild via slurm-users
So you want the job, instead of waiting for the task to finish and then running on the whole node, to run immediately on n-1 CPUs? If there were only one CPU available in the entire cluster, would you want the job to start running immediately on one CPU instead of waiting for…

[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-01 Thread Laura Hild via slurm-users
Hi Henrique. Can you give an example of sharing being unavoidable?

[slurm-users] Re: Job Step State

2024-07-12 Thread Laura Hild via slurm-users
There's an enum job_states in slurm.h. It becomes OUT_OF_MEMORY, &c. in the job_state_string function in slurm_protocol_defs.c.

[slurm-users] Re: Node (anti?) Feature / attribute

2024-06-17 Thread Laura Hild via slurm-users
> Could you post that snippet?

    function slurm_job_submit(job_desc, part_list, submit_uid)
       if job_desc.features then
          if not string.find(job_desc.features, "el9") then
             -- append to an existing feature expression
             job_desc.features = job_desc.features .. '&centos79'
          end
       else
          -- no features requested: default to centos79
          job_desc.features = "centos79"
       end
       return slurm.SUCCESS
    end

[slurm-users] Re: Node (anti?) Feature / attribute

2024-06-14 Thread Laura Hild via slurm-users
I wrote a job_submit.lua also. It would append "&centos79" to the feature string unless the features already contained "el9," or if empty, set the features string to "centos79" without the ampersand. I didn't hear from any users doing anything fancy enough with their feature string for the ampersand…

[slurm-users] Re: Jobs showing running but not running

2024-05-29 Thread Laura Hild via slurm-users
> sudo systemctl restart slurmd # gets stuck Are you able to restart other services on this host? Anything weird in its dmesg?

[slurm-users] Re: srun weirdness

2024-05-15 Thread Laura Hild via slurm-users
PropagateResourceLimitsExcept won't do it? From: Dj Merrill via slurm-users Sent: Wednesday, 15 May 2024 09:43 To: slurm-users@lists.schedmd.com Subject: [EXTERNAL] [slurm-users] Re: srun weirdness Thank you Hemann and Tom! That was it. The new cluster ha…
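For reference, the slurm.conf form (the limit names are illustrative):

    PropagateResourceLimitsExcept=MEMLOCK,NOFILE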

Re: [lustre-discuss] lustre 2.15.3

2024-03-29 Thread Laura Hild via lustre-discuss
…update, Lustre updates may be wrapped up in that also, if you aren't effectively rolling your own custom releases. From: Khoi Mai Sent: Friday, 29 March 2024 11:31 To: Laura Hild Cc: lustre-discuss@lists.lustre.org Subject: Re: [lustre-discuss] lu…

Re: [lustre-discuss] lustre 2.15.3

2024-03-29 Thread Laura Hild via lustre-discuss
Hi Khoi- You could probably back-port to 2.15.3 the specific patch or patches that give 2.15.4 support for EL8.9 kernels, while leaving out the other fixes in 2.15.4, but that raises the question of why exactly you don't want those other fixes but do want the newer kernel. -Laura

Re: [SCIENTIFIC-LINUX-USERS] XFS vs Ext4

2023-12-05 Thread Laura Hild
> No! No. No LVM! Bad admin, no biscuit! [...] *Bad* admin. Where's my squirt bottle? Yeah, I wrote "if you're a slice-and-dicer" for a reason. One big root is a fine option, but it's not the situation I was imagining, where one is concerned with shrinkability. I think having hard limits on…

Re: [SCIENTIFIC-LINUX-USERS] XFS vs Ext4

2023-12-04 Thread Laura Hild
> But shrinking a partition is something that is far less likely to need to be done in an enterprise environment (compared to me fighting with too old hardware at home), so maybe that doesn't really matter. Especially if you use LVM and leave free space in the VG. If you're a slice-and-dicer…
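e.g. (VG/LV names and size illustrative):

    vgs vg0                         # VFree shows the headroom left in the VG
    lvextend -r -L +50G vg0/home    # later, grow LV and filesystem into it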

Re: XFS vs Ext4

2023-12-04 Thread Laura Hild
I've been letting it default as long as it's been EL's default, and the last time I had trouble with XFS was EL6.2.

Re: [lustre-discuss] Lustre mds/ods Server with IB/omnipath and Ethernet clients (dual homed?)

2023-11-30 Thread Laura Hild via lustre-discuss
Hi Philipp- I don't do this a ton so I'm hazy, but do you set nids or nets when you mkfs.lustre? So then maybe you have to tunefs those in when you add more? -Laura From: lustre-discuss on behalf of Philipp Grau Sent: Wednesday, 29 November 2023 06:37 To:…

Re: [lustre-discuss] OSS on compute node

2023-10-13 Thread Laura Hild via lustre-discuss
> What is the resource consumption (memory and CPU) of a storage server? The impression I've been under is that it's less about being able to put sufficient resources in one host and more about the potential for deadlock.

Re: [lustre-discuss] File size discrepancy on lustre

2023-09-16 Thread Laura Hild via lustre-discuss
> Are you using any file mirroring (FLR, "lfs mirror extend") on the files, perhaps before the "lfs getstripe" was run? We can check with the user on Monday, but do I read https://doc.lustre.org/lustre_manual.xhtml#flr.interop correctly that lsh@qcd16p0314 /c/S/C/N/s/genprop_db3> rpm -q kmod-…

Re: [lustre-discuss] Getting started with Lustre on RHEL 8.8

2023-09-12 Thread Laura Hild via lustre-discuss
Hi Cyberxstudio- I assume you're talking about Chapter 8 of the Operations Manual. It could be updated to make it clearer that the instructions generalize, but looking them over briefly I don't think the names of those packages have changed, and the Yum commands should be processed by DNF just…

Re: [lustre-discuss] Long failover time problem during Lnet bonding

2023-09-01 Thread Laura Hild via lustre-discuss
Hello, Sunghwan- Is this an active-passive, single-nid setup, or multi-rail, and if the former, did you configure the dev_failover parameter on the ko2iblnd module? -Laura
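Module parameters like that are usually set via modprobe.d, e.g. (the value is illustrative):

    # /etc/modprobe.d/ko2iblnd.conf
    options ko2iblnd dev_failover=1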

Re: [lustre-discuss] getting without inodes

2023-08-11 Thread Laura Hild via lustre-discuss
Good morning, Carlos. Are those LDiskFS or ZFS targets, and what do the non-inode `lfs df`s look like?
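i.e. (mount point illustrative):

    lfs df -h /lustre    # bytes per MDT/OST
    lfs df -i /lustre    # inodes per MDT/OST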

Re: [lustre-discuss] 2.15 install failure

2023-08-04 Thread Laura Hild via lustre-discuss
I want to say I used to have a similar problem with...amdgpu-dkms, maybe? Where a reinstall or update operation wouldn't work right, seemingly because the cleanup for the previous version of the package would wipe out what the new one installed, so my procedure became to remove and install in two…

Re: [lustre-discuss] how does lustre handle node failure

2023-07-18 Thread Laura Hild via lustre-discuss
I'm not familiar with using FLR to tolerate OSS failures. My site does the HA-pairs-with-shared-storage method. It's sort of described in the manual at https://doc.lustre.org/lustre_manual.xhtml#configuringfailover but in more Pacemaker-specific detail at https://wiki.lustre.org/Creating_a

Re: [lustre-discuss] Lustre 2.15.1 server with ZFS nothing provides ksym

2023-07-11 Thread Laura Hild via lustre-discuss
> In the meantime, I've seen an email in the mailing list suggesting that the symbols are in fact provided by the package kmod-zfs, which is not provided by the OpenZFS repos, but that can be built manually, so I thought I'd have another crack at getting 2.15.1 working. I download the tar…

Re: [lustre-discuss] Question about Install OFED with --with-nvmf

2023-06-28 Thread Laura Hild via lustre-discuss
…quotas From: 王烁斌 Sent: Tuesday, 27 June 2023 21:03 To: Laura Hild Subject: Re: [lustre-discuss] Question about Install OFED with --with-nvmf I need kernel with _lustre, so that I can use Lustre FS

Re: [lustre-discuss] Question about Install OFED with --with-nvmf

2023-06-27 Thread Laura Hild via lustre-discuss
Hi Shuobin- If you peek inside the mlnxofedinstall script, you'll see that it specifically checks for /lustre/ kernels. Do you need to be using the _lustre kernel or would patchless suffice? -Laura

Re: [lustre-discuss] CentOS Stream 8/9 support?

2023-06-22 Thread Laura Hild via lustre-discuss
We have one, small Stream 8 cluster, which is currently running a Lustre client to which I cherry-picked a kernel compatibility patch. I could imagine the effort being considerably more for the server component. I also wonder, even if Whamcloud were to provide releases for Stream kernels, how…

Re: [lustre-discuss] Question About Mellanox-RDMA On Lustre

2023-06-06 Thread Laura Hild via lustre-discuss
Hi Shuobin- That's a lot of interfaces! How is this network connected up physically? On which node is the MGS (a8?) mounted when you're attempting the second mount? What tests have you done to verify RDMA/LNet connectivity between your nodes? -Laura
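Typical checks (NID illustrative):

    lctl list_nids                 # NIDs configured on the local node
    lctl ping 10.0.0.8@o2ib        # can we reach the MGS node over LNet?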

Re: [lustre-discuss] root squash config not working anymore

2023-06-05 Thread Laura Hild via lustre-discuss
Good morning, Jane- I want to say when my group has had this problem, the most reliable fix has been to unmount the MDT, lustre_rmmod, remount the MDT, and conf_param again (which unfortunately, as you no doubt recognize, results in a brief, but significant, interruption in availability). -Laura
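Roughly, on the MDS (device, mount point, fsname, and squash IDs are all illustrative):

    umount /mnt/mdt                               # outage begins
    lustre_rmmod
    mount -t lustre mdtpool/mdt0 /mnt/mdt
    lctl conf_param testfs.mdt.root_squash=99:99  # run where the MGS is mounted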

Re: [lustre-discuss] Lustre HA

2023-03-29 Thread Laura Hild via lustre-discuss
> due to a server failure. Or (certain kinds of) maintenance. More broadly, it gives you the flexibility to move which server is hosting a particular target with less interruption.

Re: [lustre-discuss] Lustre HA

2023-03-29 Thread Laura Hild via lustre-discuss
Nick- See chapter 11 of the Lustre Operations Manual at https://doc.lustre.org/lustre_manual.xhtml#configuringfailover The advantage is reducing the amount of time that a target, and therefore part of one's filesystem, is inaccessible due to a server failure. -Laura

Re: [lustre-discuss] From ceph to lustre. what is the right setup for high availability in a small cluster?

2023-03-27 Thread Laura Hild via lustre-discuss
Hi Arvid- > This makes me wonder what lustre even adds in this scenario, since zfs is already doing the heavy lifting of managing replication and high availability. ZFS by itself is a local filesystem which you can only mount on one host at a time; what Lustre adds is taking several ZFS filesystems…

Re: [lustre-discuss] Mounting lustre on block device

2023-03-17 Thread Laura Hild via lustre-discuss
Hi Shambhu- I think lustre-discuss might be able to help you better if you were to explain why it is that you want to mount a Lustre filesystem as a block device. Is it just to get it to show up in the output of lsblk? Would you prefer the output of findmnt? -Laura
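e.g.:

    findmnt -t lustre    # lists Lustre mounts even though there's no block device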

Re: [lustre-discuss] Node Failure in Lustre

2023-03-15 Thread Laura Hild via lustre-discuss
Hi Nick- If there is no MDS/MGS/OSS currently hosting a particular MDT/MGT/OST, then what is stored there will not be accessible. I suggest looking at https://doc.lustre.org/lustre_manual.xhtml#lustrerecovery -Laura

Re: [lustre-discuss] Configuring LustreFS Over DRBD

2023-03-15 Thread Laura Hild via lustre-discuss
Hi Shambhu- I believe neither the ldiskfs nor ZFS OSDs support an active-active configuration (except in the sense that one can have multiple targets, some of which are active on one host and others on another). It seems reasonable to me, only having used DRBD and Lustre independently of each other…

Re: [lustre-discuss] dkms error in zfs

2023-03-01 Thread Laura Hild via lustre-discuss
Hi Nick- Ask DNF? dnf whatprovides 'pkgconfig(libnl-genl-3.0) >= 3.1' I don't know enough about how build-deps are handled with DKMS to say whether it should have already installed them. -Laura

Re: [lustre-discuss] ZFS Support for Lustre

2023-02-24 Thread Laura Hild via lustre-discuss
Hi Nick- I see that you are using zfs-dkms instead of kmod-zfs. When using zfs-dkms, it is of particular relevance to your modprobe error whether DKMS actually built and installed modules into your running kernel's modules directory. Does dkms status say it has installed modules for that…

Re: [lustre-discuss] Downloading Lustre packages from github

2023-02-23 Thread Laura Hild via lustre-discuss
Hi Nick- You asked about this before (http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2023-January/018442.html) -- is that solution not working in this case? Are you looking for https://wiki.lustre.org/Compiling_Lustre#Development_Software_Installation_for_Normal_Build_Process

Re: [lustre-discuss] ZFS Support For Lustre

2023-02-15 Thread Laura Hild via lustre-discuss
Hi Nick- I too have had little success finding pre-built kmod-zfs to pair with Lustre. To provide zfs-kmod = 2.1.2, I believe you can use either the zfs-dkms package (which also provides `zfs-kmod` despite not being `kmod-zfs`), or build your own kmod-zfs as I suggested in one of my previous messages…

Re: [lustre-discuss] Question about lustre2.15.2 between server and client instal

2023-02-13 Thread Laura Hild via lustre-discuss
Hi 王烁斌- There is a caution in section 13.3 (https://doc.lustre.org/lustre_manual.xhtml#mounting_server) of the Lustre Operations Manual: > Do not do this when the client and OSS are on the same node, as memory pressure between the client and OSS can lead to deadlocks. It is unclear to me from…

Re: [lustre-discuss] Mount Lustre Filesystem on Client with non root user

2023-02-06 Thread Laura Hild via lustre-discuss
Hi Nick- > How to mount a lustre filesystem on a client without root privileges to a non-root user? I think it might help to start by giving the reason you want to do this instead of mounting the filesystem at boot. -Laura

Re: [lustre-discuss] Mounting Lustre Client through non root user

2023-02-06 Thread Laura Hild via lustre-discuss
Hi Nick- The x-systemd.automount option is documented in the man page systemd.mount. -Laura

Re: [lustre-discuss] Regarding Lustre with ZFS

2023-02-03 Thread Laura Hild via lustre-discuss
You'll need either lustre-zfs-dkms or kmod-lustre-osd-zfs and their dependencies, but as you found before, you're probably going to have to build your own kmod-zfs to satisfy the latter's. From: lustre-discuss on behalf of Nick dan via lustre-discuss Sent:…

Re: [lustre-discuss] Mounting Lustre Client through non root user

2023-02-03 Thread Laura Hild via lustre-discuss
Hi Nick- I can't say I know whether Lustre supports the `user` option, but an alternative might be to use `x-systemd.automount`. -Laura From: lustre-discuss on behalf of Nick dan via lustre-discuss Sent: Friday, 3 February 2023 01:56 To: lustre-discuss-r…
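A sketch of the fstab entry (MGS NID and paths illustrative):

    # /etc/fstab
    10.0.0.1@tcp:/lustre  /mnt/lustre  lustre  noauto,x-systemd.automount,_netdev  0 0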

Re: [lustre-discuss] Lustre with ZFS Install

2023-01-24 Thread Laura Hild via lustre-discuss
Hi, Nick- keyutils-libs is not the devel package. I think it is likely that if you run dnf search keyutils you will find the package you need to install. -Laura From: lustre-discuss on behalf of Nick dan via lustre-discuss Sent: Tuesday, 24 January 202…

Re: [lustre-discuss] ZFS rpm not getting install.

2023-01-24 Thread Laura Hild via lustre-discuss
Those dependencies are provided by the kmod-zfs package, which is not included in the same repository. It looks like the oldest kmod-zfs provided by the OpenZFS project for EL8.6 is 2.1.4, which might work, but the straightforward thing to do is probably just to build a kmod-zfs-2.1.2 yourself

Re: [lustre-discuss] User find out OST configuration

2023-01-23 Thread Laura Hild via lustre-discuss
> Additionally, how can a user find out the mapping of all available OSTs to OSSs easily? I've used variations on grep -Fe failover_nids: -e current_connection: /proc/fs/lustre/*c/*/import

Re: [lustre-discuss] lustre hangs on ls -l

2023-01-11 Thread Laura Hild via lustre-discuss
Does the unlink command work?

Re: [lustre-discuss] Regarding Lustre with RDMA

2023-01-05 Thread Laura Hild via lustre-discuss
Good morning, Nick- One does not mount the OST block device on multiple hosts simultaneously: the OST is mounted on a single OSS, and the clients connect to the OSS to request I/O (i.e. Lustre is a distributed filesystem, not a shared-disk filesystem). To make those requests using RDMA instead…
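Setting up the RDMA side is an LNet configuration matter, e.g. (interface name illustrative):

    lnetctl lnet configure
    lnetctl net add --net o2ib0 --if ib0
    lnetctl export --backup > /etc/lnet.conf    # persist for the lnet service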

Re: [lustre-discuss] problem building lustre 2.12.9

2022-09-27 Thread Laura Hild via lustre-discuss
Hi Riccardo- Did you install the mlnx-ofa_kernel-devel package? I’m also curious why, to build the client, you are attempting to rebuild the DKMS SRPM, when I would think the way to get a client built would be to install the DKMS “binary” RPM, or to rebuild the regular Lustre SRPM. -Laura

Re: [lustre-discuss] ZFS file error of MDT

2022-09-23 Thread Laura Hild via lustre-discuss
Hi Ian- It looks to me like that hardware RAID array is giving ZFS data back that is not what ZFS thinks it wrote. Since from ZFS’ perspective there is no redundancy in the pool, only what the RAID array returns, ZFS cannot reconstruct the file to its satisfaction, and rather than return data…

Re: [lustre-discuss] lustre-client not working in 8.5

2022-08-08 Thread Laura Hild via lustre-discuss
Hello, Amit- While I imagine you figured this out in the intervening month, for the benefit of anyone finding this thread later, a common reason for this error, not specific to Lustre, is that module.sig_enforce is enabled, often as a consequence of having Secure Boot enabled, and the module you…
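Quick ways to check (paths depend on the kernel build):

    mokutil --sb-state
    cat /sys/module/module/parameters/sig_enforce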

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-10 Thread Laura Hild via lustre-discuss
Hi Andrew- The non-dummy SRP module is in the kmod-srp package, which isn't included in the Lustre repository. I'm less certain than I'd like to be, as ours is a DKMS setup rather than kmod, and the last time I had an SRP setup was a couple years ago, but I suspect you may have success if you…

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-06 Thread Laura Hild via lustre-discuss
Absolutely try MOFED. The problem you're describing is extremely similar to one we were dealing with in March after we patched to 2.12.8, right down to those call traces. Went away when we switched. -Laura

Re: [lustre-discuss] Anyone know why lustre-zfs-dkms-2.12.8_6_g5457c37-1.el7.noarch.rpm won't install?

2022-05-04 Thread Laura Hild via lustre-discuss
Hi Keith- I can reproduce the failure with dkms-3.0.3 and install successfully with dkms-2.8.5. There was some discussion on this list in October (http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2021-October/017812.html) but I don't know if anyone ended up submitting a patch. -Laura

Re: [lustre-discuss] MOFED & Lustre 2.14.51 - install fails with dependency failure related to ksym/MOFED

2021-05-21 Thread Laura Hild via lustre-discuss
Hi Pinkesh- Not sure how relevant this is for server builds, but when I've built Lustre clients against MOFED, I've had to use mlnx_add_kernel_support.sh --kmp rather than use the "kver" packages, in order to avoid ksym dependency errors when installing. -Laura
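Along the lines of (the MOFED directory path and version are illustrative assumptions):

    ./mlnx_add_kernel_support.sh --kmp -m /tmp/MLNX_OFED_LINUX-5.8-rhel8.6-x86_64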

Re: [lustre-discuss] ZFS and OST Space Difference

2021-04-06 Thread Laura Hild via lustre-discuss
> I am not sure about the discrepancy of 3T. Maybe that is due to some ZFS and/or Lustre overhead? Slop space? https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html#spa-slop-shift -Laura From: lustre-discuss on behalf of M…
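The reserved fraction is 1/2^spa_slop_shift of the pool; to check the current value:

    cat /sys/module/zfs/parameters/spa_slop_shift    # default 5 => 1/32 reserved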

Re: [lustre-discuss] [EXTERNAL] Re: need to always manually add network after reboot

2021-02-24 Thread Laura Hild via lustre-discuss
> Or you can manually build lnet.conf as lnetctl seems to have occasional problems with some of the fields exported by "lnetctl export --backup" I've noticed, in particular, LNetError: 122666:0:(peer.c:372:lnet_peer_ni_del_locked()) Peer NI x.x.x.x@tcp is a gateway. Can not delete it and…