Re: [lustre-discuss] Lnet not going up with InfiniHost III Lx HCA card

2025-02-07 Thread Jesse Stroik
Hi Ramiro, The invalid MR size looks like you're running into a limit with your cards when setting up the RDMA (o2ib) LND while bringing up the network. There may be adjustments or workarounds for it, possibly including setting map_on_demand=0 as an argument to the lnet module there. And since you ar

[lustre-discuss] Switching server LNDs / NIDs and updating clients to use two LNDs on a single IP

2025-01-08 Thread Jesse Stroik
Hi Lustre users, I'm looking for a bit of a sanity check here before I go down this path. I've been dealing with a communication problem over LNet that triggers under some conditions on one of our clusters after upgrading. I thought we'd solved it by disabling LNet multi-rail, but that doesn't

Re: [lustre-discuss] inodes are full for one of the MDT

2025-01-08 Thread Jesse Stroik
Are you using ZFS for that MDT? ZFS allocates inodes dynamically, but will stop when it runs out of space. You could expand your zpool with additional disks. If you are using ZFS and don't have extra space you can allocate, it might also be worth checking the ashift value on your MDT. For 512e
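ZFS's ashift is the base-2 logarithm of the smallest block a vdev allocates, so it follows from the sector size the vdev was created for. A minimal sketch of that relationship; the helper names are hypothetical and not part of any ZFS tool:

```python
def ashift_for_sector_size(sector_bytes: int) -> int:
    """Return ashift, i.e. log2 of the sector size, for a power-of-two size."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    # For a power of two, bit_length() - 1 is its exact base-2 logarithm.
    return sector_bytes.bit_length() - 1


def min_allocation_bytes(ashift: int) -> int:
    """Smallest allocation a vdev with this ashift can make."""
    return 1 << ashift


# 512-byte logical sectors vs. 4 KiB physical sectors (the 512e case).
for size in (512, 4096):
    shift = ashift_for_sector_size(size)
    print(f"{size}-byte sectors: ashift={shift}, "
          f"minimum allocation {min_allocation_bytes(shift)} bytes")
```

Because every allocation on a vdev occupies at least 2**ashift bytes, the ashift an MDT pool was created with affects how much space its many small metadata blocks consume. On an existing pool, `zdb -C <pool>` should show the per-vdev value.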

Re: [lustre-discuss] Connectivity issues after client crash

2024-12-10 Thread Jesse Stroik
ITS] Sent: Wednesday, December 4, 2024 11:08 AM To: Jesse Stroik; lustre-discuss@lists.lustre.org Subject: Re: Connectivity issues after client crash No, we're using o2iblnd primarily now; we just used socklnd until we were able to migrate the bulk of nodes to InfiniBand. There are a small number of

Re: [lustre-discuss] Connectivity issues after client crash

2024-12-03 Thread Jesse Stroik
many Ethernet clients. Jesse From: Nehring, Shane R [ITS] Sent: Tuesday, December 3, 2024 2:50 PM To: lustre-discuss@lists.lustre.org; Jesse Stroik Subject: Re: Connectivity issues after client crash Hello Jesse, What I think we may have been hitting in

Re: [lustre-discuss] Connectivity issues after client crash

2024-12-03 Thread Jesse Stroik
Hi Shane, I realize this is quite an old post, but I think it is worth responding for posterity and because I suspect others who upgrade may run into this issue. I'm observing issues similar to those you describe. They started this weekend for us on two of our servers which were upgraded to

Re: [lustre-discuss] lustre 2.15 installation on Centos 8.5.2111

2022-08-23 Thread Jesse Stroik via lustre-discuss
, 2022 1:28 PM To: lustre-discuss@lists.lustre.org Subject: Re: [lustre-discuss] lustre 2.15 installation on Centos 8.5.2111 In addition, you might need to use the "--nodeps" option to install your self-compiled packages, cf. https://jira.whamcloud.com/browse/LU-15976 Cheers Thomas On 7/12/22 19:

Re: [lustre-discuss] lustre 2.15 installation on Centos 8.5.2111

2022-07-12 Thread Jesse Stroik via lustre-discuss
Hi Fran, I suspect the issue is a missing kmod-zfs RPM, which provides those symbols. It may have been inadvertently excluded from the Whamcloud repo. You could build your own RPMs on a CentOS system with the group 'Development Tools' installed. I'd recommend doing it as an unpri

Re: [lustre-discuss] mv / rename not working across directory boundary ("Invalid cross device link")

2020-01-29 Thread Jesse Stroik
g project quotas? On Mon, Jan 27, 2020 at 1:05 PM Jesse Stroik <jesse.str...@ssec.wisc.edu> wrote: Hello, I have an interesting situation on one of our Lustre file systems where I cannot rename files or directories across a specific directory boundary but

[lustre-discuss] mv / rename not working across directory boundary ("Invalid cross device link")

2020-01-27 Thread Jesse Stroik
stem and a new one. The clients are either 2.10 or 2.12. The file system runs 2.12.2 on ZFS. Nodemap is not active. Best, Jesse Stroik

Re: [lustre-discuss] Replacing ldiskfs MDT with larger disk

2019-08-09 Thread Jesse Stroik
Heads up for the list: the 'lfs migrate -m1' didn't free up space on MDT0 and seemed to cause space use on MDT0 to increase rapidly as we migrated directories to MDT1. This may be an issue only with older versions of Lustre; these servers are on 2.8.0. Best, Jesse

Re: [lustre-discuss] Using lfs migrate to move files between MDTs

2019-08-05 Thread Jesse Stroik
Rick, I've run into the same issue. I don't think this is like LU-11306. Moving a file into a directory assigned to another MDT doesn't change its MDT assignment. Copying the same file into that directory does. That behavior is as expected. However, when I migrated files off of MDT1 with 'l

Re: [lustre-discuss] Replacing ldiskfs MDT with larger disk

2019-08-05 Thread Jesse Stroik
Ah, never mind. It appears that this can be done if 'lfs migrate -m' is used directly instead of the lfs_migrate script. Best, Jesse On 8/5/19 11:26 AM, Jesse Stroik wrote: On 7/31/19 6:27 PM, Andreas Dilger wrote: Just to clarify, when I referred to "file level backup/
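Calling `lfs migrate -m` directly takes a target MDT index and a directory. A minimal sketch that builds one such command per directory and, by default, only prints it; the helper names and paths are hypothetical, and since the options accepted by `lfs migrate` differ between Lustre releases, `lfs help migrate` on your version is the authority:

```python
import shlex
import subprocess
from typing import List


def lfs_migrate_mdt_cmd(directory: str, mdt_index: int) -> List[str]:
    """Build the argv that moves a directory's metadata to another MDT."""
    if mdt_index < 0:
        raise ValueError("MDT index must be non-negative")
    return ["lfs", "migrate", "-m", str(mdt_index), directory]


def migrate_dirs(dirs: List[str], mdt_index: int, dry_run: bool = True) -> None:
    """Print (dry run) or execute one lfs migrate per directory, in order."""
    for d in dirs:
        cmd = lfs_migrate_mdt_cmd(d, mdt_index)
        if dry_run:
            print(shlex.join(cmd))  # show what would be run
        else:
            subprocess.run(cmd, check=True)  # raises if lfs exits non-zero


# Hypothetical directories; prints the commands without running them.
migrate_dirs(["/mnt/lustre/projA", "/mnt/lustre/projB"], mdt_index=1)
```

Given the report above that MDT0 usage grew during migration on 2.8.0, it may be worth checking `lfs df -i` between batches rather than queuing every directory at once.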

Re: [lustre-discuss] Replacing ldiskfs MDT with larger disk

2019-08-05 Thread Jesse Stroik
ate in the background to effect the migration so it would be transparent to the end users. Is there a better way to migrate use to the new MDT than recreating all of the directories? Jesse Cheers, Andreas On Jul 31, 2019, at 15:10, Jesse Stroik wrote: This is excellent information, Andreas.

Re: [lustre-discuss] Replacing ldiskfs MDT with larger disk

2019-07-31 Thread Jesse Stroik
e file system and it may be replaced next year, so I suspect they'll opt for the DNE method. Thanks again, Jesse Stroik On 7/31/19 3:11 PM, Andreas Dilger wrote: Normally the easy answer would be that a "dd" copy of the MDT device from your HDDs to a larger SSD LUN, then resize2

[lustre-discuss] Replacing ldiskfs MDT with larger disk

2019-07-31 Thread Jesse Stroik
at seemed definitive about ensuring no changes to an ldiskfs MDT during operation, and I don't want to assume I can simply remount it read-only. Thanks, Jesse Stroik

[robinhood-support] Broken symlinks not evaluated by policies

2018-08-30 Thread Jesse Stroik via robinhood-support
running 3.1 on both a CentOS 6 server and a CentOS 7 server, both with Lustre 2.10.2. Best, Jesse Stroik

[lustre-discuss] lnetctl --ip2net configures first interface

2018-03-21 Thread Jesse Stroik
IP addresses. ens224 is 192.168.1.5/24 and ens256 is 128.104.109.161/22. Best, Jesse Stroik
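Given those two addresses, an address-based rule for 128.104.108.0/22 (the network that contains 128.104.109.161/22) should select ens256, not the first interface. A minimal sketch of that matching with Python's `ipaddress` module; it illustrates the expected selection only and does not reproduce how `lnetctl` parses ip2net patterns:

```python
import ipaddress
from typing import Dict, Optional

# Interface addresses from the post, in CIDR form.
INTERFACES: Dict[str, str] = {
    "ens224": "192.168.1.5/24",
    "ens256": "128.104.109.161/22",
}


def interface_for_network(target: str, interfaces: Dict[str, str]) -> Optional[str]:
    """Return the first interface whose address lies inside `target`, or None."""
    net = ipaddress.ip_network(target)
    for name, cidr in interfaces.items():
        if ipaddress.ip_interface(cidr).ip in net:
            return name
    return None


print(interface_for_network("128.104.108.0/22", INTERFACES))  # ens256
print(interface_for_network("192.168.1.0/24", INTERFACES))    # ens224
```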

Re: [lustre-discuss] LustreError on ZFS volumes

2016-12-13 Thread Jesse Stroik
sonably reproducible, we'll reinitialize the RAID array and reformat the vdev. Thanks for your help, Tom! Best, Jesse Stroik On 12/12/2016 03:51 PM, Crowe, Tom wrote: Hi Jessie, In regards to you seeing 370 objects with errors from ‘zpool status’, but having over 400 files with “access

Re: [lustre-discuss] LustreError on ZFS volumes

2016-12-12 Thread Jesse Stroik
eshooting if an entire file is corrupt, or parts of the file. After the replication, you should set the replicated VDEV to read-only with ‘zfs set readonly=on destination_pool/source_ost_replicated’ Thank you for this suggestion. We'll most likely do that. Best, Jesse Stroik

[lustre-discuss] LustreError on ZFS volumes

2016-12-12 Thread Jesse Stroik
One of our Lustre file systems, still running Lustre 2.5.3 and ZFS 0.6.3, experienced corruption due to a bad RAID controller. The OST in question was a RAID6 volume, which we've marked inactive. Most of our Lustre clients are 2.8.0. 'zpool status' reports corruption and checksum errors. I have not r

Re: [robinhood-support] monitoring robinhood changelog status

2015-07-31 Thread Jesse Stroik
D_${fsname} - ${RESULT}" Using an unmodified init script is fantastic for future maintainability, and the results for our monitoring purposes have been, as you'd expect, on point. Best, Jesse Stroik

Re: [robinhood-support] monitoring robinhood changelog status

2015-07-15 Thread Jesse Stroik
ponsive and it feels like a hack which we'd like to purge. One idea we have is to modify the Robinhood startup and use that to report the status of the file systems. But before doing that, it seems prudent to ask whether you'd recommend a different method because we want to take the b

[slurm-dev] Re: Change in behavior of "--export" option

2015-03-27 Thread Jesse Stroik
up an environment for srun, this has the effect of breaking that environment. You can avoid this behavior by adding '--export=ALL' as an argument to srun or by exporting specific variables to it as needed. Best, Jesse Stroik University of Wisconsin On 10/23/2014 2:36 PM, L. Sha

[robinhood-support] Robinhood lustre client versions

2014-12-09 Thread Jesse Stroik
We've run into LU-5150 on one of our Lustre-on-ZFS file systems. This can cause instability on our Robinhood server, which we rely on for maintaining aspects of several large file systems. Has Robinhood been tested with divergent versions between the clients and server? We'd like to install 2.

Re: .link TLD spammer haven?

2014-10-23 Thread Jesse Stroik
nfigurable list appears. I suspect they'll allow a configurable list of TLDs going forward. Best, Jesse Stroik

Re: .link TLD spammer haven?

2014-10-22 Thread Jesse Stroik
3.3.2-4 -- is it a known issue? The URL in question is on a single line and is easily pulled out with egrep and properly parsed with the body rule. Best, Jesse Stroik On 10/13/2014 2:53 PM, Dave Funk wrote: On Mon, 13 Oct 2014, Philip Prindeville wrote: Every connection I’ve gotten f

[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-28 Thread Jesse Stroik
Yes, I did test this and can confirm it worked. Thanks. Best, Jesse Stroik University of Wisconsin On 8/28/2014 8:54 AM, Holmes, Christopher (CMU) wrote: Andy is right. When you restart the slurmd daemon, it inherits the system limits from your login session, which are different from the

[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-25 Thread Jesse Stroik
unlimited That can account for differences in processing between system startup and subsequently restarting the daemons by hand. Andy On 08/21/2014 02:42 PM, Jesse Stroik wrote: Slurmites, We recently noticed sporadic performance inconsistencies on one of our clusters. We discovered that if we
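As explained in the thread, a daemon inherits the resource limits of whatever started it, so slurmd started at boot and slurmd restarted from a login shell can run with different limits. A minimal sketch for comparing them with Python's `resource` module, which reports the calling process's soft and hard limits; the choice of limits below is an assumption, since the snippet does not say which one mattered for Intel MPI:

```python
import resource
from typing import Dict, Tuple

# Assumed selection of limits worth comparing for MPI jobs.
LIMITS = {
    "memlock": resource.RLIMIT_MEMLOCK,
    "stack": resource.RLIMIT_STACK,
    "nofile": resource.RLIMIT_NOFILE,
}


def fmt(value: int) -> str:
    """Render a limit the way `ulimit` does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> Dict[str, Tuple[str, str]]:
    """Soft and hard limits of the current process."""
    out = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        out[name] = (fmt(soft), fmt(hard))
    return out


def proc_limits(pid: int) -> str:
    """Raw limits of another process (Linux only), e.g. a running slurmd."""
    with open(f"/proc/{pid}/limits") as f:
        return f.read()


for name, (soft, hard) in current_limits().items():
    print(f"{name:8} soft={soft} hard={hard}")
```

Running it from a login shell and comparing the output with `proc_limits(<slurmd pid>)` would show whether the daemon is running with different limits than an interactive session.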

[slurm-dev] Bug in displaying nodes for pending jobs with multiple CPUs per task

2014-08-25 Thread Jesse Stroik
be off by that factor. Best, Jesse Stroik

[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-22 Thread Jesse Stroik
esse On 8/21/2014 6:15 PM, Christopher Samuel wrote: On 22/08/14 04:43, Jesse Stroik wrote: We recently noticed sporadic performance inconsistencies on one of our clusters. What distro is this? Are you using cgroups? cheers, Chris

[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-21 Thread Jesse Stroik
Yes, but we aren't specifying it for all of these jobs. In the config we have:
---
TaskPlugin=task/affinity
TaskPluginParam=Sched
SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
---
And we typically suggest "--cpu_bind=core --distribution=block:block" for srun

[slurm-dev] Re: Account / partition association on heterogeneous clusters

2014-08-21 Thread Jesse Stroik
to set a default account for each user-partition combination. Best, Jesse Stroik University of Wisconsin On 8/13/2014 12:43 PM, Jesse Stroik wrote: Our cluster has two primary groups of users. The users groups each have a different account from which we designate shares and for which we provide

[slurm-dev] Intel MPI Performance inconsistency (and workaround)

2014-08-21 Thread Jesse Stroik
rmining the cause. We wanted to share this experience in case it can help other users, or in case any Slurm developers would like us to file a bug report and gather further information. Best, Jesse Stroik University of Wisconsin

[slurm-dev] Account / partition association on heterogeneous clusters

2014-08-13 Thread Jesse Stroik
see a way to allow SLURM to search the association tables for a valid account for the user/partition combination. Best, Jesse Stroik University of Wisconsin

[Lustre-discuss] Unusable ZFS backup MDT/MGS after change

2014-05-21 Thread Jesse Stroik
on-working MGS/MDT which we could switch to if testing were requested. We are running Lustre 2.4.0 on the servers and have tested with 2.4.0 and 2.1.6 clients. Best, Jesse Stroik University of Wisconsin

Re: [Gluster-users] gluster client performance

2011-08-09 Thread Jesse Stroik
is that our observed performance was poor using the 2.6.18 RHEL 5 kernel line relative to the mainline (2.6.35) kernels. Updating to the newer kernels was well worth the testing and downtime. Hopefully this information can help others. Best, Jesse Stroik

[networking-discuss] IBoIP performance between Solaris and Linux

2009-12-04 Thread Jesse Stroik
I have been testing Solaris-to-Linux performance using IBoIP, and the results have been poorer than expected. I typically get about 3.5 Gbit/s aggregate between the Solaris host and one or more Linux hosts. The setup is as follows: - The Solaris host and the primary Linux host are conn

Re: [zfs-discuss] Data balance across vdevs

2009-11-23 Thread Jesse Stroik
Erik and Richard: thanks for the information -- this is all very good stuff. Erik Trimble wrote: Something occurs to me: how full is your current 4-vdev pool? I'm assuming it's not over 70% or so. Yes, by adding another 3 vdevs, any writes will be biased towards the "empty" vdevs, but that
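Erik's point is that new writes favor the emptier vdevs, while data already written stays where it is. A deliberately simplified model of that bias, assuming each vdev receives new writes in proportion to its free space; the real ZFS allocator weighs more than this, and the 10 TB vdev sizes and 70% fill level are made-up inputs:

```python
from typing import List


def write_share(capacity: List[float], used: List[float]) -> List[float]:
    """Fraction of new writes each vdev would get if allocation were
    proportional to free space (a simplification of ZFS's allocator)."""
    free = [c - u for c, u in zip(capacity, used)]
    total_free = sum(free)
    if total_free <= 0:
        raise ValueError("pool has no free space")
    return [f / total_free for f in free]


# Hypothetical pool: 4 original 10 TB vdevs at 70% full plus 3 new empty ones.
capacity = [10.0] * 7
used = [7.0] * 4 + [0.0] * 3
shares = write_share(capacity, used)
for i, s in enumerate(shares):
    print(f"vdev{i}: {s:.1%} of new writes")
```

In this model the three new vdevs hold 30 of the 42 TB of free space, so they take about 71% of new writes, while the static data on the original four vdevs is never redistributed.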

Re: [zfs-discuss] Data balance across vdevs

2009-11-20 Thread Jesse Stroik
Bruno, Bruno Sousa wrote: Interesting, at least to me, the part where "this storage node is very small (~100TB)" :) Well, that's only as big as two x4540s, and we have lots of those for a slightly different project. Anyway, how are you using your ZFS? Are you creating volumes and pres

Re: [zfs-discuss] Data balance across vdevs

2009-11-20 Thread Jesse Stroik
There are, of course, job types where you use the same set of data for multiple jobs, but having even a small amount of extra memory seems to be very helpful in that case, as you'll have several nodes reading the same data at roughly the same time. Yep. More, faster memory closer to the cons

Re: [zfs-discuss] Data balance across vdevs

2009-11-20 Thread Jesse Stroik
Thanks for the suggestions thus far, Erik: In your case, where you had a 4-vdev stripe, and then added 3 vdevs, I would recommend re-copying the existing data to make sure it now covers all 7 vdevs. Yes, this was my initial reaction as well, but I am concerned with the fact that I do not k

[zfs-discuss] Data balance across vdevs

2009-11-20 Thread Jesse Stroik
utting the data evenly on all vdevs is suboptimal because it is likely that different files within a single domain from a single instrument will be used by 200 jobs at once. Because this particular data is 100% static, I cannot count on reads/writes automatically balancing the pool. B

Re: spam and carbon emissions

2009-04-16 Thread Jesse Stroik
> "Interestingly, the majority of energy usage (around 80%) comes from users viewing and deleting spam, and searching for legitimate emails within spam filters." Right -- if your users can't trust their 'spam' folder as spam, then what is the point? They should keep it around so they can che

Re: Spam Rats - does anyone know them?

2009-04-08 Thread Jesse Stroik
Matus, Dropping mail outright because you can't reverse-resolve the mail server is bad, of course. And it /will/ drop messages from legitimate mail servers, especially those on private networks behind mail proxies, as many older Exchange installations are configured. And those installations a
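Rather than dropping mail when the connecting server has no reverse DNS, a filter can treat the missing PTR record as one weighted signal among many. A minimal sketch; the score value and function names are hypothetical, and the resolver is injectable (defaulting to `socket.gethostbyaddr`) so the logic can be exercised without network access:

```python
import socket
from typing import Callable, Optional, Tuple

Resolver = Callable[[str], Tuple[str, list, list]]

NO_RDNS_SCORE = 1.0  # hypothetical weight; tune locally


def reverse_name(ip: str, resolver: Resolver = socket.gethostbyaddr) -> Optional[str]:
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def rdns_score(ip: str, resolver: Resolver = socket.gethostbyaddr) -> float:
    """A modest score penalty for a missing PTR record, instead of a rejection."""
    return NO_RDNS_SCORE if reverse_name(ip, resolver) is None else 0.0


# Offline stand-ins for the DNS lookup, for illustration only.
def fake_found(ip: str):
    return ("mail.example.org", [], [ip])


def fake_missing(ip: str):
    raise socket.herror(1, "Unknown host")


print(rdns_score("192.0.2.25", fake_found))    # 0.0
print(rdns_score("192.0.2.25", fake_missing))  # 1.0
```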

Re: negative scores for spam

2009-03-20 Thread Jesse Stroik
Hoover Chan wrote: The threshold was set to 6.6 (cf. required=6.6). The message this was attached to was very definitely junk. This kind of situation got me curious about the whole thing where any positive spam score is set as the threshold but seeing junk mail coming in with negative scores.
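SpamAssassin's verdict is the sum of the scores of every rule that hit, compared against `required_score` (the `required=6.6` in the header), and a total at or above the threshold is marked as spam. Negative-scoring rules pull the total down, which is how clearly junk mail can end up with a negative score. A minimal sketch of that arithmetic; the rule names and scores are hypothetical:

```python
from typing import Dict

REQUIRED_SCORE = 6.6  # the required=6.6 from the post


def total_score(hits: Dict[str, float]) -> float:
    """Sum the scores of every rule that matched (rounded for display)."""
    return round(sum(hits.values()), 3)


def is_spam(hits: Dict[str, float], required: float = REQUIRED_SCORE) -> bool:
    return total_score(hits) >= required


# Hypothetical hits: two spam signs partly cancelled by negative rules.
hits = {
    "SPAM_SIGN_A": 3.5,
    "SPAM_SIGN_B": 2.0,
    "HAM_SIGN_BAYES": -2.6,
    "HAM_SIGN_DKIM": -0.1,
}
print(total_score(hits), is_spam(hits))  # 2.8 False
```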

Re: Last-5-percent tuning

2009-02-12 Thread Jesse Stroik
Kris Deugau wrote: Jesse Stroik wrote: You don't. Hit delete. Sorry, there aren't enough of me to hand-filter 30K ISP user accounts. I wasn't clear. I'm suggesting the user delete them. Overaggressive spam filters that get false positives are much more dangerous

Re: Last-5-percent tuning

2009-02-12 Thread Jesse Stroik
John Hardin wrote: On Thu, 12 Feb 2009, Kris Deugau wrote: What do you do to push that last 5% or so of missed spam over the threshold from nonspam to spam? Do you greylist? Of course not. The assumption that spammers cannot follow RFCs is a silly one. There are a variety of greylisting
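Greylisting, raised in the question above, temporarily rejects (SMTP 4xx) the first delivery attempt from an unseen (client IP, envelope sender, envelope recipient) triplet and accepts a retry once a delay has passed, relying on legitimate MTAs to retry. A minimal in-memory sketch; the class name and 300-second delay are assumptions, and a real implementation would also expire stale entries:

```python
import time
from typing import Dict, Optional, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


class Greylist:
    def __init__(self, delay_s: float = 300.0):
        self.delay_s = delay_s
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, rcpt: str,
              now: Optional[float] = None) -> str:
        """Return 'tempfail' (SMTP 4xx) or 'accept' for this attempt."""
        now = time.time() if now is None else now
        seen = self.first_seen.setdefault((ip, sender, rcpt), now)
        return "accept" if now - seen >= self.delay_s else "tempfail"


gl = Greylist()
args = ("192.0.2.10", "a@example.org", "b@example.net")
print(gl.check(*args, now=0))    # tempfail: first attempt
print(gl.check(*args, now=120))  # tempfail: retried too soon
print(gl.check(*args, now=400))  # accept: retried after the delay
```

Any sender that retries after the delay gets through, which is the post's point: greylisting stops only senders that do not follow the RFC retry behavior.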

Re: Last-5-percent tuning

2009-02-12 Thread Jesse Stroik
Kris Deugau wrote: What do you do to push that last 5% or so of missed spam over the threshold from nonspam to spam? You don't. Hit delete. If AI is ever truly developed, then your computer may be able to more accurately distinguish spam from nonspam, but for a lot of spam where spamassassi

Re: night of pleasure spam

2008-12-01 Thread Jesse Stroik
Kate, The previous discussion of the Windows Live Spaces spam was from 10/18/2008, and it has the subject "Windows Live Spaces spam". That should help you search the archive. I will look into BOTNET, as I don't believe we are using it at the moment. Do you get many FPs with this? I

Re: Detecting Porn photos

2008-12-01 Thread Jesse Stroik
Think twice before doing this -- just as a computer cannot interpret the intent of a message, it cannot interpret the content of an image. The computer is most certainly guessing, and many of the algorithms spammers use these days to make their images unique would likely defeat it. Karsten s

Re: New free blacklist: BRBL - Barracuda Reputation Block List

2008-09-23 Thread Jesse Stroik
Bowie, What does having the mail gateway on an internal network have to do with anything? If it is going to send mail to the Internet, then it must have a public IP address in order to do so. This address may be local to the machine or it may be translated by a router or firewall, but either
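The point here is that whatever path a message takes internally, the hop that reaches the Internet has a public address, and that is the address a reputation or reverse-DNS check should examine. A minimal sketch with Python's `ipaddress` module; the relay list is assumed to be already parsed from Received headers (parsing not shown) and ordered from the originating server outward, and 8.8.8.8 stands in for an arbitrary public address:

```python
import ipaddress
from typing import Iterable, Optional


def first_public_hop(relay_ips: Iterable[str]) -> Optional[str]:
    """Return the first globally routable address in the relay path, if any."""
    for ip in relay_ips:
        if ipaddress.ip_address(ip).is_global:
            return ip
    return None


# Internal mail server -> internal gateway -> address seen on the Internet.
path = ["10.0.0.5", "192.168.1.20", "8.8.8.8"]
print(first_public_hop(path))  # 8.8.8.8
```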

Re: New free blacklist: BRBL - Barracuda Reputation Block List

2008-09-23 Thread Jesse Stroik
Kris Deugau wrote: Jesse Stroik wrote: There are plenty of places still using mail gateways where the mail server used for sending is still on an internal network, for a variety of legitimate reasons, and those mail servers may resolve to a private address. If you discard all mail with no

Re: New free blacklist: BRBL - Barracuda Reputation Block List

2008-09-23 Thread Jesse Stroik
igured makes spam filtering potentially more damaging to email than spam itself. Best, Jesse Stroik

Re: MagicSpam

2008-09-12 Thread Jesse Stroik
Mouss, mouss wrote: It's more than a "common user" question. While I can build an *BSD/Debian/CentOS box to do what I want, I did buy "COTS" firewalls, backup servers, etc. You're not talking about ease of setup; you're talking about quality and reliability of product. SpamAssassin doe

Re: MagicSpam

2008-09-12 Thread Jesse Stroik
Karl, Ease of setup and use are not the primary reason for purchasing any product, IMO. Yes, but you aren't the common user. Many commercial products *must* have oversimplified setups if they want the largest possible customer base. Consider the difference between the primary goals of sp

Re: MagicSpam

2008-09-11 Thread Jesse Stroik
Rob, SpamAssassin is more difficult to configure because commercial products don't have the luxury of requiring more sysadmin configuration. They have to be easy or no one would buy them. The disadvantage of them being easier is that they have less flexibility, less information and less sit

Re: vbounce false positive on CommuniGate group message

2008-05-02 Thread Jesse Stroik
Stefan, Fantastic. This works. Thanks for pointing me in the right direction. Best, Jesse Stefan Jakobs wrote: On Friday 02 May 2008 17:24, Jesse Stroik wrote: SA-Users, I'm running SpamAssassin rules 648641 for 3.2.4 fetched by sa-update. I've run into two issues with my cur

vbounce false positive on CommuniGate group message

2008-05-02 Thread Jesse Stroik
are being flagged as bounces and how I can fix the whitelist_bounce_relays issue? Email addresses have been stripped from the headers of each message. Best, Jesse Stroik - Return-Path: X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on mahogany.sse