Hi Ramiro,
The "invalid MR size" error looks like you're running into a limit with your
cards when the RDMA (o2ib) LND is set up as the network comes up. There may be
adjustments or workarounds for it, possibly including setting map_on_demand=0
as an argument to the LNet module.
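For reference, one way to try that is via modprobe configuration. This is an
untested sketch: the conf file name is arbitrary, and on many builds
map_on_demand is actually a ko2iblnd module parameter rather than an lnet one,
so check 'modinfo ko2iblnd' first:
---
# untested sketch; the conf file name is arbitrary, and the parameter
# may belong to ko2iblnd rather than lnet on your build (check modinfo)
cat > /etc/modprobe.d/lustre-o2ib.conf <<'EOF'
options ko2iblnd map_on_demand=0
EOF
lustre_rmmod        # unload Lustre/LNet modules so the option is re-read
modprobe lnet
lctl network up     # bring LNet back up with the new setting
---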
And since you ar
Hi Lustre users,
I'm looking for a bit of a sanity check here before I go down this path.
I've been dealing with a communication problem over LNet that triggers under
some conditions for one of our clusters after upgrading. I thought we'd solved
it by disabling LNET multi-rail but that doesn't
Are you using ZFS for that MDT? ZFS allocates inodes dynamically, but will stop
when it runs out of space. You could expand your zpool with additional disks.
If you are using ZFS and don't have extra space you can allocate, it might also
be worth checking the ashift value on your MDT. For 512e
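A rough sketch of both checks, with pool and device names as placeholders:
---
# rough sketch; pool and device names are placeholders
zpool list mdtpool                           # how much space is left?
zpool add mdtpool mirror /dev/sdx /dev/sdy   # grow the pool with another mirror vdev
zdb -C mdtpool | grep ashift                 # what ashift are the vdevs using?
---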
From: Nehring, Shane R [ITS]
Sent: Wednesday, December 4, 2024 11:08 AM
To: Jesse Stroik; lustre-discuss@lists.lustre.org
Subject: Re: Connectivity issues after client crash
No, we're using o2iblnd primarily now; we just used socklnd until we were able
to migrate the bulk of nodes to InfiniBand. There are still a small number of
ethernet clients.
Jesse
From: Nehring, Shane R [ITS]
Sent: Tuesday, December 3, 2024 2:50 PM
To: lustre-discuss@lists.lustre.org; Jesse Stroik
Subject: Re: Connectivity issues after client crash
Hello Jesse,
What I think we may have been hitting in
Hi Shane,
I realize this is quite an old post but I think it is worth responding for
posterity and because I suspect others who upgrade may run into this issue.
I'm observing issues similar to what you describe. They started this
weekend for us on two of our servers which were upgraded to
, 2022 1:28 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] lustre 2.15 installation on Centos 8.5.2111
In addition, you might need to use the "--nodeps" option to install your
self-compiled packages, cf.
https://jira.whamcloud.com/browse/LU-15976
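For example, something along these lines, with package names as placeholders
for your self-compiled RPMs:
---
# sketch of the workaround from LU-15976; package names are placeholders
rpm -ivh --nodeps kmod-lustre-2.15.*.rpm lustre-2.15.*.rpm
---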
Cheers
Thomas
On 7/12/22 19:
Hi Fran,
I suspect the issue is a missing kmod-zfs RPM which provides those symbols. It
might be the case that it was inadvertently excluded from the whamcloud repo.
You could build your own RPMs on a CentOS system with the group 'Development
Tools' installed. I'd recommend doing it as an unpri
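Roughly like this; an untested sketch where the source RPM name is a
placeholder, and only the group install needs root:
---
# untested sketch; the source RPM name/version is a placeholder
sudo yum groupinstall 'Development Tools'
rpmbuild --rebuild zfs-*.src.rpm   # as a regular user; RPMs land in ~/rpmbuild/RPMS
---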
g project quotas?
On Mon, Jan 27, 2020 at 1:05 PM Jesse Stroik <jesse.str...@ssec.wisc.edu> wrote:
Hello,
I have an interesting situation on one of our lustre file systems where I
cannot rename files or directories across a specific directory boundary, but
stem and a new one. The clients are either 2.10 or 2.12. The file
system runs 2.12.2 on ZFS.
nodemap is not active.
Best,
Jesse Stroik
Heads up for the list - the 'lfs migrate -m1' didn't free up space on
mdt0 and seemed to cause space use on mdt0 to increase rapidly as we
migrated directories to mdt1.
This may be an issue only with older versions of lustre -- these servers
are on 2.8.0.
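If anyone wants to watch for this during a migration, per-MDT usage is
visible from any client (the mount point is a placeholder):
---
# the mount point is a placeholder
lfs df -i /mnt/lustre    # inode usage per MDT and OST
lfs df /mnt/lustre       # block usage per MDT and OST
---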
Best,
Jesse
Rick, I've run into the same issue.
I don't think this is like LU-11306. Moving a file into a directory
assigned to another MDT doesn't change its MDT assignment. Copying the
same file into that directory does. That behavior is as expected.
However, when I migrated files off of MDT1 with 'l
Ah, nevermind. It appears that this can be done if 'lfs migrate -m' is
used directly instead of the lfs_migrate script.
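For example, with the directory path as a placeholder:
---
# the directory path is a placeholder; -m 1 requests MDT0001
lfs migrate -m 1 /mnt/lustre/project/subdir
---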
Best,
Jesse
On 8/5/19 11:26 AM, Jesse Stroik wrote:
On 7/31/19 6:27 PM, Andreas Dilger wrote:
Just to clarify, when I referred to "file level backup/
ate in the background to effect the
migration so it would be transparent to the end users.
Is there a better way to migrate to the new MDT than recreating all
of the directories?
Jesse
Cheers, Andreas
On Jul 31, 2019, at 15:10, Jesse Stroik wrote:
This is excellent information, Andreas.
e file system and it
may be replaced next year so I suspect they'll opt for the DNE method.
Thanks again,
Jesse Stroik
On 7/31/19 3:11 PM, Andreas Dilger wrote:
Normally the easy answer would be to do a "dd" copy of the MDT device from
your HDDs to a larger SSD LUN, then resize2fs
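A rough sketch of that approach, with device paths as placeholders and the
MDT unmounted first:
---
# rough sketch; device paths are placeholders, and the MDT must be
# stopped (unmounted) before copying
dd if=/dev/old_mdt of=/dev/new_mdt bs=4M
e2fsck -f /dev/new_mdt    # resize2fs requires a clean filesystem check first
resize2fs /dev/new_mdt    # grow the ldiskfs to fill the larger LUN
---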
at seemed definitive about ensuring no
changes to an ldiskfs MDT during operation, and I don't want to assume I
can simply remount it read-only.
Thanks,
Jesse Stroik
running 3.1 on both a CentOS 6 server and a CentOS 7 server, both
with Lustre 2.10.2.
Best,
Jesse Stroik
IP addresses. ens224 is 192.168.1.5/24 and ens256 is 128.104.109.161/22.
Best,
Jesse Stroik
sonably reproducible, we'll
reinitialize the RAID array and reformat the vdev.
Thanks for your help, Tom!
Best,
Jesse Stroik
On 12/12/2016 03:51 PM, Crowe, Tom wrote:
Hi Jesse,
In regards to you seeing 370 objects with errors from 'zpool status', but
having over 400 files with "access
eshooting if an entire file is corrupt, or parts of the file.
After the replication, you should set the replicated VDEV to read only with
‘zfs set readonly=on destination_pool/source_ost_replicated’
Thank you for this suggestion. We'll most likely do that.
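For the archives, the whole sequence would look roughly like this untested
sketch, with the pool and dataset names following Tom's example:
---
# untested sketch; pool/dataset names follow the quoted advice
zfs snapshot source_pool/source_ost@replica
zfs send source_pool/source_ost@replica | \
    zfs recv destination_pool/source_ost_replicated
zfs set readonly=on destination_pool/source_ost_replicated
---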
Best,
Jesse Stroik
One of our lustre file systems still running lustre 2.5.3 and zfs 0.6.3
experienced corruption due to a bad RAID controller. The OST in question
was a RAID6 volume which we've marked inactive. Most of our lustre
clients are 2.8.0.
zpool status reports corruption and checksum errors. I have not r
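For reference, a sketch of one way to mark an OST inactive on the MDS, with
the fsname and OST index as placeholders:
---
# fsname and OST index are placeholders; run on the MDS
lctl dl | grep OST0001                               # find the OSC device name
lctl --device lustre-OST0001-osc-MDT0000 deactivate  # stop new allocations to it
---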
D_${fsname} - ${RESULT}"
Using an unmodified init script is fantastic for future maintainability
and the results for our monitoring purposes have been, as you'd expect,
on point.
Best,
Jesse Stroik
ponsive and it feels like a hack which we'd like to purge.
An idea we have is to modify the robinhood startup and use that to
report the status of the file systems. But before doing that, it seems
prudent to ask you if you'd recommend a different method because we want
to take the b
up an environment for srun, this has
the effect of breaking that environment.
You can avoid this behavior by adding '--export=ALL' as an argument to
srun or by exporting specific variables to it as needed.
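For example, with the variable and program names as illustrations only:
---
# variable and program names are illustrations only
export OMP_NUM_THREADS=8
srun --export=ALL ./my_app   # propagate the caller's full environment
---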
Best,
Jesse Stroik
University of Wisconsin
On 10/23/2014 2:36 PM, L. Sha
We've run into LU-5150 on one of our lustre-on-zfs file systems. This
can cause instability on our robinhood server, which we rely on for
maintaining aspects of several large file systems.
Has robinhood been tested with divergent versions between the clients
and server? We'd like to install 2.
nfigurable list
appears.
I suspect they'll allow a configurable list of TLDs going forward.
Best,
Jesse Stroik
3.3.2-4 -- is it a known issue? The URL in question is on a
single line and is easily pulled out with egrep and properly parsed with
the body rule.
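In the meantime a local rule is one workaround. This is a hypothetical sketch:
the rule name, pattern, and score are illustrative only:
---
# hypothetical sketch; rule name, pattern, and score are illustrative
cat >> /etc/mail/spamassassin/local.cf <<'EOF'
uri      LOCAL_BAD_URL /badsite\.example/i
describe LOCAL_BAD_URL Spamvertised URL seen in recent samples
score    LOCAL_BAD_URL 2.0
EOF
spamassassin --lint    # confirm the rules still parse
---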
Best,
Jesse Stroik
On 10/13/2014 2:53 PM, Dave Funk wrote:
On Mon, 13 Oct 2014, Philip Prindeville wrote:
Every connection I’ve gotten f
Yes, I did test this and can confirm it worked. Thanks.
Best,
Jesse Stroik
University of Wisconsin
On 8/28/2014 8:54 AM, Holmes, Christopher (CMU) wrote:
Andy is right. When you restart the slurmd daemon, it inherits the system
limits from your login session, which are different from the
unlimited
That can account for differences in processing between system startup
and subsequently restarting the daemons by hand.
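One quick way to compare is to inspect the limits the running daemon actually
inherited (this assumes a single slurmd process on the node):
---
# assumes a single slurmd process on the node
grep -E 'Max (locked memory|open files)' /proc/$(pgrep -xo slurmd)/limits
---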
Andy
On 08/21/2014 02:42 PM, Jesse Stroik wrote:
Slurmites,
We recently noticed sporadic performance inconsistencies on one of our
clusters. We discovered that if we
be off by that factor.
Best,
Jesse Stroik
On 8/21/2014 6:15 PM, Christopher Samuel wrote:
On 22/08/14 04:43, Jesse Stroik wrote:
We recently noticed sporadic performance inconsistencies on one of our
clusters.
What distro is this? Are you using cgroups?
cheers,
Chris
Yes, but we aren't specifying it for all of these jobs. In the config we
have:
---
TaskPlugin=task/affinity
TaskPluginParam=Sched
SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
---
And we typically suggest "--cpu_bind=core --distribution=block:block"
for srun
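e.g., with the executable name as a placeholder:
---
# the executable name is a placeholder
srun --cpu_bind=core --distribution=block:block ./my_app
---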
to set a default account
for each user-partition combination.
Best,
Jesse Stroik
University of Wisconsin
On 8/13/2014 12:43 PM, Jesse Stroik wrote:
Our cluster has two primary groups of users. The user groups each have
a different account from which we designate shares and for which we
provide
rmining the cause. We
wanted to share this experience with others in case it can help other
users, or in case any slurm developers would like us to file a bug report and
would be interested in gathering further information.
Best,
Jesse Stroik
University of Wisconsin
see a way to allow SLURM to
search the association tables for a valid account for the user/partition
combination.
Best,
Jesse Stroik
University of Wisconsin
on-working MGS/MDT which we could switch to if testing were requested.
We are running lustre 2.4.0 on the servers and have tested with 2.4.0
and 2.1.6 clients.
Best,
Jesse Stroik
University of Wisconsin
is that our observed performance was poor using the
2.6.18 RHEL 5 kernel line relative to the mainline (2.6.35) kernels.
Updating to the newer kernels was well worth the testing and downtime.
Hopefully this information can help others.
Best,
Jesse Stroik
I have been testing Solaris to Linux performance using IPoIB and the
results have been poorer than expected. I typically get about
3.5Gbit/sec aggregate between the Solaris host and 1 or more Linux hosts.
The setup follows:
- The Solaris host and the primary Linux host are conn
Erik and Richard: thanks for the information -- this is all very good stuff.
Erik Trimble wrote:
Something occurs to me: how full is your current 4 vdev pool? I'm
assuming it's not over 70% or so.
yes, by adding another 3 vdevs, any writes will be biased towards the
"empty" vdevs, but that
Bruno,
Bruno Sousa wrote:
Interesting, at least to me, the part where "this storage node is very
small (~100TB)" :)
Well, that's only as big as two x4540s, and we have lots of those for a
slightly different project.
Anyway, how are you using your ZFS? Are you creating volumes and pres
There are, of course, job types where you use the same set of data for
multiple jobs, but having even a small amount of extra memory seems to
be very helpful in that case, as you'll have several nodes reading the
same data at roughly the same time.
Yep. More, faster memory closer to the cons
Thanks for the suggestions thus far,
Erik:
In your case, where you had a 4 vdev stripe, and then added 3 vdevs, I
would recommend re-copying the existing data to make sure it now covers
all 7 vdevs.
Yes, this was my initial reaction as well, but I am concerned with the
fact that I do not k
utting the data evenly on all vdevs is
suboptimal because it is likely the case that different files within a
single domain from a single instrument may be used with 200 jobs at once.
Because this particular data is 100% static, I cannot count on
reads/writes automatically balancing the pool.
> "Interestingly, the majority of energy usage (around 80%) comes from
users viewing and deleting spam, and searching for legitimate emails
within spam filters."
Right -- if your users can't trust their 'spam' folder as spam, then
what is the point? They should keep it around so they can che
Matus,
Dropping mail outright because you can't reverse-resolve the mail server
is bad, of course. And it /will/ drop messages from legitimate mail
servers, especially those on private networks behind mail proxies as
many older Exchange installations are configured. And those
installations a
Hoover Chan wrote:
The threshold was set to 6.6 (cf. required=6.6). The message this was attached
to was very definitely junk. This kind of situation got me curious about the
whole thing where any positive spam score is set as the threshold but seeing
junk mail coming in with negative scores.
Kris Deugau wrote:
Jesse Stroik wrote:
You don't. Hit delete.
Sorry, there aren't enough of me to hand-filter 30K ISP user accounts.
I wasn't clear. I'm suggesting the user delete them. Overaggressive
spam filters that get false positives are much more dangerous
John Hardin wrote:
On Thu, 12 Feb 2009, Kris Deugau wrote:
What do you do to push that last 5% or so of missed spam over the
threshold from nonspam to spam?
Do you greylist?
Of course not. The assumption that spammers cannot follow RFCs is a
silly one. There are a variety of greylisting
Kris Deugau wrote:
What do you do to push that last 5% or so of missed spam over the
threshold from nonspam to spam?
You don't. Hit delete.
If AI is ever truly developed, then your computer may be able to more
accurately distinguish spam from nonspam, but for a lot of spam where
spamassassi
Kate,
The previous discussion of the windows live spaces spam was from
10/18/2008 and it has the subject "Windows Live Spaces spam". That
should help you search the archive.
I will look into the BOTNET plugin, as I don't believe we are using it at the
moment. Do you get many false positives with it?
I
Think twice before doing this -- just like a computer cannot interpret
the intent of a message, it cannot interpret the content of an image.
The computer is most certainly guessing, and many of the algorithms
spammers use these days to make their images unique would likely defeat it.
Karsten s
Bowie,
What does having the mail gateway on an internal network have to do with
anything? If it is going to send mail to the Internet, then it must
have a public IP address in order to do so. This address may be local
to the machine or it may be translated by a router or firewall, but
either
Kris Deugau wrote:
Jesse Stroik wrote:
There are plenty of places still using mail gateways where the mail
server used for sending is still on an internal network, for a variety
of legitimate reasons, and those mail servers may resolve to a private
address. If you discard all mail with no
igured makes spam
filtering potentially more damaging to email than spam itself.
Best,
Jesse Stroik
Mouss,
mouss wrote:
It's more than a "common user" question. While I can build a
*BSD/Debian/CentOS box to do what I want, I did buy "COTS" firewalls,
backup servers, ... etc.
You're not talking about ease of setup; you're talking about quality and
reliability of product. Spamassassin doe
Karl,
Ease of setup and use are not the primary reason for purchasing any
product, IMO.
Yes, but you aren't the common user. Many commercial products *must*
have oversimplified setups if they want the largest possible customer
base. Consider the difference between the primary goals of sp
Rob,
Spamassassin is more difficult to configure than commercial products, which
don't have the luxury of requiring much sysadmin configuration. They
have to be easy or no one would buy them. The disadvantage of their
being easier is that they have less flexibility, less information, and
less sit
Stefan,
Fantastic. This works. Thanks for pointing me in the right direction.
Best,
Jesse
Stefan Jakobs wrote:
On Friday 02 May 2008 17:24, Jesse Stroik wrote:
SA-Users,
I'm running spamassassin rules 648641 for 3.2.4 fetched by sa-update.
I've run into two issues with my cur
are
being flagged as bounces and how I can fix the whitelist_bounce_relays
issue? Email addresses have been stripped from the headers of each message.
Best,
Jesse Stroik
-
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on
mahogany.sse