Thanks for the reply. This issue seems to be VERY serious. New objects
are disappearing every day. This is a silent, creeping data loss.
I couldn't find the object with rados stat, but I am now listing all the
objects and will grep the dump to see if there is anything left.
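In case it's useful to anyone else, I'm doing roughly the following (the pool
name is just an example, ours is the default RGW data pool):

  rados -p default.rgw.buckets.data ls > rados_objects.txt
  grep '<object-name>' rados_objects.txt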
Janek
On 09/11/20
Hi Andras,
I don't have much experience with blacklisting to know what is a safe default.
For our clusters we use the auto-reconnect settings and never
blacklist any clients.
Cheers, Dan
On Tue, Nov 10, 2020 at 2:10 AM Andras Pataki
wrote:
>
> Hi Dan,
>
> That makes sense - the time between bla
Hi Eugen,
Yes, you're right: other than the OSDs, the rest don't require a cluster IP.
But in my case, I don't know what went wrong; my kernel client requires the
cluster IP for the mount to work properly.
About my setup:
The cluster was initially bootstrapped and configured with the public IP only; later I added
the cluster IP b
On Tue, Nov 10, 2020 at 10:59 AM Frank Schilder wrote:
>
> Hi Dan.
>
> > For our clusters we use the auto-reconnect settings
>
> Could you give me a hint what settings these are? Are they available in mimic?
Yes. On the mds you need:
mds session blacklist on timeout = false
mds session bl
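In full, on the MDS side that would be something like this in ceph.conf (the
second option name here is my assumption of what got cut off above):

  [mds]
  mds session blacklist on timeout = false
  mds session blacklist on evict = false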
Hi Nathan,
The kernel client should only be using the public IP of the cluster to
communicate with the OSDs.
But here it requires both IPs for the mount to work properly.
regards
Amudhan
On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish wrote:
> It sounds like your client is able to reach the mon but not th
I found some of the data in the rados ls dump. We host some WARCs from
the Internet Archive and one affected WARC still has its warc.os.cdx.gz
file intact, while the actual warc.gz is gone.
A rados stat revealed
WIDE-20110903143858-01166.warc.os.cdx.gz mtime
2019-07-14T17:48:39.00+0200, s
On Tue, 10 Nov 2020 at 11:13, Amudhan P wrote:
> Hi Nathan,
>
> Kernel client should be using only the public IP of the cluster to
> communicate with OSD's.
>
"ip of the cluster" is a bit weird way to state it.
A mounting client needs only to talk to ips in the public range yes, but
OSDs always
Hi Dan,
one of our customers reported practically the same issue with fcntl
locks, but no negative PIDs:
0807178332093/mailboxes/Spam/rbox-Mails/dovecot.index.log (WRITE
lock held by pid 25164)
0807178336211/mailboxes/INBOX/rbox-Mails/dovecot.index.log (WRITE
lock held by pid 8143)
These
Hi Dan.
> For our clusters we use the auto-reconnect settings
Could you give me a hint what settings these are? Are they available in mimic?
Thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Dan van der Ster
Sent: 10 No
Super, thanks! Yeah, I read that an unclean reconnect might lead to data loss
and a proper mount/unmount is better. So far, any evicted client was rebooting,
so the reconnect works fine for us with blacklisting. Good to know the
alternative though.
Thanks and best regards,
=
Fra
Could it be possible that you have some misconfiguration in the name
resolution and IP mapping? I've never heard or experienced that a
client requires a cluster address; that would make the whole concept
of separate networks obsolete, which is hard to believe, to be honest.
I would recommend
Hi
We have a Ceph cluster running on Nautilus, recently upgraded from Mimic.
While on Mimic we noticed an issue with osdmaps not trimming, which caused
part of our cluster to crash due to osdmap cache misses. We solved it by
adding "osd_map_cache_size = 5000" to our ceph.conf.
Because we had at that ti
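For what it's worth, on Nautilus the same value can also be injected via the
config database instead of editing ceph.conf (5000 is just the value we chose):

  ceph config set osd osd_map_cache_size 5000
  ceph config get osd.0 osd_map_cache_size   # verify on one OSD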
Hi All,
We have recently deployed a new Ceph cluster (Octopus 15.2.4) which consists
of:
12 OSD nodes (16 cores + 200 GB RAM, 30x14TB disks, CentOS 8)
3 mon nodes (8 cores + 15 GB RAM, CentOS 8)
We use Erasure Coded Pool and RBD block devices.
3 Ceph clients use the RBD devices, each has 25 RBDs and Eac
On Tue, Nov 10, 2020 at 1:52 PM athreyavc wrote:
>
> Hi All,
>
> We have recently deployed a new CEPH cluster Octopus 15.2.4 which consists
> of
>
> 12 OSD Nodes(16 Core + 200GB RAM, 30x14TB disks, CentOS 8)
> 3 Mon Nodes (8 Cores + 15GB, CentOS 8)
>
> We use Erasure Coded Pool and RBD block devi
Thanks for the reply.
We are not really expecting the level of performance needed for virtual
machines or databases. We want to use it as a file store where an app
writes into the mounts.
I think it is slow for file write operations and we see high IO waits.
Anything I can do to increa
Here's something else I noticed: when I stat objects that work via
radosgw-admin, the stat info contains a "begin_iter" JSON object with RADOS key
info like this
"key": {
"name":
"29/items/WIDE-20110924034843-crawl420/WIDE-20110924065228-02544.warc.g
I'm setting up a radosgw for my ceph Octopus cluster. As soon as I
started the radosgw service, I notice that it created a handful of new
pools. These pools were assigned the 'replicated_data' crush rule
automatically.
I have a mixed hdd/ssd/nvme cluster, and this 'replicated_data' crush
ru
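For context, what I had in mind is roughly a device-class-specific rule per
tier, something like this (rule names are just examples):

  ceph osd crush rule create-replicated replicated_ssd default host ssd
  ceph osd crush rule create-replicated replicated_nvme default host nvme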
I've inherited a Ceph Octopus cluster that seems like it needs urgent
maintenance before data loss begins to happen. I'm the guy with the most Ceph
experience on hand and that's not saying much. I'm experiencing most of the ops
and repair tasks for the first time here.
Ceph health output looks
Hello everybody,
we are running a multisite (active/active) gateway on 2 Ceph clusters:
one production cluster and one backup cluster.
Now we take a backup with rclone from the master and no longer need the
second gateway.
What is the best way to shut down the second gateway and remove the multisite
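From what I can tell from the docs (please correct me if I'm wrong), pulling
the secondary zone out would be roughly along these lines, run against the
master, with zone/zonegroup names as placeholders:

  radosgw-admin zonegroup remove --rgw-zonegroup=<zonegroup> --rgw-zone=<secondary-zone>
  radosgw-admin period update --commit
  radosgw-admin zone delete --rgw-zone=<secondary-zone>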
Hello all,
We're trying to debug a "slow ops" situation on our cluster running Nautilus
(latest version). Things were running smoothly for a while, but we had a few
issues that made things fall apart (possible clock skew, faulty disk...)
- We've checked NTP; everything seems fine, the whole
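So far the main things we're looking at are along these lines (osd.12 below is
just an example id):

  ceph health detail
  ceph daemon osd.12 dump_ops_in_flight
  ceph daemon osd.12 dump_historic_ops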
Hi,
We have recently deployed a Ceph cluster with
12 OSD nodes (16 cores + 200 GB RAM + 30 disks of 14 TB each) running CentOS 8
3 monitoring nodes (8 cores + 16 GB RAM) running CentOS 8
We are using Ceph Octopus and we are using RBD block devices.
We have three Ceph client nodes (16 cores + 30 GB RAM,
Hi,
I have a problem with RGW in a multisite configuration on Nautilus 14.2.11. Both
zones use SSDs and a 10 Gbps network. The master zone consists of 5x Dell R740XD
servers (each with 256 GB RAM, 8x800 GB SSD for Ceph, 24 CPUs). The secondary zone
(temporary, for testing) consists of 3x HPE DL360 Gen10 servers (e
Hi,
I have invested in 3x Samsung PM983 (MZ1LB960HAJQ-7) to run a fast pool on.
However, I am only getting 150 MB/s from them.
vfio results directly on the NVMes:
https://docs.google.com/spreadsheets/d/1LXupjEUnNdf011QNr24pkAiDBphzpz5_MwM0t9oAl54/edit?usp=sharing
Config and Results of cep
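For comparison with the raw-device numbers, this is roughly how I'm measuring
the pool itself (the pool name is a placeholder):

  rados bench -p fastpool 30 write -b 4M -t 16 --no-cleanup
  rados bench -p fastpool 30 rand -t 16
  rados -p fastpool cleanup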
Yes, you'd have to apply the patch to that kernel yourself. No RHEL7
kernels have that patch (so far). Newer RHEL8 kernels _do_ if that's an
option for you.
-- Jeff
On Mon, 2020-11-09 at 19:21 +0100, Frédéric Nass wrote:
> I feel lucky to have you on this one. ;-) Do you mean applying a
> specifi
Hi All,
I'm exploring deploying Ceph at my organization for use as an object storage
system (using the S3 RGW interface).
My users have a range of file sizes, and I'd like to direct small files to a pool
that uses replication and large files to a pool that uses erasure coding.
Is that possible?
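What I had in mind, if I understand the docs correctly, is something like an
extra storage class backed by an EC pool that clients select per object via the
x-amz-storage-class header (all names below are placeholders):

  radosgw-admin zonegroup placement add --rgw-zonegroup=default \
      --placement-id=default-placement --storage-class=LARGE_EC
  radosgw-admin zone placement add --rgw-zone=default \
      --placement-id=default-placement --storage-class=LARGE_EC \
      --data-pool=default.rgw.buckets.data.ec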
We are having the exact same problem (also Octopus). The object is listed by
s3cmd, but trying to download it results in a 404 error. radosgw-admin object
stat shows that the object still exists. Any further ideas how I can restore
access to this object?
Hi All,
The CephFS kernel client is influenced by the kernel page cache when we write
data to it; outgoing traffic comes in huge bursts when the OS starts flushing
the page cache. So is there a way to make the CephFS kernel client write data
to the Ceph OSDs smoothly when buffered I/O is used?
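One thing I'm experimenting with (generic kernel writeback knobs, not
CephFS-specific, and the numbers are only illustrative) is capping the dirty
page cache so writeback starts earlier and in smaller bursts:

  sysctl -w vm.dirty_background_bytes=67108864   # start background writeback at 64 MB
  sysctl -w vm.dirty_bytes=268435456             # block writers above 256 MB of dirty data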
Hi,
I have the same use-case.
Is there some alternative to Samba in order to export CephFS to the end user? I
am somewhat concerned with its potential vulnerabilities, which appear to be
quite frequent.
Specifically, I need server-side enforced permissions and possibly Kerberos
authentication an
Michael;
I run a Nautilus cluster, but all I had to do was change the rule associated
with the pool, and ceph moved the data.
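Roughly this, with pool and rule names as placeholders:

  ceph osd pool set <pool> crush_rule <new-rule>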
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com
-Original Messag
If this is a file server, why not use Samba on CephFS? That's what we do. RBD
is a very cumbersome extra layer for storing files and it eats a lot of
performance. Add an SSD to each node for the metadata and primary data pool
and use an EC pool on HDDs for data. This will be much better.
Still,
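A rough sketch of that layout (pool names, filesystem name, and the directory
path below are placeholders; pick your own EC profile):

  ceph osd pool create cephfs_data_ec 64 64 erasure
  ceph osd pool set cephfs_data_ec allow_ec_overwrites true
  ceph fs add_data_pool <fsname> cephfs_data_ec
  setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs/<dir>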
Hi all,
I am trying to run the Ceph client tools on an Odroid XU4 (armhf) with Ubuntu
20.04 and Python 3.8.5.
Unfortunately, I get the following error on every "ceph" command (even
ceph --help):
Traceback (most recent call last):
File "/usr/bin/ceph", line 1275, in
retval = main()
File "/usr/bin/
I'm building a new 4-node Proxmox/Ceph cluster to hold disk images for our
VMs (Ceph version is 15.2.5).
Each node has 6 x NVMe SSDs (4TB), and 1 x Optane drive (960GB).
CPU is AMD Rome 7442, so there should be plenty of CPU capacity to spare.
My aim is to create 4 x OSDs per NVMe SSD (to mak
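If it helps, the way I plan to carve them up is along these lines (the device
path is an example; Proxmox's own tooling may wrap this differently):

  ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1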
Yes, of course this works. For some reason I recall having trouble when
I tried this on my first ceph install. But I think in that case I
didn't change the crush tree, but instead I had changed the device
classes without changing the crush tree.
In any case, the re-crush worked fine.
--Mike
Hi Janek,
What you said sounds right - an S3 single part obj won't have an S3
multipart string as part of the prefix. S3 multipart string looks like
"2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme".
From memory, single part S3 objects that don't fit in a single rados object
are assigned a random prefix that
Hi Janne,
My OSDs have both the public IP and the cluster IP configured. The monitor node
and OSD nodes are co-located.
regards
Amudhan P
On Tue, Nov 10, 2020 at 4:45 PM Janne Johansson wrote:
>
>
> On Tue, 10 Nov 2020 at 11:13, Amudhan P wrote:
>
>> Hi Nathan,
>>
>> Kernel client should be using on
Hi Eugen,
I have only added my public IP and the relevant hostname to the hosts file.
Do you see any issue in the commands below, which I used to set the cluster IP
in the cluster?
### adding public IP for ceph cluster ###
ceph config set global cluster_network 10.100.4.0/24
ceph orch daemon reconfig mon.host1
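Is there a way to verify what the daemons actually picked up, something like:

  ceph config get osd.0 cluster_network
  ceph config get osd.0 public_network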