[ceph-users] Re: Problems adding a new host via orchestration.

2024-02-03 Thread Eugen Block

Hi,

I found this blog post [1] which reports the same error message. The error  
text itself seems a bit misleading; the underlying issue there appears to be  
DNS/hostname resolution. Can you check


cephadm check-host --expect-hostname <hostname>

Or is that what you already tried? It's not entirely clear how you  
checked the hostname.
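
For example (a minimal sketch, "newhost01" is just a placeholder for the  
new host's short name), run on the new host:

hostname
cephadm check-host --expect-hostname newhost01

If the two don't match exactly, check-host should complain even though  
ssh and python3 are fine.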


Regards,
Eugen

[1]  
https://blog.mousetech.com/ceph-distributed-file-system-for-the-enterprise/ceph-bogus-error-cannot-allocate-memory/


Quoting Gary Molenkamp:

Happy Friday all.  I was hoping someone could point me in the right  
direction or clarify any limitations that could be impacting an  
issue I am having.


I'm struggling to add a new set of hosts to my ceph cluster using  
cephadm and orchestration.  When trying to add a host:

    "ceph orch host add  172.31.102.41 --labels _admin"
returns:
    "Error EINVAL: Can't communicate with remote host  
`172.31.102.41`, possibly because python3 is not installed there:  
[Errno 12] Cannot allocate memory"


I've verified that the ceph ssh key works to the remote host, the host's  
name matches what `hostname` returns, python3 is installed, and  
"/usr/sbin/cephadm prepare-host" on the new hosts returns "host is ok".  
In addition, the cluster ssh key works between hosts and the existing  
hosts are able to ssh in using the ceph key.
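
Concretely, the checks on the new host were along these lines (a rough sketch):

    hostname                        # matches the name passed to "ceph orch host add"
    python3 --version
    /usr/sbin/cephadm prepare-host  # reports "host is ok"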


The existing Ceph cluster is on the Pacific release, using Docker-based  
containerization on a RockyLinux 8 base OS. The new hosts are  
RockyLinux 9 based, with cephadm installed from the Quincy  
release:

        ./cephadm add-repo --release quincy
        ./cephadm install
I did try installing cephadm from the Pacific release by changing  
the repo to el8,  but that did not work either.


Is there a limitation in mixing RL8 and RL9 container hosts under  
Pacific? Does the same limitation exist under Quincy? Is there a  
Python version dependency?
The reason for RL9 on the new hosts is to stage upgrading the OSes  
across the cluster. I did the same under Octopus when moving from  
CentOS 7 to RL8.


Thanks and I appreciate any feedback/pointers.
Gary


I've added the log trace here in case that helps (from `ceph log  
last cephadm`)




2024-02-02T14:22:32.610048+ mgr.storage01.oonvfl (mgr.441023307) 4957871 : cephadm [ERR] Can't communicate with remote host `172.31.102.41`, possibly because python3 is not installed there: [Errno 12] Cannot allocate memory

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1524, in _remote_connection
    conn, connr = self.mgr._get_connection(addr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1370, in _get_connection
    sudo=True if self.ssh_user != 'root' else False)
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 35, in __init__
    self.gateway = self._make_gateway(hostname)
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
    self._make_connection_string(hostname)
  File "/lib/python3.6/site-packages/execnet/multi.py", line 133, in makegateway
    io = gateway_io.create_io(spec, execmodel=self.execmodel)
  File "/lib/python3.6/site-packages/execnet/gateway_io.py", line 121, in create_io
    io = Popen2IOMaster(args, execmodel)
  File "/lib/python3.6/site-packages/execnet/gateway_io.py", line 21, in __init__
    self.popen = p = execmodel.PopenPiped(args)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 184, in PopenPiped
    return self.subprocess.Popen(args, stdout=PIPE, stdin=PIPE)
  File "/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/lib64/python3.6/subprocess.py", line 1295, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1528, in _remote_connection
    raise execnet.gateway_bootstrap.HostNotFound(msg)
execnet.gateway_bootstrap.HostNotFound: Can't communicate with remote host `172.31.102.41`, possibly because python3 is not installed there: [Errno 12] Cannot allocate memory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper
    return OrchResult(f(*args, **kwargs))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2709, in apply
    results.append(self._apply(spec))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2574, in _apply
    return self._add_host(cast(HostSpec, spec))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1517, in _add_host
    ip_addr = self._check_valid_addr(spec.hostname, spec.addr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1498, in _check_valid_addr
    error_ok=True, no_fsid=True)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1326, in _run_cephadm
    with self._remote

[ceph-users] Re: How can I clone data from a faulty bluestore disk?

2024-02-03 Thread Alexander E. Patrakov
Hi,

I think that the approach with exporting and importing PGs would a priori
be more successful than one based on pvmove or ddrescue. The reason is
that you don't need to export/import all the data that the failed disk
holds, only the PGs that Ceph cannot recover otherwise - and those are
likely not the same PGs that make the tools crash.

Note that after the export/import operation Ceph might still think "I
need a copy from that failed disk and not the one that you gave me";
in that case, just export a copy of the same PG from the other failed
OSD and import it elsewhere, up to the total number of copies. If even
that doesn't help, "ceph osd lost XX" would be the last (very
dangerous) words to convince Ceph that osd.XX will not be seen in the
future.
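
As an illustration only (the OSD ids, PG id and file path are made up,
and the OSDs involved must be stopped while the tool runs), the
export/import cycle looks roughly like this:

# on the host with the failed OSD:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --pgid 2.1f --op export --file /tmp/pg2.1f.export

# on a host with a healthy OSD that will receive the copy:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
    --op import --file /tmp/pg2.1f.export

# absolute last resort, as described above:
ceph osd lost 12 --yes-i-really-mean-it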

On Sat, Feb 3, 2024 at 5:35 AM Eugen Block  wrote:
>
> Hi,
>
> if the OSDs are deployed as LVs (by ceph-volume) you could try to do a
> pvmove to a healthy disk. There was a thread here a couple of weeks
> ago explaining the steps. I don’t have it at hand right now, but it
> should be easy to find.
> Of course, there’s no guarantee that this will be successful. I also
> can’t tell if Igor‘s approach is more promising.
>
> Quoting Igor Fedotov:
>
> > Hi Carl,
> >
> > you might want to use ceph-objectstore-tool to export PGs from
> > faulty OSDs and import them back to healthy ones.
> >
> > The process could be quite tricky though.
> >
> > There is also pending PR (https://github.com/ceph/ceph/pull/54991)
> > to make the tool more tolerant to disk errors.
> >
> > The patch worth trying in some cases, not a silver bullet though.
> >
> > And generally whether the recovery doable greatly depends on the
> > actual error(s).
> >
> >
> > Thanks,
> >
> > Igor
> >
> > On 02/02/2024 19:03, Carl J Taylor wrote:
> >> Hi,
> >> I have a small cluster with some faulty disks within it and I want to clone
> >> the data from the faulty disks onto new ones.
> >>
> >> The cluster is currently down and I am unable to do things like
> >> ceph-bluestore-fsck but ceph-bluestore-tool  bluefs-export does appear to
> >> be working.
> >>
> >> Any help would be appreciated
> >>
> >> Many thanks
> >> Carl
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Alexander E. Patrakov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Performance issues with writing files to Ceph via S3 API

2024-02-03 Thread Renann Prado
Hello,

I have an issue at my company where we have an underperforming Ceph
instance.
The issue that we have is that sometimes writing files to Ceph via S3 API
(our only option) takes up to 40s, which is too long for us.
We are a bit limited on what we can do to investigate why it's performing
so badly, because we have a service provider in between, so getting to the
bottom of this really is not that easy.

That being said, the way we use the S3 API (again, Ceph under the hood) is
by writing all files (multiple millions) to the root, so we don't use *any*
folder-like structure, e.g. we write */* instead of */this/that/*.

The question is:

Does anybody know whether Ceph has performance gains when you create a
folder structure vs when you don't?
Looking at Ceph's documentation I could not find such information.
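
If it helps to narrow things down, we can time a single upload directly
against the S3 endpoint, e.g. with the AWS CLI (bucket name and endpoint
below are placeholders):

time aws s3api put-object --bucket my-bucket --key latency-test \
    --body ./testfile --endpoint-url https://our-s3-endpoint.example.com

That should at least separate gateway latency from anything our own
application adds.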

Best regards,

*Renann Prado*
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Snapshot automation/scheduling for rbd?

2024-02-03 Thread Jeremy Hansen
Am I just off base here or missing something obvious?

Thanks

> On Thursday, Feb 01, 2024 at 2:13 AM, Jeremy Hansen (mailto:jer...@skidrow.la) wrote:
> Can rbd image snapshotting be scheduled like CephFS snapshots? Maybe I missed 
> it in the documentation but it looked like scheduling snapshots wasn’t a 
> feature for block images. I’m still running Pacific. We’re trying to devise a 
> sufficient backup plan for Cloudstack and other things residing in Ceph.
>
> Thanks.
> -jeremy
>
>
>


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance issues with writing files to Ceph via S3 API

2024-02-03 Thread Anthony D'Atri
The slashes don’t mean much if anything to Ceph.  Buckets are not hierarchical 
filesystems. 

You speak of millions of files.  How many millions?

How big are they?  Very small objects stress any object system.  Very large 
objects may be multipart uploads that stage to slow media or otherwise add 
overhead.  

Are you writing them to a single bucket?

How is the index pool configured?  On what media?
Same with the bucket pool.  

Which Ceph release? Sharding config?
Are you mixing in bucket list operations?

It could be that you have an older release or a cluster set up on an older 
release that doesn’t effectively auto-reshard the bucket index.  If the index 
pool is set up poorly - slow media, too few OSDs, too few PGs - that may 
contribute. 

In some circumstances pre-sharding might help. 

Do you have the ability to utilize more than one bucket? If you can limit the 
number of objects in a bucket that might help.  

If your application keeps track of object names you might try indexless 
buckets.  
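
If you can get admin access (possibly via your provider), a rough sketch of 
what I'd check, with a placeholder bucket name:

radosgw-admin bucket stats --bucket=mybucket    # object count, num_shards
radosgw-admin bucket limit check                # flags buckets over the objects-per-shard limit
radosgw-admin reshard add --bucket=mybucket --num-shards=101

The last one only if auto-resharding clearly isn't keeping up.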

> On Feb 3, 2024, at 12:57 PM, Renann Prado  wrote:
> 
> Hello,
> 
> I have an issue at my company where we have an underperforming Ceph
> instance.
> The issue that we have is that sometimes writing files to Ceph via S3 API
> (our only option) takes up to 40s, which is too long for us.
> We are a bit limited on what we can do to investigate why it's performing
> so badly, because we have a service provider in between, so getting to the
> bottom of this really is not that easy.
> 
> That being said, the way we use the S3 API (again, Ceph under the hood) is
> by writing all files (multiple millions) to the root, so we don't use *any*
> folder-like structure, e.g. we write */* instead of */this/that/*.
> 
> The question is:
> 
> Does anybody know whether Ceph has performance gains when you create a
> folder structure vs when you don't?
> Looking at Ceph's documentation I could not find such information.
> 
> Best regards,
> 
> *Renann Prado*
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How can I clone data from a faulty bluestore disk?

2024-02-03 Thread Anthony D'Atri
I’ve done the pg import dance a couple of times.  It was very slow but did work 
ultimately.  
Depending on the situation, if there is one valid copy available one can enable 
recovery by temporarily setting min_size on the pool to 1, reverting it once 
recovery completes.  

If you run with 1 all the time, that can lead to this situation. 
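
For the record that would be something like this, with a placeholder pool 
name, and only temporarily:

ceph osd pool set mypool min_size 1
# wait for recovery to finish
ceph osd pool set mypool min_size 2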

> On Feb 3, 2024, at 11:39 AM, Alexander E. Patrakov  wrote:
> 
> Hi,
> 
> I think that the approach with exporting and importing PGs would be
> a-priori more successful than the one based on pvmove or ddrescue. The
> reason is that you don't need to export/import all data that the
> failed disk holds, but only the PGs that Ceph cannot recover
> otherwise. The logic here is that these are, likely, not the same PGs
> due to which tools are crashing.
> 
> Note that after the export/import operation Ceph might still think "I
> need a copy from that failed disk and not the one that you gave me",
> in this case just export a copy of the same PG from the other failed
> OSD and import elsewhere, up to the total number of copies. If even
> that doesn't help, "ceph osd lost XX" would be the last (very
> dangerous) words to convince Ceph that osd.XX will not be seen in the
> future.
> 
>> On Sat, Feb 3, 2024 at 5:35 AM Eugen Block  wrote:
>> 
>> Hi,
>> 
>> if the OSDs are deployed as LVs (by ceph-volume) you could try to do a
>> pvmove to a healthy disk. There was a thread here a couple of weeks
>> ago explaining the steps. I don’t have it at hand right now, but it
>> should be easy to find.
>> Of course, there’s no guarantee that this will be successful. I also
>> can’t tell if Igor‘s approach is more promising.
>> 
>> Quoting Igor Fedotov:
>> 
>>> Hi Carl,
>>> 
>>> you might want to use ceph-objectstore-tool to export PGs from
>>> faulty OSDs and import them back to healthy ones.
>>> 
>>> The process could be quite tricky though.
>>> 
>>> There is also pending PR (https://github.com/ceph/ceph/pull/54991)
>>> to make the tool more tolerant to disk errors.
>>> 
>>> The patch worth trying in some cases, not a silver bullet though.
>>> 
>>> And generally whether the recovery doable greatly depends on the
>>> actual error(s).
>>> 
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> On 02/02/2024 19:03, Carl J Taylor wrote:
 Hi,
 I have a small cluster with some faulty disks within it and I want to clone
 the data from the faulty disks onto new ones.
 
 The cluster is currently down and I am unable to do things like
 ceph-bluestore-fsck but ceph-bluestore-tool  bluefs-export does appear to
 be working.
 
 Any help would be appreciated
 
 Many thanks
 Carl
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> 
> --
> Alexander E. Patrakov
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Snapshot automation/scheduling for rbd?

2024-02-03 Thread Marc
I have a script that checks on each node which VMs are active, and then the 
script makes a snapshot of their RBDs. It first issues a command to the VM 
to freeze the fs, if the VM supports it.
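
Stripped down it looks roughly like this (the pool/image naming is just an 
example, the real script derives the image names from the vm definitions):

for vm in $(virsh list --name); do
    virsh domfsfreeze "$vm" && frozen=1 || frozen=0
    rbd snap create "rbd/${vm}-disk0@backup-$(date +%F)"
    [ "$frozen" = 1 ] && virsh domfsthaw "$vm"
done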


> 
> Am I just off base here or missing something obvious?
> 
> Thanks
> 
> 
> 
> 
>   On Thursday, Feb 01, 2024 at 2:13 AM, Jeremy Hansen wrote:
> 
>   Can rbd image snapshotting be scheduled like CephFS snapshots?  Maybe
> I missed it in the documentation but it looked like scheduling snapshots
> wasn’t a feature for block images.  I’m still running Pacific. We’re trying
> to devise a sufficient backup plan for Cloudstack and other things residing
> in Ceph.
> 
>   Thanks.
>   -jeremy
> 
> 
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Snapshot automation/scheduling for rbd?

2024-02-03 Thread Jayanth Reddy
Hi,
For CloudStack with RBD, you should be able to control the snapshot placement 
using the global setting "snapshot.backup.to.secondary". Setting this to false 
places snapshots directly on Ceph instead of on secondary storage. See if 
you can then perform recurring snapshots. I know there are limitations with KVM 
and disk snapshots, but it's worth a try.
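
For reference, the setting can be changed in the UI under Global Settings or, 
roughly, via CloudMonkey:

update configuration name=snapshot.backup.to.secondary value=false

Depending on the CloudStack version, a management server restart may be needed 
before it takes effect.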

Thanks



From: Jeremy Hansen 
Sent: Saturday, February 3, 2024 11:39:19 PM
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: Snapshot automation/scheduling for rbd?

Am I just off base here or missing something obvious?

Thanks



On Thursday, Feb 01, 2024 at 2:13 AM, Jeremy Hansen (mailto:jer...@skidrow.la) wrote:
Can rbd image snapshotting be scheduled like CephFS snapshots?  Maybe I missed 
it in the documentation but it looked like scheduling snapshots wasn’t a 
feature for block images.  I’m still running Pacific. We’re trying to devise a 
sufficient backup plan for Cloudstack and other things residing in Ceph.

Thanks.
-jeremy



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD Image Returning 'Unknown Filesystem LVM2_member' On Mount - Help Please

2024-02-03 Thread duluxoz

Hi All,

All of this is using the latest version of RL and Ceph Reef

I've got an existing RBD image (with data on it - not "critical", as I've 
got a backup, but it's rather large so I was hoping to avoid the restore 
scenario).


The RBD image used to be served out via a (Ceph) iSCSI gateway, but we 
are now looking to use the plain old kernel module.


The RBD image has been mapped (via rbd map) to /dev/rbd0 on the client.

So now I'm trying a straight `mount /dev/rbd0 /mount/old_image/` as a test

What I'm getting back is `mount: /mount/old_image/: unknown filesystem 
type 'LVM2_member'.`


All my Google-fu is telling me that to solve this issue I need to 
reformat the image with a new filesystem - which would mean "losing" 
the data.


So my question is: How can I get at this data using the RBD kernel module 
(the iSCSI gateway is no longer available, so not an option), or am I 
stuck with the restore option?


Or is there something I'm missing (which would not surprise me in the 
least)?  :-)
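
One thing I haven't tried yet, based on the 'LVM2_member' hint that the 
image may hold an LVM physical volume rather than a bare filesystem, is 
scanning for LVM on the mapped device - something along these lines (the 
VG/LV names are unknown until scanned):

lsblk -f /dev/rbd0
pvscan --cache
vgs
vgchange -ay <vg_name>
lvs
mount /dev/<vg_name>/<lv_name> /mount/old_image/

Does that sound like a sane direction?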


Thanks in advance (as always, you guys and gals are really, really helpful)

Cheers


Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io