[ceph-users] Re: Troubleshooting cephadm - not deploying any daemons

2022-06-09 Thread Redouane Kachach Elhichou
To see what cephadm is doing you can check two things: */var/log/ceph/cephadm.log*
(which shows what the cephadm binary running on each host is doing), and the logs
of the mgr container, which show what the cephadm mgr module is doing:

> podman logs -f `podman ps | grep mgr. | awk '{print $1}'`

Normally this second command shows what cephadm is trying to do. To get more
debug output from cephadm you can raise the log level:

> cephadm shell
(and from the shell)
> ceph config set mgr mgr/cephadm/log_to_cluster_level info
> ceph log last 100 debug cephadm (to dump the last 100 messages)

You can activate the debug level as well but it will print a lot of
messages.
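
For example, a minimal sketch of the debug-level variant, as documented for
recent releases (remember to lower the level again afterwards):

> ceph config set mgr mgr/cephadm/log_to_cluster_level debug
> ceph -W cephadm --watch-debug
(and when you are done)
> ceph config set mgr mgr/cephadm/log_to_cluster_level info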

BTW: you can find this info on:
https://docs.ceph.com/en/quincy/cephadm/operations/





On Wed, Jun 8, 2022 at 11:47 PM Zach Heise (SSCC) 
wrote:

> Yes - running tail on /var/log/ceph/cephadm.log on ceph01, then running
> 'ceph orch apply mgr "ceph01,ceph03"' (my active manager is on ceph03
> and I don't want to clobber it while troubleshooting)
>
> the log output on ceph01's cephadm.log is merely the following lines,
> over and over again, 6 times in a row, then a minute passes, then
> another 6 copies of the following text, and repeat forever. There is
> nothing listed in it about attempting the deployment of a new daemon.
>
> cephadm ['gather-facts']
> 2022-06-08 16:36:42,275 7f7c1ef9fb80 DEBUG /bin/podman: 3.2.3
> 2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinux
> status: enabled
> 2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinuxfs
> mount:/sys/fs/selinux
> 2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinux root
> directory: /etc/selinux
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Loaded policy
> name: targeted
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Current
> mode:   enforcing
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Mode from config
> file:  enforcing
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Policy MLS
> status:  enabled
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Policy deny_unknown
> status: allowed
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Memory protection
> checking: actual (secure)
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Max kernel policy
> version:  31
>
>
>
> On 2022-06-08 4:30 PM, Eugen Block wrote:
> > Have you checked /var/log/ceph/cephadm.log on the target nodes?
> >
> > Zitat von "Zach Heise (SSCC)" :
> >
> >>  Yes, sorry - I tried both 'ceph orch apply mgr "ceph01,ceph03"' and
> >> 'ceph orch apply mds "ceph04,ceph05"' before writing this initial
> >> email - once again, the same logged message: "6/8/22 2:25:12
> >> PM[INF]Saving service mgr spec with placement ceph03;ceph01" but
> >> there's no messages logged about attempting to create the mgr daemon.
> >>
> >> I tried this at the same time that I tried ''ceph orch apply mgr
> >> --placement=2' that I mentioned in my original email.
> >>
> >> I think what I need is some advice on how to check cephadm's status -
> >> I assume it should be logging every time it tries to deploy a new
> >> daemon right? That should be my next stop, I think - looking at that
> >> log to see if it's even trying. I just don't know how to get to that
> >> point.
> >>
> >> And it's not just mgr daemons, it's any kind of daemon so far, is not
> >> getting deployed.
> >>
> >> But thank you for the advice, Dhairya.
> >> -Zach
> >>
> >> On 2022-06-08 3:44 PM, Dhairya Parmar wrote:
> >>> Hi Zach,
> >>>
> >>> Try running `ceph orch apply mgr 2` or `ceph orch apply mgr
> >>> --placement=" "`. Refer to this
> >>> <https://docs.ceph.com/en/latest/cephadm/services/#orchestrator-cli-placement-spec>
> >>> doc for more information, hope it helps.
> >>>
> >>> Regards,
> >>> Dhairya
> >>>
> >>> On Thu, Jun 9, 2022 at 1:59 AM Zach Heise (SSCC)
> >>>  wrote:
> >>>
> >>>Our 16.2.7 cluster was deployed using cephadm from the start, but
> >>>now it
> >>>seems like deploying daemons with it is broken. Running 'ceph orch
> >>>apply
> >>>mgr --placement=2' causes '6/8/22 2:34:18 PM[INF]Saving service
> >>>mgr spec
> >>>with placement count:2' to appear in the logs, but a 2nd mgr does
> >>> not
> >>>get created.
> >>>
> >>>I also confirmed the same with mds daemons - using the dashboard, I
> >>>tried creating a new set of MDS daemons "220606" count:3, but they
> >>>never
> >>>got deployed. The service type appears in the dashboard, though,
> >>> just
> >>>with no daemons deployed under it. Then I tried to delete it with
> >>> the
> >>>dashboard, and now 'ceph orch ls' outputs:
> >>>
> >>>NAME   PORTSRUNNING  REFRESHED AGE
> >>>PLACEMENT
> >>>mds.220606  0/3  15h
> >>>count:3
> >>>
> >>>More detail in YAML format doesn't even give m

[ceph-users] Error adding lua packages to rgw

2022-06-09 Thread Koldo Aingeru
Hello,

I’m having trouble adding new packages to rgw via radosgw-admin :

# radosgw-admin script-package add --package=luajson
ERROR: failed to add lua package:  luajson .error: -10

# radosgw-admin script-package add --package=luasocket --allow-compilation
ERROR: failed to add lua package:  luasocket .error: -10

I’m on Quincy deployed with cephadm / orch :

# ceph version
ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)

In the previous version I didn’t have this problem adding packages for lua 
scripting.

Thanks!


Koldo Aingeru Marcos Fdez.  
Ingeniería de Sistemas
Sarenet S.A.U.
944209470
Parque Tecnológico, Edificio 103
48170 Zamudio, Bizkaia
www.sarenet.es








___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw multisite sync - how to fix data behind shards?

2022-06-09 Thread Wyll Ingersoll
I think you mean "radosgw-admin sync error list", in which case there are 32 
shards, each with the same error.  I don't see errors on the master zone logs so 
I'm not sure how to correct the situation.


"shard_id": 31,
"entries": [
{
"id": "1_1654722349.230688_62850.1",
"section": "data",
"name": 
"zone-1:a6ed5947-0ceb-407b-812f-347fab2ef62d.677322760.1:6",
"timestamp": "2022-06-08T21:05:49.230688Z",
"info": {
"source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
"error_code": 125,
"message": "failed to sync bucket instance: (125) Operation 
canceled"
}
}
]
}





From: Amit Ghadge 
Sent: Wednesday, June 8, 2022 9:16 PM
To: Wyll Ingersoll 
Subject: Re: radosgw multisite sync - how to fix data behind shards?

check any error by running command radosgw-admin data sync error list


-AmitG


On Wed, Jun 8, 2022 at 2:44 PM Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:

Seeking help from a radosgw expert...

I have a 3-zone multisite configuration (all running pacific 16.2.9) with 1 
bucket per zone and a couple of small objects in each bucket for testing 
purposes.
One of the secondary zones cannot seem to get into sync with the master; 
sync status reports:


  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: a6ed5947-0ceb-407b-812f-347fab2ef62d (zone-1)
syncing
full sync: 128/128 shards
full sync: 66 buckets to sync
incremental sync: 0/128 shards
data is behind on 128 shards
behind shards: 
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]


I have tried using "data sync init" and restarting the radosgw multiple times, 
but that does not seem to be helping in any way.

If I manually do "radosgw-admin data sync run --bucket bucket-1" - it just 
hangs forever and doesn't appear to do anything.  Checking the sync status 
never shows any improvement in the shards.

It is very hard to figure out what to do as there are several sync commands - 
bucket sync, data sync, metadata sync - and it is not clear what effect they 
have or how to properly run them when the syncing gets confused.
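
For reference, the status commands for each level, as a rough sketch (the 
bucket name is a placeholder):

radosgw-admin sync status
radosgw-admin metadata sync status
radosgw-admin bucket sync status --bucket=bucket-1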

Any guidance on how to get out of this situation would be greatly appreciated.  
I've read lots of threads on various mailing list archives (via google search) 
and very few of them have any sort of resolution or recommendation that is 
confirmed to have fixed these sort of problems.


___
Dev mailing list -- d...@ceph.io
To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OpenStack Swift on top of CephFS

2022-06-09 Thread David Orman
I agree with this: just because you can doesn't mean you should. It will
likely be significantly less painful to upgrade the infrastructure to
support doing this the more-correct way, vs. trying to layer swift on top
of cephfs. I say this having a lot of personal experience with Swift at
extremely large scales.

On Thu, Jun 9, 2022 at 7:34 AM Etienne Menguy 
wrote:

> > but why not CephFS?
> You don't want to offer distributed storage on top of distributed storage.
> You can't compare rgw and OpenStack Swift: Swift also takes care of data
> storage (the OpenStack Swift proxy is "similar" to rgw).
>
> For sure you could use 'tricks' like a single replica on swift or ceph,
> but don't expect great performance. It sounds like a terrible idea.
> Also, it's probably easier to update your infrastructure rather than
> deploy/learn/maintain openstack swift.
>
> Étienne
>
> > -Original Message-
> > From: Kees Meijs | Nefos 
> > Sent: jeudi 9 juin 2022 13:43
> > To: Etienne Menguy 
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] OpenStack Swift on top of CephFS
> >
> > Hi,
> >
> > Well, there's a Ceph implementation in production already with a lot of
> > storage. Local storage is small and limited.
> >
> > Customers ask for Swift in addition to the OpenStack environment, so it
> > makes sense to combine both with regard to Swift.
> >
> > Obviously it's best to use Keystone integration with Ceph RGW and integrate
> > on that level, but neither the Ceph nor the OpenStack implementation is new
> > enough to do that.
> >
> > So, I was wondering if someone tried to use CephFS as a backend for
> Swift.
> > An alternative would be RBD with a filesystem on top but why not CephFS?
> >
> > Regards,
> > Kees
> >
> > On 09-06-2022 13:23, Etienne Menguy wrote:
> > > Hi,
> > >
> > > You should probably explain your need, why do you want to use cephfs
> > rather than local storage?
> > >
> > > Étienne
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Troubleshooting cephadm - not deploying any daemons

2022-06-09 Thread Eugen Block
Can you share more details about the cluster, like 'ceph -s' and 'ceph  
orch ls'. Have you tried a MGR failover just to see if that clears  
anything? Also the active mgr log should contain at least some  
information. How did you deploy the current services when  
bootstrapping the cluster? Has anything changed regarding  
security/firewall or anything like that?
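
If you want to try the failover, a minimal sketch (with no argument it should 
fail whichever mgr is currently active):

ceph mgr fail
ceph -s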


Zitat von "Zach Heise (SSCC)" :

Yes - running tail on /var/log/ceph/cephadm.log on ceph01, then  
running 'ceph orch apply mgr "ceph01,ceph03"' (my active manager is  
on ceph03 and I don't want to clobber it while troubleshooting)


the log output on ceph01's cephadm.log is merely the following  
lines, over and over again, 6 times in a row, then a minute passes,  
then another 6 copies of the following text, and repeat forever.  
There is nothing listed in it about attempting the deployment of a  
new daemon.


cephadm ['gather-facts']
2022-06-08 16:36:42,275 7f7c1ef9fb80 DEBUG /bin/podman: 3.2.3
2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinux  
status: enabled
2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinuxfs  
mount:    /sys/fs/selinux
2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinux root  
directory: /etc/selinux
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Loaded policy  
name: targeted
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Current  
mode:   enforcing
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Mode from  
config file:  enforcing
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Policy MLS  
status:  enabled
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Policy  
deny_unknown status: allowed
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Memory  
protection checking: actual (secure)
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Max kernel  
policy version:  31




On 2022-06-08 4:30 PM, Eugen Block wrote:

Have you checked /var/log/ceph/cephadm.log on the target nodes?

Zitat von "Zach Heise (SSCC)" :

 Yes, sorry - I tried both 'ceph orch apply mgr "ceph01,ceph03"'  
and 'ceph orch apply mds "ceph04,ceph05"' before writing this  
initial email - once again, the same logged message: "6/8/22  
2:25:12 PM[INF]Saving service mgr spec with placement  
ceph03;ceph01" but there's no messages logged about attempting to  
create the mgr daemon.


I tried this at the same time that I tried ''ceph orch apply mgr  
--placement=2' that I mentioned in my original email.


I think what I need is some advice on how to check cephadm's  
status - I assume it should be logging every time it tries to  
deploy a new daemon right? That should be my next stop, I think -  
looking at that log to see if it's even trying. I just don't know  
how to get to that point.


And it's not just mgr daemons, it's any kind of daemon so far, is  
not getting deployed.


But thank you for the advice, Dhairya.
-Zach

On 2022-06-08 3:44 PM, Dhairya Parmar wrote:

Hi Zach,

Try running `ceph orch apply mgr 2` or `ceph orch apply mgr  
--placement=" "`. Refer this  
 doc for more information, hope it  
helps.


Regards,
Dhairya

On Thu, Jun 9, 2022 at 1:59 AM Zach Heise (SSCC)  
 wrote:


   Our 16.2.7 cluster was deployed using cephadm from the start, but
   now it
   seems like deploying daemons with it is broken. Running 'ceph orch
   apply
   mgr --placement=2' causes '6/8/22 2:34:18 PM[INF]Saving service
   mgr spec
   with placement count:2' to appear in the logs, but a 2nd mgr does not
   get created.

   I also confirmed the same with mds daemons - using the dashboard, I
   tried creating a new set of MDS daemons "220606" count:3, but they
   never
   got deployed. The service type appears in the dashboard, though, just
   with no daemons deployed under it. Then I tried to delete it with the
   dashboard, and now 'ceph orch ls' outputs:

   NAME   PORTS    RUNNING  REFRESHED AGE
   PLACEMENT
   mds.220606  0/3  15h
   count:3

   More detail in YAML format doesn't even give me that much information:

   ceph01> ceph orch ls --service_name=mds.220606 --format yaml
   service_type: mds
   service_id: '220606'
   service_name: mds.220606
   placement:
      count: 3
   status:
      created: '2022-06-07T03:42:57.234124Z'
      running: 0
      size: 3
   events:
   - 2022-06-07T03:42:57.301349Z service:mds.220606 [INFO] "service was
   created"

   'ceph health detail' reports HEALTH_OK but cephadm doesn't seem to be
   doing its job. I read through the Cephadm troubleshooting page on
   ceph's
   website but since the daemons I'm trying to create don't even seem to
   try to spawn containers (podman ps shows the existing containers just
   fine) I don't know where to look next for logs, to see if cephadm +
   podman are trying to create new container

[ceph-users] Re: Error adding lua packages to rgw

2022-06-09 Thread Yuval Lifshitz
Hi Koldo,
this might be related to the containerized deployment.
the error code (-10) is returned when we cannot find the "luarocks" binary.
assuming it is installed on the host (just check: "luarocks --version"), it
might not be accessible from inside the RGW container.
if this is the case, can you please open a tracker for the orchestrator [1]?
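
for example, a quick way to check from inside the container could be (a sketch, 
the daemon name is a placeholder - take it from "ceph orch ps"):

cephadm enter --name rgw.default.host1.abcdef
(and inside the container)
luarocks --version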

Yuval

[1] https://tracker.ceph.com/projects/orchestrator



On Thu, Jun 9, 2022 at 3:29 PM Koldo Aingeru 
wrote:

> Hello,
>
> I’m having trouble adding new packages to rgw via radosgw-admin :
>
> # radosgw-admin script-package add --package=luajson
> ERROR: failed to add lua package:  luajson .error: -10
>
> # radosgw-admin script-package add --package=luasocket --allow-compilation
> ERROR: failed to add lua package:  luasocket .error: -10
>
> I’m on Quincy deployed with cephadm / orch :
>
> # ceph version
> ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy
> (stable)
>
> In the previous version I didn’t have this problem adding packages for lua
> scripting.
>
> Thanks!
>
> 
> Koldo Aingeru Marcos Fdez.
> Ingeniería de Sistemas
> Sarenet S.A.U.
> 944209470
> Parque Tecnológico, Edificio 103
> 48170 Zamudio, Bizkaia
> www.sarenet.es
> 
>
>
>
>
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Luminous to Pacific Upgrade with Filestore OSDs

2022-06-09 Thread Pardhiv Karri
Awesome, thank you, Eneko!

Would you mind sharing the upgrade runbook, if you have one? I want to avoid
reinventing the wheel, as there will be some caveats while upgrading that aren't
usually covered in the official Ceph upgrade docs.

Thanks,
Pardhiv

On Thu, Jun 9, 2022 at 12:40 AM Eneko Lacunza  wrote:

> Hi Pardhiv,
>
> We have a running production Pacific cluster with some filestore OSDs (and
> other Bluestore OSDs too). This cluster was installed "some" years ago with
> Firefly... :)
>
> No issues related to filestore so far.
>
> Cheers
>
> El 8/6/22 a las 21:32, Pardhiv Karri escribió:
>
> Hi,
>
> We are planning to upgrade our current Ceph from Luminous (12.2.11) to
> Nautilus and then to Pacific. We are using Filestore for OSDs now. Is it
> okay to upgrade with filestore OSDs? We plan to migrate from filestore to
> Bluestore at a later date, as the clusters are pretty large (PBs in size), and we
> understand that any new or failed OSDs will have to be added as Bluestore
> OSDs only post-upgrade. Will that work?
>
> Thanks,
> Pardhiv
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
>
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
>
> --
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw multisite sync - how to fix data behind shards?

2022-06-09 Thread Wyll Ingersoll


I ended up giving up after trying everything I could find in the forums and 
docs, deleted the problematic zone, and then re-added it back to the zonegroup 
and re-established the group sync policy for the bucket in question.  The 
sync-status is OK now, though the error list still shows a bunch of errors from 
yesterday that I cannot figure out how to clear ("sync error trim" doesn't do 
anything as far as I can tell).

My opinion is that multisite sync policy in the current Pacific release 
(16.2.9) is still very fragile and poorly documented as far as troubleshooting 
goes.  I'd love to see clear explanations of the various data and metadata 
operations - metadata, data, bucket, bilog, datalog.  It's hard to know where 
to start when things get into a bad state and the online resources are not 
helpful enough.

Another question: if a sync policy is defined on a bucket that already has some 
objects in it, what command should be used to force a sync operation based on 
the new policy? It seems that only objects added AFTER the policy is applied 
get replicated; pre-existing ones are not replicated.



From: Wyll Ingersoll 
Sent: Thursday, June 9, 2022 9:35 AM
To: Amit Ghadge ; ceph-users@ceph.io ; 
d...@ceph.io 
Subject: [ceph-users] Re: radosgw multisite sync - how to fix data behind 
shards?

I think you mean "radosgw-admin sync error list", in which case there are 32 
shards, each with the same error.  I don't see errors on the master zone logs so 
I'm not sure how to correct the situation.


"shard_id": 31,
"entries": [
{
"id": "1_1654722349.230688_62850.1",
"section": "data",
"name": 
"zone-1:a6ed5947-0ceb-407b-812f-347fab2ef62d.677322760.1:6",
"timestamp": "2022-06-08T21:05:49.230688Z",
"info": {
"source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
"error_code": 125,
"message": "failed to sync bucket instance: (125) Operation 
canceled"
}
}
]
}





From: Amit Ghadge 
Sent: Wednesday, June 8, 2022 9:16 PM
To: Wyll Ingersoll 
Subject: Re: radosgw multisite sync - how to fix data behind shards?

check any error by running command radosgw-admin data sync error list


-AmitG


On Wed, Jun 8, 2022 at 2:44 PM Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:

Seeking help from a radosgw expert...

I have a 3-zone multisite configuration (all running pacific 16.2.9) with 1 
bucket per zone and a couple of small objects in each bucket for testing 
purposes.
One of the secondary zones cannot seem to get into sync with the master; 
sync status reports:


  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: a6ed5947-0ceb-407b-812f-347fab2ef62d (zone-1)
syncing
full sync: 128/128 shards
full sync: 66 buckets to sync
incremental sync: 0/128 shards
data is behind on 128 shards
behind shards: 
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]


I have tried using "data sync init" and restarting the radosgw multiple times, 
but that does not seem to be helping in any way.

If I manually do "radosgw-admin data sync run --bucket bucket-1" - it just 
hangs forever and doesn't appear to do anything.  Checking the sync status 
never shows any improvement in the shards.

It is very hard to figure out what to do as there are several sync commands - 
bucket sync, data sync, metadata sync - and it is not clear what effect they 
have or how to properly run them when the syncing gets confused.

Any guidance on how to get out of this situation would be greatly appreciated.  
I've read lots of threads on various mailing list archives (via google search) 
and very few of them have any sort of resolution or recommendation that is 
confirmed to have fixed these sort of problems.


___
Dev mailing list -- d...@ceph.io
To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph

[ceph-users] Ceph User + Dev Monthly June Meetup

2022-06-09 Thread Neha Ojha
Hi everyone,

This month's Ceph User + Dev Monthly meetup is on June 16, 14:00-15:00 UTC.
Please add topics to the agenda:
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes.

Hope to see you there!

Thanks,
Neha
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw multisite sync - how to fix data behind shards?

2022-06-09 Thread Wyll Ingersoll

Running "object rewrite" on a couple of the objects in the bucket seems to have 
triggered the sync and now things appear ok.
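
For the record, roughly the invocation I mean (bucket and object names here 
are placeholders):

radosgw-admin object rewrite --bucket=bucket-1 --object=my-object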


From: Szabo, Istvan (Agoda) 
Sent: Thursday, June 9, 2022 3:24 PM
To: Wyll Ingersoll 
Cc: ceph-users@ceph.io ; d...@ceph.io 
Subject: Re: [ceph-users] Re: radosgw multisite sync - how to fix data behind 
shards?

Try data sync init and restart the gateways, sometimes this helped me.

If this doesn’t help, turn the sync policy on the bucket off and back on.
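
Roughly, as a sketch (the zone name is a placeholder, and restart the radosgw
daemons however they are deployed - the orch form is just one option):

radosgw-admin data sync init --source-zone=zone-1
ceph orch restart rgw.<service-name>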

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

On 2022. Jun 9., at 20:48, Wyll Ingersoll  
wrote:



I ended up giving up after trying everything I could find in the forums and 
docs, deleted the problematic zone, and then re-added it back to the zonegroup 
and re-established the group sync policy for the bucket in question.  The 
sync-status is OK now, though the error list still shows a bunch of errors from 
yesterday that I cannot figure out how to clear ("sync error trim" doesn't do 
anything as far as I can tell).

My opinion is that multisite sync policy in the current Pacific release 
(16.2.9) is still very fragile and poorly documented as far as troubleshooting 
goes.  I'd love to see clear explanations of the various data and metadata 
operations - metadata, data, bucket, bilog, datalog.  It's hard to know where 
to start when things get into a bad state and the online resources are not 
helpful enough.

Another question: if a sync policy is defined on a bucket that already has some 
objects in it, what command should be used to force a sync operation based on 
the new policy? It seems that only objects added AFTER the policy is applied 
get replicated; pre-existing ones are not replicated.



From: Wyll Ingersoll 
Sent: Thursday, June 9, 2022 9:35 AM
To: Amit Ghadge ; ceph-users@ceph.io ; 
d...@ceph.io 
Subject: [ceph-users] Re: radosgw multisite sync - how to fix data behind 
shards?

I think you mean "radosgw-admin sync error list", in which case there are 32 
shards, each with the same error.  I don't see errors on the master zone logs so 
I'm not sure how to correct the situation.


   "shard_id": 31,
   "entries": [
   {
   "id": "1_1654722349.230688_62850.1",
   "section": "data",
   "name": 
"zone-1:a6ed5947-0ceb-407b-812f-347fab2ef62d.677322760.1:6",
   "timestamp": "2022-06-08T21:05:49.230688Z",
   "info": {
   "source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
   "error_code": 125,
   "message": "failed to sync bucket instance: (125) Operation 
canceled"
   }
   }
   ]
   }





From: Amit Ghadge 
Sent: Wednesday, June 8, 2022 9:16 PM
To: Wyll Ingersoll 
Subject: Re: radosgw multisite sync - how to fix data behind shards?

check any error by running command radosgw-admin data sync error list


-AmitG


On Wed, Jun 8, 2022 at 2:44 PM Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:

Seeking help from a radosgw expert...

I have a 3-zone multisite configuration (all running pacific 16.2.9) with 1 
bucket per zone and a couple of small objects in each bucket for testing 
purposes.
One of the secondary zones cannot seem to get into sync with the master; 
sync status reports:


 metadata sync syncing
   full sync: 0/64 shards
   incremental sync: 64/64 shards
   metadata is caught up with master
 data sync source: a6ed5947-0ceb-407b-812f-347fab2ef62d (zone-1)
   syncing
   full sync: 128/128 shards
   full sync: 66 buckets to sync
   incremental sync: 0/128 shards
   data is behind on 128 shards
   behind shards: 
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]


I have tried using "data sync init" and restarting the radosgw multiple times, 
but that does not seem to be helping in any way.

If I manually do "radosgw-admin data sync run --bucket bucket-1" - it just 
hangs forever and doesn't appear to do anything.  Checking the sync status 
never shows any improvement in the shards.

It is very hard to figure out what to do as there are several sync commands 

[ceph-users] Re: Ceph on RHEL 9

2022-06-09 Thread Robert W. Eckert
Does anyone have any pointers for installing Ceph on RHEL 9?

-Original Message-
From: Robert W. Eckert  
Sent: Saturday, May 28, 2022 8:28 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Ceph on RHEL 9

Hi- I started to update my 3 host cluster to RHEL 9, but came across a bit of a 
stumbling block.

The upgrade process uses the RHEL leapp tool, which ran through a few simple 
things to clean up and told me everything was hunky dory, but when I kicked 
off the first server, it wouldn't boot because I had a ceph filesystem 
mounted in /etc/fstab; commenting it out let the upgrade proceed.
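
For reference, a sketch of the kind of fstab entry that shouldn't block boot
when the cluster isn't reachable (monitor names, mount point and credentials
below are placeholders):

mon1,mon2,mon3:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev,nofail  0  0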

Then I went to check on the ceph client, which appears to have been uninstalled.

When I tried to install ceph,  I got:

[root@story ~]# dnf install ceph
Updating Subscription Management repositories.
Last metadata expiration check: 0:07:58 ago on Sat 28 May 2022 08:06:52 PM EDT.
Error:
Problem: package ceph-2:17.2.0-0.el8.x86_64 requires ceph-mgr = 2:17.2.0-0.el8, 
but none of the providers can be installed
  - conflicting requests
  - nothing provides libpython3.6m.so.1.0()(64bit) needed by 
ceph-mgr-2:17.2.0-0.el8.x86_64 (try to add '--skip-broken' to skip 
uninstallable packages or '--nobest' to use not only best candidate packages)

This is the content of my /etc/yum.repos.d/ceph.conf

[ceph]
name=Ceph packages for $basearch
baseurl=https://download.ceph.com/rpm-quincy/el8/$basearch
enabled=1
priority=2
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc

[ceph-noarch]
name=Ceph noarch packages
baseurl=https://download.ceph.com/rpm-quincy/el8/noarch
enabled=1
priority=2
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=https://download.ceph.com/rpm-quincy/el8/SRPMS
enabled=0
priority=2
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc
Is there anything I should change for el9 (I don't see el9 rpms out yet).

Or should I  wait before updating the other two servers?

Thanks,
Rob

___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs getting OOM-killed right after startup

2022-06-09 Thread Janne Johansson
Den tors 9 juni 2022 kl 22:31 skrev Mara Sophie Grosch :
> good catch with the way too low memory target, I wanted to configure 1
> GiB not 1 MiB. I'm aware it's low, but removed anyway for testing - it
> sadly didn't change anything.
>
> I customize the config mostly for dealing with problems I have; something in
> my setup makes the OSDs eat lots of memory in normal operation, just
> gradually increasing ...

> The problem of eating very much memory on startup manifests with some of
> the PGs only, but for those it goes up to ~50GiB.

Just a small note, the memory target settings are more or less only
for how the OSD should behave while in normal operation, so it will
control the amount of cache memory used and so on. If the OSD decides
it needs 50GB ram to do some recovery/GC/whatever-it-does at startup,
this will be on top of whatever memory target you have set, not
within. There are lots and lots of different places where OSDs can and
will allocate memory and any ceph.conf OSD setting will only affect a
few of those, mostly concerning how much ram it should try to use in
the default, easy cases. When doing recoveries or compactions or any
other operation that isn't just related to serving data to ceph
clients, it will allocate whatever it thinks is needed.
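
As an aside, since the mixup was GiB vs MiB: the target is given in bytes, so
a sketch of setting and then verifying an explicit 1 GiB (scope and value are
just an example) would be:

ceph config set osd osd_memory_target 1073741824
ceph config get osd.0 osd_memory_target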


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Error adding lua packages to rgw

2022-06-09 Thread Koldo Aingeru
Hi Yuval,

That was it, after installing it on the host I got no errors :)

Thanks a lot!

> On 9 Jun 2022, at 16:36, Yuval Lifshitz  wrote:
> 
> Hi Koldo,
> this might be related to the containerized deployment.
> the error code (-10) is returned when we cannot find the "luarocks" binary.
> assuming it is installed on the host (just check: "luarocks --version"), it 
> might not be accessible from inside the RGW container.
> if this is the case, can you please open a tracker for the orchestrator [1]?
> 
> Yuval
> 
> [1] https://tracker.ceph.com/projects/orchestrator 
> 
> 
> 
> 
> On Thu, Jun 9, 2022 at 3:29 PM Koldo Aingeru  > wrote:
> Hello,
> 
> I’m having trouble adding new packages to rgw via radosgw-admin :
> 
> # radosgw-admin script-package add --package=luajson
> ERROR: failed to add lua package:  luajson .error: -10
> 
> # radosgw-admin script-package add --package=luasocket --allow-compilation
> ERROR: failed to add lua package:  luasocket .error: -10
> 
> I’m on Quincy deployed with cephadm / orch :
> 
> # ceph version
> ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
> 
> In the previous version I didn’t have this problem adding packages for lua 
> scripting.
> 
> Thanks!
> 
> 
> Koldo Aingeru Marcos Fdez.  
> Ingeniería de Sistemas
> Sarenet S.A.U.
> 944209470
> Parque Tecnológico, Edificio 103
> 48170 Zamudio, Bizkaia
> www.sarenet.es 
> 
> 
> 
> 
> 
> 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
> 


Koldo Aingeru Marcos Fdez.  
Ingeniería de Sistemas
Sarenet S.A.U.
944209470
Parque Tecnológico, Edificio 103
48170 Zamudio, Bizkaia
www.sarenet.es








___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Generation of systemd units after nuking /etc/systemd/system

2022-06-09 Thread Flemming Frandsen
Hi, this is somewhat embarrassing, but one of my colleagues fat-fingered an
ansible rule and managed to wipe out /etc/systemd/system on all of our ceph
hosts.

The cluster is running nautilus on ubuntu 18.04, deployed with ceph-ansible.
One of our near-future tasks is to upgrade to the latest ceph and cephadm, so
I'm not looking forward to redoing the entire cluster using ceph-ansible.

Normally I'd put on the workboots and start re-installing a broken host
from scratch, but I hope there's a faster way.

Is there any way to generate the ceph-owned contents of /etc/systemd/system?

-- 
Flemming Frandsen - YAPH - http://osaa.dk - http://dren.dk/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph pool set min_write_recency_for_promote not working

2022-06-09 Thread Pardhiv Karri
Hi,

I created a new pool called "ssdimages," which is similar to another pool
called "images" (a very old one). But when I try to
set min_write_recency_for_promote to 1, it fails with permission denied. Do
you know how I can fix it?

ceph-lab # ceph osd dump | grep -E 'images|ssdimages'
pool 3 'images' replicated size 3 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 74894 flags hashpspool
min_write_recency_for_promote 1 stripe_width 0 application rbd
pool 25 'ssdimages' replicated size 3 min_size 1 crush_rule 1 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 78217 flags hashpspool
stripe_width 0 application rbd
ceph-lab #


ceph-lab # ceph osd pool set ssdimages min_write_recency_for_promote 1
Error EACCES: (13) Permission denied
ceph-lab #

Thanks,
Pardhiv
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io