[ceph-users] Re: how to "undelete" a pool

2020-09-25 Thread Dan van der Ster
Hi Peter,

I'm not a rook expert, but are you asking how to remove the rook
action to delete a pool? Or is the pool already deleted from ceph
itself?

We "bare" ceph operators have multiple locks to avoid fat fingers like:
   ceph osd pool set cephfs_data nodelete 1
   ceph config set mon mon_allow_pool_delete false  # the default
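To double-check afterwards that the locks are actually in place, something
like the following should work (the pool name is just an example):
   ceph osd pool get cephfs_data nodelete
   ceph config get mon mon_allow_pool_delete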

-- Dan


On Fri, Sep 25, 2020 at 4:49 AM Peter Sarossy  wrote:
>
> Hit send too early...
>
> So I did find in the code that it's looking for the deletion timestamp, but
> deleting this field in the CRD does not stop the deletion request either.
> The deletionTimestamp reappears after committing the change.
> https://github.com/rook/rook/blob/23108cc94afdebc8f4ab144130a270b1e4ffd94e/pkg/operator/ceph/pool/controller.go#L193
>
> On Thu, Sep 24, 2020 at 10:40 PM Peter Sarossy 
> wrote:
>
> > hey folks,
> >
> > I have managed to fat finger a config apply command and accidentally
> > deleted the CRD for one of my pools. The operator went ahead and tried to
> > purge it, but fortunately since it's used by CephFS it was unable to.
> >
> > Redeploying the exact same CRD does not make the operator stop trying to
> > delete it though.
> >
> > Any hints on how to make the operator forget about the deletion request
> > and leave it be?
> >
> > --
> > Cheers,
> > Peter Sarossy
> > Technical Program Manager
> > Data Center Data Security - Google LLC.
> >
>
>
> --
> Cheers,
> Peter Sarossy
> Technical Program Manager
> Data Center Data Security - Google LLC.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issues with the ceph-bluestore-tool during cluster upgrade from Mimic to Nautilus

2020-09-25 Thread Igor Fedotov

Hi Saber,

I don't think this is related. The new assertion happens along the write
path, while the original one occurred during allocator shutdown.



Unfortunately there is not much information to troubleshoot this...
Are you able to reproduce the issue?



Thanks,

Igor

On 9/25/2020 4:21 AM, sa...@planethoster.info wrote:

Hi Igor,

We had an OSD crash a week after moving to Nautilus. I have attached the
logs; is it related to the same bug?





Thanks,
Saber
CTO @PlanetHoster

On Sep 14, 2020, at 10:22 AM, Igor Fedotov wrote:


Thanks!

Now got the root cause. The fix is on its way...

Meanwhile you might want to work around the issue by setting
"bluestore_hybrid_alloc_mem_cap" to 0 or by using a different allocator,
e.g. avl for bluestore_allocator (and optionally for bluefs_allocator
too).
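
Since ceph-bluestore-tool runs offline, the override can also be passed on
the command line, roughly like this (the OSD path below is just an example):

   CEPH_ARGS="--bluestore-allocator=avl --bluefs-allocator=avl" \
       ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0

For the running OSDs the equivalent would be something like the following
(the daemons need a restart before a new allocator takes effect):

   ceph config set osd bluestore_allocator avl
   ceph config set osd bluestore_hybrid_alloc_mem_cap 0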



Hope this helps,

Igor.



On 9/14/2020 5:02 PM, Jean-Philippe Méthot wrote:

Alright, here’s the full log file.





Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423






On Sep 14, 2020, at 06:49, Igor Fedotov wrote:


Well, I can see a duplicate admin socket command
registration/de-registration (and the second de-registration
asserts), but I don't understand how this could happen.


Would you share the full log, please?


Thanks,

Igor

On 9/11/2020 7:26 PM, Jean-Philippe Méthot wrote:

Here’s the out file, as requested.




Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423






On Sep 11, 2020, at 10:38, Igor Fedotov wrote:


Could you please run:

CEPH_ARGS="--log-file log --debug-asok 5" ceph-bluestore-tool 
repair --path <...> ; cat log | grep asok > out


and share 'out' file.


Thanks,

Igor

On 9/11/2020 5:15 PM, Jean-Philippe Méthot wrote:

Hi,

We’re upgrading our cluster from Mimic to Nautilus, one OSD node at a
time. The release notes recommended running the following command to fix
stats after the upgrade:


ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0

However, running that command gives us the following error message:

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In function 'virtual Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 2020-09-10 14:40:25.872353
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)
 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f1a5a823025]
 2: (()+0x25c1ed) [0x7f1a5a8231ed]
 3: (()+0x3c7a4f) [0x55b33537ca4f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
 8: (main()+0x10b3) [0x55b335187493]
 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 10: (()+0x1f9b5f) [0x55b3351aeb5f]
2020-09-10 14:40:25.873 7f1a6467eec0 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In function 'virtual Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 2020-09-10 14:40:25.872353
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)
 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f1a5a823025]
 2: (()+0x25c1ed) [0x7f1a5a8231ed]
 3: (()+0x3c7a4f) [0x55b33537ca4f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
 8: (main()+0

[ceph-users] Re: how to "undelete" a pool

2020-09-25 Thread Stefan Kooman
On 2020-09-25 04:40, Peter Sarossy wrote:
> hey folks,
> 
> I have managed to fat finger a config apply command and accidentally
> deleted the CRD for one of my pools. The operator went ahead and tried to
> purge it, but fortunately since it's used by CephFS it was unable to.
> 
> Redeploying the exact same CRD does not make the operator stop trying to
> delete it though.
> 
> Any hints on how to make the operator forget about the deletion request and
> leave it be?

No, sorry, I'm not using Rook / k8s for Ceph. You might want to set the
following though, just to make sure deleting things is hard (those knobs
were invented for situations like this):

# Do not accidentally delete the whole thing
osd_pool_default_flag_nodelete = true
mon_allow_pool_delete = false

This way you have to manually change those first before Ceph is even
able to delete anything.
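
The same can be set at runtime via the mon config store. Note that
osd_pool_default_flag_nodelete only applies to newly created pools, so
existing pools still need the per-pool flag (pool name is just an example):

   ceph config set global osd_pool_default_flag_nodelete true
   ceph config set mon mon_allow_pool_delete false
   ceph osd pool set cephfs_data nodelete 1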

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how to "undelete" a pool

2020-09-25 Thread Peter Sarossy
Thanks for the details folks.

Apologies, apparently yesterday definitely was not a day to be operating
anything for me, as I was meaning to send this to the rook users list
instead of the ceph users list :(

I will circle back with an answer for posterity once I figure it out.



On Fri, Sep 25, 2020 at 3:13 AM Dan van der Ster  wrote:

> Hi Peter,
>
> I'm not a rook expert, but are you asking how to remove the rook
> action to delete a pool? Or is the pool already deleted from ceph
> itself?
>
> We "bare" ceph operators have multiple locks to avoid fat fingers like:
>ceph osd pool set cephfs_data nodelete 1
>ceph config set mon mon_allow_pool_delete false  # the default
>
> -- Dan
>
>
> On Fri, Sep 25, 2020 at 4:49 AM Peter Sarossy 
> wrote:
> >
> > Hit send too early...
> >
> > So I did find in the code that it's looking for the deletion timestamp,
> but
> > deleting this field in the CRD does not stop the deletion request either.
> > The deletionTimestamp reappears after committing the change.
> >
> https://github.com/rook/rook/blob/23108cc94afdebc8f4ab144130a270b1e4ffd94e/pkg/operator/ceph/pool/controller.go#L193
> >
> > On Thu, Sep 24, 2020 at 10:40 PM Peter Sarossy 
> > wrote:
> >
> > > hey folks,
> > >
> > > I have managed to fat finger a config apply command and accidentally
> > > deleted the CRD for one of my pools. The operator went ahead and tried
> to
> > > purge it, but fortunately since it's used by CephFS it was unable to.
> > >
> > > Redeploying the exact same CRD does not make the operator stop trying
> to
> > > delete it though.
> > >
> > > Any hints on how to make the operator forget about the deletion request
> > > and leave it be?
> > >
> > > --
> > > Cheers,
> > > Peter Sarossy
> > > Technical Program Manager
> > > Data Center Data Security - Google LLC.
> > >
> >
> >
> > --
> > Cheers,
> > Peter Sarossy
> > Technical Program Manager
> > Data Center Data Security - Google LLC.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Peter Sarossy
Technical Program Manager
Data Center Data Security - Google LLC.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how to "undelete" a pool

2020-09-25 Thread Brian Topping
Haha I figured out you were on Rook. 

I think you need to add an annotation or label to the CRD. Just create an empty
one and do a kubectl get cephcluster -oyaml to see what it generates, then
figure out what the appropriate analog for the restored CRD is. Once the
operator sees the correct info, it will stop trying.
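
Something along these lines, perhaps (namespace and pool names below are
placeholders; the pool CR kind in Rook is CephBlockPool, if I remember
correctly):

   kubectl -n rook-ceph get cephblockpool restored-pool -o yaml > restored.yaml
   kubectl -n rook-ceph get cephblockpool scratch-pool -o yaml > scratch.yaml
   # compare metadata: labels, annotations, finalizers, deletionTimestamp
   diff restored.yaml scratch.yaml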

Sent from my iPhone

> On Sep 25, 2020, at 09:02, Peter Sarossy  wrote:
> 
> Thanks for the details folks.
> 
> Apologies, apparently yesterday definitely was not a day to be operating
> anything for me, as I was meaning to send this to the rook users list
> instead of the ceph users list :(
> 
> I will circle back with an answer for posterity once I figure it out.
> 
> 
> 
>> On Fri, Sep 25, 2020 at 3:13 AM Dan van der Ster  wrote:
>> 
>> Hi Peter,
>> 
>> I'm not a rook expert, but are you asking how to remove the rook
>> action to delete a pool? Or is the pool already deleted from ceph
>> itself?
>> 
>> We "bare" ceph operators have multiple locks to avoid fat fingers like:
>>   ceph osd pool set cephfs_data nodelete 1
>>   ceph config set mon mon_allow_pool_delete false  # the default
>> 
>> -- Dan
>> 
>> 
>>> On Fri, Sep 25, 2020 at 4:49 AM Peter Sarossy 
>>> wrote:
>>> 
>>> Hit send too early...
>>> 
>>> So I did find in the code that it's looking for the deletion timestamp,
>> but
>>> deleting this field in the CRD does not stop the deletion request either.
>>> The deletionTimestamp reappears after committing the change.
>>> 
>> https://github.com/rook/rook/blob/23108cc94afdebc8f4ab144130a270b1e4ffd94e/pkg/operator/ceph/pool/controller.go#L193
>>> 
>>> On Thu, Sep 24, 2020 at 10:40 PM Peter Sarossy 
>>> wrote:
>>> 
 hey folks,
 
 I have managed to fat finger a config apply command and accidentally
 deleted the CRD for one of my pools. The operator went ahead and tried
>> to
 purge it, but fortunately since it's used by CephFS it was unable to.
 
 Redeploying the exact same CRD does not make the operator stop trying
>> to
 delete it though.
 
 Any hints on how to make the operator forget about the deletion request
 and leave it be?
 
 --
 Cheers,
 Peter Sarossy
 Technical Program Manager
 Data Center Data Security - Google LLC.
 
>>> 
>>> 
>>> --
>>> Cheers,
>>> Peter Sarossy
>>> Technical Program Manager
>>> Data Center Data Security - Google LLC.
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> 
> 
> -- 
> Cheers,
> Peter Sarossy
> Technical Program Manager
> Data Center Data Security - Google LLC.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how to "undelete" a pool

2020-09-25 Thread Peter Sarossy
Turns out there is no way to undo the deletion:
https://github.com/kubernetes/kubernetes/issues/69980
Time to rotate the pool under the folder and just let it do its thing...
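
For anyone hitting this later, the rotation I have in mind is roughly the
following (filesystem, pool and mount path names are examples; with Rook
the new pool would normally be declared in the CephFilesystem CR rather
than created by hand):

   ceph osd pool create cephfs_data_new 128
   ceph fs add_data_pool cephfs cephfs_data_new
   setfattr -n ceph.dir.layout.pool -v cephfs_data_new /mnt/cephfs/somefolder
   # only newly written files land in the new pool; existing files have to
   # be copied/rewritten to actually move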

On Fri, Sep 25, 2020 at 1:51 PM Brian Topping 
wrote:

> Haha I figured out you were on Rook.
>
> I think you need to add an annotation or label to the CRD. Just create an
> empty one and do a kubectl get cephcluster -oyaml to see what it generates
> then figure out what the appropriate analog for the restored CRD is. Once
> the operator sees the correct info, it will stop trying.
>
> Sent from my iPhone
>
> > On Sep 25, 2020, at 09:02, Peter Sarossy 
> wrote:
> >
> > Thanks for the details folks.
> >
> > Apologies, apparently yesterday definitely was not a day to be operating
> > anything for me, as I was meaning to send this to the rook users list
> > instead of the ceph users list :(
> >
> > I will circle back with an answer for posterity once I figure it out.
> >
> >
> >
> >> On Fri, Sep 25, 2020 at 3:13 AM Dan van der Ster 
> wrote:
> >>
> >> Hi Peter,
> >>
> >> I'm not a rook expert, but are you asking how to remove the rook
> >> action to delete a pool? Or is the pool already deleted from ceph
> >> itself?
> >>
> >> We "bare" ceph operators have multiple locks to avoid fat fingers like:
> >>   ceph osd pool set cephfs_data nodelete 1
> >>   ceph config set mon mon_allow_pool_delete false  # the default
> >>
> >> -- Dan
> >>
> >>
> >>> On Fri, Sep 25, 2020 at 4:49 AM Peter Sarossy  >
> >>> wrote:
> >>>
> >>> Hit send too early...
> >>>
> >>> So I did find in the code that it's looking for the deletion timestamp,
> >> but
> >>> deleting this field in the CRD does not stop the deletion request
> either.
> >>> The deletionTimestamp reappears after committing the change.
> >>>
> >>
> https://github.com/rook/rook/blob/23108cc94afdebc8f4ab144130a270b1e4ffd94e/pkg/operator/ceph/pool/controller.go#L193
> >>>
> >>> On Thu, Sep 24, 2020 at 10:40 PM Peter Sarossy <
> peter.saro...@gmail.com>
> >>> wrote:
> >>>
>  hey folks,
> 
>  I have managed to fat finger a config apply command and accidentally
>  deleted the CRD for one of my pools. The operator went ahead and tried
> >> to
>  purge it, but fortunately since it's used by CephFS it was unable to.
> 
>  Redeploying the exact same CRD does not make the operator stop trying
> >> to
>  delete it though.
> 
>  Any hints on how to make the operator forget about the deletion
> request
>  and leave it be?
> 
>  --
>  Cheers,
>  Peter Sarossy
>  Technical Program Manager
>  Data Center Data Security - Google LLC.
> 
> >>>
> >>>
> >>> --
> >>> Cheers,
> >>> Peter Sarossy
> >>> Technical Program Manager
> >>> Data Center Data Security - Google LLC.
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >
> >
> > --
> > Cheers,
> > Peter Sarossy
> > Technical Program Manager
> > Data Center Data Security - Google LLC.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Peter Sarossy
Technical Program Manager
Data Center Data Security - Google LLC.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph RGW Performance

2020-09-25 Thread Dylan Griff
Hey folks!

Just shooting this out there in case someone has some advice. We're
just setting up RGW object storage for one of our new Ceph clusters (3
mons, 1072 OSDs, 34 nodes) and doing some benchmarking before letting
users on it.

We have a 10Gb network to our two RGW nodes behind a single IP on
haproxy, and some iperf testing shows I can push that much; latencies
look okay. However, when using a small cosbench cluster I am unable to
get more than ~250 Mb/s of total read throughput.

If I add more nodes to the cosbench cluster it just spreads out the
load evenly with the same cap. Same results when running two cosbench
clusters from different locations. I don't see any obvious bottlenecks
in terms of RGW server hardware limitations, but since I'm the one asking
for assistance, I won't rule out that I'm missing something. I have
attached one of my cosbench load files with keys removed, but I get
similar results with different numbers of workers, objects, buckets,
object sizes, and cosbench drivers.
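
For what it's worth, a rough way to separate raw RADOS throughput from the
RGW/haproxy path would be something like this from a client node (the pool
name is a guess at the default RGW data pool, and the bench writes objects
into it, so a scratch pool may be preferable):

   rados bench -p default.rgw.buckets.data 60 write --no-cleanup
   rados bench -p default.rgw.buckets.data 60 seq
   rados -p default.rgw.buckets.data cleanup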

Does anyone have any pointers on what I could find to nail this
bottleneck down? Am I wrong in expecting more throughput? Let me know
if I can get any other info for you.

Cheers,
Dylan

-- 

Dylan Griff
Senior System Administrator
CLE D063
RCS - Systems - University of Victoria
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph RGW Performance

2020-09-25 Thread martin joy
Can you share the object size details? Try increasing the object size
gradually, up to say 1 GB, and measure.
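
For example, a single large object through the haproxy VIP could look like
this (assuming an s3cmd configuration pointed at the RGW endpoint; bucket
and object names are examples):

   dd if=/dev/zero of=obj1g bs=1M count=1024
   s3cmd put obj1g s3://bench-bucket/obj1g
   time s3cmd get s3://bench-bucket/obj1g obj1g.copy --force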
Thanks

On Sat, 26 Sep, 2020, 1:10 am Dylan Griff,  wrote:

> Hey folks!
>
> Just shooting this out there in case someone has some advice. We're
> just setting up RGW object storage for one of our new Ceph clusters (3
> mons, 1072 OSDs, 34 nodes) and doing some benchmarking before letting
> users on it.
>
> We have a 10Gb network to our two RGW nodes behind a single IP on
> haproxy, and some iperf testing shows I can push that much; latencies
> look okay. However, when using a small cosbench cluster I am unable to
> get more than ~250 Mb/s of total read throughput.
>
> If I add more nodes to the cosbench cluster it just spreads out the
> load evenly with the same cap. Same results when running two cosbench
> clusters from different locations. I don't see any obvious bottlenecks
> in terms of the RGW server hardware limitations, but here I am asking
> for assistance so I don't put it past me missing something. I have
> attached one of my cosbench load files with keys removed, but I get
> similar results with different numbers of workers, objects, buckets,
> object sizes, and cosbench drivers.
>
> Does anyone have any pointers on what I could find to nail this
> bottleneck down? Am I wrong in expecting more throughput? Let me know
> if I can get any other info for you.
>
> Cheers,
> Dylan
>
> --
>
> Dylan Griff
> Senior System Administrator
> CLE D063
> RCS - Systems - University of Victoria
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io