[ceph-users] Re: pacific installation at ubuntu 20.04

2021-06-24 Thread Jana Markwort



I think I found the reason:
the cephadm script uses the Ubuntu repo instead of the Ceph repo,
so I get the older version 15 ...

root@node1:~# ./cephadm -v add-repo --release pacific
Could not locate podman: podman not found
Installing repo GPG key from 
https://download.ceph.com/keys/release.asc...

Installing repo file at /etc/apt/sources.list.d/ceph.list...

root@node1:~# ./cephadm -v install
Could not locate podman: podman not found
Installing packages ['cephadm']...
Running command: apt-get install -y cephadm
apt-get: Reading package lists...
apt-get: Building dependency tree...
apt-get: Reading state information...
apt-get: Recommended packages:
apt-get:   docker.io
apt-get: The following NEW packages will be installed:
apt-get:   cephadm
apt-get: 0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
apt-get: Need to get 57.8 kB of archives.
apt-get: After this operation, 303 kB of additional disk space will be 
used.
apt-get: Get:1 http://de.archive.ubuntu.com/ubuntu 
focal-updates/universe amd64 cephadm amd64 15.2.11-0ubuntu0.20.04.2 
[57.8 kB]

apt-get: Fetched 57.8 kB in 0s (282 kB/s)
apt-get: Selecting previously unselected package cephadm.
(Reading database ... 71696 files and directories currently installed.)
apt-get: (Reading database ...
apt-get: Preparing to unpack 
.../cephadm_15.2.11-0ubuntu0.20.04.2_amd64.deb ...

apt-get: Unpacking cephadm (15.2.11-0ubuntu0.20.04.2) ...
apt-get: Setting up cephadm (15.2.11-0ubuntu0.20.04.2) ...
apt-get: Adding system user cephadmdone

Any idea how to fix that?
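
One way to check which repository apt would actually pull cephadm from (a 
sketch; it assumes the ceph.list file written by add-repo is in place):

root@node1:~# apt update
root@node1:~# apt-cache policy cephadm

If the candidate version still points at the Ubuntu archive instead of 
download.ceph.com, apt has not picked up the new repo.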


On 2021-06-23 16:50, Jana Markwort wrote:

Hi all,
I'm a new Ceph user trying to install my first cluster.
I'm trying to install Pacific, but as a result I get Octopus.
What's wrong here?

I've done:
# curl --silent --remote-name --location
https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
# chmod +x cephadm
# ./cephadm add-repo --release pacific
# ./cephadm install
# cephadm install ceph-common
# ceph -v
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus 
(stable)


# cat /etc/apt/sources.list.d/ceph.list
deb https://download.ceph.com/debian-pacific/ focal main

??

Kind regards,
Jana
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific installation at ubuntu 20.04

2021-06-24 Thread Robert Sander
Hi,

On 24.06.21 09:34, Jana Markwort wrote:
> 
> I think, I found the reason.
> the cephadm-script uses the ubuntu repo instead the ceph repo.
> so I get the older version 15 ...
> 
> root@node1:~# ./cephadm -v add-repo --release pacific
> Could not locate podman: podman not found
> Installing repo GPG key from 
> https://download.ceph.com/keys/release.asc...
> Installing repo file at /etc/apt/sources.list.d/ceph.list...
> 
> root@node1:~# ./cephadm -v install
> Could not locate podman: podman not found

I think there is an "apt update" missing between these two steps.

The first creates /etc/apt/sources.list.d/ceph.list and the second
installs packages, but the repo list was never updated.
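
In other words, the sequence should roughly be:

# ./cephadm add-repo --release pacific
# apt update
# ./cephadm install
# cephadm install ceph-common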

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Missing objects in pg

2021-06-24 Thread Vadim Bulst

Dear List,

since my update yesterday from 14.2.18 to 14.2.20 I have an unhealthy 
cluster. As far as I remember, it appeared after rebooting the second 
server. There are 7 missing objects from PGs of a cache pool (pool 3). 
This pool has now been switched from writeback to proxy and I'm not able to flush 
all objects.


root@scvirt06:/home/urzadmin/ceph_issue# ceph -s
  cluster:
    id: 5349724e-fa96-4fd6-8e44-8da2a39253f7
    health: HEALTH_ERR
    7/15893342 objects unfound (0.000%)
    Possible data damage: 7 pgs recovery_unfound
    Degraded data redundancy: 21/47680026 objects degraded 
(0.000%), 7 pgs degraded, 7 pgs undersized

    client is using insecure global_id reclaim
    mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum scvirt03,scvirt06,scvirt01 (age 19h)
    mgr: scvirt04(active, since 21m), standbys: scvirt03, scvirt02
    mds: scfs:1 {0=scvirt04=up:active} 1 up:standby-replay 1 up:standby
    osd: 54 osds: 54 up (since 17m), 54 in (since 10w); 7 remapped pgs

  task status:
    scrub status:
    mds.scvirt03: idle

  data:
    pools:   5 pools, 704 pgs
    objects: 15.89M objects, 49 TiB
    usage:   139 TiB used, 145 TiB / 285 TiB avail
    pgs: 21/47680026 objects degraded (0.000%)
 7/15893342 objects unfound (0.000%)
 694 active+clean
 7 active+recovery_unfound+undersized+degraded+remapped
 3   active+clean+scrubbing+deep

  io:
    client:   3.7 MiB/s rd, 6.6 MiB/s wr, 40 op/s rd, 31 op/s wr

my cluster:

scvirt01 - mon,osds

scvirt02 - mgr,osds

scvirt03 - mon,mgr,mds,osds

scvirt04 - mgr,mds,osds

scvirt05 - osds

scvirt06 - mon,mds,osds


log of osd.49:

root@scvirt03:/home/urzadmin# tail -f /var/log/ceph/ceph-osd.49.log
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.64 GB write, 0.01 MB/s write, 0.54 GB read, 0.01 MB/s read, 6.5 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count


** File Read Latency Histogram By Level [default] **

2021-06-24 08:53:08.865 7f88ab86c700 -1 log_channel(cluster) log [ERR] : 
3.9 has 1 objects unfound and apparently lost
2021-06-24 08:53:08.865 7f88a505f700 -1 log_channel(cluster) log [ERR] : 
3.1e has 1 objects unfound and apparently lost
2021-06-24 08:53:40.570 7f88ab86c700 -1 log_channel(cluster) log [ERR] : 
3.9 has 1 objects unfound and apparently lost
2021-06-24 08:53:40.570 7f88a9067700 -1 log_channel(cluster) log [ERR] : 
3.1e has 1 objects unfound and apparently lost
2021-06-24 08:54:45.042 7f88b487e700  4 rocksdb: [db/db_impl.cc:777] 
--- DUMPING STATS ---

2021-06-24 08:54:45.042 7f88b487e700  4 rocksdb: [db/db_impl.cc:778]
** DB Stats **
Uptime(secs): 85202.3 total, 600.0 interval
Cumulative writes: 1148K writes, 8640K keys, 1148K commit groups, 1.0 
writes per commit group, ingest: 1.24 GB, 0.01 MB/s
Cumulative WAL: 1148K writes, 546K syncs, 2.10 writes per sync, written: 
1.24 GB, 0.01 MB/s

Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 369 writes, 1758 keys, 369 commit groups, 1.0 writes 
per commit group, ingest: 0.41 MB, 0.00 MB/s
Interval WAL: 369 writes, 155 syncs, 2.37 writes per sync, written: 0.00 
MB, 0.00 MB/s

Interval stall: 00:00:0.000 H:M:S, 0.0 percent

** Compaction Stats [default] **
Level  Files    Size     Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
  L0    3/0  104.40 MB    0.8       0.0     0.0       0.0        0.2       0.2        0.0    1.0       0.0      67.8       2.89               2.70          6     0.482      0        0
  L1    2/0  131.98 MB    0.5       0.2     0.1       0.1        0.2       0.1        0.0    1.8     149.9     120.9       1.53               1.41          1     1.527  2293K     140K
  L2   16/0  871.57 MB    0.3       0.3     0.1       0.3        0.3      -0.0        0.0    5.2     158.1     132.3       2.05               1.93          1     2.052  3997K    1089K
 Sum   21/0    1.08 GB    0.0       0.5     0.2       0.4        0.6       0.2        0.0    3.3      85.5     100.8       6.47               6.03          8     0.809  6290K    1229K
 Int    0/0    0.00 KB    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0       0.0       0.00               0.00          0     0.000      0        0


If I run

ceph pg repair 3.1e

it doesn't change anything.

And I do not understand why these PGs are undersized. All OSDs are up.
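
(To get more detail on what exactly is unfound and which OSDs were probed, 
something like

ceph health detail
ceph pg 3.1e query

should help. This is just a sketch using one of the affected PG ids from above; 
"query" shows the recovery/peering state, including which OSDs might still hold 
the missing objects.)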

ceph.conf:

[g

[ceph-users] Re: pacific installation at ubuntu 20.04

2021-06-24 Thread Jana Markwort

ok, the problem is the GPG key:

root@node1:~# ./cephadm -v add-repo --release pacific
Could not locate podman: podman not found
Installing repo GPG key from 
https://download.ceph.com/keys/release.asc...

Installing repo file at /etc/apt/sources.list.d/ceph.list...

...
W: https://download.ceph.com/debian-pacific/dists/focal/InRelease: The 
key(s) in the keyring /etc/apt/trusted.gpg.d/ceph.release.gpg are 
ignored as the file has an unsupported filetype.


workaround:

# rm /etc/apt/trusted.gpg.d/ceph.release.gpg
# wget https://download.ceph.com/keys/release.asc
# apt-key add release.asc
# apt update

after that, cephadm installs from 
https://download.ceph.com/debian-pacific



root@node1:~# ceph -v
ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific 
(stable)
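
On newer releases where apt-key is deprecated, a roughly equivalent fix (a 
sketch, assuming curl and gnupg are installed) is to de-armor the key into the 
binary format apt expects:

# curl -fsSL https://download.ceph.com/keys/release.asc | gpg --dearmor -o /etc/apt/trusted.gpg.d/ceph.release.gpg
# apt update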



On 2021-06-24 13:08, Robert Sander wrote:

Hi,

On 24.06.21 09:34, Jana Markwort wrote:


I think, I found the reason.
the cephadm-script uses the ubuntu repo instead the ceph repo.
so I get the older version 15 ...

root@node1:~# ./cephadm -v add-repo --release pacific
Could not locate podman: podman not found
Installing repo GPG key from
https://download.ceph.com/keys/release.asc...
Installing repo file at /etc/apt/sources.list.d/ceph.list...

root@node1:~# ./cephadm -v install
Could not locate podman: podman not found


I think there is an "apt update" missing between these two steps.

The first creates /etc/apt/sources.list.d/ceph.list and the second
installs packages, but the repo list was never updated.

Regards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Start a service on a specified node

2021-06-24 Thread E Taka
If I understand the documentation for the placements in "ceph orch
apply" correctly, I can place the daemons by number or on a specific
host. But what I want is:

"Start 3 mgr services, and one of them should be started on node ceph01."

How can I achieve this?
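
For reference, the kind of placement spec I mean looks roughly like this (the 
hostnames are just examples):

service_type: mgr
placement:
  count: 3
  hosts:
    - ceph01
    - ceph02
    - ceph03

applied with "ceph orch apply -i mgr.yaml". Whether count plus an explicit host 
list is the right way to make sure one of the three ends up on ceph01 is 
exactly what I'm unsure about.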

Thanks!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW topic created in wrong (default) tenant

2021-06-24 Thread Yuval Lifshitz
On Wed, Jun 23, 2021 at 6:39 PM Daniel Iwan  wrote:

> this looks like a bug, the topic should be created in the right tenant.
>> please submit a tracker for that.
>>
>
> Thank you for confirming.
> Created here https://tracker.ceph.com/issues/51331
>
>

thanks


> yes. topics are owned by the tenant. previously, they were owned by the
>> user but since the same topic could be used among different buckets and
>> different users, this was causing issues (was fixed here:
>> https://github.com/ceph/ceph/pull/38136)
>> (documentation also mentioned that in the intro paragraph of the doc:
>> https://docs.ceph.com/en/latest/radosgw/notifications/)
>>
>
> I think it's this section
> ```
> A user can create different topics. A topic entity is defined by its name
> and is per tenant. A user can only associate its topics (via notification
> configuration) with buckets it owns.
> ```
>
>
>> no permissions are needed to create a topic. however, note that without
>> proper permissions on the bucket, you cannot create a notification that
>> associates this topic with the bucket.
>>
>
> Yes, I thought it would be similar to AWS, possibly not implemented/needed
> so far:
>
> https://docs.aws.amazon.com/sns/latest/dg/sns-using-identity-based-policies.html
>
> https://docs.aws.amazon.com/config/latest/developerguide/sns-topic-policy.html
>
> ```
> { "Statement": [{ "Effect": "Allow", "Action": ["sns:CreateTopic",
> "sns:ListTopics", "sns:SetTopicAttributes", "sns:DeleteTopic"], "Resource":
> "*" }] }
> ```
>
> Not having that sns:CreateTopic sns:DeleteTopic leaves room for abuse.
> User could potentially create many topics, delete all topics from
> tenant(s) maliciously or by accident (bugs) etc.
>
>
"tenant" in the RGW is somewhat equivalent to an "account" in AWS. however,
"tenant" does not have all the security aspects that an "account" has.
adding that would be much wider in scope than the creation/deletion of
topics.



> On a deletion note, if I understand correctly, deletion of the topic
> without deletion of all notifications first creates the situation where
> notifications can no longer be deleted due to the topic missing.
> The only option is to re-create the topic and delete notifications first.
>

According to what I tested, this is not the case. Deletion of a topic only
prevents the creation of new notifications with that topic.
It does not affect the deletion of notifications with that topic, nor the
actual sending of these notifications.

note that we also added a cascade delete process to delete all
notifications of a bucket when a bucket is deleted.
(it should be in pacific: https://github.com/ceph/ceph/pull/38351)


>
> Btw I enjoyed your FOSDEM presentation
> https://fosdem.org/2021/schedule/event/sds_ceph_rgw_serverless/
>

thank you!

Any timeframe for native SQS coming to Ceph?
>
>
no actual timelines... but it should probably land in the main branch later
this year :-)


> Regards
> Daniel
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Month June Schedule Now Available

2021-06-24 Thread Mike Perez
Hi everyone,

Today is the final day for Ceph Month! Here's today's schedule:

9:00 ET / 15:00 CEST Evaluating CephFS Performance vs. Cost on
High-Density Commodity Disk Servers [Dan van der Ster]
9:30 ET / 15:30 CEST Ceph Market Development Working Group BoF
10:10 ET / 16:10 CEST Ceph Community Ambassador BoF

Meeting link: https://bluejeans.com/908675367
Full schedule: https://pad.ceph.com/p/ceph-month-june-2021


On Tue, Jun 22, 2021 at 5:50 AM Mike Perez  wrote:
>
> Hi everyone,
>
> Join us in ten minutes for week 4 of Ceph Month!
>
> 9:00 ET / 15:00 CEST cephadm [sebastian wagner]
> 9:30 ET / 15:30 CEST CephFS + fscrypt: filename and content encryption
> 10:00 ET / 16:00 CEST Crimson Update [Samuel Just]
>
> Meeting link:https://bluejeans.com/908675367
> Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
>
> On Fri, Jun 18, 2021 at 5:50 AM Mike Perez  wrote:
> >
> > Hi everyone,
> >
> > Join us in ten minutes for more Ceph Month presentations!
> >
> > 9:00 ET / 15:00 CEST Optimizing Ceph on Arm64 [Richael Zhuang]
> > 9:30 ET / 15:30 CEST Improving Cosbench for Ceph Benchmarking [Danny 
> > Abukalam]
> >
> > Meeting link:https://bluejeans.com/908675367
> > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> >
> > On Wed, Jun 16, 2021 at 6:25 AM Mike Perez  wrote:
> > >
> > > Hi everyone,
> > >
> > > Here's today schedule for Ceph Month:
> > >
> > > 9:00 ET / 15:00 CEST Project Aquarium - An easy-to-use storage
> > > appliance wrapped around Ceph [Joao Eduardo Luis]
> > > 9:30 ET / 15:30 CEST [lightning] Qemu: librbd vs krbd performance
> > > [Wout van Heeswijk]
> > > 9:40 ET / 15:40 CEST [lightning] Evaluation of RBD replication options
> > > @CERN Arthur, Outhenin-Chalandre
> > >
> > > Meeting link:https://bluejeans.com/908675367
> > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > >
> > >
> > > On Tue, Jun 15, 2021 at 5:52 AM Mike Perez  wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > Here's today's schedule for Ceph Month:
> > > >
> > > > 9:00ET / 15:00 CEST Dashboard Update [Ernesto]
> > > > 9:30 ET / 15:30 CEST [lightning] RBD latency with QD=1 bs=4k [Wido,
> > > > den Hollander]
> > > > 9:40 ET / 15:40 CEST [lightning] From Open Source  to Open Ended in
> > > > Ceph with Lua [Yuval Lifshitz]
> > > >
> > > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > > > Meeting link: https://bluejeans.com/908675367
> > > >
> > > > On Mon, Jun 14, 2021 at 6:50 AM Mike Perez  wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > In ten minutes, Ceph Month continues with the following schedule 
> > > > > today:
> > > > >
> > > > > 10:00 ET / 16:00 CEST RBD update [Ilya Dryomov]
> > > > > 10:30 ET / 16:30 CEST 5 more ways to break your ceph cluster [Wout 
> > > > > van Heeswijk]
> > > > >
> > > > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > > > > Meeting link: https://bluejeans.com/908675367
> > > > >
> > > > >
> > > > > On Fri, Jun 11, 2021 at 6:50 AM Mike Perez  wrote:
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > In ten minutes, join us for the next Ceph Month presentation on 
> > > > > > Intel
> > > > > > QLC SSD: Cost-Effective Ceph Deployments by Anthony D'Atri
> > > > > >
> > > > > > https://bluejeans.com/908675367
> > > > > > https://pad.ceph.com/p/ceph-month-june-2021
> > > > > >
> > > > > > On Fri, Jun 11, 2021 at 5:50 AM Mike Perez  
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > In ten minutes, join us for the next Ceph Month presentation on
> > > > > > > Performance Optimization for All Flash-based on aarch64 by 
> > > > > > > Chunsong
> > > > > > > Feng
> > > > > > >
> > > > > > > https://pad.ceph.com/p/ceph-month-june-2021
> > > > > > > https://bluejeans.com/908675367
> > > > > > >
> > > > > > > On Thu, Jun 10, 2021 at 6:00 AM Mike Perez  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > We're about to start Ceph Month 2021 with Casey Bodley giving a 
> > > > > > > > RGW update!
> > > > > > > >
> > > > > > > > Afterward we'll have two BoF discussions on:
> > > > > > > >
> > > > > > > > 9:30 ET / 15:30 CEST [BoF] Ceph in Research & Scientific 
> > > > > > > > Computing
> > > > > > > > [Kevin Hrpcek]
> > > > > > > >
> > > > > > > > 10:10 ET / 16:10 CEST [BoF] The go-ceph get together [John 
> > > > > > > > Mulligan]
> > > > > > > >
> > > > > > > > Join us now on the stream:
> > > > > > > >
> > > > > > > > https://bluejeans.com/908675367
> > > > > > > >
> > > > > > > > On Tue, Jun 1, 2021 at 6:50 AM Mike Perez  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > In ten minutes, join us for the start of the Ceph Month June 
> > > > > > > > > event!
> > > > > > > > > The schedule and meeting link can be found on this etherpad:
> > > > > > > > >
> > > > > > > > > https://pad.ceph.com/p/ceph-month-june-2021
> > > > > > > > >
> > > > > > > > > On Tue, May 25, 2021 at 11

[ceph-users] Re: Ceph Month June Schedule Now Available

2021-06-24 Thread Mike Perez
Hi Marc,

We can look into that for future events. For this event, we
recommended people subscribe to the Ceph Community Calendar which does
display the times in your local time.

https://calendar.google.com/calendar/embed?src=9ts9c7lt7u1vic2ijvvqqlfpo0%40group.calendar.google.com


On Tue, Jun 22, 2021 at 5:57 AM Marc  wrote:
>
> Maybe it is nice to send this as a calendar invite? So it nicely shows up at 
> correct local time of everyone?
>
>
>
> > -Original Message-
> > From: Mike Perez 
> > Sent: Tuesday, 22 June 2021 14:50
> > To: ceph-users 
> > Subject: [ceph-users] Re: Ceph Month June Schedule Now Available
> >
> > Hi everyone,
> >
> > Join us in ten minutes for week 4 of Ceph Month!
> >
> > 9:00 ET / 15:00 CEST cephadm [sebastian wagner]
> > 9:30 ET / 15:30 CEST CephFS + fscrypt: filename and content encryption
> > 10:00 ET / 16:00 CEST Crimson Update [Samuel Just]
> >
> > Meeting link:https://bluejeans.com/908675367
> > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> >
> > On Fri, Jun 18, 2021 at 5:50 AM Mike Perez  wrote:
> > >
> > > Hi everyone,
> > >
> > > Join us in ten minutes for more Ceph Month presentations!
> > >
> > > 9:00 ET / 15:00 CEST Optimizing Ceph on Arm64 [Richael Zhuang]
> > > 9:30 ET / 15:30 CEST Improving Cosbench for Ceph Benchmarking [Danny
> > Abukalam]
> > >
> > > Meeting link:https://bluejeans.com/908675367
> > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > >
> > > On Wed, Jun 16, 2021 at 6:25 AM Mike Perez  wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > Here's today schedule for Ceph Month:
> > > >
> > > > 9:00 ET / 15:00 CEST Project Aquarium - An easy-to-use storage
> > > > appliance wrapped around Ceph [Joao Eduardo Luis]
> > > > 9:30 ET / 15:30 CEST [lightning] Qemu: librbd vs krbd performance
> > > > [Wout van Heeswijk]
> > > > 9:40 ET / 15:40 CEST [lightning] Evaluation of RBD replication
> > options
> > > > @CERN Arthur, Outhenin-Chalandre
> > > >
> > > > Meeting link:https://bluejeans.com/908675367
> > > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > > >
> > > >
> > > > On Tue, Jun 15, 2021 at 5:52 AM Mike Perez 
> > wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > Here's today's schedule for Ceph Month:
> > > > >
> > > > > 9:00ET / 15:00 CEST Dashboard Update [Ernesto]
> > > > > 9:30 ET / 15:30 CEST [lightning] RBD latency with QD=1 bs=4k
> > [Wido,
> > > > > den Hollander]
> > > > > 9:40 ET / 15:40 CEST [lightning] From Open Source  to Open Ended
> > in
> > > > > Ceph with Lua [Yuval Lifshitz]
> > > > >
> > > > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > > > > Meeting link: https://bluejeans.com/908675367
> > > > >
> > > > > On Mon, Jun 14, 2021 at 6:50 AM Mike Perez 
> > wrote:
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > In ten minutes, Ceph Month continues with the following schedule
> > today:
> > > > > >
> > > > > > 10:00 ET / 16:00 CEST RBD update [Ilya Dryomov]
> > > > > > 10:30 ET / 16:30 CEST 5 more ways to break your ceph cluster
> > [Wout van Heeswijk]
> > > > > >
> > > > > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > > > > > Meeting link: https://bluejeans.com/908675367
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 11, 2021 at 6:50 AM Mike Perez 
> > wrote:
> > > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > In ten minutes, join us for the next Ceph Month presentation
> > on Intel
> > > > > > > QLC SSD: Cost-Effective Ceph Deployments by Anthony D'Atri
> > > > > > >
> > > > > > > https://bluejeans.com/908675367
> > > > > > > https://pad.ceph.com/p/ceph-month-june-2021
> > > > > > >
> > > > > > > On Fri, Jun 11, 2021 at 5:50 AM Mike Perez
> >  wrote:
> > > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > In ten minutes, join us for the next Ceph Month presentation
> > on
> > > > > > > > Performance Optimization for All Flash-based on aarch64 by
> > Chunsong
> > > > > > > > Feng
> > > > > > > >
> > > > > > > > https://pad.ceph.com/p/ceph-month-june-2021
> > > > > > > > https://bluejeans.com/908675367
> > > > > > > >
> > > > > > > > On Thu, Jun 10, 2021 at 6:00 AM Mike Perez
> >  wrote:
> > > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > We're about to start Ceph Month 2021 with Casey Bodley
> > giving a RGW update!
> > > > > > > > >
> > > > > > > > > Afterward we'll have two BoF discussions on:
> > > > > > > > >
> > > > > > > > > 9:30 ET / 15:30 CEST [BoF] Ceph in Research & Scientific
> > Computing
> > > > > > > > > [Kevin Hrpcek]
> > > > > > > > >
> > > > > > > > > 10:10 ET / 16:10 CEST [BoF] The go-ceph get together [John
> > Mulligan]
> > > > > > > > >
> > > > > > > > > Join us now on the stream:
> > > > > > > > >
> > > > > > > > > https://bluejeans.com/908675367
> > > > > > > > >
> > > > > > > > > On Tue, Jun 1, 2021 at 6:50 AM Mike Perez
> >  wrote:
> > > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > In 

[ceph-users] Re: ceph fs mv does copy, not move

2021-06-24 Thread Frank Schilder
Dear Patrick,

thanks for letting me know.

Could you please consider making this a ceph client mount option, for example 
'-o fast_move', that enables a code path enforcing an mv to be a proper 
atomic mv, with the risk that in some corner cases the target quota is overrun? 
With this option enabled, a move should either be a move or fail outright with 
"out of disk quota" (no partial move, no cp+rm at all). The failure should only 
occur if it is absolutely obvious that the target quota will be exceeded. Any 
corner cases are the responsibility of the operator. Application crashes due to 
incorrect error handling are acceptable.

Reasoning:

From a user's/operator's side, the preferred functionality is that in cases 
where a definite quota overrun can securely be detected in advance, the move 
should actually fail with "out of disk quota" instead of resorting to cp+rm, 
potentially leading to partial moves and a total mess for users/operators to 
clean up. In any other case, the quota should simply be ignored and the move 
should be a complete atomic move, with the risk of exceeding the target quota 
and of IO stalling. A temporary stall or failure of IO until the operator increases 
the quota again is, in my opinion and use case, highly preferable over the 
alternative of cp+rm. A quota or a crashed job is fast to fix, a partial move 
is not.

Some background:

We use ceph fs as an HPC home file system and as a back-end store. Being able 
to move data quickly across the entire file system is essential, because users 
re-factor their directory structure containing huge amounts of data quite often 
for various reasons.

On our system, we set file system quotas mainly for psychological reasons. We 
run a cron job that adjusts the quotas every day to show between 20% and 30% 
free capacity on the mount points. The psychological side here is to give an 
incentive to users to clean up temporary data. It is not intended to limit 
usage seriously, only to limit what can be done in between cron job runs as a 
safe-guard. The pool quotas set the real hard limits.

I'm in the process of migrating 100+TB right now and am really happy that I 
still have a client where I can do an O(1) move. It would be a disaster if I 
now had to use rsync or similar, which would take weeks.

Please, in such situations where developers seem to have to make a definite 
choice, consider the possibility of letting operators choose the 
alternative that suits their use case best. Adding further options seems far 
better than limiting functionality in a way that becomes a terrible burden in 
certain, if not many, use cases.

In ceph fs there have been many such decisions that allow for different answers 
from a user/operator perspective. For example, I would prefer if I could get 
rid of the attempted higher POSIX compliance level of ceph fs compared with 
Lustre, just disable all the client-caps and cache-coherence management and 
turn it into an awesome scale-out parallel file system. The attempt at 
POSIX-compliant handling of simultaneous writes to files offers nothing to us, but 
costs hugely in performance and forces users to move away from perfectly 
reasonable HPC work flows. Also, that it takes a TTL to expire before changes 
on one client become visible on another (unless direct_io is used for all IO) 
is perfectly acceptable for us given the potential performance gain due to 
simpler client-MDS communication.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Patrick Donnelly 
Sent: 24 June 2021 05:29:45
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] ceph fs mv does copy, not move

Hello Frank,

On Tue, Jun 22, 2021 at 2:16 AM Frank Schilder  wrote:
>
> Dear all,
>
> some time ago I reported that the kernel client resorts to a copy instead of 
> move when moving a file across quota domains. I was told that the fuse client 
> does not have this problem. If enough space is available, a move should be a 
> move, not a copy.
>
> Today, I tried to move a large file across quota domains testing both, the 
> kernel- and the fuse client. Both still resort to a copy even though this 
> issue was addressed quite a while ago 
> (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/44AEIHNEGKV4VGCARRTARGFZ264CR4T7/#XY7ZCE3KWHI4QSUNZHDWL3QZQFOHXRQW).
>  The versions I'm using are (CentOS 7)
>
> # yum list installed | grep ceph-fuse
> ceph-fuse.x86_64  2:13.2.10-0.el7   @ceph
>
> # uname -r
> 3.10.0-1160.31.1.el7.x86_64
>
> Any suggestions how to get this to work? I have to move directories 
> containing 100+ TB.

ceph-fuse reverted this behavior in: https://tracker.ceph.com/issues/48203
The kernel had a patch around that time too.

In summary, it was not possible to accurately account for the quota
usage prior to doing the rename. Rather than allow a quota to
potentially be massively overru

[ceph-users] iscsi, gwcli, and vmware version

2021-06-24 Thread Philip Brown


I notice on
https://docs.ceph.com/en/latest/rbd/iscsi-initiator-esx/

that it lists a requirement of
"VMware ESX 6.5 or later using Virtual Machine compatibility 6.5 with VMFS 6."


Could anyone enlighten me as to why this specific limit is in place?
Officially knowing something like "you have to use v6.5 or later, because X 
happens" would be very helpful to me when doing a writeup for potential 
deployment plans.


--
Philip Brown| Sr. Linux System Administrator | Medata, Inc. 
5 Peters Canyon Rd Suite 250 
Irvine CA 92606 
Office 714.918.1310| Fax 714.918.1325 
pbr...@medata.com| www.medata.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] query about product use of rbd mirror for DR

2021-06-24 Thread huxia...@horebdata.cn
Dear Ceph Folks,

Does anyone have real experience of using rbd mirroring for disaster recovery 
over 1000 miles away? 

I am planning to use the Ceph rbd mirroring feature for DR, but have no real 
experience. Could anyone share good or bad experiences here? I am thinking of 
using iSCSI over an rbd-nbd map, with rbd mirror to a remote site over a 
dedicated link of 200 Mb/s.

Ceph version will be on Luminous 12.2.13
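
For context, the mirroring part of that setup would look roughly like this (a 
sketch; pool and image names are placeholders, and Luminous only supports 
journal-based mirroring):

# on both clusters: enable per-image mirroring for the pool and add the peer
rbd mirror pool enable rbd image
rbd mirror pool peer add rbd client.remote@remote-cluster
# per image: enable journaling, then mirroring
rbd feature enable rbd/vm-disk-1 journaling
rbd mirror image enable rbd/vm-disk-1
# plus an rbd-mirror daemon running against the DR cluster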

Any sharing, suggestions and comments are highly appreciated.

best regards,

samuel   



huxia...@horebdata.cn
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: iscsi, gwcli, and vmware version

2021-06-24 Thread Andrew Ferris
Hi Philip,
 
Part of it will come down to VMFS supporting features for iSCSI, and then that is 
chained to specific ESXi and VM levels.
 

Andrew Ferris
Network & System Management
UBC Centre for Heart & Lung Innovation
St. Paul's Hospital, Vancouver
http://www.hli.ubc.ca
 


>>> Philip  Brown  6/24/2021 12:56 PM >>>

I notice on
https://docs.ceph.com/en/latest/rbd/iscsi-initiator-esx/

that it lists a requirement of
"VMware ESX 6.5 or later using Virtual Machine compatibility 6.5 with VMFS 6."


Could anyone enlighten me as to why this specific limit is in place?
Officially knowing something like "you have to use v6.5 or later, because X 
happens" would be very helpful to me when doing a writeup for potential 
deployment plans.


--
Philip Brown| Sr. Linux System Administrator | Medata, Inc. 
5 Peters Canyon Rd Suite 250 
Irvine CA 92606 
Office 714.918.1310| Fax 714.918.1325 
pbr...@medata.com| www.medata.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: iscsi, gwcli, and vmware version

2021-06-24 Thread Philip Brown
I would appreciate it if anyone could call out specific features involved here.

"upgrade because it's better" doesnt usually fly in cost justification writeups.



- Original Message -
From: "Andrew Ferris" 
To: "ceph-users" , "Philip Brown" 
Sent: Thursday, June 24, 2021 1:13:02 PM
Subject: Re: [ceph-users] iscsi, gwcli, and vmware version

Hi Philip,
 
Part of it will come down to VMFS supporting features for iSCSI, and then that is 
chained to specific ESXi and VM levels.
 

Andrew Ferris
Network & System Management
UBC Centre for Heart & Lung Innovation
St. Paul's Hospital, Vancouver
http://www.hli.ubc.ca
 


>>> Philip  Brown  6/24/2021 12:56 PM >>>

I notice on
https://docs.ceph.com/en/latest/rbd/iscsi-initiator-esx/

that it lists a requirement of
"VMware ESX 6.5 or later using Virtual Machine compatibility 6.5 with VMFS 6."


Could anyone enlighten me as to why this specific limit is in place?
Officially knowing something like "you have to use v6.5 or later, because X 
happens" would be very helpful to me when doing a writeup for potential 
deployment plans.


--
Philip Brown| Sr. Linux System Administrator | Medata, Inc. 
5 Peters Canyon Rd Suite 250 
Irvine CA 92606 
Office 714.918.1310| Fax 714.918.1325 
pbr...@medata.com| www.medata.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Sage Weil
On Sat, Jun 19, 2021 at 3:43 PM Nico Schottelius
 wrote:
> Good evening,
>
> as an operator running Ceph clusters based on Debian and later Devuan
> for years and recently testing ceph in rook, I would like to chime in to
> some of the topics mentioned here with short review:
>
> Devuan/OS package:
>
> - Over all the years changing from Debian to Devuan, changing the Devuan
>   versions, dist-upgrading - we did not encounter a single issue on the
>   OS basis. The only real problems were when ceph version
>   incompatibilities between the major versions happened. However this
>   will not change with containers.
>
>   I do see the lack of proper packages for Alpine Linux, which would be
>   an amazing lean target for running ceph.
>
>   The biggest problem I see is that ceph/cephadm is the longer the more
>   relying on systemd and that actually locks out folks.

I want to reiterate that while cephadm requirements are
systemd+lvm+python3+containers, the orchestration framework does not
have any of these limitations, and is designed to allow you to plug in
other options.

> [...]
>
> Thus my suggestion for the ceph team is to focus on 2 out of the three
> variants:
>
> - Keep providing a native, even manual deployment mode. Let people get
>   an understanding of ceph, develop even their own tooling around it.
>   This enables distros, SMEs, Open Source communities, hackers,
>   developers. Low entrance barrier, easy access, low degree of
>   automation.
>
> - For those who are into containers, advise them how to embrace k8s. How
>   to use k8s on bare metal. Is it potentially even smarter to run ceph
>   on IPv6 only clusters? What does the architecture look like with k8s?
>   How does rook do autodetection, what metrics can the kube-prometheus
>   grafana help with? etc. etc. The whole shebang that you'll need to
>   develop and create over time anyway.

Cephadm is intended to be the primary non-k8s option, since it seems
pretty clear that there is a significant (huge?) portion of the user
community that is not interested in adding kubernetes underneath their
storage (take all of the "containers add complexity" arguments and *
100).  We used containers because, in our view, it simplified the
developer AND user experience.

But neither rook nor cephadm preclude deploying Ceph the traditional
way.  The newer capabilities in the dashboard to manage the deployment
of Ceph relies on the orchestrator API, so a traditional deployment
today cannot make use of these new features, but nothing is preventing
a non-container-based orchestrator implementation.

sage
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Sage Weil
On Sun, Jun 20, 2021 at 9:51 AM Marc  wrote:
> Remarks about your cephadm approach/design:
>
> 1. I am not interested in learning podman, rook or kubernetes. I am using 
> mesos which is also on my osd nodes to use the extra available memory and 
> cores. Furthermore your cephadm OC is limited to only ceph nodes. While my 
> mesos OC is spread across a larger cluster and has rules when, and when not 
> to run tasks on the osd nodes. You incorrectly assume that rgw, grafana, 
> prometheus, haproxy are going to be ran on your ceph OC.

rgw, grafana, prom, haproxy, etc are all optional components.  The
monitoring stack is deployed by default but is trivially disabled via
a flag to the bootstrap command.  We are well aware that not everyone
wants these, but we cannot ignore the vast majority of users that
wants things to Just Work without figuring out how to properly deploy
and manage all of these extraneous integrated components.
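
(Concretely, something along the lines of

  cephadm bootstrap --mon-ip <IP> --skip-monitoring-stack

skips the prometheus/grafana/alertmanager/node-exporter deployment entirely; 
treat the exact flag as a sketch and check cephadm bootstrap --help.)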

> 2. Nico pointed out that you do not have alpine linux container images. I did 
> not even know you were using container images. So how big are these? Where 
> are these stored. And why are these not as small as they can be? Such an osd 
> container image should be 20MB or so at most. I would even expect statically 
> build binary container image, why even a tiny os?
> 4. Ok found the container images[2] (I think). Sorry but this has ‘nothing’ 
> to do with container thinking. I expected to find container images for osd, 
> msd, rgw separately and smaller. This looks more like an OS deployment.
Early on the team building the container images opted for a single
image that includes all of the daemons for simplicity.  We could build
stripped down images for each daemon type, but that's an investment in
developer time and complexity and we haven't heard any complaints
about the container size.  (Usually a few hundred MB on a large scale
storage server isn't a problem.)

> 3. Why is in this cephadm still being talked about systemd? Your orchestrator 
> should handle restarts,namespaces and failed tasks not? There should be no 
> need to have a systemd dependency, at least I have not seen any container 
> images relying on this.

Something needs to start the ceph daemon containers when the system
reboots.  We integrated with systemd since all major distros adopted
it.  Cephadm could be extended to support other init systems with
pretty minimal effort... we aren't doing anything fancy with systemd.

> 5. I have been writing this previously on the mailing list here. Is each rgw 
> still requiring its own dedicated client id? Is it still true, that if you 
> want to spawn 3 rgw instances, they need to authorize like client.rgw1, 
> client.rgw2 and client.rgw3?
> This does not allow for auto scaling. The idea of using an OC is that you 
> launch a task, and that you can scale this task automatically when necessary. 
> So you would get multiple instances of rgw1. If this is still and issue with 
> rgw, mds and mgr etc. Why even bother doing something with an OC and 
> containers?

The orchestrator automates the creation and cleanup of credentials for
each rgw instance.  (It also trivially scales them up/down, ala k8s.)
If you have an autoscaler, you just need to tell cephadm how many you
want and it will add/remove daemons.  If you are using cephadm's
ingress (haproxy) capability, the LB configuration will be adjusted
for you.  If you are using an external LB, you can query cephadm for a
description of the current daemons and their endpoints and feed that
info into your own ingress solution.
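
(A sketch of that query:

  ceph orch ps --format json

lists the current daemons, including the rgw instances and the hosts they run 
on, in a form an external LB or service-discovery tool can consume.)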

> 6. As I wrote before I do not want my rgw or haproxy running in a OC that has 
> the ability to give tasks capability SYSADMIN. So that would mean I have to 
> run my osd daemons/containers separately.

Only the OSD containers get extra caps to deal with the storage hardware.

> 7. If you are not setting cpu and memory limits on your cephadm containers, 
> then again there is an argument why even use containers.

Memory limits are partially implemented; we haven't gotten to CPU
limits yet.  It's on the list!

> 8. I still see lots of comments on the mailing list about accessing logs. I 
> have all my containers log to a remote syslog server, if you still have your 
> ceph daemons that can not do this (correctly). What point is it even going to 
> containers.

By default, we log to stderr and your logs are in journald or whatever
alternative your container runtime has set up.  You can trivially flip
a switch and you get traditional file-based logs with a logrotate.d
config, primarily to satisfy users (like me!) who aren't comfortable
with the newfangled log management style.
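
(The switch being, roughly:

  ceph config set global log_to_file true
  ceph config set global mon_cluster_log_to_file true

a sketch from memory; the cephadm docs have the exact option names.)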

> 9. I am updating my small cluster something like this:
>
> ssh root@c01 "ceph osd set noout  ; ceph osd set noscrub ; ceph osd set 
> nodeep-scrub"
> ssh root@c01 "ceph tell osd.* injectargs '--osd_max_scrubs=0'"
>
> ssh root@c01 "yum update 'ceph-*' -y"
> ...
>
> ssh root@c01 "service ceph-mon@a restart"
> ...
>
> ssh root@c01 "s

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Sage Weil
On Tue, Jun 22, 2021 at 11:58 AM Martin Verges  wrote:
>
> > There is no "should be", there is no one answer to that, other than 42.
> Containers have been there before Docker, but Docker made them popular,
> exactly for the same reason as why Ceph wants to use them: ship a known
> good version (CI tests) of the software with all dependencies, that can be
> run "as is" on any supported platform.
>
> So ship it tested for container software XXX and run it on YYY. How will
> that benefit me as a user? There are differences when running a docker
> container, lxc, nspawn, podman, kubernetes and whatever. So you trade error
> A for error B. There are even problems with containers if you don't use
> version X from docker. That's what the past told us, why should it be
> better in the future with even more container environments. Have you tried
> running rancher on debian in the past? It breaks apart due to iptables or
> other stuff.

Rook is based on kubernetes, and cephadm on podman or docker.  These
are well-defined runtimes.  Yes, some have bugs, but our experience so
far has been a big improvement over the complexity of managing package
dependencies across even just a handful of distros.  (Podman has been
the only real culprit here, tbh, but I give them a partial pass as the
tool is relatively new.)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Sage Weil
On Tue, Jun 22, 2021 at 1:25 PM Stefan Kooman  wrote:
> On 6/21/21 6:19 PM, Nico Schottelius wrote:
> > And while we are at claiming "on a lot more platforms", you are at the
> > same time EXCLUDING a lot of platforms by saying "Linux based
> > container" (remember Ceph on FreeBSD? [0]).
>
> Indeed, and that is a more fundamental question: how easy it is to make
> Ceph a first-class citizen on non linux platforms. Was that ever a
> (design) goal? But then again, if you would be able to port docker
> natively to say OpenBSD, you should be able to run Ceph on it as well.

Thank you for bringing this up.  This is in fact a key reason why the
orchestration abstraction works the way it does--to allow other
runtime environments to be supported (FreeBSD!
sysvinit/Devuan/whatever for systemd haters!) while ALSO allowing an
integrated, user-friendly experience in which users workflow for
adding/removing hosts, replacing failed OSDs, managing services (MDSs,
RGWs, load balancers, etc) can be consistent across all platforms.
For 10+ years we basically said "out of scope" to these pesky
deployment details and left this job to Puppet, Chef, Ansible,
ceph-deploy, rook, etc., but the result of that strategy was pretty
clear: ceph was hard to use and the user experience dismal when
compared to an integrated product from any half-decent enterprise
storage company, or products like Martin's that capitalize on core
ceph's bad UX.

The question isn't whether we support other environments, but how.  As
I mentioned in one of my first messages, we can either (1) generalize
cephadm to work in other environments (break the current
systemd+container requirement), or (2) add another orchestrator
backend that supports a new environment.  I don't have any well-formed
opinion here.  There is a lot of pretty generic "orchestration" logic
in cephadm right now that isn't related to systemd or containers that
could either be pulled out of cephadm into the mgr/ochestrator layer
or a library.  Or an independent, fresh orch backend implementation
could opt for a very different approach or set of opinions.

Either way, my assumption has been that these other environments would
probably not be docker|podman-based.  In the case of FreeBSD we'd
probably want to use jails or whatever.  But anything is possible.

s
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] PG inconsistent+failed_repair

2021-06-24 Thread Vladimir Prokofev
Hello.

Today we've experienced a complete CEPH cluster outage - total loss of
power in the whole infrastructure.
6 osd nodes and 3 monitors went down at the same time. CEPH 14.2.10

This resulted in unfound objects, which were "reverted" in a hurry with
ceph pg  mark_unfound_lost revert
In retrospect that was probably a mistake as the "have" part stated 0'0.

But then deep-scrubs started and they found inconsistent PGs. We tried
repairing them, but they just switched to failed_repair.

Here's a log example:
2021-06-25 00:08:07.693645 osd.0 [ERR] 3.c shard 6
3:3163e703:::rbd_data.be08c566ef438d.2445:head : missing
2021-06-25 00:08:07.693710 osd.0 [ERR] repair 3.c
3:3163e2ee:::rbd_data.efa86358d15f4a.004b:6ab1 : is an
unexpected clone
2021-06-25 00:11:55.128951 osd.0 [ERR] 3.c repair 1 missing, 0 inconsistent
objects
2021-06-25 00:11:55.128969 osd.0 [ERR] 3.c repair 2 errors, 1 fixed

I tried manually deleting conflicting objects from secondary osds
with ceph-objectstore-tool like this
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-22 --pgid 3.c
rbd_data.efa86358d15f4a.004b:6ab1 remove
it removes it but without any positive impact. Pretty sure I don't
understand the concept.

So currently I have the following thoughts:
 - is there any doc on the object placement specifics and what all of those
numbers in their names mean? I've seen objects with similar prefix/mid but
different suffixes and I have no idea what they mean;
 - I'm actually not sure what the production impact is at this point
because everything seems to work so far. So I'm wondering if it's possible
to kill the replicas on the secondary OSDs with ceph-objectstore-tool and just let
CEPH create a replica from the primary PG?

I have 8 scrub errors and 4 inconsistent+failed_repair PGs, and I'm afraid
that further deep scrubs will reveal more errors.
Any thoughts appreciated.
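
One more thing that might help pin down the scrub errors (a sketch, using the 
PG id from the log above):

rados list-inconsistent-obj 3.c --format=json-pretty
rados list-inconsistent-snapset 3.c --format=json-pretty

These list, per object, which shards are missing and which clones are 
unexpected, without having to poke at the OSD stores directly.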
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph fs mv does copy, not move

2021-06-24 Thread Stefan Kooman

On 6/24/21 5:34 PM, Frank Schilder wrote:


Please, in such situations where developers seem to have to make a definite 
choice, consider the possibility of offering operators to choose the 
alternative that suits their use case best. Adding further options seems far 
better than limiting functionality in a way that becomes a terrible burden in 
certain, if not many use cases.


Yeah, I agree.


In ceph fs there have been many such decisions that allow for different answers 
from a user/operator perspective. For example, I would prefer if I could get 
rid of the attempted higher POSIX compliance level of ceph fs compared with 
Lustre, just disable all the client-caps and cache-coherence management and 
turn it into an awesome scale-out parallel file system. The attempt of POSIX 
compliant handling of simultaneous writes to files offers nothing to us, but 
costs huge in performance and forces users to move away from perfectly 
reasonable HPC work flows. Also, that it takes a TTL to expire before changes 
on one client become visible on another (unless direct_io is used for all IO) 
is perfectly acceptable for us given the potential performance gain due to 
simpler client-MDS communication.


Isn't that where LazyIO is for? See 
https://docs.ceph.com/en/latest/cephfs/lazyio/


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent+failed_repair

2021-06-24 Thread Vladimir Prokofev
Followup. This is what's written in logs when I try to fix one PG:
ceph pg repair 3.60

primary osd log:
2021-06-25 01:07:32.146 7fc006339700 -1 log_channel(cluster) log [ERR] :
repair 3.53 3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 : is
an unexpected clone
2021-06-25 01:07:32.146 7fc006339700 -1 osd.6 pg_epoch: 210926 pg[3.53( v
210926'64271902 (210920'64268839,210926'64271902]
local-lis/les=210882/210883 n=6046 ec=56/56 lis/c 210882/210882 les/c/f
210883/210883/5620 210811/210882/210882) [6,22,12] r=0 lpr=210882
luod=210926'64271899 crt=210926'64271902 lcod 210926'64271898 mlcod
210926'64271898 active+clean+scrubbing+deep+inconsistent+repair]
_scan_snaps no clone_snaps for
3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 in 6aa5=[6aa5]:{}

secondary osd 1:
2021-06-25 01:07:31.934 7f9eae8fa700 -1 osd.22 pg_epoch: 210926 pg[3.53( v
210926'64271899 (210920'64268839,210926'64271899]
local-lis/les=210882/210883 n=6046 ec=56/56 lis/c 210882/210882 les/c/f
210883/210883/5620 210811/210882/210882) [6,22,12] r=1 lpr=210882 luod=0'0
lua=210881'64265352 crt=210926'64271899 lcod 210926'64271898
active+inconsistent mbc={}] _scan_snaps no clone_snaps for
3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 in 6aa5=[6aa5]:{}

secondary osd 2:
2021-06-25 01:07:30.828 7f94d6e61700 -1 osd.12 pg_epoch: 210926 pg[3.53( v
210926'64271899 (210920'64268839,210926'64271899]
local-lis/les=210882/210883 n=6046 ec=56/56 lis/c 210882/210882 les/c/f
210883/210883/5620 210811/210882/210882) [6,22,12] r=2 lpr=210882 luod=0'0
lua=210881'64265352 crt=210926'64271899 lcod 210926'64271898
active+inconsistent mbc={}] _scan_snaps no clone_snaps for
3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 in 6aa5=[6aa5]:{}

And nothing happens, it's still in a failed_repair state.

On Fri, Jun 25, 2021 at 00:36, Vladimir Prokofev :

> Hello.
>
> Today we've experienced a complete CEPH cluster outage - total loss of
> power in the whole infrastructure.
> 6 osd nodes and 3 monitors went down at the same time. CEPH 14.2.10
>
> This resulted in unfound objects, which were "reverted" in a hurry with
> ceph pg  mark_unfound_lost revert
> In retrospect that was probably a mistake as the "have" part stated 0'0.
>
> But then deep-scrubs started and they found inconsistent PGs. We tried
> repairing them, but they just switched to failed_repair.
>
> Here's a log example:
> 2021-06-25 00:08:07.693645 osd.0 [ERR] 3.c shard 6
> 3:3163e703:::rbd_data.be08c566ef438d.2445:head : missing
> 2021-06-25 00:08:07.693710 osd.0 [ERR] repair 3.c
> 3:3163e2ee:::rbd_data.efa86358d15f4a.004b:6ab1 : is an
> unexpected clone
> 2021-06-25 00:11:55.128951 osd.0 [ERR] 3.c repair 1 missing, 0
> inconsistent objects
> 2021-06-25 00:11:55.128969 osd.0 [ERR] 3.c repair 2 errors, 1 fixed
>
> I tried manually deleting conflicting objects from secondary osds
> with ceph-objectstore-tool like this
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-22 --pgid 3.c
> rbd_data.efa86358d15f4a.004b:6ab1 remove
> it removes it but without any positive impact. Pretty sure I don't
> understand the concept.
>
> So currently I have the following thoughts:
>  - is there any doc on the object placement specifics and what all of
> those numbers in their name mean? I've seen objects with similar prefix/mid
> but different suffix and I have no idea what does it mean;
>  - I'm actually not sure what the production impact is at that point
> because everything seems to work so far. So I'm thinking if it's possible
> to kill replicas on secondary OSDd with ceph-objectstore-tool and just let
> CEPH create a replica from primary PG?
>
> I have 8 scrub errors and 4 inconsistent+failed_repair PGs, and I'm afraid
> that further deep scrubs will reveal more errors.
> Any thoughts appreciated.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph fs mv does copy, not move

2021-06-24 Thread Frank Schilder
Hi Stefan,

> Isn't that where LazyIO is for? See ...

Yes, it is, to some extent. However, there are many large HPC applications that 
will not start using exotic libraries for IO. A parallel file system offers 
everything that is needed with standard OS library calls. This is better solved 
on the FS side than on the client side. We put the link to lazy IO in our cluster 
documentation over a year ago, but I cannot imagine any of our users starting 
to invest in porting massive applications even though we have ceph. So far, nobody 
has.

It's also that HPC uses MPI, which comes with IO libraries users don't have 
influence on. I don't see this becoming a relevant alternative to a parallel 
file system any time soon. Sorry.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Stefan Kooman 
Sent: 24 June 2021 20:01:16
To: Frank Schilder; Patrick Donnelly
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph fs mv does copy, not move

On 6/24/21 5:34 PM, Frank Schilder wrote:

> Please, in such situations where developers seem to have to make a definite 
> choice, consider the possibility of offering operators to choose the 
> alternative that suits their use case best. Adding further options seems far 
> better than limiting functionality in a way that becomes a terrible burden in 
> certain, if not many use cases.

Yeah, I agree.
>
> In ceph fs there have been many such decisions that allow for different 
> answers from a user/operator perspective. For example, I would prefer if I 
> could get rid of the attempted higher POSIX compliance level of ceph fs 
> compared with Lustre, just disable all the client-caps and cache-coherence 
> management and turn it into an awesome scale-out parallel file system. The 
> attempt of POSIX compliant handling of simultaneous writes to files offers 
> nothing to us, but costs huge in performance and forces users to move away 
> from perfectly reasonable HPC work flows. Also, that it takes a TTL to expire 
> before changes on one client become visible on another (unless direct_io is 
> used for all IO) is perfectly acceptable for us given the potential 
> performance gain due to simpler client-MDS communication.

Isn't that where LazyIO is for? See
https://docs.ceph.com/en/latest/cephfs/lazyio/

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Fox, Kevin M
I've actually had rook-ceph not proceed with something that I would have 
continued on with. Turns out I was wrong and it was right. Its checking was 
more thorough than mine. Thought that was pretty cool. It eventually cleared 
itself and finished up.

For a large ceph cluster, the orchestration is very nice.

Thanks,
Kevin


From: Sage Weil 
Sent: Thursday, June 24, 2021 1:46 PM
To: Marc
Cc: Anthony D'Atri; Nico Schottelius; Matthew Vernon; ceph-users@ceph.io
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph 
deployments

Check twice before you click! This email originated from outside PNNL.


On Sun, Jun 20, 2021 at 9:51 AM Marc  wrote:
> Remarks about your cephadm approach/design:
>
> 1. I am not interested in learning podman, rook or kubernetes. I am using 
> mesos which is also on my osd nodes to use the extra available memory and 
> cores. Furthermore your cephadm OC is limited to only ceph nodes. While my 
> mesos OC is spread across a larger cluster and has rules when, and when not 
> to run tasks on the osd nodes. You incorrectly assume that rgw, grafana, 
> prometheus, haproxy are going to be ran on your ceph OC.

rgw, grafana, prom, haproxy, etc are all optional components.  The
monitoring stack is deployed by default but is trivially disabled via
a flag to the bootstrap command.  We are well aware that not everyone
wants these, but we cannot ignore the vast majority of users that
wants things to Just Work without figuring out how to properly deploy
and manage all of these extraneous integrated components.

> 2. Nico pointed out that you do not have alpine linux container images. I did 
> not even know you were using container images. So how big are these? Where 
> are these stored. And why are these not as small as they can be? Such an osd 
> container image should be 20MB or so at most. I would even expect statically 
> build binary container image, why even a tiny os?
> 4. Ok found the container images[2] (I think). Sorry but this has ‘nothing’ 
> to do with container thinking. I expected to find container images for osd, 
> msd, rgw separately and smaller. This looks more like an OS deployment.
Early on the team building the container images opted for a single
image that includes all of the daemons for simplicity.  We could build
stripped down images for each daemon type, but that's an investment in
developer time and complexity and we haven't heard any complaints
about the container size.  (Usually a few hundred MB on a large scale
storage server isn't a problem.)

> 3. Why is in this cephadm still being talked about systemd? Your orchestrator 
> should handle restarts,namespaces and failed tasks not? There should be no 
> need to have a systemd dependency, at least I have not seen any container 
> images relying on this.

Something needs to start the ceph daemon containers when the system
reboots.  We integrated with systemd since all major distros adopted
it.  Cephadm could be extended to support other init systems with
pretty minimal effort... we aren't doing anything fancy with systemd.

> 5. I have been writing this previously on the mailing list here. Is each rgw 
> still requiring its own dedicated client id? Is it still true, that if you 
> want to spawn 3 rgw instances, they need to authorize like client.rgw1, 
> client.rgw2 and client.rgw3?
> This does not allow for auto scaling. The idea of using an OC is that you 
> launch a task, and that you can scale this task automatically when necessary. 
> So you would get multiple instances of rgw1. If this is still and issue with 
> rgw, mds and mgr etc. Why even bother doing something with an OC and 
> containers?

The orchestrator automates the creation and cleanup of credentials for
each rgw instance.  (It also trivially scales them up/down, ala k8s.)
If you have an autoscaler, you just need to tell cephadm how many you
want and it will add/remove daemons.  If you are using cephadm's
ingress (haproxy) capability, the LB configuration will be adjusted
for you.  If you are using an external LB, you can query cephadm for a
description of the current daemons and their endpoints and feed that
info into your own ingress solution.

> 6. As I wrote before I do not want my rgw or haproxy running in a OC that has 
> the ability to give tasks capability SYSADMIN. So that would mean I have to 
> run my osd daemons/containers separately.

Only the OSD containers get extra caps to deal with the storage hardware.

> 7. If you are not setting cpu and memory limits on your cephadm containers, 
> then again there is an argument why even use containers.

Memory limits are partially implemented; we haven't gotten to CPU
limits yet.  It's on the list!

> 8. I still see lots of comments on the mailing list about accessing logs. I 
> have all my containers log to a remote syslog server, if you still have your 
> ceph daemons that can not do this (correctly). What point is it even going to 

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Fox, Kevin M
I bumped into this recently:
https://samuel.karp.dev/blog/2021/05/running-freebsd-jails-with-containerd-1-5/

:)

Kevin


From: Sage Weil 
Sent: Thursday, June 24, 2021 2:06 PM
To: Stefan Kooman
Cc: Nico Schottelius; Kai Börnert; Marc; ceph-users
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph 
deployments

Check twice before you click! This email originated from outside PNNL.


On Tue, Jun 22, 2021 at 1:25 PM Stefan Kooman  wrote:
> On 6/21/21 6:19 PM, Nico Schottelius wrote:
> > And while we are at claiming "on a lot more platforms", you are at the
> > same time EXCLUDING a lot of platforms by saying "Linux based
> > container" (remember Ceph on FreeBSD? [0]).
>
> Indeed, and that is a more fundamental question: how easy it is to make
> Ceph a first-class citizen on non linux platforms. Was that ever a
> (design) goal? But then again, if you would be able to port docker
> natively to say OpenBSD, you should be able to run Ceph on it as well.

Thank you for bringing this up.  This is in fact a key reason why the
orchestration abstraction works the way it does--to allow other
runtime environments to be supported (FreeBSD!
sysvinit/Devuan/whatever for systemd haters!) while ALSO allowing an
integrated, user-friendly experience in which users workflow for
adding/removing hosts, replacing failed OSDs, managing services (MDSs,
RGWs, load balancers, etc) can be consistent across all platforms.
For 10+ years we basically said "out of scope" to these pesky
deployment details and left this job to Puppet, Chef, Ansible,
ceph-deploy, rook, etc., but the result of that strategy was pretty
clear: ceph was hard to use and the user experience dismal when
compared to an integrated product from any half-decent enterprise
storage company, or products like Martin's that capitalize on core
ceph's bad UX.

The question isn't whether we support other environments, but how.  As
I mentioned in one of my first messages, we can either (1) generalize
cephadm to work in other environments (break the current
systemd+container requirement), or (2) add another orchestrator
backend that supports a new environment.  I don't have any well-formed
opinion here.  There is a lot of pretty generic "orchestration" logic
in cephadm right now that isn't related to systemd or containers that
could either be pulled out of cephadm into the mgr/ochestrator layer
or a library.  Or an independent, fresh orch backend implementation
could opt for a very different approach or set of opinions.

Either way, my assumption has been that these other environments would
probably not be docker|podman-based.  In the case of FreeBSD we'd
probably want to use jails or whatever.  But anything is possible.

s
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io