[ceph-users] NIH Datasets

2025-04-06 Thread Linas Vepstas
OK, what you will read below might sound insane, but I am obliged to ask.

There are 275 petabytes of NIH data at risk of being deleted. Cancer
research, medical data, HIPAA type stuff. Currently unclear where it's
located, how it's managed, who has access to what, but let's ignore
that for now. It's presumably splattered across data centers, cloud,
AWS, supercomputing labs, who knows. Everywhere.

I'm talking to a biomed person in Australia who uses NCBI data
daily, she's in talks w/ Australian govt to copy and preserve the
datasets they use. Some multi-petabytes of stuff. I don't know.

While bouncing around tech ideas, IPFS and Ceph came up. My experience
with IPFS is that it's not a serious contender for anything. My
experience with Ceph is that it's more-or-less A-list.

OK. So here's the question: is it possible to (has anyone tried) set
up an internet-wide Ceph cluster? Ticking off the typical checkboxes
for "decentralized storage"? Stuff, like: internet connections need to
be encrypted. Connections go down, come back up. Slow. Sure, national
labs may have multi-terabit fiber, but little itty-bitty participants
trying to contribute a small collection of disks to a large pool might
only have a gigabit connection, of which maybe 10% is "usable".
Barely. So, a hostile networking environment.
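
(On the encryption checkbox at least, my impression is that Ceph's
msgr2 protocol can already encrypt traffic on the wire; something like
this in ceph.conf, if I'm reading the docs right -- just a sketch, I
have no idea how it holds up over a WAN:)

[global]
        ms_cluster_mode = secure
        ms_service_mode = secure
        ms_client_mode  = secure
        # mons would need to listen on the msgr2 port, e.g.
        # mon_host = [v2:198.51.100.10:3300]   # example address only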

Is this, like, a totally insane, run-away-now, can't-do-that,
it-won't-work idea, or is there some glimmer of hope?

Am I misunderstanding something about IPFS that merits taking a second
look at it?

Is there any other way of getting scalable reliable "decentralized"
internet-wide storage?

I mean, yes, of course, the conventional answer is that it could be
copied to AWS or some national lab or two somewhere in the EU or Aus
or the UK or wherever. That's the "obvious" answer. I'm looking for a
non-obvious answer, an IPFS-like thing, but one that actually works.
Could it work?

-- Linas


-- 
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-06 Thread Jeremy Hansen
Snipped some of the irrelevant logs to keep message size down.

ceph config-key get mgr/cephadm/upgrade_state

{"target_name": "quay.io/ceph/ceph:v17.2.0", "progress_id": 
"e7e1a809-558d-43a7-842a-c6229fdc57af", "target_id": 
"e1d6a67b021eb077ee22bf650f1a9fb1980a2cf5c36bdb9cba9eac6de8f702d9", 
"target_digests": 
["quay.io/ceph/ceph@sha256:12a0a4f43413fd97a14a3d47a3451b2d2df50020835bb93db666209f3f77617a",
 
"quay.io/ceph/ceph@sha256:cb4d698cb769b6aba05bf6ef04f41a7fe694160140347576e13bd9348514b667"],
 "target_version": "17.2.0", "fs_original_max_mds": null, 
"fs_original_allow_standby_replay": null, "error": null, "paused": false, 
"daemon_types": null, "hosts": null, "services": null, "total_count": null, 
"remaining_count": null}

What should I do next?
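
If it comes down to clearing that key and failing over the mgr, I
assume it would be something like the following (an untested guess on
my part, so I'd rather confirm before running anything):

ceph config-key rm mgr/cephadm/upgrade_state  # clear the stale upgrade state
ceph mgr fail                                 # fail over the active mgr so cephadm reloads cleanly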

Thank you!
-jeremy

> On Sunday, Apr 06, 2025 at 1:38 AM, Eugen Block (mailto:ebl...@nde.ag) wrote:
> Can you check if you have this config-key?
>
> ceph config-key get mgr/cephadm/upgrade_state
>
> If you reset the MGRs, it might be necessary to clear this key;
> otherwise you might end up in some inconsistency. Just to be sure.
>
> Quoting Jeremy Hansen:
>
> > Thanks. I’m trying to be extra careful since this cluster is
> > actually in use. I’ll wait for your feedback.
> >
> > -jeremy
> >
> > > > On Saturday, Apr 05, 2025 at 3:39 PM, Eugen Block (mailto:ebl...@nde.ag) wrote:
> > > No, that's not necessary, just edit the unit.run file for the MGRs to
> > > use a different image. See Frédéric's instructions:
> > >
> > > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/32APKOXKRAIZ7IDCNI25KVYFCCCF6RJG/
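> > >
> > > Roughly like this, I think (untested here; paths and names are just
> > > examples, adjust the fsid and daemon name to your cluster):
> > >
> > > # find the image reference in the mgr's unit.run
> > > grep 'quay.io/ceph/ceph' /var/lib/ceph/<fsid>/mgr.<name>/unit.run
> > > # change the tag there (v17.2.0 -> v16.2.15), then restart the daemon:
> > > systemctl restart ceph-<fsid>@mgr.<name>.service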
> > >
> > > But I'm not entirely sure if you need to clear some config-keys first
> > > in order to reset the upgrade state. If I have time, I'll try to check
> > > tomorrow, or on Monday.
> > >
> > > Quoting Jeremy Hansen:
> > >
> > > > Would I follow this process to downgrade?
> > > >
> > > >
> > > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-manager-daemon
> > > >
> > > > Thank you
> > > >
> > > > > On Saturday, Apr 05, 2025 at 2:04 PM, Jeremy Hansen (mailto:jer...@skidrow.la) wrote:
> > > > > ceph -s claims things are healthy:
> > > > >
> > > > > ceph -s
> > > > >   cluster:
> > > > >     id:     95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1
> > > > >     health: HEALTH_OK
> > > > >
> > > > >   services:
> > > > >     mon: 3 daemons, quorum cn01,cn03,cn02 (age 20h)
> > > > >     mgr: cn03.negzvb(active, since 26m), standbys: cn01.tjmtph, cn02.ceph.xyz.corp.ggixgj
> > > > >     mds: 1/1 daemons up, 2 standby
> > > > >     osd: 15 osds: 15 up (since 19h), 15 in (since 14M)
> > > > >
> > > > >   data:
> > > > >     volumes: 1/1 healthy
> > > > >     pools:   6 pools, 610 pgs
> > > > >     objects: 284.59k objects, 1.1 TiB
> > > > >     usage:   3.3 TiB used, 106 TiB / 109 TiB avail
> > > > >     pgs:     610 active+clean
> > > > >
> > > > >   io:
> > > > >     client: 255 B/s rd, 1.2 MiB/s wr, 10 op/s rd, 16 op/s wr
> > > > >
> > > > >
> > > > >
> > > > > —
> > > > > How do I downgrade if the orch is down?
> > > > >
> > > > > Thank you
> > > > > -jeremy
> > > > >
> > > > >
> > > > >
> > > > > > On Saturday, Apr 05, 2025 at 1:56 PM, Eugen Block (mailto:ebl...@nde.ag) wrote:
> > > > > > It would help if you only pasted the relevant parts. Anyway, these two
> > > > > > sections stand out:
> > > > > >
> > > > > > ---snip---
> > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: debug 2025-04-05T20:33:48.909+ 7f26f0200700 0 [balancer INFO root] Some PGs (1.00) are unknown; try again later
> > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: debug 2025-04-05T20:33:48.917+ 7f2663400700 -1 mgr load Failed to construct class in 'cephadm'
> > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: debug 2025-04-05T20:33:48.917+ 7f2663400700 -1 mgr load Traceback (most recent call last):
> > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: File "/usr/share/ceph/mgr/cephadm/module.py", line 470, in __init__
> > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: self.upgrade = CephadmUpgrade(self)
> > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 112, in __init__
> > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: self.upgrade_state: Optional[UpgradeState] = UpgradeState.from_json(json.loads(t))
> > > > > > Apr 05 2