Hi Alex,
"Cost concerns" is the fig leaf that is being used in many cases, but
often a closer look indicates political motivations.
The current US administration is actively engaged in the destruction of
anything that would conflict with their view of the world. That includes
health practices - especially regarding vaccination, climate data, the
role of women and non-white people in history, and whatever else offends
their fragile minds. For example, here in the Free State of Florida, the
governor has been promoting the idea that slavery was not a bad thing
because it gave forcibly-imported black people "useful job skills",
textbooks must now refer to the "Gulf of America", fluoridation of water
is a Bad Thing, and much more.
Famous non-white people are being scrubbed from military websites and
even national park webpages - and non-white, non-male people are being fired
from top-level government/military positions. Some are even joking that
Harriet Tubman should be reclassified as a "human trafficker" (this is in
reference to the Underground Railroad).
Which is why I don't think we should stop at just saving NIH data.
Virtually all government-controlled data is at risk.
And by the way, DOGE has just bragged that they saved the US government
a whole million dollars by getting rid of records on magnetic tape
(where they put the data afterwards wasn't said). So forget magtape
archives inside the government itself.
The science-fiction novel "A Canticle for Leibowitz" by Walter M. Miller
outlines a post-nuclear future where the survivors rebel against
knowledge, proudly bragging of being "simpletons" and burning books (and
I think also educated people). Their USA counterpart is MAGA, who
inherited a long history of "I don't need no librul education, I got's
comun since!". Or, as Isaac Asimov put it, the idea that "my ignorance
is just as good as your knowledge". This is not a concept unique to the
USA, but the monkeys are firmly in charge of the zoo at this point so
protecting everything we can is really important.
Tim
On 4/8/25 09:28, Alex Gorbachev wrote:
Hi Linas,
Is the purging of this data motivated mainly by cost concerns? If
the goal is purely preservation of data, the likely cheapest and least
maintenance intensive way of doing this is a large scale tape archive.
Such archives (purely based on a Google search) exist at LLNL and OU, and
there is a TAPAS service from SpectraLogic.
I would imagine questions would arise about custody of the data, legal
implications etc. The easiest is for the organization already hosting the
data to just preserve it by archiving, and thereby claim a significant cost
reduction.
--
Alex Gorbachev
On Sun, Apr 6, 2025 at 11:08 PM Linas Vepstas <linasveps...@gmail.com>
wrote:
OK what you will read below might sound insane but I am obliged to ask.
There are 275 petabytes of NIH data at risk of being deleted. Cancer
research, medical data, HIPAA type stuff. Currently unclear where it's
located, how it's managed, who has access to what, but let's ignore
that for now. It's presumably splattered across data centers, cloud,
AWS, supercomputing labs, who knows. Everywhere.
I'm talking to a biomed person in Australia who uses NCBI data
daily, she's in talks w/ Australian govt to copy and preserve the
datasets they use. Some multi-petabytes of stuff. I don't know.
While bouncing around tech ideas, IPFS and Ceph came up. My experience
with IPFS is that it's not a serious contender for anything. My
experience with Ceph is that it's more-or-less A-list.
OK. So here's the question: is it possible to (has anyone tried) set
up an internet-wide Ceph cluster? Ticking off the typical checkboxes
for "decentralized storage"? Stuff like: internet connections need to
be encrypted. Connections go down, come back up. Slow. Sure, national
labs may have multi-terabit fiber, but little itty-bitty participants
trying to contribute a small collection of disks to a large pool might
only have a gigabit connection, of which maybe 10% is "usable".
Barely. So, a hostile networking environment.
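
To put rough numbers on that: a minimal back-of-envelope sketch, assuming
decimal petabytes, a 1 Gbit/s link of which 10% is usable (the figures
above), and perfect parallelism across participants. The function names
and the participant counts are made up for illustration. It ignores
replication, protocol overhead, and links dropping out, so real numbers
would be worse.

```python
# Rough ingest-time estimate for small participants on slow links.
# Assumptions (not measurements): SI units, 10% of a 1 Gbit/s link usable,
# every participant pulls a disjoint share of the data in parallel.

PETABYTE = 10**15          # bytes in one decimal petabyte
SECONDS_PER_DAY = 86400


def usable_bytes_per_second(link_bits_per_second, usable_fraction):
    """Convert a nominal link speed in bits/s to usable bytes/s."""
    return link_bits_per_second * usable_fraction / 8


def days_to_transfer(total_bytes, participants,
                     link_bits_per_second=10**9, usable_fraction=0.10):
    """Days needed to move total_bytes once, split evenly over participants."""
    per_node = usable_bytes_per_second(link_bits_per_second, usable_fraction)
    aggregate = participants * per_node
    return total_bytes / aggregate / SECONDS_PER_DAY


if __name__ == "__main__":
    total = 275 * PETABYTE  # the NIH figure quoted in this thread
    for n in (1, 100, 1000, 10000):  # hypothetical participant counts
        print(f"{n:>6} participants: {days_to_transfer(total, n):,.0f} days")
```

Under these assumptions a single such link needs roughly two and a half
years per petabyte, so the pool only becomes practical with a very large
number of small participants or a few very fat pipes.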
Is this like, totally insane, run away now, can't do that, it won't
work idea, or is there some glimmer of hope?
Am I misunderstanding something about IPFS that merits taking a second
look at it?
Is there any other way of getting scalable reliable "decentralized"
internet-wide storage?
I mean, yes, of course, the conventional answer is that it could be
copied to AWS or some national lab or two somewhere in the EU or Aus
or UK or wherever. That's the "obvious" answer. I'm looking for a
non-obvious answer, an IPFS-like thing, but one that actually works.
Could it work?
-- Linas
--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io