Hi Alex,

The data purge is political. Data includes research on gun violence,
sexually transmitted disease, you name it. The scientists keeping the
data were keeping it for conventional science reasons. Labs are
typically funded, and their storage costs paid, by "principal
investigators": the big names who get the research grants, determine
the research directions, and hire the junior scientists to do the work
(and who pay rent for the lab and buy the equipment). Since a large
number of principal investigators were fired, and their labs dissolved
and equipment sold off, the data that those labs kept is next.
Some of this data is shared internationally.

I've been talking to an Australian genomics researcher; they're
working with the Australian govt to find funding to purchase the
servers and storage needed to obtain and preserve copies of datasets.
Those conversations reveal to me that it's utter chaos and confusion;
no one knows anything, no one knows how to copy datasets, how much
time they have to do this. A month ago, everything was fine: they used
some random gnu-R or SciPy widgets that query some ncbi.nih.gov
database. Now it's "where is this database and how would I copy it?"
types of questions/conversations. Chaos. These are biologists, not
programmers, not system admins. No visibility into the network
architecture. No idea ***at all***.

Like all things chaotic, it's difficult to discern what the actual
problem is, what the danger is, what the timescale is, who's doing
what.  As to the scale: something like 20K or 50K workers at NIH were
fired. I cannot keep track of the headlines. Headlines include reports
that lab animals, lab mice were just abandoned and have died of
thirst. Was this one lab? A dozen labs? Is this some fake internet
news report? I can't tell. But it's a thermometer for gauging the
rapidity of the shutdown: not even orderly enough to deal with
cleaning out the labs before turning off the lights.

As to tape backup: I am clearly more online than you are, because just
a few days ago, DOGE announced that they "saved" almost $1M (That's M
as in million -- six zeros) of "unnecessary and wasteful expenditures"
by destroying a tape library with 75K tapes in it.  These are the
types of headlines circulating around.

Politically, this is the destruction of health and biology research in
America. Will the Europeans pick up the slack? Who knows. The
short-term tactical issue is about the preservation of datasets. The
long-term strategic issue is designing storage solutions that are
robust against political attack.

The short-term situation is total chaos, and I understand nothing at
all about the status. The long-term strategic issue is one I've been
keeping an eye on for decades, and is what led me to Ceph.

-- Linas

On Tue, Apr 8, 2025 at 8:28 AM Alex Gorbachev <a...@iss-integration.com> wrote:
>
> Hi Linas,
>
> Is the purging of this data driven mainly by cost concerns?  If 
> the goal is purely preservation of data, the likely cheapest and least 
> maintenance intensive way of doing this is a large scale tape archive.  Such 
> archives (purely based on a google search) exist at LLNL and OU, and there is 
> a TAPAS service from SpectraLogic.
>
> I would imagine questions would arise about custody of the data, legal 
> implications etc.  The easiest is for the organization already hosting the 
> data to just preserve it by archiving, and thereby claim a significant cost 
> reduction.
>
> --
> Alex Gorbachev
>
>
>
>
> On Sun, Apr 6, 2025 at 11:08 PM Linas Vepstas <linasveps...@gmail.com> wrote:
>>
>> OK what you will read below might sound insane but I am obliged to ask.
>>
>> There are 275 petabytes of NIH data at risk of being deleted. Cancer
>> research, medical data, HIPAA type stuff. Currently unclear where it's
>> located, how it's managed, who has access to what, but let's ignore
>> that for now. It's presumably splattered across data centers, cloud,
>> AWS, supercomputing labs, who knows. Everywhere.
>>
>> I'm talking to a biomed person in Australia that uses NCBI data
>> daily, she's in talks w/ Australian govt to copy and preserve the
>> datasets they use. Some multi-petabytes of stuff. I don't know.
>>
>> While bouncing around tech ideas, IPFS and Ceph came up. My experience
>> with IPFS is that it's not a serious contender for anything. My
>> experience with Ceph is that it's more-or-less A-list.
>>
>> OK. So here's the question: is it possible to (has anyone tried) set
>> up an internet-wide Ceph cluster? Ticking off the typical checkboxes
>> for "decentralized storage"? Stuff, like: internet connections need to
>> be encrypted. Connections go down, come back up. Slow. Sure, national
>> labs may have multi-terabit fiber, but little itty-bitty participants
>> trying to contribute a small collection of disks to a large pool might
>> only have a gigabit connection, of which maybe 10% is "usable".
>> Barely. So, a hostile networking environment.
>>
>> Is this like, totally insane, run away now, can't do that, it won't
>> work idea, or is there some glimmer of hope?
>>
>> Am I misunderstanding something about IPFS that merits taking a second
>> look at it?
>>
>> Is there any other way of getting scalable reliable "decentralized"
>> internet-wide storage?
>>
>> I mean, yes, of course, the conventional answer is that it could be
>> copied to AWS or some national lab or two somewhere in the EU or Aus
>> or UK or wherever. That's the "obvious" answer. I'm looking for a
>> non-obvious answer, an IPFS-like thing, but one that actually works.
>> Could it work?
>>
>> -- Linas
>>
>>
>> --
>> Patrick: Are they laughing at us?
>> Sponge Bob: No, Patrick, they are laughing next to us.
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
