There are likely simpler answers if you want to tier entire buckets, but
it sounds like you are hosting one or more filesystems on NetApp and
want to tier those. It would be nice to have NetApp running Ceph as a
block store, but I don't think CRUSH is sophisticated enough to migrate
parts of a filesystem pool based on the age of the files and directories
in it. For one thing, I'm not sure the PGs in the pool can (or should)
be aware of such details, and you could easily end up with fragments of
a file spread across different PGs while much of each PG holds un-aged
data. So I'm not optimistic about that concept.
What that suggests to me is that you might use an overlay filesystem,
where the different tiers are layered over one another to present a
unified filesystem image. This is precisely what containers do, although
much of their goal is simply optimising shared image layers. A variation
on this is Copy-on-Write (COW), but what you want is more like the
reverse: instead of copying changed data up into the writable layer,
you'd push aged data down into the lower one.
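To make the shape of that concrete, here's a hypothetical overlay mount
driven from Python, with the NetApp mount as the writable upper layer
and a CephFS mount as the lower layer. All paths are made up, and note
that Linux overlayfs is picky about what it accepts as an upper layer,
so an NFS-mounted NetApp share may not qualify as-is:

import subprocess

# Assumed mount points; overlayfs requires workdir to live on the
# same filesystem as upperdir.
subprocess.run([
    "mount", "-t", "overlay", "overlay",
    "-o", "lowerdir=/mnt/cephfs/archive,"
          "upperdir=/mnt/netapp/data,"
          "workdir=/mnt/netapp/.overlay-work",
    "/mnt/unified",
], check=True)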
At any rate, a frontend overlay filesystem, with NetApp overlaying a
secondary Ceph system, seems like a plausible solution. Then all you'd
need is a mechanism to move aged-out files down into the Ceph layer.
That might even be a good use of rsync.
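Here's a minimal sketch of such a mover in Python; the paths, the age
threshold, and the use of a symlink as the left-behind stub are all
assumptions for illustration, not a tested recipe:

#!/usr/bin/env python3
# Walk the NetApp mount, relocate files whose atime is older than the
# cutoff to the Ceph-backed mount, and leave a symlink behind as the
# stub so the directory structure stays intact.
import os
import shutil
import time

NETAPP_ROOT = "/mnt/netapp/data"   # fast tier (assumed mount point)
CEPH_ROOT = "/mnt/cephfs/archive"  # slow tier (assumed mount point)
MAX_AGE = 2 * 365 * 24 * 3600      # "older than X years"; here X = 2

cutoff = time.time() - MAX_AGE

for dirpath, dirnames, filenames in os.walk(NETAPP_ROOT):
    for name in filenames:
        src = os.path.join(dirpath, name)
        if os.path.islink(src):
            continue  # already a stub, skip it
        if os.stat(src).st_atime >= cutoff:
            continue  # still hot, leave it on flash
        dst = os.path.join(CEPH_ROOT, os.path.relpath(src, NETAPP_ROOT))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copy data and metadata to the Ceph tier
        os.remove(src)
        os.symlink(dst, src)    # the stub: transparent recall via the link

A real tool would verify each copy before deleting the original, cope
with files being written while it runs, and log every move. rsync's
--remove-source-files flag gets you part of the way, but it doesn't
leave stubs behind.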
Tim
On 5/4/25 10:20, sacawulu wrote:
Hi all,
We're exploring solutions to offload large volumes of data (on the
order of petabytes) from our NetApp all-flash storage to our more
cost-effective, HDD-based Ceph storage cluster, based on criteria such
as a last access time older than X years.
Ideally, we would like to leave behind a 'stub' or placeholder file on
the NetApp side to preserve the original directory structure and
potentially enable some sort of transparent access or recall if
needed. This kind of setup is commonly supported by solutions like
DataCore/FileFly, but as far as we can tell, FileFly doesn’t support
Ceph as a backend and instead favors its own Swarm object store.
Has anyone here implemented a similar tiering/archive/migration
solution involving NetApp and Ceph?
We’re specifically looking for:
* Enterprise-grade tooling
* Stub file support or similar metadata-preserving offload
* Support and reliability (given the scale, we can’t afford data
loss or inconsistency)
* Either commercial or well-supported open source solutions
Any do’s/don’ts, war stories, or product recommendations would be
greatly appreciated. We’re open to paying for software or services if
it brings us the reliability and integration we need.
Thanks in advance!
MJ
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io