I'm curious about this too. There's a bunch of difficult-to-maintain code in our codebase relating to HDFS, and a lot of the HDFS tests are super flaky. I've had the impression it was mostly added because there was a point at which "HDFS all the things" was a fad. I haven't personally ever seen it in actual use at a customer. I have a suspicion it's mostly attractive to places that made a big HDFS investment.

Not having heard any success stories, if a customer asked, I'd advise them: "Don't do it unless someone is forcing you onto HDFS. I don't know anyone using it, and the tests fail frequently, so it may be buggy." I don't think I've ever gone to http://fucit.org/solr-jenkins-reports/failure-report.html and not seen somewhere between 1/4 and 1/2 of the tests with recent failures have HDFS in the name...

On Sat, Aug 31, 2024 at 3:04 AM ufuk yılmaz <uyil...@vivaldi.net.invalid> wrote:

> Hi,
>
> It is possible to put Solr index on hdfs instead of a regular disk, but I
> wonder if there is a significant upside of that approach?
>
> Is it to take advantage of hdfs’ replication to protect data from disk
> failures?
>
> Is it mostly for the situation “I already have a functioning hdfs cluster,
> lets just reuse that instead of dealing with local disks” or is there still
> an upside of setting up hdfs just to use with Solr?
>
> —ufuk
>
> —
>

--
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)