Yes. This shocked me at first, because the implications are big and it's almost a secret. Ideally the ref guide would shout this loudly; users today care *way* more about S3 than HDFS. Solr's "HDFS" module uses the HDFS client API, which has a pluggable back-end, so you can have it talk to S3. You can search the user list for this, and maybe JIRA. I've only briefly dabbled with it (I got stuck on incompatible versions), but I know others have done it, presumably with earlier versions than the ones I used at the time. It's a simple matter of adding the correct JAR files and some trivial configuration.

The main problem is that debugging and supporting such a home-brew concoction of theoretically compatible things is on your shoulders. Solr isn't testing its support for this, so it will fail for some versions, as it did for me. Maybe Solr *should* test/support this.
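
For anyone who wants to try, here is a rough, untested sketch of what the "trivial configuration" might look like in solrconfig.xml. It assumes the Hadoop S3A connector (the hadoop-aws JAR plus a matching AWS SDK) has been added to Solr's classpath next to the HDFS module. The bucket name and both paths are placeholders I made up. Whether a given Solr/Hadoop version pair accepts this is exactly the part nobody is testing.

```xml
<!-- solrconfig.xml: untested sketch; bucket name and paths are hypothetical -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- An s3a:// URI where an hdfs:// one would normally go
       (assumption: the S3A connector JARs are on the classpath) -->
  <str name="solr.hdfs.home">s3a://my-example-bucket/solr</str>
  <!-- Directory containing core-site.xml with the fs.s3a.* settings -->
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
</directoryFactory>
```

Credentials and the endpoint would go into core-site.xml in that conf directory via the usual fs.s3a.* properties. The lock type is left out here and would likely need attention too.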
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Feb 23, 2023 at 10:59 PM Zara Parst <edotserv...@gmail.com> wrote:

> David, you made a point. Is it true we can keep indexes to S3? I mean index
> under use not the backup ?
>
> On Fri, Feb 24, 2023 at 1:11 AM David Smiley <dsmi...@apache.org> wrote:
>
> > I agree with Eric, but wish to add one point: Separation of compute from
> > storage to get: better redundancy (HDFS or S3 will do it better, maybe
> > cheaper), better elasticity (since Solr nodes become stateless; easy to add
> > more nodes), better cost? Sacrifice indexing performance and a bit of
> > query. Admittedly I don't have real experience here but this is my
> > thinking. The most annoying thing about Solr's HDFS support is that
> > SolrCloud's replication is quite redundant/wasteful with that at the
> > storage layer, thus adding cost inefficiency. There is potential for
> > improvements there.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Thu, Feb 23, 2023 at 7:45 AM Eric Pugh <ep...@opensourceconnections.com>
> > wrote:
> >
> > > I am replying, but just to the users mailing list, as it’s not appropriate
> > > for dev@.
> > >
> > > I think the short answer is that if you are already super into the Hadoop
> > > ecosystem, then you already have strong reasons why, and you can answer all
> > > of your questions listed already ;-). You then look at Solr on Hadoop as
> > > “hey, it works with what I am already doing” at my enterprise.
> > >
> > > If you aren’t already in the Hadoop ecosystem, then there isn’t any
> > > special Solr specific reason to go this way, and indeed many reasons NOT
> > > to. Hadoop isn’t for the faint of heart….
> > >
> > > Not an answer per se….
> > >
> > > > On Feb 23, 2023, at 5:57 AM, Zara Parst <edotserv...@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I read at many places about using Hadoop in solrCloud. I try to find the
> > > > reason why to use Hadoop in place of a local file system. Can someone
> > > > briefly explain why to use Hadoop with SolrCloud when solr is just using
> > > > Hadoop for indexing and storing logs in Hadoop. Is there any compelling
> > > > reason to do that?
> > > >
> > > > Is Hadoop having any advantage over the local file system with solr, since
> > > > I can achieve cloud mod storing index in the local file system and can
> > > > still use shard and replica. So my question is what advantage Hadoop will
> > > > give me, does Hadoop do indexing fast, does Hadoop take less space to store
> > > > index, is that distributed file system is better in Hadoop, like sharding,
> > > > replication etc. Or does it take backup automatically?
> > > >
> > > > Please do answer this question as much as possible,
> > >
> > > _______________________
> > > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> > > http://www.opensourceconnections.com <http://www.opensourceconnections.com/> |
> > > My Free/Busy <http://tinyurl.com/eric-cal>
> > > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> > > <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> > >
> > > This e-mail and all contents, including attachments, is considered to be
> > > Company Confidential unless explicitly stated otherwise, regardless of
> > > whether attachments are marked as such.