Re: About Using Hadoop in SolrCloud

David Smiley Mon, 27 Feb 2023 15:01:23 -0800

Yes; this was shocking to me at first because the implications are big and
it's almost a secret.  Ideally the ref guide would scream this loudly;
users today care *way* more about S3 than HDFS.  The "HDFS" Solr module
uses the HDFS client API, which has a pluggable back-end, and thus you can
have it talk to S3.  You can search the user list for this, and maybe JIRA
too.  I've briefly dabbled with it (and got stuck on incompatible versions),
but I know others have done this (presumably with earlier versions than the
ones I used at the time).  It's a simple matter of adding the correct JAR
files and some trivial configuration.  The main problem is that such a
home-brew concoction of theoretically compatible things is on your shoulders
to debug/support.  Solr isn't testing its support for this; it will fail for
some versions, as it did for me.  Maybe Solr *should* test/support this.
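
For anyone wondering what "adding the correct JAR files and some trivial
configuration" might look like, here is an untested sketch, not a supported
recipe.  It assumes the Hadoop S3A connector (the hadoop-aws JAR plus a
matching AWS SDK JAR) is on Solr's classpath and that the HDFS directory
factory accepts an s3a:// URI as its home.  The bucket name, the conf
directory path, and the credential values are hypothetical placeholders.

```xml
<!-- solrconfig.xml: point the HDFS directory factory at an S3A URI.
     "my-bucket" and /etc/hadoop/conf are placeholders (assumptions). -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">s3a://my-bucket/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
</directoryFactory>

<!-- core-site.xml, placed in the conf directory named above:
     static S3A credentials, shown only for illustration. -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```

Which Hadoop and AWS SDK versions line up with a given Solr release, and
which lock type behaves sensibly on S3, are the parts most likely to need
experimentation.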


~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Feb 23, 2023 at 10:59 PM Zara Parst <edotserv...@gmail.com> wrote:

> David, you made a point. Is it true we can keep indexes on S3? I mean the
> index in use, not the backup?
>
> On Fri, Feb 24, 2023 at 1:11 AM David Smiley <dsmi...@apache.org> wrote:
>
> > I agree with Eric, but wish to add one point:  Separation of compute from
> > storage to get: better redundancy (HDFS or S3 will do it better, maybe
> > cheaper), better elasticity (since Solr nodes become stateless; easy to
> > add more nodes), better cost?  The trade-off is some indexing performance
> > and a bit of query performance.  Admittedly I don't have real experience
> > here but this is my thinking.  The most annoying thing about Solr's HDFS
> > support is that SolrCloud's replication is quite redundant/wasteful given
> > the replication already done at the storage layer, thus adding cost
> > inefficiency. There is potential for improvements there.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Thu, Feb 23, 2023 at 7:45 AM Eric Pugh <ep...@opensourceconnections.com>
> > wrote:
> >
> > > I am replying, but just to the users mailing list, as it’s not
> > > appropriate for dev@.
> > >
> > > I think the short answer is that if you are already super into the
> > > Hadoop ecosystem, then you already have strong reasons why, and you can
> > > answer all of your questions listed already ;-).  You then look at Solr
> > > on Hadoop as “hey, it works with what I am already doing” at my
> > > enterprise.
> > >
> > > If you aren’t already in the Hadoop ecosystem, then there isn’t any
> > > special Solr specific reason to go this way, and indeed many reasons NOT
> > > to.   Hadoop isn’t for the faint of heart….
> > >
> > > Not an answer per se….
> > >
> > > > On Feb 23, 2023, at 5:57 AM, Zara Parst <edotserv...@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I have read in many places about using Hadoop with SolrCloud, and I am
> > > > trying to find the reason to use Hadoop in place of a local file
> > > > system. Can someone briefly explain why to use Hadoop with SolrCloud
> > > > when Solr is just using Hadoop for indexing and storing logs? Is there
> > > > any compelling reason to do that?
> > > >
> > > > Does Hadoop have any advantage over the local file system with Solr,
> > > > since I can run in cloud mode storing the index in the local file
> > > > system and can still use shards and replicas?  So my question is what
> > > > advantage Hadoop will give me: does Hadoop index faster, does Hadoop
> > > > take less space to store the index, is the distributed file system in
> > > > Hadoop better at things like sharding, replication, etc.? Or does it
> > > > take backups automatically?
> > > >
> > > > Please do answer this question as much as possible,
> > >
> > > _______________________
> > > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> > > http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
> > > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> > > <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> > >
> > >
> > > This e-mail and all contents, including attachments, is considered to be
> > > Company Confidential unless explicitly stated otherwise, regardless of
> > > whether attachments are marked as such.
> > >
> > >
> >
>
