Re: [EXTERNAL] Re: [EXT] Re: Looking for expertise on comparing Solr search to Postgres full-text search

Davis, Daniel (NIH/NLM) [C] Thu, 17 Mar 2022 12:17:32 -0700

This is really a question of how big the haystack is and what sort of search 
task users are trying to accomplish.


If there is no IDF (a mistake I did *not* make at https://www.indexengines.com/ 
despite using home-grown search, BTW), then the ranking implicitly assumes that 
the documents are of similar size, and it also makes assumptions about the 
linguistics of the corpus (how terms are distributed across it).

In any case, if users are basically doing "Known Item Search", e.g. entering in 
keywords from a title, then PostgreSQL should do OK.
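
To make the IDF point concrete, here is a toy sketch in Python. It is not how 
Postgres's ts_rank or Lucene's similarity is actually computed; the whitespace 
tokenizer, the classic log(N/df) weight, the sample documents, and all function 
names are assumptions made up for illustration. It only shows how a document 
that repeats a common word can outrank the one holding the rare query term when 
IDF is missing.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; real analyzers do much more."""
    return text.lower().split()


def idf(term, docs):
    """Classic log(N / df) weight; 0.0 if the term occurs in no document."""
    df = sum(1 for doc in docs if term in tokenize(doc))
    return math.log(len(docs) / df) if df else 0.0


def score_tf(query, doc):
    """Sum of raw term frequencies of the query terms (no IDF)."""
    counts = Counter(tokenize(doc))
    return sum(counts[term] for term in tokenize(query))


def score_tfidf(query, doc, docs):
    """Same sum, but each term frequency is weighted by the term's IDF."""
    counts = Counter(tokenize(doc))
    return sum(counts[term] * idf(term, docs) for term in tokenize(query))


def best_match(scores):
    """Index of the highest-scoring document."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical four-document corpus: "search" is everywhere, "solr" is rare.
DOCS = [
    "search tips for search boxes and more search",  # repeats the common word
    "solr search basics",                            # holds the rare word
    "database search",
    "site search",
]
QUERY = "solr search"

tf_scores = [score_tf(QUERY, d) for d in DOCS]
tfidf_scores = [score_tfidf(QUERY, d, DOCS) for d in DOCS]

print("TF only best match:", DOCS[best_match(tf_scores)])
print("TF-IDF best match: ", DOCS[best_match(tfidf_scores)])
```

With TF alone, the first document wins on its three hits for "search". With 
IDF, "search" occurs in every document and contributes nothing, so the document 
containing "solr" comes out on top.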

On 3/17/22, 1:34 PM, "Alessandro Benedetti" <a.benede...@sease.io> wrote:

    Ok Charlie, Eric,
    we are on the same page.
    I agree it's definitely possible with some custom proxy work on both Quepid
    and RRE, I meant it's not possible to directly point to the DB (for example
    via JDBC).
    Thanks!

    Cheers
    --------------------------
    Alessandro Benedetti
    Apache Lucene/Solr PMC member and Committer
    Director, R&D Software Engineer, Search Consultant

    www.sease.io


    On Thu, 17 Mar 2022 at 17:03, Bayer, Samuel <s...@mitre.org> wrote:

    > You are, indeed :-).
    >
    > What appears to be the problem - and I'm not sure yet, but it sure seems
    > like a good culprit - is that Postgres search, for reasons that mystify me,
    > was implemented with TF but no notion of IDF. There are various extensions
    > that add IDF-like properties to Postgres search. Why it didn't start out
    > that way is a mystery to me, and I don't know how stable any of the
    > extensions that do this actually are.
    >
    > At the moment, that's my diagnosis of the discrepancy. I'll probably
    > follow up with the Postgres folks to see if they have any more insight into
    > those extensions.
    >
    > Thanks to all who responded.
    >
    > Cordially,
    > Sam Bayer
    > The MITRE Corporation
    >
    > On 3/17/22 12:42 PM, Eric Pugh wrote:
    > > What I’ve done to compare other search engines with RRE and Quepid is to
    > > put a proxy in the middle that converts your query into what looks like a
    > > Solr request/response ;-).  This works great for custom search APIs, and I
    > > *guess* you could do it with database-backed search?
    > >
    > > Now we are probably getting beyond what Sam was hoping to do!
    > >
    > >
    > >
    > >
    > >> On Mar 17, 2022, at 11:56 AM, Alessandro Benedetti <a.benede...@sease.io> wrote:
    > >>
    > >> This is an interesting question.
    > >> I second both comments so far (from Eric and David), but I am afraid at the
    > >> moment the open-source tools for search quality evaluation can't really
    > >> compare Postgres to Solr.
    > >> As far as I know, both Quepid (Eric, correct me if I am wrong) and RRE
    > >> (https://github.com/SeaseLtd/rated-ranking-evaluator and also the Enterprise
    > >> version) are able to compare only Apache Solr and Elasticsearch backed
    > >> systems (against each other, or against different configurations).
    > >>
    > >> In general, I would recommend following David's suggestions:
    > >> - collect your requirements(both functional and performance-wise)
    > >> - compare
    > >>
    > >> Many times in the past I have seen DBs used as terrible search engines and
    > >> search engines used as terrible DBs.
    > >> Many times I have seen queries on a search engine perform poorly because
    > >> they were designed as if they were DB queries.
    > >>
    > >> Cheers
    > >>
    > >> --------------------------
    > >> Alessandro Benedetti
    > >> Apache Lucene/Solr PMC member and Committer
    > >> Director, R&D Software Engineer, Search Consultant
    > >>
    > >> www.sease.io
    > >>
    > >>
    > >> On Sat, 5 Mar 2022 at 05:04, David Smiley <dsmi...@apache.org> wrote:
    > >>
    > >>> Hello Sam,
    > >>>
    > >>> You are a familiar name from my MITRE days :-)
    > >>>
    > >>> Check out Solr's feature list and see how it compares to that of Postgres.
    > >>> If you are only doing the most basic default relevancy ranked top-N search
    > >>> with default text analysis, then the tech/maintenance overhead might not be
    > >>> worth it.  I'm looking at this as such an example:
    > >>> https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=solr
    > >>>
    > >>> On the other hand, if you want to ensure that you're able to make search
    > >>> the best it can be for your users, then keeping Solr and using it more will
    > >>> get you there; a database won't.  To a database, full-text-search is just
    > >>> one checkbox of many concerns.  The capabilities there are usually very
    > >>> simple.  It's fine for a demo/POC -- getting started.
    > >>>
    > >>> One feature in particular I want to call out is faceting.  To some apps,
    > >>> it's a game changer that can pivot the UX from merely having a basic search
    > >>> box to having navigation filters and everything else, at which point Solr
    > >>> is the foundation of what's driving the UX.  I've seen people/apps miss
    > >>> this -- the user experience is so clumsy without it for rich/structured
    > >>> data in particular.  If you've ever used a Maven repository manager like
    > >>> Nexus or its competitors (last I checked), they are still stuck in the
    > >>> stone-age -- it's painful when you've been exposed to so much better.  On
    > >>> the backend, if all you know is a database, you may not see how to make a
    > >>> faceting UI work because it's rather unnatural for SQL.
    > >>>
    > >>> Eric's response was great too.
    > >>>
    > >>> ~ David Smiley
    > >>> Apache Lucene/Solr Search Developer
    > >>> http://www.linkedin.com/in/davidwsmiley
    > >>>
    > >>>
    > >>> On Fri, Mar 4, 2022 at 9:33 AM Bayer, Samuel <s...@mitre.org> wrote:
    > >>>
    > >>>> Hi all -
    > >>>>
    > >>>> In the interest of reducing my technology stack, I'm exploring whether
    > >>>> using Postgres full-text search instead of Solr might be an option when I
    > >>>> need both complex querying and full-text search. In my experience, so far,
    > >>>> Postgres can't compare to Solr, but I'm trying to understand why, in order
    > >>>> to have more of an ability to evaluate the functionality/complexity
    > >>>> tradeoffs. I know something about search technologies, but I'm not an
    > >>>> expert by any stretch of the imagination, and I've been looking for sources
    > >>>> that talk about the comparison in an informed way - people, blogs,
    > >>>> articles. So far, everything I've found is extremely basic. Does anyone
    > >>>> have any pointers for me?
    > >>>>
    > >>>> Thanks in advance -
    > >>>> Sam Bayer
    > >>>> The MITRE Corporation
    > >>>> s...@mitre.org
    > >>>>
    > >>>
    > >
    > > _______________________
    > > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
    > > http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
    > > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
    > > <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
    >
    >
