Re: [EXTERNAL] Re: [EXT] Re: Looking for expertise on comparing Solr search to Postgres full-text search

Dave Thu, 17 Mar 2022 13:41:16 -0700

I’m a big believer in the right tool for the job.  Like what was said before, if 
you’re doing just a field:value query or four with no complications, sure, use a 
standard RDBMS. But if you inform the client that something like 
Leaves And whitm* title^3 with bf:title^3 author ^2 
is possible, the conversation changes with the right questions.


> On Mar 17, 2022, at 3:17 PM, Davis, Daniel (NIH/NLM) [C] 
> <daniel.da...@nih.gov.invalid> wrote:
> 
> This is really a question of how big the haystack is and what sort of search 
> task users are trying to accomplish.
> 
> If there is no IDF (a mistake I did *not* make at 
> https://www.indexengines.com/ despite using home-grown search BTW), then 
> there is an implicit assumption both that the documents are of similar size 
> and about the linguistics of the corpus.
> 
> In any case, if users are basically doing "Known Item Search", e.g. entering 
> in keywords from a title, then PostgreSQL should do OK.
> 
> On 3/17/22, 1:34 PM, "Alessandro Benedetti" <a.benede...@sease.io> wrote:
> 
>    Ok Charlie, Eric,
>    we are on the same page.
>    I agree it's definitely possible with some custom proxy work on both Quepid
>    and RRE; I meant that it's not possible to point directly at the DB (for
>    example via JDBC).
>    Thanks!
> 
>    Cheers
>    --------------------------
>    Alessandro Benedetti
>    Apache Lucene/Solr PMC member and Committer
>    Director, R&D Software Engineer, Search Consultant
> 
>    www.sease.io
> 
> 
>>    On Thu, 17 Mar 2022 at 17:03, Bayer, Samuel <s...@mitre.org> wrote:
>> 
>> You are, indeed :-).
>> 
>> What appears to be the problem - and I'm not sure yet, but it sure seems
>> like a good culprit - is that Postgres search, for reasons that mystify me,
>> was implemented with TF but no notion of IDF. There are various extensions
>> that add IDF-like properties to Postgres search. Why it didn't start out
>> that way is a mystery to me, and I don't know how stable any of the
>> extensions that do this actually are.
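
The TF-without-IDF point above can be illustrated with a minimal sketch. These are toy scoring functions written for this note, not Postgres's or Solr's actual ranking code; the corpus, the query, and the function names tf_score and tf_idf_score are all made up for illustration. With TF alone, a document that merely repeats a very common word outranks the one that contains the rare query term; weighting by IDF reverses that.

```python
import math
from collections import Counter

def tf_score(query_terms, doc_tokens):
    """Score by raw term frequency only: every query term weighs the same."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)

def tf_idf_score(query_terms, doc_tokens, corpus):
    """Score by term frequency weighted by inverse document frequency."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    score = 0.0
    for t in query_terms:
        # Number of documents in the (toy) corpus containing the term.
        df = sum(1 for d in corpus if t in d)
        # A term found in every document gets weight log(1) = 0.
        idf = math.log(n_docs / df) if df else 0.0
        score += counts[t] * idf
    return score

# Hypothetical three-document corpus; "the" is common, "solr" is rare.
corpus = [
    "the the the search".split(),
    "the solr".split(),
    "the postgres index".split(),
]
query = ["the", "solr"]

doc_ids = range(len(corpus))
top_by_tf = max(doc_ids, key=lambda i: tf_score(query, corpus[i]))
top_by_tf_idf = max(doc_ids, key=lambda i: tf_idf_score(query, corpus[i], corpus))
print(top_by_tf, top_by_tf_idf)
```

Whether this is the whole explanation for a given Postgres/Solr discrepancy still depends on the corpus and queries involved.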
>> 
>> At the moment, that's my diagnosis of the discrepancy. I'll probably
>> follow up with the Postgres folks to see if they have any more insight into
>> those extensions.
>> 
>> Thanks to all who responded.
>> 
>> Cordially,
>> Sam Bayer
>> The MITRE Corporation
>> 
>>> On 3/17/22 12:42 PM, Eric Pugh wrote:
>>> What I’ve done to compare other search engines with RRE and Quepid is to
>>> put a proxy in the middle that converts your query into what looks like a
>>> Solr request/response ;-).  This works great for custom search APIs, and I
>>> *guess* you could do it with database-backed search?
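
A minimal sketch of the response half of such a proxy, assuming the database rows have already been fetched as dictionaries. The function name to_solr_style_response and the sample rows are hypothetical; the field names follow the general shape of Solr's default JSON response (responseHeader, response.numFound, response.start, response.docs). Whether Quepid or RRE would accept this exact shape without further fields is not confirmed here.

```python
import json

def to_solr_style_response(query, rows, start=0, total=None):
    """Wrap database rows in a dict shaped like a Solr JSON search response."""
    docs = [dict(row) for row in rows]
    return {
        "responseHeader": {"status": 0, "QTime": 0, "params": {"q": query}},
        "response": {
            # total lets the caller report the full hit count when rows is one page.
            "numFound": len(docs) if total is None else total,
            "start": start,
            "docs": docs,
        },
    }

# Hypothetical rows, as they might come back from a database full-text query.
rows = [
    {"id": "1", "title": "Leaves of Grass"},
    {"id": "2", "title": "Autumn Leaves"},
]
body = json.dumps(to_solr_style_response("leaves", rows))
print(body)
```

The request half (translating the incoming Solr-style parameters into a database query) is left out, since it depends entirely on the schema behind the proxy.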
>>> 
>>> Now we are probably getting beyond what Sam was hoping to do!
>>> 
>>> 
>>> 
>>> 
>>>> On Mar 17, 2022, at 11:56 AM, Alessandro Benedetti <a.benede...@sease.io> wrote:
>>>> 
>>>> This is an interesting question.
>>>> I second both comments so far (from Eric and David), but I am afraid that
>>>> at the moment the open-source tools for search quality evaluation can't
>>>> really compare Postgres to Solr.
>>>> As far as I know, both Quepid (Eric, correct me if I am wrong) and RRE
>>>> (https://github.com/SeaseLtd/rated-ranking-evaluator and also the Enterprise
>>>> version) are able to compare only Apache Solr and Elasticsearch backed
>>>> systems (against each other, or against different configurations).
>>>> 
>>>> In general, I would recommend following David's suggestions:
>>>> - collect your requirements (both functional and performance-wise)
>>>> - compare
>>>> 
>>>> Many times in the past I have seen databases used as terrible search
>>>> engines and search engines used as terrible databases.
>>>> Many times I have seen queries on a search engine perform poorly because
>>>> they were designed as if they were DB queries.
>>>> 
>>>> Cheers
>>>> 
>>>> --------------------------
>>>> Alessandro Benedetti
>>>> Apache Lucene/Solr PMC member and Committer
>>>> Director, R&D Software Engineer, Search Consultant
>>>> 
>>>> www.sease.io
>>>> 
>>>> 
>>>> On Sat, 5 Mar 2022 at 05:04, David Smiley <dsmi...@apache.org> wrote:
>>>> 
>>>>> Hello Sam,
>>>>> 
>>>>> You are a familiar name from my MITRE days :-)
>>>>> 
>>>>> Check out Solr's feature list and see how it compares to that of Postgres.
>>>>> If you are only doing the most basic default relevancy ranked top-N search
>>>>> with default text analysis, then the tech/maintenance overhead might not be
>>>>> worth it.  I'm looking at this as such an example:
>>>>> https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=solr
>>>>> 
>>>>> On the other hand, if you want to ensure that you're able to make search
>>>>> the best it can be for your users, then keeping Solr and using it more will
>>>>> get you there; a database won't.  To a database, full-text search is just
>>>>> one checkbox of many concerns.  The capabilities there are usually very
>>>>> simple.  It's fine for a demo/POC -- getting started.
>>>>> 
>>>>> One feature in particular I want to call out is faceting.  To some apps,
>>>>> it's a game changer that can pivot the UX from merely having a basic search
>>>>> box to having navigation filters and everything else, at which point Solr
>>>>> is the foundation of what's driving the UX.  I've seen people/apps miss
>>>>> this -- the user experience is so clumsy without it for rich/structured
>>>>> data in particular.  If you've ever used a Maven repository manager like
>>>>> Nexus or its competitors (last I checked), they are still stuck in the
>>>>> stone age -- it's painful when you've been exposed to so much better.  On
>>>>> the backend, if all you know is a database, you may not see how to make a
>>>>> faceting UI work because it's rather unnatural for SQL.
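
To make the "unnatural for SQL" remark concrete, here is a minimal sketch using Python's built-in sqlite3 module. The artifacts table, its columns, and the facet_counts helper are hypothetical and exist only for illustration. Each facet needs its own GROUP BY query over the same filtered result set, whereas Solr can return counts for several facet fields alongside the hits in a single request (facet=true with one facet.field per field).

```python
import sqlite3

# Hypothetical table loosely modeled on a Maven repository search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifacts (name TEXT, group_id TEXT, packaging TEXT)")
conn.executemany(
    "INSERT INTO artifacts VALUES (?, ?, ?)",
    [
        ("solr-core", "org.apache.solr", "jar"),
        ("solr-solrj", "org.apache.solr", "jar"),
        ("lucene-core", "org.apache.lucene", "jar"),
        ("solr-parent", "org.apache.solr", "pom"),
    ],
)

FACET_FIELDS = ("group_id", "packaging")

def facet_counts(conn, field, name_pattern):
    """Count matching rows per value of one facet field."""
    # The column name is interpolated into the SQL text, so restrict it
    # to a fixed allow-list; the search pattern is a bound parameter.
    if field not in FACET_FIELDS:
        raise ValueError(f"unknown facet field: {field}")
    sql = (
        f"SELECT {field}, COUNT(*) FROM artifacts "
        f"WHERE name LIKE ? GROUP BY {field} "
        f"ORDER BY COUNT(*) DESC, {field}"
    )
    return conn.execute(sql, (name_pattern,)).fetchall()

# One extra query per facet, each repeating the search filter.
facets = {f: facet_counts(conn, f, "solr%") for f in FACET_FIELDS}
print(facets)
```

The sketch ignores multi-select facets and relevance-ranked matching, both of which add further queries or joins on the SQL side.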
>>>>> 
>>>>> Eric's response was great too.
>>>>> 
>>>>> ~ David Smiley
>>>>> Apache Lucene/Solr Search Developer
>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>> 
>>>>> 
>>>>> On Fri, Mar 4, 2022 at 9:33 AM Bayer, Samuel <s...@mitre.org> wrote:
>>>>> 
>>>>>> Hi all -
>>>>>> 
>>>>>> In the interest of reducing my technology stack, I'm exploring whether
>>>>>> using Postgres full-text search instead of Solr might be an option when I
>>>>>> need both complex querying and full-text search. In my experience, so far,
>>>>>> Postgres can't compare to Solr, but I'm trying to understand why, in order
>>>>>> to have more of an ability to evaluate the functionality/complexity
>>>>>> tradeoffs. I know something about search technologies, but I'm not an
>>>>>> expert by any stretch of the imagination, and I've been looking for sources
>>>>>> that talk about the comparison in an informed way - people, blogs,
>>>>>> articles. So far, everything I've found is extremely basic. Does anyone
>>>>>> have any pointers for me?
>>>>>> 
>>>>>> Thanks in advance -
>>>>>> Sam Bayer
>>>>>> The MITRE Corporation
>>>>>> s...@mitre.org
>>>>>> 
>>>>> 
>>> 
>>> _______________________
>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
>>> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
>>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>> 
>>> 
>>> 
>> 
>
