Replacing Apache Solr with Postgre Full Text Search?

2020-03-25 Thread J2eeInside J2eeInside
Hi all,

I hope someone can help or offer a suggestion.
I'm currently maintaining a project that uses Apache Solr/Lucene. To be
honest, I would like to replace Solr with PostgreSQL Full Text Search. However,
there is a huge volume of documents involved, around 200GB. I'm wondering:
can PostgreSQL handle this efficiently?
Does anyone have specific experience, and what should the infrastructure
look like?

P.S. To avoid confusion: Solr works just fine. I just want to
eliminate one component from the whole system (if Full Text Search can
replace Solr at all).


Re: Replacing Apache Solr with Postgre Full Text Search?

2020-03-26 Thread J2eeInside J2eeInside
Hi Mike, and thanks for the valuable answer!
In short, do you think PG Full Text Search can do the same as Apache Solr?

P.S. I need to index .pdf, .html and MS Word .doc/.docx files. Are there any
constraints in Full Text Search regarding those file types?


On Wed, Mar 25, 2020 at 3:36 PM Mike Rylander wrote:

> I'm one of the core developers (and the primary developer of the
> search subsystem) for the Evergreen ILS [1] (integrated library system
> -- think book library, not software library). We've been using PG's
> full-text indexing infrastructure since day one, and I can say it is
> definitely capable of handling pretty much anything you can throw at
> it.
>
> Our indexing requirements are very complex and need to be very
> configurable, and need to include a lot more than just "search and
> rank a text column," so we've had to build a ton of infrastructure
> around record (document) ingest, searching/filtering, linking, and
> display.  If your indexing and search requirements are stable,
> specific, and well-understood, it should be straightforward,
> especially if you don't have to take into account non-document
> attributes like physical location, availability, and arbitrary
> real-time visibility rules like Evergreen does.
>
> As for scale, it's more about document count than total size.  There
> are Evergreen libraries with several million records to search, and
> with proper hardware and tuning everything works well.  Our main
> performance issue has to do with all of the stuff outside the records
> (documents) themselves that has to be taken into account during
> search.  The core full-text search part of our queries is extremely
> performant, and has only gotten better over the years.
>
> [1] http://evergreen-ils.org
>
> HTH,
> --
> Mike Rylander
>  | Executive Director
>  | Equinox Open Library Initiative
>  | phone:  1-877-OPEN-ILS (673-6457)
>  | email:  mi...@equinoxinitiative.org
>  | web:  http://equinoxinitiative.org
>
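For the simple case Mike mentions ("search and rank a text column"), the usual PostgreSQL setup is a tsvector column, a GIN index on it, and a ranked `@@` query. The sketch below is illustrative only and is not Evergreen's schema: it assumes PostgreSQL 12 or later (for stored generated columns), the built-in 'english' text search configuration, a hypothetical `documents` table, and psycopg2-style `%s` placeholders. The Python merely holds the SQL plus a small helper that runs it on an already-open connection.

```python
# Sketch only: "documents", "title", "body" and "tsv" are hypothetical names.
# Requires PostgreSQL 12+ for stored generated columns.
SCHEMA_SQL = """
CREATE TABLE documents (
    id    bigserial PRIMARY KEY,
    title text,
    body  text NOT NULL,
    -- tsvector kept up to date by PostgreSQL on every INSERT/UPDATE;
    -- title lexemes get weight A and body lexemes weight B for ranking.
    tsv   tsvector GENERATED ALWAYS AS (
              setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
              setweight(to_tsvector('english', coalesce(body, '')), 'B')
          ) STORED
);

-- GIN index so that "tsv @@ query" does not have to scan every row.
CREATE INDEX documents_tsv_idx ON documents USING gin (tsv);
"""

# Ranked search; websearch_to_tsquery (PostgreSQL 11+) accepts
# search-engine style input such as: "full text" -solr
SEARCH_SQL = """
SELECT id, title, ts_rank(tsv, q) AS rank
FROM documents, websearch_to_tsquery('english', %s) AS q
WHERE tsv @@ q
ORDER BY rank DESC
LIMIT %s;
"""


def search(conn, text, limit=20):
    """Run SEARCH_SQL on an open psycopg2-style connection and return the rows."""
    with conn.cursor() as cur:
        cur.execute(SEARCH_SQL, (text, limit))
        return cur.fetchall()
```

The weights let `ts_rank` score title matches above body-only matches. On releases older than 12, the same column is typically maintained with a trigger (for example the built-in `tsvector_update_trigger`) instead of a generated column.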


Re: Replacing Apache Solr with Postgre Full Text Search?

2020-03-26 Thread J2eeInside J2eeInside
Thanks again.
Finally, one last question:

On Thu, Mar 26, 2020 at 4:18 PM Mike Rylander wrote:

> On Thu, Mar 26, 2020 at 4:03 AM J2eeInside J2eeInside
>  wrote:
> >
>
> > P.S. I need to index .pdf, .html and MS Word .doc/.docx files. Are there
> > any constraints in Full Text Search regarding those file types?
> >
>
> It can't handle those without some help -- it supports exactly text --
> but you can extract the text using other tools.
>
>
- Can you recommend the tools you mentioned above, or any useful resources on
how to do that?
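Mike's point is that PostgreSQL indexes plain text only, so each file has to be converted before it is loaded. A minimal standard-library sketch for two of the four formats follows, assuming Python 3; the function names are hypothetical. It reads a .docx as the ZIP archive it is and joins the `<w:t>` text runs from `word/document.xml`, and it strips tags from HTML while skipping script and style blocks. Legacy binary .doc and PDF need a dedicated extractor (for example Apache Tika or poppler's `pdftotext`), which is not shown here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace of WordprocessingML elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


class _TextCollector(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True (default) decodes &amp; etc.
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, joined with spaces."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def docx_to_text(data: bytes) -> str:
    """Return the main body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

The extracted string would then go into a text column such as `body` and be indexed like any other text. This .docx reader only looks at the main document part; headers, footers and footnotes live in separate parts of the archive and are ignored.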


Re: Replacing Apache Solr with Postgre Full Text Search?

2020-03-26 Thread J2eeInside J2eeInside
You're welcome, Andreas, and thanks for the useful answer ;-)

On Thu, Mar 26, 2020 at 4:33 PM Andreas Joseph Krogh wrote:

> I see you've gotten some answers but wanted to chime in...
> We search ~15 million emails and ~10 million documents (text extracted from
> Word/PDF etc. using Java tools), and use PG and FTS (GIN, not RUM) for the
> exact same reasons as Evergreen (it seems). We have to mix FTS with
> domain-specific logic/filtering that is based on relational data in the
> database. I don't see how we could have done that using an external
> search engine. Maybe it's easy; I don't have any experience with it.
>
> --
> Andreas Joseph Krogh
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andr...@visena.com
> www.visena.com
>
>
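Andreas's point about mixing FTS with relational filtering is easiest to see in a single statement. The sketch below builds one parameterized query in which the `@@` match, the `ts_rank` ordering and ordinary relational predicates sit side by side, so PostgreSQL plans them together rather than an application merging results from a separate engine. All table and column names (`documents`, `tsv`, `created_at`, `document_acl`) are hypothetical, and the `%s` placeholders assume a psycopg2-style driver.

```python
def build_search(query_text, user_id=None, since=None, limit=50):
    """Return (sql, params): a ranked FTS match combined with relational filters.

    Hypothetical schema: documents(id, tsv, created_at) and
    document_acl(document_id, user_id). Placeholders are psycopg2-style %s.
    """
    where = ["d.tsv @@ q"]          # the full-text condition
    params = [query_text]           # fills the %s in websearch_to_tsquery
    if user_id is not None:
        # Relational rule: only documents this user is allowed to see.
        where.append(
            "EXISTS (SELECT 1 FROM document_acl a"
            " WHERE a.document_id = d.id AND a.user_id = %s)"
        )
        params.append(user_id)
    if since is not None:
        # Ordinary column filter evaluated in the same query.
        where.append("d.created_at >= %s")
        params.append(since)
    params.append(limit)            # fills the final LIMIT %s
    sql = (
        "SELECT d.id, ts_rank(d.tsv, q) AS rank\n"
        "FROM documents d, websearch_to_tsquery('english', %s) AS q\n"
        "WHERE " + "\n  AND ".join(where) + "\n"
        "ORDER BY rank DESC\n"
        "LIMIT %s"
    )
    return sql, params
```

Usage would look like `sql, params = build_search("overdue invoice", user_id=42)` followed by `cur.execute(sql, params)`. Parameters are appended in the same order their placeholders appear in the SQL, which is what positional `%s` binding requires.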