Hi Mike, and thanks for the valuable answer!
In short, do you think PG Full Text Search can do the same as Apache Solr?

P.S. I need to index .pdf, .html and MS Word .doc/.docx files. Are there any
constraints in Full Text Search regarding those file types?


On Wed, Mar 25, 2020 at 3:36 PM Mike Rylander <mrylan...@gmail.com> wrote:

> On Wed, Mar 25, 2020 at 8:37 AM J2eeInside J2eeInside
> <j2eeins...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I hope someone can help/suggest:
> > I'm currently maintaining a project that uses Apache Solr/Lucene. To be
> > honest, I would like to replace Solr with Postgres Full Text Search.
> > However, there is a huge number of documents involved - around 200GB.
> > Wondering, can Postgres handle this efficiently?
> > Does anyone have specific experience, and what should the infrastructure
> > look like?
> >
> > P.S. To be clear, Solr works just fine; I just wanted to eliminate one
> > component from the whole system (if Full Text Search can replace Solr
> > at all).
>
> I'm one of the core developers (and the primary developer of the
> search subsystem) for the Evergreen ILS [1] (integrated library system
> -- think book library, not software library).  We've been using PG's
> full-text indexing infrastructure since day one, and I can say it is
> definitely capable of handling pretty much anything you can throw at
> it.
>
> Our indexing requirements are very complex and need to be very
> configurable, and need to include a lot more than just "search and
> rank a text column," so we've had to build a ton of infrastructure
> around record (document) ingest, searching/filtering, linking, and
> display.  If your indexing and search requirements are stable,
> specific, and well-understood, it should be straightforward,
> especially if you don't have to take into account non-document
> attributes like physical location, availability, and arbitrary
> real-time visibility rules like Evergreen does.
>
> As for scale, it's more about document count than total size.  There
> are Evergreen libraries with several million records to search, and
> with proper hardware and tuning everything works well.  Our main
> performance issue has to do with all of the stuff outside the records
> (documents) themselves that has to be taken into account during
> search.  The core full-text search part of our queries is extremely
> performant, and has only gotten better over the years.
>
> [1] http://evergreen-ils.org
>
> HTH,
> --
> Mike Rylander
>  | Executive Director
>  | Equinox Open Library Initiative
>  | phone:  1-877-OPEN-ILS (673-6457)
>  | email:  mi...@equinoxinitiative.org
>  | web:  http://equinoxinitiative.org
>
