Hi Mike, and thanks for the valuable answer! In short, you think PG Full Text Search can do the same as Apache Solr?
P.S. I need to index .pdf, .html and MS Word .doc/.docx files. Are there any constraints in Full Text Search regarding those file types?

On Wed, Mar 25, 2020 at 3:36 PM Mike Rylander <mrylan...@gmail.com> wrote:

> On Wed, Mar 25, 2020 at 8:37 AM J2eeInside J2eeInside
> <j2eeins...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I hope someone can help/suggest:
> > I'm currently maintaining a project that uses Apache Solr/Lucene. To be
> > honest, I would like to replace Solr with Postgres Full Text Search.
> > However, there is a huge amount of documents involved - around 200GB.
> > Wondering, can Postgres handle this efficiently?
> > Does anyone have specific experience, and what should the infrastructure
> > look like?
> >
> > P.S. Not to be confused, Solr works just fine, I just wanted to
> > eliminate one component from the whole system (if Full Text Search can
> > replace Solr at all)
>
> I'm one of the core developers (and the primary developer of the
> search subsystem) for the Evergreen ILS [1] (integrated library system
> -- think book library, not software library). We've been using PG's
> full-text indexing infrastructure since day one, and I can say it is
> definitely capable of handling pretty much anything you can throw at
> it.
>
> Our indexing requirements are very complex and need to be very
> configurable, and need to include a lot more than just "search and
> rank a text column," so we've had to build a ton of infrastructure
> around record (document) ingest, searching/filtering, linking, and
> display. If your indexing and search requirements are stable,
> specific, and well-understood it should be straightforward,
> especially if you don't have to take into account non-document
> attributes like physical location, availability, and arbitrary
> real-time visibility rules like Evergreen does.
>
> As for scale, it's more about document count than total size. There
> are Evergreen libraries with several million records to search, and
> with proper hardware and tuning everything works well. Our main
> performance issue has to do with all of the stuff outside the records
> (documents) themselves that has to be taken into account during
> search. The core full-text search part of our queries is extremely
> performant, and has only gotten better over the years.
>
> [1] http://evergreen-ils.org
>
> HTH,
> --
> Mike Rylander
> | Executive Director
> | Equinox Open Library Initiative
> | phone: 1-877-OPEN-ILS (673-6457)
> | email: mi...@equinoxinitiative.org
> | web: http://equinoxinitiative.org
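
To make the plain "search and rank a text column" case from the quoted message concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not anything taken from Evergreen or from this thread: the table name docs, its columns, the index name, and the helper html_to_text are all hypothetical, and the SQL is only held in strings (the %(q)s placeholder assumes a psycopg-style driver). PostgreSQL's full-text functions work on text, so file formats have to be converted to plain text before indexing. The sketch does that for HTML only, using the standard library; .pdf and .doc/.docx files would need an external extraction tool, which is not shown.

```python
from html.parser import HTMLParser

# Hypothetical schema: one row per document, with the extracted plain text.
CREATE_TABLE_SQL = """
CREATE TABLE docs (
    id    bigserial PRIMARY KEY,
    path  text NOT NULL,
    body  text NOT NULL
);
"""

# GIN index over the tsvector expression, so that @@ searches can use it.
CREATE_INDEX_SQL = """
CREATE INDEX docs_body_fts_idx
    ON docs USING GIN (to_tsvector('english', body));
"""

# Match and rank; %(q)s is a named parameter in psycopg style (an assumption).
SEARCH_SQL = """
SELECT id, path,
       ts_rank(to_tsvector('english', body),
               plainto_tsquery('english', %(q)s)) AS rank
  FROM docs
 WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %(q)s)
 ORDER BY rank DESC
 LIMIT 20;
"""


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string as one normalized line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The text returned by html_to_text would go into the body column. The expression in the WHERE clause has to match the indexed expression exactly (same configuration name, same column) for the GIN index to be usable.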