Replacing Apache Solr with Postgres Full Text Search?
Hi all,

I hope someone can help/suggest: I'm currently maintaining a project that uses Apache Solr/Lucene. To be honest, I would like to replace Solr with Postgres Full Text Search. However, there is a huge amount of documents involved - around 200GB. Wondering: can Postgres handle this efficiently?

Does anyone have specific experience, and what should the infrastructure look like?

P.S. Don't get me wrong, Solr works just fine; I just wanted to eliminate one component from the whole system (if Full Text Search can replace Solr at all).
Re: Replacing Apache Solr with Postgres Full Text Search?
Hi Mike, and thanks for the valuable answer!

In short, you think PG Full Text Search can do the same as Apache Solr?

P.S. I need to index .pdf, .html and MS Word .doc/.docx files - are there any constraints in Full Text Search regarding those file types?

On Wed, Mar 25, 2020 at 3:36 PM Mike Rylander wrote:
> On Wed, Mar 25, 2020 at 8:37 AM J2eeInside J2eeInside wrote:
> >
> > Hi all,
> >
> > I hope someone can help/suggest: I'm currently maintaining a project that
> > uses Apache Solr/Lucene. To be honest, I would like to replace Solr with
> > Postgres Full Text Search. However, there is a huge amount of documents
> > involved - around 200GB. Wondering: can Postgres handle this efficiently?
> > Does anyone have specific experience, and what should the infrastructure
> > look like?
> >
> > P.S. Don't get me wrong, Solr works just fine; I just wanted to eliminate
> > one component from the whole system (if Full Text Search can replace Solr
> > at all).
>
> I'm one of the core developers (and the primary developer of the
> search subsystem) for the Evergreen ILS [1] (integrated library system
> -- think book library, not software library). We've been using PG's
> full-text indexing infrastructure since day one, and I can say it is
> definitely capable of handling pretty much anything you can throw at
> it.
>
> Our indexing requirements are very complex, need to be very
> configurable, and need to include a lot more than just "search and
> rank a text column," so we've had to build a ton of infrastructure
> around record (document) ingest, searching/filtering, linking, and
> display. If your indexing and search requirements are stable,
> specific, and well-understood, it should be straightforward,
> especially if you don't have to take into account non-document
> attributes like physical location, availability, and arbitrary
> real-time visibility rules like Evergreen does.
>
> As for scale, it's more about document count than total size. There
> are Evergreen libraries with several million records to search, and
> with proper hardware and tuning everything works well. Our main
> performance issue has to do with all of the stuff outside the records
> (documents) themselves that has to be taken into account during
> search. The core full-text search part of our queries is extremely
> performant, and has only gotten better over the years.
>
> [1] http://evergreen-ils.org
>
> HTH,
> --
> Mike Rylander
>  | Executive Director
>  | Equinox Open Library Initiative
>  | phone: 1-877-OPEN-ILS (673-6457)
>  | email: mi...@equinoxinitiative.org
>  | web: http://equinoxinitiative.org
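For the simple "search and rank a text column" case Mike mentions, here is a minimal sketch of what that can look like from Java/JDBC. The table, column and connection details are hypothetical, the PostgreSQL JDBC driver is assumed to be on the classpath, and the generated tsvector column requires PostgreSQL 12 or later (on older versions a trigger would maintain it instead):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class FtsSketch {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders - adjust for your environment.
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) {

            try (Statement s = c.createStatement()) {
                // Hypothetical documents table; body_tsv is kept in sync automatically.
                s.execute("CREATE TABLE IF NOT EXISTS docs ("
                        + " id bigserial PRIMARY KEY,"
                        + " title text,"
                        + " body text,"
                        + " body_tsv tsvector GENERATED ALWAYS AS"
                        + "   (to_tsvector('english', coalesce(body, ''))) STORED)");
                // GIN index so full-text queries don't scan the whole table.
                s.execute("CREATE INDEX IF NOT EXISTS docs_tsv_idx"
                        + " ON docs USING gin (body_tsv)");
            }

            // "Search and rank a text column": match a phrase, order by relevance.
            String sql = "SELECT id, title, ts_rank(body_tsv, q) AS rank"
                    + " FROM docs, plainto_tsquery('english', ?) AS q"
                    + " WHERE body_tsv @@ q"
                    + " ORDER BY rank DESC LIMIT 20";
            try (PreparedStatement ps = c.prepareStatement(sql)) {
                ps.setString(1, "full text search");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.printf("%d  %s  %.4f%n",
                                rs.getLong("id"), rs.getString("title"),
                                rs.getFloat("rank"));
                    }
                }
            }
        }
    }
}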
Re: Replacing Apache Solr with Postgres Full Text Search?
Thanks again. To wrap up, one final question:

On Thu, Mar 26, 2020 at 4:18 PM Mike Rylander wrote:
> On Thu, Mar 26, 2020 at 4:03 AM J2eeInside J2eeInside wrote:
> >
> > P.S. I need to index .pdf, .html and MS Word .doc/.docx files - are there
> > any constraints in Full Text Search regarding those file types?
>
> It can't handle those without some help -- it only supports text --
> but you can extract the text using other tools.

- Can you recommend the tools you mention above, or any useful resource on how to do that?
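Mike doesn't name the tools here, but one widely used option in the Java world is Apache Tika, which auto-detects the file type and extracts plain text from PDF, HTML and .doc/.docx through a single API (single-format utilities such as pdftotext are an alternative). A minimal sketch, assuming tika-core and tika-parsers are on the classpath; the extracted string is what you would feed into to_tsvector() (e.g. the body column of the earlier sketch):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.tika.Tika;

public class ExtractText {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        // The Tika facade truncates very long documents by default; -1 removes the limit.
        tika.setMaxStringLength(-1);

        Path file = Path.of(args[0]);  // e.g. report.pdf, page.html or memo.docx
        try (InputStream in = Files.newInputStream(file)) {
            // Tika detects the format and returns plain text.
            String text = tika.parseToString(in);
            System.out.println(text);
        }
    }
}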
Re: Replacing Apache Solr with Postgres Full Text Search?
You are welcome, Andreas, and thanks for the useful answer ;-)

On Thu, Mar 26, 2020 at 4:33 PM Andreas Joseph Krogh wrote:
> On Wednesday, 25 March 2020 at 13:36:38, J2eeInside J2eeInside
> <j2eeins...@gmail.com> wrote:
> > Hi all,
> >
> > I hope someone can help/suggest: I'm currently maintaining a project that
> > uses Apache Solr/Lucene. To be honest, I would like to replace Solr with
> > Postgres Full Text Search. However, there is a huge amount of documents
> > involved - around 200GB. Wondering: can Postgres handle this efficiently?
> > Does anyone have specific experience, and what should the infrastructure
> > look like?
> >
> > P.S. Don't get me wrong, Solr works just fine; I just wanted to eliminate
> > one component from the whole system (if Full Text Search can replace Solr
> > at all).
>
> I see you've gotten some answers but wanted to chime in...
> We search ~15 million emails and ~10 million documents (extracted text from
> Word/PDF etc. using Java tools), and use PG and FTS (GIN, not RUM) for the
> exact same reasons as Evergreen (it seems). We have to mix FTS with
> domain-specific logic/filtering, and that is based on relational data in the
> database. I don't see how we could have done that using an external
> search engine. Maybe it's easy, I don't have any experience with it.
>
> --
> Andreas Joseph Krogh
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andr...@visena.com
> www.visena.com
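A sketch of the kind of mixed query Andreas describes - a full-text predicate and ordinary relational filtering in one statement, so the planner can use the GIN index alongside regular btree indexes. The docs/body_tsv names follow the earlier sketch; the folders table and the folder_id/owner_id columns are hypothetical stand-ins for the domain-specific data:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FilteredSearch {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) {
            String sql = "SELECT d.id, d.title, ts_rank(d.body_tsv, q) AS rank"
                    + " FROM docs d"
                    + " JOIN folders f ON f.id = d.folder_id"          // relational data
                    + " CROSS JOIN plainto_tsquery('english', ?) AS q"
                    + " WHERE f.owner_id = ?"                          // domain-specific filter
                    + "   AND d.body_tsv @@ q"                         // full-text match (GIN)
                    + " ORDER BY rank DESC LIMIT 50";
            try (PreparedStatement ps = c.prepareStatement(sql)) {
                ps.setString(1, "invoice reminder");  // search phrase
                ps.setLong(2, 42L);                   // hypothetical owner/tenant id
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.printf("%d  %s%n",
                                rs.getLong("id"), rs.getString("title"));
                    }
                }
            }
        }
    }
}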