Thanks a lot, Shawn, for the very detailed reply. It is very informative and much appreciated! I will check the link about performance problems.
Regarding executing models (question 4), let me explain it a bit better: can Solr run custom TensorFlow/PyTorch models? This is not a Lucene feature; it would be something built on top of it. Thanks!

On Fri, Aug 13, 2021 at 2:44 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 8/13/2021 2:25 AM, Albert Dfm wrote:
> > We got to know about Solr, and we are very excited about using it to replace our current Elasticsearch infra. Currently, our main issue is the data and model size running on each machine.
> >
> > *Our setup:*
> > 1. We use the following search architecture: a 1st tier, the fast search (low response time), with the data most likely to be retrieved,
> > 2. a 2nd tier with the rest (including on-disk data).
> >
> > We saw all the features (Solr webpage) provided by Solr, and we would like to ask about them. More specifically, we would like to know:
> > 1. Can we do text search and vector similarity?
> > 2. Can we filter by metadata?
> > 3. How about index/memory consumption? The 1st tier needs around 4000M embedding vectors (128 fp32) + metadata stored in memory.
> > 4. Can we execute models in the DB itself (not outside Solr)? We have per-user models, and we need a way of executing TensorFlow models in the database to avoid moving data outside of the DB.
> > 5. Subsecond queries
> > 6. Real-time (or near real-time) indexing of new data
> > 7. Easily scalable
>
> As Solr and ES both use Lucene for the vast majority of their functionality, they have nearly identical overall capabilities. If ES can do it, Solr most likely can too. If the configs are nearly the same, Solr and ES will have similar performance.
>
> Number 3: The bottom line here is that we do not know, and we can't know. Any guess made by us about Solr or the ES team about ES would be just that -- a guess. What works for one user with an index of a particular size might be way too low or way too high for another user with a similar size index.
> When we guess, we're always going to err on the side of caution -- recommend significantly more resources than might actually be required, so we know there will be enough. And we generally need a lot of information that you might not have yet in order to make a guess. If it works in ES with X amount of resources, it will probably also work in Solr with the same resources -- assuming that the configs are substantially similar. In the example configs, Solr tends to have a lot more features enabled than ES does, which is one reason that ES can claim to perform better "out of the box". When the configs are actually similar, performance tends to be similar.
>
> https://lucidworks.com/post/solr-sizing-guide-estimating-solr-sizing-hardware/
>
> First, setup items 1 and 2 (the two tiers): You could set up different indexes for this purpose. Solr doesn't provide a way to automatically move older data from one index to another; you would have to do that in your indexing software. For time-series data (think logs or similar), SolrCloud has the "Time Routed Aliases" feature -- it creates a new collection for the most recent data, and as time passes, another new collection is created. I have never used the feature, though I do understand the concept.
>
> 1: Text search, definitely. Vector similarity, probably ... but because I do not know what this is, I do not want to say the answer is definitely yes. Solr provides a way to utilize Lucene TermVectors.
> 2: Generally, yes. How you set up the schema and the nature of the data will determine exactly what you can do with filters. This would be the case for ES too.
> 3: See above.
> 4: I have no idea what you mean by this. But as I have said before, if ES can do it, Solr probably can too.
> 5: If you have enough resources, particularly memory, Solr performs great. If the index is REALLY big, it might be difficult to arrange to have enough unallocated memory for the OS to reliably cache the index.
> Neither Solr nor ES does that caching itself; they rely on the OS to handle it.
> 6: Faster indexing generally means taking a hit on query performance whenever you update the index and commit changes. This would be the case for ES too.
> 7: This is such a vague question that I cannot answer it without knowing EXACTLY what you mean.
>
> Additional reading (disclaimer: I wrote this wiki page):
>
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>
> Thanks,
> Shawn
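For a sense of scale on question 3, here is a rough back-of-the-envelope sketch in Python (not anything Solr-specific) of the raw storage needed for the 1st-tier vectors alone. It assumes "128 fp32" means 128 dimensions of 32-bit floats per vector, and it deliberately ignores metadata, any vector-index structure, JVM heap, and the OS cache headroom Shawn mentions, so the real requirement will be higher. The constant and function names are illustrative only.

```python
# Back-of-the-envelope estimate of raw vector storage for the 1st tier.
# Assumption: "128 fp32" = 128 dimensions of 32-bit floats per vector.
# Counts ONLY the raw vector bytes: no metadata, no index structures,
# no JVM heap, no free memory for the OS page cache.

NUM_VECTORS = 4_000_000_000  # "4000M" embeddings
DIMENSIONS = 128             # values per embedding (assumed meaning of "128")
BYTES_PER_FP32 = 4           # a 32-bit float occupies 4 bytes


def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int) -> int:
    """Return the bytes needed to hold all vectors uncompressed."""
    return num_vectors * dims * bytes_per_value


if __name__ == "__main__":
    total = raw_vector_bytes(NUM_VECTORS, DIMENSIONS, BYTES_PER_FP32)
    print(f"{total:,} bytes")                # 2,048,000,000,000 bytes
    print(f"{total / 1e12:.3f} TB (decimal)")  # 2.048 TB (decimal)
    print(f"{total / 2**40:.2f} TiB (binary)")  # 1.86 TiB (binary)
```

That is roughly 2 TB of raw vector data before any metadata or index overhead, which is a useful lower bound to keep in mind when working through the sizing guide linked above.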