Re: how do i improve Indexing and Searching performance of 2 billion documents over SolrCloud

2017-02-14 Thread Duke DAI
SSD or in-memory index Best regards, Duke If not now, when? If not me, who? On Wed, Feb 15, 2017 at 12:32 AM, Adrien Grand wrote: > This list is for users of the Lucene Java API, maybe try solr-user instead? > > On Mon, Feb 13, 2017 at 21:24, yeshwanth kumar > wrote: > > > Hi, we have 4 sol

Numeric Ranges Faceting

2017-02-14 Thread Chitra R
Hi, We have planned to implement both string and numeric faceting using DocValues fields. For string faceting, we have added the path-traversed dimensions to the DrillDownQuery. But for numeric faceting, how and where can we add the path-traversed ranges during the next-level faceted search? And which is the

Re: how do i improve Indexing and Searching performance of 2 billion documents over SolrCloud

2017-02-14 Thread Adrien Grand
This list is for users of the Lucene Java API, maybe try solr-user instead? On Mon, Feb 13, 2017 at 21:24, yeshwanth kumar wrote: > Hi, we have 4 solr instances running > > we are using solr cloud for indexing hbase table column names. > each column in hbase will end up as a document in solr,

Re: SynonymFilterFactory deprecated since 6.4.0

2017-02-14 Thread Michael McCandless
Here's the new blog post I mentioned earlier in the thread, trying to explain the recent changes to make multi-token synonyms work ... it just went out today: https://www.elastic.co/blog/multitoken-synonyms-and-graph-queries-in-elasticsearch Mike McCandless http://blog.mikemccandless.com On Tue

Re: Building an automaton efficiently (CompiledAutomaton vs RunAutomaton vs Automaton)

2017-02-14 Thread Michael McCandless
Wow, 2G heap, that's horrible! How much heap does the automaton itself take? You can use the automaton's step method to transition from a state given the next input character to another state (or -1 if that state doesn't accept that character); it will be slower than the 2 GB run automaton, but p

Re: Building an automaton efficiently (CompiledAutomaton vs RunAutomaton vs Automaton)

2017-02-14 Thread Oliver Mannion
Thanks Mike for getting back to me, sounds like I'm on the right track. I'm building the automaton from around 1.7 million strings, and it ends up with about 3.8 million states and it turns out building a CharacterRunAutomaton from that takes up about 2 GB of heap (I was quite surprised!), with negli

Re: SynonymFilterFactory deprecated since 6.4.0

2017-02-14 Thread Michael McCandless
Hi Bernd, Actually, pos (which is just the accumulation of PositionIncrementAttribute, starting with -1) is the *start* node. The end node is then pos + PositionLengthAttribute. As far as I know, ShingleFilter is not yet graph friendly: it does not set PositionLengthAttribute. But you could vis
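The node arithmetic described in this message can be sketched in plain Java without any Lucene dependency. This is only an illustration under stated assumptions: the class name TokenGraphNodes, the method nodes, and the sample "dns" / "domain name service" tokens with their increments and lengths are hypothetical and do not come from the thread. In real code the two values per token would be read from PositionIncrementAttribute and PositionLengthAttribute.

```java
final class TokenGraphNodes {

    // Hypothetical helper: for each token, returns {startNode, endNode}.
    // pos starts at -1 and accumulates the position increments, as described
    // in the message; the end node is pos plus the position length.
    static int[][] nodes(int[] posInc, int[] posLen) {
        int[][] out = new int[posInc.length][];
        int pos = -1;
        for (int i = 0; i < posInc.length; i++) {
            pos += posInc[i];                           // start node of this token
            out[i] = new int[] {pos, pos + posLen[i]};  // {start, end}
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample stream: "dns" stacked over "domain name service".
        String[] terms = {"dns", "domain", "name", "service"};
        int[] posInc = {1, 0, 1, 1};
        int[] posLen = {3, 1, 1, 1};

        int[][] n = nodes(posInc, posLen);
        for (int i = 0; i < terms.length; i++) {
            System.out.println(terms[i] + ": " + n[i][0] + " -> " + n[i][1]);
        }
    }
}
```

In this assumed stream the single token "dns" spans the same start and end nodes as the three-token path "domain name service", which is the property a graph-aware consumer relies on. A filter that never sets the position length leaves every token with a length of 1, so that span information is lost.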

Re: Proper Use of SynonymGraphFilter

2017-02-14 Thread Michael McCandless
That's right. And just be aware of the tradeoffs you're making so you make an informed decision. Mike McCandless http://blog.mikemccandless.com On Mon, Feb 13, 2017 at 6:19 PM, Corbin, J.D. wrote: > Hi Mike, > > Thanks for the response, > > Sounds like I was using it incorrectly by specifying