Re: date issues

2012-02-22 Thread Jason Toy
Can I still do range searches on a string? It seems like it would be more efficient to store as an integer.
> Hi,
>
> You could consider storing date field as String in "MMDD" format. This
> will save space and it will perform better.
>
> Regards
> Aditya
> www.findbestopensource.com
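On the string-versus-integer question, a minimal plain-Java sketch may help. It assumes a zero-padded year-month-day key (the quoted "MMDD" format may differ), and the class and method names are hypothetical, not from the thread. It only shows that fixed-width date strings compare in chronological order, so a lexicographic range check works, and that the same key also fits in an int. It does not measure anything inside Lucene or Solr; in Lucene 3.x the integer form would typically go through NumericField and NumericRangeQuery.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Lucene or Solr.
class DateKeys {
    // Assumption: keys are four-digit year, two-digit month, two-digit day.
    private static final DateTimeFormatter KEY = DateTimeFormatter.BASIC_ISO_DATE;

    // Fixed-width string key, e.g. 2012-02-22 becomes "20120222".
    static String toStringKey(LocalDate d) {
        return d.format(KEY);
    }

    // The same key packed into an int.
    static int toIntKey(LocalDate d) {
        return d.getYear() * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    // Inclusive range check by plain lexicographic comparison.
    // This is only valid because every key has the same width.
    static boolean inRange(String key, String lo, String hi) {
        return key.compareTo(lo) >= 0 && key.compareTo(hi) <= 0;
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2012, 2, 22);
        System.out.println(toStringKey(day));
        System.out.println(toIntKey(day));
        System.out.println(inRange(toStringKey(day), "20120201", "20120229"));
    }
}
```

The string comparison stops being chronological as soon as keys have different widths or the year is not the leading component, which is one reason to prefer the int form.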

Multiple index vs Single Index

2012-02-22 Thread Ganesh
Hello all, We may have had this debate many times in the group, but I want to clarify it once more. I was using multiple indexes (one index per week) with previous versions of Lucene (2.4 - 3.0.3). The performance was really good for incremental indexing. I used to optimize once per d

When deletes will be removed?

2012-02-22 Thread Ganesh
Hello all, I am using v3.5 with all default options. In my index the deletes are not removed. When will they be removed? I have not done an optimize (forced merge).
1618714 Feb 22 20:42 _11y_l.del
499 Feb 22 20:42 _195_k.del
591 Feb 22 20:42 _1hs_l.del
556 Feb 22 20:42 _1pl_l.del

Re: date issues

2012-02-22 Thread findbestopensource
Hi, You could consider storing date field as String in "MMDD" format. This will save space and it will perform better. Regards Aditya www.findbestopensource.com On Thu, Feb 23, 2012 at 11:55 AM, Jason Toy wrote: > I have a solr instance with about 400m docs. For text searches it is > per

date issues

2012-02-22 Thread Jason Toy
I have a Solr instance with about 400m docs. For text searches it is perfectly fine. When I do searches that calculate the number of times a word appeared in the doc set for every day of a month, it usually causes Solr to crash with out-of-memory errors. I calculate this by running ~30 queri

Re: TaxonomySearch & similar words?

2012-02-22 Thread Cheng
Thank you. The alternative sounds reasonable. On Thu, Feb 23, 2012 at 12:54 PM, Shai Erera wrote: > Hi Cheng, > > You will need to use the exact path labels in order to get to the category > 'Mark Twain', unless you index multiple paths from start, e.g.: > /author/American/Mark Twain > /writer/A

Re: TaxonomySearch & similar words?

2012-02-22 Thread Shai Erera
Hi Cheng, You will need to use the exact path labels in order to get to the category 'Mark Twain', unless you index multiple paths from the start, e.g.: /author/American/Mark Twain /writer/American/Mark Twain The taxonomy index does not process the CategoryPath labels in any way to e.g. produce synony
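Since the taxonomy index does not rewrite labels itself, one way to handle this in application code is to map synonym labels to the indexed label before building the path. The sketch below is an assumption for illustration, not the taxonomy API and not necessarily the alternative suggested in this thread; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the Lucene facet module.
class CategoryLabelNormalizer {
    // Maps a lower-cased synonym to the label that was actually indexed.
    private final Map<String, String> canonical = new HashMap<>();

    void addSynonym(String synonym, String indexedLabel) {
        canonical.put(synonym.toLowerCase(Locale.ROOT), indexedLabel);
    }

    // Splits a '/'-separated path and replaces any component that is a known synonym.
    // Components without a mapping are returned unchanged.
    String[] normalize(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        String[] parts = trimmed.split("/");
        for (int i = 0; i < parts.length; i++) {
            String mapped = canonical.get(parts[i].toLowerCase(Locale.ROOT));
            if (mapped != null) {
                parts[i] = mapped;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        CategoryLabelNormalizer n = new CategoryLabelNormalizer();
        n.addSynonym("writer", "author");
        System.out.println(String.join("/", n.normalize("/writer/American/Mark Twain")));
    }
}
```

The resulting components would then be used to build the category path that is looked up. Compared with indexing both /author/... and /writer/..., this keeps the taxonomy smaller, but every query path has to go through the mapping.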

TaxonomySearch & similar words?

2012-02-22 Thread Cheng
Hi, I am using Taxonomy Search to build a facet comprising things such as “/author/American/Mark Twain”. Since the word "author" has a synonym of "writer", can I use "writer" instead of "author" to get the path? Currently I can only use exactly the word "author" to do it. Thanks

Impact of max merged segment setting

2012-02-22 Thread Vitaly Funstein
Hello, I am currently experimenting with tuning the max merged segment MB parameter on TieredMergePolicy in Lucene 3.5, and seeing significant gains in index writing speed from values dramatically lower than the default (5 GB). For instance, when setting it to 5 or 10 MB, I can see my writing tests

Fwd: How to combine StandardAnalyzer with ReverseWildcardFilter

2012-02-22 Thread Michael Bell
>>> Michael Bell 2/21/2012 12:18 PM >>> I've ported over the various pieces from SOLR 3.5 (SolrQueryParser, ReverseWildcardFilter, ReverseWildcardFactory). But I do not understand how to apply this to indexing. Here's the situation. Some fields will need StandardAnalyzer, some need KeyWord,

RE: Custom lucene scoring - Dot product between field boost and query boost

2012-02-22 Thread Yuval Kesten
Hi all, Inspired by another thread here (Question about CustomScoreQuery) I am using this solution which is working really well (with one drawback): I discovered that some of my problems were due to the fact that my assumption was wrong: I did have many fields/queries terms with the same field ID

Re: Custom lucene scoring - Dot product between field boost and query boost

2012-02-22 Thread Alan Woodward
Hi Yuval, You can just override Similarity, rather than DefaultSimilarity - that way you don't burn any CPU cycles on TF/IDF calculations. Alan On 22 Feb 2012, at 07:17, Yuval Kesten wrote: > Hi Em, > 1. Regarding the performances - the similarity class (And my subtype as well) > gets the IDF

Re: Custom lucene scoring - Dot product between field boost and query boost

2012-02-22 Thread Em
Hi Yuval, > 1. Regarding the performances - the similarity class (And my subtype as well) gets the IDF and TF and SQUARED SUMS calculations as inputs - they just factor them differently. Even though I ignore the values they are being computed. Good point. However I think that these values are rel