Re: date issues

2012-02-23 Thread findbestopensource
Yes. By storing as String, you should be able to do range searches. I am not sure which is better, storing as String or as Integer. Regards Aditya www.findbestopensource.com On Thu, Feb 23, 2012 at 1:25 PM, Jason Toy wrote: > Can I still do range searches on a string? It seems like it would be m

Re: date issues

2012-02-23 Thread Danil ŢORIN
Ranges on String are painfully slow. Format them as MMDD and store as class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0" On Thu, Feb 23, 2012 at 10:19, findbestopensource wrote: > Yes. By storing as String, You should be able to do range search. I am not >
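
A minimal sketch of the integer encoding suggested above, using only the JDK. The class and method names are hypothetical and not part of Lucene or Solr. The message says MMDD; the year-prefixed variant is an assumption, added because MMDD alone only orders days within a single year.

```java
import java.time.LocalDate;

// Hypothetical helper for illustration; not a Lucene or Solr API.
final class DateKeys {
    private DateKeys() {}

    /** MMDD as an int, e.g. 23 February becomes 223. Orders days within one year only. */
    static int monthDay(LocalDate d) {
        return d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    /** Assumed variant: yyyyMMdd as an int, so numeric order matches chronological order. */
    static int yearMonthDay(LocalDate d) {
        return d.getYear() * 10000 + monthDay(d);
    }
}
```

The resulting int is what would be indexed in the trie-encoded integer field, so a date range becomes a plain numeric range between two such keys.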

data extraction architecture

2012-02-23 Thread chris chisolm
I'm relatively new to this field and I have a problem that seems to be solvable in lots of different ways, and I'm looking for some recommendations on how to approach a data refining pipeline. I'm not sure where to look for this type of architecture description. My best finds so far have been som

Re: When deletes will be removed?

2012-02-23 Thread Ian Lea
Eventually, as more modifications take place and merges are triggered. If you really care, and are using the default TieredMergePolicy, you could try playing with TieredMergePolicy.setForceMergeDeletesPctAllowed(double v). Might help. Or you could call IndexWriter.forceMergeDeletes(). The javad

Re: Multiple index vs Single Index

2012-02-23 Thread Ian Lea
Millions of docs in a single index is definitely OK. If it was my system I'd willingly trade slightly slower indexing for simplicity and ease of use. If it works and is fast enough, job done. -- Ian. On Thu, Feb 23, 2012 at 7:31 AM, Ganesh wrote: > Hello all, > > This debate we might have ha

Re: Multiple index vs Single Index

2012-02-23 Thread Ganesh
Thanks. The reason I went with multiple indexes is that I have more updates for the current date and deletes for older dates. Now I am planning to use a single index. I think I should use forceMergeDeletes to merge the deletes. Do you optimize your index? How do you handle millions of docs in the index

Re: Multiple index vs Single Index

2012-02-23 Thread Ian Lea
Well, you certainly can force a merge if you wish. I guess it's a trade-off: an expensive, disk-intensive operation that may make other operations quicker. Your choice. I only have one set of multi-million doc indexes whose performance I care about and they are updated in bulk every night a

RE: Custom lucene scoring - Dot product between field boost and query boost

2012-02-23 Thread Yuval Kesten
One important thing - Since I am not using the indexed documents fields' norms, because the weight is the value of the field, I am now indexing the fields using: Field field = new Field(field_name, Float.toString(weight), Store.YES, Index.NOT_ANALYZED_NO_NORMS); And the memory usage is back to n

Re: date issues

2012-02-23 Thread Erick Erickson
1> Don't use sint, it's being deprecated. And it'll take up more space than a TrieDate 2> Precision. Sure, use the coarsest time you can, normalizing everything to day would be a good thing. You won't get any space savings by storing to day resolution, it's just a long under the covers. But depend

Custom scoring

2012-02-23 Thread Damerian
Hello, I am trying to implement my own Jaccard similarity for Lucene. So far I have the following code: public class JaccardSimilarity extends DefaultSimilarity { int numberOfDocumentTerms; //String field="contents"; // Should the Jaccard similarity be only based in the contents field
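
For reference, the quantity being targeted can be sketched independently of Lucene. This is only the set-based Jaccard coefficient, |A ∩ B| / |A ∪ B|, over two term sets; it is not the poster's Similarity subclass, and the class name and the choice to return 0.0 for two empty sets are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone sketch; not the JaccardSimilarity class from the message.
final class JaccardSketch {
    private JaccardSketch() {}

    /** Returns |a ∩ b| / |a ∪ b|, or 0.0 when both sets are empty (assumed convention). */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b); // keep only the terms present in both sets
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

Here one set would hold the distinct query terms and the other the distinct terms of a document field.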

Re: Custom scoring

2012-02-23 Thread Ahmet Arslan
> The problem is that coord() method is not used (or at least > so that i understand) neither in searching nor in indexing > What do i do wrong? If you want to see coord() values, use a multi-word query (two or more query terms) and go to last page of result set. --
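
To illustrate why a multi-word query is needed: DefaultSimilarity computes the coordination factor as overlap / maxOverlap, where overlap is the number of query terms a document matches. The stand-alone sketch below mirrors that formula without depending on Lucene; the class name is hypothetical.

```java
// Hypothetical stand-alone sketch of the coordination factor formula.
final class CoordSketch {
    private CoordSketch() {}

    /** Fraction of the query's terms that the document matched. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
```

With a two-term query, a document matching both terms gets 1.0 and a document matching only one gets 0.5. Documents that match fewer terms tend to rank lower, which is presumably why the last page of results is the place to look.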