Poll: how to report # of docs in index over time

2012-02-13 Thread Otis Gospodnetic
Hello, Quick poll for those who have an opinion about what index size monitoring should report in terms of the number of documents in the index. Poll: http://blog.sematext.com/2012/02/13/poll-solr-index-size-monitoring/ For example, imagine that in some 5-minute time period (say 10:00 AM to 10:

Re: When to refresh writer?

2012-02-13 Thread Michael McCandless
IndexWriter doesn't require refreshing... just keep it open forever. It'll run its own merges when needed (see the MergePolicy/Scheduler). Just call .commit() when you want changes to be durable (survive OS/JVM crash, power loss, etc.). Mike McCandless http://blog.mikemccandless.com On Mon, Fe

Re: query performance with leading *

2012-02-13 Thread G.Long
Thank you for the tips. Is there an analyzer which uses this tokenizer? If not, do you know of any tutorial which explains how to implement a custom analyzer? I didn't find any. Regards. On 13/02/2012 17:46, Robert Muir wrote: I think you can solve this with the tokenizers in the org.apache.

RE: Paid Job: Looking for a developer to create a small java application to extract url's from .fdt files

2012-02-13 Thread Uwe Schindler
Hi, > as for Trunk 4.x, I can't find the isDeleted(int) method. any one could tell me > why this method is removed? See MIGRATE.txt... Hint: AtomicReader.getLiveDocs() Uwe

Re: query performance with leading *

2012-02-13 Thread Robert Muir
I think you can solve this with the tokenizers in the org.apache.lucene.analysis.path package (in lucene-analyzers.jar). In your case, it looks like ReversePathHierarchyTokenizer might be what you want, though you will need to upgrade to at least 3.2 to get it. On Mon, Feb 13, 2012 at 11:38 AM, G.Lon
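The idea behind indexing a path by its trailing sub-paths can be sketched in plain Java. This is only an illustration of the concept: the class and method names (PathSuffixes, suffixTokens) are hypothetical, the code does not use Lucene, and it is not claimed to reproduce the exact output of ReversePathHierarchyTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: lists every trailing sub-path of a path. */
final class PathSuffixes {
    private PathSuffixes() {}

    /**
     * Returns the full path followed by each suffix that starts right after a delimiter,
     * e.g. "a/b/c" with '/' gives ["a/b/c", "b/c", "c"].
     */
    static List<String> suffixTokens(String path, char delimiter) {
        List<String> tokens = new ArrayList<>();
        if (path.isEmpty()) {
            return tokens; // nothing to index
        }
        tokens.add(path);
        for (int i = 0; i < path.length(); i++) {
            // Skip a trailing delimiter so no empty token is emitted.
            if (path.charAt(i) == delimiter && i + 1 < path.length()) {
                tokens.add(path.substring(i + 1));
            }
        }
        return tokens;
    }
}
```

If each of these suffixes were indexed as its own term, a query that ends on a path-component boundary would presumably no longer need a leading wildcard; it could look up the suffix as an exact term instead.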

RE: query performance with leading *

2012-02-13 Thread Austin, Carl
You could possibly tokenize the value both forwards and in reverse, for example: 123456 and 654321. You can then convert a query for *56 to 65* and this will increase performance. -Original Message- From: G.Long [mailto:jde...@gmail.com] Sent: 13 February 2012 16:39 To: java-user@lucene.
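The forward-and-reverse suggestion can be sketched in plain Java. This is a minimal illustration under stated assumptions: the names (ReversedWildcard, reverse, toReversedPrefix, matches) are hypothetical and not Lucene API, and the matching is simulated with String.startsWith rather than run against a real index.

```java
import java.util.Optional;

/** Hypothetical helper, not part of Lucene: rewrites leading-wildcard patterns. */
final class ReversedWildcard {
    private ReversedWildcard() {}

    /** Value that would be stored in an additional "reversed" field at index time. */
    static String reverse(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    /**
     * Turns a pattern with a single leading '*' (such as "*56") into a trailing-wildcard
     * pattern for the reversed field ("65*"). Returns empty for any other pattern shape.
     */
    static Optional<String> toReversedPrefix(String pattern) {
        if (pattern.length() < 2 || pattern.charAt(0) != '*') {
            return Optional.empty();
        }
        String suffix = pattern.substring(1);
        if (suffix.indexOf('*') >= 0 || suffix.indexOf('?') >= 0) {
            return Optional.empty(); // further wildcards are out of scope for this sketch
        }
        return Optional.of(reverse(suffix) + "*");
    }

    /** Simulates a prefix match of a rewritten pattern against a reversed term. */
    static boolean matches(String reversedTerm, String reversedPrefixPattern) {
        String prefix = reversedPrefixPattern.substring(0, reversedPrefixPattern.length() - 1);
        return reversedTerm.startsWith(prefix);
    }
}
```

With this sketch, toReversedPrefix("*56") yields "65*", and that pattern matches reverse("123456"), which is "654321". The point of the rewrite is that a trailing wildcard only has to inspect terms sharing a known prefix, whereas a leading wildcard cannot narrow the terms it scans.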

query performance with leading *

2012-02-13 Thread G.Long
Hi, Is there a way to improve query performance when using a leading * as a wildcard on a path property? I have hundreds of queries to run on a Lucene index (~250 MB). Executing those queries without the leading * is about 5x faster than with the leading *. My problem is that I sometimes need

Re: Paid Job: Looking for a developer to create a small java application to extract url's from .fdt files

2012-02-13 Thread Li Li
For 2.x and 3.x you can simply use this code: Directory dir=FSDirectory.open(new File("./testindex")); IndexReader reader=IndexReader.open(dir); List urls=new ArrayList(reader.numDocs()); for(int i=0;i wrote: > Hi there, > > I am currently working on a search engine based on lucen

Re: Paid Job: Looking for a developer to create a small java application to extract url's from .fdt files

2012-02-13 Thread Shashi Kant
You might want to post this on sites such as odesk.com, rentacoder.com, guru.com, freelancer.com On Mon, Feb 13, 2012 at 9:31 AM, SearchTech wrote: > am currently working on a search engine based on lucene and have some > issues because java is not my regular programming language, which ma

Re: Overriding SloppySimScorer

2012-02-13 Thread Alan Woodward
On 13 Feb 2012, at 12:16, Robert Muir wrote: > On Mon, Feb 13, 2012 at 6:39 AM, Alan Woodward > wrote: >> Hello, >> >> (I'm not interested in Tf or Idf here) >> I've already extended DefaultSimilarity > > In this case, then extending DefaultSimilarity/TFIDFSimilarity is not > the best approach

Re: Overriding SloppySimScorer

2012-02-13 Thread Robert Muir
On Mon, Feb 13, 2012 at 6:39 AM, Alan Woodward wrote: > Hello, > > (I'm not interested in Tf or Idf here) > I've already extended DefaultSimilarity In this case, then extending DefaultSimilarity/TFIDFSimilarity is not the best approach. > Or should the SimScorer methods on TFIDFSimilarity be unf

Overriding SloppySimScorer

2012-02-13 Thread Alan Woodward
Hello, I want to score span queries based on the simple presence or absence of a hit (I'm not interested in Tf or Idf here), with a possible boost on specific spans. I've already extended DefaultSimilarity to deal with single terms. From looking at the code it seems that I want to override T