RE: LogMergePolicy

2008-01-24 Thread Steven Parkes
I'm curious, why is LogMergePolicy named *Log*MergePolicy? (Why not ExpMergePolicy? :-) Well, I guess it's a matter of perspective. When you look at the way the algorithm works, the merge decisions are based on a concept of level and levels are assigned based on the log of the numb
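A simplified sketch makes the naming concrete. It is not Lucene's source, and the class and method names are hypothetical. It gives a segment a level equal to the integer logarithm, base mergeFactor, of its document count, and it ignores the finer details of how Lucene groups levels. Merging mergeFactor segments of one level produces a segment one level up. So segment sizes grow exponentially with level, while level grows with the log of size: the same relationship seen from two directions.

```java
// Simplified sketch, not Lucene's LogMergePolicy code: a segment's level is
// floor(log base mergeFactor of its doc count), computed with integer division
// because a floating-point ratio such as Math.log(1000) / Math.log(10) can come
// out slightly below the exact integer.
class MergeLevelSketch {
    static int level(long docCount, int mergeFactor) {
        if (docCount < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need docCount >= 1 and mergeFactor >= 2");
        }
        int lvl = 0;
        // Each division by mergeFactor steps down one level.
        for (long n = docCount; n >= mergeFactor; n /= mergeFactor) {
            lvl++;
        }
        return lvl;
    }

    public static void main(String[] args) {
        int mergeFactor = 10;
        for (long docs : new long[] {1, 9, 10, 99, 100, 1000}) {
            System.out.println(docs + " docs -> level " + level(docs, mergeFactor));
        }
        // Merging mergeFactor segments of 100 docs yields one segment a level higher.
        System.out.println(level(10 * 100, mergeFactor) == level(100, mergeFactor) + 1);
    }
}
```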

RE: Indexing Wikipedia dumps

2007-12-12 Thread Steven Parkes
Probably want a combination of extractWikipedia.alg and wikipedia.alg? You want the EnwikiDocMaker from extractWikipedia.alg which reads the uncompressed xml file but rather than using WriteLineDoc, you want to go ahead and index as wikipedia.alg does. (Ditch the query part.) You'll need an accep

RE: IndexReader deletes more than expected

2007-08-01 Thread Steven Parkes
If I'm reading this correctly, there's something a little wonky here. In your example code, you close the IndexWriter and then, without creating a new IndexWriter, you call addDocument again. This shouldn't be possible (what version of Lucene are you using?) Assuming for the time being that you ar

RE: drawback addindexes method

2007-05-03 Thread Steven Parkes
See IndexWriter#addIndexesNoOptimize, released with 2.1. Note that it doesn't optimize before or after, so if you want an optimize at the end, you need to ask for it manually. -Original Message- From: Chandan Tamrakar [mailto:[EMAIL PROTECTED] Sent: Thursday, May 03, 2007 12:46 AM To: jav

RE: Merge performance

2007-04-18 Thread Steven Parkes
Yup, 845 is relevant, as is 847. I haven't had time to digest all that David wrote yet, but I'm starting. It's particularly relevant because before I get to the point of making 847 committable, I need a way of testing merge performance (the factoring in 847 proposes to simplify the API slightly, so

RE: Standard Parser Behavior

2007-04-09 Thread Steven Parkes
Lucene doesn't use a pure Boolean algebra, so things don't always do what one might expect and things like De Morgan's law don't hold. The source of this comes from the combination of IR prefix notation (+/-) with standard Boolean AND/OR. If you look at the source, there are a number of rules that di
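A toy model of prefix-style matching shows one place where the algebra breaks. It is an illustration only, not QueryParser's or BooleanQuery's actual code, and the names are hypothetical. Required (+), optional (no prefix) and prohibited (-) clauses are evaluated together. A query made only of prohibited clauses has nothing positive to match, so it returns no documents. Take a document containing only the term c. The Boolean expression (NOT a) OR (NOT b) is true for it, but the prefix query -a -b does not match it.

```java
import java.util.*;

// Toy model of +/- prefix matching; an illustration, not Lucene's implementation.
class PrefixMatchSketch {
    enum Occur { MUST, SHOULD, MUST_NOT } // "+term", "term", "-term"

    static final class Clause {
        final Occur occur;
        final String term;
        Clause(Occur occur, String term) { this.occur = occur; this.term = term; }
    }

    static Clause clause(Occur occur, String term) { return new Clause(occur, term); }

    // A document (modeled as a set of terms) matches if every MUST term is present,
    // no MUST_NOT term is present, and, when there are no MUST clauses, at least
    // one SHOULD term is present.
    static boolean matches(List<Clause> query, Set<String> doc) {
        boolean hasMust = false;
        boolean shouldHit = false;
        for (Clause c : query) {
            boolean present = doc.contains(c.term);
            switch (c.occur) {
                case MUST:
                    if (!present) return false;
                    hasMust = true;
                    break;
                case MUST_NOT:
                    if (present) return false;
                    break;
                case SHOULD:
                    if (present) shouldHit = true;
                    break;
            }
        }
        // Nothing positive to match on: a purely negative query matches no documents.
        return hasMust || shouldHit;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("c"));
        // Boolean algebra: (NOT a) OR (NOT b) is true for this document.
        boolean algebra = !doc.contains("a") || !doc.contains("b");
        // Prefix form "-a -b": two prohibited clauses, nothing required or optional.
        boolean prefix = matches(
                Arrays.asList(clause(Occur.MUST_NOT, "a"), clause(Occur.MUST_NOT, "b")), doc);
        System.out.println("algebra=" + algebra + " prefix=" + prefix); // algebra=true prefix=false
    }
}
```

Adding a positive clause (for example +c -a) gives the query something to match, and the same document then matches.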

RE: Fast index traversal and update for stored field?

2007-03-19 Thread Steven Parkes
You'll have a difficult time updating Lucene indexes in place. A lot of coordination exists within Lucene specifically not to do this: it's the fact that Lucene does not do this that enables a lot of the lockless parallelism in Lucene. This applies equally to the data store and the inverted index p
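A toy model shows why in-place updates are avoided. It is not Lucene's file format, and all names are hypothetical. Each segment is frozen once written, and deletions live in a separate per-segment bitmap. Changing a field therefore means marking the old document deleted and writing the whole document again into a new segment. Segment contents never change underneath anyone reading them.

```java
import java.util.*;

// Toy write-once store, not Lucene's index format: segments are never modified
// after they are written; an "update" is a deletion mark plus a new segment.
class WriteOnceSketch {
    private final List<List<Map<String, String>>> segments = new ArrayList<>();
    private final List<BitSet> deleted = new ArrayList<>(); // one deletion bitmap per segment

    static Map<String, String> doc(String id, String title) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("title", title);
        return d;
    }

    // Write a batch of documents as a new, immutable segment.
    void addSegment(List<Map<String, String>> docs) {
        List<Map<String, String>> frozen = new ArrayList<>();
        for (Map<String, String> d : docs) {
            frozen.add(Collections.unmodifiableMap(new HashMap<>(d)));
        }
        segments.add(Collections.unmodifiableList(frozen));
        deleted.add(new BitSet());
    }

    // Deleting only sets bits in a side structure; segment contents stay untouched.
    void deleteById(String id) {
        for (int s = 0; s < segments.size(); s++) {
            List<Map<String, String>> seg = segments.get(s);
            for (int d = 0; d < seg.size(); d++) {
                if (id.equals(seg.get(d).get("id"))) deleted.get(s).set(d);
            }
        }
    }

    // Changing any field means deleting the old document and adding a complete new one.
    void update(String id, Map<String, String> newDoc) {
        deleteById(id);
        addSegment(Collections.singletonList(newDoc));
    }

    List<Map<String, String>> liveDocs() {
        List<Map<String, String>> live = new ArrayList<>();
        for (int s = 0; s < segments.size(); s++) {
            List<Map<String, String>> seg = segments.get(s);
            for (int d = 0; d < seg.size(); d++) {
                if (!deleted.get(s).get(d)) live.add(seg.get(d));
            }
        }
        return live;
    }

    int segmentCount() { return segments.size(); }

    public static void main(String[] args) {
        WriteOnceSketch store = new WriteOnceSketch();
        store.addSegment(Arrays.asList(doc("1", "old title"), doc("2", "other")));
        store.update("1", doc("1", "new title"));
        System.out.println(store.segmentCount() + " segments, live: " + store.liveDocs().size());
    }
}
```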

RE: Wildcard searches with * or ? as the first character

2007-03-13 Thread Steven Parkes
It's possible to do leading wildcard searches in Lucene as of 2.1. See http://wiki.apache.org/lucene-java/LuceneFAQ#head-4d62118417eaef0dcb87f4370583f809848ea695 (http://tinyurl.com/366suf) -Original Message- From: Oystein Reigem [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 13, 2007 11

RE: Soliciting Design Thoughts on Date Searching

2007-03-05 Thread Steven Parkes
But, letting it stay in the text stream and not putting it in a separate date field would give you some trouble with ranges because things that weren't dates could mess you up. This is why Chris suggested putting a prefix on the token. For example, leading underscor
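A rough sketch of the prefix idea follows. It assumes dates in the text are written as yyyy-MM-dd and uses a leading underscore as the marker (the reply's own example). The hypothetical tokenize step rewrites recognized dates into sortable _yyyyMMdd tokens and passes everything else through unchanged. A range over a sorted term set stands in for a range query. The unprefixed range picks up a part number that merely looks like a date, while the prefixed range sees only the real date.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.*;

// Illustration only: a hypothetical analyzer step plus a range over sorted terms.
class DateTokenSketch {
    // ISO dates (yyyy-MM-dd) become "_yyyyMMdd"; all other tokens pass through.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.isEmpty()) continue;
            try {
                LocalDate d = LocalDate.parse(raw); // ISO_LOCAL_DATE, e.g. 2007-03-05
                out.add("_" + d.format(DateTimeFormatter.BASIC_ISO_DATE));
            } catch (DateTimeParseException e) {
                out.add(raw); // not a date: keep the token as-is
            }
        }
        return out;
    }

    // Inclusive range over a sorted term dictionary (lexicographic order).
    static NavigableSet<String> range(TreeSet<String> terms, String lo, String hi) {
        return terms.subSet(lo, true, hi, true);
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Meeting on 2007-03-05 about part 20070315");
        TreeSet<String> terms = new TreeSet<>(tokens);
        System.out.println(tokens); // [meeting, on, _20070305, about, part, 20070315]
        System.out.println("prefixed:   " + range(terms, "_20070301", "_20070331")); // [_20070305]
        System.out.println("unprefixed: " + range(terms, "20070301", "20070331"));   // [20070315]
    }
}
```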

RE: Soliciting Design Thoughts on Date Searching

2007-03-01 Thread Steven Parkes
If all you want to do is find docs containing dates within a range, it probably doesn't make much difference whether you give dates their own field or put them into your content field. It'll probably be easier to just add them into the token stream since that's the way the analyzer architecture wan

RE: Soliciting Design Thoughts on Date Searching

2007-02-28 Thread Steven Parkes
Yeah, date finding is a little like entity extraction, since dates can have many formats, depending on how crazy you want to get ("a week from tomorrow" is 3/8/2007 if you know that this e-mail was written today). So much so that I went and looked up LingPipe, but they seem to not be concerned with

RE: ranking/scoring algorithm in details

2007-02-28 Thread Steven Parkes
http://lucene.apache.org/java/docs/scoring.html (which you can also find by googling "lucene scoring") -Original Message- From: Jong Kim [mailto:[EMAIL PROTECTED] Sent: Wednesday, February 28, 2007 2:21 PM To: java-user@lucene.apache.org Subject: ranking/scoring algorithm in details Hi,

RE: document field updates

2007-02-28 Thread Steven Parkes
Are unindexed fields stored separately from the main inverted index? If so then, one could implement the field value change as a delete and re-add of just that value? The short answer is that won't work. Field values are stored in a different data structure than the posting

RE: document field updates

2007-02-27 Thread Steven Parkes
age- From: Neal Richter [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 27, 2007 11:52 AM To: java-user@lucene.apache.org Subject: RE: document field updates Steven Parkes wrote: >There are no plans to do this. It's essentially impossible, given (1) >the reverse nature of te

RE: document field updates

2007-02-27 Thread Steven Parkes
There are no plans to do this. It's essentially impossible, given (1) the reverse nature of text indexes and (2) Lucene's write-once segment architecture. -Original Message- From: Arnone, Anthony [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 27, 2007 10:18 AM To: java-user@lucene.apac

RE: Lucene 1.4.3 : IndexWriter.addDocument(doc) fails when run on OS requiring permissions

2007-02-26 Thread Steven Parkes
The easiest way to pin this down is to get the backtrace from the exception, e.g., e.printStackTrace(). That would tell a lot. That said, prior to 2.1, Lucene would put lock files outside the index directory. I don't know if that's what you're hitting, though, because I think the writer should hav

RE: Too many open files?!

2007-02-14 Thread Steven Parkes
See the wiki: http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-48921635adf2c968f7936dc07d51dfb40d638b82 -Original Message- From: Michael Prichard [mailto:[EMAIL PROTECTED] Sent: Wednesday, February 14, 2007 5:02 PM To: java-user@lucene.apache.org Subject: Too many open files?! I am

RE: Wildcard Search and "Note: You cannot use a * or ? symbol as the first character of a search"

2006-10-20 Thread Steven Parkes
You can go to Jira and get the patch and/or vote for it: https://issues.apache.org/jira/browse/LUCENE-489 [Not that this issue needs much voting, I just like the idea of encouraging voting. Get Out the Vote (if that's TM'd, I take it back.)] -Original Message- From: Otis Gospodnetic

RE: BooleanQuery.TooManyClauses exception

2006-10-17 Thread Steven Parkes
e date/time values for the query? In my case I have done nothing special to index my dates. I just treat them as a string of numbers. -Original Message- From: Steven Parkes [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 17, 2006 12:13 PM To: java-user@lucene.apache.org Subject: RE: Bo

RE: BooleanQuery.TooManyClauses exception

2006-10-17 Thread Steven Parkes
Lucene takes your date range, enumerates all the unique date/time values in your corpus within that range, and then executes that query. So the number of terms in your query is going to be equal to the number of unique date/time values in the range. The most common way of handling this is to not i
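The counting rule can be illustrated without Lucene. Treat the indexed values as a sorted set of unique terms and count how many fall inside the range. In this sketch the timestamps are indexed at minute resolution (an assumption chosen for illustration), and the class and helper names are hypothetical. 1024 is Lucene's default BooleanQuery clause limit. One day of minute-resolution values already yields 1,440 terms, while indexing the same timestamp for many documents adds no clauses.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.TreeSet;

// Illustration only: one clause per unique indexed term inside the range.
class RangeExpansionSketch {
    static final int MAX_CLAUSES = 1024; // Lucene's default BooleanQuery clause limit
    static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    // Hypothetical corpus: two documents per minute; both share one term.
    static TreeSet<String> minuteTerms(LocalDateTime start, int minutes) {
        TreeSet<String> terms = new TreeSet<>();
        for (int i = 0; i < minutes; i++) {
            String term = start.plusMinutes(i).format(MINUTE);
            terms.add(term); // first document with this timestamp
            terms.add(term); // second document, same term: set size unchanged
        }
        return terms;
    }

    // The range rewrites to one clause per unique term in [lo, hi].
    static int clauseCount(TreeSet<String> terms, String lo, String hi) {
        return terms.subSet(lo, true, hi, true).size();
    }

    public static void main(String[] args) {
        TreeSet<String> terms = minuteTerms(LocalDateTime.of(2006, 10, 16, 0, 0), 2 * 24 * 60);
        int n = clauseCount(terms, "200610160000", "200610162359");
        // prints "1440 clauses, limit 1024, exceeds: true"
        System.out.println(n + " clauses, limit " + MAX_CLAUSES + ", exceeds: " + (n > MAX_CLAUSES));
    }
}
```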

RE: Lucene 2.0.1 release date

2006-10-17 Thread Steven Parkes
I think the idea is that 2.0.1 would be a patch-fix release from the branch created at 2.0 release. This release would incorporate only back-ported high-impact patches, where "high-impact" is defined by the community. Certainly security vulnerabilities would be included. As Otis said, to date, nobo

RE: Problem with Field.Text()

2006-10-05 Thread Steven Parkes
to:[EMAIL PROTECTED] > Sent: Thursday, October 05, 2006 2:53 PM > To: java-user@lucene.apache.org > Subject: Problem with Field.Text() > > I hope now I am in the right mailing list. In the -dev mailing list Steven > Parkes said that I have to change this: > > > Fiel

RE: Problem with Field.Text()

2006-10-05 Thread Steven Parkes
hope now I am in the right mailing list. In the -dev mailing list Steven Parkes said that I have to change this: > Field.Text(String, String); to > Field.Text(String, String, Field.Store.YES, Field.Index.TOKENIZED); But it seems that there isn't such a method declaration. Where is the m

RE: apachecon

2006-09-15 Thread Steven Parkes
I stopped procrastinating on this today. I signed up for a BOF slot at 8 on Thursday. Hopefully not against other stuff of interest. I've not done this before, but the BOF slots were filling. From my perspective, it'd be great to have people from any of the subprojects. Plenty of cross fertiliz

apachecon

2006-08-22 Thread Steven Parkes
So it looks like there's only a little Lucene-oriented stuff on the program at ApacheCon 2006. The Solr talk looks interesting. I was wondering if there have been any other self/semi-organized things around Lucene in the past, like a BOF? --