Re: Luke for Lucene 2.x ?

2006-08-04 Thread Miles Barr
KEGan wrote: I have read that Andrzej Bialecki mentioned that he would release a new version of Luke based on Lucene 2.0.0 soon. URL here ... http://www.mail-archive.com/java-user@lucene.apache.org/msg08612.html. Does anyone have any idea if it has been released? Andrzej, if you are reading this, co

Re: Leading wildcard query

2006-07-28 Thread Miles Barr
Pravin Shinde wrote: I am trying to use Leading wildcard query, but I am not able to do it. Any query with leading wildcard is failing with lexical error. query = parser.parse( "*hi" ) JavaError: org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 1. Encountered: "*"

Re: Timestamps as milliseconds

2006-07-27 Thread Miles Barr
Erick Erickson wrote: As Miles said, use the DateTools (lucene) class with a DAY resolution. That'll give you a yyyyMMdd format, which won't blow your query with a "TooManyClauses" exception... Remember that Lucene deals with strings, so you want to store things in easily-manipulated string
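
The day-resolution idea in this thread can be sketched with the JDK alone. The class below is not Lucene's DateTools; it is a hypothetical stand-in (the names DayResolution and toDayString are made up for illustration) that shows why a fixed-width yyyyMMdd string is convenient: every timestamp within one day collapses to a single term, and the strings sort in date order. UTC is assumed here.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Lucene: reduces a millisecond
// timestamp to a day-resolution string such as "20060727".
final class DayResolution {
    // Fixed-width pattern so that string order matches date order.
    // Assumption: dates are interpreted in UTC.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    private DayResolution() {}

    // Converts epoch milliseconds to a yyyyMMdd string.
    static String toDayString(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

Because all emails from one day share the same term, a range over a month expands to at most 31 terms rather than one term per distinct millisecond value. If time-of-day is still needed for sorting, it would have to be kept in a separate field.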

Re: Timestamps as milliseconds

2006-07-26 Thread Miles Barr
Michael J. Prichard wrote: I guess the more I think about it I don't really care about the minutes in the initial. All that matters is the date (i.e. 2006-07-25). The only thing I would need the time for would be for sorting so I need to have that too. Ideas? Store as much detail as you

Re: Timestamps as milliseconds

2006-07-26 Thread Miles Barr
Michael J. Prichard wrote: I am working on indexing emails and have stored the date as milliseconds. I was thinking of using a filter w/ my search that would only return the email in that date range. I am currently indexing as follows: doc.add(new Field("date", (String) itemContent.get("da

Re: Limit number of search results

2006-07-26 Thread Miles Barr
headhunter wrote: I guess the recommended way to implement paging of results is to do your own query-results caching, right? Or does lucene also do this for me? The other guys have covered caching of results in a general way, so I won't go into that. For a search application I've written I

Re: Limit number of search results

2006-07-25 Thread Miles Barr
headhunter wrote: I am looking for a way to limit the number of search results I retrieve when searching. I am only interested in (let's say) the first ten hits of a query.. maybe I want to look at hits ten..twenty too, but usually only the first results are important. Right now lucene search
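
As a rough illustration of the "first ten hits, then hits ten to twenty" pattern asked about here, the following sketch pages over an already-ranked list. It uses plain java.util collections rather than any Lucene class; ResultPager and page are hypothetical names, and how the ranked list is obtained is left out.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper: returns one page of an already-ranked result list.
final class ResultPager {
    private ResultPager() {}

    // pageIndex is zero-based; pageSize is the number of hits per page.
    static <T> List<T> page(List<T> hits, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        // Use long arithmetic so large page numbers cannot overflow.
        long from = (long) pageIndex * pageSize;
        if (from >= hits.size()) {
            return Collections.emptyList(); // past the last page
        }
        int to = (int) Math.min(hits.size(), from + pageSize);
        // subList is a view, so no hits are copied.
        return hits.subList((int) from, to);
    }
}
```

Calling page(hits, 0, 10) gives the first ten hits and page(hits, 1, 10) the next ten; the last page may be shorter than pageSize.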

Re: Grouping over multiple fields

2006-07-25 Thread Miles Barr
Krishnendra Nandi wrote: Can anybody help me out on this ..? I have to search for a particular value over multiple fields and need to know if grouping is allowed over multiple fields eg. AND ( AUTHOR_NAME:krish OR EMPLOYEE_NAME:krish ) Introducing a parenthesis "(" is giving me lexica

Re: drill-down heuristics WAS: Where to find drill-down examples (source code)

2006-07-24 Thread Miles Barr
On Monday 24 July 2006 08:17, Martin Braun wrote: > I think I didn't explain my Problem good enough. > > The harder problem for me is how to get the proposals for the > refinement? I have a date-range of 16xx to now, for about 4 bn. docs. > So the number of found documents could be quite large. Bu

Re: Where to find drill-down examples (source code)

2006-07-21 Thread Miles Barr
Martin Braun wrote: I want to realize a drill-down Function aka "narrow search" aka "refine search". I want to have something like: Refine by Date: * 1990-2000 (30 Docs) * 2001-2003 (200 Docs) * 2004-2006 (10 Docs) But not only DateRanges but also for other Categories. What I have found in t

Re: Index-Format difference between 1.4.3 and 2.0

2006-07-20 Thread Miles Barr
Andrzej Bialecki wrote: lude wrote: As Luke was released with a Lucene-1.9 Where did you get this information? From all I know Luke is based on Lucene Version 1.4.3. The latest version of Luke was released with an early snapshot of 1.9. I plan to release a 2.0-based version in a f

Re: internal Searching behavior or how to get a hit?

2006-05-03 Thread Miles Barr
On Wednesday 03 May 2006 14:56, Mathias Keilbach wrote: > I have a question concerning the internal searching behavior of lucene. How > does lucene get a hit? If I search for a term, will each index document > be checked for this term or is there an internal relation between terms and > lucene d

Re: DistributingMultiFieldQueryParser and DisjunctionMaxQuery

2005-12-14 Thread Miles Barr
On Tue, 2005-12-13 at 11:51 -0800, Chris Hostetter wrote: > As i mentioned in the comments for LUCENE-323, > DistributingMultiFieldQueryParser seems to be more of a demo of what's > possible with DisjunctionMaxQuery -- not necessarily a full fledged > QueryParser. I think that's why it wasn't com

DistributingMultiFieldQueryParser and DisjunctionMaxQuery

2005-12-13 Thread Miles Barr
On Mon, 2005-12-12 at 15:35 -0800, Chris Hostetter wrote: > : Oh, BTW: I just found the DisjunctionMaxQuery class, recently added it > : seems. Do you think this query structure could benefit from using it > : instead of the BooleanQuery? > > DisjunctionMaxQuery kicks ass (in my opinion), and It

Re: Getting Dates Back out of lucene

2005-12-06 Thread Miles Barr
On Tue, 2005-12-06 at 09:35 +, Alan Chandler wrote: > I added a date field to a document with > > doc.add(Field.keyword("A Date",myDate)); > > How do I get it back out again as a date? You should be able to use the org.apache.lucene.document.DateField#stringToDate(String) method. Miles

Re: Search problems

2005-11-01 Thread Miles Barr
the tokens it creates won't match the values in your field, because they have to be an exact match. The StandardAnalyzer is the analyzer Luke uses by default. It will make the search terms lower case, and AFAIK it almost removes numbers from the query. -- Miles Barr

Re: Hits sorted

2005-10-13 Thread Miles Barr
your dates into Lucene's date representation. Of course you'd have to update your index to store the date in the same format. Miles Barr - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Question: force a field must be matched?

2005-09-16 Thread Miles Barr
return null; } }; } } PerFieldAnalyzerWrapper result = new PerFieldAnalyzerWrapper(new StandardAnalyzer()); result.addAnalyzer("publisher", new KeywordAnalyzer()); QueryParser parser = new QueryParser(,

Re: index files in jar file

2005-08-30 Thread Miles Barr
ential problem might be random access, since I think streams are sequentially accessed. If the index isn't too big you could have your JARDirectory class just wrap a RAMDirectory and just load the contents of the JAR into memory.
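
The "load the contents of the JAR into memory" step suggested here could look roughly like the sketch below. It only reads the entries into byte arrays keyed by entry name; copying them into a RAMDirectory (or the JARDirectory wrapper mentioned in the message) is not shown. JarContents and load are hypothetical names, and the approach assumes the index is small enough to fit in memory, as the message itself cautions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;

// Hypothetical helper: reads every file in a JAR stream into memory.
final class JarContents {
    private JarContents() {}

    // Returns a map from entry name to the entry's bytes.
    // The stream is read sequentially, which is all a JAR stream allows.
    static Map<String, byte[]> load(InputStream in) {
        Map<String, byte[]> files = new HashMap<>();
        try (JarInputStream jar = new JarInputStream(in)) {
            byte[] buffer = new byte[8192];
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // only files carry data
                }
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = jar.read(buffer)) != -1) {
                    data.write(buffer, 0, n);
                }
                files.put(entry.getName(), data.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```

Once the bytes are in memory, random access is no longer a problem, which is the point of the suggestion; the cost is holding the whole index on the heap.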

Re: QueryParser not thread-safe

2005-08-24 Thread Miles Barr
safe. Check out this article for more: http://www-128.ibm.com/developerworks/java/library/j-threads1.html -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

Re: QueryParser not thread-safe

2005-08-23 Thread Miles Barr
eadsafe object in a threaded environment is fairly standard in Java, just wrap it in a synchronized block. If you don't want all threads waiting on one query parser, create a pool of them. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd. ---
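
The pooling suggestion can be sketched generically. The class below is an assumption-laden illustration, not anything shipped with Lucene: ObjectPool, withInstance and available are hypothetical names, and the pooled type would be whatever non-threadsafe object needs sharing (a query parser in this thread). Each caller borrows one instance, uses it exclusively, and returns it, so threads only wait when every instance is in use.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool for objects that are not thread-safe.
final class ObjectPool<T> {
    private final BlockingQueue<T> idle;

    // Creates 'size' instances up front using the given factory.
    ObjectPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows an instance, runs the work with exclusive access,
    // and always returns the instance to the pool afterwards.
    <R> R withInstance(Function<T, R> work) {
        T instance;
        try {
            instance = idle.take(); // blocks while the pool is empty
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled object", e);
        }
        try {
            return work.apply(instance);
        } finally {
            idle.add(instance);
        }
    }

    // Number of instances currently not borrowed.
    int available() {
        return idle.size();
    }
}
```

With a pool of size one this behaves like the single synchronized block described in the message; a larger pool trades memory for less contention.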

Re: UpdateIndex

2005-08-23 Thread Miles Barr
hen you need to look up a particular document, e.g. to delete it. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

Re: UpdateIndex

2005-08-23 Thread Miles Barr
. What analyzer did you pass to the IndexWriter? Also you shouldn't rely on the document ID because it is not fixed for a given document. I believe it changes when you optimize the index. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd. -

Re: UpdateIndex

2005-08-23 Thread Miles Barr
e deleted. When you call IndexReader#delete(Term) what value is returned? It should return the number of matching documents it has deleted. If this value is 0, then your term is incorrect. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

Re: Why is Hits.java not Serializable?

2005-08-10 Thread Miles Barr
der to load the data or not, but it probably does. In which case you need to recreate the reference when deserializing the object. If you deserialize it in another JVM or another computer it's not obvious what this reference shoul

Re: a *match all* query

2005-05-09 Thread Miles Barr
field to get back all the documents. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

Re: Collecting documents where only one field term matches

2005-04-04 Thread Miles Barr
tegory filtering against the database (which holds document/category information). Lucene holds no category information in this case 2. Take the query, look up the relevant category information in the database and expand the query so it only picks up t

Re: Lucene on Linux problem...

2005-04-04 Thread Miles Barr
er.close(); reader = null; } if (writer != null) { writer.optimize(); writer.close(); writer = null; } } -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

Re: Plural Stemming

2005-04-01 Thread Miles Barr
On Fri, 2005-04-01 at 19:24 +0200, Andrzej Bialecki wrote: > Miles Barr wrote: > > Are there any Lucene extensions that can do simple stemming, i.e. just > > for plurals? Or is the only stemming package available Snowball? > > For which language? Stemming is always languag

Plural Stemming

2005-04-01 Thread Miles Barr
Are there any Lucene extensions that can do simple stemming, i.e. just for plurals? Or is the only stemming package available Snowball? Cheers -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

Re: Removing similar documents from search results

2005-03-21 Thread Miles Barr
l cases. I'll probably adopt a two stage approach. 1. Prevent duplicate documents from getting into the index in the first place, e.g. compare MD5 hashes and file sizes, maybe make the spider configurable to spot certain URL patterns, etc. 2. Try out the various techniques suggested in
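
Step 1 of this plan (comparing MD5 hashes and file sizes before a document reaches the index) might be sketched as follows. This uses only java.security.MessageDigest; DuplicateFilter, fingerprint and isNew are hypothetical names, and the sketch only catches byte-identical content, not the near-duplicates discussed elsewhere in the thread.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical filter: remembers a fingerprint of every document seen
// so that byte-identical copies can be skipped before indexing.
final class DuplicateFilter {
    private final Set<String> seen = new HashSet<>();

    // Fingerprint = content length plus the MD5 digest in hex.
    static String fingerprint(byte[] content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return content.length + ":" + hex;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Returns true the first time a given content is offered, false afterwards.
    boolean isNew(byte[] content) {
        return seen.add(fingerprint(content));
    }
}
```

A spider would call isNew on each fetched page and index only those for which it returns true. MD5 is used here because the message names it; it is adequate for spotting accidental duplicates but not for anything security-related.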

Re: Removing similar documents from search results

2005-03-15 Thread Miles Barr
page would have a 'fingerprint', and hopefully you could come up with a quick way to compare them at query time. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

Re: Removing similar documents from search results

2005-03-15 Thread Miles Barr
til Chuck's patch is included. I'm also a bit worried about the performance of this approach. It might add too much time to each query. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

Re: Removing similar documents from search results

2005-03-14 Thread Miles Barr
'original' copy and display it. Or would that approach be too expensive to calculate for each search? -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

Removing similar documents from search results

2005-03-14 Thread Miles Barr
milar to the 7 already displayed. If you like, you can repeat the search with the omitted results included." at the bottom of the page. Is there anything in Lucene or one of the contrib packages that compares two documents? -- Miles Barr <[EMAIL PROTECTED]> Runtim

RE: SPAN QUERY [HOW TO]

2005-03-10 Thread Miles Barr
s > > 'DIGITAL CAMERAS' instead of returning me the 1st doc, Or none by changing > the slop factor > > Any more ideas Please do .. B( > > with regards > karthik > > > -Original Message- > From: Miles Barr [mailto:[EMA

RE: SPAN QUERY [HOW TO]

2005-03-10 Thread Miles Barr
the specific > document being returned. It depends on what the type of leaf_category is. If you made it Keyword as I suggested then it won't be tokenized. i.e. there's one token 'DIGITAL CAMERA' instead of the two tokens you normally get, 'digital' and 'camera'

Re: Obtaining the contexts of hits

2005-03-10 Thread Miles Barr
The highlighter contrib package does what you're looking for: http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/ By default it breaks the document into chunks roughly 100 characters long. You can alter it to get ten words either side of the matched term. -- Miles

Re: lucene index with structured fields

2005-03-09 Thread Miles Barr
the index ahead of time and the weights you want to place on the different levels I'd do a query expansion. i.e. search2:coco would become search2:coco^4 OR search4:coco but actually creating the query objects rather th

RE: SPAN QUERY [HOW TO]

2005-03-09 Thread Miles Barr
with regards > Karthik > > > -Original Message- > From: Miles Barr [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 09, 2005 3:02 PM > To: java-user@lucene.apache.org > Subject: Re: SPAN QUERY [HOW TO] > > > On Wed, 2005-03-09 at 14:52 +0530, Karthi

Re: Remotely Stroring Index file

2005-03-09 Thread Miles Barr
ke your implementation capable of storing files remotely. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

Re: SPAN QUERY [HOW TO]

2005-03-09 Thread Miles Barr
both span and phrase queries would return all the documents. Are you trying to set up a taxonomy? i.e. only display documents in the category Electronics > Digital Camera, and not those in sub categories? If this is the case you should try to build the categorisation at the same time as the inde

Re: Large Index managing

2005-03-02 Thread Miles Barr
the order they happen. But at least by batching them you can make the long wait infrequent. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective