KEGan wrote:
I have read that Andrzej Bialecki mentioned that he would release a new
version of Luke based on Lucene 2.0.0 soon. URL here ...
http://www.mail-archive.com/java-user@lucene.apache.org/msg08612.html.
Does anyone know whether it has been released?
Andrzej, if you are reading this, co
Pravin Shinde wrote:
I am trying to use Leading wildcard query, but I am not able to do it.
Any query with leading wildcard is failing with lexical error.
query = parser.parse( "*hi" )
JavaError: org.apache.lucene.queryParser.ParseException:
Lexical error at line 1, column 1. Encountered: "*"
Erick Erickson wrote:
As Miles said, use the DateTools (lucene) class with a DAY resolution.
That'll give you a yyyyMMdd format, which won't blow up your query with a
"TooManyClauses" exception...
Remember that Lucene deals with strings, so you want to store things in
easily-manipulated string
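
To illustrate the point about day-resolution strings, here is a minimal sketch in plain Java. It does not call Lucene; `DayResolution`, `toDayString` and `inRange` are hypothetical names, and formatting in UTC is an assumption of this sketch. It shows that yyyyMMdd strings sort in the same order as the dates they represent, so a range check is just a string comparison.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, JDK only: a stand-in for day-resolution date strings.
final class DayResolution {
    // Assumption: dates are rendered in UTC.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Convert a millisecond timestamp to an 8-character day string.
    static String toDayString(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    // Inclusive range check; works because the strings sort chronologically.
    static boolean inRange(String day, String from, String to) {
        return day.compareTo(from) >= 0 && day.compareTo(to) <= 0;
    }
}
```

With one term per day rather than one per millisecond, a range over a month touches about 30 distinct values instead of potentially millions.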
Michael J. Prichard wrote:
I guess the more I think about it I don't really care about the
minutes in the initial. All that matters is the date (i.e.
2006-07-25). The only thing I would need the time for would be for
sorting so I need to have that too. Ideas?
Store as much detail as you
Michael J. Prichard wrote:
I am working on indexing emails and have stored the date as
milliseconds. I was thinking of using a filter w/ my search that
would only return the email in that date range. I am currently
indexing as follows:
doc.add(new Field("date", (String) itemContent.get("da
headhunter wrote:
I guess the recommended way to implement paging of results is to do your own
query-results caching, right? Or does lucene also do this for me?
The other guys have covered caching of results in a general way, so I
won't go into that.
For a search application I've written I
headhunter wrote:
I am looking for a way to limit the number of search results I retrieve when
searching.
I am only interested in (let's say) the first ten hits of a query. Maybe I
want to look at hits ten to twenty too, but usually only the first results are
important.
Right now lucene search
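
A minimal sketch of the paging arithmetic being asked about, assuming the hits are already available as an ordered list. `Paging` and `page` are hypothetical names and nothing here is Lucene API; it only shows how to cut a window of hits without running past the end.

```java
import java.util.List;

// Hypothetical helper, JDK only: returns one page of an ordered hit list.
final class Paging {
    // pageIndex is zero-based; the last page may be shorter than pageSize.
    static <T> List<T> page(List<T> hits, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex >= 0 and pageSize > 0 required");
        }
        // Use long arithmetic so a large pageIndex cannot overflow int.
        int from = (int) Math.min((long) pageIndex * pageSize, hits.size());
        int to = (int) Math.min((long) from + pageSize, hits.size());
        return hits.subList(from, to);
    }
}
```

Here `page(hits, 0, 10)` gives the first ten hits and `page(hits, 1, 10)` the next ten.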
Krishnendra Nandi wrote:
Can anybody help me out on this ..?
I have to search for a particular value over multiple fields and need to
know if grouping is allowed over multiple fields
eg.
AND ( AUTHOR_NAME:krish OR EMPLOYEE_NAME:krish )
Introducing parentheses "(" is giving me lexica
On Monday 24 July 2006 08:17, Martin Braun wrote:
> I think I didn't explain my problem well enough.
>
> The harder problem for me is how to get the proposals for the
> refinement? I have a date-range of 16xx to now, for about 4 bn. docs.
> So the number of found documents could be quite large. Bu
Martin Braun wrote:
I want to realize a drill-down Function aka "narrow search" aka "refine
search".
I want to have something like:
Refine by Date:
* 1990-2000 (30 Docs)
* 2001-2003 (200 Docs)
* 2004-2006 (10 Docs)
But not only DateRanges but also for other Categories.
What I have found in t
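
A minimal sketch of the counting behind such a "Refine by Date" list, in plain Java. `RangeCounts` and `count` are hypothetical names, and the sketch assumes the year of every matching document is already in memory, which may not scale to very large result sets.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, JDK only: counts documents per inclusive year range.
final class RangeCounts {
    // ranges[i] = {firstYear, lastYear}; keys are rendered as "first-last".
    static Map<String, Integer> count(int[] docYears, int[][] ranges) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int[] r : ranges) {
            counts.put(r[0] + "-" + r[1], 0); // keep empty ranges visible
        }
        for (int year : docYears) {
            for (int[] r : ranges) {
                if (year >= r[0] && year <= r[1]) {
                    counts.merge(r[0] + "-" + r[1], 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

The same loop works for other categories if the range test is replaced by an equality test on the category value.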
Andrzej Bialecki wrote:
lude wrote:
As Luke was released with a Lucene-1.9
Where did you get this information? From all I know Luke is based on
Lucene
Version 1.4.3.
The latest version of Luke was released with an early snapshot of 1.9.
I plan to release a 2.0-based version in a f
On Wednesday 03 May 2006 14:56, Mathias Keilbach wrote:
> I have a question concerning the internal searching behavior of lucene. How
> does lucene get a hit? If I search for a term, will each index document
> be checked for this term or is there an internal relation between terms and
> lucene d
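
For readers of this thread: Lucene keeps an inverted index, i.e. a mapping from each term to the documents containing it, so a term lookup does not scan every document. The following is a toy sketch of that idea in plain Java, not Lucene's actual data structures; `TinyInvertedIndex` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy illustration only: term -> sorted set of document ids.
final class TinyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Index a document and return the id assigned to it.
    int add(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // A lookup is a single map access, independent of the number of documents.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                                     Collections.emptySortedSet());
    }
}
```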
On Tue, 2005-12-13 at 11:51 -0800, Chris Hostetter wrote:
> As i mentioned in the comments for LUCENE-323,
> DistributingMultiFieldQueryParser seems to be more of a demo of what's
> possible with DisjunctionMaxQuery -- not necessarily a full fledged
> QueryParser. I think that's why it wasn't com
On Mon, 2005-12-12 at 15:35 -0800, Chris Hostetter wrote:
> : Oh, BTW: I just found the DisjunctionMaxQuery class, recently added it
> : seems. Do you think this query structure could benefit from using it
> : instead of the BooleanQuery?
>
> DisjunctionMaxQuery kicks ass (in my opinion), and It
On Tue, 2005-12-06 at 09:35 +, Alan Chandler wrote:
> I added a date field to a document with
>
> doc.add(Field.keyword("A Date",myDate));
>
> How do I get it back out again as a date?
You should be able to use the
org.apache.lucene.document.DateField#stringToDate(String) method.
Miles
the tokens it creates won't match the values in your field,
because they have to be an exact match.
The StandardAnalyzer is the analyzer Luke uses by default. It will make
the search terms lower case, and AFAIK it almost removes numbers from
the query.
--
Miles Barr
your dates into Lucene's date representation. Of course
you'd have to update your index to store the date in the same format.
Miles Barr
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
return null;
}
};
}
}
PerFieldAnalyzerWrapper result =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer());
result.addAnalyzer("publisher", new KeywordAnalyzer());
QueryParser parser = new QueryParser(,
ential problem might be random access, since I think streams are
sequentially accessed. If the index isn't too big you could have your
JARDirectory class just wrap a RAMDirectory and just load the contents
of the JAR into memory.
safe. Check out this article for more:
http://www-128.ibm.com/developerworks/java/library/j-threads1.html
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
eadsafe object in a threaded environment is fairly
standard in Java, just wrap it in a synchronized block.
If you don't want all threads waiting on one query parser, create a pool
of them.
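
A minimal sketch of the pooling idea, using only the JDK. `SimplePool` is a hypothetical class and the pooled type is generic here; in the case discussed it would hold query parser instances. Each caller borrows one instance, uses it exclusively, and returns it, so threads only wait when every instance is busy.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool for objects that are not thread-safe.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrow an instance, run the work, and always hand the instance back.
    <R> R use(Function<T, R> work) {
        T item;
        try {
            item = idle.take(); // blocks while all instances are in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        try {
            return work.apply(item);
        } finally {
            idle.add(item);
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

A pool of size one behaves like the single synchronized block; larger sizes trade memory for less waiting.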
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
hen you need to look up a particular
document, e.g. to delete it.
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
.
What analyzer did you pass to the IndexWriter?
Also you shouldn't rely on the document ID because it is not fixed for a
given document. I believe it changes when you optimize the index.
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
e deleted.
When you call IndexReader#delete(Term) what value is returned? It should
return the number of matching documents it has deleted.
If this value is 0, then your term is incorrect.
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
der to load the data or
not, but it probably does. In which case you need to recreate the
reference when deserializing the object. If you deserialize it in
another JVM or another computer it's not obvious what this reference
shoul
field to get back all the documents.
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
tegory filtering against the
database (which holds document/category information). Lucene holds no
category information in this case
2. Take the query, look up the relevant category information in the
database and expand the query so it only picks up t
er.close();
        reader = null;
    }
    if (writer != null) {
        writer.optimize();
        writer.close();
        writer = null;
    }
}
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
On Fri, 2005-04-01 at 19:24 +0200, Andrzej Bialecki wrote:
> Miles Barr wrote:
> > Are there any Lucene extensions that can do simple stemming, i.e. just
> > for plurals? Or is the only stemming package available Snowball?
>
> For which language? Stemming is always languag
Are there any Lucene extensions that can do simple stemming, i.e. just
for plurals? Or is the only stemming package available Snowball?
Cheers
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
l cases.
I'll probably adopt a two stage approach.
1. Prevent duplicate documents from getting into the index in the first
place, e.g. compare MD5 hashes and file sizes, maybe make the spider
configurable to spot certain URL patterns, etc.
2. Try out the various techniques suggested in
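
Stage 1 above could be sketched roughly as follows, using only the JDK. `DuplicateFilter`, `fingerprint` and `isNew` are hypothetical names; the fingerprint combines the content length with an MD5 hash, and only exact byte-for-byte duplicates are caught.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: rejects content whose size and MD5 hash were seen before.
final class DuplicateFilter {
    private final Set<String> seen = new HashSet<>();

    // Fingerprint format (an assumption of this sketch): "<length>:<md5 hex>".
    static String fingerprint(byte[] content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return content.length + ":" + hex;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns true the first time a given content is offered, false afterwards.
    boolean isNew(byte[] content) {
        return seen.add(fingerprint(content));
    }
}
```

A document would only be passed to the indexer when `isNew` returns true.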
page would have a
'fingerprint', and hopefully you could come up with a quick way to
compare them at query time.
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
til
Chuck's patch is included. I'm also a bit worried about the performance
of this approach. It might add too much time to each query.
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
'original' copy and display it.
Or would that approach be too expensive to calculate for each search?
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
milar to the 7 already displayed.
If you like, you can repeat the search with the omitted results
included."
at the bottom of the page.
Is there anything in Lucene or one of the contrib packages that compares
two documents?
--
Miles Barr <[EMAIL PROTECTED]>
Runtim
s
>
> 'DIGITAL CAMERAS' instead of returning me the 1st doc, Or none by changing
> the slop factor
>
> Any more ideas Please do .. B(
>
> with regards
> karthik
>
>
> -Original Message-
> From: Miles Barr [mailto:[EMA
the specific
> document being returned.
It depends on what the type of leaf_category is. If you made it Keyword as
I suggested then it won't be tokenized, i.e. there's one token 'DIGITAL
CAMERA' instead of the two tokens you normally get, 'digital' and
'camera'
The highlighter contrib package does what you're looking for:
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/
By default it breaks the document into chunks roughly 100 characters
long. You can alter it to get ten words either side of the matched
term.
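
To make the "ten words either side" idea concrete, here is a plain-Java sketch. It is not the highlighter's code or API; `ContextWindow` and `around` are hypothetical names, and matching is a simple case-insensitive comparison of whitespace-separated words.

```java
import java.util.Arrays;

// Hypothetical helper, JDK only: extracts words around the first match of a term.
final class ContextWindow {
    // Returns up to wordsEachSide words before and after the first match,
    // or an empty string when the term does not occur.
    static String around(String text, String term, int wordsEachSide) {
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].equalsIgnoreCase(term)) {
                int from = Math.max(0, i - wordsEachSide);
                int to = Math.min(words.length, i + wordsEachSide + 1);
                return String.join(" ", Arrays.copyOfRange(words, from, to));
            }
        }
        return "";
    }
}
```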
--
Miles
the index ahead of time and the weights you
want to place on the different levels I'd do a query expansion. i.e.
search2:coco
would become
search2:coco^4 OR search4:coco
but actually creating the query objects rather th
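
The expansion above, in its string form only, could be sketched like this. `QueryExpansion` and `expand` are hypothetical names, the field names and boosts are supplied by the caller, and the term is not escaped, so this is only suitable for plain single-word terms.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, JDK only: expands one term across boosted fields.
final class QueryExpansion {
    // fieldBoosts maps field name -> boost; a boost of 1 is left implicit.
    // Pass an insertion-ordered map to control the order of the clauses.
    static String expand(String term, Map<String, Integer> fieldBoosts) {
        StringJoiner clauses = new StringJoiner(" OR ");
        for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
            String clause = e.getKey() + ":" + term;
            if (e.getValue() != 1) {
                clause += "^" + e.getValue();
            }
            clauses.add(clause);
        }
        return clauses.toString();
    }
}
```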
with regards
> Karthik
>
>
> -Original Message-
> From: Miles Barr [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 09, 2005 3:02 PM
> To: java-user@lucene.apache.org
> Subject: Re: SPAN QUERY [HOW TO]
>
>
> On Wed, 2005-03-09 at 14:52 +0530, Karthi
ke your implementation capable of storing files remotely.
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
both span and phrase
queries would return all the documents.
Are you trying to setup a taxonomy? i.e. only display documents in the
category Electronics > Digital Camera, and not those in sub categories?
If this is the case you should try to build the categorisation at the
same time as the inde
the order they happen. But at
least by batching them you can make the long wait infrequent.
--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective
43 matches