On Thu, Nov 25, 2010 at 2:58 AM, Uwe Schindler wrote:
> ParallelMultiSearcher, as a subclass of MultiSearcher, has the same problems.
> These are not crashes; rather, some queries do not return correctly
> scored results. This especially affects all MultiTermQueries
> (TermRang
Thanks for the input.
My results are sorted by date and I am not much bothered about the score. Will I
still be in trouble?
Regards
Ganesh
- Original Message -
From: "Robert Muir"
To:
Sent: Thursday, November 25, 2010 1:45 PM
Subject: Re: best practice: 1.4 billions documents
On Thu,
You are in trouble if you use MultiTermQuery subclasses as a negative clause in a
BooleanQuery, e.g. a range like "-[A TO B]", or even NumericRanges or Wildcards.
The query will then return incorrect results.
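A simplified model of why a negative range clause can produce wrong matches, not just wrong scores. It assumes, as a simplification, that MultiSearcher rewrites the query separately on each sub-searcher (so "-[b TO d]" expands only to the terms that sub-searcher happens to contain), merges the rewrites into something equivalent to their OR, and runs that merged query everywhere. This is plain Java, not Lucene code; NegativeRangeModel and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, NOT Lucene code, of the query "+required -[lo TO hi]"
// under a MultiSearcher-style "rewrite per shard, then OR the rewrites" merge.
final class NegativeRangeModel {

    static Set<String> terms(String... ts) {
        return new TreeSet<>(Arrays.asList(ts));
    }

    /** Per-shard rewrite: the terms in [lo, hi] found in that shard's term dictionary. */
    static Set<String> expandRange(Set<String> termDict, String lo, String hi) {
        Set<String> inRange = new TreeSet<>();
        for (String t : termDict) {
            if (t.compareTo(lo) >= 0 && t.compareTo(hi) <= 0) {
                inRange.add(t);
            }
        }
        return inRange;
    }

    /** Does a document match "+required -excluded1 -excluded2 ..."? */
    static boolean matches(Set<String> doc, String required, Set<String> excluded) {
        if (!doc.contains(required)) {
            return false;
        }
        for (String t : excluded) {
            if (doc.contains(t)) {
                return false;
            }
        }
        return true;
    }

    /** Intended result: the document must contain no term inside the range at all. */
    static boolean intended(Set<String> doc, String required, String lo, String hi) {
        return matches(doc, required, expandRange(doc, lo, hi));
    }

    /** Merged result: the OR of every shard's rewrite, evaluated against one document. */
    static boolean merged(Set<String> doc, String required, String lo, String hi,
                          List<Set<String>> shardTermDicts) {
        for (Set<String> dict : shardTermDicts) {
            if (matches(doc, required, expandRange(dict, lo, hi))) {
                return true; // one shard's rewrite excluded too few terms
            }
        }
        return false;
    }
}
```

With shard 1 holding the terms {a, b} and shard 2 holding {a, c}, a shard-2 document containing "a" and "c" should be excluded by "+a -[b TO d]", yet it satisfies shard 1's rewrite "+a -b", so the OR lets it through. Sorting by date instead of by score does not help in that case.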
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetap
Hi Jan,
On Wed, Nov 24, 2010 at 9:12 AM, wrote:
> Of course:
>
> We are trying to search in documents that contain text in several languages.
> We are also investigating other approaches*, so this is not about finding
> other variants.
> the goal is to only match tokens from 1 or more given la
Hi guys,
I have this problem:
I'm using Lucene to build a search engine over people's profiles.
I have a set of hobbies (let's say {"reading", "singing"} for example), and
I want to find people who have at least one of these hobbies AND find out which of
these hobbies each of them has.
Currently I search for eac
Can't you just store the hobbies as standard stored fields
(Field.Store.YES), or as a single field, call doc.get("hobbies") and
do what you want with them?
This sounds rather like faceting - if so you might want to consider
using Solr. http://wiki.apache.org/solr/SolrFacetingOverview
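A rough sketch of the stored-field idea, in plain Java so it runs on its own. It assumes each person's hobbies were stored in a single "hobbies" field as space-separated words (the format is an assumption), so the String passed in stands for what doc.get("hobbies") would return for a hit. It reports which of the requested hobbies a hit has, plus a facet-style count across hits; HobbyMatcher is a hypothetical name.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not Lucene API: works on the stored "hobbies" value
// (assumed space-separated) that doc.get("hobbies") would return for each hit.
final class HobbyMatcher {

    /** Which of the wanted hobbies appear in one stored "hobbies" value. */
    static Set<String> matchedHobbies(String storedHobbies, Set<String> wanted) {
        Set<String> found = new TreeSet<>();
        for (String h : storedHobbies.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (wanted.contains(h)) {
                found.add(h);
            }
        }
        return found;
    }

    /** Facet-style counts: how many hits have each wanted hobby. */
    static Map<String, Integer> countByHobby(List<String> storedValues, Set<String> wanted) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String stored : storedValues) {
            for (String h : matchedHobbies(stored, wanted)) {
                counts.merge(h, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For larger result sets, the Solr faceting linked above does this counting server-side instead.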
--
Ian.
O
Hello List,
Lucene 3.0.1
Windows Vista Premium Home Edition
I am currently attempting to configure my IndexFiles.java file. My intention is
to add the following functionality to the code, as I need the input text to be
analyzed further than the default analyzer does.
IndexWriter writer = n
I used KeywordAnalyzer and KeywordTokenizer as templates for
a new analyzer.
The analyzer works fine, but the result never reaches the index.
My analyzer is called in "DocInverterPerField.processFields"
with "stream.incrementToken()".
...
try {
boolean hasMoreTokens = stream.incrementToken();
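For reference, a plain-Java model (a hypothetical class, not Lucene's TokenStream or KeywordTokenizer) of the contract a KeywordTokenizer-style stream follows: the consumer loops while incrementToken() returns true and reads the current term after each call, and a keyword-style stream returns true exactly once with the whole input, then false. If a tokenizer is reused across documents, its "already emitted" flag has to be cleared on reset, otherwise later documents get no tokens. Whether that is what happens here is only a guess; it is one thing worth checking in a tokenizer copied from KeywordTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT Lucene's API: models "emit the whole input once,
// then stop" plus the consumer loop that calls incrementToken().
final class SingleTokenStream {
    private String input;
    private String term;   // current token, valid after incrementToken() returned true
    private boolean done;  // true once the single token has been emitted

    SingleTokenStream(String input) {
        reset(input);
    }

    /** Re-arms the stream for a new input; forgetting to clear 'done' silences later inputs. */
    void reset(String newInput) {
        this.input = newInput;
        this.term = null;
        this.done = false;
    }

    /** Returns true exactly once per input, with the whole input as the term. */
    boolean incrementToken() {
        if (done) {
            return false;
        }
        done = true;
        term = input;
        return true;
    }

    String term() {
        return term;
    }

    /** The consumer side: loop while incrementToken() is true, reading the term each time. */
    static List<String> drain(SingleTokenStream stream) {
        List<String> terms = new ArrayList<>();
        while (stream.incrementToken()) {
            terms.add(stream.term());
        }
        return terms;
    }
}
```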
What I call a "profile" is free text (extracted from a PDF), not the result
of the user listing hobbies in a form.
So to store hobbies in a field called "hobbies", I first have to extract them
from the text... Is it possible to do that using Lucene?
-Original Message-
From: Ian Lea [mailto:ian
The normal technique is to write your own analyzer. See
http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_write_my_own_Analyzer.3F.
Then pass that to IndexWriter - and be sure to use the same analyzer
when you are searching, unless you're doing clever things.
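A toy illustration (plain Java, not Lucene; all names hypothetical) of why the same analyzer should be used at index and search time: terms only match when both sides were normalised the same way. One analyzer here lowercases and splits on non-letters; the other only splits on whitespace and keeps case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical toy inverted index, NOT Lucene: term -> ids of documents containing it.
final class SameAnalyzerDemo {

    /** Lowercases and splits on anything that is not a letter. */
    static List<String> lowercasing(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Splits on whitespace only and keeps the original case. */
    static List<String> caseSensitive(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Index time: analyse each document and record term -> document ids. */
    static Map<String, Set<Integer>> index(List<String> docs, Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    /** Search time: analyse the query, then OR together the postings of its terms. */
    static Set<Integer> search(Map<String, Set<Integer>> postings, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyzer.apply(query)) {
            hits.addAll(postings.getOrDefault(term, Set.of()));
        }
        return hits;
    }
}
```

Indexing "Lucene in Action" with lowercasing and querying "LUCENE" with the same analyzer finds the document; querying "Lucene" with caseSensitive finds nothing, because the index only holds "lucene".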
--
Ian.
On Thu, Nov 25, 2010 a
You could parse the output from the lucene analyzer that you are using
to get hold of a list of terms and pick the ones that are hobbies. Or
do it outside lucene using whatever string parsing technique you like.
Or take a look at the recent thread on this list on a similar topic:
"High frequency
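A minimal sketch of picking hobbies out of the analyzer's output, under assumptions: in Lucene 3.0 the tokens would come from analyzer.tokenStream(...), looping on incrementToken() and reading the TermAttribute. To keep the example self-contained, a lowercase-and-split step stands in for the analyzer. The hobby vocabulary, the class name ProfileHobbyExtractor, and the idea of also checking adjacent word pairs (for two-word hobbies such as "rock climbing") are illustrative assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not Lucene API. The lowercase/split step stands in for
// reading terms from an analyzer's TokenStream.
final class ProfileHobbyExtractor {

    /** Returns the vocabulary entries found in the text, in order of first occurrence. */
    static Set<String> extractHobbies(String profileText, Set<String> vocabulary) {
        String[] tokens = profileText.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].isEmpty()) {
                continue; // leading punctuation yields an empty first token
            }
            if (vocabulary.contains(tokens[i])) {
                found.add(tokens[i]); // one-word hobby
            }
            if (i + 1 < tokens.length) {
                String pair = tokens[i] + " " + tokens[i + 1];
                if (vocabulary.contains(pair)) {
                    found.add(pair); // two-word hobby
                }
            }
        }
        return found;
    }
}
```

For "I enjoy Reading, rock climbing and singing." and the vocabulary {reading, singing, rock climbing}, this returns reading, rock climbing, singing, in that order.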
Hi Simon,
On 25.11.2010 10:40, ext Simon Willnauer wrote:
Hi Jan,
On Wed, Nov 24, 2010 at 9:12 AM, wrote:
Of course:
We are trying to search in documents that contain text in several languages. We
are also investigating other approaches*, so this is not about finding other
variants.
the go
On Thu, Nov 25, 2010 at 3:25 PM, Jan Kurella wrote:
> Hi Simon,
>
> On 25.11.2010 10:40, ext Simon Willnauer wrote:
>>
>> Hi Jan,
>>
>> On Wed, Nov 24, 2010 at 9:12 AM, wrote:
>>>
>>> Of course:
>>>
>>> We are trying to search in documents that contain text in several
>>> languages. We are also i
Thanks a lot.
I used the Lucene analyzer to parse the profile and everything works :)
-Original Message-
From: Ian Lea [mailto:ian@gmail.com]
Sent: Thursday, November 25, 2010 14:52
To: java-user@lucene.apache.org
Subject: Re: Retrieve found keywords from document
You could parse the
What is your evidence that "the result never reaches the index"?
Are you sure:
1> you commit afterwards?
2> you reopen the underlying reader to see the changes?
3> if you don't store the value for the field, how are you sure?
4> if you search and don't find it, did you index it?
First, I'd be sure the value in
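To make points 1> and 2> concrete, a toy model (plain Java, hypothetical names, not Lucene): a reader sees the index as it was committed when the reader was opened, so it misses both uncommitted documents and commits made after it was opened. In Lucene 3.0 terms this roughly corresponds to calling IndexWriter.commit() and then IndexReader.reopen() (or opening a new IndexSearcher) before searching.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model with hypothetical names, NOT Lucene: a reader is a snapshot of the
// commit point that existed when it was opened.
final class SnapshotIndex {
    private final List<String> buffered = new ArrayList<>();  // added but not committed
    private final List<String> committed = new ArrayList<>(); // visible to newly opened readers

    void addDocument(String id) {
        buffered.add(id);
    }

    void commit() {
        committed.addAll(buffered);
        buffered.clear();
    }

    /** Opens (or "reopens") a reader over the current commit point. */
    Snapshot openReader() {
        return new Snapshot(new ArrayList<>(committed));
    }

    static final class Snapshot {
        private final List<String> docs;

        Snapshot(List<String> docs) {
            this.docs = docs;
        }

        boolean contains(String id) {
            return docs.contains(id);
        }
    }
}
```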
field.fieldsData is used for the stored field contents and is therefore only *stored*
in the index, not analyzed, of course (why should a stored field be analyzed?). The
indexed tokens do, of course, go through your analyzer, and the tokens it returns
are indexed as terms. Where is the problem?
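A small model of that split (plain Java, a hypothetical class, not Lucene's Field): the stored value comes back unchanged from doc.get(...) even when the analyzer did its job, while only the analyzer's output is searchable. So seeing "foo" in the stored field does not by itself mean the analyzer was bypassed; checking the index needs a search for the analyzed term.

```java
import java.util.List;
import java.util.function.Function;

// Toy model, NOT Lucene: one field value that is both stored and analyzed.
final class StoredVsIndexed {
    final String storedValue;        // what doc.get(field) would hand back
    final List<String> indexedTerms; // what a term search would have to match

    StoredVsIndexed(String value, Function<String, List<String>> analyzer) {
        this.storedValue = value;                               // kept verbatim, never analyzed
        this.indexedTerms = List.copyOf(analyzer.apply(value)); // only the analyzer's output
    }
}
```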
-
Uwe Schindler
H.-H.-Me
Hi Erik,
my evidence is that I load a single document into an empty index
with a field "id" and a second field "dcdocid". The field "dcdocid"
contains the word "foo". This goes through my analyzer and is converted into
the MD5 string "acbd18db4cc2f85cedef654fccc4a4d8".
After indexing and commit a se
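A quick way to double-check the expected term, sketched with the JDK's MessageDigest only (Md5Term is a hypothetical helper, not the poster's analyzer). If the analyzer replaces "foo" with its MD5 hex digest, then that digest, not "foo", is what has to be searched for (for example with a TermQuery, which does not analyze its term), while a stored "foo" is still what doc.get("dcdocid") returns.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes the lowercase hex MD5 digest of a string, i.e.
// the term an MD5-hashing analyzer would be expected to emit for it.
final class Md5Term {

    static String md5Hex(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

md5Hex("foo") returns "acbd18db4cc2f85cedef654fccc4a4d8", the same value quoted above.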
Hi Uwe,
my fieldType and fields are as follows:
So the field dcdocid has the attribute *stored* which I can also see
in the debugger.
Why should I analyze a stored field?
I don't know whether I need to analyze it; I also tried a filter, but without success.
My understanding is to send somet