Thanks Erick, that's what I thought.
In my case no phrase queries are done, so it seems I am good to go.
Any additional thoughts on the issue are welcome.
Thanks
Erick Erickson wrote:
>
> No, a phrase search will NOT match. Phrase semantics
> requires that split tokens be adjacent (slop of 0).
Why are you doing this in the first place? Do you actually have
evidence that the default Lucene behavior (caching, etc) is inadequate
for your needs?
I'd *strongly* recommend, if you haven't already, just using the regular
FSDirectories rather than RAMDirectories, and only getting
complex if that's too slow.
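A minimal sketch of the disk-backed variant of the code quoted further down (the path variables follow the poster's naming and are assumptions; the OS file cache will keep the hot parts of the index in RAM without you managing it):

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;

// Same MultiSearcher setup, but each IndexSearcher opens an
// FSDirectory internally instead of loading the index into a RAMDirectory.
Searchable[] searcher_a = {
    new IndexSearcher(index_one_path),
    new IndexSearcher(index_two_path)
};
MultiSearcher searcher = new MultiSearcher(searcher_a);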
An IndexReader doesn't see changes in the index unless you close
and reopen it, but if there is significant time between when you
fetch your docid and when you read its vector, that could be a problem.
You can always use TermEnum/TermDocs to find the doc ID
associated with a particular field value you have indexed.
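Something like this sketch (the field name "myId", the value, and the index path are made up; the id field must be indexed untokenized for the Term lookup to work):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermFreqVector;

IndexReader reader = IndexReader.open("/path/to/index");
// Walk the postings for the unique id term; at most one doc should match.
TermDocs td = reader.termDocs(new Term("myId", "12345"));
if (td.next()) {
    int docId = td.doc();  // the internal Lucene doc id
    TermFreqVector tfv = reader.getTermFreqVector(docId, "contents");
    // ... use tfv ...
}
td.close();
reader.close();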
Hello,
I have been using a large, in-memory MultiSearcher that
is reaching the limits of my hardware's RAM, with this code:
try
{
    IndexSearcher[] searcher_a =
    {
        new IndexSearcher(new RAMDirectory(index_one_path)),
        new IndexSearcher(new RAMDirectory(index_two_path)),
        ...
I'm interested in getting the term vector of a Lucene doc. The point is,
it seems I have to give IndexReader.getTermFreqVector a doc ID,
while I would like to know whether there is a way to get the term vector by a
document identifier of my own (one of my fields, not the Lucene doc id). I know
how to get the Lucene doc id…
Stefan Colella wrote:
> I tried to only add the content of the page where that expression can be
> found (instead of the whole document) and then the search works.
>
> Do I have to split my pdf text into more fields? Or what could be the
> problem?
Perhaps IndexWriter's setMaxFieldLength() is relevant here.
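For context, IndexWriter silently stops indexing a field after 10,000 terms by default, which would explain why text deep in a long document isn't found. A sketch of raising the limit (the path and analyzer choice are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer =
    new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
// Index every term of each field instead of only the first 10,000.
writer.setMaxFieldLength(Integer.MAX_VALUE);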
No, a phrase search will NOT match. Phrase semantics
requires that split tokens be adjacent (slop of 0). So, since
"mainstrasse" was split into two tokens at index time, the test for
"is schöne right next to strasse" will fail because of the intervening
(introduced) term "main". Whether this is a problem depends on whether
your users ever issue phrase queries.
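To make the slop point concrete, a sketch of the phrase query against the indexed tokens [schöne][main][strasse] (the field name "address" is invented):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

PhraseQuery pq = new PhraseQuery();
pq.add(new Term("address", "schöne"));
pq.add(new Term("address", "strasse"));
pq.setSlop(0); // exact adjacency: does NOT match "schöne main strasse"
// pq.setSlop(1) would tolerate the one intervening token and match.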
I will never have "mainstrasse" in my lucene index, since strasse is always
replaced with " strasse", causing "mainstrasse" to be split into "main
strasse".
So the example you gave:
"schöne strasse" will match "schöne mainstrasse", since in the lucene index
I have "schöne main strasse".
Daniel Naber
Actually, before you jump in, be warned that the "+" plus sign is also
part of the query parser syntax.
You cannot really/easily pass a query containing "+" through the
query parser and get a match.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
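If you do want a literal "+" to survive parsing, one option is to escape the user input before it reaches the parser. A sketch (the field name and analyzer are assumptions, and note that most analyzers will still strip the character afterwards, which is why the tokenizer change below is needed too):

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

Query parsePlus(String userInput) throws ParseException {
    QueryParser parser = new QueryParser("contents", new WhitespaceAnalyzer());
    // QueryParser.escape turns "C++" into "C\+\+" so "+" is no longer
    // read as the required-clause operator.
    return parser.parse(QueryParser.escape(userInput));
}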
What you need to do is to create your own tokenizer. Just copy the code
from the StandardTokenizer to your XYZTokenizer and make your changes.
Then you need to create your own Analyzer class (again, copy the code
from the StandardAnalyzer) and use your XYZTokenizer in the new
XYZAnalyzer you create.
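If the goal is only to keep "+" (as in the question further down), a simpler sketch than copying the JavaCC grammar is a CharTokenizer subclass; the class names here are invented:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

// Tokenizer that treats '+' as a token character, so "C++" stays one token.
class PlusTokenizer extends CharTokenizer {
    public PlusTokenizer(Reader in) { super(in); }
    protected boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '+';
    }
}

class PlusAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseFilter(new PlusTokenizer(reader));
    }
}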
Does it mandate that you pass data through Hibernate? This seems very
similar to Compass' approach.
I believe a more generic approach is to compare what's already indexed
with what's changed or deleted, so you can use any framework to work
with Lucene, simply selecting the data and creating the Lucene documents from it.
On Monday 21 May 2007 22:53, bhecht wrote:
> If someone searches for mainstrasse, my tools will split it again to
> main and strasse, and then lucene will be able to find it.
"strasse" will match "mainstrasse" but the phrase query "schöne strasse"
will not match "schöne mainstrasse". However, th
Thanks Daniel,
But when searching, I will run my "standardization" tools again before
querying Lucene, so what you mentioned will not be a problem.
If someone searches for mainstrasse, my tools will split it again to main
and strasse, and then lucene will be able to find it.
Daniel Naber-5 wrote:
On Monday 21 May 2007 22:05, bhecht wrote:
> Is there any point for me to start creating custom analyzers with filter
> for stop words, synonyms, and implementing my own "sub string" filter,
> for separating tokens into "sub words" (like "mainstrasse"=> "main",
> "strasse")
Yes: I assume your documents…
Hi there,
I started using Lucene not long ago, with plans to replace the current
sql queries in my application with it.
As I wasn't aware of Lucene before, I had implemented some tools
(filters) similar to those Lucene includes.
For example, I have implemented a "stop word" tool.
In my case I have…
Hi there,
I was interested in changing the StandardTokenizer so it will not remove the
"+" (plus) sign from my stream.
Looking in the code and documentation, it reads:
"If this tokenizer does not suit your application, please consider copying
this source code directory to your project and maintain your own
grammar-based tokenizer."
If you are using Oracle and Lucene, check out
http://www.hibernate.org/410.html "Hibernate Search"; this will
automatically update your lucene index on any change to your database table.
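A hedged sketch of what the mapping looks like with Hibernate Search annotations (the entity and field names are made up; see the link above for the real setup):

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed            // mirror this table into a Lucene index
public class Address {
    @Id @DocumentId // the database id doubles as the Lucene document id
    private Long id;

    @Field          // tokenized and indexed; kept in sync on update/delete
    private String street;
}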
Erick Erickson wrote:
>
> You have to delete the old document and add a new one.
>
> See the IndexModifier class.
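A sketch of that delete-then-add pattern (the "id" field name and value are hypothetical; the field must be unique and untokenized, and newDoc is the replacement Document):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexModifier;
import org.apache.lucene.index.Term;

IndexModifier modifier =
    new IndexModifier("/path/to/index", new StandardAnalyzer(), false);
modifier.deleteDocuments(new Term("id", "12345")); // drop the old version
modifier.addDocument(newDoc);                      // add the replacement
modifier.close();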
Hi Ian
Well it worked. Thanks :)
Wasn't aware that that could have been the fix, but after your suggestion it
seemed like the most logical solution.
/Svend
Mon, 21 05 2007 at 14:30 +0100, Ian Lea wrote:
> Hi
>
>
> I saw this or something similar going from 2.0 to 2.1 when I hadn't
> recompiled all my lucene related code.
Hello,
Thank you for your quick answer.
I use Luke to examine the index, but since I switched to FrenchAnalyzer, it
says 'Not a Lucene index'.
If I open the index files in a text viewer, the strings are in UPPER case.
I do use the same analyzer to index and search.
So, do I have to specify the FrenchAnalyzer…
Mike Klaas wrote:
> On 18-May-07, at 1:01 PM, charlie w wrote:
>> Is there an upper limit on the number of fields comprising a document,
>> and if so what is it?
>
> There is not. They are relatively costless if omitNorms=False
Mike, I think you meant "relatively costless if omitNorms=True".
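The reason norms matter here: with norms enabled, the searcher loads one byte per document for every indexed field, which adds up quickly with very many fields. A sketch of turning them off per field (the field name is illustrative):

import org.apache.lucene.document.Field;

Field f = new Field("color_47", "blue", Field.Store.NO, Field.Index.TOKENIZED);
// No norms byte per document for this field
// (at the cost of length normalization and index-time boosts in scoring).
f.setOmitNorms(true);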
Hi
I saw this or something similar going from 2.0 to 2.1 when I hadn't
recompiled all my lucene related code. It went away when everything
was recompiled, so I'd guess you've got an old class file lurking
somewhere.
--
Ian.
On 5/21/07, Svend Ole Nielsen <[EMAIL PROTECTED]> wrote:
Hi
I have tried to upgrade from 2.0 -> 2.1…
First have you gotten a copy of Luke to examine your index to see
what's actually indexed?
The default behavior is usually to lowercase everything, but I'm not
entirely sure whether the French analyzer does this. I suspect it does.
Searches are case sensitive. To get caseless searching, you need
to push terms through a lowercasing filter at both index and query time,
typically by using the same analyzer for both.
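One way to do that, sketched under the assumption that FrenchAnalyzer's own filter order is the culprit (the wrapper class name is invented): force a LowerCaseFilter to run last, and use this same analyzer for both the IndexWriter and the QueryParser.

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;

class LowerCasingFrenchAnalyzer extends Analyzer {
    private final Analyzer delegate = new FrenchAnalyzer();
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Lowercase after all of FrenchAnalyzer's own filters,
        // so "VEHICLE" and "vehicle" end up as the same token.
        return new LowerCaseFilter(delegate.tokenStream(fieldName, reader));
    }
}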
Hi
I have tried to upgrade from 2.0 -> 2.1 to overcome some NFS issues. It
compiles just fine, but when I run the application and try to add a
document it throws an exception stating NoSuchMethod. This happens when
I try to add an object of type Field to a newly created empty Document.
I have erased…
Hello,
I tried org.apache.lucene.analysis.fr.FrenchAnalyzer and I got strange
search results on strings in uppercase (example: VEHICLE).
When I search for the string in lower case, I get no result. I do get results
if I use "vehicle*", "vehiclE", "vehicLe", etc.
What is odd is that it only affects…
Hello,
Thanks for your reply. I used the explain method and now I understand why some
documents are returned.
I am using the same Analyzer for indexing and searching.
I tried to only add the content of the page where that expression can be
found (instead of the whole document) and then the search works.
Peter Bloem wrote:
[...]
"+(A B) C D E"
[...]
In other words, Lucene considers all documents that
have both A and B, and ranks them higher if they also have C D or E.
Hello Peter,
to my understanding, "+(A B) C D E" means at least one of the terms "A"
or "B" must be contained, and the terms C, D and E are optional.