Hi,
I think using Field.Index.NOT_ANALYZED means the StandardAnalyzer is bypassed, so
"sql. server" is indexed as a single term. You can use Luke to see how this field
is indexed.
In that case we can only search for the whole term (not even with a change of case); if using
the StandardAnalyzer to analyze "sql. server" w
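To make the NOT_ANALYZED vs. analyzed difference concrete, here is a rough stdlib-only sketch. It does not use Lucene. The `analyzed` method is an assumed approximation (lowercase, split on anything that is not a letter or digit), not StandardAnalyzer itself. It only shows why an untokenized field holds one term, "sql. server", while an analyzed one holds the two terms sql and server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenSketch {
    // Rough stand-in for an analyzed field: lowercase the text, then split on
    // any run of characters that are not letters or digits. This is NOT
    // StandardAnalyzer, only an approximation of how it treats "sql. server".
    static List<String> analyzed(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A NOT_ANALYZED field keeps the whole value, unchanged, as a single term.
    static List<String> notAnalyzed(String text) {
        List<String> single = new ArrayList<>();
        single.add(text);
        return single;
    }

    public static void main(String[] args) {
        String value = "sql. server";
        System.out.println("NOT_ANALYZED: " + notAnalyzed(value)); // [sql. server]
        System.out.println("analyzed:     " + analyzed(value));    // [sql, server]
    }
}
```

With the single-term form, only an exact match on "sql. server" (same case, same punctuation) can hit; with the tokenized form, "Sql Server" and "sql. server" reduce to the same two terms.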
Hi,
I am using SpanQuery and SpanNearQuery to run a phrase query like "Sql Server".
In the text file I am searching, it appears as (sql. server), meaning
'sql dot server', which is not a span like "Sql Server".
When searching for the phrase query "Sql Server", it gives results for (sql.
ser
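One likely explanation is that once the period is dropped during analysis, "sql" and "server" end up at adjacent positions, so an in-order near match with slop 0 succeeds. The sketch below is a simplified, stdlib-only model of an ordered SpanNearQuery over single-term clauses. It is not Lucene's implementation. It assumes the document has already been tokenized into lowercase terms. It treats "slop" as the total number of positions skipped between consecutive terms.

```java
import java.util.List;

public class SpanNearSketch {
    // Simplified model of an ordered SpanNearQuery over single-term clauses:
    // the terms must occur in the given order, and the total number of
    // positions skipped between consecutive terms must be <= slop.
    static boolean orderedNear(List<String> tokens, List<String> terms, int slop) {
        if (terms.isEmpty()) {
            return false;
        }
        for (int start = 0; start < tokens.size(); start++) {
            if (!tokens.get(start).equals(terms.get(0))) {
                continue;
            }
            int prev = start;
            boolean matched = true;
            // Greedily take the earliest later occurrence of each following term;
            // that yields the smallest possible last position, hence the least slop.
            for (int k = 1; k < terms.size() && matched; k++) {
                int next = -1;
                for (int p = prev + 1; p < tokens.size(); p++) {
                    if (tokens.get(p).equals(terms.get(k))) {
                        next = p;
                        break;
                    }
                }
                if (next < 0) {
                    matched = false;
                } else {
                    prev = next;
                }
            }
            // Total skipped positions = lastPos - firstPos - (number of terms - 1).
            if (matched && prev - start - (terms.size() - 1) <= slop) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed tokens for "sql. server" after a punctuation-dropping analyzer.
        List<String> doc = List.of("sql", "server");
        // Query "Sql Server", lowercased the same way.
        List<String> query = List.of("sql", "server");
        System.out.println(orderedNear(doc, query, 0)); // true
        System.out.println(orderedNear(List.of("sql", "dot", "server"), query, 0)); // false
    }
}
```

If the dot had been kept as a token (or replaced by something occupying a position), slop 0 would no longer match, which is the behaviour the question seems to expect.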
I didn't have any errors or exceptions. Sorry to be dense, but what exactly
is the "infoStream output" you're asking about?
>This is not expected.
>
>Did the last IW exit "gracefully"? If so, it should delete the old
>segments after swapping in the optimized one.
>Can you post infoStre
Sorry, I mean "let you specify numTerms".
Mike
On Wed, Feb 9, 2011 at 6:16 PM, Michael McCandless wrote:
> Hmm, which version of Lucene are you using? Newer versions let you
> specify a field...
>
> Mike
>
> On Wed, Feb 9, 2011 at 12:06 PM, Pablo Mendes wrote:
>> Guys,
>> this is tiny and prob
Hmm, which version of Lucene are you using? Newer versions let you
specify a field...
Mike
On Wed, Feb 9, 2011 at 12:06 PM, Pablo Mendes wrote:
> Guys,
> this is tiny and probably not relevant. But I'll bet a beer that at least a
> dozen people had to dirtymod this class while they could have r
This is not expected.
Did the last IW exit "gracefully"? If so, it should delete the old
segments after swapping in the optimized one.
Can you post infoStream output after running optimize?
Mike
On Wed, Feb 9, 2011 at 1:58 PM, Phil Herold wrote:
> I know that the size of a Lucene index can do
I know that the size of a Lucene index can double while optimization is
underway, but it's supposed to eventually settle back down to the original
size, correct? We have a Lucene index of 100K documents that is
normally about 12GB in size. It is split across 10 sub-indexes which we
sear
Guys,
this is tiny and probably not relevant. But I'll bet a beer that at least a
dozen people had to dirtymod this class while they could have run it from the
command line.
A 15 min time save that took 15 min to create. I guess it's a tie.
Best,
Pablo
--- HighFreqTerms.java
+++ ExtractStopwords.java
One way, not necessarily typical or best practice, but known to work,
is to designate one of your WS layer machines, or another server, as
the master indexer. Run all index updates on that server and copy
indexes out to other server(s) using rsync. That is normally quick
since it only takes chang
http://lucene.apache.org/java/3_0_3/queryparsersyntax.html should have
answers to your query formulation questions. You might also like to
consider building the queries yourself programmatically and using Span
queries. They are good for specifying slop and order and so on. There
is useful info on
Have you considered using stemming instead? Sounds like that might
make most of your problems go away and achieve the same result.
I'm not aware of a utility method to remove stop words from a string
but there are ways of passing data through analyzers/tokenizers and
grabbing the output. Standard
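As a plain-Java illustration of "pass the text through and keep what survives", here is a minimal sketch that lowercases, tokenizes, and drops stop words. It does not call Lucene. The stop set is a deliberately tiny, hypothetical list. In real code you would run the string through an actual Analyzer (with a StopFilter) so the output matches what is indexed.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class StopWordSketch {
    // Hypothetical, deliberately small stop set for illustration only.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to", "in", "is");

    // Lowercase, split on non-letter/non-digit runs, drop empty tokens and
    // stop words, then join the surviving tokens with single spaces.
    static String removeStopWords(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .filter(t -> !STOP_WORDS.contains(t))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(removeStopWords("The index of a search engine")); // index search engine
    }
}
```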
Hi,
I'm building my own BooleanQuery (rather than using Query Parser). That's
because I need different defaults from my users:
If a user types: java program
I need to run the query: +java* +program* (i.e. an AND search, with prefix matching so as
to hit "programS", "programMER").
So naively I split the
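A minimal, stdlib-only sketch of the "every word required, each as a prefix" default might look like this. It produces the query in the same +java* +program* notation used above. It assumes the index was built with a lowercasing analyzer, since a prefix query built by hand does not pass through an analyzer. Escaping of query-syntax special characters is not handled.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

public class PrefixAndQuerySketch {
    // "java program" -> "+java* +program*": every word becomes a required
    // (+) prefix (*) clause. Words are lowercased on the assumption that the
    // indexed terms were lowercased by the analyzer.
    static String toRequiredPrefixQuery(String userInput) {
        return Arrays.stream(userInput.trim().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> "+" + w.toLowerCase(Locale.ROOT) + "*")
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toRequiredPrefixQuery("java program")); // +java* +program*
    }
}
```

When building the BooleanQuery directly instead of a string, the same loop would add one PrefixQuery per word with BooleanClause.Occur.MUST, using whatever field name your index actually has.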
I've just read about payloads, and I'm not sure it would be easy to use that
feature in calculating distances for SpanQueries. I guess you would have to
build your own SpanQuery, using payloads instead of term positions. But it would
indeed be much lighter on index size, if it were possible.
I ho