As you can see from the example code for PartOfSpeechTaggingFilter at
http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analysis/package-summary.html
you can use a custom analyzer to inject "metadata" tokens into the index at
the same position as the source tokens.
For example, given t
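A minimal sketch of that position-increment trick, assuming the Lucene 3.x attribute API; this is not the PartOfSpeechTaggingFilter from the javadocs, and the filter name and the hard-coded "NOUN" tag are invented for illustration:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class MetadataInjectingFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
    private State savedState;        // attributes of the source token, copied onto the injected one
    private String pendingMetadata;  // metadata token still to be emitted, if any

    public MetadataInjectingFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (pendingMetadata != null) {
            restoreState(savedState);                // same offsets etc. as the source token
            termAtt.setEmpty().append(pendingMetadata);
            posIncrAtt.setPositionIncrement(0);      // 0 = stacked at the same position
            pendingMetadata = null;
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        savedState = captureState();
        pendingMetadata = "NOUN";                    // hypothetical metadata for every token
        return true;                                 // emit the source token unchanged first
    }
}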
Hi Mike,
thanks for the info.
As far as I know, a write.lock is created by an IndexWriter.
So I have to dig into why an IndexWriter is created just
when starting Solr with an optimized index.
The problem is, this happens only with a huge index.
Also, old parts of the index are not cleaned up.
May
On 02/05/2011 23:36, Paul Taylor wrote:
Hi
Nearing completion on a new version of a Lucene search component for
the http://www.musicbrainz.org music database and having a problem
with performance. There are a number of indexes, each built from data
in a database; there is one index for albums,
Well, it is not only with a huge index.
It happens only if ReplicationHandler is in use on a master.
If ReplicationHandler is configured to replicateAfter startup, it first
sends a commit via IndexWriter to get a "stable" index. The leftover
of this operation is the write.lock.
So removing replicateA
Sorry for coming back to my issue. Can anybody explain why my "simple" unit
test below fails? Any hint/help appreciated.
Directory directory = new RAMDirectory();
IndexWriter indexWriter = new IndexWriter( directory, new StandardAnalyzer(
Version.LUCENE_31 ), IndexWriter.MaxFieldLength.UNLIMITED );
Mer != mer. The latter will be what is indexed because
StandardAnalyzer calls LowerCaseFilter.
--
Ian.
On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss wrote:
> Sorry for coming back to my issue. Can anybody explain why my "simple" unit
> test below fails? Any hint/help appreciated.
>
> Directory
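In code Ian's point is just this (a throwaway illustration; the field name "test" mirrors a later snippet in this thread, and the term values are assumptions, not taken from the failing test):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

FuzzyQuery upper = new FuzzyQuery(new Term("test", "Mer")); // "Mer" never appears in the index
FuzzyQuery lower = new FuzzyQuery(new Term("test", "mer")); // uses the lowercased form StandardAnalyzer indexed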
Unfortunately lowercasing doesn't help.
Also, doesn't FuzzyQuery ignore case?
> -Original Message-
> From: Ian Lea [mailto:ian@gmail.com]
> Sent: Tuesday, 3 May 2011 11:06
> To: java-user@lucene.apache.org
> Subject: Re: "fuzzy prefix" search
>
> Mer != mer. The latter will be what is indexed because
> StandardAnalyzer calls LowerCaseFilter.
I'd assumed that FuzzyQuery wouldn't ignore case but I could be wrong.
What would be the edit distance between "mer" and "merlot"? Would it
be less than 1.5, which I reckon would be the value of length(term)*0.5
as detailed in the javadocs? Seems unlikely, but I don't really know
anything about th
>PrefixQuery
I'd like the combination of prefix and fuzzy ;-) because people could also type
"menlo" or "märl" and in any of these cases I'd like to get a hit on Merlot
(for suggesting Merlot)
> -Original Message-
> From: Ian Lea [mailto:ian@gmail.com]
> Sent: Tuesday, 3.
Hi,
I have been experimenting with using an int payload as a unique identifier, one
per Document. I have successfully loaded them in using the TermPositions API
with something like:
public static void loadPayloadIntArray(IndexReader reader, Term term, int[]
intArray, int from, int to) throws IOException
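One possible shape of such a method, sketched under assumptions that are mine and not necessarily the original code's: the payload was written with PayloadHelper.encodeInt(), each document carries exactly one payload-bearing position for the term, and from/to bound the docIDs whose identifiers get written into intArray[docID]:

import java.io.IOException;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public static void loadPayloadIntArray(IndexReader reader, Term term,
        int[] intArray, int from, int to) throws IOException {
    TermPositions tp = reader.termPositions(term);
    try {
        while (tp.next()) {
            int doc = tp.doc();
            if (doc < from || doc >= to) {
                continue;                      // outside the requested docID range
            }
            tp.nextPosition();                 // payloads are only valid after nextPosition()
            if (tp.isPayloadAvailable()) {
                byte[] payload = tp.getPayload(new byte[4], 0);
                intArray[doc] = PayloadHelper.decodeInt(payload, 0);
            }
        }
    } finally {
        tp.close();
    }
}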
Have you tried
Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.499f);
Sven
-Original Message-
From: Clemens Wyss [mailto:clemens...@mysign.ch]
Sent: Tuesday, 3 May 2011 10:57
To: java-user@lucene.apache.org
Subject: RE: "fuzzy prefix" search
Sorry for coming back
Then why not do that? Add a PrefixQuery and a FuzzyQuery to a
BooleanQuery and use that.
--
Ian.
On Tue, May 3, 2011 at 10:25 AM, Clemens Wyss wrote:
>>PrefixQuery
> I'd like the combination of prefix and fuzzy ;-) because people could also
> type "menlo" or "märl" and in any of these cases
I feel like we are back to Basic ;)
If you keep running line 40 over and over on the same memory index, do
you see a slowdown?
Mike
http://blog.mikemccandless.com
On Mon, May 2, 2011 at 1:19 PM, Otis Gospodnetic
wrote:
> Hi,
>
> I think this describes what's going on:
>
> 10 load N stored quer
I had a look into the 3.0 implementation.
The calculation of the similarity is
1 - (edit distance / min(length(string 1), length(string 2)))
as opposed to the Levenshtein distance in the spellchecker:
1 - (edit distance / max(length(string 1), length(string 2)))
So the similarity is 1 - (3 / min(3, 6)) = 1 - 3/3 = 0.
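A small worked comparison of the two normalizations for "mer" vs "merlot" (edit distance 3, lengths 3 and 6); LevensteinDistance is the spellchecker class referred to above, while the FuzzyQuery-style value is computed by hand from the formula:

import org.apache.lucene.search.spell.LevensteinDistance;

public class SimilarityComparison {
    public static void main(String[] args) {
        int editDistance = 3;
        float fuzzyStyle = 1f - (float) editDistance
                / Math.min("mer".length(), "merlot".length());                    // 1 - 3/3 = 0.0
        float spellStyle = new LevensteinDistance().getDistance("mer", "merlot"); // 1 - 3/6 = 0.5
        System.out.println("FuzzyQuery-style similarity: " + fuzzyStyle);
        System.out.println("Spellchecker similarity:     " + spellStyle);
    }
}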
Is this calculation intended or a bug?
> -Original Message-
> From: Biedermann,S.,Fa. Post Direkt [mailto:s.biederm...@postdirekt.de]
> Sent: Tuesday, 3 May 2011 12:00
> To: java-user@lucene.apache.org
> Subject: RE: "fuzzy prefix" search
>
> I had a look into the 3.0 implementation
(11/03/01 21:16), Amel Fraisse wrote:
Hello,
Could the MoreLikeThisHandler include highlighting?
Is it correct to define a MoreLikeThisHandler like this?
true
contenu
Thank you for your help.
Amel.
Amel,
1. I think you shou
On Tue, May 3, 2011 at 5:35 AM, Chris Bamford
wrote:
> Hi,
>
> I have been experimenting with using an int payload as a unique identifier,
> one per Document. I have successfully loaded them in using the TermPositions
> API with something like:
>
> public static void loadPayloadIntArray(Index
I don't know.
But changing it now would cause trouble in many applications...
For our applications we reimplemented fuzzy query so that we can pass in an
org.apache.lucene.search.spell.StringDistance instance that holds the
similarity algorithm of choice.
--
Sven
-Original Message-
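For reference, the StringDistance abstraction mentioned above, with one of the implementations that ships in the spellchecker contrib; how it gets wired into a custom fuzzy query is application-specific and not shown here:

import org.apache.lucene.search.spell.JaroWinklerDistance;
import org.apache.lucene.search.spell.StringDistance;

StringDistance distance = new JaroWinklerDistance();
float similarity = distance.getDistance("mer", "merlot");  // value in [0,1], higher = more similar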
Hi,
I didn't read this thread closely, but just in case:
* Is this something you can handle with synonyms?
* If this is for English and you are trying to handle typos, there is a list of
common English misspellings out there that you could perhaps use for this.
* Have you considered n-gramming yo
Hi,
2011/5/3 Michael McCandless :
> I feel like we are back to Basic ;)
>
> If you keep running line 40 over and over on the same memory index, do
> you see a slowdown?
Yes. I've tested running the same query list (~3.5k queries) on the same
MemoryIndex instance and after a while iterations get slow
> Hi,
>
> 2011/5/3 Michael McCandless :
> > I feel like we are back to Basic ;)
> >
> > If you keep running line 40 over and over on the same memory index, do
> > you see a slowdown?
>
> Yes. I've tested running the same query list (~3.5k queries) on the same
> MemoryIndex instance and after a while
I'm receiving a number of searches with many ORs, so that the total number
of matches is huge (> 1 million) although only the first 20 results are
required. Analysis shows most time is spent scoring the results. Now it
seems to me that if you send a query with 10 OR components, documents that
matc
Hi All,
I want to know if there is any built-in method in Lucene that can help me limit the
number of searched terms for a given field, e.g.
suppose I have given content:(text1 text2 text3 text4 text5) to search and
want to limit it to 3 words only, i.e. content:(text1 text2 text3).
Please help.
Thanks,
Harsh
Why do you want to do this? I'm wondering if this is an XY problem...
See: http://people.apache.org/~hossman/#xyproblem
Best
Erick
On Tue, May 3, 2011 at 7:55 AM, harsh srivastava wrote:
> Hi All,
>
>
> I want to know if there is any built-in method in Lucene that can help me limit the
> number of searched
That seems to work. Thank you!
Sincerely,
Chris Salem
Development Team
Main Sequence Technologies, Inc.
PCRecruiter.net - PCRecruiter Support
ch...@mainsequence.net
P: 440.946.5214 ext 5458
F: 440.856.0312
How can I convert this Similarity method to use 3.1 (currently using
3.0.3)? I understand I have to replace lengthNorm() with computeNorm(),
but fieldName is not a provided parameter in computeNorm() and
FieldInvertState does not contain the field name either. I need the field
because I onl
On Tue, May 3, 2011 at 9:57 AM, Paul Taylor wrote:
> How can I convert this Similarity method to use 3.1 (currently using
> 3.0.3)? I understand I have to replace lengthNorm() with computeNorm(),
> but fieldName is not a provided parameter in computeNorm() and
> FieldInvertState does not cont
On 03/05/2011 15:06, Robert Muir wrote:
On Tue, May 3, 2011 at 9:57 AM, Paul Taylor wrote:
How can I convert this Similarity method to use 3.1 (currently using
3.0.3)? I understand I have to replace lengthNorm() with computeNorm(),
but fieldName is not a provided parameter in computeNorm()
On Tue, May 3, 2011 at 10:29 AM, Paul Taylor wrote:
> I assume this would be the correct way to fix the code for 3.1.0
>
Yes, that's correct.
> public float computeNorm(String field, FieldInvertState state) {
>
>
> //This will match both artist and label aliases and is applicable to
> both
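A hedged sketch of that conversion (not Paul's actual code, which is cut off above): in 3.1 the field name is passed straight into computeNorm() and the token count comes from FieldInvertState. The field name "alias" and the special-casing rule are invented for illustration only:

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.DefaultSimilarity;

public class FieldAwareSimilarity extends DefaultSimilarity {
    @Override
    public float computeNorm(String field, FieldInvertState state) {
        if ("alias".equals(field)) {
            // custom length normalisation for this one field only
            return state.getBoost() * (float) (1.0 / Math.sqrt(state.getLength()));
        }
        return super.computeNorm(field, state);  // default behaviour everywhere else
    }
}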
On Tue, May 3, 2011 at 7:43 AM, Tomislav Poljak wrote:
> Hi,
>
> 2011/5/3 Michael McCandless :
>> I feel like we are back to Basic ;)
>>
>> If you keep running line 40 over and over on the same memory index, do
>> you see a slowdown?
>
> Yes. I've tested running the same query list (~3.5k queries) on
We subclassed PerFieldAnalyzerWrapper as follows:
public class PerFieldEntityAnalyzer extends PerFieldAnalyzerWrapper {
    public PerFieldEntityAnalyzer(Class indexFieldClass) {
        super(new StandardUnaccentAnalyzer());
        for (Object o : EnumSet.allOf(indexFieldClass)) {
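The general PerFieldAnalyzerWrapper pattern being subclassed is roughly the following (the loop body above is cut off and StandardUnaccentAnalyzer is the poster's own class, so the field names and analyzers below are placeholders rather than the actual configuration):

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new WhitespaceAnalyzer());
wrapper.addAnalyzer("id", new KeywordAnalyzer());         // exact-match field
wrapper.addAnalyzer("comment", new WhitespaceAnalyzer()); // per-field override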
What would a simple Analyzer look like that just "n-grams" the docs/fields?
class SimpleNGramAnalyzer extends Analyzer
{
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader)
    {
        EdgeNGramTokenFilter... ???
    }
}
> -Original Message-
> From: Otis Gospodnetic [mailto:
Clemens,
Something à la:
public TokenStream tokenStream(String fieldName, Reader r) {
    return new EdgeNGramTokenFilter(new KeywordTokenizer(r),
            EdgeNGramTokenFilter.Side.FRONT, 1, 4);
}
Check out page 265 of Lucene in Action 2.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nu
But doesn't the KeywordTokenizer extract single words out of the stream? I
would like to create n-grams on the stream (field content) as it is...
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Tuesday, 3 May 2011 21:31
> To: java-user
Clemens - that's just an example. Stick another tokenizer in there, like
WhitespaceTokenizer, for example.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Clemens Wyss
> To:
I know this is just an example.
But even the WhitespaceAnalyzer takes the words apart, which I don't want. I
would like the phrases as they are (maximum 3 words, e.g. "Merlot del Ticino",
...) to be n-grammed. Hence I want to get n-grams like:
Mer
Merl
Merlo
Merlot
Merlot
Merlot d
...
Regards
Cl
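If Otis's suggestion is taken literally, something like the following should produce exactly that: KeywordTokenizer emits the entire field value as a single token (it does not split on whitespace), so wrapping it with EdgeNGramTokenFilter yields prefixes of the whole phrase ("Mer", "Merl", ..., "Merlot", "Merlot d", ...). The gram lengths 3 and 20 are arbitrary choices for the example:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;

public class PhraseEdgeNGramAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // edge n-grams over the whole (untokenized) field value
        return new EdgeNGramTokenFilter(new KeywordTokenizer(reader),
                EdgeNGramTokenFilter.Side.FRONT, 3, 20);
    }
}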
On Tue, May 3, 2011 at 7:03 PM, Paul Taylor wrote:
> We subclassed PerFieldAnalyzerWrapper as follows:
>
> public class PerFieldEntityAnalyzer extends PerFieldAnalyzerWrapper {
>
>public PerFieldEntityAnalyzer(Class indexFieldClass) {
>super(new StandardUnaccentAnalyzer());
>
>