On Tue, May 3, 2011 at 7:03 PM, Paul Taylor wrote:
> We subclassed PerFieldAnalyzerWrapper as follows:
>
> public class PerFieldEntityAnalyzer extends PerFieldAnalyzerWrapper {
>
>public PerFieldEntityAnalyzer(Class indexFieldClass) {
>super(new StandardUnaccentAnalyzer());
>
>
I know this is just an example.
But even the WhitespaceAnalyzer splits the words apart, which I don't want. I
would like the phrases as they are (maximum 3 words, e.g. "Merlot del Ticino",
...) to be n-grammed. Hence I want to get the n-grams:
Mer
Merl
Merlo
Merlot
Merlot
Merlot d
...
Regards
Clemens
Clemens - that's just an example. Stick another tokenizer in there, like
WhitespaceTokenizer, for example.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Clemens Wyss
> To:
But doesn't the KeywordTokenizer extract single words out of the stream? I
would like to create n-grams on the stream (field content) as it is...
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Tuesday, May 3, 2011 21:31
> To: java-user
Clemens,
Something a la:
public TokenStream tokenStream(String fieldName, Reader r) {
return new EdgeNGramTokenFilter(new KeywordTokenizer(r),
EdgeNGramTokenFilter.Side.FRONT, 1, 4);
}
Check out page 265 of Lucene in Action 2.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
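(For reference, a minimal complete analyzer along these lines, sketched against the 3.x API; the class name, the lowercasing step and the maximum gram size of 20 are illustrative, not from the thread.)

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.util.Version;

// Sketch: treat the whole field value as a single token, lowercase it,
// then emit front edge n-grams of length 1..20.
public class EdgeNGramKeywordAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new KeywordTokenizer(reader);
        stream = new LowerCaseFilter(Version.LUCENE_31, stream);
        return new EdgeNGramTokenFilter(stream, EdgeNGramTokenFilter.Side.FRONT, 1, 20);
    }
}

Feeding "Merlot del Ticino" through this produces "m", "me", "mer", "merl", ... up to the first 20 characters of the whole phrase.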
How would a simple Analyzer look that just "n-grams" the docs/fields?
class SimpleNGramAnalyzer extends Analyzer
{
@Override
public TokenStream tokenStream ( String fieldName, Reader reader )
{
EdgeNGramTokenFilter... ???
}
}
> -Original Message-
> From: Otis Gospodnetic [mailto:
We subclassed PerFieldAnalyzerWrapper as follows:
public class PerFieldEntityAnalyzer extends PerFieldAnalyzerWrapper {
public PerFieldEntityAnalyzer(Class indexFieldClass) {
super(new StandardUnaccentAnalyzer());
for(Object o : EnumSet.allOf(indexFieldClass)) {
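(PerFieldAnalyzerWrapper itself is driven by addAnalyzer calls per field name; a minimal stand-alone sketch with illustrative field names and analyzers, assuming Lucene 3.1, not the MusicBrainz code:)

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class AnalyzerSetup {
    // Sketch: one default analyzer, with per-field overrides.
    public static PerFieldAnalyzerWrapper build() {
        PerFieldAnalyzerWrapper wrapper =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_31));
        wrapper.addAnalyzer("catchall", new WhitespaceAnalyzer(Version.LUCENE_31));
        wrapper.addAnalyzer("alias", new KeywordAnalyzer());
        return wrapper;
    }
}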
On Tue, May 3, 2011 at 7:43 AM, Tomislav Poljak wrote:
> Hi,
>
> 2011/5/3 Michael McCandless :
>> I feel like we are back to Basic ;)
>>
>> If you keep running line 40 over and over on the same memory index, do
>> you see a slowdown?
>
> Yes. I've tested running the same query list (~3.5k queries) on
On Tue, May 3, 2011 at 10:29 AM, Paul Taylor wrote:
> I assume this would be the correct way to fix the code for 3.1.0
>
Yes, that's correct.
> public float computeNorm(String field, FieldInvertState state) {
>
>
> //This will match both artist and label aliases and is applicable to
> both
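(A hedged sketch of what the 3.1 override can look like; the field names and the norm value are illustrative and not taken from the code being discussed.)

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.DefaultSimilarity;

public class AliasSimilarity extends DefaultSimilarity {
    @Override
    public float computeNorm(String field, FieldInvertState state) {
        // Illustrative: use a flat norm only for the alias fields,
        // fall back to the default length normalization otherwise.
        if ("artistAlias".equals(field) || "labelAlias".equals(field)) {
            return state.getBoost() * 0.5f;
        }
        return super.computeNorm(field, state);
    }
}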
On 03/05/2011 15:06, Robert Muir wrote:
On Tue, May 3, 2011 at 9:57 AM, Paul Taylor wrote:
How can I convert this Similarity method to 3.1 (currently using
3.0.3)? I understand I have to replace lengthNorm() with computeNorm(),
but fieldName is not a provided parameter in computeNorm()
On Tue, May 3, 2011 at 9:57 AM, Paul Taylor wrote:
> How can I convert this Similarity method to 3.1 (currently using
> 3.0.3)? I understand I have to replace lengthNorm() with computeNorm(),
> but fieldName is not a provided parameter in computeNorm() and
> FieldInvertState does not cont
How can I convert this Similarity method to 3.1 (currently using
3.0.3)? I understand I have to replace lengthNorm() with computeNorm(),
but fieldName is not a provided parameter in computeNorm() and
FieldInvertState does not contain the fieldname either. I need the field
because I onl
That seems to work. Thank you!
Sincerely,
Chris Salem
Development Team
Main Sequence Technologies, Inc.
PCRecruiter.net - PCRecruiter Support
ch...@mainsequence.net
P: 440.946.5214 ext 5458
F: 440.856.0312
Why do you want to do this? I'm wondering if this is an XY problem...
See: http://people.apache.org/~hossman/#xyproblem
Best
Erick
On Tue, May 3, 2011 at 7:55 AM, harsh srivastava wrote:
> Hi All,
>
>
> I want to know if there is any inbuilt method in Lucene that can help me limit the
> number of searched
Hi All,
I want to know if there is any inbuilt method in Lucene that can help me limit the
number of searched terms for a given field, e.g.
suppose I am given content:(text1 text2 text3 text4 text5) to search and
want to limit it to 3 words only, i.e. content:(text1 text2 text3)
Please help.
Thanks,
Harsh
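(If truncating the query really is what is wanted - see the XY-problem reply above - one literal way to do it, sketched with illustrative names against the 3.x API, is to build the BooleanQuery from only the first N terms yourself:)

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class TermLimiter {
    // Keep only the first maxTerms whitespace-separated terms for the field.
    public static BooleanQuery firstTerms(String field, String text, int maxTerms) {
        BooleanQuery query = new BooleanQuery();
        String[] terms = text.toLowerCase().split("\\s+");
        for (int i = 0; i < Math.min(maxTerms, terms.length); i++) {
            query.add(new TermQuery(new Term(field, terms[i])), BooleanClause.Occur.SHOULD);
        }
        return query;
    }
}

For example, firstTerms("content", "text1 text2 text3 text4 text5", 3) produces content:(text1 text2 text3).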
I'm receiving a number of searches with many ORs so that the total number
of matches is huge (> 1 million) although only the first 20 results are
required. Analysis shows most time is spent scoring the results. Now it
seems to me if you send a query with 10 OR components, documents that
matc
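(For reference, a sketch of the relevant call, assuming an IndexSearcher named searcher is already open: the collector keeps only the requested top 20 hits in its priority queue, but, as noted above, every matching document still gets scored.)

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class Top20 {
    // Sketch: request only the first 20 hits for a query.
    public static ScoreDoc[] first20(IndexSearcher searcher, Query query) throws Exception {
        TopDocs topDocs = searcher.search(query, 20);
        return topDocs.scoreDocs;
    }
}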
> Hi,
>
> 2011/5/3 Michael McCandless :
> > I feel like we are back to Basic ;)
> >
> > If you keep running line 40 over and over on the same memory index, do
> > you see a slowdown?
>
> Yes. I've tested running the same query list (~3.5k queries) on the same
> MemoryIndex instance and after a while
Hi,
2011/5/3 Michael McCandless :
> I feel like we are back to Basic ;)
>
> If you keep running line 40 over and over on the same memory index, do
> you see a slowdown?
Yes. I've tested running the same query list (~3.5k queries) on the same
MemoryIndex instance and after a while iterations get slow
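(For context, a minimal sketch of typical MemoryIndex usage against the 3.x contrib API, with illustrative names: one throwaway instance holds a single document and queries are scored against it. Whether long-lived reuse of one instance is intended is exactly what is being questioned in this thread.)

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.Query;

public class MemoryIndexMatcher {
    // Sketch: score a query against a single in-memory document.
    public static float score(String text, Query query, Analyzer analyzer) {
        MemoryIndex index = new MemoryIndex();
        index.addField("content", text, analyzer);
        return index.search(query); // 0.0f means no match
    }
}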
Hi,
I didn't read this thread closely, but just in case:
* Is this something you can handle with synonyms?
* If this is for English and you are trying to handle typos, there is a list of
common English misspellings out there that you could use for this perhaps.
* Have you considered n-gramming yo
I don't know.
But changing it now would cause trouble in many applications...
For our applications we reimplemented fuzzy query so that we can pass along a
org.apache.lucene.search.spell.StringDistance instance that holds the
similarity algorithm of choice.
--
Sven
-Original Message-
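(The StringDistance implementations in the spellchecker contrib can also be called directly; a small sketch, not the poster's custom FuzzyQuery:)

import org.apache.lucene.search.spell.JaroWinklerDistance;
import org.apache.lucene.search.spell.LevensteinDistance;
import org.apache.lucene.search.spell.StringDistance;

public class DistanceDemo {
    public static void main(String[] args) {
        StringDistance levenshtein = new LevensteinDistance();
        StringDistance jaroWinkler = new JaroWinklerDistance();
        // Each returns a similarity in [0, 1]; higher means more similar.
        System.out.println(levenshtein.getDistance("mer", "merlot"));
        System.out.println(jaroWinkler.getDistance("mer", "merlot"));
    }
}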
On Tue, May 3, 2011 at 5:35 AM, Chris Bamford
wrote:
> Hi,
>
> I have been experimenting with using an int payload as a unique identifier,
> one per Document. I have successfully loaded them in using the TermPositions
> API with something like:
>
> public static void loadPayloadIntArray(Index
(11/03/01 21:16), Amel Fraisse wrote:
Hello,
Could the MoreLikeThisHandler include highlighting?
Is it correct to define a MoreLikeThisHandler like this?
true
contenu
Thank you for your help.
Amel.
Amel,
1. I think you shou
Is this calculation intended or a bug?
> -Original Message-
> From: Biedermann,S.,Fa. Post Direkt [mailto:s.biederm...@postdirekt.de]
> Sent: Tuesday, May 3, 2011 12:00
> To: java-user@lucene.apache.org
> Subject: RE: "fuzzy prefix" search
>
> I had a look into the 3.0 implemen
I had a look into the 3.0 implementation.
The calculation of the similarity is
1 - (edit distance / min(string 1 length, string 2 length))
as opposed to the Levenshtein calculation in the spellchecker:
1 - (edit distance / max(string 1 length, string 2 length))
So, the similarity is 1 - ( 3 / mi
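(A worked version of that calculation for the "mer" vs. "merlot" case from this thread, assuming an edit distance of 3:)

public class FuzzySimilarityMath {
    public static void main(String[] args) {
        int editDistance = 3; // "mer" -> "merlot" needs 3 insertions
        int minLen = Math.min("mer".length(), "merlot".length()); // 3
        int maxLen = Math.max("mer".length(), "merlot".length()); // 6

        float minBasedSim = 1f - (float) editDistance / minLen; // 0.0
        float maxBasedSim = 1f - (float) editDistance / maxLen; // 0.5

        // With the min-based formula, "mer" never reaches a 0.499 threshold against "merlot".
        System.out.println(minBasedSim + " vs " + maxBasedSim);
    }
}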
I feel like we are back to Basic ;)
If you keep running line 40 over and over on the same memory index, do
you see a slowdown?
Mike
http://blog.mikemccandless.com
On Mon, May 2, 2011 at 1:19 PM, Otis Gospodnetic
wrote:
> Hi,
>
> I think this describes what's going on:
>
> 10 load N stored quer
Then why not do that? Add a PrefixQuery and a FuzzyQuery to a
BooleanQuery and use that.
--
Ian.
On Tue, May 3, 2011 at 10:25 AM, Clemens Wyss wrote:
>>PrefixQuery
> I'd like the combination of prefix and fuzzy ;-) because people could also
> type "menlo" or "märl" and in any of these cases
Have you tried
Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.499f);
Sven
-Original Message-
From: Clemens Wyss [mailto:clemens...@mysign.ch]
Sent: Tuesday, May 3, 2011 10:57
To: java-user@lucene.apache.org
Subject: RE: "fuzzy prefix" search
Sorry for coming back
Hi,
I have been experimenting with using an int payload as a unique identifier, one
per Document. I have successfully loaded them in using the TermPositions API
with something like:
public static void loadPayloadIntArray(IndexReader reader, Term term, int[]
intArray, int from, int to) thro
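(A hedged sketch of the kind of loop such a method typically contains, written against the 3.x TermPositions API; the simplified signature and the array handling are illustrative, not the poster's actual code.)

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public class PayloadLoader {
    // Read the 4-byte int payload stored on each posting of the given term.
    // intArray is assumed to be sized to reader.maxDoc().
    public static void loadPayloadIntArray(IndexReader reader, Term term, int[] intArray)
            throws Exception {
        TermPositions tp = reader.termPositions(term);
        try {
            byte[] buffer = new byte[4];
            while (tp.next()) {
                for (int i = 0; i < tp.freq(); i++) {
                    tp.nextPosition();
                    if (tp.isPayloadAvailable()) {
                        byte[] payload = tp.getPayload(buffer, 0);
                        intArray[tp.doc()] = PayloadHelper.decodeInt(payload, 0);
                    }
                }
            }
        } finally {
            tp.close();
        }
    }
}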
>PrefixQuery
I'd like the combination of prefix and fuzzy ;-) because people could also type
"menlo" or "märl" and in any of these cases I'd like to get a hit on Merlot
(for suggesting Merlot)
> -Original Message-
> From: Ian Lea [mailto:ian@gmail.com]
> Sent: Tuesday, May 3,
I'd assumed that FuzzyQuery wouldn't ignore case but I could be wrong.
What would be the edit distance between "mer" and "merlot"? Would it
be less than 1.5, which I reckon would be the value of length(term)*0.5
as detailed in the javadocs? Seems unlikely, but I don't really know
anything about th
Unfortunately lowercasing doesn't help.
Also, doesn't the FuzzyQuery ignore casing?
> -Original Message-
> From: Ian Lea [mailto:ian@gmail.com]
> Sent: Tuesday, May 3, 2011 11:06
> To: java-user@lucene.apache.org
> Subject: Re: "fuzzy prefix" search
>
> Mer != mer. The la
Mer != mer. The latter will be what is indexed because
StandardAnalyzer calls LowerCaseFilter.
--
Ian.
On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss wrote:
> Sorry for coming back to my issue. Can anybody explain why my "simple" unit
> test below fails? Any hint/help appreciated.
>
> Directory
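(That lowercasing is easy to verify by running the analyzer directly; a small sketch against the 3.1 attribute API:)

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ShowTokens {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_31);
        TokenStream stream = analyzer.tokenStream("test", new StringReader("Merlot del Ticino"));
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
            System.out.println(term.toString()); // prints "merlot", "del", "ticino"
        }
        stream.end();
        stream.close();
    }
}

So a FuzzyQuery or PrefixQuery built on the term "Mer" is comparing against indexed terms that are all lowercase.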
Sorry for coming back to my issue. Can anybody explain why my "simple" unit
test below fails? Any hint/help appreciated.
Directory directory = new RAMDirectory();
IndexWriter indexWriter = new IndexWriter( directory, new StandardAnalyzer(
Version.LUCENE_31 ), IndexWriter.MaxFieldLength.UNLIMITED
Well, it is not only with a huge index.
It is only if ReplicationHandler is in use on a master.
If ReplicationHandler is configured to replicateAfter startup it first
sends a commit via IndexWriter to have a "stable" index. The leftover
of this operation is the write.lock.
So removing replicateA
On 02/05/2011 23:36, Paul Taylor wrote:
Hi
Nearing completion on a new version of a lucene search component for
the http://www.musicbrainz.org music database and having a problem
with performance. There are a number of indexes each built from data
in a database, there is one index for albums,
Hi Mike,
thanks for the info.
As far as I know, a write.lock is created by an IndexWriter.
So I have to dig into why an IndexWriter is created just
on starting Solr with an optimized index.
The problem is, this only happens with a huge index.
And also old parts of the index are not cleaned up.
May
As you have seen in the example code for PartOfSpeechTaggingFilter at
http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analysis/package-summary.html,
you can use a custom analyzer to inject "metadata" tokens into the index at
the same position as the source tokens.
For example, given t
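(A hedged sketch of that idea, not the PartOfSpeechTaggingFilter itself: a TokenFilter that emits an extra token with a position increment of 0 so it lands on the same position as the source token. The class name and the lookupMetadata logic are placeholders.)

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

public final class MetadataInjectingFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncrAtt =
            addAttribute(PositionIncrementAttribute.class);
    private AttributeSource.State pendingState;
    private String pendingMetadata;

    public MetadataInjectingFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (pendingMetadata != null) {
            restoreState(pendingState);                  // reuse the offsets of the source token
            termAtt.setEmpty().append(pendingMetadata);  // swap in the metadata term text
            posIncrAtt.setPositionIncrement(0);          // stack it at the same position
            pendingMetadata = null;
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        pendingState = captureState();
        pendingMetadata = lookupMetadata(termAtt.toString());
        return true;
    }

    // Placeholder for whatever produces the metadata (e.g. a part-of-speech tag).
    private String lookupMetadata(String term) {
        return "META_" + term.length();
    }
}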