Hi Thushara,
Please use lucene-gosen mailing list for lucene-gosen questions:
http://groups.google.com/group/lucene-gosen
Thanks,
koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
(12/03/03 6:41), Thushara Wijeratna wrote:
> I'm testing lucene-gosen for Japanese tokenization an
(12/03/13 2:38), Hassane Cabir wrote:
Hi guys,
I'm using Lucene for my project and I need to calculate how similar two (or
more) documents are, using TF-IDF. How can I get TF-IDF with Lucene?
Any insights on this?
Solr has TermVectorComponent, which can return the tf, df and tf-idf of each term
in a document.
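A minimal sketch of the Lucene-side equivalent, assuming Lucene 3.x, an open
IndexReader named reader, a docId, and a "body" field indexed with
Field.TermVector.YES; the idf formula here is just one common variant:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermFreqVector;

// tf, df and tf-idf for every term of one document's "body" field
TermFreqVector tfv = reader.getTermFreqVector(docId, "body");
String[] terms = tfv.getTerms();
int[] tf = tfv.getTermFrequencies();
int numDocs = reader.numDocs();
for (int i = 0; i < terms.length; i++) {
    int df = reader.docFreq(new Term("body", terms[i]));
    double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
    System.out.println(terms[i] + " tf=" + tf[i] + " df=" + df
        + " tf-idf=" + (tf[i] * idf));
}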
(12/04/06 2:34), okayndc wrote:
Hello,
I currently use Lucene version 3.0...probably need to upgrade to a more
current version soon.
The problem that I have is when I test a search for an HTML tag (ex.
), Lucene returns
the highlighted HTML tag, which is what I DO NOT want. Is there a way to
"
Hello,
Sorry for the cross-post. I just wanted to announce that I've written a blog post on
how to create a synonyms.txt file automatically from Wikipedia:
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Hope that the article gives someone a good experience!
koji
Have you shared the source code / jar for the same, so that it could be used?
Thanks,
Rajesh
On Mon, May 27, 2013 at 8:44 PM, Koji Sekiguchi wrote:
Hello,
Sorry for the cross-post. I just wanted to announce that I've written a blog
post on
how to create a synonyms.txt file automatically from Wikiped
Hi Oliver,
> My questions are:
>
> 1. Why are the overridden lengthNorm() (under Lucene410) or
> computeNorm() (under Lucene350) methods not called during the search
> process?
Regardless of whether you override the method or not, the Lucene framework
calls the method at index time only.
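The practical consequence: norms are computed once at index time and baked into
the index, so an overridden norm method only takes effect after reindexing. A
sketch against the 3.x API (the 4.x Similarity API differs, but the index-time
principle is the same; the class name is mine):

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.util.Version;

public class FlatNormSimilarity extends DefaultSimilarity {
    @Override
    public float computeNorm(String field, FieldInvertState state) {
        return state.getBoost();   // ignore field length entirely
    }
}

// install it at index time and reindex:
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_35, analyzer);
conf.setSimilarity(new FlatNormSimilarity());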
(13/07/11 22:56), gtkesh wrote:
Hi everyone! I have two questions:
1. What are the cases where Lucene's default tf-idf outperforms BM25? What
are the best use cases for tf-idf versus BM25?
2. Is there any user-friendly guide or something about how I can use the BM25
algorithm instead
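For the second question, switching is a one-liner on both sides of the index in
Lucene 4.x; a sketch, assuming an analyzer and an IndexSearcher named searcher
already exist, and using the usual BM25 defaults:

import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.util.Version;

Similarity bm25 = new BM25Similarity(1.2f, 0.75f);  // k1, b
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_43, analyzer);
conf.setSimilarity(bm25);        // index-time norm encoding
// ...
searcher.setSimilarity(bm25);    // search-time scoring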
(13/08/02 17:16), Ankit Murarka wrote:
Hello All,
Just like the spellcheck feature, which was implemented after a lot of trouble, is it
possible to implement
a complete phrase-suggest feature in Lucene 4.3? So if I enter an incorrect
phrase, it can suggest
a few possible valid phrases.
One way could
(13/09/04 2:33), David Miranda wrote:
Is there any way to check the similarity of texts with Lucene?
I have DBpedia indexed and wanted to get the texts that are most similar
between the abstract field and another text. If I do a search in the
abstract field with a particular text, the result is not
(10/05/12 20:32), Midhat Ali wrote:
Is it possible to return the entire field contents instead of a fixed-size
fragment? In Highlighter, there is a NullFragmenter. What's its
counterpart in FastVectorHighlighter?
Currently, FVH doesn't have such a function. I've opened a JIRA issue:
https://iss
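For the classic Highlighter, the whole-field behavior looks like this (a
sketch; analyzer, query and storedText are assumed to exist):

import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.NullFragmenter;
import org.apache.lucene.search.highlight.QueryScorer;

Highlighter h = new Highlighter(new QueryScorer(query));
h.setTextFragmenter(new NullFragmenter());  // one fragment = the entire field
String whole = h.getBestFragment(analyzer, "content", storedText);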
(10/05/19 13:58), Li Li wrote:
hi all,
I read Lucene in Action, 2nd Ed. It says SimpleSpanFragmenter will
"make fragments that always include the spans matching each document",
and also that a SpanScorer exists for this use. But I can't find any class
named SpanScorer in Lucene 3.0.1. And the res
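In 2.9/3.x the span-aware SpanScorer described in the book was folded into
QueryScorer, so the book's recipe becomes roughly this (a sketch; query is
assumed to exist):

import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleSpanFragmenter;

QueryScorer scorer = new QueryScorer(query);   // span-aware since 2.9
Highlighter h = new Highlighter(scorer);
h.setTextFragmenter(new SimpleSpanFragmenter(scorer, 100)); // ~100-char fragments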
(10/07/09 19:30), manjula wijewickrema wrote:
Uwe, thanks for your comments. Following is the code I used in this case.
Could you please let me know where I have to insert UNLIMITED field length,
and how?
Thanks again!
Manjula
Manjula,
You can set UNLIMITED field length in the IndexWriter constructor:
http
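For reference, the 2.9/3.0 constructor in question (dir and analyzer are
assumed to exist):

import org.apache.lucene.index.IndexWriter;

IndexWriter writer = new IndexWriter(dir, analyzer,
    IndexWriter.MaxFieldLength.UNLIMITED);  // default LIMITED stops at 10,000 terms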
(10/07/20 7:31), Joe Hansen wrote:
Hey All,
I am using Apache Lucene (2.9.1) and it's fast and it works great! I
have a question in connection with Apache PDFBox.
The following command creates a Lucene Document from a PDF file:
Document document =
org.apache.pdfbox.searchengine.lucene.LucenePDFD
(10/09/22 3:24), Devshree Sane wrote:
I am using the FastVectorHighlighter for retrieving snippets from the index.
I am a bit confused about the parameters that are passed to the
FastVectorHighlighter.getBestFragments() method. One parameter is a document
id and another is the maximum number o
Hello,
I'd like to know which field got a hit in each doc in the hit results.
To implement it, I thought I could use Scorer.freq(), which
was introduced in 3.1/4.0:
https://issues.apache.org/jira/browse/LUCENE-2590
But I haven't been successful so far. What I did is:
- in each visit methods in MockS
Hi Mike,
Hmm are you only gathering the MUST_NOT TermScorers? (In which case
I'd expect that the .docID() would not match the docID being
collected). Or do you also see .docID() not matching for SHOULD and
MUST sub queries?
The snippet I copied and pasted in my previous mail was not appropriate.
Sor
(11/01/20 22:19), Paul Taylor wrote:
Trying to extend MappingCharFilter so that it only changes a token if the
length of the token
matches the length of singleMatch in NormalizeCharMap (currently the
singleMatch just has to be
found in the token; I want it to match the whole token). Can this be
(11/01/25 2:14), Paul Taylor wrote:
On 22/01/2011 15:43, Koji Sekiguchi wrote:
(11/01/20 22:19), Paul Taylor wrote:
Trying to extend MappingCharFilter so that it only changes a token if the
length of the token
matches the length of singleMatch in NormalizeCharMap (currently the
singleMatch
(11/03/07 1:16), Joel Halbert wrote:
Hi,
I'm using FastVectorHighlighter for highlighting, 3.0.3.
At the moment this is highlighting a field which is stored, but not
compressed. It all works perfectly.
I'd like to compress the field that is being highlighted, but it seems
like the new way to c
Hello,
Does IndexWriter (or somewhere else) have a method that returns
the number of updated documents before commit?
I have an optimized index and I'm using iw.updateDocument(Term, Document)
with the index, and before commit, I'd like to know the number of updated
documents from IndexWrite
Does IndexWriter (or somewhere else) have a method that returns
the number of updated documents before commit?
You have maxDoc(), which gives you the largest doc id + 1, but this might not be
super accurate since there might have been merges going on in the
background. I am not sure if this number yo
(11/03/19 6:16), madhuri_1...@yahoo.com wrote:
Hi,
I am new to Lucene... I have a question about implementing similarity search
using a MoreLikeThis query. I have written a small program but it is not giving
any results. In my index file I have both stored and unstored (analyzed) fields.
Sampl
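A frequent cause of empty MoreLikeThis results is its defaults (minTermFreq=2,
minDocFreq=5), which can filter every term out of a small test index. A sketch,
assuming Lucene 3.x contrib/queries, an open reader/searcher, a docId, and a
"body" field:

import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similar.MoreLikeThis;

MoreLikeThis mlt = new MoreLikeThis(reader);
mlt.setFieldNames(new String[] { "body" });
mlt.setMinTermFreq(1);   // default 2 drops rare terms in short docs
mlt.setMinDocFreq(1);    // default 5 drops terms in small indexes
Query like = mlt.like(docId);
TopDocs hits = searcher.search(like, 10);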
(11/04/01 21:32), shrinath.m wrote:
I was wondering what's the difference between Lucene's two highlighter
implementations...
I saw the javadoc of FVH, but it only says "another implementation of the Lucene
Highlighter" ...
The Description section in the javadoc shows the features of FVH:
https://
(11/04/06 14:01), shrinath.m wrote:
If there is a phrase in the search, the highlighter highlights every word
separately,
like this:
I love Lucene
Instead, what I want is like this:
I love Lucene
Not sure if it is my mailer's problem or not, but I don't see the difference
between the above two.
But reading t
(11/03/01 21:16), Amel Fraisse wrote:
Hello,
Could the MoreLikeThisHandler include highlighting?
Is it correct to define a MoreLikeThisHandler like this?
true
contenu
Thank you for your help.
Amel.
Amel,
1. I think you shou
(11/05/23 14:36), Weiwei Wang wrote:
> 1. source string: 7
> 2. WhitespaceTokenizer + EGramTokenFilter
> 3. FastVectorHighlighter,
> 4. debug info: subInfos=(777((8,11))777((5,8))777((2,5)))/3.0(2,102),
> srcIndex is not correctly computed for the second loop of the outer for-loop
>
How
(11/05/24 3:28), Sujit Pal wrote:
> Hello,
>
> My version: Lucene 3.1.0
>
> I've had to customize the snippet for highlighting based on our
> application requirements. Specifically, instead of the snippet being a
> set of relevant fragments in the text, I need it to be the first
> sentence where
(11/05/27 20:56), Pierre GOSSE wrote:
Hi,
Maybe it is related to:
https://issues.apache.org/jira/browse/LUCENE-3087
No, because Joel's problem is FastVectorHighlighter, but LUCENE-3087
is for Highlighter.
koji
--
http://www.rondhuit.com/en/
--
(11/05/27 19:57), Joel Halbert wrote:
Hi,
I'm using Lucene 3.0.3. I'm extracting snippets using
FastVectorHighlighter, for some snippets (I think always when searching
for exact matches, quoted) the fragment is null.
Code looks like:
query = QueryParser.escape(query);
Mike,
FVH used to be faster for large docs. I wrote the FVH section for Lucene in Action,
and it says:
In contrib/benchmark (covered in appendix C), there's an algorithm
file called highlight-vs-vector-highlight.alg that lets you see the difference
between the two highlighters in processing time. As of
(11/06/22 2:03), Anupam Tangri wrote:
Hi,
We are using Lucene 3.2 for our project, where I needed to highlight search
matches. I earlier used the default highlighter, which did not work correctly
all the time.
So I started using FVH, which worked beautifully till I started
searching multiple t
A user here hit the exception named in the subject when optimizing. They're using
Solr 1.4
(Lucene 2.9) running on a server that mounts NFS for the index.
I think I know the famous "Stale NFS File Handle IOException" problem, but I
think that causes
FileNotFoundException. Is there any chance of hitting the exc
the file size is very close (like off by just 1 byte or 8
bytes or something).
I think they are using 1.6, but I should ask the minor number.
Could you show me a pointer to the JRE bug you mentioned?
Thank you very much!
koji
Mike McCandless
http://blog.mikemccandless.com
2011/9/9 Koji Sekiguchi:
Also: what java version are they running? We added this check
originally as a workaround for a JRE bug... but usually when that bug
strikes the file size is very close (like off by just 1 byte or 8
bytes or something).
I think they are using 1.6, but I should ask the minor number.
Could you show
Also: what java version are they running? We added this check
originally as a workaround for a JRE bug... but usually when that bug
strikes the file size is very close (like off by just 1 byte or 8
bytes or something).
Mike McCandless
http://blog.mikemccandless.com
2011/9/9 Koji Sekiguchi:
A user here hit the exception th
(13/10/07 18:33), VIGNESH S wrote:
Hi,
How do I implement synonym search for all languages?
As far as I know, WordNet only has English support. Is there anything else we
can use to get support for all languages?
I think most people make synonym data manually...
I've never explored WordNet, but I t
Wikipedia is available for all
languages.
Please kindly help.
On Mon, Oct 7, 2013 at 8:06 PM, Koji Sekiguchi wrote:
(13/10/07 18:33), VIGNESH S wrote:
Hi,
How do I implement synonym search for all languages?
As far as I know, WordNet only has English support. Is there anything else we
can use to get
x for only English.
I need to create a dictionary index for all languages. I want to know whether
there is anything like WordNet which I can readily plug in to my application.
Please kindly guide me.
Thanks and Regards
Vignesh Srinivasan.
On Wed, Oct 9, 2013 at 5:56 PM, Koji Sekiguchi wrote:
Hi VIGNESH,
(13/11/27 9:19), Scott Smith wrote:
I'm doing some highlighting with the following code fragment:
formatter = new SimpleHTMLFormatter(,
);
Scorer score = new QueryScorer(myQuery);
ht = new Highlighter(formatter, score);
ht.
Hi Russell,
It seems that the error message says that the implementing class for
OffsetAttribute
cannot be found in your classpath in the (Pig?) environment.
There seem to be implementing classes, OffsetAttributeImpl and Token, according
to the Javadoc:
http://lucene.apache.org/core/4_6_0/core/org/a
Hello,
I just posted an article on Comparing Document Classification Functions
of Lucene and Mahout.
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
Comments are welcome. :)
Thanks!
koji
--
http://soleami.com/blog/comparing-document-classification
Tommaso Teofili wrote:
cool Koji, thanks a lot for sharing.
Some useful points / suggestions come out of it, let's see if we can follow
up :)
Regards,
Tommaso
2014-03-07 3:30 GMT+01:00 Koji Sekiguchi :
Hello,
I just posted an article on Comparing Document Classification Functions
of Lucene and Mahout.
Hi Priyanka,
> How can I add a Machine Learning part to Apache Lucene?
I think your question is too broad to answer because machine learning
covers a lot of things...
Lucene has already got a text categorization function, which is a well
known task of NLP, and NLP is a part of machine learning. I'v
Hi Michael,
I haven't executed this yet, but can you try this:
SpanNotQuery(SpanNearQuery("George Washington"), SpanNearQuery("George Washington
Carver"))
Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
(2014/07/11 23:20), Michael Ryan wro
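Spelled out with the actual span classes, assuming a field "f" and lowercasing
analysis, the suggestion would look roughly like this (untested sketch):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanNotQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

SpanQuery gw = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("f", "george")),
    new SpanTermQuery(new Term("f", "washington")) }, 0, true);
SpanQuery gwc = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("f", "george")),
    new SpanTermQuery(new Term("f", "washington")),
    new SpanTermQuery(new Term("f", "carver")) }, 0, true);
// match "george washington" unless that span overlaps "george washington carver"
SpanQuery q = new SpanNotQuery(gw, gwc);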
Hello,
It's my pleasure to share that I have an interesting tool "word2vec for Lucene"
available at https://github.com/kojisekig/word2vec-lucene .
As you can imagine, you can use "word2vec for Lucene" to extract word vectors
from Lucene index.
Thank you,
Koji
--
http://soleami.com/blog/compar
Rome'), and vector('king') - vector('man') + vector('woman')
is close to
vector('queen')
Thanks,
Koji
(2014/11/20 20:01), Paul Libbrecht wrote:
> Hello Koji,
>
> how would you compare that to SemanticVectors?
>
> paul
>
> On
At least I see more transparent math on the web page.
> Maybe this helps a bit?
>
> SemanticVectors has always been rather pleasant for the LSI/LSA-like approach, but
> precisely this is mathematically opaque.
> Maybe it's more a question of presentation.
>
> Paul
>
>
ays rather pleasant for the LSI/LSA-like approach,
but precisely this is mathematically opaque.
Maybe it's more a question of presentation.
Paul
On 20 nov. 2014, at 16:24, Koji Sekiguchi wrote:
Hi Paul,
I cannot compare it to SemanticVectors as I don't know SemanticVectors.
But w
Hi Tomoko,
Please don't hesitate to open a JIRA issue and give your patch to fix
the error you found.
Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
(2014/12/14 11:11), Tomoko Uchida wrote:
Sorry again,
I checked the o.a.l.u.fst.TestFSTs.j
Hello,
Doesn't Lucene have a Tokenizer/Analyzer for the Brown Corpus?
There don't seem to be such tokenizers/analyzers in Lucene.
As I didn't want to reinvent the wheel, I googled, and got
a list of snippets that include "the quick brown fox..." :)
Koji
---
hub.com/INL/BlackLab/wiki/Blacklab-query-tool
-- Jack Krupansky
On Tue, Feb 24, 2015 at 1:40 AM, Koji Sekiguchi
wrote:
Hello,
Doesn't Lucene have a Tokenizer/Analyzer for the Brown Corpus?
There don't seem to be such tokenizers/analyzers in Lucene.
As I didn't want to reinvent th
Hi Prateek,
Using Luke, which is a GUI-based browser tool for Lucene indexes, may be a good
start for you to see the structure of a Lucene index.
https://github.com/DmitryKey/luke/
NLP4L also provides a CUI-based index browser for Lucene users, aside from its NLP
functions.
https://github.com/NLP4L/nlp
Hi Clemens,
NLP4L, which stands for Natural Language Processing for Lucene, has a function
for browsing Lucene indexes, aside from its NLP tools. It supports the 5.x index format.
https://github.com/NLP4L/nlp4l#using-lucene-index-browser
Thanks,
Koji
On 2015/04/24 15:10, Clemens Wyss DEV wrote:
From ti
Hi ajinkya,
Last week, I gave a technical talk about NLP4L at a Lucene/Solr meetup:
http://www.meetup.com/Downtown-SF-Apache-Lucene-Solr-Meetup/events/223899054/
In my talk, I spoke about an implementation idea for Learning to Rank using
Lucene.
Please take a look at pages 48 to 50 of the follow
Hi Taher,
Solr has a result grouping function.
I think it works in two steps. First, it finds how many groups there are in
the result
and chooses the top groups (say, 10 groups) using a priority queue. Second, it
provides 10 priority
queues, one for each group, and searches again to collect second or a
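The Lucene grouping module exposes these same two passes directly; a sketch for
the 4.x-era term collectors, assuming a "category" group field and an existing
searcher and query:

import java.util.Collection;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.grouping.SearchGroup;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector;
import org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector;
import org.apache.lucene.util.BytesRef;

// pass 1: one priority queue to find the top 10 groups
TermFirstPassGroupingCollector first =
    new TermFirstPassGroupingCollector("category", Sort.RELEVANCE, 10);
searcher.search(query, first);
Collection<SearchGroup<BytesRef>> top = first.getTopGroups(0, true);

// pass 2: one queue per group, collecting the top 5 docs in each
TermSecondPassGroupingCollector second = new TermSecondPassGroupingCollector(
    "category", top, Sort.RELEVANCE, Sort.RELEVANCE, 5, true, false, true);
searcher.search(query, second);
TopGroups<BytesRef> groups = second.getTopGroups(0);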
Hello everyone!
I've developed KEA-lucene [1]. It is an Apache Lucene implementation of KEA [2].
KEA is a program developed by the University of Waikato in New Zealand that automatically extracts
key phrases (keywords) from natural language documents. KEA stands for Keyphrase Extraction
Algo
Hi Chitra,
I don't have knowledge of the language, but can you solve the problem not at the
TokenFilter level but at the CharFilter level, by setting your own mapping
definition using MappingCharFilter?
Koji
On 2017/09/27 21:39, Chitra wrote:
Hi Ahmet,
Thank you so much
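A sketch of that CharFilter-level approach against a recent (6.x+) API; the
mapping pair is only a placeholder for the language-specific normalization, and
rawText is assumed to exist:

import java.io.Reader;
import java.io.StringReader;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;

NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
builder.add("ﬁ", "fi");   // placeholder; add your own mapping pairs
NormalizeCharMap map = builder.build();
Reader normalized = new MappingCharFilter(map, new StringReader(rawText));
// hand `normalized` to the Tokenizer instead of the raw Reader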
Hi Kurosaka-san,
I wrote an article on my blog several months ago about SinkTokenizer
and TeeTokenFilter.
See:
http://lucene.jugem.jp/?eid=172
Sorry, but it is all written in Japanese...
Koji
Teruhiko Kurosaka wrote:
> Hello,
> I'm interested in knowing how these tokenizers work together.
> The
Uwe Goetzke wrote:
> Use ISOLatin1AccentFilter, although it is not perfect...
> So I made an ISOLatin2AccentFilter for myself and changed this method.
Or use the CharFilter library. It is only for Solr as of now, though.
See:
https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
Sascha Fahl wrote:
Where do I get the CharFilter library? I'm using Lucene, not Solr.
Thanks,
Sascha
CharFilter is included in recent Solr nightly builds.
It is not an OOTB solution for Lucene now, sorry.
If I have time, I will make it for Lucene this weekend.
Koji
--
> > Where do I get the CharFilter library? I'm using Lucene, not Solr.
> >
> > Thanks,
> > Sascha
> CharFilter is included in recent Solr nightly builds.
> It is not an OOTB solution for Lucene now, sorry.
> If I have time, I will make it for Lucene this weekend.
Now the patch is available for Lucene
Hello,
I have a problem when using n-grams and the highlighter.
I thought it had been solved in this ticket:
http://issues.apache.org/jira/browse/LUCENE-627
Actually, I found this problem when I was using CJKTokenizer
on Solr, though; here is a Lucene program to reproduce it
using NGramTokenizer(min=2,m
See Sort class javadoc:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/search/Sort.html
It says:
The fields used to determine sort order must be carefully chosen.
Documents must contain a single term in such a field, and the value of
the term should indicate the
That's correct!
Koji
장용석 wrote:
> Thanks for your advice.
>
> If I want to sort on some field (for example, one named "TITLE") and it must
> be analyzed,
>
> then do I have to make two fields, one ANALYZED and the other
> NOT_ANALYZED, like this?
>
> document.add(new Field("TITLE", value, Fiel
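The two-field pattern being confirmed, as a sketch against the 2.9/3.x field
API (the sort field name is illustrative; value is assumed to exist):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

Document doc = new Document();
doc.add(new Field("TITLE", value, Field.Store.YES,
    Field.Index.ANALYZED));        // searched
doc.add(new Field("TITLE_SORT", value, Field.Store.NO,
    Field.Index.NOT_ANALYZED));    // sorted: exactly one term per document
// ...
Sort sort = new Sort(new SortField("TITLE_SORT", SortField.STRING));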
There is an API for it:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/index/IndexReader.html#document(int,%20org.apache.lucene.document.FieldSelector)
"Get the Document at the nth position. The FieldSelector may be used to
determine what Fields to load and how t
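Typical usage, as a sketch (the field name is illustrative; reader and n are
assumed to exist):

import java.util.Collections;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.SetBasedFieldSelector;

// load only the "title" field of document n, skipping all other stored fields
FieldSelector onlyTitle = new SetBasedFieldSelector(
    Collections.singleton("title"), Collections.<String>emptySet());
Document doc = reader.document(n, onlyTitle);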
Hello,
I'm writing a highlighter by using term offsets info (yes, I borrowed
the idea
of LUCENE-644). In my highlighter, I'm seeing unexpected term offsets info
when getting a multi-valued field.
For example, if I indexed [" "," bbb "] (multi-valued), I got term info
bbb(7,10). This is expected
I'll have better luck suggesting this this time.
Have you tried http://issues.apache.org/jira/browse/LUCENE-1448 yet? I am
not sure if it is in a state where it can be applied, but I hope that covers your issue.
On Fri, Jan 16, 2009 at 7:15 PM, Koji Sekiguchi wrote:
Hello,
I'm writing a highlighter by using te
Ganesh,
There is Commons Daemon project:
http://commons.apache.org/daemon/
I'm not sure it will work for you (RMI on Windows), but please check.
Koji
Ganesh wrote:
Thanks.
http://wrapper.tanukisoftware.org/doc/english/download.jsp is free to use in
open source projects. It requires lice
Hello,
I have a requirement to search English words taking into account the
conjugation of verbs and the comparative and superlative of adjectives.
I googled but couldn't find a solution so far. Do I have to have a synonym
table
to solve this problem, or does someone have a good solution in this
l
o investigate the stemmers would that work? I
confess that I've never examined the output in detail, but
they might help.
I don't know of any synonym lists offhand, but then again I haven't
looked.
Best
er...@miminallyhelpful.com
On Mon, Jan 26, 2009 at 8:51 AM, Koji Sekiguchi wrot
Seid Mohammed wrote:
Great,
I have got it.
Does Luke support Unicode? I am trying Lucene with a non-English language.
Of course. I can see Japanese terms without problems.
Koji
There is no additional setting for me...
Koji
Seid Mohammed wrote:
I have trioed Amharic fonts, it displays square like character, may be
there is a kind of setting for it?
Seid M
On 2/19/09, Koji Sekiguchi wrote:
Seid Mohammed wrote:
Great,
I have got it.
Does Luke support Unicode
> First, I rewrote the Similarity (including lengthNorm), but it did not
work..., so I modified the Lucene source by setting the norm_table to
1.0 (all). That works.
If you override lengthNorm(), reindexing is needed for it to take effect.
Koji
Seid Mohammed wrote:
Hi All
I want my Lucene to index documents and make some terms have a higher
boost value.
So, if I index the document "The quick fox jumps over the lazy dog",
I want the terms fox and dog to have a greater boost value.
How can I do that?
Thanks a lot
seid M
How about
. :)
Program snippets are there regarding Payload/BoostTermQuery/scorePayload().
Koji
On 3/24/09, Koji Sekiguchi wrote:
Seid Mohammed wrote:
Hi All
I want my Lucene to index documents and make some terms have a higher
boost value.
So, if I index the document "The quick fox jumps ove
This problem is filed at:
https://issues.apache.org/jira/browse/LUCENE-1489
You may want to take a look at LUCENE-1522 for highlighting N-gram tokens:
https://issues.apache.org/jira/browse/LUCENE-1522
Koji
ito hayato wrote:
> Hi All,
> My name is Hayato.
>
> I have a question for Highlighter
Ariel wrote:
Hi everybody:
I would like to know how I can make an analyzer that ignores the numbers in
the texts, like the stop words are ignored. For example, the terms
3.8, 100, 4.15, 4,33 should not be added to the index.
How can I do that?
Regards
Ariel
There is a patch for filter
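The filter in that patch drops tokens that successfully parse as Doubles; a
hand-rolled sketch of the same idea against the 2.9 attribute API (the class
name is mine; note that comma decimals like "4,33" will not parse and so
survive):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public final class DropNumbersFilter extends TokenFilter {
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);

    public DropNumbersFilter(TokenStream in) { super(in); }

    @Override
    public boolean incrementToken() throws IOException {
        while (input.incrementToken()) {
            try {
                Double.parseDouble(termAtt.term()); // a number: skip it
            } catch (NumberFormatException e) {
                return true;                        // not a number: keep it
            }
        }
        return false;
    }
}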
If you omit norms when indexing the name field, you'll get the same score back.
Koji
The Seer wrote:
Hello,
I have 5 Lucene documents:
name: Apple
name: Apple martini
name: Apple drink
name: Apple sweet drink
I am using Lucene's default similarity and the standard analyzer.
When I am searching for
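Both ways of omitting norms, as a sketch against the 2.9/3.x field API (doc,
value and nameField are assumed to exist):

import org.apache.lucene.document.Field;

// either index the field without norms from the start...
doc.add(new Field("name", value, Field.Store.YES,
    Field.Index.ANALYZED_NO_NORMS));
// ...or switch norms off on an existing Field before adding it
nameField.setOmitNorms(true);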
Steven A Rowe wrote:
Hi Ariel,
As Koji mentioned, https://issues.apache.org/jira/browse/SOLR-448 contains a NumberFilter. It
filters out tokens that successfully parse as Doubles. I'm not sure, since the examples you gave
seem to use "," as the decimal character, how this interacts with the
John Seer wrote:
Koji Sekiguchi-2 wrote:
If you omit norms when indexing the name field, you'll get the same score
back.
Koji
During building I set omit norms, but the result doesn't change at all. I am
still getting the same score.
I meant if you set nameField.setOmitN
Dan OConnor wrote:
Thanks for the feedback, Chris.
Can you (or someone else on the list) tell me about the IndexMerge tool?
Please see:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/misc/IndexMergeTool.html
Koji
John Seer wrote:
Hello,
Is there any way that a single document's fields can have different analyzers
for different fields?
I think one way of doing it is to create a custom analyzer which will do
field-specific analysis.
Any other suggestions?
There is PerFieldAnalyzerWrapper
http://hudson.z
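Typical wiring, as a sketch for the 2.9-era API (the per-field choice here is
illustrative):

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

// StandardAnalyzer everywhere, except "id" which is kept as a single token
PerFieldAnalyzerWrapper analyzer =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_29));
analyzer.addAnalyzer("id", new KeywordAnalyzer());
// pass `analyzer` to IndexWriter and QueryParser as usual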
Another possible factor: if you are using the omitTf feature, it causes
phrase queries not to work.
Koji
Ian Lea wrote:
What does query.toString() say? Are you using standard analyzers with
standard lowercasing, stop words etc?
Knocking up a very simple program/index that demonstrates the problem
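The omitTf flag mentioned above looks like this as of 2.4/2.9 (a sketch; with
it set, the field stores no positions, so PhraseQuery cannot match in it):

import org.apache.lucene.document.Field;

Field content = new Field("content", text, Field.Store.NO,
    Field.Index.ANALYZED);
content.setOmitTermFreqAndPositions(true); // saves space; kills phrase queries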
I'm not sure this is the same case, but there is a report and patch for
CJKTokenizer in JIRA:
https://issues.apache.org/jira/browse/LUCENE-973
Koji
Zhang, Lisheng wrote:
Hi,
When I use the Lucene 2.4.1 QueryParser with CJKAnalyzer, somehow
it always generates an extra space, for example, if the
CHANGES.txt said that we can use HitCollectorWrapper:
12. LUCENE-1575: HitCollector is now deprecated in favor of a new
Collector abstract class. For easy migration, people can use
HitCollectorWrapper which translates (wraps) HitCollector into
Collector.
But it looks package private?
Thank you,
tsuraan wrote:
Make that "Collector" (new as of 2.9).
HitCollector is the old (deprecated as of 2.9) way, which always
pre-computed the score of each hit and passed the score to the collect
method.
Where can I find docs for 2.9? Do I just have to check out the Lucene
trunk and run javado
Hello,
This problem was reported by my customer. They are using Solr 1.3
and uni-gram, but it can be reproduced with Lucene 2.9 and
WhitespaceAnalyzer.
The program for reproducing is at the end of this mail.
Query:
(f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
The snippet we expe
. Thanks a lot for
> the test case - made this one fun.
>
> - Mark
>
> Koji Sekiguchi wrote:
>
>> Hello,
>>
>> This problem was reported by my customer. They are using Solr 1.3
>> and uni-gram, but it can be reproduced with Lucene 2.9 and
>> White
Hi Ryan,
I looked for it when I implemented the SOLR-64 patch, but it was not there.
So I implemented HierarchicalTokenFilterFactory.
I've not looked into your patch yet, but my impression is that we can
probably share such a TokenFilter.
Thanks,
Koji
Ryan McKinley wrote:
Hello-
I'm looking for a way
Hi Paul,
CharFilter should work for this case. How about this?
public class MappingAnd {
static final String[] DOCS = {
"R&B", "H&M", "Hennes & Mauritz", "cheeseburger and french fries"
};
static final String F = "f";
static Directory dir = new RAMDirectory();
static Analyzer analyzer =
Or you can use MappingCharFilter if you are using Lucene 2.9.
You can convert "c++" into "cplusplus" prior to running the Tokenizer.
Koji
--
http://www.rondhuit.com/en/
Ian Lea wrote:
You need to make sure that these terms are getting indexed, by using
an analyzer that won't drop them and using
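In the 2.9 API that looks roughly like this (a sketch; CharReader wraps the
plain Reader into the CharStream the filter expects):

import java.io.StringReader;
import org.apache.lucene.analysis.CharReader;
import org.apache.lucene.analysis.CharStream;
import org.apache.lucene.analysis.MappingCharFilter;
import org.apache.lucene.analysis.NormalizeCharMap;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

NormalizeCharMap map = new NormalizeCharMap();
map.add("c++", "cplusplus");  // rewritten before the tokenizer ever sees it
CharStream cs = CharReader.get(new StringReader("I know c++ well"));
TokenStream ts = new WhitespaceTokenizer(new MappingCharFilter(map, cs));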
MappingCharFilter can be used to convert c++ to cplusplus.
Koji
--
http://www.rondhuit.com/en/
Anshum wrote:
How about getting the original token stream and then converting c++ to
cplusplus or any other such transform? Or perhaps you might look at
using/extending (in the non-Java sense) some ot
Paul Taylor wrote:
I want my search to treat 'No. 1' and 'No.1' the same, because in our
context it's one token. I want 'No. 1' to become 'No.1', and I need to do
this before tokenizing, because the tokenizer would split one value
into two terms and the other into just one term. I already use a
NormalizeM
Koji Sekiguchi wrote:
Paul Taylor wrote:
I want my search to treat 'No. 1' and 'No.1' the same, because in our
context it's one token. I want 'No. 1' to become 'No.1', and I need to do
this before tokenizing, because the tokenizer would split one value
into
Weiwei Wang wrote:
The offset is incorrect for PatternReplaceCharFilter, so the highlighting
result is wrong.
How to fix it?
As I noted in the comment of the source, if you produce a phrase from a term
and try to highlight a term in the produced phrase, the highlighted snippet
will be undesira
Weiwei Wang wrote:
Hi, all
I currently need a TokenFilter to break the token "season07" into two tokens,
"season" and "07".
I'd recommend you look at WordDelimiterFilter in Solr.
Koji
--
http://www.rondhuit.com/en/
Paul Taylor wrote:
CharStream. Found it at
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/PatternReplaceFilter.java?revision=804726&view=markup.
BTW, why not add this to the Lucene codebase rather than the Solr codebase?
Unfortunately it doesn't address my problem be
Marc Sturlese wrote:
I have FastVectorHighlighter working with a query like:
title:Ipod OR title:IPad
but it's not working (0 snippets are returned) when:
title:Ipod OR content:IPad
This is true when you are going to highlight IPad in the title field and
set fieldMatch to true in the FVH constr
halbtuerderschwarze wrote:
query.rewrite() didn't help; for queries like ipod* or *ipod I still didn't
get fragments.
Arne
You're right. This is still an open issue:
https://issues.apache.org/jira/browse/LUCENE-1889
Koji
--
http://www.rondhuit.com/en/
--