Hi Egorlex,
Shingle filter won't turn "similarissues" into "similar issues". But it can do
the reverse.
It is like a sliding window. Think about what the indexed tokens would be if you
set the token separator to "".
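For illustration, a rough sketch of such an indexing chain (untested; assuming
the CustomAnalyzer builder and ShingleFilterFactory's tokenSeparator option):

// "similar issues" is also indexed as the single token "similarissues"
Analyzer shingled = CustomAnalyzer.builder()
    .withTokenizer("whitespace")
    .addTokenFilter("lowercase")
    .addTokenFilter("shingle",
        "minShingleSize", "2",
        "maxShingleSize", "2",
        "tokenSeparator", "",
        "outputUnigrams", "true")
    .build();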
Ahmet
On Wednesday, June 20, 2018, 12:42:22 PM GMT+3, egorlex
wrote:
Tha
Hi Egorlex,
ShingleFilter could be used to achieve your goal.
Ahmet
On Tuesday, June 19, 2018, 8:06:46 PM GMT+3, egorlex wrote:
Hi,
I need help with Lucene.
How can I realize the same search result for words with and without spaces?
For example, request "similar issues" and "similari
Hi,
string_ci type could be constructed from: keyword tokenizer + lowercase filter
+ maybe a trim filter.
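A minimal sketch of that chain with the CustomAnalyzer builder (untested;
factory names assumed from the Lucene analysis SPI):

// whole field value as one token, lowercased and trimmed
Analyzer stringCi = CustomAnalyzer.builder()
    .withTokenizer("keyword")
    .addTokenFilter("lowercase")
    .addTokenFilter("trim")
    .build();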
Ahmet
On Friday, May 25, 2018, 1:50:19 PM GMT+3, Chellasamy G
wrote:
Hi Team,
Kindly help me out with this problem.
Thanks,
Satyan
On Wed, 23 May 2018 15:01:3
Hi Roy,
In order to activate payloads during scoring, you need to do two separate
things at the same time:
* use a payload-aware query type: org.apache.lucene.queries.payloads.*
* use a payload-aware similarity
Here is an old post that might inspire you :
https://lucidworks.com/2009/08/05/get
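For the query side, a rough sketch on a recent Lucene (7.x-style API from
org.apache.lucene.queries.payloads; on that version the decoder argument takes
over the role the payload-aware similarity used to play):

SpanTermQuery spanTerm = new SpanTermQuery(new Term("body", "apache"));
Query q = new PayloadScoreQuery(
    spanTerm,
    new AveragePayloadFunction(),  // how multiple payload hits are combined
    PayloadDecoder.FLOAT_DECODER,  // how payload bytes are turned into floats
    true);                         // also multiply in the span score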
Hi,
I am also interested in the answer to this question.
I wonder whether the term freq. function query would work here.
Ahmet
On Friday, November 17, 2017, 10:32:23 AM GMT+3, Dwaipayan Roy
wrote:
Hi,
I want to get the term frequency of a given term t in a given document with
lucene
Hi Nicolas,
With the SpanQuery family, it is possible to retrieve spans (index/position
information).
Also, you may find luwak relevant.
https://github.com/flaxsearch/luwak
Ahmet
On Sunday, October 22, 2017, 1:16:01 AM GMT+3, Nicolas Paris
wrote:
Hi
I am looking for a way to get
accent
characters and it supports only Latin like accent characters. Am I missing
anything?
Chitra
On Wed, Sep 27, 2017 at 5:47 PM, Ahmet Arslan wrote:
Hi,
Yes ICUFoldingFilter or ASCIIFoldingFilter could be used.
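A sketch (untested; "icufolding" requires the lucene-analysis-icu module):

Analyzer folding = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("icufolding")  // folds case and accents, including Greek
    .build();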
ahmet
On Wednesday, September 27, 2017, 1:54:43 PM GMT+3, Chitra
wrote:
Hi,
In Lucene, I want to search Greek characters (accent insensitive) by
removing or replacing accent marks with similar charact
at 7:57 AM, Ahmet Arslan
wrote:
> Hi Jean,
>
> I am also interested in answers to this question. I need this feature too.
> Currently I am using a hack.
> I create an artificial field (with an artificial token) attached to every
> document.
>
> I traverse all documents using t
Hi Jean,
I am also interested in answers to this question. I need this feature too.
Currently I am using a hack.
I create an artificial field (with an artificial token) attached to every
document.
I traverse all documents using the code snippet given in my previous related
question. (no one answ
:58:25 PM GMT+3, Adrien Grand
wrote:
FILTER does the opposite of MUST_NOT.
Regarding scoring, putting the query in a FILTER or MUST_NOT clause is good
enough since such clauses do not need scores. You do not need to add an
additional ConstantScoreQuery wrapper.
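For example (a sketch; field and term are made up):

// everything except documents containing "foo" in "body";
// the MUST_NOT clause is never scored, so no wrapper is required
Query q = new BooleanQuery.Builder()
    .add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST)
    .add(new TermQuery(new Term("body", "foo")), BooleanClause.Occur.MUST_NOT)
    .build();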
On Tue, 8 Aug 2017 at 23:06, Ahmet
Hi all,
I am trying to access document length statistics of the documents that do not
contain a given term.
I have written the following piece of code:
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST)
    .add(new TermQuery(te
How about Solr's exists function query? How does it work? Function queries
are now part of Lucene (org.apache.lucene.queries.function), right?
Ahmet
On Sunday, July 16, 2017, 11:19:40 AM GMT+3, Trejkaz
wrote:
On Sat, Jul 15, 2017 at 8:12 PM, Uwe Schindler wrote:
> That is the "Solr" answer.
Hi,
I am traversing the posting list of a given term/word using the following
code. I am accessing/processing term frequency and document length.
Term term = new Term(field, word);
PostingsEnum postingsEnum = MultiFields.getTermDocsEnum(reader, field,
term.bytes());
if (postingsEnum == null) return
Hi,
As an alternative, function queries can also be used. The exists function may
be more intuitive.
q={!func}not(exists(field3))
On Saturday, July 15, 2017, 1:01:04 PM GMT+3, Rajnish kamboj
wrote:
Ok, I will check.
On Sat, 15 Jul 2017 at 3:26 PM, Ahmet Arslan wrote:
> Hi,
>
> Yes, h
Hi,
Yes, here it is: q=+*:* -field3:[* TO *]
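The programmatic equivalent would be roughly (a sketch):

// +*:* -field3:[* TO *]
Query missingField3 = new BooleanQuery.Builder()
    .add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST)
    .add(TermRangeQuery.newStringRange("field3", null, null, true, true),
        BooleanClause.Occur.MUST_NOT)
    .build();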
Ahmet
On Saturday, July 15, 2017, 8:16:00 AM GMT+3, Rajnish kamboj
wrote:
Hi
Does Lucene provide any API to fetch documents for which a field is not
defined?
Example
Document1 : field1=value1, field2=value2,field3=value3
Document2 : field1=value4,
Hi,
You can completely ban within-a-word search by simply using WhitespaceTokenizer,
for example. By the way, it is all about how you tokenize/analyze your text.
Once you decide, you can create two versions of a single field using
different analyzers. This allows you to assign different weights
Hi,
LimitTokenCountFilter is used to index the first n tokens. Maybe it can
inspire you.
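A sketch of a chain that keeps only the first 100 tokens (untested; parameter
name assumed from LimitTokenCountFilterFactory):

Analyzer firstN = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .addTokenFilter("limittokencount", "maxTokenCount", "100")
    .build();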
Ahmet
On Friday, April 21, 2017, 6:20:11 PM GMT+3, Edoardo Causarano
wrote:
Hi all.
I’m relatively new to Lucene, so I have a couple of questions about writing
custom filters.
The way I understand it, one woul
Hi Jean,
How about the LukeRequestHandler? Much of the information displayed on the
admin screen comes from it. https://wiki.apache.org/solr/LukeRequestHandler
Ahmet
On Sunday, April 9, 2017, 2:21:38 AM GMT+3, Jean-Claude Dauphin
wrote:
Hello,
I need to check the index last modification date to c
Hi,
Maybe look at the factory class to see how the types argument is handled?
Ahmet
On Friday, March 17, 2017 11:05 PM, "pha...@mailbox.org"
wrote:
Hi,
I am trying to index words like 'e-mail' as 'email', 'e mail' and 'e-mail' with
Lucene 4.4.0.
Lucene's WordDelimiterFilter should be ide
Hi,
You can retrieve the list of field names using LukeRequestHandler.
Ahmet
On Friday, March 17, 2017 9:53 PM, Cristian Lorenzetto
wrote:
It permits searching in a predefined list of fields that you have to know
in advance. In my case I don't know the field name.
Maybe WildcardQuer
te.
I don't understand how "a customised word delimiter filter factory" works
with the tokenizer.
2017-03-06 22:26 GMT+08:00 Ahmet Arslan :
> Hi Zhao,
>
> WhiteSpace tokeniser followed by a customised word delimiter filter
> factory would be solution.
> Please see types att
punctuation, but it only breaks words by
space.
I didn’t explain my requirement clearly.
I want an analyzer like the standard analyzer, but one that keeps some
configured punctuation.
2017-03-06 18:03 GMT+08:00 Ahmet Arslan :
> Hi,
>
> Whitespace analyser/tokenizer for example.
>
> Ahmet
&g
Hi,
Whitespace analyser/tokenizer for example.
Ahmet
On Monday, March 6, 2017 10:21 AM, Yonghui Zhao wrote:
The Lucene standard analyzer will remove almost all punctuation.
In some cases, we want to keep some punctuation; for example, in music
search, some singer names and album names could be a punc
ot;name");
System.out.print("size="+ terms.size());
}
}
///
I got this error:
numFound: 32
Exception in thread "main" java.lang.NullPointerException
at testPkg.App3.main(App3.java:30)
On 5 January 2017 at 18:25, Ahm
Hi,
I think you are missing the main query parameter: q=*:*
By the way, you may get more responses on the solr-user mailing list.
Ahmet
On Wednesday, January 4, 2017 4:59 PM, huda barakat
wrote:
Please help me with this:
I have this code, which returns term frequency from the techproducts example:
Hi,
You can index the whole address in a separate field.
Otherwise, how would you handle the positions of the split tokens?
By the way, the speed of phrase search may be just fine, so consider trying that first.
Ahmet
On Tuesday, December 20, 2016 5:15 PM, suriya prakash
wrote:
Hi,
I am using standard anal
Hi Otmar,
A single term inside quotes is meaningless. A phrase query should have at least
two terms in it, shouldn't it?
What is your intention with such a "john*" query?
Ahmet
On Tuesday, December 20, 2016 4:56 PM, Otmar Caduff wrote:
Hi,
I have an index with a single document with a fi
How about keeping two indices: a page index and a document index.
Issue the query to the document index and list n documents.
For each document, list k pages fetched from the page index.
Ahmet
On Saturday, November 26, 2016 12:16 PM, Joe MA wrote:
Greetings,
I am trying to use Lucene to search lar
discrimination power based on all the body text, not just the titles.
Because otherwise terms that are really not that relevant end up being
weighted very high!
On 17/11/16 at 18:25, Ahmet Arslan wrote:
> Hi Nicholas,
>
> IDF, among others, is a measure of term specificity. If 'or
Hi Nicholas,
IDF, among others, is a measure of term specificity. If 'or' is not so usual in
titles, then it has some discrimination power in that domain.
I think it's OK for 'or' to get a high IDF value in this case.
Ahmet
On Thursday, November 17, 2016 9:09 PM, Nicolás Lichtmaier
wrote:
IDF
Hi,
Match all docs query minus Promotion.endDate:[* TO *]
+*:* -Promotion.endDate:[* TO *]
Ahmet
On Friday, November 11, 2016 5:59 PM, voidmind wrote:
Hi,
I have indexed content about Promotions with effectiveDate and endDate
fields for when the promotions start and end.
I want to query for
Hi Mossaab,
Probably due to the encodeNormValue/decodeNormValue transformation of the
document length.
Please see the aforementioned methods in BM25Similarity.java
Ahmet
On Wednesday, November 9, 2016 10:25 PM, Mossaab Bagdouri
wrote:
Hi,
On Lucene 6.2.1, I have the following explain ou
Hi,
I forgot to include : .addTokenFilter("asciifolding")
Ahmet
On Tuesday, October 11, 2016 5:37 PM, Ahmet Arslan wrote:
Hi Kumaran,
Writing a custom analyzer is easier than it seems.
Please see how I added kstem to classic analyzer:
return CustomAnalyzer.builder()
.withTokenizer("classic")
.addTokenFilter("classic")
.addTokenFilter("lowercase")
.addTokenFilter("kstem")
.build();
Ahmet
On Tuesday, October 11,
Hi,
I thought the link/url below has the example code, no?
http://makble.com/what-is-term-vector-in-lucene
If not, in the source tree, under the tests folder, there should be some test
cases for term vectors, which can be used as example code.
I guess internal lucene document id, which easy
Hi,
First you need to enable term vectors at index time.
Then you can access terms and their statistics in a document.
http://makble.com/what-is-term-vector-in-lucene
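Roughly, both sides look like this (a sketch; field name and variables made up):

// index time: enable term vectors on the field
FieldType ft = new FieldType(TextField.TYPE_NOT_STORED);
ft.setStoreTermVectors(true);
doc.add(new Field("body", text, ft));

// search time: walk the per-document terms and their frequencies
Terms vector = reader.getTermVector(docId, "body");
TermsEnum termsEnum = vector.iterator();
BytesRef term;
while ((term = termsEnum.next()) != null) {
    System.out.println(term.utf8ToString() + " freq=" + termsEnum.totalTermFreq());
}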
Ahmet
On Tuesday, September 13, 2016 11:53 AM, szzoli wrote:
Hi,
How can I use TermVectors? I have read the API, but it is
Hi,
If you have some tool/mechanism to detect paragraph boundaries, yes, it is
possible to search for a paragraph.
But Lucene itself cannot detect sentences/paragraphs for you.
There are other libraries for this.
Ahmet
On Monday, September 12, 2016 1:06 PM, szzoli wrote:
Hi All,
Is it possibl
Hi,
TermVectors perhaps?
Ahmet
On Tuesday, September 6, 2016 4:21 PM, szzoli wrote:
Hi All,
How can I list all the terms from a document? I also need the counts of each
term per document.
I use Lucene 6.2. I found some solutions for older versions. These didn't
work with 6.2.
Thank you in ad
in byte format for less memory consumption. But while debugging, I
found that the doc length that is passed to score() is 2621.44, whereas the
actual doc length is 2355.
I am confused. Please help.
On Fri, Jul 22, 2016 at 1:46 PM, Ahmet Arslan wrote:
> Hi Roy,
>
> It is about storing
Hi Roy,
It is about storing the document length in a byte (to use less memory).
Please edit the source code to avoid this encode/decode thing:
/**
* Encodes the document length in a lossless way
*/
@Override
public long computeNorm(FieldInvertState state) {
return state.getLength() - state.getN
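(The snippet is cut off above; a sketch of the presumed intent, assuming the
last line subtracts the overlap token count:)

@Override
public long computeNorm(FieldInvertState state) {
    // lossless: keep the raw token count instead of encoding it into a byte
    return state.getLength() - state.getNumOverlap();
}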
Hi Andres,
While there can be other ways, in general term vectors are used to extract
"important terms" from top-k documents returned by the initial query.
Please see getTopTerms() method in
http://www.cortecostituzionale.it/documenti/news/advancedluceneeu_69.pdf
Ahmet
On Tuesday, June 28, 20
e a custom query parser if they want reasonable results?
- On Jun 24, 2016, at 12:25 PM, Ahmet Arslan
wrote:
> Hi Daniel,
> You can add optional clauses to your query for boosting purposes.
> for example,
> temperate OR climates OR "temperate climates"~5^100
>
Hi Daniel,
You can add optional clauses to your query for boosting purposes.
for example,
temperate OR climates OR "temperate climates"~5^100
ahmet
On Friday, June 24, 2016 5:07 PM, Daniel Bigham wrote:
Something significant that I've noticed about using the default Lucene
query parser is
other version of that analyzer.
Whenever any of those analyzers is changed, I will need to manually apply
the changes.
Isn't there a better way to do this?
On 23/06/2016 at 20:28, Ahmet Arslan wrote:
> Hi,
>
> Zero or more CharFilter(s) is the way to manipulate text before the t
Hi,
Zero or more CharFilters is the way to manipulate text before the tokenizer.
I think initReader is the method where you want to plug in char filters.
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java
Hi,
You can supply custom types.
please see WordDelimiterFilterFactory and wdfftypes.txt for an example.
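A sketch (untested; the types file maps characters to classes so the filter
stops splitting on them):

// wdfftypes.txt, one mapping per line:
//   _ => ALPHA
Analyzer wdf = CustomAnalyzer.builder()
    .withTokenizer("whitespace")
    .addTokenFilter("worddelimiter", "types", "wdfftypes.txt")
    .build();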
ahmet
On Wednesday, June 15, 2016 10:32 PM, Xiaolong Zheng
wrote:
Hi,
How can I prevent WordDelimiterFilter from tokenizing strings with an
underscore, e.g. word_with_underscore?
I am using Wo
Hi Singhal,
Maybe MemoryIndex or RAMDirectory?
Ahmet
On Saturday, May 21, 2016 1:42 PM, Prateek Singhal
wrote:
You can consider that I want to store the lucene index in some sort of
temporary memory or a HashMap so that I do not need to index the documents
every time as it is a costly opera
Hi Taher,
Please see the QueryParser.jj file in the source tree.
There you can find all operators, such as &&, ||, AND, OR, and !.
Ahmet
On Sunday, May 15, 2016 1:57 PM, Taher Galal wrote:
Hi All,
I was just checking the query grammar found in the javadocs of the query
parser:
Query ::= ( Clause )
Hi Luis,
That's an interesting question. Can you share your similarity?
I suspect you return 1 everywhere except in the Similarity#coord method.
Not sure, but for phrase queries one may need to modify
ExactPhraseScorer etc.
ahmet
On Thursday, May 12, 2016 5:41 AM, Luís Filipe Nassif
wrote:
Hi Daniel,
Since you are restricting inOrder=true and proximity=0 in the top-level query,
there is no problem in your particular example.
If you weren't restricting, injecting synonyms with a plain OR could sometimes
cause 'query drift': the injection/addition of one term changes the result list
drastically.
Hi,
MemoryIndex is used for that purpose.
Please see :
https://github.com/flaxsearch/luwak
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html
http://lucene.apache.org/core/6_0_0/memory/index.html?org/apache/lucene/index/memory/MemoryIndex.html
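The basic pattern is tiny (a sketch; field and text made up):

// index a single in-memory document, then match queries against it
MemoryIndex index = new MemoryIndex();
index.addField("body", "some text to match against", new StandardAnalyzer());
float score = index.search(new TermQuery(new Term("body", "match")));
boolean matches = score > 0.0f;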
Ahmet
On Mo
s around
BlendedTermQuery. Just to help isolate the issues. Here are Lucene's tests
for BlendedTermQuery as a basis:
https://github.com/apache/lucene-solr/blob/5e5fd662575105de88d8514b426bccdcb4c76948/lucene/core/src/test/org/apache/lucene/search/TestBlendedTermQuery.java
On Tue, Ap
Hi Again,
For those who are interested, I uploaded BM25's Term Frequency graph [0] for
some common and content-bearing words.
[0] http://2.1m.yt/PgUEcZ.png
Ahmet
On Tuesday, April 19, 2016 5:16 PM, Ahmet Arslan
wrote:
Hi Markus,
It is a known property of BM25. It produces negative scores for common terms.
Most of the term-weighting models are developed for indices in which stop words
are eliminated.
Therefore, most of the term-weighting models have problems scoring common terms.
By the way, DFI model does a
itting on dot,
> hyphen, and underscore, in addition to whitespace and other punctuation.
>
> Can you post some specific test cases you are concerned with? (You should
> always run some test cases.)
>
> -- Jack Krupansky
>
> On Tue, Apr 12, 2016 at 10:35 AM, Ahmet Arslan
> w
Hi Chamarty,
Well, there are a lot of options here.
1) Use LetterTokenizer
2) Use WordDelimiterFilter combined with WhitespaceTokenizer
3) Use MappingCharFilter to replace those characters with spaces
.
.
.
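A sketch of option 3 (the characters to replace are made up):

NormalizeCharMap.Builder map = new NormalizeCharMap.Builder();
map.add("-", " ");
map.add(".", " ");
Reader filtered = new MappingCharFilter(map.build(), new StringReader("a-b.c"));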
Ahmet
On Tuesday, April 12, 2016 3:58 PM, PrasannaKumar Chamarty
wrote:
Hi,
What
Hi,
If you are writing your queries programmatically (without using a query
parser), nested proximity is possible with the SpanQuery family. Actually,
there exists a surround query parser for this. Please see
o.a.lucene.queryparser.surround.parser.QueryParser
Proximity search uses position informati
Hi Otmar,
For this requirement, you need to create an additional field containing the
number of words/terms in the field.
For example.
field : blue pill
length : 2
query : if you take the blue pill
length : 6
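A sketch of both sides (Lucene 6+ point fields; names made up):

// index time: store the token count next to the text
Document doc = new Document();
doc.add(new TextField("body", "blue pill", Field.Store.YES));
doc.add(new IntPoint("length", 2));

// search time: restrict to documents whose length equals the query length
Query sameLength = IntPoint.newExactQuery("length", 6);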
Please see my previous responses on the same topic:
http://search-lucene.com/m/e
Hi,
I think the XML query parser examples [1] are the safest way to persist Lucene
queries.
[1]https://github.com/apache/lucene-solr/tree/master/lucene/queryparser/src/test/org/apache/lucene/queryparser/xml
Ahmet
On Friday, March 18, 2016 4:02 PM, "Bauer, Herbert S. (Scott)"
wrote:
Has anyone
Hi Dwaipayan,
Another way is to use KeywordMarkerFilter. Stemmer implementations respect this
attribute.
If you want to supply your own mappings, StemmerOverrideFilter could be
used as well.
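A sketch of the KeywordMarkerFilter route (untested; the file name is made up):

// protwords.txt lists terms the stemmer must leave untouched
Analyzer protectedStems = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .addTokenFilter("keywordmarker", "protected", "protwords.txt")
    .addTokenFilter("porterstem")
    .build();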
ahmet
On Monday, March 14, 2016 4:31 PM, Dwaipayan Roy
wrote:
I am using EnglishAnalyzer wi
Hi Yannick,
The MoreLikeThis (MLT) stuff does this already.
It extracts "interesting terms" from the top N documents.
I don't remember, but this feature may require "term vectors" to be stored.
Ahmet
On Wednesday, January 27, 2016 10:41 AM, Yannick Martel
wrote:
Le Tue, 15 Dec 2015 17:56:05 +0100,
Ya
Hi Daniel,
The exception you have posted is a parse exception.
It occurs during querying, not indexing.
There are some special characters that are part of query parsing syntax.
You need to escape them.
Ahmet
On Sunday, December 27, 2015 10:53 PM, Daniel Valdivia
wrote:
Hi
I'm tryi
Hi Shay,
I suggest you extend o.a.l.search.similarities.SimilarityBase.
All you need is to implement a score() method. Behind all the fancy names
(language models, etc.), a similarity is a function of seven salient
statistics. It is actually six: avgFieldLength can be derived from the other
two (numberOfFiel
Hi,
Yes, TextField includes positions.
Ahmet
On Friday, December 11, 2015 5:40 PM, Douglas Kunzma
wrote:
All -
I'm using a TextField and a BufferedReader to add text to a Lucene Document
object.
Can I still get all of the matches in a Document including the position
information and start an
Hi,
Maybe the Windows path separator is messing things up.
Can you try copying the jars to the current working directory and re-trying:
java -classpath lucene-demo-5.3.1.jar;lucene-core-5.3.1.jar
Ahmet
On Thursday, December 3, 2015 11:57 PM, jerrittpace
wrote:
I am trying to set the classpath for the lucene jars
Hi Zong,
I don't think Lucene has this. People usually need all candidate documents to
be scored.
They sometimes sort by price, popularity, etc., sometimes combined with
document relevancy scores.
However, with a time-limited collector, the closest thing could be:
https://issues.apache.org/jira/br
w can I pass the query length (maxOverlap/maxCoord) inside the
Similarity.SimScorer#score method?
Any help on this is really appreciated.
Thanks,
Ahmet
On Tuesday, October 27, 2015 10:27 AM, Ahmet Arslan wrote:
Hi,
How can I access the length of the query (number of words in the query) inside a
SimilarityBase implementation?
P.S. I am implementing multi-aspect TF [1] for an experimental study.
So it does not have to be fast/optimized as production code.
[1] http://dl.acm.org/citation.cfm?doid=2484028.2484
Hi Uwe,
What is the meaning of "the Unicode Policeman" ?
Thanks,
Ahmet
On Thursday, October 22, 2015 2:59 PM, Uwe Schindler wrote:
Hi,
> >> Setting aside the fact that Character.toLowerCase is already dubious
> >> in some locales (e.g. Turkish),
> >
> > This is not true. Character.toLower
Hi Ajinkya,
I don't think there exists any production-ready LtR Lucene/Solr setup.
LtR simply re-ranks the top N (typically 1000) documents.
Fetching the top N documents is what we do today with Lucene.
There is an API for re-ranking in Lucene/Solr, but no LtR support yet.
https://cwiki.apache.org/confluenc
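On the Solr side, the re-rank API looks roughly like this (parameter names from
the ReRank query parser; the query values are made up):

q=cheap first-pass query
rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}
rqq=expensive re-scoring query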
Hi,
Why don't you create your query with the API?
Term term = new Term("B", "1 2");
Query query = new TermQuery(term);
Ahmet
On Friday, June 19, 2015 9:31 AM, Gimantha Bandara wrote:
Correction..
second time I used the following code to test. Then I got the above
IllegalStateException issue.
w
tates"
(two terms) or "free speech zones" (three terms).
Shay
On Mon, Jun 15, 2015 at 4:55 PM Ahmet Arslan
wrote:
> Hi Hummel,
>
> regarding df,
>
> Term term = new Term(field, word);
> TermStatistics termStatistics = searcher.termStatistics(term,
> Te
Hi Hummel,
regarding df,
Term term = new Term(field, word);
TermStatistics termStatistics = searcher.termStatistics(term,
TermContext.build(reader.getContext(), term));
System.out.println(query + "\t totalTermFreq \t " +
termStatistics.totalTermFreq());
System.out.println(query + "\t docFreq \t
re if collectors could
easily have the same performance without them.
To me, such scores always seem undesirable and are only bugs, and the
current assertions are a good tradeoff.
On Fri, May 29, 2015 at 8:18 AM, Ahmet Arslan wrote:
> Hello List,
>
> When a similarity returns NEGATIVE_INFINIT
Hello List,
When a similarity returns NEGATIVE_INFINITY, hits[i].doc becomes 2147483647.
Thus, an exception is thrown in the following code:
for (int i = 0; i < hits.length; i++) {
int docId = hits[i].doc;
Document doc = searcher.doc(docId);
}
I know it is awkward to return infinity (comes from
Hi,
I have a number of similarity implementations that extend SimilarityBase.
I need to learn which term I am scoring inside the method:
abstract float score(BasicStats stats, float freq, float docLen);
What is the easiest way to access the query term that I am scoring in the
similarity class?
Th
Hello All,
I am traversing the posting list of a single term with the following code (not
sure if there is a better way).
Now I need to handle/aggregate multiple terms: traverse the intersection of
multiple posting lists and obtain the summed freq() of multiple terms per document.
What is the easiest way to obta
Hi,
Maybe LUCENE-5317 is relevant?
Ahmet
On Thursday, April 23, 2015 8:33 PM, Shashidhar Rao
wrote:
Hi,
I have a large text, and from it I need to calculate the top frequencies
of words;
say 'Driving' occurs the most.
Now I need to find phrases containing 'Driving' in the given text and th
Hi Lisa,
I think AnalyzerWrapper
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/AnalyzerWrapper.html
Ahmet
On Sunday, April 19, 2015 1:37 PM, Lisa Ziri wrote:
Hi,
I'm upgrading to lucene 5.1.0 from lucene 4.
In our index we have documents in different languages which are
ed, Apr 15, 2015 at 3:50 AM Ahmet Arslan
wrote:
> Hi Hummel,
>
> You can perform sentence detection outside of Solr, using OpenNLP for
> instance, and then feed the sentences to Solr.
>
> https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.sentdetect
&g
Hi Hummel,
You can perform sentence detection outside of Solr, using OpenNLP for
instance, and then feed the sentences to Solr.
https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.sentdetect
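A sketch of the OpenNLP side (en-sent.bin is OpenNLP's pre-trained English
sentence model):

try (InputStream in = new FileInputStream("en-sent.bin")) {
    SentenceModel model = new SentenceModel(in);
    SentenceDetectorME detector = new SentenceDetectorME(model);
    String[] sentences = detector.sentDetect("First sentence. Second one.");
}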
Ahmet
On Tuesday, April 14, 2015 8:12 PM, Shay Hummel wrote:
Hi
I would l
Hi Spyros,
Not 100% sure, but I think you should override the reset method.
@Override
public void reset() throws IOException {
super.reset();
cachedInput = null;
}
Ahmet
On Monday, March 23, 2015 1:29 PM, Spyros Kapnissis
wrote:
Hello,
We have a couple of custom token filters that use CachingTo
Hi Gimantha,
Not sure about the Lucene internals, but here are some pointers:
http://find.searchhub.org/document/a81b4c9af49c3d0f
http://find.searchhub.org/?q=contribute#%2Fp%3Alucene%2Fs%3Aemail
Ahmet
On Thursday, March 19, 2015 3:58 PM, Gimantha Bandara wrote:
Any clue on where to start
s full float precision, but scoring being
>>> fuzzy anyway this would multiply your memory needs for norms by 4
>>> while not really improving the quality of the scores of your
>>> documents. This precision loss is the right trade-off for most
>>> use-cases.
&g
Hi Adrien,
I read somewhere that norms are stored using docValues.
In my understanding, docValues can store lossless float values.
So the question is, why do several decode/encode methods still exist in
similarity implementations?
Intuitively, switching to docValues for norms should prevent prec
ll compute length of fields by myself.
Thanks,
Ahmet
On Friday, February 6, 2015 5:31 PM, Michael McCandless
wrote:
On Fri, Feb 6, 2015 at 8:51 AM, Ahmet Arslan wrote:
> Hi Michael,
>
> Thanks for the explanation. I am working with a TREC dataset,
> since it is static, I
approximately in the doc's norm value.
Maybe you can use that? Alternatively, you can store this statistic
yourself, e.g. as a doc value.
Mike McCandless
http://blog.mikemccandless.com
On Thu, Feb 5, 2015 at 7:24 PM, Ahmet Arslan wrote:
> Hello Lucene Users,
>
> I am traversing all
Hello Lucene Users,
I am traversing all documents that contain a given term with the following code:
Term term = new Term(field, word);
Bits bits = MultiFields.getLiveDocs(reader);
DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, bits, field,
term.bytes());
while (docsEnum.nextDoc() != Doc
Hi Rob,
Maybe you can wrap your query in a ConstantScoreQuery?
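That is, something like this (originalQuery being your existing query):

// every matching document gets the same score; scoring work is skipped
Query wrapped = new ConstantScoreQuery(originalQuery);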
ahmet
On Thursday, February 5, 2015 9:17 AM, Rob Audenaerde
wrote:
Hi all,
I'm doing some analytics with a custom Collector on a fairly large number
of search results (+-100,000, all the hits that return from a query). I need
to retr
Hi Ralf,
Does the following code fragment work for you?
/**
* Modified from :
http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/analysis/package-summary.html
*/
public List<String> getAnalyzedTokens(String text) throws IOException {
final List<String> list = new ArrayList<>();
try (TokenStream ts = analy
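(The snippet is cut off above; the complete pattern would look roughly like
this, assuming an analyzer field and the usual reset/incrementToken/end
contract:)

public List<String> getAnalyzedTokens(String text) throws IOException {
    final List<String> list = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("field", new StringReader(text))) {
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        ts.reset();                    // required before incrementToken()
        while (ts.incrementToken()) {
            list.add(termAtt.toString());
        }
        ts.end();                      // consume end-of-stream state
    }
    return list;
}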
Hi Clemens,
Please see : https://issues.apache.org/jira/browse/LUCENE-5620
Ahmet
On Tuesday, January 27, 2015 10:56 AM, Clemens Wyss DEV
wrote:
> I very much like preserveOriginal="true" when applying the
> ASCIIFoldingFilter for (German) suggestions
Must revise my statement, as I just noticed tha
Hi Clemens,
Since you are a lucene user, you might be interested in Uwe's response on a
similar topic :
http://find.searchhub.org/document/abb73b45a48cb89e
Ahmet
On Wednesday, January 7, 2015 6:30 PM, Erick Erickson
wrote:
Should be, but it's a bit confusing because the query syntax is not
hetaphi.de
> -Original Message-
> From: Barry Coughlan [mailto:b.coughl...@gmail.com]
> Sent: Monday, January 05, 2015 3:40 PM
> To: java-user@lucene.apache.org; Ahmet Arslan
> Subject: Re: IndexSearcher.setSimilarity thread-safety
>
> Hi Ahmet,
>
> The IndexSearcher is "t
an use a single
IndexReader for the IndexSearchers
Barry
On Mon, Jan 5, 2015 at 1:10 PM, Ahmet Arslan
wrote:
>
>
> anyone?
>
>
>
> On Thursday, December 25, 2014 4:42 PM, Ahmet Arslan
> wrote:
> Hi all,
>
> Javadocs says "IndexSearcher instances are completely th
anyone?
On Thursday, December 25, 2014 4:42 PM, Ahmet Arslan
wrote:
Hi all,
Javadocs says "IndexSearcher instances are completely thread safe, meaning
multiple threads can call any of its
methods, concurrently"
Is this true for the setSimilarity() method?
What happens when every thread uses different similarity implementations?
Thanks,
Ahmet
-
Hi Sascha,
Generally RangeQuery is used for that, e.g. fieldName:[* TO *]
Ahmet
On Monday, December 1, 2014 9:44 PM, Sascha Janz wrote:
Hi,
is there a chance to add an additional clause to a query for a field that
should not be null?
greetings
sascha
-
Hi,
Mahout and Carrot2 can cluster the documents from a Lucene index.
ahmet
On Tuesday, November 11, 2014 10:37 PM, Elshaimaa Ali
wrote:
Hi All,
I have a Lucene index built with Lucene 4.9 for 584 text documents. I need to
extract a document-term matrix and a document-document similarity matri
o the LowerCaseFilter. This seems to
work.
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: 10 Nov 2014 15:19
To: java-user@lucene.apache.org
Subject: Re: How to disable LowerCaseFilter when using SnowballAnalyzer in
Lucene 3.0.2
Hi,
Regarding Uwe's warnin