Hi
I am using the TermsComponent in my solr config like this to deal with
queries about terms in the index:
--
<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <bool name="distrib">false</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
--
For example, I want to fetch any *terms* containing "surface defects".
Using solr I can d
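A request along these lines would do that (core and field names here are illustrative; terms.fl, terms.regex, and terms.limit are standard TermsComponent parameters):

    http://localhost:8983/solr/mycore/terms?terms.fl=content&terms.regex=.*surface%20defects.*&terms.limit=50

Note that this can only match terms that actually exist in the index: with a standard tokenizer the indexed terms are single tokens, so a multi-word pattern like this only matches if the field produces multi-word terms (e.g. via shingles).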
Hi
Is it possible to get a list of terms within a document, and also TF of
each of these terms *in that document only*? (Lucene 5.3)
IndexReader has a method "Terms getTermVector(int docID, String field)",
which gives me a "Terms" object, on which I can get a TermsEnum. But I
do not know whe
You need to enable term vectors during indexing. The pattern for how to
use the terms enum can be looked up in various places in the Lucene
source code. It's a very expert API, but it is the way to go here.
Uwe
On 20 September 2015 at 15:35:40 CEST, Ziqi Zhang wrote:
Hi
Is it possible to get a li
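A minimal sketch of what Uwe describes, for Lucene 5.3 (field and variable names are illustrative, and an open IndexWriter/IndexReader is assumed):

    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.util.BytesRef;

    // Index time: the field must be indexed with term vectors enabled.
    FieldType ft = new FieldType(TextField.TYPE_STORED);
    ft.setStoreTermVectors(true);
    doc.add(new Field("content", text, ft));

    // Search time: iterate the per-document term vector.
    Terms vector = reader.getTermVector(docId, "content");
    TermsEnum termsEnum = vector.iterator();
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
        // On a term-vector enum, totalTermFreq() is the term's
        // frequency within this document only.
        System.out.println(term.utf8ToString() + " tf=" + termsEnum.totalTermFreq());
    }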
Hi
Given a document in a lucene index, I would like to get a list of terms
in that document and their offsets. I suppose starting with
IndexReader.getTermVector can get me going with this. I have some code
as below (Lucene 5.3) about which I have some questions:
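The usual pattern is roughly the following sketch (Lucene 5.3; variable names illustrative). The term vector must have been indexed with positions and offsets, i.e. setStoreTermVectorPositions(true) and setStoreTermVectorOffsets(true) on the FieldType:

    import org.apache.lucene.index.*;
    import org.apache.lucene.util.BytesRef;

    Terms vector = reader.getTermVector(docId, "content");
    TermsEnum termsEnum = vector.iterator();
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
        PostingsEnum postings = termsEnum.postings(null, PostingsEnum.OFFSETS);
        // A term vector is a single-document inverted index, so one
        // nextDoc() call positions the enum on that document.
        postings.nextDoc();
        for (int i = 0; i < postings.freq(); i++) {
            postings.nextPosition();
            System.out.println(term.utf8ToString()
                + " [" + postings.startOffset() + "," + postings.endOffset() + ")");
        }
    }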
Hi
I need a special kind of 'token' which is a sentence, so I need a
tokenizer that splits texts into sentences.
I wonder if there are already such or similar implementations?
If I have to implement it myself, I suppose I need to implement a
subclass of Tokenizer. Having looked at a few exist
a sentence delimiter, then use that delimiter
as the basis for tokenization.
More info on sentence recognition
http://opennlp.apache.org/documentation/manual/opennlp.html
On Wed, Sep 23, 2015 at 11:18 AM, Ziqi Zhang wrote:
Hi
I need a special kind of 'token' which is a sentence, s
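If the sentences really do need to come out of the analysis chain as single tokens, one rough way to follow the delimiter advice above is Lucene's existing PatternTokenizer, sketched here (the regex is simplistic and will split on abbreviations; OpenNLP's sentence detector handles those cases better):

    import java.util.regex.Pattern;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.pattern.PatternTokenizer;

    // Treat sentence-ending punctuation followed by whitespace as the
    // delimiter; group -1 means "split on the pattern", like String.split().
    Pattern sentenceBreak = Pattern.compile("(?<=[.!?])\\s+");
    Tokenizer tokenizer = new PatternTokenizer(sentenceBreak, -1);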
P) to do
sentence splitting and add to the doc as stored fields?
Steve
www.lucidworks.com
On Sep 23, 2015, at 11:39 AM, Ziqi Zhang wrote:
Thanks that is understood.
My application is a bit special in that I need both an indexed field
with standard tokenization and an unindexed but s
se.WholeSentenceTokenizer.
However, it sounds like you only need to store the sentences, not search
against them, so I don’t think you need sentence *tokenization*.
Why not simply use the JDK’s BreakIterator (or as you say OpenNLP) to do
sentence splitting and add to the doc as stored fields?
Steve
www.lucidworks.com
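A minimal sketch of that suggestion (field name illustrative; the sentences are stored only, never tokenized or indexed):

    import java.text.BreakIterator;
    import java.util.Locale;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.StoredField;

    BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
    it.setText(text);
    Document doc = new Document();
    int start = it.first();
    for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
        // One stored-only value per sentence.
        doc.add(new StoredField("sentences", text.substring(start, end)));
    }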
sentences (stored): 33,52|Something happened.
Steve
www.lucidworks.com
On Sep 23, 2015, at 3:26 PM, Ziqi Zhang wrote:
Thanks Steve.
It probably also makes sense to extract sentences and then store them. But
along with each sentence I also need to store its start/end offset. I
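A sketch of the encoding Steve shows above, with one stored value per sentence in the form start,end|text (field and variable names illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.StoredField;

    // Write: prepend the character offsets to the stored sentence.
    doc.add(new StoredField("sentences",
        start + "," + end + "|" + text.substring(start, end)));

    // Read: split each stored value back into offsets and text.
    for (String stored : doc.getValues("sentences")) {
        int bar = stored.indexOf('|');
        String[] span = stored.substring(0, bar).split(",");
        int sentStart = Integer.parseInt(span[0]);
        int sentEnd = Integer.parseInt(span[1]);
        String sentence = stored.substring(bar + 1);
        System.out.println(sentStart + "-" + sentEnd + ": " + sentence);
    }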
Hi
Is there a way to remove just the leading and trailing stopwords from a
token n-gram?
Currently I have the following combination which removes any n-gram that
contains a stopword:
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
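For context, the surrounding analysis chain presumably looks something like the following; the tokenizer and shingle parameters here are guesses, not the poster's actual config:

    <fieldType name="shingles" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3"/>
      </analyzer>
    </fieldType>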
Hi,
I have a problem which I think is the same as that described here:
http://stackoverflow.com/questions/8892143/error-when-opening-a-lucene-index-map-failed
However, the solution does not apply in this case, so I am providing more
details and asking again.
The index is created using Solr 5.3
Hi
I am trying to pin-point a mismatch that shows up when I use the offsets
produced by the Lucene indexing process to take substrings of the
original document content.
I tried to debug as far as I could go, but I lost track of Lucene at
line 298 of DefaultIndexingChain (Lucene 5.3.0):
so it's applied on system startup.
This depends on your Linux distribution, so we cannot give any help on this.
I would also recommend reviewing my blog post, whose URL is given in
the exception message!
Uwe
On 1 October 2015 at 21:25:30 CEST, Ziqi Zhang wrote:
Hi,
I have a problem wh
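Assuming the failure is the memory-map count limit that Uwe's blog post discusses (which matches the Stack Overflow link above), the fix on Linux looks something like this; the value is illustrative:

    # In /etc/sysctl.conf (or a file under /etc/sysctl.d/), so it's applied on system startup:
    vm.max_map_count=262144

    # Apply immediately without a reboot:
    sysctl -w vm.max_map_count=262144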
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-----Original Message-----
From: Ziqi Zhang [mailto:ziqi.zh...@sheffield.ac.uk]
Sent: Saturday, October 03, 2015 5:01 PM
To: java-user@lucene.apache.org
Subject: lucene deliberately removes \r (windows car