Hi,
I have the same problem.
This is useful when you try to extract the contexts (terms before and after) of
a certain term (for example).
I found a solution but it performs badly: when you try to retrieve those
contexts you have to re-tokenize the documents containing the given term (i.e.
"socc
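The re-tokenizing approach described above can be sketched in plain Java. It re-tokenizes the stored text, finds each position of the term, and takes a window of tokens on either side. TermContext and its methods are hypothetical names, not Lucene classes. The regex tokenizer is an assumption: it would have to match the analyzer used at index time for these positions to line up with the ones the index reports.

```java
import java.util.*;

// Toy sketch, not Lucene code: extract the terms before and after each
// occurrence of a term by tokenizing the document text again.
class TermContext {
    // Assumed tokenizer: lowercase, split on runs of non-letter/digit chars.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // For each occurrence of term, return up to `width` tokens on each side.
    static List<String> contexts(String text, String term, int width) {
        List<String> tokens = tokenize(text);
        List<String> result = new ArrayList<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            if (!tokens.get(pos).equals(term)) continue;
            int from = Math.max(0, pos - width);
            int to = Math.min(tokens.size(), pos + width + 1);
            result.add(String.join(" ", tokens.subList(from, to)));
        }
        return result;
    }

    public static void main(String[] args) {
        String doc = "We played soccer on Sunday; soccer is fun.";
        for (String c : contexts(doc, "soccer", 2)) {
            System.out.println(c);
        }
        // we played soccer on sunday
        // on sunday soccer is fun
    }
}
```

The cost the poster mentions is visible here: every matching document has to be tokenized in full again, just to read a few neighbouring tokens.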
Hi,
I have a multi-threaded indexing application that indexes documents into a set
of Lucene index databases (I have millions of documents to index, hence the
split DB). When a thread gets an index request, it determines the index DB to
index the data in. It grabs the IndexWriter for that d
Hi,
I have a question about ParallelMultiSearcher performance.
I want to search about 10 gigabytes of index
(the index has 10,000,000 documents).
Normally I use IndexSearcher with ONE index, and performance is very slow.
Then I tried to use ParallelMultiSearcher with 10 servers of re
Hi all,
In some situations, index files may throw a "read past EOF" exception, so that
the index cannot be used any more. How can I recover the index files
in such a situation?
--
Thanks,
Jiang
The problem stems from using the query parser to search a non-tokenized
field ("book").
You can either create a term query for searching in that field, like this:
new TermQuery(new Term("book","first title"));
Or tokenize the field "book" and keep using QueryParser.
Decision is based on ho
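A minimal sketch of the mismatch described above, in plain Java rather than Lucene. An un-tokenized field indexes its whole value as one term. QueryParser, by contrast, runs the query text through an analyzer and gets separate tokens, so only the exact-term lookup finds the page. The class and method names are hypothetical, and analyze() is an assumed stand-in for a real Analyzer (lowercase, split on whitespace).

```java
import java.util.*;

// Toy model, not Lucene code: contrasts an exact-term lookup with the
// analyzed query that QueryParser would build for the same text.
class UntokenizedFieldDemo {
    // Assumed stand-in for an Analyzer: lowercase, split on whitespace.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // An un-tokenized field contributes its whole value as a single term.
    static Set<String> indexUntokenized(String value) {
        return Collections.singleton(value);
    }

    // Like TermQuery: one exact term must be present.
    static boolean termQueryMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    // Like the query parsed from analyzed text: every token must be an
    // indexed term (positions are ignored in this toy).
    static boolean parsedQueryMatches(Set<String> indexedTerms, String queryText) {
        return indexedTerms.containsAll(analyze(queryText));
    }

    public static void main(String[] args) {
        Set<String> bookTerms = indexUntokenized("first title");
        System.out.println(termQueryMatches(bookTerms, "first title"));   // true
        System.out.println(parsedQueryMatches(bookTerms, "first title")); // false
    }
}
```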
I am indexing individual pages of books.
I get no results from the query
accurate AND book:"first title"
Each lucene document which represents one page of one book gets a field
"book" which is indexed, stored, and not tokenized to store the title
of the book.
The word "accurate" appears on
You can store TermVectors with position info, but I don't think this would
be enough for what you are asking, because it is not meant for direct
access to a term by its position, and because TermVectors store tokens,
i.e. the "indexed" form of the word, which I am not sure is what you need.
It see
On Monday 02 October 2006 23:06, Renzo Scheffer wrote:
> Hi,
>
> can anybody be so kind as to tell me if it is possible to search a Term by its
> position?
>
> I search a term (for example "soccer") and get back the DocIds and
> positions as follows:
>
> TermPositions termPos = rea
: Initially, I had anticipated that doing this would update the
: Similarity as part of the add process. But after running some tests,
: this does not appear to be the case.
fieldNorms are computed when the document is added to the index ...
merging indexes doesn't affect them.
: Is there some
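A toy model of the reply above, in plain Java rather than Lucene code. The norm is computed from the field length when a document is added, using the same 1/sqrt(numTerms) formula as DefaultSimilarity.lengthNorm. Adding another index's documents copies their norms unchanged. ToyIndex, LengthNorm and FLAT are hypothetical names, with FLAT standing in for a custom Similarity. Real norms also fold in boosts and are encoded into a single byte, which this sketch ignores.

```java
import java.util.*;

// Toy model: norms are computed once, at add time, and copied on merge.
class NormsDemo {
    interface LengthNorm { float of(int numTerms); }

    // Same formula as DefaultSimilarity.lengthNorm: 1 / sqrt(numTerms).
    static final LengthNorm DEFAULT = n -> (float) (1.0 / Math.sqrt(n));
    // Hypothetical replacement "similarity" that ignores field length.
    static final LengthNorm FLAT = n -> 1.0f;

    static class ToyIndex {
        final List<Float> norms = new ArrayList<>();
        final LengthNorm similarity;

        ToyIndex(LengthNorm similarity) { this.similarity = similarity; }

        // The norm is computed here, from this index's similarity.
        void addDocument(int numTerms) { norms.add(similarity.of(numTerms)); }

        // Merging copies the other index's norms; nothing is recomputed.
        void addIndexes(ToyIndex other) { norms.addAll(other.norms); }
    }

    public static void main(String[] args) {
        ToyIndex old = new ToyIndex(DEFAULT);
        old.addDocument(4);                  // norm 1/sqrt(4) = 0.5
        ToyIndex merged = new ToyIndex(FLAT);
        merged.addIndexes(old);              // keeps 0.5
        merged.addDocument(4);               // a re-added document gets 1.0
        System.out.println(merged.norms);    // [0.5, 1.0]
    }
}
```

Under this model, merging an old index into a writer configured with a new Similarity leaves the old norms untouched. Only documents that are added again pick up the new lengthNorm.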
Hi,
can anybody be so kind as to tell me if it is possible to search a Term by its
position?
I search a term (for example "soccer") and get back the DocIds and
positions as follows:
TermPositions termPos = reader.termPositions(new Term("contents","soccer"));
while(termPos.next()){
i
I have an existing index which was created with DefaultSimilarity. I
want to update the index to use my own Similarity class (need to change
the lengthNorm). I wrote a quick script which creates a new index,
calls setSimilarity(new MySimilarity()) on that index's IndexWriter, and
then calls wr
I guess the thundering silence is rooted in the problem statement. I have a
hard time understanding how this index is used. By storing things this way,
you'll force the user to know the *exact* format of anything she's looking
for. That is, it's hard to search for and
get docs containing both an
Another Erick (note the correct spelling). See below.
On 10/2/06, Los Morales <[EMAIL PROTECTED]> wrote:
Hi Erik,
Thanks for the response.
>Consider the index in the back of a book. You could tear that out and
>still use it to tell what page something is on, but you have no actual
>conte
SSN actually is a common situation.
Assume you have a (relational) database with a table of products with three
columns:
- SSN, which is also the primary key for that table;
- DESCRIPTION, which holds free (i.e. unformatted) text describing the
product;
- OTHER, additional info.
Also assume you
Hi Erik,
Thanks for the response.
Consider the index in the back of a book. You could tear that out and
still use it to tell what page something is on, but you have no actual
content in hand.
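A toy model of the book-index analogy, in plain Java. The class and its methods are hypothetical and are not Lucene's API. An indexed field feeds the inverted index, so it can be searched. A stored field keeps its original value, so it can be displayed. A field that is indexed but not stored can find a document but cannot give its text back, much like the torn-out book index.

```java
import java.util.*;

// Toy model (not Lucene's classes) of the "indexed" vs "stored" flags.
class StoredVsIndexed {
    // Inverted index: "field:term" -> ids of documents containing it.
    final Map<String, Set<Integer>> inverted = new HashMap<>();
    // Stored fields: doc id -> field name -> original value.
    final Map<Integer, Map<String, String>> stored = new HashMap<>();

    void addField(int docId, String name, String value, boolean index, boolean store) {
        if (index) {
            // Assumed analysis: lowercase, split on whitespace.
            for (String tok : value.toLowerCase().split("\\s+")) {
                inverted.computeIfAbsent(name + ":" + tok, k -> new TreeSet<>()).add(docId);
            }
        }
        if (store) {
            stored.computeIfAbsent(docId, k -> new HashMap<>()).put(name, value);
        }
    }

    // Searching only consults the inverted index.
    Set<Integer> search(String field, String term) {
        return inverted.getOrDefault(field + ":" + term, Collections.emptySet());
    }

    // Retrieving a value only consults the stored fields.
    String fetch(int docId, String field) {
        return stored.getOrDefault(docId, Collections.emptyMap()).get(field);
    }

    public static void main(String[] args) {
        StoredVsIndexed idx = new StoredVsIndexed();
        idx.addField(1, "id", "A-17", false, true);                       // stored only
        idx.addField(1, "description", "red leather boots", true, false); // indexed only
        System.out.println(idx.search("description", "boots")); // [1]
        System.out.println(idx.fetch(1, "description"));        // null (not stored)
        System.out.println(idx.fetch(1, "id"));                 // A-17
    }
}
```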
So, I guess what I'm having a hard time trying to figure out is, what's the
point of having an ind
On Oct 2, 2006, at 2:08 PM, Los Morales wrote:
I'm new to Lucene and IR in general. I'm a bit confused on the
concept of fields. From what I've read, a field does not have to
be indexed but its value can be stored in an index. Likewise a
field can be indexed but its value is not stored i
Hi,
I'm new to Lucene and IR in general. I'm a bit confused on the concept of
fields. From what I've read, a field does not have to be indexed but its
value can be stored in an index. Likewise a field can be indexed but its
value is not stored in an index. Now how can a field be searchable
: I want to modify the PrefixQuery so that, instead of throwing the
: BooleanQuery.TooManyClauses exception, it takes the N most frequent terms
: matching the prefix and searches only for those. Is this possible?
It should be ... look at the rewrite method of PrefixQuery and the docFreq
method of TermE
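A sketch of the selection step only, in plain Java. TopPrefixTerms and topTerms are hypothetical names, and a TreeMap of term to docFreq stands in for the sorted term dictionary. Because terms are sorted, the matches for a prefix form a contiguous run starting at tailMap(prefix). A size-N min-heap keeps the N terms with the highest docFreq. Inside Lucene the same walk would presumably go through a TermEnum and its docFreq() method, and the surviving terms would become the query's clauses; that wiring is not shown here.

```java
import java.util.*;

// Toy sketch: keep only the N most frequent terms that start with a prefix.
class TopPrefixTerms {
    static List<String> topTerms(SortedMap<String, Integer> termDocFreqs, String prefix, int n) {
        // Min-heap on docFreq: the least frequent kept term is evicted first.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        // Terms are sorted, so all matches form one run starting at the prefix.
        for (Map.Entry<String, Integer> e : termDocFreqs.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            heap.add(e);
            if (heap.size() > n) heap.poll();
        }
        // Report the survivors from most to least frequent.
        List<Map.Entry<String, Integer>> kept = new ArrayList<>(heap);
        kept.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : kept) result.add(e.getKey());
        return result;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> dict = new TreeMap<>();
        dict.put("apple", 5);
        dict.put("apply", 12);
        dict.put("appoint", 3);
        dict.put("apt", 40);
        dict.put("banana", 7);
        System.out.println(topTerms(dict, "app", 2)); // [apply, apple]
    }
}
```

The heap keeps memory proportional to N rather than to the number of matching terms, which is the point when a prefix expands to more terms than the clause limit allows.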
: Is my only option here really going to be to add some more colums? I've slept
: on it over the weekend, and not had any more bright ideas ... ?
I have to admit, I don't really understand your problem ... you speak of
Products and Stores and Categories and Primary Categories and wondering
how t
: This should solve most of my heartache.
: What's the suggested way to use this? Copy a Solr jar? Or just copy
: the code for this 1 query?
that's entirely up to you, it depends on what kind of source management
you want to have -- the suggested way to use it is to run Solr and use it
via the
: I have a custom-built Analyzer where I tokenize all non-whitespace
: characters as well available in the field "TERM" (which is the only
: field being tokenised).
: If I now query my index file for a term such as "6/12", I get back
: only ONE result instead of TWO. There is another token
John Haxby wrote:
I ran across the problem with DateTools not using UTC when I tried to
use an index created in California from the UK: I was looking for
documents with a particular date stamp but I found documents with a
date stamp from the wrong day. Even more interesting and bizarre
things
Volodymyr Bychkoviak wrote:
I'm using DateTools with Resolution.DAY.
I know that dates internally are converted to GMT.
Converting dates "2006-10-01 00:00" and "2006-10-01 15:00" from
"Etc/GMT-2" timezone will give us
"20060930" and "20061001" respectively.
But these dates are identical with
I'm using DateTools with Resolution.DAY.
I know that dates internally are converted to GMT.
Converting dates "2006-10-01 00:00" and "2006-10-01 15:00" from
"Etc/GMT-2" timezone will give us
"20060930" and "20061001" respectively.
But these dates are identical with day resolution.
Is this bug
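The arithmetic in the report can be reproduced with the JDK alone. This sketch parses a local time in one zone and formats it as yyyyMMdd in another; toDay is a hypothetical helper, not part of DateTools. Formatting in GMT mirrors what the report says DateTools does, while formatting in the original zone keeps both times on the same day. Note that "Etc/GMT-2" means UTC+2, because the sign in Etc zone names is inverted.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class DayResolutionDemo {
    // Hypothetical helper: read "yyyy-MM-dd HH:mm" in fromZone and
    // return the calendar day ("yyyyMMdd") as seen in toZone.
    static String toDay(String localTime, String fromZone, String toZone) {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd HH:mm");
        in.setTimeZone(TimeZone.getTimeZone(fromZone));
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMdd");
        out.setTimeZone(TimeZone.getTimeZone(toZone));
        try {
            Date instant = in.parse(localTime);
            return out.format(instant);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable time: " + localTime, e);
        }
    }

    public static void main(String[] args) {
        // Day in GMT, as the report describes DateTools doing.
        System.out.println(toDay("2006-10-01 00:00", "Etc/GMT-2", "GMT")); // 20060930
        System.out.println(toDay("2006-10-01 15:00", "Etc/GMT-2", "GMT")); // 20061001
        // Day in the original zone: both times fall on the same day.
        System.out.println(toDay("2006-10-01 00:00", "Etc/GMT-2", "Etc/GMT-2")); // 20061001
        System.out.println(toDay("2006-10-01 15:00", "Etc/GMT-2", "Etc/GMT-2")); // 20061001
    }
}
```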
Hello!
I've indexed HTML pages and stored the HTML code in UN_TOKENIZED fields. Now I
need to search for specific tags in those documents,
for example:
Do I need to write some custom analyzer or something like that?
Please help me!
I want to modify the PrefixQuery so that, instead of throwing the
BooleanQuery.TooManyClauses exception, it takes the N most frequent terms
matching the prefix and searches only for those. Is this possible?
Regards
Marcus
Can't you just add several values to the Store field?
E.g.:
doc.add(new Field(STOREFIELD, val1, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field(STOREFIELD, val2, Field.Store.YES, Field.Index.TOKENIZED));
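A toy model of the suggestion above, in plain Java. MultiValuedDoc is a hypothetical class, not Lucene's Document. Several values added under one field name all stay attached to the same document, so a lookup on that field matches if any one value matches, and every value can be read back. For simplicity each value is treated as one exact term, as with an un-tokenized field.

```java
import java.util.*;

// Toy model: one field name, several values, all on the same document.
class MultiValuedDoc {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Exact-value lookup: matches if ANY value of the field equals the term.
    boolean matches(String name, String term) {
        return values.getOrDefault(name, Collections.emptyList()).contains(term);
    }

    // Every value added under the field name, in insertion order.
    List<String> getValues(String name) {
        return values.getOrDefault(name, Collections.emptyList());
    }

    public static void main(String[] args) {
        MultiValuedDoc product = new MultiValuedDoc();
        product.add("store", "Kitchen");   // hypothetical store names
        product.add("store", "Garden");
        System.out.println(product.matches("store", "Garden")); // true
        System.out.println(product.getValues("store"));         // [Kitchen, Garden]
    }
}
```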
-Original Message-
From: Stuart Grimshaw [mailto:[EMAIL PROTECTED]
Sent: 2 October 2006 10:09
To: java-user@lucene.apache.org
Ä
On Thursday 28 September 2006 10:12, Stuart Grimshaw wrote:
> We have an existing lucene based search, and a recent change to the way we
> organise our products has caused a bit of a problem for search results.
>
> Our products are arranged into subcategories, categories & stores. A
> product can o