date:20061002

Re: get terms by positions

2006-10-02 Thread Catalin Mititelu

Hi, I have the same problem. This is useful when you try to extract the contexts (terms before and after) of a certain term (for example). I found a solution but it performs badly: when you try to retrieve those contexts you have to re-tokenize the documents containing the given term (i.e. "socc

Multi-threaded IndexWriter

2006-10-02 Thread Antony Bowesman

Hi, I have a multi-threaded indexing application that indexes documents into a set of Lucene index databases (I have millions of documents to index, hence the split DB). When a thread gets an index request, it determines the index DB to index the data in. It grabs the IndexWriter for that d

Searching documents on big index by using ParallelMultiSearcher is slow...

2006-10-02 Thread Scott

Hi, I have a question about ParallelMultiSearcher performance. I want to search documents on about 10 gigabytes of index. (The index has 10,000,000 documents.) I get very slow performance using IndexSearcher with ONE index normally. Then I tried to use ParallelMultiSearcher with 10 servers of re

[Lucene 2.0]How to recover index?

2006-10-02 Thread zhu jiang

Hi all, In some situations, reading the index files may throw a "read past EOF" exception, so that the index cannot be used any more. I wonder how to recover the index files in such a situation? -- Thanks, Jiang

Re: A question about query syntax, has it changed?

2006-10-02 Thread Doron Cohen

The problem stems from using the query parser for searching a non tokenized field ("book"). You can either create a term query for searching in that field, like this: new TermQuery(new Term("book","first title")); Or tokenize the field "book" and keep using QueryParser. Decision is based on ho

A question about query syntax, has it changed?

2006-10-02 Thread Bill Taylor

I am indexing individual pages of books. I get no results from the query accurate AND book:"first title" Each lucene document which represents one page of one book gets a field "book" which is indexed, stored, and not tokenized to store the title of the book. The word "accurate" appears on

Re: get terms by positions

2006-10-02 Thread Doron Cohen

You can store TermVectors with position info, but I don't think this would be enough for what you are asking, because it is not meant for direct access to a term by its position, and because TermVectors store tokens, i.e. the "indexed" form of the word, which I am not sure is what you need. It see

Re: get terms by positions

2006-10-02 Thread Nicolas Lalevée

On Monday 02 October 2006 23:06, Renzo Scheffer wrote: > Hi, > can anybody be so kind as to tell me if it is possible to search a Term by its > position? > I search a term (for example "soccer") and get back the DocIds and > positions as follows: > TermPositions termPos = rea

Re: Changing Similarity on existing index

2006-10-02 Thread Chris Hostetter

: Initially, I had anticipated that doing this would update the : Similarity as part of the add process. But after running some tests, : this does not appear to be the case. fieldNorms are computed when the document is added to the index ... merging indexes doesn't affect them. : Is there some

get terms by positions

2006-10-02 Thread Renzo Scheffer

Hi, can anybody be so kind as to tell me if it is possible to search a Term by its position? I search a term (for example "soccer") and get back the DocIds and positions as follows: TermPositions termPos = reader.termPositions(new Term("contents","soccer")); while(termPos.next()){ i

Changing Similarity on existing index

2006-10-02 Thread Shane Perry

I have an existing index which was created with DefaultSimilarity. I want to update the index to use my own Similarity class (I need to change the lengthNorm). I wrote a quick script which creates a new index, calls setSimilarity(new MySimilarity) for that index's IndexWriter, and then calls wr

Re: Search in HTML code

2006-10-02 Thread Erick Erickson

I guess the thundering silence is rooted in the problem statement. I have a hard time understanding how this index is used. By storing things this way, you'll force the user to know the *exact* format of anything she's looking for. That is, it's hard to search for and get docs containing both an

Re: lucene newbie question

2006-10-02 Thread Erick Erickson

Another Erick (note the correct spelling). See below. On 10/2/06, Los Morales <[EMAIL PROTECTED]> wrote: Hi Erik, Thanks for the response. >Consider the index in the back of a book. You could tear that out and >still use it to tell what page something is on, but you have no actual >conte

Re: lucene newbie question

2006-10-02 Thread Doron Cohen

SSN actually is a common situation. Assume you have a (relational) database with a table of products with three columns : - SSN, which is also a primary key for that table, - DESCRIPTION, which has free text (i.e. unformatted text) describing the product. - OTHER - additional info. Also assume you

Re: lucene newbie question

2006-10-02 Thread Los Morales

Hi Erik, Thanks for the response. Consider the index in the back of a book. You could tear that out and still use it to tell what page something is on, but you have no actual content in hand. So, I guess what I'm having a hard time trying to figure out is, what's the point of having an ind

Re: lucene newbie question

2006-10-02 Thread Erik Hatcher

On Oct 2, 2006, at 2:08 PM, Los Morales wrote: I'm new to Lucene and IR in general. I'm a bit confused on the concept of fields. From what I've read, a field does not have to be indexed but its value can be stored in an index. Likewise a field can be indexed but its value is not stored i

lucene newbie question

2006-10-02 Thread Los Morales

Hi, I'm new to Lucene and IR in general. I'm a bit confused on the concept of fields. From what I've read, a field does not have to be indexed but its value can be stored in an index. Likewise a field can be indexed but its value is not stored in an index. Now how can a field be searchable

Re: Modifying the PrefixQuery

2006-10-02 Thread Chris Hostetter

: I want to modify the PrefixQuery so that, instead of throwing the : TooManyClauses exception, it picks out the N most frequent terms : matching the prefix and only searches for those. Is this possible? It should be ... look at the rewrite method of PrefixQuery and the docFreq method of TermE

Re: Indexing a single product in multiple categories.

2006-10-02 Thread Chris Hostetter

: Is my only option here really going to be to add some more columns? I've slept : on it over the weekend, and not had any more bright ideas ... ? I have to admit, I don't really understand your problem ... you speak of Products and Stores and Categories and Primary Categories and wondering how t

Re: Very high fieldNorm for a field resulting in bad results

2006-10-02 Thread Chris Hostetter

: This should solve most of my heartache. : What's the suggested way to use this? Copy a solr jar? Or just copy : the code for this 1 query? That's entirely up to you, it depends on what kind of source management you want to have -- the suggested way to use it is to run Solr and use it via the

Re: Performing a like query

2006-10-02 Thread Chris Hostetter

: I have a custom-built Analyzer where I tokenize all non-whitespace : characters as well available in the field "TERM" (which is the only : field being tokenised). : If I now query my index file for a term "6/12" for instance, I get back : only ONE result : instead of TWO. There is another token

Re: DateTools again

2006-10-02 Thread John Haxby

John Haxby wrote: I ran across the problem with DateTools not using UTC when I tried to use an index created in California from the UK: I was looking for documents with a particular date stamp but I found documents with a date stamp from the wrong day. Even more interesting and bizarre things

Re: DateTools again

2006-10-02 Thread John Haxby

Volodymyr Bychkoviak wrote: I'm using DateTools with Resolution.DAY. I know that dates internally are converted to GMT. Converting dates "2006-10-01 00:00" and "2006-10-01 15:00" from "Etc/GMT-2" timezone will give us "20060930" and "20061001" respectively. But these dates are identical with

DateTools again

2006-10-02 Thread Volodymyr Bychkoviak

I'm using DateTools with Resolution.DAY. I know that dates internally are converted to GMT. Converting dates "2006-10-01 00:00" and "2006-10-01 15:00" from "Etc/GMT-2" timezone will give us "20060930" and "20061001" respectively. But these dates are identical with day resolution. Is this bug
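The day shift described in this message can be reproduced without Lucene. The following is a minimal sketch in plain Java, assuming that day resolution amounts to formatting the instant as `yyyyMMdd` in GMT; the class and method names (`DayKeyDemo`, `dayKey`) are hypothetical and are not part of DateTools. Note that the zone ID "Etc/GMT-2" follows the POSIX sign convention and therefore means two hours ahead of GMT.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not a Lucene class: shows how the zone used to
// build a day-resolution key changes which calendar day a timestamp lands on.
final class DayKeyDemo {

    /**
     * Parses a "yyyy-MM-dd HH:mm" timestamp as local time in sourceZone and
     * returns the day it falls on in keyZone, formatted as yyyyMMdd.
     */
    static String dayKey(String localTimestamp, String sourceZone, String keyZone) {
        SimpleDateFormat parser = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ROOT);
        parser.setTimeZone(TimeZone.getTimeZone(sourceZone));
        parser.setLenient(false);

        SimpleDateFormat dayFormat = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
        dayFormat.setTimeZone(TimeZone.getTimeZone(keyZone));

        try {
            Date instant = parser.parse(localTimestamp);
            return dayFormat.format(instant);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + localTimestamp, e);
        }
    }

    public static void main(String[] args) {
        String zone = "Etc/GMT-2"; // POSIX-style ID: two hours ahead of GMT

        // Keys built in GMT: the two timestamps from the same local day split.
        System.out.println(dayKey("2006-10-01 00:00", zone, "GMT")); // 20060930
        System.out.println(dayKey("2006-10-01 15:00", zone, "GMT")); // 20061001

        // Keys built in the source zone: both land on the same day.
        System.out.println(dayKey("2006-10-01 00:00", zone, zone));  // 20061001
        System.out.println(dayKey("2006-10-01 15:00", zone, zone));  // 20061001
    }
}
```

Building the key in the source zone keeps both timestamps on the same day, but the resulting keys are then only comparable between indexes that agree on that zone, which is the trade-off discussed in this thread.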

Search in HTML code

2006-10-02 Thread John Bugger

Hello! I've indexed HTML pages and stored html codes as UN_TOKENIZED fields. So, I need to search for specific tags in those documents, for example: Do I need to write some custom analyzer or something like that? Please help me!

Modifying the PrefixQuery

2006-10-02 Thread Marcus Falck

I want to modify the PrefixQuery so that, instead of throwing the TooManyClauses exception, it picks out the N most frequent terms matching the prefix and only searches for those. Is this possible? / Regards Marcus
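The selection step asked about here can be sketched independently of Lucene. The example below is a plain-Java illustration under stated assumptions: a sorted map from term text to document frequency stands in for the index's term dictionary, and the names `TopPrefixTerms` and `mostFrequent` are hypothetical. It is not the PrefixQuery implementation; it only shows how to keep the N most frequent matching terms with a bounded min-heap while scanning the terms that share the prefix.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.SortedMap;

// Hypothetical helper, not a Lucene class: picks the n terms with the
// highest document frequency among those starting with a given prefix.
final class TopPrefixTerms {

    static List<String> mostFrequent(SortedMap<String, Integer> docFreqByTerm,
                                     String prefix, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }

        // Min-heap on document frequency: the root is always the weakest
        // candidate, so it is the one evicted when the heap grows past n.
        Comparator<Map.Entry<String, Integer>> byDocFreq =
                (a, b) -> Integer.compare(a.getValue(), b.getValue());
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byDocFreq);

        // In a sorted term dictionary, all terms sharing a prefix are
        // contiguous, starting at the first term >= prefix.
        for (Map.Entry<String, Integer> entry : docFreqByTerm.tailMap(prefix).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // left the prefix range
            }
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the least frequent candidate
            }
        }

        // Drain the heap (ascending frequency), then reverse so the most
        // frequent term comes first.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```

Each surviving term would then become one clause of the rewritten query, so the clause count never exceeds n. The bounded heap keeps memory at O(n) even when the prefix matches a very large number of terms.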

SV: Indexing a single product in multiple categories.

2006-10-02 Thread Marcus Falck

Can't you just add several values to the Store field? i.e.: doc.addField(field.text(STOREFIELD, val1)); doc.addField(field.text(STOREFIELD, val2)); -Original message- From: Stuart Grimshaw [mailto:[EMAIL PROTECTED] Sent: 2 October 2006 10:09 To: java-user@lucene.apache.org Ä

Re: Indexing a single product in multiple categories.

2006-10-02 Thread Stuart Grimshaw

On Thursday 28 September 2006 10:12, Stuart Grimshaw wrote: > We have an existing lucene based search, and a recent change to the way we > organise our products has caused a bit of a problem for search results. > > Our products are arranged into subcategories, categories & stores. A > product can o


28 matches
