date:20101014

Re: ParallelReader

2010-10-14 Thread Rob Bygrave

>> Any case where it would break?

If a query uses multiple fields it would break. That is, usually all the fields need to be in the doc in index 2, not just the modified one.

On Fri, Oct 15, 2010 at 2:35 PM, Erick Erickson wrote:
> This seems like far too much work if I'm reading things right. You

Re: ParallelReader

2010-10-14 Thread Erick Erickson

This seems like far too much work if I'm reading things right. You can't update a field, but you #can# update a document, which actually re-indexes that document under the covers (you have to have a way to uniquely identify the doc). Then, when you reopen your index reader, you'll only see the new val
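The delete-then-add behaviour described above can be sketched with a toy in-memory model. This is not Lucene code: the class `UpdateByKeyDemo` and its methods are hypothetical names chosen for illustration, and a map keyed by a unique ID stands in for the index. In Lucene itself the corresponding call is `IndexWriter.updateDocument(Term, Document)`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-in for an index keyed by a unique ID field (hypothetical, not Lucene). */
final class UpdateByKeyDemo {
    private final Map<String, Map<String, String>> docsByKey = new LinkedHashMap<>();

    /** Models an update: drop the old version of the doc, then add the new one. */
    void updateDocument(String uniqueKey, Map<String, String> newFields) {
        docsByKey.remove(uniqueKey);                              // delete the old version
        docsByKey.put(uniqueKey, new LinkedHashMap<>(newFields)); // add the replacement
    }

    /** Returns true if any stored doc has exactly this value in this field. */
    boolean matches(String field, String value) {
        for (Map<String, String> doc : docsByKey.values()) {
            if (value.equals(doc.get(field))) {
                return true;
            }
        }
        return false;
    }

    /** Number of docs currently stored. */
    int size() {
        return docsByKey.size();
    }

    public static void main(String[] args) {
        UpdateByKeyDemo index = new UpdateByKeyDemo();
        index.updateDocument("1", Collections.singletonMap("Field1", "Value 1"));
        index.updateDocument("1", Collections.singletonMap("Field1", "Modified_Value 1"));
        System.out.println(index.matches("Field1", "Value 1"));
        System.out.println(index.matches("Field1", "Modified_Value 1"));
    }
}
```

Because the whole document is replaced, every field the application still needs has to be supplied again with the update, not only the modified one.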

Re: ParallelReader

2010-10-14 Thread Nilesh Vijaywargiay

Hey Erick, Sure.

*What I am trying to achieve:*
A) Update a field in Index A
B) When searching for that old field, it should be a miss.

*How I achieved it*

*Index 1*
Doc 1 - Field1, Value 1
Doc 2 - Field1, Value 1

*Index 2*
Doc 1 - Field1, Modified_Value 1
Doc 2 - EMPTY

Add index 2 before

Re: ParallelReader

2010-10-14 Thread Erick Erickson

No. And you don't even want to try... Document IDs are NOT invariant. Particularly when you delete a document and optimize an index, all the documents that come after the deleted one get new doc IDs. Trying to keep these two indexes in synch will be a nightmare. Perhaps you could explain what you'
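The renumbering described above can be illustrated with a toy model in which a document's ID is simply its position in a list. This is a sketch under that assumption, not Lucene code; `DocIdShiftDemo` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy segment model: a doc's ID is its position in the list (hypothetical, not Lucene). */
final class DocIdShiftDemo {

    /** Returns the position-based doc ID of the given key, or -1 if it is absent. */
    static int docIdOf(List<String> segment, String key) {
        return segment.indexOf(key);
    }

    /** Models delete plus optimize: the deleted slot disappears and later docs move down. */
    static List<String> deleteAndCompact(List<String> segment, String key) {
        List<String> compacted = new ArrayList<>(segment);
        compacted.remove(key); // remove(Object): drops the first matching entry
        return compacted;
    }

    public static void main(String[] args) {
        List<String> indexA = Arrays.asList("doc-a", "doc-b", "doc-c");
        List<String> afterOptimize = deleteAndCompact(indexA, "doc-b");
        System.out.println("doc-c before: " + docIdOf(indexA, "doc-c"));
        System.out.println("doc-c after:  " + docIdOf(afterOptimize, "doc-c"));
    }
}
```

A second index that pairs documents by position would now line up "doc-c" with the wrong partner unless it were rebuilt in exactly the same way.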

proposed change to CharTokenizer

2010-10-14 Thread Mike Sokolov

Background: I've been trying to enable hit highlighting of XML documents in such a way that the highlighting preserves the well-formedness of the XML. I thought I could get this to work by implementing a CharFilter that extracts text from XML (somewhat like HTMLStripCharFilter, except I am us

RE: determining the type of a term - retrieving a payload

2010-10-14 Thread Sykes, Derek

Hey Grant, Fair point on the next(). In this case I'm iterating through the terms returned from a PrefixTermEnum so I know they're in the index. The analyser I'm using looks like this:

public class TypeSavingAnalyzer extends StandardAnalyzer {
    public TypeSavingAnalyzer(Version version) {

ParallelReader

2010-10-14 Thread Nilesh Vijaywargiay

I have two indexes, A and B. Can two documents, doc1 (in index A) and doc2 (in index B), have a common field? doc1 and doc2 have the same document ID.

Re: Use of Lucene to store data from RSS feeds

2010-10-14 Thread Grant Ingersoll

On Oct 14, 2010, at 10:17 AM, app...@dsl.pipex.com wrote:
> Hello
>
> I would like to store data retrieved hourly from RSS feeds in a database or
> in Lucene so that the text can be easily
> indexed for word frequencies.
>
> I need to get the text from the title and description elements of RSS

Re: determining the type of a term - retrieving a payload

2010-10-14 Thread Grant Ingersoll

On Oct 13, 2010, at 11:37 AM, Sykes, Derek wrote:
> Hi there,
>
> I'm currently trying to work out how I can determine the type
> (string/number/date/etc) of a term. I've not seen any off the shelf way to do
> it so am trying to store a payload against each term that records the type.
>
> I'm

Use of Lucene to store data from RSS feeds

2010-10-14 Thread appy74

Hello I would like to store data retrieved hourly from RSS feeds in a database or in Lucene so that the text can be easily indexed for word frequencies. I need to get the text from the title and description elements of RSS items. Ideally, for each hourly retrieval from a given feed, I would add

Re: IndexSearch very slow after reopening the index

2010-10-14 Thread subwayne

Ok, I read the Wiki page on improving search speed and adopted some of its advice. The slow queries are simple ones. Here are some:

plaintext:guid 107.0 ms resultSet.totalHits = 1
plaintext:allianc 51.0 ms resultSet.totalHists = 1
plaintext:engin 46.0 ms resultSet.totalHits = 1
plain

Re: Storing additional Metadata with Fields

2010-10-14 Thread Christoph Hermann

On Thursday, 14 October 2010, 12:29:43, you wrote: Hello, > > is there a way to store additional metadata with fields? > > Example: > > I have the following content: > > > > This is a very > > interesting text. > > This is boring text > > > > Is there any way to include the page,x,y val

Cannot view open issues in Hudson

2010-10-14 Thread David Clarke

Hey guys, whenever I try to view open issues in Hudson, it doesn't display any information. Does anyone know why this is the case or how I could fix it? Thanks in advance -Dave Clarke

Re: IndexSearch very slow after reopening the index

2010-10-14 Thread Ian Lea

OK, so it looks like we're down to a more general "why is searching slow" question. The number of docs is not very large by Lucene standards. Work through http://wiki.apache.org/lucene-java/ImproveSearchingSpeed. If that still doesn't help, pick a slow query and post again with: . the output of

Re: Storing additional Metadata with Fields

2010-10-14 Thread Pradeep Singh

Payload!!

2010/10/14 Christoph Hermann
> Hi,
>
> is there a way to store additional metadata with fields?
>
> My Problem is as follows:
> I'm extracting extended html with tika. This extended html contains
> references
> to pages, x,y values of the text etc. I want to be able to retrieve those
>

Re: IndexSearch very slow after reopening the index

2010-10-14 Thread Pradeep Singh

Many times when you run a search for the first time it has to load all field values IF the field is being sorted on. Subsequent searches use that cache and are faster. Does that happen in your case? From your description it doesn't look like you are sorting, although this kind of performance degrad
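The first-search cost described above is the usual pattern of a lazily populated cache. The following is a minimal sketch of that pattern only; `LazyFieldCacheDemo` and its `loader` parameter are hypothetical and do not reproduce Lucene's own field cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy lazily filled per-field value cache (hypothetical, not Lucene's implementation). */
final class LazyFieldCacheDemo {
    private final Map<String, String[]> cache = new HashMap<>();
    private int loads = 0;

    /** Returns cached values for the field, running the expensive loader only on first use. */
    String[] valuesFor(String field, Function<String, String[]> loader) {
        String[] values = cache.get(field);
        if (values == null) {
            values = loader.apply(field); // slow path: taken once per field
            cache.put(field, values);
            loads++;
        }
        return values;
    }

    /** Number of times the slow path has run. */
    int loadCount() {
        return loads;
    }
}
```

In this model a cache that is discarded, for example because the object holding it is replaced, pays the loading cost again on the next sorted search.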

Storing additional Metadata with Fields

2010-10-14 Thread Christoph Hermann

Hi, is there a way to store additional metadata with fields? My problem is as follows: I'm extracting extended HTML with Tika. This extended HTML contains references to pages, x,y values of the text, etc. I want to be able to retrieve those values when text is found while searching. So when cr

Re: IndexSearch very slow after reopening the index

2010-10-14 Thread subwayne

Hi Ian, thank you for your quick response. I am running Lucene on Ubuntu 10.04, 64 bit. I switched from MMapDirectory to NIOFSDirectory without any significant changes in performance. The Lucene version running is 3.0.2. I followed your advice and opened the IndexSearcher after I added all docume

Re: IndexSearch very slow after reopening the index

2010-10-14 Thread Ian Lea

Do the fast searches that you get while the app is running use the searcher you create before you add all the docs to the index? Surely that won't see the added docs. There are general tips on speeding up searches at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed. There are some gotcha

IndexSearch very slow after reopening the index

2010-10-14 Thread subwayne

Hi, I am facing some problems using Lucene. The index I am using is constructed like this:

try {
    Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");
    Directory dir = MMapDirectory.open(index);
    IndexWriter writer = new IndexWriter(dir, analyzer, MaxFieldLength.LIMITED)

