date:20090324

Re: Can you create a RAM index from a file index

2009-03-24 Thread Anshum

Hi Ganesh, What you are talking about is loading partial index (as per requirement) into RAM. This is exactly what any other decently designed application would do. On the other hand, RAM Directory implementation just copies all of the index into RAM. Also, tmpfs is nothing but an explicit copy o

Re: Can you create a RAM index from a file index

2009-03-24 Thread Ganesh

The FileSystem index reader loads the data into RAM. I have tried it with more than 6 GB of index (sharded into 20 indexes) and the response is pretty fast. What significant gain would there be in using a RAM directory? And how would modifications made in the RAM directory be synced with the FileSystem? Regards Ganesh - Ori

Re: Can you create a RAM index from a file index

2009-03-24 Thread Otis Gospodnetic

That's indeed an alternative. Moreover, I have heard (not measured/compared myself) from people who tried both the MM and tmpfs approaches that the former has some overhead. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Anshum > To: java

Re: Index Partitioning

2009-03-24 Thread Chris Hostetter

: This is perfect, exactly what I was looking for. Thanks much Andrzej! if you code that up and it works out well, contributing your code as a Jira attachment could help it become a re-usable tool for others in the future. (a simple command line that takes the directory of the index, a value

Re: MergePolicy public but SegmentInfos package protected?

2009-03-24 Thread Chris Hostetter

: I'd rather not make SegmentInfos public; it's a large API and we do : make changes to it as we change the index format. It's also quite : internal to Lucene. : : Making your own MergePolicy/Scheduler is very much an "advanced" use : case... so I think it's acceptable to have to put it into o.a

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

2009-03-24 Thread Michael McCandless

It looks like you are reusing a Field (the f.setValue(...) calls); are you sure you're not changing a Document/Field while another thread is adding it to the index? If you can post the full code, then I can try to run it on my wikipedia dump locally. Mike Jason Rutherglen wrote: > Mike, > > It
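Mike's question points at a common hazard: one mutable Field/Document instance shared by several indexing threads. A minimal sketch of the usual remedy, assuming each thread can keep its own reusable objects, is a ThreadLocal holder. `ReusableDoc` is a hypothetical stand-in for a Document plus its reused Fields, not a Lucene class.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a Document whose Field values are reused via setValue().
class ReusableDoc {
    private String body;
    void setBody(String value) { this.body = value; }
    String getBody() { return body; }
}

class PerThreadDocs {
    // Each indexing thread gets its own ReusableDoc, so one thread's setBody()
    // can never change a document another thread is still adding.
    private static final ThreadLocal<ReusableDoc> DOC =
        ThreadLocal.withInitial(ReusableDoc::new);

    static ReusableDoc current() { return DOC.get(); }

    // Runs n threads and returns how many distinct ReusableDoc instances they used.
    static int distinctInstances(int n) {
        Set<ReusableDoc> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            final String text = "doc-" + i;
            threads[i] = new Thread(() -> {
                ReusableDoc d = current();
                d.setBody(text);      // safe: d belongs to this thread only
                seen.add(d);
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen.size();
    }
}
```

With Lucene, the same idea would mean keeping one Document and its Field objects per indexing thread rather than one shared set.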

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

2009-03-24 Thread Jason Rutherglen

Mike, It only happens when at least 1 million documents are indexed in a multithreaded fashion. Maybe I should post the code? I will try indexing without the payload field, I assume it won't fail because I indexed wikipedia before with no issues. Thanks! Jason On Tue, Mar 24, 2009 at 12:25 PM

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

2009-03-24 Thread Jason Rutherglen

Using StandardAnalyzer. It's probably the payload field? This is the code that creates the payload field: private static class SinglePayloadTokenStream extends TokenStream { private Token token = new Token(UID_TERM.text(), 0, 0); private byte[] buffer = new byte[4];
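The snippet allocates a 4-byte buffer for the UID payload. Assuming the UID is an int and that it is written big-endian (an assumption; any fixed order works if writer and reader agree), packing it into that buffer and reading it back might look like this. `UidBytes` is a hypothetical helper, not part of Jason's code.

```java
// Hypothetical helper: packs an int UID into a 4-byte payload buffer and back.
final class UidBytes {
    private UidBytes() {}

    static byte[] encode(int uid, byte[] buffer) {
        if (buffer.length < 4) throw new IllegalArgumentException("need 4 bytes");
        buffer[0] = (byte) (uid >>> 24);   // most significant byte first
        buffer[1] = (byte) (uid >>> 16);
        buffer[2] = (byte) (uid >>> 8);
        buffer[3] = (byte) uid;
        return buffer;
    }

    static int decode(byte[] buffer) {
        return ((buffer[0] & 0xFF) << 24)
             | ((buffer[1] & 0xFF) << 16)
             | ((buffer[2] & 0xFF) << 8)
             |  (buffer[3] & 0xFF);
    }
}
```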

Re: Memory Leak?

2009-03-24 Thread Paul Smith

No, I don't hit OOME if I comment out the call to getHTMLTitle. The heap behaves perfectly. I completely agree with you, the thread count goes haywire the moment I call the HTMLParser.getTitle(). I have seen a thread count of like 600 before I hit OOME (with the getTitle() call on) and

Re: Corrupt index (IndexOutOfBoundsException)

2009-03-24 Thread René Zöpnek

Thank you for your help Michael. I've solved the problem by recreating the index. The OutOfErrorException killed the thread that was responsible for index maintenance, so the index recreation failed without an error message. After recreating the index, the problem is solved. Sorry for
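René reports that an exception killed the index-maintenance thread and the failure went unreported. A small safeguard, sketched with plain java.lang.Thread APIs, is to install an uncaught-exception handler so a dying worker at least leaves a log entry. `MaintenanceThreads` and the task it runs are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Level;
import java.util.logging.Logger;

class MaintenanceThreads {
    private static final Logger LOG = Logger.getLogger("index-maintenance");

    // Starts the task on its own thread; anything thrown out of it (including
    // Errors) is logged and recorded instead of vanishing with the thread.
    static Thread startMaintenance(Runnable task, AtomicReference<Throwable> lastFailure) {
        Thread t = new Thread(task, "index-maintenance");
        t.setUncaughtExceptionHandler((thread, error) -> {
            lastFailure.set(error);
            LOG.log(Level.SEVERE, "Thread " + thread.getName() + " died", error);
        });
        t.start();
        return t;
    }

    // Runs the task, waits for it, and returns the failure (null if it succeeded).
    // The handler runs on the dying thread before it terminates, so join() sees it.
    static Throwable runAndWait(Runnable task) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread t = startMaintenance(task, failure);
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failure.get();
    }
}
```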

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

2009-03-24 Thread Michael McCandless

I was just able to index all of wikipedia, using StandardAnalyzer, with assertions enabled, without hitting that exception. Which analyzer are you using (besides your payload field)? Mike Michael McCandless wrote: > H. > > Jason is this easily/compactly repeated? EG, try to index the N doc

Re: MergePolicy public but SegmentInfos package protected?

2009-03-24 Thread Michael McCandless

I'd rather not make SegmentInfos public; it's a large API and we do make changes to it as we change the index format. It's also quite internal to Lucene. Making your own MergePolicy/Scheduler is very much an "advanced" use case... so I think it's acceptable to have to put it into o.a.l.index pack

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

2009-03-24 Thread Michael McCandless

H. Jason is this easily/compactly repeated? EG, try to index the N docs before that one. If you remove the SinglePayloadTokenStream field, does the exception still happen? Mike Jason Rutherglen wrote: > While indexing using > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker

MergePolicy public but SegmentInfos package protected?

2009-03-24 Thread Jason Rutherglen

I'm overriding MergePolicy which is public, however SegmentInfos is package protected which means the MergePolicy subclass must be in the org.apache.lucene.index package. Can we make SegmentInfos public?

Re: Memory Leak?

2009-03-24 Thread Michael McCandless

Actually, I was hoping you could try leaving the getHTML calls in, but increase the heap size of your Tomcat instance. Ie, to be sure there really is a leak vs you're just not giving the JRE enough memory. I do like your hypothesis, but looking at HTMLParser it seems like the thread should exit a
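Mike's test is to raise the Tomcat heap and see whether the OOME goes away. When changing the -Xmx setting it helps to confirm what the JVM actually got, and since the reported thread count climbs into the hundreds, to watch live threads too. A minimal stdlib sketch; where the report is printed is up to the application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class JvmDiagnostics {
    // Maximum heap the JVM will attempt to use (reflects -Xmx), in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Live threads in the JVM; a value that keeps rising across requests
    // suggests threads are being spawned and never exiting.
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    static String report() {
        return "maxHeap=" + maxHeapMb() + "MB liveThreads=" + liveThreads();
    }
}
```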

Re: Memory Leak?

2009-03-24 Thread Chetan Shah

Highly appreciate your replies Michael. No, I don't hit OOME if I comment out the call to getHTMLTitle. The heap behaves perfectly. I completely agree with you, the thread count goes haywire the moment I call the HTMLParser.getTitle(). I have seen a thread count of like 600 before I hit OOME

Re: Memory Leak?

2009-03-24 Thread Michael McCandless

Odd. I don't know of any memory leaks w/ the demo HTMLParser, hmm though it's doing some fairly scary stuff in its getReader() method. EG it spawns a new thread every time you run it. And, it's parsing the entire HTML document even though you only want the title. You may want to switch to better
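To illustrate Mike's point that only the title is needed, here is a sketch that pulls the `<title>` out with a regular expression on the calling thread: no helper thread and no full parse. This is a swapped-in technique, not Lucene's demo HTMLParser, and a regex is not a robust HTML parser (it ignores entities, comments, and malformed markup).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleExtractor {
    // Case-insensitive, and DOTALL so a title split across lines still matches.
    private static final Pattern TITLE =
        Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the title with whitespace collapsed, or "" if none is found.
    // Runs entirely on the caller's thread; nothing is spawned.
    static String title(CharSequence html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) return "";
        return m.group(1).replaceAll("\\s+", " ").trim();
    }
}
```

For ten results per page this means ten cheap scans, each stopping at the first `</title>`, instead of ten full parses.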

Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

2009-03-24 Thread Jason Rutherglen

While indexing using contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker. The assertion error is from TermsHashPerField.comparePostings(RawPostingList p1, RawPostingList p2). A Payload is added to the document representing a UID. Only 1-2 out of 1 million documents indexed generates th

Re: Memory Leak?

2009-03-24 Thread Chetan Shah

After some more research I discovered that the following code snippet seems to be the culprit. I have to call this to get the "title" of the indexed HTML page, and it is called 10 times because I display 10 results on a page. Any suggestions on how to achieve this without the OOME issue?

Streaming results of analysis to shards ... possible?

2009-03-24 Thread Cass Costello

Hello all, Our application involves a high index write rate - anywhere from a few dozen to many thousands of docs per sec. The write rate is frequently higher than the read rate (though not always), and our index must be as fresh as possible (we'd like search results to be no more than a couple o
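Whatever mechanism ends up streaming the analyzed documents, each one has to land on a definite shard. One common rule, shown here as an assumption rather than anything described in the post, is to hash a stable document id modulo the shard count, so the same id always routes to the same shard (useful for later updates or deletes). `ShardRouter` is hypothetical.

```java
class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) throw new IllegalArgumentException("shardCount must be positive");
        this.shardCount = shardCount;
    }

    // Same id -> same shard, every time. floorMod keeps the result
    // non-negative even when hashCode() is negative.
    int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), shardCount);
    }
}
```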

Re: Term level boosting

2009-03-24 Thread Koji Sekiguchi

Seid Mohammed wrote: ok, but I need to know how to proceed with it. I mean how to include it in my application. Many thanks Seid M You may want to look at the following articles: http://lucene.jugem.jp/?eid=133 http://lucene.jugem.jp/?eid=134 articles are in Japanese, but ignore them. :) Pro

question about grouping text

2009-03-24 Thread MFM

I have been able to successfully index and search text from structured documents like PDF and MS Word. I am having a real hard time trying to figure out how to group the index strings together e.g. if my document had a question and answer in a table, the search will produce the text with the quest

Re: "People you might know" ( a la Facebook) - slightly offtopic

2009-03-24 Thread Karl Wettin

There is even an old thread about this on the Mahout-users list: http://markmail.org/message/ludu5hjfczuvgk3n On 17 Mar 2009, at 15.17, Grant Ingersoll wrote: Have a look at the Lucene sister project: Mahout: http://lucene.apache.org/mahout . In there is the Taste collaborative filtering project
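Taste does far more than this, but the basic "people you might know" intuition can be sketched as a friends-of-friends count: rank non-friends by how many friends they share with the user. This common-neighbors heuristic is an illustration only, not Mahout's or Facebook's algorithm; the graph representation is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FriendSuggestions {
    // graph: user -> set of friends (assumed symmetric).
    // Returns up to `limit` candidates ranked by mutual-friend count,
    // ties broken by name so the order is deterministic.
    static List<String> suggest(Map<String, Set<String>> graph, String user, int limit) {
        Set<String> friends = graph.getOrDefault(user, Collections.emptySet());
        Map<String, Integer> mutual = new HashMap<>();
        for (String friend : friends) {
            for (String candidate : graph.getOrDefault(friend, Collections.emptySet())) {
                if (candidate.equals(user) || friends.contains(candidate)) continue;
                mutual.merge(candidate, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(mutual.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(mutual.get(b), mutual.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```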

Re: Term level boosting

2009-03-24 Thread Seid Mohammed

ok, but I need to know how to proceed with it. I mean how to include it in my application. Many thanks Seid M On 3/24/09, Koji Sekiguchi wrote: > Seid Mohammed wrote: >> Hi All >> I want my lucene to index documents and making some terms to have more >> boost value. >> so, if I index the document "

Re: Term level boosting

2009-03-24 Thread Koji Sekiguchi

Seid Mohammed wrote: Hi All, I want Lucene to index documents and give some terms a higher boost value. So, if I index the document "The quick fox jumps over the lazy dog", I want the terms fox and dog to have a greater boost value. How can I do that? Thanks a lot seid M How about

Term level boosting

2009-03-24 Thread Seid Mohammed

Hi All, I want Lucene to index documents and give some terms a higher boost value. So, if I index the document "The quick fox jumps over the lazy dog", I want the terms fox and dog to have a greater boost value. How can I do that? Thanks a lot seid M -- "RABI ZIDNI ILMA" ---
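Koji's full answer is truncated here, so this is not his method. One approach sometimes used for per-term boosts is to store a boost value with each token as a payload and read it back with a payload-aware query at search time. The sketch covers only the indexing-side idea in plain Java: look up a boost per term (fox and dog get 2.0f, everything else 1.0f; these values are assumptions) and encode it as the 4 bytes a payload would carry. Lowercasing and splitting on whitespace is a simplification of what an Analyzer does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TermBoosts {
    // Hypothetical per-term boosts; terms not listed default to 1.0f.
    static final Map<String, Float> BOOSTS = new HashMap<>();
    static {
        BOOSTS.put("fox", 2.0f);
        BOOSTS.put("dog", 2.0f);
    }

    // Simplified analysis: lowercase and split on whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    static float boostFor(String term) {
        return BOOSTS.getOrDefault(term, 1.0f);
    }

    // Encodes a float boost as 4 big-endian bytes, i.e. what a payload would carry.
    static byte[] encode(float boost) {
        int bits = Float.floatToIntBits(boost);
        return new byte[] {
            (byte) (bits >>> 24), (byte) (bits >>> 16), (byte) (bits >>> 8), (byte) bits
        };
    }

    static float decode(byte[] b) {
        int bits = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                 | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }
}
```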

Re: Corrupt index (IndexOutOfBoundsException)

2009-03-24 Thread Michael McCandless

When I run checkIndex on your index, I see a new exception: org.apache.lucene.index.CorruptIndexException: Incompatible format version: 119865344 expected 1 or lower at org.apache.lucene.index.FieldsReader.(FieldsReader.java:116) at org.apache.lucene.index.SegmentReader.initialize
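A quick piece of arithmetic makes the reported value concrete. Lucene reads such headers as big-endian ints, and 119865344 is 0x07250000, i.e. the four bytes 07 25 00 00, nowhere near a valid format of 1 or lower. That suggests the bytes read as the header are not a format number at all. A stdlib check of the conversion:

```java
class FormatVersionBytes {
    // Splits an int into its four bytes in big-endian order, as hex pairs.
    static String hexBytes(int value) {
        return String.format("%02x %02x %02x %02x",
            (value >>> 24) & 0xFF, (value >>> 16) & 0xFF, (value >>> 8) & 0xFF, value & 0xFF);
    }
}
```

`FormatVersionBytes.hexBytes(119865344)` returns `"07 25 00 00"`.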

Re: How to know the matched field?

2009-03-24 Thread Paul Libbrecht

Here's my first approach but I note that, typically, I have fields (which are not stored) which may be the matching field but still not be the one I want to return. Typically, I have a field "names in all languages along the standard-analyzer" which is not the one I want to "see as matched".

Re: Corrupt index (IndexOutOfBoundsException)

2009-03-24 Thread Michael McCandless

Instead of ignoring the exceptions in your finally clause, can you log them? It could be something interesting is happening in there... I'll have a look at the index. Mike "René Zöpnek" wrote: > Thanks for your answer, Mike. > > Unfortunately I have no direct access to the server with the corr
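Mike asks for the exceptions currently swallowed in the finally clause to be logged. A sketch of that pattern, assuming the resources being cleaned up are java.io.Closeable (an object with a compatible close() method can be adapted with a method reference). `CloseAndLog` is a hypothetical helper.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

class CloseAndLog {
    private static final Logger LOG = Logger.getLogger("index-maintenance");

    // Closes each resource in turn. A failure is logged (not swallowed) and does
    // not stop the remaining resources from being closed. Returns the failure count.
    static int closeAll(Closeable... resources) {
        int failures = 0;
        for (Closeable c : resources) {
            if (c == null) continue;
            try {
                c.close();
            } catch (IOException | RuntimeException e) {
                failures++;
                LOG.log(Level.WARNING, "close() failed for " + c, e);
            }
        }
        return failures;
    }
}
```

Calling `CloseAndLog.closeAll(...)` from the finally block keeps cleanup best-effort while leaving a trace of anything unusual.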

Re: Can you create a RAM index from a file index

2009-03-24 Thread Anshum

Hi Paul, Going by what you've conveyed here, I'd assume that you have more than some data. You could either go ahead with Ian's way, which is the suggested one (as far as the Lucene implementation is concerned), but it wouldn't be possible if your index is greater than 2 Gigs and you are not running the 6
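Anshum's caveat is whether the index fits in memory at all. Taking Ian's estimate that a RAM index is about the same size as the file-based one, a quick pre-check compares the on-disk size with the JVM's maximum heap before attempting a full copy into RAM. The sketch assumes the index is a flat directory of files, and the 2x headroom factor is an arbitrary assumption, not a Lucene rule.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class RamIndexFit {
    // Total size in bytes of the regular files directly inside dir.
    static long indexBytes(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try { return Files.size(p); }
                            catch (IOException e) { throw new UncheckedIOException(e); }
                        })
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if indexBytes, scaled by the headroom factor, fits in maxHeapBytes.
    static boolean fitsInHeap(long indexBytes, long maxHeapBytes, double headroomFactor) {
        return indexBytes * headroomFactor <= maxHeapBytes;
    }

    static boolean fitsInHeap(Path dir) {
        return fitsInHeap(indexBytes(dir), Runtime.getRuntime().maxMemory(), 2.0);
    }
}
```

For example, a 1 GB index passes the 2x check against a 4 GB heap, while a 3 GB index does not.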

Re: Can you create a RAM index from a file index

2009-03-24 Thread Paul Taylor

Ian Lea wrote: Hi You can load an existing index into a RAMDirectory using one of the constructors that takes an existing index. I believe that a RAM index will be the same size as a file based index. Of course I was looking at IndexSearcher but the constructor is for RAMDirectory MMapDir

Re: Can you create a RAM index from a file index

2009-03-24 Thread Ian Lea

Hi You can load an existing index into a RAMDirectory using one of the constructors that takes an existing index. I believe that a RAM index will be the same size as a file based index. MMapDirectory is another possibility. -- Ian. On Tue, Mar 24, 2009 at 8:42 AM, Paul Taylor wrote: > Hi

Can you create a RAM index from a file index

2009-03-24 Thread Paul Taylor

Hi, I've built some file-based indexes based on data in a database, and it took quite some time. I am interested in trying to use RAM-based indexes instead of file-based indexes to compare search performance, but it's going to take some time to rebuild the index from the original database, isn't it
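For the comparison Paul describes, the same queries should be run against both indexes and timed the same way. A minimal stdlib harness, assuming each search is wrapped as a Runnable (building the RAM-based and file-based searchers is left out). The untimed warm-up runs are there because early runs include JIT compilation and, for the file-based index, cold OS caches.

```java
class SearchTimer {
    // Runs the task `warmup` times untimed, then returns the mean duration
    // of `runs` timed executions in nanoseconds.
    static double meanNanos(Runnable search, int warmup, int runs) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        for (int i = 0; i < warmup; i++) search.run();
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) search.run();
        return (System.nanoTime() - start) / (double) runs;
    }
}
```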

Re: Scores between words. Boosting?

2009-03-24 Thread Grant Ingersoll

Do you have any info that helps you narrow down how many to choose, like some type of ranking of the synonyms? I guess I would start smaller, say maybe 3, and then evaluate your results with different numbers. On Mar 22, 2009, at 2:40 PM, liat oren wrote: Ok, thanks. I will look how to u
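Grant suggests ranking the synonyms, starting with a small number such as 3, and evaluating from there. A sketch of the selection step, assuming each synonym already has a score (how that score is obtained is left open):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SynonymPicker {
    // Returns the k highest-scoring synonyms, best first; ties broken alphabetically.
    static List<String> topK(Map<String, Double> scores, int k) {
        List<String> words = new ArrayList<>(scores.keySet());
        words.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(words.subList(0, Math.min(k, words.size())));
    }
}
```

Changing k and re-running the evaluation is then a one-parameter experiment.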

Corrupt index (IndexOutOfBoundsException)

2009-03-24 Thread René Zöpnek

Thanks for your answer, Mike. Unfortunately I have no direct access to the server with the corrupt index. So changing the creation process of the index is not possible. I've uploaded the index to http://drop.io/hlu53sl (9 MB). Here is the code for creating the index: public static void crea
