date:20090219

Re: stream of events never to know when it ends? how to index such things & search

2009-02-19 Thread Christian Brennsteiner

hi erick, ram and fsdir: we will hold every day of the 30 days (in the past) in ram. we will start a separate process every 1 or 2 days which holds 1-2 days. i think that FSDir might be too slow? never tested that. my goal is to search 30 days with indexes of about 300-700 M / day -> 21 G (max) w
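The 21 G figure quoted above is the daily index size multiplied by the retention window. A minimal sketch of that arithmetic, assuming "M" means megabytes and 1 G is taken as 1000 M; the class and method names are illustrative only and are not part of Lucene:

```java
// Rough sizing sketch: names and units are assumptions, not from the thread.
final class IndexSizingSketch {

    // Total index size in megabytes for a retention window of `days`,
    // given an index of `megabytesPerDay` produced each day.
    static long totalMegabytes(int days, long megabytesPerDay) {
        return (long) days * megabytesPerDay;
    }

    public static void main(String[] args) {
        int days = 30;
        // Lower and upper bounds taken from the 300-700 M / day estimate.
        System.out.println("min: " + totalMegabytes(days, 300) + " MB"); // min: 9000 MB
        System.out.println("max: " + totalMegabytes(days, 700) + " MB"); // max: 21000 MB
    }
}
```

So the 30-day window needs somewhere between roughly 9 G and 21 G, which is what any RAM-versus-disk decision has to be measured against.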

Re: Indexer.Java problem

2009-02-19 Thread Michael McCandless

The early access version of LIA2 (accessible at http://www.manning.com/hatcher3/) has updated this example to work with recent Lucene releases (though it's still using deprecated APIs -- that'll be fixed before the book is released). Oh actually the first chapter is a free PDF on Manning'

RE: 2.3.2 -> 2.4.0 StandardTokenizer issue

2009-02-19 Thread Philip Puffinburger

Actually, WhitespaceTokenizer won't work. Too many person names and it won't do anything with punctuation. Something had to have changed in StandardTokenizer, and we need some of the 2.4 fixes/features, so we are kind of stuck. -Original Message- From: Philip Puffinburger [mailto:ppuf

Re: Indexer.Java problem

2009-02-19 Thread Erick Erickson

Unfortunately, not really. I haven't tried to get the LIA examples working for years... The various release notes on the Wiki, especially the 1.9 and 2.0 release notes are probably the best place to start. Best Erick On Thu, Feb 19, 2009 at 11:13 AM, Seid Mohammed wrote: > I better modify it,

Re: Filters - at what stage are they applied?

2009-02-19 Thread Yonik Seeley

On Thu, Feb 19, 2009 at 6:53 AM, Joel Halbert wrote: > By way of clarification, when a filter is used with a search query, is > the filter applied only to documents that matched the search query or is > it applied to all documents in the index before the query is executed? Filters are currently a

Re: Indexer.Java problem

2009-02-19 Thread Seid Mohammed

I'd better modify it, but can you give just a hint on how to modify it? thanks a lot Seid M On 2/19/09, Erick Erickson wrote: > LIA was written for a pretty early version of Lucene, if you're using a recent release you need to modify the code to be compliant with that version. > > Or install an ol

Re: Indexer.Java problem

2009-02-19 Thread Erick Erickson

LIA was written for a pretty early version of Lucene, if you're using a recent release you need to modify the code to be compliant with that version. Or install an older release of Lucene. Erick On Thu, Feb 19, 2009 at 10:41 AM, Seid Mohammed wrote: > I am using netbeans on windows to test luc

Indexer.Java problem

2009-02-19 Thread Seid Mohammed

I am using netbeans on windows to test lucene. I have added all the lib files from the /lib directory to my project library. Toward the end of the Indexer.java program, it states that the Field.Text method is not available. The error message is as follows ---

Re: stream of events never to know when it ends? how to index such things & search

2009-02-19 Thread Erick Erickson

My indexes have been much more static than yours, so I'll defer indexing event logging recommendations to others. But as I remember, the issue of indexing log files has been discussed on the list before, a search of logfiles or log files in the searchable archive might be useful. Your problem is a

Re: lucene index details

2009-02-19 Thread Erick Erickson

You have to look at Analyzers a bit here because that's what controls what is in the index. The simplest case is a WhitespaceAnalyzer that breaks the input stream up into tokens on any whitespace. So, in your example and using a WhitespaceAnalyzer, you'd get the following tokens: lucene, is, used,
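The whitespace splitting described above can be imitated in plain Java. This is only a sketch of the idea, not Lucene's WhitespaceAnalyzer itself; the class name is hypothetical and the sample sentence is assumed from the question in the thread:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration of splitting on whitespace; it does not call Lucene.
final class WhitespaceSplitSketch {

    // Break the input into tokens on any run of whitespace.
    // No lowercasing, stemming, or punctuation handling is applied.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // Prints: [lucene, is, used, to, index, files]
        System.out.println(tokenize(" lucene is used to index files"));
    }
}
```

Each token is kept exactly as typed, which is why the choice of analyzer decides what can later be matched at search time.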

Re: searching a sentence or paragraph

2009-02-19 Thread Seid Mohammed

Thanks Nada, it again works perfectly seid m. On 2/19/09, Nada Mimouni wrote: > > > > You need to create a TermQuery or PhraseQuery with terms in your query > depending on what result you need exactly. > > To create PhraseQuery, try the built-in phrase processing with double > quotes, e.g. > "th

Re: what's the best practice for getting "next page" of hits?

2009-02-19 Thread Erick Erickson

The best practice is, well, "It Depends" (tm). First off, I wouldn't do any caching of results unless and until you had a reasonable certainty that you had performance issues, so would by my first choice. And if you *did* start to see performance issues, I'd look first at why the queries were expe

class used to create term document matrix in lucene

2009-02-19 Thread nitin gopi

Hi all Can anybody tell me which class and which of its methods are used to create a term-document matrix in lucene? Regards, Nitin

Re: Index Structure

2009-02-19 Thread Koji Sekiguchi

There is no additional setting for me... Koji Seid Mohammed wrote: I have tried Amharic fonts, it displays square-like characters, maybe there is a kind of setting for it? Seid M On 2/19/09, Koji Sekiguchi wrote: Seid Mohammed wrote: great, I have got it does luke support unicode? I

Re: Lucene search performance on Sun UltraSparc T2 (T5120) servers

2009-02-19 Thread Glen Newton

I will look a little deeper into the information you supplied and comment, but will suggest this on my initial cursory review: 1 - You have 32GB of memory. Using the 64bit VM, try using a 16GB or 24GB heap; 2 - Turn-on huge pages: -XX:+UseLargePages -XX:LargePageSizeInBytes=256m 3 - Tu

Re: Phrase indexing and searching with Lucene

2009-02-19 Thread Erick Erickson

It looks to me like what you're trying to do is akin to document similarity, which I haven't had to delve into. But it's been discussed on the user list a few times, so perhaps your best bet would be to search the mail archives for that topic. Best Erick On Thu, Feb 19, 2009 at 3:14 AM, Nada Mimo

Re: Index Structure

2009-02-19 Thread Seid Mohammed

I have tried Amharic fonts, it displays square-like characters, maybe there is a kind of setting for it? Seid M On 2/19/09, Koji Sekiguchi wrote: > Seid Mohammed wrote: >> great, >> I have got it >> does luke support unicode? I am trying lucene in a non-english language >> >> > Of course. I can

Re: Index Structure

2009-02-19 Thread Koji Sekiguchi

Seid Mohammed wrote: great, I have got it does luke support unicode? I am trying lucene in a non-english language Of course. I can see Japanese terms without problems. Koji

RE: searching a sentence or paragraph

2009-02-19 Thread Nada Mimouni

You need to create a TermQuery or PhraseQuery with terms in your query depending on what result you need exactly. To create PhraseQuery, try the built-in phrase processing with double quotes, e.g. "this is a phrase". See the Term section at http://lucene.apache.org/java/2_4_0/queryparsersyn

searching a sentence or paragraph

2009-02-19 Thread Seid Mohammed

from a lucene index, how can we search for a sentence or a paragraph which satisfies our query? thanks a lot seid m -- "RABI ZIDNI ILMA"

Re: stream of events never to know when it ends? how to index such things & search

2009-02-19 Thread Christian Brennsteiner

hi erick, nr of events are 107/sec in average with 400/sec peak and 20/sec low. between searchable should be less than 20 minutes. we are planning to index IN RAM only for a duration of one day MAX. per lucene process on the operating system. currently we need 500 M RAM for indexing one day (just

Re: "Near" force in query server side?

2009-02-19 Thread Grant Ingersoll

You will likely need to create n-grams from the user query and from that construct a sloppy PhraseQuery. There is an n-gram Filter in the contrib/analysis package (I think it is called the ShingleFilter) On Feb 19, 2009, at 7:19 AM, Ian Vink wrote: Once my app gets the query string from t
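The n-gram idea above can be sketched without Lucene: split the user query into word shingles, then wrap each shingle in the query parser's proximity syntax ("..."~N). The class and method names below are hypothetical, the code does not use ShingleFilter, and the slop of 5 is taken from the question rather than from any recommendation:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch: builds word n-grams and proximity phrase strings.
final class ProximityQuerySketch {

    // Return every run of `size` consecutive words from the query.
    static List<String> shingles(String userQuery, int size) {
        List<String> result = new ArrayList<String>();
        String trimmed = userQuery.trim();
        if (trimmed.isEmpty() || size < 1) {
            return result;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + size <= words.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = i; j < i + size; j++) {
                if (j > i) {
                    sb.append(' ');
                }
                sb.append(words[j]);
            }
            result.add(sb.toString());
        }
        return result;
    }

    // Format words as a quoted phrase with a slop suffix, e.g. "quick brown"~5.
    // Embedded double quotes are dropped so the phrase stays well-formed.
    static String proximityPhrase(String words, int slop) {
        return "\"" + words.replace("\"", "") + "\"~" + slop;
    }

    public static void main(String[] args) {
        for (String shingle : shingles("quick brown fox", 2)) {
            // Prints "quick brown"~5 and then "brown fox"~5
            System.out.println(proximityPhrase(shingle, 5));
        }
    }
}
```

The resulting strings could then be handed to a query parser or turned into PhraseQuery objects with the same slop. How the per-shingle clauses are combined (all required, or any) is a design choice the thread does not settle.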

"Near" force in query server side?

2009-02-19 Thread Ian Vink

Once my app gets the query string from the user, is there a way to tell the query engine to only return documents where these words are at most 5 words apart? I can't tell the user to change their query, I have to do it server side. If so, do I have to add anything to my index to let Lucene know abo

Re: Analyse TermQuery and PhraseQuery

2009-02-19 Thread Grant Ingersoll

On Feb 19, 2009, at 5:54 AM, Nada Mimouni wrote: Hello, String ws = " "; String query = "The"+ws+"president"+ws+"of"+ws+"the"+ws+"USA"+ws+"is"+ws+"\"Barak Obama\""; Query q = QueryParser.parse(query, new StandardAnalyzer()); Query q = QueryParser.parse(query, new WhitespaceAnalyzer());

Re: Index Structure

2009-02-19 Thread Seid Mohammed

great, I have got it does luke support unicode? I am trying lucene in a non-english language thanks a lot seid m On 2/19/09, Nada Mimouni wrote: > > Hello, > > When indexing Lucene generates terms from your original text. > > To see the content and the structure of the index, use "Luke" which is

Filters - at what stage are they applied?

2009-02-19 Thread Joel Halbert

Hi, By way of clarification, when a filter is used with a search query, is the filter applied only to documents that matched the search query or is it applied to all documents in the index before the query is executed? Rgs, Joel

RE: Index Structure

2009-02-19 Thread Nada Mimouni

Hello, When indexing Lucene generates terms from your original text. To see the content and the structure of the index, use "Luke" which is a Lucene index toolbox. You can download it here : http://www.getopt.org/luke/ There is a detailed description of this tool (with pretty screen-shots) in

lucene index details

2009-02-19 Thread Seid Mohammed

I am new to lucene, and reading the lucene in action book. sometimes, i understand better when someone tells me an answer than from a book. my question is: when indexing, what is lucene actually doing? if i have a file called test.txt with contents " lucen is used to index files" and i apply lucene indexing, wh

Index Structure

2009-02-19 Thread Seid Mohammed

I am new to lucene, and reading the lucene in action book. sometimes, i understand better when someone tells me an answer than from a book. my question is: when indexing, what is lucene actually doing? if i have a file called test.txt with contents " lucen is used to index files" and i apply lucene indexing, wh

Analyse TermQuery and PhraseQuery

2009-02-19 Thread Nada Mimouni

Hello, String ws = " "; String query = "The"+ws+"president"+ws+"of"+ws+"the"+ws+"USA"+ws+"is"+ws+"\"Barak Obama\""; Query q = QueryParser.parse(query, new StandardAnalyzer()); Query q = QueryParser.parse(query, new WhitespaceAnalyzer()); In this example: - could we create a query in such a fo

Incremental search, CachingWrapperFilter and BooleanFilter

2009-02-19 Thread Konstantyn Smirnov

Hi all I implemented an autocomplete functionality, which is pretty classical: a user types in some words in an input field, and sees a list of matches in a drop-down. I've done it using filters (BooleanFilter, and TermsFilter + PrefixFilter), and it's working against an index (loaded in RAM) w

Re: what's the best practice for getting "next page" of hits?

2009-02-19 Thread Joel Halbert

Out of interest, if the index is entirely in memory (using a RAMDir) is there any significant difference in performance between options (a) and (b) as outlined below? Rgs, Joel -Original Message- From: Ganesh Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org, rolaren..

RE: Phrase indexing and searching with Lucene

2009-02-19 Thread Nada Mimouni

Hello, Thank you Erick for this detailed answer, that makes things clearer in my mind. >I'm still not clear why the built-in phrase query syntax won't work. I have programmed a set of java classes (I use Lucene classes) to index and search into a collection of documents for a set of queries. T

33 matches