Re: Serving remote lucene client - RMI vs HTTP

2007-07-16 Thread Ryan McKinley
customizable Solr really is (rather the ease with which we can do it). Also Solr doesn't support queryFilter out of the box (Hossman: there's nothing to stop a Solr request handler from using QueryFilters if they want). How much extra work is it? Out of the box, Solr supports query filters.

Re: two questions about NumberTools

2007-07-16 Thread Mohammad Norouzi
Thanks, Dima. The first link is very nice, and I put some comments on it if you take a look again, but it has no decode method. Anyway, I decided to use the Solr solution. Thanks again :) On 7/16/07, Dima May <[EMAIL PROTECTED]> wrote: Mohammad, see my 2 cents below. Good luck. D On 7/16/07,

Re: Serving remote lucene client - RMI vs HTTP

2007-07-16 Thread kumarlimbu
Thank you, everyone, for your responses. The size of our index is around 10GB. Our queries usually take similar response times. The only exception is when we are updating our index and just after the index has been switched from the old one to a new one. Yesterday, as Grant and Hossman pointed us to Solr and we

Re: index U.K. U.S. U.N. U.V.

2007-07-16 Thread crspan
Are we sure about KeywordAnalyzer here? It is supposed to tokenize the entire stream as a single token (useful for data like zip codes, ids, and some product names). In the scenario we are discussing, U.S. is just a token within the text and we still would like to leverage from Standard

Re: index U.K. U.S. U.N. U.V.

2007-07-16 Thread Otis Gospodnetic
Use KeywordAnalyzer to leave "U.S." as-is and index it as-is. Otis -- Lucene Consulting -- http://lucene-consulting.com/ - Original Message From: crspan <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Saturday, July 14, 2007 5:18:59 PM Subject: index U.K. U.S. U.N. U.V. Would

Re: Scaling Lucene to 500 million documents - preferred architecture

2007-07-16 Thread Otis Gospodnetic
Hi Murali (redirecting to the more appropriate java-user list) Sounds doable. I'd go with FSDirectory (or even its memory mapped cousin) instead of RAMDirectory - let the OS cache Lucene indices. I'm looking at a search cluster with 3 times that many machines (but not as high-end as your 8 CP

Failed to load Main-Class manifest attribute

2007-07-16 Thread psimoneschi
Hi, I'm getting an error message when trying to create my Lucene index files. The error is as follows: Failed to load Main-Class manifest attribute from e:\lib\jakarta-regexp-1.3\jakarta-regexp-1.3.jar When I clear the error message by clicking OK it seems that the index finishes running prope

Re: What should I download to use "RegexQuery"

2007-07-16 Thread Mark Miller
It's in a contrib jar. Download the release: http://www.apache.org/dyn/closer.cgi/lucene/java/ then look in the contrib folder for a folder that has 'regex' in it for the correct jar. - Mark mhzmark wrote: Hi, everybody. I am new to Lucene. I've downloaded lucene-demos-2.2.0.jar

What should I download to use "RegexQuery"

2007-07-16 Thread mhzmark
Hi, everybody. I am new to Lucene. I've downloaded lucene-demos-2.2.0.jar and lucene-core-2.2.0.jar. Then I added these files to my CLASSPATH. And now I can successfully import classes like: import org.apache.lucene.search.IndexSearcher; import org.apache.lucene.search.Query;

RE: search through all fields

2007-07-16 Thread Renaud Waldura
Often documents can be divided into "metadata" and "contents" sections. Say you're indexing Web pages: you could index them with the HEAD data all in one field and the BODY content in another, while also creating separate fields for every HEAD field, e.g. TITLE etc. At search time, you rewrite every qu

Re: Serving remote lucene client - RMI vs HTTP

2007-07-16 Thread Erick Erickson
Another question would be what queries take the longest. Are your response times pretty constant on a per-query basis or are there outliers that could perhaps point to a different solution? Finally, what is the size of your index? The total number of documents is certainly useful, but so is the f

Re: two questions about NumberTools

2007-07-16 Thread Dima May
Mohammad, see my 2 cents below. Good luck. D On 7/16/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote: Hello, I have a problem with range queries. For example, I have queries like "field:[1 TO 25]" or "field:[1.1 TO 11.25]"; currently these queries do not work. field:[20 TO 25] works fine but when

Re: Does Index have a Tokenizer Built into it

2007-07-16 Thread John Paul Sondag
Some of the data sets that I will be using have about 2 TB of data (90 million web pages). I would like the snippet I generate to include the words that are being queried, so I don't want to simply store the first 2 or 3 lines. I have looked at the HighlighterTest and I do believe that i

RE: Token offset values for custom Tokenizer

2007-07-16 Thread Ard Schrijvers
Hello, The issue is about lucene 1.9. Can you test it with lucene 2.2? Perhaps the issue is already addressed and solved... Regards Ard > > Thank you for the reply Ard, > > The tokens exist in the index and are returned accurately, except for > the offsets. In this case I am not dealing with

Re: Token offset values for custom Tokenizer

2007-07-16 Thread Shahan Khatchadourian
The issue continues to exist with nightly 146 from Jul 10, 2007. http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/146/ Ard Schrijvers wrote: Hello, The issue is about lucene 1.9. Can you test it with lucene 2.2? Perhaps the issue is already addressed and solved... Regards Ard

Re: Token offset values for custom Tokenizer

2007-07-16 Thread Shahan Khatchadourian
Thank you for the reply Ard, The tokens exist in the index and are returned accurately, except for the offsets. In this case I am not dealing with the positions, so the termvector is specified as using 'with_offsets'. I have left the term position increment as its default. Looking at the exist

RE: Serving remote lucene client - RMI vs HTTP

2007-07-16 Thread Ard Schrijvers
Hello, > Hi Everyone, > > Thank you all for your replies. > > And reply to your questions Grant: > We have more than 3 million documents in our index. > We get more than 150,000 searches (queries) per day. We > expect this number to go > up. Just curious, but suppose those 150.000 searches are don

RE: Does Index have a Tokenizer Built into it

2007-07-16 Thread Ard Schrijvers
Hello, > Ard, > > I do have access to the URLs of the documents, but because I > will be making > short snippets for many pages (suppose it had about 20 hits > per page and I > need to make Snippets for each of them) I was worried it would be > inefficient to open each "hit" tokenize it and th

RE: Token offset values for custom Tokenizer

2007-07-16 Thread Ard Schrijvers
Hello, > Hi, > I am storing custom values in the Tokens provided by a Tokenizer but > when retrieving them from the index the values don't match. What do you mean by retrieving? Do you mean retrieving terms, or do you mean doing a search with words you know that should be in, but you do not fi

two questions about NumberTools

2007-07-16 Thread Mohammad Norouzi
Hello, I have a problem with range queries. For example, I have queries like "field:[1 TO 25]" or "field:[1.1 TO 11.25]"; currently these queries do not work. field:[20 TO 25] works fine, but when the two limits of the range have different numbers of digits the query won't work. So the solution is NumberToo
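
The behaviour described in this message is consistent with range bounds being compared as strings rather than as numbers. The sketch below uses only the Java standard library to illustrate that effect, and shows how a fixed-width encoding avoids it. It does not call Lucene or NumberTools: the class name PaddedRange, the helpers pad and inRange, and the width of 4 are all hypothetical, chosen only for illustration. It also covers non-negative integers only, so a decimal range like [1.1 TO 11.25] would need a different encoding.

```java
public class PaddedRange {

    // Hypothetical helper: left-pad a non-negative integer with zeros
    // so that every encoded value has the same number of characters.
    public static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in width " + width);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Inclusive range check using plain string comparison, which is the
    // ordering that matters when numbers are indexed as text terms.
    public static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "3" sorts after "25" as a string, so it falls outside [1 TO 25].
        System.out.println(inRange("3", "1", "25"));

        // Padded to the same width: string order now agrees with numeric order.
        System.out.println(inRange(pad(3, 4), pad(1, 4), pad(25, 4)));
    }
}
```

A range such as [20 TO 25] works without padding because both bounds, and every value between them, have the same number of digits. To apply the same idea to an index, the values would have to be encoded with the same scheme at index time and at query time.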