Nutch - Microsoft Search Server integration

2008-01-14 Thread Lukas Vlcek
Hi, is it possible to integrate Nutch into MS Search Server via the OpenSearch API? (MS Search Server supports OpenSearch: http://www.microsoft.com/enterprisesearch/serverproducts/searchserver/features.aspx ) I think it should be possible to pass the user query from MS Search Server to Nutch and integrate Nutch

Spell Check + Adding records

2008-01-14 Thread rakeshxp
Hello everyone, I have a question regarding the SpellChecker. I created the spell index using the following code: SpellChecker spellChecker = new SpellChecker(spellDir); spellChecker.indexDictionary(dictionary); This works perfectly. But is there any way I can dynamically add records to the sp

How?

2008-01-14 Thread coolgeng coolgeng
Hi guys, I am confused by a problem. I would like to index data from a database table, but while I am creating the index on this table, the search job keeps running. How can I work this out? By the way, the table holds around one hundred million rows. -- Best Regards Cooper Geng

Re: Sort does not work for me

2008-01-14 Thread Aron Sogor
Let me qualify my question: Sort is not working for a field that I stored : document.add(new Field(FIELD_RECEIVED, DateTools.timeToString( System.currentTimeMillis(), DateTools.Resolution.SECOND), Field.Store.NO, Field.Index.UN_TOKENIZED)); using
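A possible check on this sort question: as far as we understand `DateTools`, `timeToString(..., DateTools.Resolution.SECOND)` produces a `yyyyMMddHHmmss` string in GMT, and Lucene sorts on indexed terms rather than stored values, so `Field.Store.NO` by itself should not prevent sorting. The JDK-only sketch below (the class `SecondResolutionKey` is hypothetical, not Lucene code) reproduces that string shape and shows that plain lexicographic order of such keys matches chronological order.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper mimicking the string shape we believe DateTools produces
// at SECOND resolution (yyyyMMddHHmmss, GMT). This is not Lucene code.
final class SecondResolutionKey {

    static String toKey(long millis) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        long base = 1200300000000L; // 2008-01-14 08:40:00 GMT
        List<String> keys = new ArrayList<>();
        // Added out of chronological order on purpose.
        keys.add(toKey(base + 3_600_000L)); // one hour later
        keys.add(toKey(base));
        keys.add(toKey(base + 1_000L));     // one second later
        Collections.sort(keys);             // plain String (lexicographic) order
        System.out.println(keys);           // chronological: base, +1s, +1h
    }
}
```

If sorting still misbehaves, it may be worth checking which sort type is requested: with `SortField.AUTO` the type is guessed from the terms, and asking for `SortField.STRING` explicitly is one way to rule that out. This is a suggestion, not a diagnosis.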

Re: Cannot bind RMIMessenger exception:non-JRMP server at remote endpoint

2008-01-14 Thread Chris Hostetter
: Trying config file at path /var/www/.lsearch.conf : Trying config file at path /usr/local/search/ls2/lsearch.conf : 0[main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode : decomposer : java.rmi.ConnectIOException: non-JRMP server at remote endpoint : at sun.rmi.tran

Re: SimpleFragmenter docs

2008-01-14 Thread Mark Miller
I think you're right, and that's not the only place... the whole handling of maxDocBytesToAnalyze in the main Highlighter class shares this issue. I guess the idea is an ASCII holdover, where one byte equals one char? I am sure Mark H can clear it up, but don't forget the maxDocBytesToAnalyze part as well

SimpleFragmenter docs

2008-01-14 Thread Grant Ingersoll
I was looking at the SimpleFragmenter in contrib/Highlighter and was wondering about the fragmentSize value. It says the value is the number of bytes, but looking at the code it's using the String offset, right? So it should be the number of characters, right? I can fix it, just wanted to
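The bytes-versus-characters distinction matters as soon as the text is not plain ASCII: Java `String` offsets and `length()` count UTF-16 code units, while a byte count depends on the encoding. A small JDK-only sketch (the class name is hypothetical) makes the difference concrete:

```java
import java.nio.charset.StandardCharsets;

// Shows that a limit measured with String offsets counts chars, not bytes.
final class CharsVersusBytes {

    // Number of UTF-16 code units, i.e. what String offsets and length() measure.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes the same text occupies when encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 cr\u00e8me"; // "café crème"
        // prints: 10 chars, 12 UTF-8 bytes
        System.out.println(charCount(text) + " chars, " + utf8ByteCount(text) + " UTF-8 bytes");
    }
}
```

For ASCII-only text the two numbers coincide, which may explain why the documentation speaks of bytes.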

Lucene 2.3 RC3 available for testing

2008-01-14 Thread Michael Busch
Hi all, I just uploaded Lucene 2.3 RC3 to: http://people.apache.org/~buschmi/staging_area/lucene_2_3/ RC3 fixes a problem in the indexer that could cause it to hang after a disk full exception occurred. (see https://issues.apache.org/jira/browse/LUCENE-1130 for details). Please switch to RC3 and

spell checking for combined words

2008-01-14 Thread solr_user
Does the Lucene spell checker have the ability to suggest splitting combined words? For example, if I have the words "apple" and "computer" in my index and I type "applecomputer", how can I make it suggest "apple computer"?
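As far as we know, the contrib SpellChecker suggests replacements for a single term and does not split run-together words. One technique that could be layered on top, swapped in here and not part of Lucene, is dictionary-based segmentation by dynamic programming. In this sketch `CompoundSplitter` is hypothetical, and the vocabulary is assumed to come from the index terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical dictionary-based splitter; not a Lucene API.
final class CompoundSplitter {

    /**
     * Splits input into known words, preferring the split with the fewest words.
     * Returns an empty list if no complete split exists.
     */
    static List<String> split(String input, Set<String> dictionary) {
        int n = input.length();
        int[] best = new int[n + 1]; // best[i] = fewest words covering input[0, i); -1 = unreachable
        int[] prev = new int[n + 1]; // start index of the last word in that best cover
        Arrays.fill(best, -1);
        best[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (best[start] < 0) continue;
                if (!dictionary.contains(input.substring(start, end))) continue;
                int candidate = best[start] + 1;
                if (best[end] < 0 || candidate < best[end]) {
                    best[end] = candidate;
                    prev[end] = start;
                }
            }
        }
        if (best[n] < 0) return Collections.emptyList();
        // Walk the back-pointers from the end, then restore left-to-right order.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.add(input.substring(prev[end], end));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        Set<String> vocab = new HashSet<>(Arrays.asList("apple", "computer", "app", "le"));
        // "apple computer" (2 words) wins over "app le computer" (3 words).
        System.out.println(String.join(" ", split("applecomputer", vocab)));
    }
}
```

Preferring the fewest words is one simple tie-breaker; term frequencies from the index would be a natural refinement.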

RE: Lucene sorting case-sensitive by default?

2008-01-14 Thread Alex Wang
No problem Erick. Thanks for clarifying it. Alex -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 12:35 PM To: java-user@lucene.apache.org Subject: Re: Lucene sorting case-sensitive by default? Sorry, I was confused about this for the long

Re: Lucene sorting case-sensitive by default?

2008-01-14 Thread Erick Erickson
Sorry, I was confused about this for the longest time (and it shows!). You don't actually have to store two separate fields. Field.Store.YES stores the input exactly as is, without passing it through anything. So you really only have to store your field. I still think of it conceptually as two enti

RE: Lucene sorting case-sensitive by default?

2008-01-14 Thread Alex Wang
Thanks a lot, Erick, for the great tip! I do need to display all the fields and allow the users to sort by each field as they wish. My index is currently about 200 MB. Your suggestion about storing (but not indexing) the cased version, and indexing (but not storing) the lower-case version is an excellent
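The store-cased, sort-lower-cased idea can be modeled in plain Java: `stored` plays the role of the stored (not indexed) value and `sortKey` the indexed (not stored) one. The classes below are hypothetical stand-ins, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Models "display the original value, sort on a lower-cased key" with plain Java.
final class CaseInsensitiveSortDemo {

    // Hypothetical stand-in for one document field.
    static final class SortableValue {
        final String stored;  // what the user sees (original case)
        final String sortKey; // what sorting compares (lower case)

        SortableValue(String original) {
            this.stored = original;
            this.sortKey = original.toLowerCase(Locale.ROOT);
        }
    }

    static List<String> sortForDisplay(List<String> originals) {
        List<SortableValue> values = new ArrayList<>();
        for (String s : originals) {
            values.add(new SortableValue(s));
        }
        values.sort(Comparator.comparing((SortableValue v) -> v.sortKey));
        List<String> display = new ArrayList<>();
        for (SortableValue v : values) {
            display.add(v.stored); // show the cased value, in lower-case order
        }
        return display;
    }

    public static void main(String[] args) {
        // prints: [apple, Apricot, banana, Cherry]
        System.out.println(sortForDisplay(List.of("banana", "Cherry", "apple", "Apricot")));
    }
}
```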

Re: Lucene sorting case-sensitive by default?

2008-01-14 Thread Erick Erickson
Several things: 1> do you need to display all the fields? Would just storing them lower-case work? The only time I've needed to store fields case-sensitive is when I'm showing them to the user. If the user is just searching on them, I can store them any way I want and she'll never know. 2> You m

Re: Index merging and optimizing

2008-01-14 Thread Erick Erickson
OK, I think I'm getting a better handle here. I can't imagine how it would work to combine indexes that use *different* analyzers on the *same* field. Regardless of what Lucene did, you simply could NOT explain this to a user. To take a simple example, index part of your data for field1 with Keywor

Re: How to model hierarchy info to be searched related to a document

2008-01-14 Thread Developer Developer
Yeah, I think what you need is one field where you store a list of propertytag and value combinations, and also be able to search the field by value and identify that the value is for a particular propertytag. Something like: propertytag1,value propertytag2,value propertytag3,value etc. To be fran
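One way to read this suggestion is to pack each (property tag, value) pair into a single token, so a match on the token identifies both the value and its tag. The sketch assumes a hypothetical `tag=value` convention with lower-casing; in Lucene each token could presumably be added as a separate un-tokenized value of the same field and matched with a `TermQuery`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Packs (property tag, value) pairs into single tokens for one multi-valued field.
// The "tag=value" convention and all names here are hypothetical.
final class TaggedValueField {

    static final char SEPARATOR = '=';

    // One token per pair, e.g. "frequency=quarterly".
    static List<String> toTokens(Map<String, String> properties) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            tokens.add(normalize(e.getKey()) + SEPARATOR + normalize(e.getValue()));
        }
        return tokens;
    }

    // The exact term a query would look for: a value scoped to one tag.
    static String queryTerm(String tag, String value) {
        return normalize(tag) + SEPARATOR + normalize(value);
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>();
        props.put("Frequency", "Quarterly");
        props.put("Region", "Europe");
        List<String> tokens = toTokens(props);
        System.out.println(tokens);                                              // [frequency=quarterly, region=europe]
        System.out.println(tokens.contains(queryTerm("Frequency", "Quarterly"))); // true
    }
}
```

The separator must not occur inside tags; with user-defined properties that would need validating or escaping.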

RE: Lucene sorting case-sensitive by default?

2008-01-14 Thread Alex Wang
Thanks everyone for your replies! Guess I did not fully understand the meaning of "natural order" in the Lucene Java doc. To add another all-lower-case field for each sortable field in my index is a little too much, since the app requires sorting on pretty much all fields (over 100). Toke, you me

RE: Index merging and optimizing

2008-01-14 Thread spring
> Then why would you want to combine them? > > I really think you need to explain what you're trying to accomplish > rather than obsess on the details. I have to create indexes in parallel because the amount of data is very large. Then I want to merge them into bigger indexes and move them to the s

RE: How to model hierarchy info to be searched related to a document

2008-01-14 Thread Roger Camargo
Why am I afraid? Building the index wouldn't be a problem, I guess; querying it would be more difficult. Let's see. Custom properties are defined by the user, with no restriction on their number or values. > > Custom property name: Frequency> > Custom property value: Quarterly> >> > > > Cus

RE: When to use which Analyzer

2008-01-14 Thread spring
> You can answer an awful lot of this much faster than waiting > for someone > to reply by getting a copy of Luke and looking at the parse results using > various > analyzers. Ah cool, you mean the "explain structure" button. > Try KeywordAnalyzer for your query. > > Combine queries programmatica

Re: Index merging and optimizing

2008-01-14 Thread Erick Erickson
Then why would you want to combine them? I really think you need to explain what you're trying to accomplish rather than obsess on the details. Erick On Jan 14, 2008 10:17 AM, <[EMAIL PROTECTED]> wrote: > > I admit I've never used IndexMergeTool, I've always used > > IndexWriter.addIndexes and

RE: Index merging and optimizing

2008-01-14 Thread spring
> I admit I've never used IndexMergeTool, I've always used > IndexWriter.addIndexes and then execute > IndexWriter.optimize(). > > And I've seen no problems. That call takes no > analyzer. So you take the first index and add the remaining indexes via addIndexes? What happens if the indexes were crea

Re: When to use which Analyzer

2008-01-14 Thread Erick Erickson
You can answer an awful lot of this much faster than waiting for someone to reply by getting a copy of Luke and looking at the parse results using various analyzers. And you can use query.toString() to see the parsed results as well. Try KeywordAnalyzer for your query. Combine queries programmatica

Re: Index merging and optimizing

2008-01-14 Thread Erick Erickson
I admit I've never used IndexMergeTool, I've always used IndexWriter.addIndexes and then execute IndexWriter.optimize(). And I've seen no problems. That call takes no analyzer. Erick On Jan 14, 2008 6:12 AM, <[EMAIL PROTECTED]> wrote: > > See org.apache.lucene.misc.IndexMergeTool > > Thank you.

Re: How to model hierarchy info to be searched related to a document

2008-01-14 Thread Developer Developer
I am not sure why you are afraid of adding more fields to the document. Having 20-30 fields in a document is not a bad thing. Do you have any constraints that limit the number of fields in a document? On Jan 14, 2008 7:59 AM, Roger Camargo <[EMAIL PROTECTED]> wrote: > Thanks for ans

RE: How to model hierarchy info to be searched related to a document

2008-01-14 Thread Roger Camargo
Thanks for answering. It seems there isn't any other way around it than having every combination of dimension and level. The example for the observations of the dimension would be as follows; maybe it isn't such important information to store, but the type is. Dimension name: RegionDimensi

RE: When to use which Analyzer

2008-01-14 Thread spring
> The caution to use the same analyzer at index and query time is, > in my experience, simply good advice to follow until you are > familiar enough with how Lucene uses analyzers to keep from > getting really, really, really confused. Once you understand > when analyzers are used and how they effec

RE: When to use which Analyzer

2008-01-14 Thread spring
> > How can I search for fields stored with Field.Index.UN_TOKENIZED? > > Use TermQuery. > > > Why do I need an analyzer for searching? > > Consider a full-text field that will be tokenized removing special > characters and lowercased, and then a user querying for an uppercase > word. The
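The point about needing an analyzer at search time can be illustrated with a toy analyzer (hypothetical, far simpler than any Lucene analyzer): if indexing lower-cases and splits the text, a query term must go through the same steps before an exact term match can succeed. Conversely, a `Field.Index.UN_TOKENIZED` value is indexed as one untouched term, which is why the reply suggests a `TermQuery` with the exact original value for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy stand-in for an analyzer (hypothetical, not a Lucene class):
// split on anything that is not a letter or digit, then lower-case.
final class AnalyzerMismatchDemo {

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A term query matches only if the exact term is among the indexed tokens.
    static boolean termMatches(List<String> indexedTokens, String term) {
        return indexedTokens.contains(term);
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Apache Lucene: full-text search");
        System.out.println(indexed);                                        // [apache, lucene, full, text, search]
        System.out.println(termMatches(indexed, "LUCENE"));                 // false: raw query term
        System.out.println(termMatches(indexed, analyze("LUCENE").get(0))); // true: analyzed the same way
    }
}
```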

RE: Max size of index (FSDirectory )

2008-01-14 Thread spring
> OG: again, it depends. If the index you'd get by merging is > of manageable size, then merge your indices. OK, this is what I thought. A single index should be faster than multiple indexes with a MultiSearcher, right? But what about the ParallelMultiSearcher? As I understand the docs, it searc
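As we understand it, `ParallelMultiSearcher` differs from `MultiSearcher` by running the search on each sub-index in its own thread before merging the hits. The sketch below shows only that fan-out/merge shape with the JDK's `ExecutorService`; the in-memory "indexes", substring matching, and alphabetical merge are simplifications (Lucene merges by score), and none of it is Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual fan-out/merge over several sub-indexes, one task per sub-index.
// Each "index" is just a list of documents here; nothing below is Lucene API.
final class ParallelSearchSketch {

    static List<String> searchAll(List<List<String>> indexes, String term) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, indexes.size()));
        try {
            // Fan out: submit one search task per sub-index.
            List<Future<List<String>>> pending = new ArrayList<>();
            for (List<String> index : indexes) {
                Callable<List<String>> task = () -> {
                    List<String> hits = new ArrayList<>();
                    for (String doc : index) {
                        if (doc.contains(term)) hits.add(doc);
                    }
                    return hits;
                };
                pending.add(pool.submit(task));
            }
            // Merge: wait for every sub-result, then order the combined hits
            // (alphabetically here, for determinism; real searchers use scores).
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : pending) {
                merged.addAll(f.get());
            }
            Collections.sort(merged);
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("sub-search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> indexes = List.of(
                List.of("lucene in action", "solr basics"),
                List.of("nutch crawler", "lucene scoring"));
        System.out.println(searchAll(indexes, "lucene")); // [lucene in action, lucene scoring]
    }
}
```

Whether the parallel version beats one merged index likely depends on whether the sub-indexes sit on separate disks and there are spare CPUs; measuring on the actual data is the safer guide.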

RE: Index merging and optimizing

2008-01-14 Thread spring
> See org.apache.lucene.misc.IndexMergeTool Thank you. But this uses a hardcoded analyzer and deprecated API calls. How does the analyzer used affect the merge process? Is everything reindexed with this new analyzer again? Does this make sense? What if the source indexes had other analyzers us

Re: Lucene sorting case-sensitive by default?

2008-01-14 Thread Toke Eskildsen
On Fri, 2008-01-11 at 11:40 -0500, Alex Wang wrote: > Looks like Lucene is separating upper case and lower case while sorting. As Tom points out, default sorting uses natural order. It's worth noting that this implies that default sorting does not produce usable results as soon as you use non-ASCI
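Natural order here means `String.compareTo`, which compares UTF-16 code units: every uppercase ASCII letter sorts before every lowercase one, and accented letters land after `z`. A locale-aware `java.text.Collator` avoids both problems. Lucene 2.x's `SortField` also has a constructor taking a `Locale`, which as far as we know switches the comparison to such a collator. The JDK-only sketch below compares the two orders (the class name is hypothetical):

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Compares String natural order (UTF-16 code unit order) with locale-aware collation.
final class NaturalOrderVersusCollator {

    static List<String> naturalOrder(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted); // String.compareTo: 'Z' (90) sorts before 'a' (97)
        return sorted;
    }

    static List<String> collatedOrder(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        Collator collator = Collator.getInstance(locale); // Collator is a Comparator<Object>
        Collections.sort(sorted, collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = List.of("zebra", "\u00e9clair", "Apple", "banana");
        System.out.println(naturalOrder(words));                  // accented word sorts last
        System.out.println(collatedOrder(words, Locale.ENGLISH)); // accented word sorts with 'e'
    }
}
```

Collation costs more per comparison than `compareTo`, which is presumably why natural order is the default.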