Re: (~) opertor query....

2007-12-14 Thread Shai Erera
I just noticed MultiPhraseQuery has a setSlop method, so I think this Query is what you're looking for. On Dec 15, 2007 7:04 AM, Shai Erera <[EMAIL PROTECTED]> wrote: > You can look at org.apache.lucene.search.MultiPhraseQuery which does > something similar to what you ask. From its javadoc: > >

Re: index and access to lines of a CSV file

2007-12-14 Thread Shai Erera
You can also look up Apache Derby which is an open source DB which can be integrated into your app (not needing an install, like MySQL which is also free). On Dec 14, 2007 12:43 PM, Ingolf Tobias Rothe <[EMAIL PROTECTED]> wrote: > Hello Mike, > > thank you for the answer. Currently I hold this d

Re: (~) opertor query....

2007-12-14 Thread Shai Erera
You can look at org.apache.lucene.search.MultiPhraseQuery which does something similar to what you ask. From its javadoc: To use this class, to search for the phrase "Microsoft app*" first use add(Term) on the term "Microsoft", then find all terms that have "app" as prefix using IndexRead
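
The prefix step that the javadoc describes (collect every indexed term that starts with "app") can be sketched in plain Java. This is only an illustration of the idea, not Lucene code: the class name PrefixExpansionSketch, the method expandPrefix, and the sample dictionary are all made up for this example, and a TreeSet stands in for the index's sorted term dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper; a sorted set stands in for the index's term dictionary.
final class PrefixExpansionSketch {

    // Returns every term that starts with the prefix, in sorted order.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // In a sorted dictionary, all terms sharing a prefix sit next to each
        // other, starting at the first term >= the prefix itself.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // left the block of terms that share the prefix
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>(
                Arrays.asList("apple", "app", "application", "banana", "apt"));
        System.out.println(expandPrefix(dictionary, "app")); // prints [app, apple, application]
    }
}
```

In the approach quoted above, the expanded terms would then be handed to the query as the alternatives for the second phrase position.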

Re: index and access to lines of a CSV file

2007-12-14 Thread Ingolf Tobias Rothe
Hello Mike, thank you for the answer. Currently I hold this data structure in a HashTable in memory, but the resource consumption is very high. Lucene looks easy to use and is supposed to be extremely performant. I also thought of later using the abilities of Lucene to attach parameters to

Re: Basic Named Entity Indexing

2007-12-14 Thread Chris Hostetter
: a) index the documents by wrapping the whitespace analyzer with : ngramanalyzerwrapper and then retrieving only the words which have 3 or more : characters and start with a capital, filtering the "garbage" manually. : b) creating my own analyzer which will only index ngrams that start with : cap

Re: (~) opertor query....

2007-12-14 Thread Chris Hostetter
: I am parsing this query: "Auto* machine"~4. : : Will it work? If yes then right now it's not working. Can : anyone help on this? It depends, what do you want it to do? :) If you are hoping it will match documents that contain a word that starts with "Auto" within a di
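
One way to pin down what such a query could be expected to do is a plain-Java check over a token list. This is a hedged sketch, not how Lucene evaluates anything: the class PrefixProximitySketch and its matches method are invented for illustration, only the in-order case is handled, and "distance" is taken to mean the number of tokens allowed between the two words.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "prefix word, then another word, within a distance".
final class PrefixProximitySketch {

    // True if a token starting with `prefix` is followed by `word`
    // with at most `slop` other tokens between them.
    static boolean matches(List<String> tokens, String prefix, String word, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).startsWith(prefix)) {
                continue;
            }
            // Last position at which `word` may still appear for this start.
            int last = Math.min(tokens.size() - 1, i + 1 + slop);
            for (int j = i + 1; j <= last; j++) {
                if (tokens.get(j).equals(word)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("Automatic", "washing", "machine");
        System.out.println(matches(doc, "Auto", "machine", 4)); // prints true
    }
}
```

The sketch compares tokens case-sensitively and ignores reversed word order; whether either matters in practice depends on the analyzer and query type actually used.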

Re: Custom SynonymMap

2007-12-14 Thread Chris Hostetter
: Is there a way to add synonyms to the SynonymMap map? : The HashMap that holds all the words is not visible (private) so extending : it will not work. : : Has anyone added their own custom vocabulary? I assume your question is in regards to the SynonymMap that is part of the memory index cont

Re: "Field weights"

2007-12-14 Thread Shai Erera
What about boosting documents of the Brand type? You can statically boost those documents with a log() function or something similar ... On Dec 14, 2007 8:24 PM, Doron Cohen <[EMAIL PROTECTED]> wrote: > It seems that documents having less fields satisfying > the query worth more than those satisf

Re: "Field weights"

2007-12-14 Thread Doron Cohen
It seems that documents having fewer fields satisfying the query are worth more than those satisfying more fields of the query, because the first ones are more "to the point". At least it seems like it in the example. If this makes sense I would try to compose a top level boolean query out of the one-

Re: Heads-up on SSD

2007-12-14 Thread Erick Erickson
This is great stuff, thanks for posting it. Erick On Dec 14, 2007 5:59 AM, Toke Eskildsen <[EMAIL PROTECTED]> wrote: > There's an interesting article on state-of-the-art setup with Mtron > Solid State Drives at > http://www.nextlevelhardware.com/storage/battleship/ > The concise version is that

Re: QueryParser: if key contains a colon

2007-12-14 Thread Erick Erickson
This should help http://lucene.apache.org/java/docs/queryparsersyntax.html Erick On Dec 14, 2007 7:28 AM, Helmut Jarausch <[EMAIL PROTECTED]> wrote: > Hi, > > how can one search for a key containing a colon when > using QueryParser (with WhitespaceAnalyzer) > > E.g. > searching for 'abc:def'
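
The query syntax page linked above describes escaping special characters with a backslash. A minimal plain-Java sketch of that idea follows; the class QueryEscapeSketch and its character list are assumptions made for illustration, and Lucene's own QueryParser.escape(String) helper would normally be the better choice.

```java
// Hypothetical stand-alone escaper; not part of Lucene.
final class QueryEscapeSketch {

    // Characters treated as query syntax in this sketch (assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Puts a backslash in front of every special character.
    static String escape(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("abc:def")); // prints abc\:def
    }
}
```

With the colon escaped, the parser should no longer read abc as a field name, so the whole key can reach the analyzer as one term.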

Re: "Field weights"

2007-12-14 Thread Paul Elschot
Karl, This might work for you: https://issues.apache.org/jira/browse/LUCENE-293 Regards, Paul Elschot On Friday 14 December 2007 18:06:01 Karl Wettin wrote: > I have an index that contains three sorts of documents: > > Car brand > Tire brand > Tire pressure > > (Please bear with me, the real i

Re: Heads-up on SSD

2007-12-14 Thread Chris Lu
Toke, This is fantastic stuff! I always wanted to convince (rich) customers to try SSD. Now it's more convincing! I think the results will be more interesting for indexing, which has a lot of file merges. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/App

"Field weights"

2007-12-14 Thread Karl Wettin
I have an index that contains three sorts of documents: Car brand Tire brand Tire pressure (Please bear with me, the real index has nothing to do with cars. I am just trying to explain the problem in an alternative domain to avoid NDA conflicts.) There is a hierarchical composite relationship bet

QueryParser: if key contains a colon

2007-12-14 Thread Helmut Jarausch
Hi, how can one search for a key containing a colon when using QueryParser (with WhitespaceAnalyzer) E.g. searching for 'abc:def' Giving this string to QueryParser's parse method, abc: will be misinterpreted as the name of a field. How can this be avoided? Is there something like an escape tec

Heads-up on SSD

2007-12-14 Thread Toke Eskildsen
There's an interesting article on a state-of-the-art setup with Mtron Solid State Drives at http://www.nextlevelhardware.com/storage/battleship/ The concise version is that Mtron flash drives put all traditional hard drives to shame and seem especially well suited for applications that perform a l

highlighter with PhraseQuery = limitation?

2007-12-14 Thread Helmut Jarausch
Hi, When highlighting a phrase query like "Erik Hatcher" all instances of "Erik" as well as all instances of "Hatcher" are highlighted even if they are not next to each other. Is this a limitation of highlighting with Lucene? Many thanks for an explanation, Helmut. -- Helmut Jarausch Lehrstuh

bi-gram with wildcard on QueryParser

2007-12-14 Thread Scott Tiger
QueryParser skips tokenizing when the query includes a wildcard. Here is an example using BigramAnalyzer. Normally: query is : abcde parsed to : ab bc cd de When the query includes a wildcard: query is : abcde* parsed to : abcde* But I want the parsed result below. query is : abcde* parsed to :
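
For reference, the bi-gram split shown for the non-wildcard case (abcde to ab bc cd de) can be sketched in plain Java. The class BigramSketch is hypothetical and is not the BigramAnalyzer mentioned in the post; it only reproduces the overlapping two-character split and works on Java chars, so it does not treat surrogate pairs specially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of overlapping two-character tokens.
final class BigramSketch {

    // Splits text into overlapping pairs of adjacent characters.
    static List<String> bigrams(String text) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            grams.add(text.substring(i, i + 2));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("abcde")); // prints [ab, bc, cd, de]
    }
}
```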

Re: Indexing Wikipedia dumps

2007-12-14 Thread Dawid Weiss
Good pointers, thanks. I asked because I did have a problem like this a few months ago -- none of the existing parsers solved it for me (back then). D. Petite Abeille wrote: On Dec 13, 2007, at 8:39 AM, Dawid Weiss wrote: Just incidentally -- do you know of something that would parse the