Re: Indexing the spider content

2008-06-24 Thread John Wang
Maybe building a Lucene gateway to hook in with VSpider. Are you using VSpider or K2Spider? -John On Tue, Jun 24, 2008 at 8:35 PM, yugana <[EMAIL PROTECTED]> wrote: > > Hi Otis, > > Thanks for the reply. So you mean it is not possible to use Lucene to index > the fetched (Verity Spider Content)

Re: lucene query parser for double-worded term query

2008-06-24 Thread Chris Lu
Erick, Thanks! It's the analyzer problem. I should have used the same analyzer, KeywordAnalyzer, to create the query parser. Thanks a lot! -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight

Re: Indexing the spider content

2008-06-24 Thread yugana
Hi Otis, Thanks for the reply. So you mean it is not possible to use Lucene to index the fetched (Verity Spider Content) content. Yug Otis Gospodnetic wrote: > > It sounds like you want to check out Nutch - fetched, indexer, searcher, > etc. in one lovely package. > > > Otis > -- > Sematext

RE: Concurrent query benchmarks, with 1,2,4,8 readers

2008-06-24 Thread Rakesh Shete
Hi Glen, Is your source code available? I would like to have a look at it and check if whatever I have tried makes sense. --Regards, Rakesh S > Date: Fri, 13 Jun 2008 12:51:51 -0400 > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Concurrent query benchmarks, with 1,

Re: lucene query parser for double-worded term query

2008-06-24 Thread Erick Erickson
What analyzers are you using for both indexing and querying? Have you looked at your index with Luke to see what's actually in the index? The reason I'm asking is I'm wondering whether you are having capitalization issues. That is, your index analyzer lower cases the tokens and your query analyzer

Re: lucene query parser for double-worded term query

2008-06-24 Thread Chris Lu
Yonik, Thanks for your quick reply! But I found that after backslash-escaping the spaces, both tags:San\ Francisco and tags:"San\ Francisco" turn into a PhraseQuery, just like tags:"San Francisco", and still no results are returned. Maybe the Lucene query parser does not handle this case? -- Chris Lu -

Re: lucene query parser for double-worded term query

2008-06-24 Thread Yonik Seeley
You can backslash escape spaces, so these should both work: tags:San\ Francisco tags:"San\ Francisco" -Yonik On Tue, Jun 24, 2008 at 8:14 PM, Chris Lu <[EMAIL PROTECTED]> wrote: > I have a tags field. And each tag can have multiple words, like "San > Francisco". Each tag is analyzed into Keyword
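
The escaping described in this message can be sketched as a small string helper. This is only an illustration: `QueryEscapeSketch` and `escapeSpaces` are hypothetical names, not part of Lucene, and whether the escaped text ends up matching still depends on the analyzer handed to the query parser.

```java
final class QueryEscapeSketch {
    // Hypothetical helper: put a backslash in front of every space (and every
    // existing backslash) so the query syntax sees one term, not two words.
    static String escapeSpaces(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == ' ' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Builds the query string form shown in the message: tags:San\ Francisco
        System.out.println("tags:" + escapeSpaces("San Francisco"));
    }
}
```

The helper only produces the query text; it does not choose how the parser analyzes the field afterwards.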

lucene query parser for double-worded term query

2008-06-24 Thread Chris Lu
Hi, I have a tags field. And each tag can have multiple words, like "San Francisco". Each tag is analyzed into Keyword field like this new Field("tags", "San Francisco",Field.Store.YES, Field.Index.UN_TOKENIZED) It should be searchable if using TermQuery directly, like new TermQuery(new Term("

Re: lucene search options

2008-06-24 Thread Chris Hostetter
: I am using MultiFieldQueryParser. Can I use setAllowLeadingWildcard with : MultiFieldQueryParser? I am doing the following: : : parser = lucene.MultiFieldQueryParser(fields, analyzer ) : parser.setAllowLeadingWildcard(True) : query = parser.parse(command) : :

Re: Termdocs question

2008-06-24 Thread Chris Hostetter
: : >termDocs = reader.termDocs(term); : >while(termDocs.next()){ : >int index = termDocs.doc(); : >if(reader.document(index).get("id").equals(id)){ : >reader.deleteDocument(index); : >} : >} :

Re: How about adding a new parameter to Similarity.scorePayload( ) ?

2008-06-24 Thread Grant Ingersoll
Not sure, but would a CustomScoreQuery (or an extension of it) work for you? This way, you could try to combine the BoostingTermQuery with a ValueSourceQuery (i.e. FunctionQuery) Another option is to just extend BoostingTermQuery and implement your own scorer that takes into account the fi

changing index format

2008-06-24 Thread John Wang
Hi: I am trying to add a couple more values to the TermInfo file and want to keep the index backward compatible. But I see values such as docFreq etc. are stored as a VInt, so I couldn't do things like using the sign bit to determine whether to read/write the extra values. Any suggestions? (
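
For readers unfamiliar with the encoding mentioned here, the following is a minimal sketch of a VInt-style variable-length integer: seven payload bits per byte, with the high bit of each byte meaning "more bytes follow". The class and method names are hypothetical and this is not Lucene's own code. It illustrates why the sign bit is not a cheap flag: a negative value has its top bits set, so it always costs five bytes.

```java
import java.io.ByteArrayOutputStream;

final class VIntSketch {
    // Encode an int using 7 payload bits per byte; high bit = continuation.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
            value >>>= 7;                     // unsigned shift, so the loop terminates
        }
        out.write(value);                     // last byte has the high bit clear
        return out.toByteArray();
    }

    // Decode the first VInt-style value found at the start of the array.
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;                        // no continuation flag: done
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode(127).length); // small values fit in one byte
        System.out.println(encode(-1).length);  // negative values need five bytes
    }
}
```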

testing, pls ignore

2008-06-24 Thread Jay dragon

Re: searching for C++

2008-06-24 Thread Alex Soto
Thanks everyone. I appreciate the help. I think I will write my own tokenizer, because I do not have a predefined list of words with symbols. I will modify the grammar by defining a SYMBOL token as John suggested and redefine ALPHANUM to include it. Regards, Alex Soto On Tue, Jun 24, 2008 at 1
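
The idea of letting symbols pass through the tokenizer can be illustrated with a toy regex tokenizer. This is a sketch under stated assumptions, not the StandardTokenizer grammar: the class name, the pattern, and the choice of `+` and `#` as allowed trailing symbols are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SymbolTokenizerSketch {
    // Assumption: a token is a run of letters/digits, optionally followed by
    // '+' or '#' characters, so "C++" and "C#" survive as single tokens.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9]+[+#]*");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Lower-case each token, similar in spirit to what a standard analyzer does.
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Searching for C++ and C# code"));
    }
}
```

The same pattern would have to be used at both index time and query time, otherwise the indexed tokens and the query tokens would not line up.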

How about adding a new parameter to Similarity.scorePayload( ) ?

2008-06-24 Thread wuqi
Hi, I want to customize a new Similarity class which needs to adopt payload information. The current definition of scorePayload is below: "public float scorePayload(String fieldName, byte [] payload, int offset, int length)" I have a problem when using this function. In case we have two Boostin

Re: searching for C++

2008-06-24 Thread N. Hira
This isn't ideal, but if you have a defined list of such terms, you may find it easier to filter these terms out into a separate field for indexing. -h -- Hira, N.R. Solutions Architect Cognocys, Inc. (773) 251-7453 On 24-Ju

Re: searching for C++

2008-06-24 Thread John Byrne
I don't think there is a simpler way. I think you will have to modify the tokenizer. Once you go beyond basic human-readable text, you always end up having to do that. I have modified the JavaCC version of StandardTokenizer for allowing symbols to pass through, but I've never used the JFlex ve

searching for words with symbols

2008-06-24 Thread Alex Soto
Hello: I have a problem where I need to search for the word "C++". If I use StandardAnalyzer, the "+" characters are removed and the search is done on just the "c" character which is not what is intended. Yet, I need to use standard analyzer for the other benefits it provides. I think I need to w

searching for C++

2008-06-24 Thread Alex Soto
Hello: I have a problem where I need to search for the term "C++". If I use StandardAnalyzer, the "+" characters are removed and the search is done on just the "c" character which is not what is intended. Yet, I need to use standard analyzer for the other benefits it provides. I think I need to w

Re: java.io.IOException cannot overwrite fdt

2008-06-24 Thread Michael McCandless
Are you by any chance, separately, removing files from your index directory manually? That's the one case I know of which can lead to that exception, if you also have an IndexReader open on the directory at that time. The code below has one problem. In your if statement true & false c

Re: Indexing the spider content

2008-06-24 Thread Otis Gospodnetic
It sounds like you want to check out Nutch - fetched, indexer, searcher, etc. in one lovely package. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: yugana <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Tuesday, June 24, 2008

Re: uniqueWords, and termDocs

2008-06-24 Thread Otis Gospodnetic
I have this uncommitted class locally (forgot its origins), which you'll like: $ svn st ? contrib/miscellaneous/src/java/org/apache/lucene/misc/AllTerms.java Slap the package statement and add imports and you have it. Read this into some data structure and pick random terms from there. /

Re: Wildcard and Literal Searches combined

2008-06-24 Thread Erick Erickson
Do you require that the words be right next to each other? You can, of course, set your default to AND (it's OR unless you change it explicitly). That'll give you documents that have both Dublin and City. You don't need wildcards at all in this case. If you require exact matches, you can use Phras

RE: Wildcard and Literal Searches combined

2008-06-24 Thread Jon Loken
Our approach is: If the keyword is a single word, then append the star (e.g. replace -> replace*) If the keyword is a phrase containing one or more spaces, then it is treated as an exact phrase (e.g. replace this -> 'replace this') Regards Jon -Original Message- From: mick l [mailto:[EM
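
The two rules in this message can be sketched as a small rewrite step applied before the keyword is handed to the query parser. The names are hypothetical, and the use of double quotes for the phrase form is an assumption based on Lucene's query syntax (the message itself writes the phrase in single quotes).

```java
final class KeywordRewriteSketch {
    // Single word -> prefix wildcard; anything containing a space -> quoted phrase.
    static String rewrite(String keyword) {
        String trimmed = keyword.trim();
        if (trimmed.isEmpty()) {
            return trimmed;                   // nothing to search for
        }
        if (trimmed.indexOf(' ') >= 0) {
            return "\"" + trimmed + "\"";     // exact phrase, no wildcard
        }
        return trimmed + "*";                 // single word, append the star
    }

    public static void main(String[] args) {
        System.out.println(rewrite("replace"));
        System.out.println(rewrite("replace this"));
    }
}
```

A real implementation would also need to escape any query-syntax characters inside the keyword, which this sketch leaves out.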

Re: uniqueWords, and termDocs

2008-06-24 Thread Erick Erickson
Isn't asking for unique words (actually tokens) equivalent to enumerating all the terms in a field? I have no idea how to select a random word. Seems like you'd have to somehow use a TermEnum, but I don't think there's anything built in. Best Erick On Mon, Jun 23, 2008 at 6:03 PM, Cam Bazz <[EMA

RE: Wildcard and Literal Searches combined

2008-06-24 Thread mick l
Cheers, Out of interest, how did you go about it in the end? jloken wrote: > > Hi, > > I posed a similar question on 09 May 08. > > The response was as below. I did not go down this route however, as a > wild carded 'exact' phrase is in a way contradictory. > Regards > Jon > > > > Hi, >

RE: Wildcard and Literal Searches combined

2008-06-24 Thread Jon Loken
Hi, I posed a similar question on 09 May 08. The response was as below. I did not go down this route however, as a wild carded 'exact' phrase is in a way contradictory. Regards Jon Hi, Here's a searchable mailing list archive: http://www.gossamer-threads.com/lists/lucene/java-user/ As re

Wildcard and Literal Searches combined

2008-06-24 Thread mick l
Folks, My users require wildcard searches. Sometimes their search phrases contain spaces. I am having trouble trying to implement a wildcard search on strings containing spaces, so if the term includes spaces I force a literal search by adding double quotes to the search term. So the search string

RE: How to search on the indexed content

2008-06-24 Thread Aamir.Yaseen
You can check this code for reference. http://lucene.apache.org/java/docs/demo4.html Regards, Aamir Yaseen -Original Message- From: Lucas F. A. Teixeira [mailto:[EMAIL PROTECTED] Sent: 24 June 2008 10:57 AM To: java-user@lucene.apache.org Subject: Re: How to search on the indexed conten

Re: How to search on the indexed content

2008-06-24 Thread Lucas F. A. Teixeira
http://lucene.apache.org/java/docs/queryparsersyntax.html []s, Lucas Frare A. Teixeira [EMAIL PROTECTED] Tel: +55 11 3660.1622 - R3018 yugana wrote: Hi All, I have created an index file and indexing the content retrieved from a database. How can I search on thi

How to search on the indexed content

2008-06-24 Thread yugana
Hi All, I have created an index file and am indexing the content retrieved from a database. How can I search on this content? After indexing, three files are created: _0.cfs, segments.gen and segments_k. I need help with this. Thanks, Yugana -- View this message in context: http://www.nabble.com/How-t

Indexing the spider content

2008-06-24 Thread yugana
Hi All, I am new to Lucene Search. Can you let me know if it is possible to index the "Verity Spider" content? If possible, please let me know how to create an index from it and search on it. Please also share some code snippets on how to proceed with indexing and searching. I do not have much time to go thr

BoostingQuery

2008-06-24 Thread Jay dragon
Hi, BoostingQuery is designed to demote the scores of documents that match the undesired query by boosting/demoting the final score. The problem I see is that this demoting factor is static/universal, in the sense that it does not depend on how much the docs match the negative query terms. Ideal