Re: Lucene searching class

2007-10-25 Thread Steven Rowe
Hi Pooja, poojasreejith wrote: > I am using lucene2.2.0 for my application. I have a searcher.java class. > The problem I am facing is, it is not supporting > > Query query = QueryParser.parse(q, "contents",new StandardAnalyzer()); it > shows error; the method parse in the type QueryParser is

Re: Corpus interpretation

2007-10-24 Thread Steven Rowe
Hi Liaqat, Liaqat Ali wrote: > I want to index the Urdu language corpus (200 documents in CES XML DTD > format). Is net necessary to break the XML file into 200 different files > or it can be indexed in the original form using Lucene. Kindly guide in > this regard. A Lucene document is composed o

Re: Is there bug in CJKAnalyzer?

2007-10-23 Thread Steven Rowe
Hi Ivan, Ivan Vasilev wrote: > But how to understand the meaning of this: “To overcome this, you > have to index chinese characters as single tokens (this will increase > recall, but decrease precision).” > > I understand it so: To increase the results I have to use instead of > the Chinese anot

Re: Questions Lucene

2007-09-11 Thread Steven Rowe
Hi Durga, I have moved this discussion to the java-user list, since the java-dev list is devoted to development of the Java Lucene library, and not to questions about its capabilities. My answers are inline below. [EMAIL PROTECTED] wrote: >1) What are the various languages supported by L

Re: Look for strange encodings -- tokenization

2007-09-05 Thread Steven Rowe
poeta simbolista wrote: > I'd want to know the best way to look for strange encodings on a Lucene > index. > i have several inputs where input can have been encoded on different sets. I > not always know if my guess about the encoding has been ok. Hence, I'd > thought of querying the index for some

Re: Lucene indexing for pdf files

2007-08-31 Thread Steven Rowe
Hi Madhu, Madhu wrote: > i am indexing pdf document using pdfbox 7.4, its working fine for some pdf > files. for japanese pdf files its giving the below exception. > > caught a class java.io.IOException > with message: Unknown encoding for 'UniJIS-UCS2-H' > > Can any one help me , how to set th

Re: Postal Code Radius Search

2007-08-29 Thread Steven Rowe
Mike wrote: > I've searched the mailing list archives, the web, read the FAQ, etc and I > don't see anything relevant so here it goes… > > I'm trying to implement a radius based searching based on zip/postal codes. Here is a selection of interesting threads from the Lucene ML with relevant info:

Re: performance on filtering against thousands of different publications

2007-08-14 Thread Steven Rowe
Hi Cedric, Cedric Ho wrote: > On 8/13/07, Erick Erickson <[EMAIL PROTECTED]> wrote: >> Are you iterating through a Hits object that has more than >> 100 (maybe it's 200 now) entries? Are you loading each document that >> satisfies the query? Etc. Etc. > > Unfortunately, yes. And I know this is an

Re: Lucene in large database contexts

2007-08-10 Thread Steven Rowe
Hi Antonello, Antonello Provenzano wrote: > I've been working for a while on the implementation of a website > oriented to contents that would contain millions of entries, most of > them indexable (such as descriptions, texts, names, etc.). > The ideal solution to make them searchable would be to

Re: multiple field searcher

2007-08-03 Thread Steven Rowe
qaz zaq wrote: > I have Search Terms: T1, T2... Tn. Also I have document fields of F1 F2... Fm. > > I want to search the match documents across F1 to Fm fields,i.e., all of the > T1, T2, ...Tn need to be matched, but can be in the combination of T1, T2, > ... Tn field. > > I check the MultiFie

Re: Search that supports all valid characters in a Unix filename

2007-07-09 Thread Steven Rowe
Hi Ed, Ed Murray wrote: > Could > someone let me know the best Analyzer to use to get an exact match on a Unix > filename when it is inserted into an untokened field. > > Filenames > obviously contain spaces and forward slashes along with other characters. I > am using > a WhitespaceAnalyzer bu

Re: Rewrite one phrase to another in search query

2007-06-27 Thread Steven Rowe
Hi Aliaksandr, Aliaksandr Radzivanovich wrote: > What if I need to search for synonyms, but synonyms can be expanded to > phrases of several words? > For example, user enters query "tcp", then my application should also > find documents containing phrase "Transmission Control Protocol". And > conv

Re: JavaCC Download

2007-06-27 Thread Steven Rowe
version I got from javacc.dev.java.net: http://atlas.ucpel.tche.br/~dubois/compiladores/javacc-4.0.zip Good luck, Steve Mahdi Rahimi wrote: > How can I access to Certificate of this site? > > Steven Rowe wrote: >> I don't think you need to register - I am not registered an

Re: JavaCC Download

2007-06-26 Thread Steven Rowe
> 2007/6/23, Mahdi Rahimi <[EMAIL PROTECTED]>: >> >> >> Hi Steven. >> >> When i access to this address, this message appread >> >> Forbidden >> You don't have permission to access /servlets/ProjectHome on this server. >> >>

Re: Porter stemming problem

2007-06-22 Thread Steven Rowe
Hi Rob, Robert Walpole wrote: > At the moment I am attempting to do this as follows... > > analyzer = new PorterStemAnalyzer(); > parser = new QueryParser("content", analyzer); > Query query = parser.parse("keywords: relaxing"); > Hits hits = idxSearcher.search(query); > > ...but this is not ret

Re: Facet searching on single field with multiple words value

2007-06-21 Thread Steven Rowe
Hi Sawan, Sawan Sharma wrote: > Now, The problem occured when I passed the multiple words in term query. > e.g. > QueryFilter filter = new QueryFilter(new TermQuery(new Term(FieldName, > FieldValue))); > > where field name and field value dynamically getting. > here we take the example value. >

Re: JavaCC Download

2007-06-21 Thread Steven Rowe
Mahdi Rahimi wrote: > Hi. > > How can I access JavaCC?? > > Thanks https://javacc.dev.java.net/ -- Steve Rowe Center for Natural Language Processing http://www.cnlp.org/tech/lucene.asp

Re: Position of matches to affect scoring

2007-06-19 Thread Steven Rowe
Hi Jes, Jesse Prabawa wrote: > The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ > mentions that the position of the matches in the text does not affect > scoring. So is there anyway that I can make the position of the > matches affect scoring? For example, I want matches that occur a

Re: how to search the fields in SimpleAnalyzer

2007-06-19 Thread Steven Rowe
Hi Sebastin, Sebastin wrote: > i index my document using SimpleAnalyzer() when i search the Indexed > field in the searcher class it doesnt give me the results.help me to sort > out this issue. > > My Code: > > test="9840836598" > test1="bch01" > > testRecords=(test+" "+test1); > > docum

Re: negative queries

2007-06-18 Thread Steven Rowe
Hi Daniel, Daniel Noll wrote: > On Saturday 16 June 2007 11:39:35 Chris Hostetter wrote: >> : The mailing list has already answered this question dozens of times. >> : I've been wondering lately, does this list have a FAQ? If so, is this >> : question on it? >> >> The wiki is open to editing by

Re: negative queries

2007-06-15 Thread Steven Rowe
Hi Antony, Antony Sequeira wrote: > In the attached test file I am using string queries and showing the > failure case. The attachment didn't make it for some reason. > Basically I get the impression that I can not have a clause like > +(-x:y) anywhere in my query. What follows assumes that the

Re: negative queries

2007-06-15 Thread Steven Rowe
Daniel Noll wrote: > On Friday 15 June 2007 11:07:25 Antony Sequeira wrote: >> Hi >> I am aware that with Lucene I can not do negative only queries such as >> -foo:bar > > The mailing list has already answered this question dozens of times. I've > been wondering lately, does this list have a F

Re: QueryParser stripping special char

2007-06-12 Thread Steven Rowe
Hi Harini, Harini Raghavan wrote: > I am trying to create a lucene query to search for companies based on > areacode. The phone no. is stored in the lucene index in the form of > '415-567-2323'. I need to create a query like +areaCode:"415-". But the > QueryParser is stripping off the hyphen(-). >

Re: How can I search over all documents NOT in a certain subset?

2007-06-08 Thread Steven Rowe
-Original Message- > From: Antony Bowesman [mailto:[EMAIL PROTECTED] > Sent: Wednesday, June 06, 2007 11:36 PM > To: java-user@lucene.apache.org > Subject: Re: How can I search over all documents NOT in a certain subset? > > Steven Rowe wrote: >> Conceptually (cav

Re: I need 'cat???' to match 'cat' again!

2007-06-06 Thread Steven Rowe
Hi Tim, Tim Smith wrote: > How can I restore the behavior of the old > WildcardQuery under 2.1? > I badly need 'cat???' to match 'cat' again just like > in the older versions. The behavior you want was last sighted in Java Lucene four releases ago (v1.4.3). See Doug Cutting's response to a simil

Re: How can I search over all documents NOT in a certain subset?

2007-06-05 Thread Steven Rowe
Hi Hilton, Hilton Campbell wrote: > Hello all, > > In my application I want to perform a search over all the documents > that are NOT in a certain subset, and I'm not sure how I should do > it. > > Specifically, the application performs a search and the top N results > are shown to the user. The

Re: Maintain a backup index

2007-06-05 Thread Steven Rowe
Hi Divya, The Lucene library itself provides no support for "backup". You might be interested in the Solr project[1], which extends Lucene, and which automates index replication. From the Solr Introduction / Features page[2]: Replication * Efficient distribution of index parts that have

Re: Modifying StandardAnalyzer so that it also splits words after punctuation characters that are not followed by whitespace

2007-05-29 Thread Steven Rowe
Hi Michael, Michael Böckling wrote: > Hi folks! > > The topic says it all: I want to modify the StandardAnalyzer so that it also > splits words after punctuation characters (.,: etc.) that are NOT followed > by a whitespace character, in addition to punctuation characters that ARE > followed by w

Re: WhitespaceAnalyzer [was: Re: regaridng Reader.terms()]

2007-05-29 Thread Steven Rowe
Hi Mohammad, Mohammad Norouzi wrote: > [Hoss wrote:] >> ...are there Persian characters with a category type of SPACE_SEPARATOR, >> LINE_SEPARATOR, or PARAGRAPH_SEPARATOR ? > > How can I know that? The Unicode standard's codes[1] for these are: SPACE SEPARATOR: Zs LINE SEPARATOR: Zl PA
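
One way to check this yourself is to ask the JDK for a character's Unicode general category. The sketch below is only an illustration: the class name SeparatorCheck and the sample code points are chosen here and do not come from the thread. It relies on Character.getType and the SPACE_SEPARATOR (Zs), LINE_SEPARATOR (Zl), and PARAGRAPH_SEPARATOR (Zp) constants from java.lang.Character.

```java
// Illustrative sketch: test whether a character is in Unicode category Zs, Zl, or Zp.
final class SeparatorCheck {

    /** Returns true if ch has general category Zs, Zl, or Zp. */
    static boolean isSeparator(char ch) {
        int type = Character.getType(ch);
        return type == Character.SPACE_SEPARATOR
            || type == Character.LINE_SEPARATOR
            || type == Character.PARAGRAPH_SEPARATOR;
    }

    public static void main(String[] args) {
        // Sample code points (chosen for illustration): space, line separator,
        // paragraph separator, zero width non-joiner, Arabic letter alef.
        int[] samples = { 0x0020, 0x2028, 0x2029, 0x200C, 0x0627 };
        for (int cp : samples) {
            System.out.printf("U+%04X separator=%b%n", cp, isSeparator((char) cp));
        }
    }
}
```

The zero width non-joiner (U+200C), which is common in Persian text, has category Cf (format) rather than any of the three separator categories, so a check like this reports false for it.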

Re: multiple tokens at the same position

2007-05-25 Thread Steven Rowe
Hi Enis, Enis Soztutar wrote: > In nutch we have a use case in which we need to store tokens with their > original text plus their stemmed form plus their canonical form(through > some asciifization). From my understanding of lucene, it makes sense to > write a tokenstream which generates several

Re: KeywordAnalyzer vs. Field.Index.UN_TOKENIZED

2007-05-24 Thread Steven Rowe
Hi Terry, The one place I know where KeywordAnalyzer is definitely useful is when it is used in conjunction with PerFieldAnalyzerWrapper. Steve dontspamterry wrote: > Hi Otis, > > I tried both ways, did some queries, and results are the same, so I guess > it's a matter of preference??? > > -Te

WhitespaceAnalyzer [was: Re: regaridng Reader.terms()]

2007-05-23 Thread Steven Rowe
; and my language is Persian, and only change I've made is not to ignoring > unicode characters in Persian and arabic language, because with original > WhitespaceAnalyzer it didnt work fine whether it ignore or something > else, I > dont know but I extends my classes and now I am using

Re: regaridng Reader.terms()

2007-05-22 Thread Steven Rowe
Hi Mohammad, May I ask what your language is? And what kind of changes to WhitespaceAnalyzer were required to make it work with your language? If you have made modifications to WhitespaceAnalyzer that are generally useful, please consider contributing your changes back to the Lucene project. Th

Re: documents with large numbers of fields

2007-05-21 Thread Steven Rowe
Mike Klaas wrote: > On 18-May-07, at 1:01 PM, charlie w wrote: >> Is there an upper limit on the number of fields comprising a document, >> and if so what is it? > > There is not. They are relatively costless if omitNorms=False Mike, I think you meant "relatively costless if omitNorms=True". St

Re: Multi-field distinct query

2007-05-16 Thread Steven Rowe
it to model a > single, parent-child relation as one document due to requirements and the > fact that we were having memory issues for cases where a parent had an > extremely large number of children (~200,000). > > -Terry > > > Steven Rowe wrote: >> Hi Terry, >&

Re: Concept Search

2007-05-16 Thread Steven Rowe
"synonym" of > WildAnimals$ whenever you encountered any of the items in your > list, then when concept searching is called for, search on > WildAnimals$. > > Highlighting might be tricky, but certainly do-able, especially with > the capabilities of a MemoryIndex.. >

Re: Concept Search

2007-05-16 Thread Steven Rowe
Hi Charles, The need presented by your use case sounds very similar to that served by the SynonymAnalyzer given in Erik Hatcher's and Otis Gospodnetic's excellent book "Lucene in Action" - take a look: http://lucenebook.com/ Steve Charles Patridge wrote: > I have looked around on Lucene w

Re: Indexing the ORACLE using lucene

2007-05-11 Thread Steven Rowe
Krishna Prasad Mekala wrote: > I have to create the index from my Oracle database. Can anybody tell me > how to create the index from Oracle using lucene? Check out Marcelo Ochoa's Oracle/Lucene integration: http://issues.apache.org/jira/browse/LUCENE-724 Steve

Re: Extracting a subset of an index

2007-04-03 Thread Steven Rowe
Karl Wettin's code to facilitate index copying may be useful (the below link is to a post of Karl's to the java-dev mailing list): Steve Erick Erickson wrote: > In the immortal words of Erik H. ...it depends... >

Re: how to search over another search

2007-03-27 Thread Steven Rowe
Mohammad Norouzi wrote: > Steven, > what this means: > "Each index added must have the same number of documents, but > typically each contains different fields. Each document contains the > union of the fields of all documents with the same document number. > When searching, matches for a query ter

Re: Virtually merge two indexes?

2007-03-26 Thread Steven Rowe
Hi Chris, Chris Lu wrote: > Hi, Steven, > > Thanks for the instant reply! But let's see the warning in the > ParallelReader javadoc: > "It is up to you to make sure all indexes are created and modified > the same way. For example, if you add documents to one index, you need > to add the same docu

Re: Virtually merge two indexes?

2007-03-26 Thread Steven Rowe
I think ParallelReader, first released in Lucene-Java 1.9, should meet your needs: - An IndexReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but typical

Re: how to search over another search

2007-03-26 Thread Steven Rowe
nts to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior. - Steve Steven Rowe wrote: > Hi Mohammad, > > Have you looked at MultiSearcher? > > <http://lucene.apache.org/java/docs/api/

Re: how to search over another search

2007-03-26 Thread Steven Rowe
Hi Mohammad, Have you looked at MultiSearcher? Section 5.6 of Lucene in Action covers its use. Steve Mohammad Norouzi wrote: > hi > I have two separated index but there are some fields that are common > betwee

Re: search for phrase with specail chars?

2007-03-12 Thread Steven Rowe
Hi Ruchika, Are there any quote characters in your index (may the Luke be with you[1])? If not, you could just remove all quotes from your query (except the surrounding ones indicating phrase matching, of course), and things will work, as you have indicated. Which version of Lucene are you u

Re: pdf box help

2007-03-12 Thread Steven Rowe
This may help: http://www.pdfbox.org/userguide/text_extraction.html#Lucene+Integration ashwin kumar wrote: > hi all i am able to convert a pdf in to a text file using pdfbox. and this > is the code that i used > > import org.pdfbox.pdfparser.PDFParser; > import org.pdfbox.pdmodel.PDDocument; > i

Re: Indexing & search?

2007-03-06 Thread Steven Rowe
Hi senthil, senthil kumaran wrote: >I've indexed 4 among 5 fields with Field.Store.YES & Field.Index.NO. And > indexed the remaining one, say it's Field Name is *content*, with > Field.Store.YES & Field.Index.Tokenized(It's value is collective value of > other 4 fields and some more values).So

Re: Registering a local dtd file for use with Digester

2007-02-22 Thread Steven Rowe
Hi Mike, > I have a collection of XML files that I would like to parse using Digester > in order to index them for Lucene. A DTD file has been supplied for the XML > files, but none of those files has a line associating them > with the DTD file. Can the Digester's register function be used to tel

Re: Why this query is not correct?

2007-01-30 Thread Steven Rowe
Check out QueryParser.setAllowLeadingWildcard(): (though AFAICT this feature is not in any released version of Lucene yet - you'll have to use a nightly build). poeta simbolist

Re: Highlighting brackets bug ?

2007-01-15 Thread Steven Rowe
heikki doeleman wrote: > One question though .. is there an easy way to download the sources > from the svn repository, in one go ? I did it now by right-clicking > links to files The "Source Code" section of the Lucene Java Developer Resources page

Re: Getting a Better Understanding of Lucene's Search Operators

2007-01-10 Thread Steven Rowe
Walt Stoneburner wrote: > Do I have correct and complete understanding of the two operators? Not entirely complete :) - more information in the October 2006 thread "QueryParser is Badly Broken":

Re: hithighlighter bug

2007-01-10 Thread Steven Rowe
Jason wrote: > Hi all, > I have come across what I think is a curious but insidious bug with > the java lucene hit highlighter. [...] > when I search for -> Acquisition Plan <- > in my search results I get: > (ancilliary stuff deleted) > attached to the Acquisition > < em>Planand signed >

Re: Speed of grouped queries

2007-01-03 Thread Steven Rowe
Hi Scott, sdeck wrote: > I guess, any ideas why I would run out of heap memory by combining all of > those boolean queries together and then running the query? What is happening > in the background that would make that occur? Is it storing something in > memory, like all of the common terms or som

Re: Speed of grouped queries

2007-01-03 Thread Steven Rowe
ny case, it sounds like the # of documents in your index is fairly small -- have you tried using RAMDirectory <http://lucene.apache.org/java/docs/api/org/apache/lucene/store/RAMDirectory.html>? Hope it helps, Steve > Steven Rowe wrote: >> Hi Sdeck, >> >> sdeck wrote

Re: Speed of grouped queries

2007-01-03 Thread Steven Rowe
Hi Sdeck, sdeck wrote: > The query for collecting a specific actor is around 200-300 milliseconds, > and the movie one, that actually queries each actor, takes roughly 500-700 > milliseconds. Yet, for a genre, where you may have 50-100 movies, it takes > 500 milliseconds*# of movies I'm having tr

Re: Nested Queries

2006-12-28 Thread Steven Rowe
Hi Kapil, Kapil Chhabra wrote: > Hi Steve, > Thanks for the response. > Actually I am not looking for a query language. My question is, whether > Lucene supports Nested Queries or self joins? > As per > http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html > > In B

Re: Nested Queries

2006-12-27 Thread Steven Rowe
Hi Kapil, Kapil Chhabra wrote: > Just to mention, I have tokenized FIELD2 on "," and indexed it. > > FIELD2:3 should return 1,2 > FIELD2:(FIELD2:3) should return something like the output of: > > *FIELD2: 1 OR FIELD2: 2 Given your data table, I assume you mean: FIELD1:3 should return 1,2

Re: Lucene id generation

2006-12-19 Thread Steven Rowe
Antonio Bruno wrote: > To use but directly the docId would render efficient and fastest the > searches much. Thoughts to the possibility of being able to apply a > first CachingWrapperFilter F1 on an index and a second > CachingWrapperFilter F2 on an other index and after to make (F1 AND > F2) and

Re: Lucene change field values to wrong ones when indexing

2006-12-14 Thread Steven Rowe
Hi Adrian, I don't see anything obviously wrong with your code. Can you give more details about which field values are different from what you expect? I'm guessing it's the id field you're worried about, but it's not clear from what you have written whether it's the title or the id field which i

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe
Karl Koch wrote: > Is there any other paper that actually shows the benefit of doing > this particular normalisation with coord_q_d? I am not suggesting > here that it is not useful, I am just looking for evidence how the > idea developed. I think it's a mischaracterization to call coordination a

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe
Karl Koch wrote: > The coord(q,d) normalisation is "a score factor based on how many of > the query terms are found in the specified document." and described > here: > > http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord > > Does this have a theoretical

Re: Using Lucene to search log files

2006-12-11 Thread Steven Rowe
abdul aleem wrote: > How to actually retrieve the content of search, > > Most of the examples in Lucene in Action > Searcher gives the results found in number of > documents > > but i coudln't find an API to retrieve the line or > paragraph where the search is matched Hi Abdul, I don't know w

Re: Limiting QueryParser

2006-11-21 Thread Steven Rowe
static String QueryParser.escape(String) should do the trick: Look at the bottom of the below-linked page for the list of characters that the above method will escape:
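
To make the idea of escaping concrete, here is a hand-rolled sketch in plain Java. It is not Lucene's implementation: the class name QueryEscapeSketch and the SPECIAL character list are assumptions made for this illustration, based on the special characters commonly listed for the classic query syntax. In real code, QueryParser.escape itself is the thing to call.

```java
// Illustrative sketch only: prefix each assumed special character with a backslash.
final class QueryEscapeSketch {

    // Assumed list of query-syntax special characters (not taken from the thread).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Returns input with every character from SPECIAL preceded by a backslash. */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("(1+1):2"));
    }
}
```

Escaping user input this way makes the parser treat the text as literal terms, which limits what a user can express; whether that trade-off is right depends on the application.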

Re: is there any way to find unique records ?

2006-11-21 Thread Steven Rowe
Bhavin, Mark Harwood gives a solution that looks almost exactly like what you want: http://www.mail-archive.com/java-user@lucene.apache.org/msg05154.html Steve Chris Hostetter wrote: > serach the archives for "faceted searching" and "category counts" and you > should find lots of discussions

Re: Lucene 2.0.1 release date

2006-10-26 Thread Steven Rowe
George Aroush wrote: > From your email, I take it that even for the Java folks, they can't > accumulate the list of files that make up 2.0.1. Am I right? There has never been and likely will never be a 2.0.1 release. "2.0.1", "2.1" -- these are labels for *potential* future releases. "2.1" is m

Re: Searching pdf, getting page number

2006-10-16 Thread Steven Rowe
Hi Bill, Bill Taylor wrote: > On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote: >> I know that I can index pdf-files (using a third-party library). > > Could you please tell me where to find this library? There are several PDF extraction packages listed here (look under the "Lucene Document

Re: Looking for a stemmer that can return all inflected forms

2006-10-16 Thread Steven Rowe
Hi Jong, Jong Kim wrote: > I'm looking for a stemmer that is capable of returning all morphological > variants of a query term (to be used for high-recall search). For example, > given a query term of 'cares', I would like to be able to generate 'cares', > 'care', 'cared', and 'caring'. To ac

Re: Performing a like query

2006-10-09 Thread Steven Rowe
Hi Rahil, Rahil wrote: > I was just wondering whether there is a > difference between the regular expression you sent me i.e. > (i) \s*(?:\b|(?<=\S)(?=\s)|(?<=\s)(?=\S))\s* > >and > (ii) \\b > > as they lead to the same output. For example, the string search "testing > a-new string=3/4

Re: Performing a like query

2006-10-06 Thread Steven Rowe
Steven Rowe wrote: >\s*(?:\b|(?<=\S)(?=\s)|(?<=\s)(?=\S))\s* Oops, here's an improved version to cover the beginning- and end-of-string non-alphanumeric cases (E.g. "=some text-"): \s*(?:\b|(?<=\S)(?

Re: Performing a like query

2006-10-06 Thread Steven Rowe
Hi Rahil, Rahil wrote: > I couldnt figure out a valid regular expression to write a valid > Pattern.compile(String regex) which can tokenise a string into "O/E - > visual acuity R-eye=6/24" into "O","/","E", "-", "visual", "acuity", > "R", "-", "eye", "=", "6", "/", "24". The following regular e
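
As an alternative to a split-style pattern, the tokenization described in the question can also be sketched by matching the tokens themselves. This is a swapped-in technique, not the regular expression from the reply: the class name SymbolTokenizerSketch and the TOKEN pattern are assumptions for illustration, and the pattern only treats ASCII letters and digits as word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: emit runs of ASCII letters/digits and single symbols as tokens.
final class SymbolTokenizerSketch {

    // A token is either a run of ASCII alphanumerics or one non-space, non-alphanumeric character.
    private static final Pattern TOKEN = Pattern.compile("\\p{Alnum}+|[^\\s\\p{Alnum}]");

    /** Returns the tokens of text in order; whitespace is dropped. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("O/E - visual acuity R-eye=6/24"));
    }
}
```

Matching tokens with Matcher.find sidesteps the empty strings that zero-width split patterns can produce at token boundaries.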

Re: Case sensitive / insensitive

2006-10-06 Thread Steven Rowe
Marcus Falck wrote: > Any good approaches for allowing case sensitive and case insensitive > searches? > > Except adding an additional field and skipping the LowerCaseFilter. > Since this severely increases the index size (and the index already > is around 1 TB). Hi Marcus, How about a filter tha

Re: 'categorized-term' web index

2006-09-28 Thread Steven Rowe
Vladimir Olenin wrote: > - is there a place I can get already crawled internet web pages in an > archive (10 - 100Gb of data) I don't know the sizes of the corpora mentioned on Lucene Wiki's Resources page, but it's a good place to start:

Re: Lucene In Action Book vs Lucene 2.0

2006-09-27 Thread Steven Rowe
http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_0_0/CHANGES.txt Otis Gospodnetic wrote: > CHANGES.txt is your best source for that answer. > > KEGan <[EMAIL PROTECTED]> wrote: > > What about the internal of Lucene? Are there any major changes in there?

Re: Help wanted

2006-09-20 Thread Steven Rowe
The Resources page on the Lucene Wiki has a collection of articles that may be useful to you: Michael McCandless wrote: > Mark Miller wrote: >> I'll one up you: >> >> http://www.manning.com/hatcher2/ >> >> Might as well save yourself a whole lot o

Re: Versions

2006-09-18 Thread Steven Rowe
Hi Luis, Chris Hostetter wrote: > Luis Rodrigo Aguado wrote: > : I've been looking through the documentation in the official > : web-site, and the Javadoc belongs to v2.1, that I could not find > : anywhere, anyone has a clue about where to find it or when will it be > : officially released? >

Re: Documents that know more?

2006-08-29 Thread Steven Rowe
There has been a long-running thread on the java-dev list about how to allow application-specific "extra stuff" to be placed in the index, at multiple levels of granularity. Some of this conversation is captured on the Wiki at: http://wiki.apache.org/jakarta-lucene/FlexibleIndexing Maybe you cou

Re: Indexing Documents which has Attachments and are Refered many times!!

2006-08-12 Thread Steven Rowe
As Jason says, you can structure each Lucene document with one Field per content type, and index all data that way. The database is not required. To address your search complexity concern, you can create queries that search only those Field(s) the user wants -- there is no need to have a Field fo

Re: EMAIL ADDRESS: Tokenize (i.e. an EmailAnalyzer)

2006-07-31 Thread Steven Rowe
Michael J. Prichard wrote: > Hey Otis, > > Sure I would love to! Can you ping me at [EMAIL PROTECTED] and > let me know what I need to do? Do I just post it to JIRA? > > Thanks, > Michael > > Otis Gospodnetic wrote: > >> A good place for that in JIRA. could you put it there? We have a >> b

Re: Sorting with Parallelreader fails

2006-07-25 Thread Steven Rowe
Steven Rowe wrote: > And, por supuesto, posting what appears to be Visual Basic code > (presumably to be used with Lucene.Net) to an explicitly *Java* list > (dude, the name of the list is "java-user") may be prove fruitful than > you might hope That should read: ... may

Re: Sorting with Parallelreader fails

2006-07-25 Thread Steven Rowe
neils wrote: > Hi, > > i have 3 indexfiles which i access over a parallelreader. > > When i make a search, everything works fine, butwhen i want to make a > search and sorting by a special > column i get an error. You need to say exactly what the error is, right? Or else we won't know, hm

Re: Matching accented with non-accented characters

2006-07-25 Thread Steven Rowe
Rajan, Renuka wrote: > I am trying to match accented characters with non-accented characters > in French/Spanish and other Western European languages. The use case > is that the users may type letters without accents in error and we > still want to be able to retrieve valid matches. The one idea,
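
The reply is cut off above, so the following is not necessarily what was suggested in the thread. It is a minimal JDK-only sketch of one common approach: decompose each character with java.text.Normalizer and drop the combining marks, applying the same folding to both indexed text and query text. The class name AccentFoldingSketch is hypothetical.

```java
import java.text.Normalizer;

// Illustrative sketch: fold accented Latin letters to their unaccented base letters.
final class AccentFoldingSketch {

    /** Decomposes text (NFD) and removes non-spacing combining marks. */
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        System.out.println(stripAccents("caf\u00e9 se\u00f1or"));
    }
}
```

Letters that have no canonical decomposition, such as the ligature ae or the sharp s, pass through unchanged, so a production analyzer would need an explicit mapping for those.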

Re: query for search through lucene for BLOB

2006-07-12 Thread Steven Rowe
Hi Sudarshan, When your question is Java usage related, you will almost certainly get better responses by asking just on the Java User list. Oddly enough, hitting all of the mailing lists for the project at once with the same question is likely to *reduce* your chances of getting polite/on-to

Re: combined filesystem and web search

2006-07-11 Thread Steven Rowe
Tomi NA wrote: I wish people would start selling .pdf books online... :( Your wish is granted: Then there's IndexMergeTool which I haven't used, but looks interesting. I haven't ran into it. Can you direct me to a document or two? It's in contrib und

Re: weird error with SVN of Lucene

2006-06-27 Thread Steven Rowe
On 06/27/06 at 1:00 PM, Yura Smolsky wrote: > svn co -r 417135 > http://svn.apache.org/repos/asf/lucene/java/trunk > lucene-java-2.0.0-417135 I successfully ran this exact command line just now -- no errors. It is strange that the revision number given with the checkout command (417135) does not

Re: Exact Match Searches and Stop Words

2006-06-20 Thread Steven Rowe
Hugh Ross wrote: The problem is that the standard analyzer removes the stop word (i.e. "of") before indexing and searching. Is there an workaround for this? See my response to a similar question here: In

Re: How can I tell Lucene to also use analyzer for Keyword fields

2006-06-12 Thread Steven Rowe
Mordo, Aviran (EXP N-NANNATEK) wrote: What you are asking is not possible. The whole purpose of the analyzer is to tokenize the fields, so if you want them to be tokenized don't use the Keyword fields. Um, KeywordAnalyzer?

Re: lucene search sentence

2006-04-27 Thread Steven Rowe
Anton Feldmann wrote: 3) How do I display the sentence before and after the sentence the hit is in? You could: 1. Make your Lucene Document be a set of three sentences (before, searchable, after), which you store, but write a custom Analyzer which only returns tokens for the "searchable" cen
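
The first option above (a document made of the sentence before, the searchable sentence, and the sentence after) can be prepared with the JDK alone before anything is handed to Lucene. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, sentence splitting uses java.text.BreakIterator with an English locale, and the Lucene side (storing the three parts and analyzing only the middle one) is left out.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch: split text into sentences and build before/searchable/after windows.
final class SentenceWindowSketch {

    /** Splits text into trimmed, non-empty sentences using the JDK's sentence BreakIterator. */
    static List<String> sentences(String text) {
        List<String> out = new ArrayList<String>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    /** Returns {before, searchable, after} for sentence i; a missing neighbour is "". */
    static String[] window(List<String> sentences, int i) {
        String before = i > 0 ? sentences.get(i - 1) : "";
        String after = i + 1 < sentences.size() ? sentences.get(i + 1) : "";
        return new String[] { before, sentences.get(i), after };
    }

    public static void main(String[] args) {
        List<String> s = sentences("It rained. We stayed in. Then it cleared.");
        for (int i = 0; i < s.size(); i++) {
            String[] w = window(s, i);
            System.out.println("[" + w[0] + "] [" + w[1] + "] [" + w[2] + "]");
        }
    }
}
```

Each window would become one document, so a hit on the middle sentence already carries its neighbours for display.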

Re: exact match ..

2006-02-20 Thread Steven Rowe
Mufaddal Khumri wrote: lets say i do this while indexing: doc.add(Field.Text("categoryNames", categoryNames)); Now while searching categoryNames, I do a search for "digital cameras". I only want to match the exact phrase digital cameras with documents who have exactly the phrase "digital came

Re: Extract term and its frequency from the index and file?

2005-11-14 Thread Steven Rowe
MALCOLM CLARK wrote: Could you send me the url for HighFreqTerms.java in cvs? ViewCVS URL:

Re: Search problems

2005-11-01 Thread Steven Rowe
Such an analyzer already exists, in Lucene's Subversion repository, under contrib/analyzers/: KeywordAnalyzer. Robert Watkins wrote: One approach for matching your queries with Luke would be to write a custom Analyzer that does absolutely nothing to the terms. Then, if you put this Analyzer in

Re: Is there a way to get absolutely exact phrase matching (no stop words, etc)

2005-10-24 Thread Steven Rowe
Hi Bob, StandardAnalyzer filters the token stream created by StandardTokenizer through StandardFilter, LowercaseFilter, and then StopFilter. Unless you supply a stoplist to the StandardAnalyzer constructor, you get the default set of English stopwords, from StopAnalyzer: public static fin

Re: lucene and databases

2005-10-24 Thread Steven Rowe
Code and examples for embedding Lucene in HSQLDB and Derby relational databases: Rick Hillegas wrote: Thanks to Yonik for replying to my last question about queries and filters. Now I have another issue. I would appreciate any pointers to attem

Re: Indexing and Hit Highlighting OCR Data

2005-06-06 Thread Steven Rowe
There is a proposal to extend indexing (item #11 in the API Changes section): http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard An excerpt: 11. (Hard) Make indexing more flexible, so that one could e.g., not store positions or even frequencies, or alternately, to store extra inf