What is multiple indexing and how does it work in Lucene [Java]

2009-10-27 Thread DHIVYA M
Can anyone tell me what multiple indexing is and how it works in Lucene [Java]? Kindly provide either an explanation or a source for such details. Thanks in advance.

[ANNOUNCE] Lucene MeetUp in Oakland, CA - Tue Nov 3rd @ 8PM

2009-10-27 Thread Chris Hostetter
(cross posted to many user lists, please confine reply to gene...@lucene) There will be a Lucene meetup next week at ApacheCon in Oakland, CA on Tuesday, November 3rd. Meetups are free (the rest of the conference is not). See: http://wiki.apache.org/lucene-java/LuceneAtApacheConUs2009 For ot

Re: Split single string into several fields?

2009-10-27 Thread Robert Muir
Will, I think this parsing of documents into different fields is separate from, and unrelated to, Lucene's analysis (tokenization)... the analysis comes into play once you have a field, and you want to break the text into indexable units (words, or the entire field as a token, like your URLs). I wouldn't sugge

Re: Split single string into several fields?

2009-10-27 Thread Grant Ingersoll
Not sure if it completely applies here, but you might also have a look at the TeeSinkTokenFilter in the contrib/analysis package. It is designed to tee/sink tokens off from one main field to other fields. On Oct 27, 2009, at 9:56 PM, Will Murnane wrote: On Tue, Oct 27, 2009 at 21:21, Jake

Re: Split single string into several fields?

2009-10-27 Thread Will Murnane
On Tue, Oct 27, 2009 at 21:21, Jake Mannix wrote: > On Tue, Oct 27, 2009 at 6:12 PM, Erick Erickson > wrote: > >> Could you go into your use case a bit more? Because I'm confused. >> Why don't you want your text tokenized? You say you want to search it, >> which means you have to analyze it. > >

Re: Split single string into several fields?

2009-10-27 Thread Jake Mannix
On Tue, Oct 27, 2009 at 6:12 PM, Erick Erickson wrote: > Could you go into your use case a bit more? Because I'm confused. > Why don't you want your text tokenized? You say you want to search it, > which means you have to analyze it. I think Will is suggesting that he doesn't want to have to ana

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-27 Thread Yonik Seeley
On Tue, Oct 27, 2009 at 9:07 PM, Luis Alves wrote: > But there needs to be some forced push for these shorter major release > cycles, > to allow for code clean cycles to also be shorter. Maybe... or maybe not. There's also value in a more stable API over a longer period of time. Different people w

Re: Split single string into several fields?

2009-10-27 Thread Erick Erickson
Could you go into your use case a bit more? Because I'm confused. Why don't you want your text tokenized? You say you want to search it, which means you have to analyze it. All I'm suggesting is passing the text from whatever HTML element into the analyzer, without the surrounding markup. I'm sugge

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-27 Thread Luis Alves
Mark Miller wrote: Luis Alves wrote: Mark Miller wrote: Mark Miller wrote: Michael Busch wrote: Why will just saying once again "Hey, let's just release more often" work now if it hasn't in the last two years? Mich I don't know that we

Re: Split single string into several fields?

2009-10-27 Thread Will Murnane
On Tue, Oct 27, 2009 at 19:17, Erick Erickson wrote: > Unless I don't understand at all what you're going for, wouldn't > it work to just put the HTML through some kind of parser (strict or > loose depending on how well-formed your HTML is), then just > extract the text from your document and push

SearchFiles demo fails with exception while IndexFiles works

2009-10-27 Thread s rajan
Hi, I am playing with a Lucene 2.9.0 source build, ant 1.7.1, jdk1.6.0, Win XP Home Edition. I don't have clover or JFlex installed. I built the sources and ran the IndexFiles demo, and that worked. However, when I run SearchFiles I get an exception that says: Exception in thread "main" java.lang.Error: Unres

Re: Split single string into several fields?

2009-10-27 Thread Erick Erickson
Unless I don't understand at all what you're going for, wouldn't it work to just put the HTML through some kind of parser (strict or loose depending on how well-formed your HTML is), then just extract the text from your document and push them into your Lucene document? Various parsers make this mor
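The approach described above (extract the text of the interesting elements first, then hand each piece to its own field) can be sketched in plain Java. This is only an illustration under stated assumptions: the `<h1>` element name, the `HeadingExtractor` class, and its methods are hypothetical, and a regular expression stands in for the real HTML parser the post recommends. The returned strings are what one would then add to a Lucene document as separate fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene. A regex is used only to keep the
// sketch short; a real HTML parser is the safer choice for messy markup.
class HeadingExtractor {
    // Assumption: section titles live in <h1> elements.
    private static final Pattern H1 =
        Pattern.compile("<h1>(.*?)</h1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the text of every <h1> element, in document order.
    static List<String> headings(String html) {
        List<String> result = new ArrayList<String>();
        Matcher m = H1.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    // Returns the whole text with all markup removed and whitespace collapsed.
    static String plainText(String html) {
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }
}
```

Each heading could go into one field and the plain text into another, so the analyzer never sees the surrounding markup.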

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-27 Thread Mark Miller
Luis Alves wrote: > Mark Miller wrote: >> Mark Miller wrote: >> >>> Michael Busch wrote: >>> Why will just saying once again "Hey, let's just release more often" work now if it hasn't in the last two years? Mich >>> I don't know that we need to release

Split single string into several fields?

2009-10-27 Thread Will Murnane
Hello list, I have some semi-structured text that has some markup elements, and I want to put those elements into a separate field so I can search by them. For example (using HTML syntax): 8< document Section title Body content >8 I can find that the things inside s are "Sect

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-27 Thread Luis Alves
Mark Miller wrote: Mark Miller wrote: Michael Busch wrote: Why will just saying once again "Hey, let's just release more often" work now if it hasn't in the last two years? Mich I don't know that we need to release more often to take advantage of major numbers. 2.2 wa

Re: Seattle / NW Hadoop, Lucene, Apache "Cloud Stack" Meetup, Wed Oct 28 6:45pm

2009-10-27 Thread Bradford Stephens
Hey guys! Don't forget this is tomorrow (Wednesday). See you there! Cheers, Bradford On Sun, Oct 18, 2009 at 5:10 PM, Bradford Stephens wrote: > Greetings, > > (You're receiving this e-mail because you're on a DL or I think you'd > be interested) > > It's time for another Hadoop/Lucene/Apache "C

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-27 Thread Luis Alves
gabriele renzi wrote: On Fri, Oct 16, 2009 at 9:39 AM, Paul Elschot wrote: I'd prefer B), with a minimum period of about two months to the next release in case it removes deprecations. +1 for B)

Re: Question about the extends the query parser to support NumericField on Lucene 2.9.0

2009-10-27 Thread Luis Alves
Hi, The new queryparser has the same restriction. Since +/- are operators in the Lucene syntax, you need to escape them (age:\-32) or use double quotes as suggested by Uwe. We have the idea to add queryparser extensions to the new queryparser in contrib in the near future, this would allow for u
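The escaping mentioned above can be sketched in plain Java. The `SignEscaper` class and its methods are hypothetical names for illustration, not a Lucene API; the sketch only handles a leading `+` or `-`, whereas Lucene's own `QueryParser.escape(String)` covers the full set of special characters.

```java
// Hypothetical helper, not part of Lucene: prefixes a leading '+' or '-'
// with a backslash so the query parser reads it as part of the value
// rather than as a required/prohibited operator.
class SignEscaper {
    static String escapeSign(String value) {
        if (value.startsWith("+") || value.startsWith("-")) {
            return "\\" + value;
        }
        return value;
    }

    // Builds a "field:value" query string with the sign escaped.
    static String fieldQuery(String field, String value) {
        return field + ":" + escapeSign(value);
    }

    public static void main(String[] args) {
        System.out.println(fieldQuery("age", "-32")); // prints age:\-32
    }
}
```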

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
Without the optimize, it looks like there are errors on all segments except the first: Opening index @ D:\mnsavs\lresumes1\lresumes1.luc\lresumes1.search.main.2 Segments file=segments_2 numSegments=3 version=FORMAT_DIAGNOSTICS [Lucene 2.9] 1 of 3: name=_0 docCount=413557 compound=false

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
It's reproducible with a large no. of docs (>1 million), but not with 100K docs. I got the same error with jvm 1.6.0_16. The index was optimized after all docs were added. I'll try removing the optimize. Peter On Tue, Oct 27, 2009 at 2:57 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > T

Re: IO exception during merge/optimize

2009-10-27 Thread Michael McCandless
This is odd -- is it reproducible? Can you narrow it down to a small set of docs that when indexed produce a corrupted index? If you attempt to optimize the index, does it fail? Mike On Tue, Oct 27, 2009 at 1:40 PM, Peter Keegan wrote: > It seems the index is corrupted immediately after the in

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
It seems the index is corrupted immediately after the initial build (ample disk space was provided): Output from CheckIndex: Opening index @ D:\mnsavs\lresumes1\lresumes1.luc\lresumes1.search.main.2 Segments file=segments_3 numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9] 1 of 1: name=_7

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
On Tue, Oct 27, 2009 at 10:37 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > OK that exception looks more reasonable, for a disk full event. > > But, I can't tell from your followon emails: did this lead to index > corruption? > Yes, but this may be caused by the application ignorin

Re: IO exception during merge/optimize

2009-10-27 Thread Michael McCandless
OK that exception looks more reasonable, for a disk full event. But, I can't tell from your followon emails: did this lead to index corruption? Also, I noticed you're using a rather old 1.6.0 JRE (1.6.0_03) -- you really should upgrade that to the latest 1.6.0 -- there's at least one known proble

Re: Multiterms query and payloads

2009-10-27 Thread Mauro Dragoni
Thanks for the advice... I looked in the documentation, but I saw that the PayloadTermQuery accepts only one term at a time... however, something might be done with the PayloadNearQuery. On Mon, Oct 26, 2009 at 3:35 PM, Grant Ingersoll wrote: > In 2.9, there is now the PayloadNearQuery, which might h

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
Clarification: this CheckIndex is on the index from which the merge/optimize failed. Peter On Tue, Oct 27, 2009 at 10:07 AM, Peter Keegan wrote: > Running CheckIndex after the IOException did produce an error in a term > frequency: > > Opening index @ D:\mnsavs\lresumes3\lresumes3.luc\lresumes3.s

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
Running CheckIndex after the IOException did produce an error in a term frequency: Opening index @ D:\mnsavs\lresumes3\lresumes3.luc\lresumes3.search.main.3 Segments file=segments_4 numSegments=2 version=FORMAT_DIAGNOSTICS [Lucene 2.9] 1 of 2: name=_7 docCount=1075533 compound=false has

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
After rebuilding the corrupted indexes, the low disk space exception is now occurring as expected. Sorry for the distraction. fyi, here are the details: java.io.IOException: There is not enough space on the disk at java.io.RandomAccessFile.writeBytes(Native Method) at java.io.RandomAcces

Re: faceted search performance

2009-10-27 Thread Toke Eskildsen
On Mon, 2009-10-12 at 20:02 +0200, Jake Mannix wrote: > This killer is the "TermQuery for each term" part - this is huge. You need > to invert this process, and use your query as is, but while walking in the > HitCollector, on each doc which matches your query, increment counters for > each of the
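The inversion quoted above (run the query once, then increment per-term counters for each matching document, instead of issuing one TermQuery per facet term) can be sketched in plain Java. Nothing here is a Lucene API: `FacetCounter`, the `matchingDocs` array standing in for the collector callbacks, and the in-memory `docFacets` table are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of counting facets in a single pass over the hits.
class FacetCounter {
    // matchingDocs: ids of documents that matched the query (what a hit
    //               collector would be handed one at a time).
    // docFacets:    docFacets[docId] lists the facet values of that document,
    //               assumed to be loaded into memory beforehand.
    static Map<String, Integer> count(int[] matchingDocs, String[][] docFacets) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (int docId : matchingDocs) {
            for (String value : docFacets[docId]) {
                Integer old = counts.get(value);
                counts.put(value, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }
}
```

The cost of this loop grows with the number of hits and their facet values rather than with the total number of distinct facet terms in the index.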

Re: Performance tips when creating a large index from database.

2009-10-27 Thread Toke Eskildsen
On Thu, 2009-10-22 at 15:14 +0200, Erick Erickson wrote: > Besides the other suggestions, I'd really, really, really put > some instrumentation in the code and see where you're spending your time. For > a fast hint, put > a cumulative timer around your indexing part only. This will indicate > whethe
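A cumulative timer of the kind suggested above might look like the following plain-Java sketch. The `CumulativeTimer` class is a hypothetical helper, not something from Lucene or the original post; it simply sums the elapsed time of every start/stop pair so the indexing part can be measured separately from the database fetch.

```java
// Hypothetical helper: accumulates elapsed wall-clock time over many
// start()/stop() pairs.
class CumulativeTimer {
    private long totalNanos = 0L;
    private long startedAt = 0L;
    private boolean running = false;

    void start() {
        startedAt = System.nanoTime();
        running = true;
    }

    void stop() {
        if (!running) {
            throw new IllegalStateException("stop() called without start()");
        }
        totalNanos += System.nanoTime() - startedAt;
        running = false;
    }

    long totalNanos() {
        return totalNanos;
    }

    long totalMillis() {
        return totalNanos / 1000000L;
    }
}
```

In use, one would call `start()` just before adding each document to the index and `stop()` right after, then compare `totalMillis()` with the overall run time to see whether indexing or data retrieval dominates.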

Re: Deleting documents using "starts with"

2009-10-27 Thread Ian Lea
There are IndexWriter.deleteDocuments methods that take queries. Passing a TermQuery and a WildcardQuery to writer.deleteDocuments(Query[]) should do the trick. -- Ian. On Tue, Oct 27, 2009 at 3:10 AM, Paul J. Lucas wrote: > I currently have code that looks like: > >    Term[] terms = new Term

Re: Exception in thread main - error

2009-10-27 Thread DHIVYA M
Thanks for the info. Now I understand exactly what the classpath is. --- On Mon, 10/26/09, Chris Hostetter wrote: From: Chris Hostetter Subject: Re: Exception in thread main - error To: "java user" Date: Monday, October 26, 2009, 6:39 PM : As said i have set the classpath in environment var