Can anyone tell me what multiple indexing is and how it works in Lucene
[Java]?
Kindly provide either an explanation or a pointer to a source with such
details.
Thanks in advance
(cross posted to many user lists, please confine reply to gene...@lucene)
There will be a Lucene meetup next week at ApacheCon in Oakland, CA on
Tuesday, November 3rd. Meetups are free (the rest of the conference is
not). See: http://wiki.apache.org/lucene-java/LuceneAtApacheConUs2009
For ot
Will, I think this parsing of documents into different fields is separate
from and unrelated to Lucene's analysis (tokenization)...
the analysis comes into play once you have a field and you want to break the
text into indexable units (words, or the entire field as one token, like your urls).
i wouldn't sugge
Not sure if it completely applies here, but you might also have a look
at the TeeSinkTokenFilter in the contrib/analysis package. It is
designed to tee/sink tokens off from one main field to other fields.
On Oct 27, 2009, at 9:56 PM, Will Murnane wrote:
On Tue, Oct 27, 2009 at 21:21, Jake
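The tee/sink idea mentioned above can be sketched independently of Lucene. This is a conceptual illustration only, not Lucene's actual TeeSinkTokenFilter API; all class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Conceptual sketch of a tee/sink filter: tokens flowing to the "main"
// field are also copied into one or more sink buffers, which can then
// feed other fields. Names are hypothetical, not Lucene's API.
class TeeSketch {
    // A sink simply accumulates the tokens teed off the main stream.
    static class Sink {
        final List<String> tokens = new ArrayList<>();
    }

    private final Iterator<String> input;
    private final List<Sink> sinks = new ArrayList<>();

    TeeSketch(Iterator<String> input) { this.input = input; }

    Sink newSink() {
        Sink s = new Sink();
        sinks.add(s);
        return s;
    }

    // Returns the next token for the main field, copying it to every sink.
    String next() {
        if (!input.hasNext()) return null;
        String tok = input.next();
        for (Sink s : sinks) s.tokens.add(tok);
        return tok;
    }

    public static void main(String[] args) {
        TeeSketch tee = new TeeSketch(List.of("section", "title", "body").iterator());
        Sink sink = tee.newSink();
        List<String> mainField = new ArrayList<>();
        for (String t; (t = tee.next()) != null; ) mainField.add(t);
        System.out.println(mainField + " " + sink.tokens);
    }
}
```

The point is that the token stream is consumed once, and the sinks see the same tokens without re-analyzing the text.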
On Tue, Oct 27, 2009 at 21:21, Jake Mannix wrote:
> On Tue, Oct 27, 2009 at 6:12 PM, Erick Erickson
> wrote:
>
>> Could you go into your use case a bit more? Because I'm confused.
>> Why don't you want your text tokenized? You say you want to search it,
>> which means you have to analyze it.
>
>
On Tue, Oct 27, 2009 at 6:12 PM, Erick Erickson wrote:
> Could you go into your use case a bit more? Because I'm confused.
> Why don't you want your text tokenized? You say you want to search it,
> which means you have to analyze it.
I think Will is suggesting that he doesn't want to have to ana
On Tue, Oct 27, 2009 at 9:07 PM, Luis Alves wrote:
> But there needs to be some forced push for these shorter major release
> cycles,
> to allow code-cleanup cycles to also be shorter.
Maybe... or maybe not.
There's also value in a more stable API over a longer period of time.
Different people w
Could you go into your use case a bit more? Because I'm confused.
Why don't you want your text tokenized? You say you want to search it,
which means you have to analyze it. All I'm suggesting is passing the text
from whatever HTML element into the analyzer, without the surrounding
markup. I'm sugge
Mark Miller wrote:
Luis Alves wrote:
Mark Miller wrote:
Mark Miller wrote:
Michael Busch wrote:
Why will just saying once again "Hey, let's just release more often"
work now if it hasn't in the last two years?
Mich
I don't know that we
On Tue, Oct 27, 2009 at 19:17, Erick Erickson wrote:
> Unless I don't understand at all what you're going for, wouldn't
> it work to just put the HTML through some kind of parser (strict or
> loose depending on how well-formed your HTML is), then just
> extract the text from your document and push
hi, I am playing with the Lucene 2.9.0 source build, Ant 1.7.1, JDK 1.6.0, Win XP
Home Edition.
I don't have Clover or JFlex installed.
I built the sources and ran the IndexFiles demo and that worked. However, when I run
SearchFiles
I have an exception that says:
Exception in thread "main" java.lang.Error: Unres
Unless I don't understand at all what you're going for, wouldn't
it work to just put the HTML through some kind of parser (strict or
loose depending on how well-formed your HTML is), then just
extract the text from your document and push them into your
Lucene document? Various parsers make this mor
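The parse-then-extract suggestion can be sketched with a simple tag stripper. A real HTML parser is far more robust against malformed markup; this regex version is for illustration only, and the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

// Minimal sketch: strip markup and keep only the text content, which can
// then be handed to an analyzed Lucene field. A real HTML parser handles
// malformed input better; this regex approach is just for illustration.
class HtmlTextExtractor {
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String extractText(String html) {
        // Replace each tag with a space, then collapse runs of whitespace.
        String noTags = TAG.matcher(html).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<h1>Section title</h1><p>Body content</p>";
        System.out.println(extractText(html));  // Section title Body content
    }
}
```

The extracted string contains none of the surrounding markup, so the analyzer only ever sees the element text.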
Luis Alves wrote:
> Mark Miller wrote:
>> Mark Miller wrote:
>>
>>> Michael Busch wrote:
>>>
Why will just saying once again "Hey, let's just release more often"
work now if it hasn't in the last two years?
Mich
>>> I don't know that we need to release
Hello list,
I have some semi-structured text that has some markup elements, and
I want to put those elements into a separate field so I can search by
them. For example (using HTML syntax):
8< document
Section title
Body content
>8
I can find that the things inside s are "Sect
Mark Miller wrote:
Mark Miller wrote:
Michael Busch wrote:
Why will just saying once again "Hey, let's just release more often"
work now if it hasn't in the last two years?
Mich
I don't know that we need to release more often to take advantage of
major numbers. 2.2 wa
Hey guys! Don't forget this is tomorrow (Wednesday). See you there!
Cheers,
Bradford
On Sun, Oct 18, 2009 at 5:10 PM, Bradford Stephens
wrote:
> Greetings,
>
> (You're receiving this e-mail because you're on a DL or I think you'd
> be interested)
>
> It's time for another Hadoop/Lucene/Apache "C
gabriele renzi wrote:
On Fri, Oct 16, 2009 at 9:39 AM, Paul Elschot wrote:
I'd prefer B), with a minimum period of about two months to the
next release in case it removes deprecations.
+1 for B)
Hi,
The new queryparser has the same restriction.
Since + and - are operators in the Lucene syntax, you need to escape them
(age:\-32) or use double quotes, as suggested by Uwe.
We have the idea of adding queryparser extensions to the new queryparser in
contrib in the near future;
this would allow for u
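The escaping described above can be sketched in plain Java. This is only an illustration of the idea (the character list below is an assumption, not the authoritative set of Lucene syntax characters); the classic QueryParser also ships its own static escape helper:

```java
// Sketch of escaping Lucene query-syntax special characters in a
// user-supplied value, so that e.g. "-32" can be searched literally
// as age:\-32. The SPECIAL list is illustrative, not exhaustive.
class QueryEscaper {
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\&|";

    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');  // prefix with backslash
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("age:" + escape("-32"));  // age:\-32
    }
}
```

Escaping the value (rather than the whole query) keeps field names and genuine operators intact.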
Without the optimize, it looks like there are errors on all segments except
the first:
Opening index @ D:\mnsavs\lresumes1\lresumes1.luc\lresumes1.search.main.2
Segments file=segments_2 numSegments=3 version=FORMAT_DIAGNOSTICS [Lucene
2.9]
1 of 3: name=_0 docCount=413557
compound=false
It's reproducible with a large number of docs (>1 million), but not with 100K
docs.
I got the same error with JVM 1.6.0_16.
The index was optimized after all docs are added. I'll try removing the
optimize.
Peter
On Tue, Oct 27, 2009 at 2:57 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> T
This is odd -- is it reproducible?
Can you narrow it down to a small set of docs that when indexed
produce a corrupted index?
If you attempt to optimize the index, does it fail?
Mike
On Tue, Oct 27, 2009 at 1:40 PM, Peter Keegan wrote:
> It seems the index is corrupted immediately after the in
It seems the index is corrupted immediately after the initial build (ample
disk space was provided):
Output from CheckIndex:
Opening index @ D:\mnsavs\lresumes1\lresumes1.luc\lresumes1.search.main.2
Segments file=segments_3 numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene
2.9]
1 of 1: name=_7
On Tue, Oct 27, 2009 at 10:37 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> OK that exception looks more reasonable, for a disk full event.
>
> But, I can't tell from your follow-on emails: did this lead to index
> corruption?
>
Yes, but this may be caused by the application ignorin
OK that exception looks more reasonable, for a disk full event.
But, I can't tell from your follow-on emails: did this lead to index corruption?
Also, I noticed you're using a rather old 1.6.0 JRE (1.6.0_03) -- you
really should upgrade that to the latest 1.6.0 -- there's at least one
known proble
Thanks for the advice...
I looked in the documentation, but I saw that PayloadTermQuery
accepts only one term at a time... however, something similar might be done with
the PayloadNearQuery.
On Mon, Oct 26, 2009 at 3:35 PM, Grant Ingersoll wrote:
> In 2.9, there is now the PayloadNearQuery, which might h
Clarification: this CheckIndex is on the index from which the merge/optimize
failed.
Peter
On Tue, Oct 27, 2009 at 10:07 AM, Peter Keegan wrote:
> Running CheckIndex after the IOException did produce an error in a term
> frequency:
>
> Opening index @ D:\mnsavs\lresumes3\lresumes3.luc\lresumes3.s
Running CheckIndex after the IOException did produce an error in a term
frequency:
Opening index @ D:\mnsavs\lresumes3\lresumes3.luc\lresumes3.search.main.3
Segments file=segments_4 numSegments=2 version=FORMAT_DIAGNOSTICS [Lucene
2.9]
1 of 2: name=_7 docCount=1075533
compound=false
has
After rebuilding the corrupted indexes, the low disk space exception is now
occurring as expected. Sorry for the distraction.
fyi, here are the details:
java.io.IOException: There is not enough space on the disk
at java.io.RandomAccessFile.writeBytes(Native Method)
at java.io.RandomAcces
On Mon, 2009-10-12 at 20:02 +0200, Jake Mannix wrote:
> This killer is the "TermQuery for each term" part - this is huge. You need
> to invert this process, and use your query as is, but while walking in the
> HitCollector, on each doc which matches your query, increment counters for
> each of the
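The inversion suggested above can be sketched in plain Java: collect the docs matching the main query once, then increment a counter for each facet term of each matching doc, instead of issuing a TermQuery per term. The in-memory doc-to-terms map below is a hypothetical stand-in for a field cache or stored field:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of one-pass facet counting: walk the matching docs once and,
// for each doc, increment a counter per facet term it carries.
// The docTerms map is a stand-in for a real field cache.
class FacetCounter {
    static Map<String, Integer> countFacets(List<Integer> matchingDocs,
                                            Map<Integer, List<String>> docTerms) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc : matchingDocs) {               // one pass over the hits
            for (String term : docTerms.getOrDefault(doc, List.of())) {
                counts.merge(term, 1, Integer::sum); // counter per facet term
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> docTerms = Map.of(
            1, List.of("java", "search"),
            2, List.of("java"));
        System.out.println(countFacets(List.of(1, 2), docTerms));
    }
}
```

The cost becomes proportional to the hits times terms-per-doc, rather than one full query execution per candidate term.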
On Thu, 2009-10-22 at 15:14 +0200, Erick Erickson wrote:
> Besides the other suggestions, I'd really, really, really put
> some instrumentationin the code and see where you're spending your time. For
> a fast hint, put
> a cumulative timer around your indexing part only. This will indicate
> whethe
There are IndexWriter.deleteDocuments methods that take queries.
Passing a TermQuery and a WildcardQuery to
writer.deleteDocuments(Query[]) should do the trick.
--
Ian.
On Tue, Oct 27, 2009 at 3:10 AM, Paul J. Lucas wrote:
> I currently have code that looks like:
>
> Term[] terms = new Term
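The semantics of deleting by multiple queries can be sketched without Lucene: every document matching any of the supplied queries is removed. Documents and queries are modeled with plain strings and predicates here, not real Lucene types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Conceptual sketch of delete-by-queries: a document is deleted when it
// matches ANY of the given queries. Predicates stand in for a TermQuery
// and a WildcardQuery; this is not the Lucene API itself.
class DeleteByQueriesSketch {
    static List<String> deleteMatching(List<String> docs,
                                       List<Predicate<String>> queries) {
        List<String> remaining = new ArrayList<>();
        for (String doc : docs) {
            boolean matched = queries.stream().anyMatch(q -> q.test(doc));
            if (!matched) remaining.add(doc);
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("type:note id:1", "type:mail id:2", "type:note id:3");
        Predicate<String> term = d -> d.contains("type:mail"); // exact-term stand-in
        Predicate<String> wildcard = d -> d.matches(".*id:3.*"); // pattern stand-in
        System.out.println(deleteMatching(docs, List.of(term, wildcard)));
    }
}
```

The union semantics (match any query, not all) is the key point when passing an array of queries.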
Thanks for the info. Now I understand exactly what the classpath is.
--- On Mon, 10/26/09, Chris Hostetter wrote:
From: Chris Hostetter
Subject: Re: Exception in thread main - error
To: "java user"
Date: Monday, October 26, 2009, 6:39 PM
: As said i have set the classpath in environment var