Thanks for your suggestion. By the way, could you please give me some more details
about TermEnum and its usage? I am a beginner with Lucene and I would like
some theoretical explanation of TermEnum. So kindly point me to a website or
a tutorial link that provides ample information regarding
Thanks all for the good suggestions !
But any ideas about storage? How can we make the indexes as small as possible?
We know compression is the only way, but when and where is it best to
compress for search?
Thanks all again!
2009/11/24 Kay Kay :
> fulin tang wrote:
>>
>> We are going to add full
The example you have given is invalid, as offsets should always refer to the
original position in the source stream, so should be:
a(0,1,1) b(2,3,1) a(6,7,1) c(11,12,1).
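To illustrate what "offsets refer to the original position in the source stream" means, here is a minimal plain-Java sketch. It does not use Lucene. The input string, the class name OffsetDemo and the tokenize helper are made up for illustration, and the third number in each triple is assumed to be a position increment of 1.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetDemo {
    /** Formats each whitespace-separated token as text(startOffset,endOffset,1). */
    static List<String> tokenize(String source) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            // Skip whitespace. The skipped characters still advance the offset,
            // because offsets index into the original source, not the token list.
            while (i < source.length() && Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            if (i >= source.length()) {
                break;
            }
            int start = i;
            while (i < source.length() && !Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            // Assumption: every token gets a position increment of 1.
            out.add(source.substring(start, i) + "(" + start + "," + i + ",1)");
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical source text: "a", 1 space, "b", 3 spaces, "a", 4 spaces, "c".
        String source = "a b" + "   " + "a" + "    " + "c";
        System.out.println(String.join(" ", tokenize(source)));
        // prints: a(0,1,1) b(2,3,1) a(6,7,1) c(11,12,1)
    }
}
```

The gaps between the triples come from the skipped characters in the source, which is why the second "a" starts at 6 rather than at 4.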
Deng: I'm afraid that if (case1) index "axxxb", and then I search "axb" or
(case2) index "axb" and then search "ab", which
Hmmm, are they unit tests? Or would you be willing to create stand-alone
unit tests demonstrating this and submit it as a patch?
Best
er...@alwaystrollingforworkfromothers.opportunistic.
On Wed, Nov 25, 2009 at 5:38 PM, Christopher Tignor wrote:
> my own tests with my own data show you are correc
my own tests with my own data show you are correct and the 1-n slop works
for matching terms at the same ordinal position.
thanks!
C>T>
On Wed, Nov 25, 2009 at 4:25 PM, Paul Elschot wrote:
> Op woensdag 25 november 2009 21:20:33 schreef Christopher Tignor:
> > It's worth noting however that thi
Op woensdag 25 november 2009 21:20:33 schreef Christopher Tignor:
> It's worth noting however that this -1 slop doesn't seem to work for cases
> where you want to discover instances of more than two terms at the same
> position. Would be nice to be able to explicitly set this in the query
> constr
It's worth noting however that this -1 slop doesn't seem to work for cases
where you want to discover instances of more than two terms at the same
position. Would be nice to be able to explicitly set this in the query
construction.
thanks,
C>T>
On Tue, Nov 24, 2009 at 9:17 AM, Christopher Tignor
In addition to Erick's advice, since you are storing filename without
analysis you could use a TermQuery to find it. You can use
BooleanQuery to combine that with other queries, including those
generated by QueryParser.
--
Ian.
On Wed, Nov 25, 2009 at 6:11 PM, Erick Erickson wrote:
> The first
I don't mind adding the "positions" of the payloads in them. However,
maybe we can be a little clearer in the javadocs about what's going on
underneath?
On Wed, Nov 25, 2009 at 5:36 AM, Mark Miller wrote:
> Grant Ingersoll wrote:
>> On Nov 20, 2009, at 6:49 PM, Jason Rutherglen wrote:
>>
>>
>>> I'm i
The first question for this is always "what analyzers do you use at index
AND query time?".
I'd do two things immediately. First, what does query.toString() show
the query parses to? StandardAnalyzer does some "interesting" things with
periods. Also, you have a hyphen (-) in your query which i
Hi,
I'm using Lucene 2.4 and have a problem with a "." within a field.
This field contains a filename, and obviously a filename can contain a
"." (or several of them)...
So if I do a search "+filename:testExcel-xaz.xls" this file will not be
found... If I replace the "." with "?" it works
On Wed, Nov 25, 2009 at 11:18 AM, Erick Erickson wrote:
> Why do you want to kill your indexer anyway? Just because it had
> been running "too long"? Or was it behaving poorly?
>
> But yeah, you need to change your process, you're almost guaranteeing
> that you'll corrupt your index.
I've learne
Why do you want to kill your indexer anyway? Just because it had
been running "too long"? Or was it behaving poorly?
But yeah, you need to change your process, you're almost guaranteeing
that you'll corrupt your index. Perhaps, if you really need to stop and
restart you could have your indexer vol
On Wed, Nov 25, 2009 at 9:49 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> Before 2.4 it was possible that a crash of the OS, or sudden power
> loss to the machine, could corrupt the index. But that's been fixed
> with 2.4.
>
> The only known sources of corruption are hardware faul
Before 2.4 it was possible that a crash of the OS, or sudden power
loss to the machine, could corrupt the index. But that's been fixed
with 2.4.
The only known sources of corruption are hardware faults (bad RAM, bad
disk, etc.), and, accidentally allowing 2 writers to write to the same
index at o
Yes, good point. Messing around with Lucene locking may well be a way
to get corrupt indexes. Any others?
--
Ian.
On Wed, Nov 25, 2009 at 3:37 PM, Max Lynch wrote:
> On Wed, Nov 25, 2009 at 9:31 AM, Ian Lea wrote:
>
>> > What are the typical scenarios when the index will go corrupt?
>>
>> D
On Wed, Nov 25, 2009 at 9:31 AM, Ian Lea wrote:
> > What are the typical scenarios when the index will go corrupt?
>
> Dodgy disks.
>
I have also had index corruption on two occasions. It is not a big deal for
me since my data is fairly real time so the old documents aren't as
important.
Howev
> What are the typical scenarios when the index will go corrupt?
Dodgy disks.
> E.g. can a simple JVM crash during indexing cause it?
No. See the javadocs for IndexWriter.
> What is the best way to minimize the possibility of a corrupt index?
Don't use dodgy disks.
> Copy the directory
Hi,
What are the typical scenarios in which the index will go corrupt? E.g.
can a simple JVM crash during indexing cause it?
What is the best way to minimize the possibility of a corrupt index?
Copy the directory before indexing / then flipping the pointers?
I'm using Lucene 2.9.
Thanks,
I
The problem is that I need to be able to match spans resulting from a
SpanNearQuery with the Term they came from so I can eliminate using Payloads
from certain Terms on a query-by-query basis.
I still need this term to affect the results of a SpanNearQuery as per the
usual logic, I just need to
Grant Ingersoll wrote:
> On Nov 20, 2009, at 6:49 PM, Jason Rutherglen wrote:
>
>
>> I'm interested in getting the payload information from the
>> matching span, however it's unclear from the javadocs why
>> NearSpansUnordered is different than NearSpansOrdered in this
>> regard.
>>
>> NearSpans
On Nov 20, 2009, at 6:49 PM, Jason Rutherglen wrote:
> I'm interested in getting the payload information from the
> matching span, however it's unclear from the javadocs why
> NearSpansUnordered is different than NearSpansOrdered in this
> regard.
>
> NearSpansUnordered returns payloads in a has
On Nov 24, 2009, at 9:56 AM, Christopher Tignor wrote:
> Hello,
>
> For certain span queries I construct programmatically by piecing together my
> own SpanTermQueries I would like to enforce that Payload data is not
> returned for matches on those specific terms used by the constituent
> SpanTerm
On Nov 24, 2009, at 12:34 AM, m.harig wrote:
>
> hello all
>
> Is there any way to update the spell index directory? Could anyone please
> help me out with this?
You have to rebuild it, as there is no incremental indexing.
--
Grant Ingersoll
http://www.lucidimagination.com
On Nov 20, 2009, at 5:46 AM, Wilson Wu wrote:
> hi,
> I have a problem with scoring a document in Lucene. I know there
> are some factors such as docNum, boost, idf, docFreq, lengthNorm and so
> on. And I also know how to compute docNum, docFreq and idf, but I really have
> no idea about counting the len
I do not completely understand your request; maybe you could tell us some more
about the requirements of your implementation.
The example you have given is invalid, as offsets should always refer to the
original position in the source stream, so should be:
a(0,1,1) b(2,3,1) a(6,7,1) c(11,12,1).
The second probl
Hi Dhivya,
you can iterate over all terms in the index using a TermEnum, which can be
retrieved using IndexReader.terms(Term startTerm).
If you are interested in all terms from a specific field, position the
TermEnum on the first possible term in this field ("") and iterate until the
field name changes
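
The "seek to the first possible term of the field, then iterate until the field name changes" pattern can be sketched in plain Java as follows. This is only a model, not Lucene code: a sorted TreeSet stands in for the index's term dictionary, and the Term class, termsOfField and sampleDictionary are hypothetical names invented for the sketch. In Lucene 2.x the real calls would be IndexReader.terms(Term) to obtain the TermEnum, then TermEnum.term(), TermEnum.next() and TermEnum.close().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class TermWalkDemo {
    /** Stand-in for an index term, ordered by field name and then by text. */
    static final class Term implements Comparable<Term> {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(Term other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /** Collects the texts of all terms belonging to one field. */
    static List<String> termsOfField(TreeSet<Term> dictionary, String field) {
        List<String> texts = new ArrayList<>();
        // Position on the first possible term of the field: (field, "").
        for (Term t : dictionary.tailSet(new Term(field, ""), true)) {
            // Terms are sorted by field first, so a different field name
            // means the terms of the requested field are exhausted.
            if (!t.field.equals(field)) {
                break;
            }
            texts.add(t.text);
        }
        return texts;
    }

    /** Hypothetical sample data, not taken from any real index. */
    static TreeSet<Term> sampleDictionary() {
        TreeSet<Term> d = new TreeSet<>();
        d.add(new Term("body", "apple"));
        d.add(new Term("body", "zebra"));
        d.add(new Term("filename", "a.xls"));
        d.add(new Term("filename", "b.doc"));
        d.add(new Term("title", "hello"));
        return d;
    }

    public static void main(String[] args) {
        System.out.println(termsOfField(sampleDictionary(), "filename"));
        // prints: [a.xls, b.doc]
    }
}
```

The early break is what keeps the walk cheap: because the dictionary is sorted by field first, nothing after the first foreign field name can belong to the requested field.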
27 matches