Hi,
It is not obvious what you have done, but the issue may come from custom
builds, e.g., if you are not using the original Lucene JAR file but a
modified one. Another reason may be the Maven Shade plugin or other
assembly mechanisms that produce uber-JARs!
Make sure that all class files and module information
This was an internal build issue that is now fixed. Sorry for the confusion.
Thanks,
Shubham
On Tue, Jun 27, 2023 at 12:48 AM Shubham Chaudhary
wrote:
> Hi everyone,
>
> I’m trying to build and run my software using JDK 19 which has a direct
> dependency on Apache Lucene 9.6 built with JDK 17 a
Hi!
Thanks a lot for your help. I will try both of your suggestions (taxo
index and per-segment ord ranges).
Thanks for clarifying that I have to iterate the ords. I wasn't sure
whether I had just overlooked something obvious, like some way to do an
advanceExact on ords.
Regards
harry
On 01.
To address the last topic (building up ordinal ranges per-segment),
what I'm thinking is that you'd iterate all unique ordinals in the
SSDV field and "memorize" the ordinal range for each dimension
up-front, but on a per-segment basis. This would be very similar to
what DefaultSortedSetDocValuesRea
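Something like this, roughly (a Lucene 9.x sketch; the "$facets" field name
and the '\u001F' dim/label delimiter are assumptions based on FacetsConfig
defaults, so adjust to your own config):

// given: LeafReaderContext ctx for one segment
SortedSetDocValues ssdv = DocValues.getSortedSet(ctx.reader(), "$facets");
Map<String, long[]> ordRanges = new HashMap<>();   // dim -> {firstOrd, lastOrd}
for (long ord = 0; ord < ssdv.getValueCount(); ord++) {
  BytesRef term = ssdv.lookupOrd(ord);
  String dim = term.utf8ToString().split("\u001F", 2)[0];
  long[] range = ordRanges.get(dim);
  if (range == null) ordRanges.put(dim, new long[] { ord, ord });
  else range[1] = ord;   // ords of one dim are contiguous in sorted term order
}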
Hi!
On 01.07.22 00:46, Greg Miller wrote:
Have you considered taxonomy faceting for your use-case? Because the
taxonomy structure is maintained in a separate index, it's
(relatively) trivial to iterate all direct child ordinals of a given
dimension. The cost of mapping to a global ordinal space
Hi Harry-
Have you considered taxonomy faceting for your use-case? Because the
taxonomy structure is maintained in a separate index, it's
(relatively) trivial to iterate all direct child ordinals of a given
dimension. The cost of mapping to a global ordinal space is paid when
the index is merged.
Text-based fields indeed do not have that limit for the _entire_ field. They
_do_ have that limit for any single token produced. So if your field contains,
say, a base-64 encoded image that is not broken up into smaller tokens, you’ll
still get this error.
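If you can't break such values up yourself, one option is to cap token length
in the analysis chain. A rough sketch, assuming StandardTokenizer and an
arbitrary 255-char cap (both illustrative):

Analyzer analyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    // drop tokens longer than 255 chars so no single token can approach
    // the 32766-byte indexed-term limit
    TokenStream sink = new LengthFilter(source, 1, 255);
    return new TokenStreamComponents(source, sink);
  }
};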
Best,
Erick
> On Oct 25, 2019, at 4:2
Some code interrupted (Thread.interrupt) a java thread while it was
blocked on I/O. This is not safe to do with lucene, because
unfortunately in this situation java's NIO code closes file
descriptors and releases locks.
The second exception is because the indexwriter tried to write when it
no long
I was thinking this was a Solr question rather than a Lucene one so
the [docid] bit doesn't apply if you're in the lucene code. If you
_are_ really going from solr, just put [docid] in your Solr "fl" list.
Look in the Solr ref guide for an explanation:
https://lucene.apache.org/solr/guide/6_6/trans
Hi Erick,
Many thanks for your reply and explanation.
I really want this to work. The good news for me is that the index is static;
there is no chance of the index being modified.
> Luke and the like are using a point-in-time snapshot of the index.
I want to get that lucene-assigned docid, th
You almost certainly do _not_ want this unless you are absolutely and
totally sure that your index does not change between the time you ask
for the internal Lucene doc ID and the time you use it. No docs
may be added. No forceMerges are done. In fact, I'd go so far as to
say you shouldn't open
Thank you very much for your reply. Yes, I really want this (for
implementing a retrieval function that extends the LMDir function).
Precisely, I want the document numbering same as that we see in
Lucene-Index-Viewers like Luke.
I am not sure what you meant by "segment offset, held by a leaf reade
Are you sure you want this? Lucene docids aren't generally useful outside a
narrow internal context. They can change over time for example.
But if you do, it sounds like maybe what you are seeing is the per segment
docid. To get a global one you have to add the segment offset, held by a
leaf reade
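Roughly (Lucene 5+ class names; the helper method is mine, purely
illustrative):

static int toGlobalDocId(LeafReaderContext ctx, int segmentDocId) {
  // docBase is the segment's offset within the composite reader, so this
  // yields the top-level docid, i.e. the numbering that tools like Luke show
  return ctx.docBase + segmentDocId;
}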
Thanks Mike. Yeah, i saw the changelist you mentioned. Unfortunately i can't
upgrade to 6.2 because of stack limitations :( .
Regards.
Hi lukes,
Sorry, this was a recent change in Lucene:
https://issues.apache.org/jira/browse/LUCENE-7302
You need to upgrade to at least 6.2 to see it.
And the long value that is returned is just an incrementing number,
incremented for every op (add, update, delete) that changes the index.
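For example, in 6.2+ (field names illustrative):

long addSeq = writer.addDocument(doc);                       // doc is a Document you've built
long delSeq = writer.deleteDocuments(new Term("id", "42"));  // deletes return one too
long commitSeq = writer.commit();                            // was void before LUCENE-7302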
Mike M
Hi Michael,
Thanks for the reply. Regarding IW (IndexWriter) returning a long sequence
number, I looked at the signature of commit and it seems to be void. Can you
please point me in the right direction? I am using Lucene 5.5.2. Also, is this
number an aggregation of deletes, updates and new documents? Is
Hi lukes,
First, IW never "auto commits". The maxBufferedDocs/RAMBufferSizeMB
settings control when IW moves the recently indexed documents from RAM
to disk, but that moving, which writes new segments files, does not
commit them. It just writes them to disk, not visible yet to an
external reader
Hi,
Can anyone please make a suggestion or point me in some direction.
Regards.
Thanks Michael. That works well. Not sure why SmartChineseAnalyzer is
final, otherwise we could override createComponents().
New output:
女 单 方面 王 适 娴 second seed 和 头号 种子 卫冕 冠军 西班牙 选手 马 林
first seed 同 处 1 4 区 3 号
种子 李 雪 芮 和 韩国 选手 korean player 成 池 铉 处在 2 4 区 不过 成 池 铉
先 要 过 日本 小将
japanese player
The easiest thing to do is to create your own analyzer, cut and paste the
code from org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer into it,
and get rid of the line in createComponents(String fieldName, Reader
reader) that says
result = new PorterStemFilter(result);
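The result would look roughly like this (class names from the 4.x smart-cn
package; copy the body from your own version's source, since the internals
vary across releases):

public final class NoStemSmartChineseAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer tokenizer = new SentenceTokenizer(reader);
    TokenStream result = new WordTokenFilter(tokenizer);
    // ... keep SmartChineseAnalyzer's stopword handling here unchanged ...
    // result = new PorterStemFilter(result);   <- the line to leave out
    return new TokenStreamComponents(tokenizer, result);
  }
}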
On Fri, Aug 14,
: If you cannot do this for whatever reason, I vaguely remember someone
: posting a link to a program they'd put together to do this for a
: docValues field, you'd have to search the archives to find it.
It was Toke - he generated DocValues for an existing index by writing an
IndexReader Filter
My first recommendation, of course, would be to re-index the corpus
with a new field. If possible, frankly, that would probably be less
effort than trying to hack in an ID after the fact, as well as less
error-prone.
If you cannot do this for whatever reason, I vaguely remember someone
posting a
Hi,
This generally happens if you don't deploy the original Lucene JAR files and
instead create so-called super-jars (one large JAR file with all classes merged
together). Unfortunately this approach misses to copy/merge relevant metadata
in the META-INF folder of the original JARs. Without the
Use TermsEnum.totalTermFreq(), which is the total number of
occurrences of the term, not TermsEnum.docFreq(), which is the number
of documents that contain at least one occurrence of the term.
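For example (assuming an open IndexReader; field and term illustrative):

Term t = new Term("body", "lucene");
long occurrences = reader.totalTermFreq(t);  // total occurrences across all docs
int docsWithTerm = reader.docFreq(t);        // docs containing it at least once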
Mike McCandless
http://blog.mikemccandless.com
On Sun, Feb 22, 2015 at 6:47 AM, Maisnam Ns wrote:
> H
Hi,
Sorry for my ignorance, how do I obtain an AtomicReader from an IndexReader?
I figured out the code below, but it gives me a list of atomic readers.
for (AtomicReaderContext context : reader.leaves()) {
  NumericDocValues docValues = context.reader().getNormValues(field);
  if (docValues != null)
    normValu
On Fri, Feb 6, 2015 at 8:51 AM, Ahmet Arslan wrote:
> Hi Michael,
>
> Thanks for the explanation. I am working with a TREC dataset,
> since it is static, I set size of that array experimentally.
>
> I followed the DefaultSimilarity#lengthNorm method a bit.
>
> If default similarity and no index ti
Hi Michael,
Thanks for the explanation. I am working with a TREC dataset,
since it is static, I set size of that array experimentally.
I followed the DefaultSimilarity#lengthNorm method a bit.
If default similarity and no index time boost is used,
I assume that norm equals to 1.0 / Math.sqrt
How will you know how large to allocate that array? The within-doc
term freq can in general be arbitrarily large...
Lucene does not directly store the total number of terms in a
document, but it does store it approximately in the doc's norm value.
Maybe you can use that? Alternatively, you can s
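For reference, a hedged sketch of decoding the 4.x norm back to an
approximate length (assumes DefaultSimilarity with no index-time boosts;
SmallFloat is the 4.x norm codec):

NumericDocValues norms = MultiDocValues.getNormValues(reader, "body");
float norm = SmallFloat.byte315ToFloat((byte) norms.get(docId));  // ~ 1/sqrt(numTerms)
int approxNumTerms = Math.round(1f / (norm * norm));              // docId: a top-level docid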
: Is there some way when faceted search is executed, we can retrieve the
: possible min/max values of numeric doc-values field with supplied custom
: ranges in (LongRangeFacetCounts) or some other way to do it ?
:
: As i believe this can give application hint, and next search request can be
: muc
> From: Rajendra Rao [mailto:rajendra@launchship.com]
> Sent: Thursday, September 25, 2014 11:28 AM
> To: java-user@lucene.apache.org
> Subject: Re: getting exception while deploying on axis 2
>
> Hello Uwe,
>
> My project is a Java project built in Eclipse and I
thanks Uwe for your reply,
Can you explain what you mean by the *original* JAR files of Lucene? And if I
did not use the original JARs, where can I get them?
My project is a plain Java project and I have no idea how to use Maven. Can
you give some idea how to add and use the Maven Shade plugin in my project a
Hi,
this happens if you don't use the *original* JAR files of Lucene. If you
repackage them, be sure to include the META-INF/services folders, and if
multiple Lucene JAR files are included, merge the entries in the services files
from all of them. You can do this with the Maven Shade Plugin and
Hi Rob,
While the demo code uses a fixed number of 3 values, you don't need to
encode the number of values up front. Since your read the byte[] of a
document up front, you can read in a while loop as long as in.position() <
in.length().
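Roughly (4.x BinaryDocValues signature; newer versions return the BytesRef
instead of filling one in):

BytesRef scratch = new BytesRef();
bdv.get(doc, scratch);   // bdv/doc: the BinaryDocValues and current docid
ByteArrayDataInput in = new ByteArrayDataInput(scratch.bytes, scratch.offset, scratch.length);
while (!in.eof()) {
  long value = in.readVLong();   // one of the doc's values
}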
Shai
On Tue, Apr 29, 2014 at 10:04 AM, Rob Audenaerde
wrot
Hi Shai,
I read the article on your blog, thanks for it! It seems to be a natural fit to
do multi-values like this, and it is helpful indeed. For my specific problem, I
have multiple values that do not have a fixed number, so it can be either 0 or
10 values. I think the best way to solve this i
Hi Rob,
Your question got me interested, so I wrote a quick prototype of what I
think solves your problem (and if not, I hope it solves someone else's!
:)). The idea is to write a special ValueSource, e.g. MaxValueSource which
reads a BinaryDocValues field, decodes the values and returns the maximum one
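The shape of it, as a hedged sketch (4.x function-query API; MaxValueSource
is just the name from this thread, not a shipped class, and it assumes every
doc has a value):

public class MaxValueSource extends ValueSource {
  private final String field;
  public MaxValueSource(String field) { this.field = field; }
  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
    final BinaryDocValues bdv = readerContext.reader().getBinaryDocValues(field);
    return new DoubleDocValues(this) {
      @Override
      public double doubleVal(int doc) {
        BytesRef scratch = new BytesRef();
        bdv.get(doc, scratch);
        ByteArrayDataInput in = new ByteArrayDataInput(scratch.bytes, scratch.offset, scratch.length);
        long max = Long.MIN_VALUE;
        while (!in.eof()) max = Math.max(max, in.readVLong());
        return (double) max;
      }
    };
  }
  @Override public boolean equals(Object o) {
    return o instanceof MaxValueSource && ((MaxValueSource) o).field.equals(field);
  }
  @Override public int hashCode() { return field.hashCode(); }
  @Override public String description() { return "max(" + field + ")"; }
}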
I don't think that you should use the facet module. If all you want is to
encode a bunch of numbers under a 'foo' field, you can encode them into a
byte[] and index them as a BDV. Then at search time you get the BDV and
decode the numbers back. The facet module adds complexity here: yes, you
get th
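The indexing side of that, sketched (vlong encoding assumes non-negative
values; 9 bytes is the vlong worst case):

long[] values = { 3, 1, 4 };   // example values for "foo"
byte[] buf = new byte[values.length * 9];
ByteArrayDataOutput out = new ByteArrayDataOutput(buf);
for (long v : values) out.writeVLong(v);
doc.add(new BinaryDocValuesField("foo", new BytesRef(buf, 0, out.getPosition())));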
Thanks for all the questions, gives me an opportunity to clarify it :)
I want the user to be able to give a (simple) formula (so I don't know it
on beforehand) and use that formula in the search. The Javascript
expressions are really powerful in this use case, but have the single-value
limitation.
A NumericDocValues field can only hold one value. Have you thought about
encoding the values in a BinaryDocValues field? Or are you talking about
multiple fields (different names), each has its own single value, and at
search time you sum the values from a different set of fields?
If it's one fiel
Hi Shai, all,
I am trying to write that Filter :). But I'm a bit at a loss as to how to
efficiently grab the multi-values. I can access the stored fields via
context.reader().document(), but that seems slow.
For single-value fields I use a compiled JavaScript Expression with
simplebinding
You can do that by writing a Filter which returns matching documents based
on a sum of the field's value. However I suspect that is going to be slow,
unless you know that you will need several such filters and can cache them.
Another approach would be to write a Collector which serves as a Filter,
Hi Mike,
Thanks for your reply.
I think it is not-so-much an invalid use case for Lucene. Lucene already
has (experimental) support for Dynamic Range Facets, expressions
(javascript expressions, geospatial haversin, etc.). These are all
computed on the fly and work really well. They just depe
This isn't really a good use case for an index like Lucene. The most
essential property of an index is that it lets you look up documents
very quickly based on *precomputed* values.
-Mike
On 04/23/2014 06:56 AM, Rob Audenaerde wrote:
Hi all,
I'm looking for a way to use multi-values in a f
You can persist the IndexConfiguration by making it a Serializable object
and writing it to a file with an ObjectOutputStream, by storing it in a
persistence mechanism such as a database or a JSON store, or, like Solr,
by using an XML file.
I
The SortedSetDocValuesField worked great.
Thanks.
Kyle
> From: luc...@mikemccandless.com
> Date: Wed, 12 Feb 2014 05:39:24 -0500
> Subject: Re: Getting term ords during collect
> To: java-user@lucene.apache.org
>
> It sounds like you are just indexing at TextFiel
> Kyle
>
>> From: luc...@mikemccandless.com
>> Date: Tue, 11 Feb 2014 19:59:03 -0500
>> Subject: Re: Getting term ords during collect
>> To: java-user@lucene.apache.org
>>
>> SortedSetDV is probably the best way to do so. You could also encode
>>
SortedSetDV is probably the best way to do so. You could also encode
the ords yourself into a byte[] and use binary DV.
But why are you seeing it take too long to load? You can switch to
different DV formats to trade off RAM usage against lookup speed.
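Reading the ords per hit looks roughly like this (4.x API; field name
illustrative, context being the reader context from setNextReader):

SortedSetDocValues ssdv = context.reader().getSortedSetDocValues("category");
ssdv.setDocument(doc);   // position on the current hit, then iterate its ords
for (long ord = ssdv.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = ssdv.nextOrd()) {
  // ord is a segment-local term ordinal for this doc
}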
Mike McCandless
http://blog.mikemccandless.co
Peter, thanks for the reply.
This code is just a sample for the question.
Actually, I have indexed many documents.
The reason for trying this is that I want to get statistics about the index file.
Thanks and Regards.
2013/10/8 Peter Chang
> Your doc freq is always 1. It's useless.
> I don't know why you try to inde
Thanks very much, Uwe!
I got the right value using NumericUtils.
And as you said, there were many more terms than I had indexed.
Thanks and Regards.
2013/10/8 Uwe Schindler
> Hi,
>
> Use NumericUtils to convert the BytesRef back to a number:
> http://goo.gl/3KG9Pd
> But be careful, the term
Your doc freq is always 1. It's useless.
I don't know why you try to index and search a binary field except for
range searching.
On Mon, Oct 7, 2013 at 11:23 PM, 장용석 wrote:
> Dear,
>
> I have indexing integer field like this
>
> -
> Document doc = new Document();
> FieldType fieldType = new
Hi,
Use NumericUtils to convert the BytesRef back to a number: http://goo.gl/3KG9Pd
But be careful, the terms index contains more terms with lower precisions (bits
stripped off), unless you use infinite precisionStep!
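Roughly, for an int field (4.x; term is the BytesRef you got from the
TermsEnum, and the shift check skips those lower-precision helper terms):

if (NumericUtils.getPrefixCodedIntShift(term) == 0) {   // full-precision term only
  int value = NumericUtils.prefixCodedToInt(term);
}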
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.theta
On Thu, May 23, 2013 at 9:54 AM, Igor Shalyminov
wrote:
> But, just to clarify, is there a way to get, let's say, a vector of position
> increments directly from the index, without re-parsing document contents?
Term vectors (as Jack suggested) are one option, but they are very
heavy (slows down
Take a look at the Term Vectors Component:
http://wiki.apache.org/solr/TermVectorComponent
-- Jack Krupansky
-Original Message-
From: Igor Shalyminov
Sent: Thursday, May 23, 2013 9:54 AM
To: java-user@lucene.apache.org
Subject: Re: Getting position increments directly from the the
Do you actually index the sentence boundary as a token? If so, you
could just get the totalTermFreq of that token?
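For example (the boundary token name is illustrative):

long sentenceCount = reader.totalTermFreq(new Term("body", "##SENT##"));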
Mike McCandless
http://blog.mikemccandless.com
On Wed, May 22, 2013 at 10:11 AM, Igor Shalyminov
wrote:
> Hello!
>
> I'm storing sentence bounds in the index as position increme
OK, I've played with all these solutions and basically only one gave me
satisfying results. Using build() with the TermFreqPayload argument gave me
horrible performance, because it takes more than 5 mins to iterate through
all Terms in the index and to filter them based on the doc id. Not sure if this n
On Sat, Mar 16, 2013 at 7:47 AM, Bratislav Stojanovic
wrote:
> Hey Mike,
>
> Is this what I should be looking at?
> https://builds.apache.org/job/Lucene-Artifacts-trunk/javadoc/suggest/org/apache/lucene/search/suggest/analyzing/package-summary.html
>
> Not sure how to call build(), i.e. what to pa
Hey Mike,
Is this what I should be looking at?
https://builds.apache.org/job/Lucene-Artifacts-trunk/javadoc/suggest/org/apache/lucene/search/suggest/analyzing/package-summary.html
Not sure how to call build(), i.e. what to pass as a parameter...Any
examples?
Where to specify my payload (which is
Hey Jack,
I've tried MoreLikeThis, but it always returns me 0 hits. Here's the code,
it's very simple:
// test2
Index lucene = null;
try {
  lucene = new Index();
  MoreLikeThis mlt = new MoreLikeThis(lucene.reader);
  mlt.setAnalyzer(lucene.analyzer);
  Reader target = new StringReader("apache");
  Quer
you simply want to know a few of the
> documents which have the highest term frequency for X?
>
> Or is there some other term-oriented metric you might propose?
>
>
> -- Jack Krupansky
>
> -Original Message- From: Bratislav Stojanovic
> Sent: Thursday, March 14, 2013
Wow that was fast :)
I have implemented a simple search box with auto-suggestions, so whenever a
user types in something, an ajax call is fired to the SuggestServlet and in
return 10 suggestions are shown. It's working fine with the SpellChecker
class, but I only get an array of Strings.
What I want is t
If you are using AnalyzingSuggester or FuzzySuggester then you can use
its new payloads feature to store an arbitrary byte[] with each
suggestion:
https://issues.apache.org/jira/browse/LUCENE-4820
But this won't help if you're using spell checker ...
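The round-trip looks roughly like this (a sketch against the 4.x suggest API;
exact signatures vary by version, and the inputs iterator is assumed to
supply (term, weight, payload) triples):

AnalyzingSuggester suggester = new AnalyzingSuggester(analyzer);
suggester.build(inputs);
for (Lookup.LookupResult r : suggester.lookup("apac", false, 10)) {
  BytesRef payload = r.payload;   // e.g. your stored docid or URL bytes
}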
Mike McCandless
http://blog.mikemccandle
Could you give us some examples of what you expect? I mean, how is your
suggested set of documents any different from simply executing a query with
the list of suggested terms (using q.op=OR)?
Or, maybe you want something like MoreLikeThis?
-- Jack Krupansky
-Original Message-
From:
Have you already checked Solr's MoreLikeThis?
http://wiki.apache.org/solr/MoreLikeThisHandler and
http://wiki.apache.org/solr/MoreLikeThis You describe a problem similar to
the use case of that component, and if there is something to hack, it is
Solr's MoreLikeThis.
Lucene's similarity is a low le
Hi again!
So far I think that the easiest way to get all span matches is indeed this
method (Lucene v 4.1 code):
public Spans getSpans(final AtomicReaderContext context, Bits acceptDocs,
Map termContexts)
But there is no documentation for this method except 'for internal use only', and
the input pa
hi Denis,
thanks for your reply. OffsetAttribute gives the character position
whereas I was looking for the Token Position. I ended up adding the
attached PositionAttribute/PositionAttributeImpl/PositionFilter.
as it turned out though I didn't need that attribute as there was an
easier way
What you are looking for is OffsetAttribute. Also consider the possibility of
using ShingleFilter with position increment > 1 and then filtering tokens
containing "_" (underscore). This will be easier, I guess.
On Jan 11, 2013, at 7:14 AM, Igal @ getRailo.org wrote:
> hi all,
>
> how can I ge
Great! I'll look into that.
Thanks!
2013/1/9 김한규
> Try SpanTermQuery, getSpans() function. It returns Spans object which you
> can iterate through to find position of every hits in every documents.
>
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/spans/SpanTermQuery.html
>
Try SpanTermQuery, getSpans() function. It returns Spans object which you
can iterate through to find position of every hits in every documents.
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/spans/SpanTermQuery.html
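A minimal sketch of the iteration (Lucene 4.x signatures, per the javadoc
above; field and term are illustrative):

SpanTermQuery q = new SpanTermQuery(new Term("body", "lucene"));
Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
for (AtomicReaderContext ctx : reader.leaves()) {
  Spans spans = q.getSpans(ctx, ctx.reader().getLiveDocs(), termContexts);
  while (spans.next()) {
    int doc = spans.doc(), start = spans.start(), end = spans.end();
  }
}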
2013/1/9 Itai Peleg
> Hi,
>
> I'n new to Lucene, and I'm hav
Thanks a lot Aditya and Andrzej .. Your responses were really helpful.
On Fri, Jul 27, 2012 at 6:15 AM, Andrzej Bialecki wrote:
> On 26/07/2012 22:04, Phanindra R wrote:
>
>> Thanks for the reply Abdul.
>>
>> I was exploring the API and I think we can retrieve all those words by
>> using a brute
On 26/07/2012 22:04, Phanindra R wrote:
Thanks for the reply Abdul.
I was exploring the API and I think we can retrieve all those words by
using a brute-force approach.
1) Get all the terms using indexReader.terms()
2) Process the term only if it belongs to the target field.
3) Get all the do
Hi
If the data is not stored then it cannot be retrieved in the same format.
Using IndexReader as you listed, you could retrieve the list of terms
available in the doc, but they will have been analyzed, so you may not get the exact data.
Regards
Aditya
www.findbestopensource.com
On Fri, Jul 27, 2012 at 1:3
Thanks for the reply Abdul.
I was exploring the API and I think we can retrieve all those words by
using a brute-force approach.
1) Get all the terms using indexReader.terms()
2) Process the term only if it belongs to the target field.
3) Get all the docs using indexReader.termDocs(term);
4) S
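A rough sketch of those steps against the 3.x API (field name illustrative):

TermEnum terms = indexReader.terms();                 // step 1
while (terms.next()) {
  Term t = terms.term();
  if (!"myfield".equals(t.field())) continue;         // step 2
  TermDocs td = indexReader.termDocs(t);              // step 3
  while (td.next()) {
    int doc = td.doc();   // doc contains t.text() td.freq() times
  }
  td.close();
}
terms.close();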
No, it's not possible to get data which is not stored.
On Jul 26, 2012 10:27 PM, "Phanindra R [via Lucene]"
> Hi,
> I've an index to analyze (manually). Unfortunately, I cannot rebuild
> the index. Some of the fields are 'unstored'. I was wondering whether
> there's any way to get the ter
int numDocs = filterIndexReader.numDocs();
...
idf = Math.log10((double) numDocs / docFreq);
Sethu_424 wrote
Wrong formula: numDocs should not be a count of the documents in the index,
but of the documents containing the search term.
We need something like IndexReader.docFreq(term);
In general you can't rely on anything like this. I admit the merge
stuff isn't my area of expertise, but when segments are merged,
there's no guarantee that they're merged in order. In general
the internal Lucene doc ID should be treated as predictable only
for closed segments.
Your solution of us
What version of lucene are you using? If not the latest, try that.
If you really think there is a lucene bug post a small self-contained
test case that demonstrates the problem.
--
Ian.
On Fri, May 11, 2012 at 12:35 PM, Kasun Perera wrote:
> On Fri, May 11, 2012 at 4:52 PM, Ian Lea wrote:
>
On Fri, May 11, 2012 at 4:52 PM, Ian Lea wrote:
> Can't spot anything obviously wrong in your code and what you are
> trying to do should work. Are you positive that what you think is the
> second doc is really being added second? You only show one doc being
> added. Are there already 7 docs i
Can't spot anything obviously wrong in your code and what you are
trying to do should work. Are you positive that what you think is the
second doc is really being added second? You only show one doc being
added. Are there already 7 docs in the index before you start?
--
Ian.
On Fri, May 11,
Hmm... it looks like File.length() is somehow, sometimes lying, on
your NFS filesystem.
What's happening is Lucene is writing out a file, and it wrote 59540
bytes, closed the file (all with no exceptions), and then tried to
verify the length was 59540 but in fact the filesystem reported 32768
byte
OS : RHEL 5.5 64 bit.
Filesystem: NFS
Thanks for the reply.
Thanks,
Jamir
On Fri, Dec 9, 2011 at 10:22 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> Which OS/filesystem?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Dec 8, 2011 at 9:46 PM, Jamir Shaikh
> wro
Which OS/filesystem?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Dec 8, 2011 at 9:46 PM, Jamir Shaikh wrote:
> I am using Lucene 3.5. I want to create around 30 million documents.
> While doing Indexing I am getting the following Exception:
>
> Caused by: java.lang.RuntimeException:
Complicated with all those indexes.
3 suggestions:
1. Just give it more memory.
2. Profile it to find out what is actually using the memory.
3. Cut down the number of indexes. See recent threads on pros and
cons of multiple indexes vs one larger index.
--
Ian.
On Mon, Jun 20, 2011 at 2:
Hi Erick,
In continuation of my mails below: I have a socket-based multithreaded
server that serves on average 1 request per second.
The index size is 31GB and the document count is about 22 million.
The index directories are first divided into 4 directories, and each of
those is then subdivided into 21 directories.
Hi Erick,
I will gather the info and let you know.
thanks
harsh
On 6/17/11, Erick Erickson wrote:
> Please review:
> http://wiki.apache.org/solr/UsingMailingLists
>
> You've given us no information to go on here, what are you
> trying to do when this happens? What have you tried? What
> is the quer
Please review:
http://wiki.apache.org/solr/UsingMailingLists
You've given us no information to go on here, what are you
trying to do when this happens? What have you tried? What
is the query you're running when this happens? How much
memory are you allocating to the JVM?
You're apparently sorting
> Does IndexWriter (or somewhere else) have the method such that
> it gets the number of updated documents before commit?
You have maxDocs, which gives you the maxDocId - 1, but this might not be
super accurate since there might have been merges going on in the
background. I am not sure if this number yo
hey Koji,
2011/3/10 Koji Sekiguchi :
> Hello,
>
> Does IndexWriter (or somewhere else) have the method such that
> it gets the number of updated documents before commit?
you have maxDocs which gives you the maxdocid-1 but this might not be
super accurate since there might have been merges going on
>From Hossman's Apache Page:
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in th