han
> 3.x's terms index... if you run CheckIndex with -verbose it will print
> additional details about the block structure of your terms indices...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Jan 9, 2015 at 4:15 PM, Tom Burton-West
> wrot
Thanks Mike,
> OK. It would be good to know where all your RAM is being consumed,
> and how much of that is really the terms index: it ought to be a very
> small part of it.
>
> I made a bunch of heap dumps. I just watched with jconsole and ran jmap
-histo when memory use got high.
I've appende
tional details about the block structure of your terms indices...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Jan 9, 2015 at 4:15 PM, Tom Burton-West
> wrote:
> > Hello all,
> >
> > We have over 3 billion unique terms in our indexes
mat.html#Lucene41PostingsFormat%28int,%20int%29>
"
Is there documentation or discussion somewhere about how to determine
appropriate parameters or some detail about what setting the maxBlockSize
and minBlockSize does?
Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
ce(see
attached) but I continue getting this error.
Can someone please explain why after the GC frees memory, I continue to get
the error?
p.s. My documents average about 800KB and at completion each shard has over
3 billion unique terms.
docvalues fields; 0 BINARY; 0 NUMERIC;
0 SORTED; 0 SORTED_SET]
No problems were detected with this index.
On Thu, Aug 8, 2013 at 11:24 AM, Robert Muir wrote:
> On Thu, Aug 8, 2013 at 11:18 AM, Tom Burton-West
> wrote:
> > Sure I should be able to build a lucene core and give
le for you to build a lucene-core.jar from
> branch_4x and run checkindex with that jar file to confirm it really
> addresses the issue: if this is possible in any way it would be
> fantastic.
>
> There is nothing wrong with your index: it's just a code thing :)
>
> On Thu, Aug 8, 2
Hi Robert,
I've been running CheckIndex for over a week and it is still working
through seekCeil()
(See below.)
I'm going to kill the CheckIndex. Admittedly, this index is an unusual
one, but at one point we were considering using MLT in our regular index
which would result in a large termvecto
Thanks Robert,
Looks like it switches between seekCeil and seekExact:
"main" prio=10 tid=0x0e79a000 nid=0x5fe5 runnable
[0x2b32de0cc000]
jstack.out3- java.lang.Thread.State: RUNNABLE
jstack.out3-at
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum.see
>
> On Tue, Jul 30, 2013 at 1:06 PM, Tom Burton-West
> wrote:
> > Thanks Mike, Robert and Adrien,
> >
> > Unfortunately, I killed the processes, so it's too late to get a stack
> > trace. One thing that was suspicious was that top was reporting memory
> use
ccandless.com> wrote:
> You should also upgrade your Java!
>
> 1.6.0_16 is really ancient and has exciting bugs ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Jul 30, 2013 at 1:06 PM, Tom Burton-West
> wrote:
> > Thanks Mike,
Thanks Mike, Robert and Adrien,
Unfortunately, I killed the processes, so it's too late to get a stack
trace. One thing that was suspicious was that top was reporting memory use
as 20GB res even though I invoked the JVM with java -Xmx10g -Xms10g.
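One possible reading of the 20GB resident figure: -Xmx and -Xms bound only the Java heap, while the resident size that top reports also counts non-heap JVM memory and any memory-mapped files that are paged in. As a minimal sketch (not taken from this thread; the class name HeapVsNonHeap and its helper methods are made up for illustration), the standard java.lang.management API can print what the JVM itself attributes to heap and non-heap memory. Memory-mapped index files appear in neither number, so the gap between this output and top can still be large.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only; not part of Lucene.
final class HeapVsNonHeap {

    private static final long MB = 1024L * 1024L;

    // Upper bound on the Java heap, as set by -Xmx (or the JVM default).
    static long heapMaxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Formats one memory area; getMax() returns -1 when no limit is defined.
    static String describe(String label, MemoryUsage u) {
        String max = u.getMax() < 0 ? "undefined" : (u.getMax() / MB) + " MB";
        return String.format("%s: used=%d MB, committed=%d MB, max=%s",
                label, u.getUsed() / MB, u.getCommitted() / MB, max);
    }

    // Two-line summary: heap first, then non-heap (metaspace, code cache, ...).
    static String report() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        return describe("heap", mem.getHeapMemoryUsage())
                + System.lineSeparator()
                + describe("non-heap", mem.getNonHeapMemoryUsage());
    }

    public static void main(String[] args) {
        System.out.println("heap limit: " + heapMaxBytes() / MB + " MB");
        System.out.println(report());
    }
}
```

Run with the same flags as the process under suspicion (for example java -Xmx10g -Xms10g HeapVsNonHeap); the heap limit it reports should be close to the -Xmx value, and whatever top shows beyond heap plus non-heap comes from memory outside these two areas.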
I'm going to double the memory, turn on GC logging,
lues..." after that.
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Jul 29, 2013 at 4:30 PM, Tom Burton-West
> wrote:
> > We have very large indexes, almost a terabyte for a single index, and
> > normally it takes overnight to run a che
We have very large indexes, almost a terabyte for a single index, and
normally it takes overnight to run a checkindex. I started a CheckIndex
on Friday and today (Monday) it seems to be stuck testing vectors although
we haven't got vectors turned on. (See below)
The output file was last written J
om
>
>
> On Tue, Jun 18, 2013 at 12:48 PM, Tom Burton-West
> wrote:
> > Hello,
> >
> > I'm trying to understand BlockGroupingCollector. I thought I would
> start
> > by running the tests in the debugger. However the only test I can find
> is
> >
I'm trying to build trunk and when I run "ant compile"
the build hangs right after "Building replicator" at the line
"common.resolve:". (see below for more context)
I'm not familiar with Ivy so I'm not too sure where to look for the problem.
Can someone point me to the FAQ or the appropriate reso
to make it more understandable!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Jun 18, 2013 at 12:48 PM, Tom Burton-West
> wrote:
> > Hello,
> >
> > I'm trying to understand BlockGroupingCollector. I thought I would
> start
>
Hello,
I'm trying to understand BlockGroupingCollector. I thought I would start
by running the tests in the debugger. However the only test I can find is
lucene/grouping/src/test/org/apache/lucene/search/grouping/TestGrouping.java
In TestGrouping.java, in the second test, "testRandom" it see
Please add tburtonw to contributors
Tom Burton-West
tburtonw at umich dot edu
Tom
On Mon, Mar 25, 2013 at 9:05 AM, Steve Rowe wrote:
>
> On Mar 25, 2013, at 8:49 AM, Rafał Kuć wrote:
> > Could you add RafalKuc to contributors ? Thanks :)
>
> Added to ContributorsGroup.
>
Hello Oliver,
We are very interested in group sorting based on some aggregation function
also. Would you consider contributing your code to Lucene, or posting your
results?
Tom
Tom Burton-West
Information Retrieval Programmer
Digital Library Production Service
University of Michigan Library
n segments (containing 865870 documents) detected
WARNING: would write new segments file, and 865870 documents would be lost,
if -fix were specified
On Wed, Dec 5, 2012 at 5:29 PM, Robert Muir wrote:
> On Wed, Dec 5, 2012 at 2:27 PM, Tom Burton-West
> wrote:
>
> > Thanks Robert,
> >
s.sun.com/bugdatabase/view_bug.do?bug_id=5091921
>
> We tried to add workarounds to lucene to dodge problems from this, but
> really a newer unaffected version would be safer.
>
> On Wed, Dec 5, 2012 at 1:47 PM, Robert Muir wrote:
>
> >
> > On Wed, Dec 5, 2012 at 1:30
Hello,
I'm trying to merge 12 indexes into one big index using the Lucene
IndexMergeTool (command line used appended below). The merge seemed to
finish successfully, but when I ran CheckIndex on the merged index, I got
an array out of bounds error "java.lang.ArrayIndexOutOfBoundsException:
13315
Hi Mike,
>>Honestly I've never heard of anyone using "dogs" to mean feet either, but
hey nobody's perfect.
This is really off topic but I couldn't resist. This usage of "dogs" to
mean feet occurs in old blues lyrics such as Blind Lemon Jefferson's "Hot
Dogs"
http://www.youtube.com/watch?v=v670qV
Hi Otis,
I hope this is not off-topic,
Apparently in Lucene similarity does not have to be set at index time:
See http://lucene.apache.org/core/4_0_0/changes/Changes.html under Lucene
2959
"All models default to the same index-time norm encoding as
DefaultSimilarity, so you can easily try these
I agree with Erick that you probably need to give your client a list of
concrete examples, and perhaps to explain the trade-offs.
All stemmers both overstem and understem. Understemming means that some
forms of a word won’t get searched. For example, without stemming, searching
for “dogs” would
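The overstemming/understemming trade-off can be made concrete with a toy example. The sketch below is not any Lucene stemmer; ToyStemmer and its single rule (strip one trailing "s") are invented for illustration. It conflates "dogs" with "dog" as intended, overstems "news" to "new", and understems by leaving "mice" unrelated to "mouse".

```java
import java.util.Locale;

// Hypothetical one-rule stemmer, for illustration only; not a Lucene class.
final class ToyStemmer {

    // Strips a single trailing "s" from words longer than three letters,
    // leaving words that end in "ss" (e.g. "glass") untouched.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Two words match in a stemmed index only if their stems are equal.
    static boolean sameStem(String a, String b) {
        return stem(a).equals(stem(b));
    }

    public static void main(String[] args) {
        System.out.println(sameStem("dogs", "dog"));   // true: intended conflation
        System.out.println(sameStem("news", "new"));   // true: overstemming
        System.out.println(sameStem("mice", "mouse")); // false: understemming
    }
}
```

A list of word pairs like these, drawn from the client's own queries, is one way to present the trade-offs as concrete examples.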
the other
hand, once we started building our test indexes so they were significantly
larger than the amount of memory available for OS disk caching, we could see
results that extrapolated out to the large index.
Tom Burton-West
www.hathitrust.org
ryguasu wrote:
>
> I'd like