On Fri, Oct 25, 2013 at 9:58 AM, Igor Shalyminov
wrote:
> What is ProxBooleanTermQuery?
> I couldn't find it in the trunk and in that ticket's
> (https://issues.apache.org/jira/browse/LUCENE-2878) patch.
Sorry, this is on https://issues.apache.org/jira/browse/LUCENE-5288
Next time try searchin
What is ProxBooleanTermQuery?
I couldn't find it in the trunk and in that ticket's
(https://issues.apache.org/jira/browse/LUCENE-2878) patch.
And for now it's very fuzzy to me how the searching/scoring works. Are there
any tutorials or talks on how Queries, Scorers, and Collectors interoperate?
On Tue, Oct 22, 2013 at 9:43 AM, Igor Shalyminov
wrote:
> Thanks for the link, I'll definitely dig into SpanQuery internals very soon.
You could also just make a custom query. If you start from the
ProxBooleanTermQuery on that issue, but change it so that it rejects
hits that didn't have terms
Hello Mike!
19.10.2013, 14:54, "Michael McCandless" :
> On Fri, Oct 18, 2013 at 5:50 PM, Igor Shalyminov
> wrote:
>
>> But why is it so costly?
>
> I think because the matching is inherently complex? But also because
> it does high-cost things like allocating new List and Set for every
> match
On Fri, Oct 18, 2013 at 5:50 PM, Igor Shalyminov
wrote:
> But why is it so costly?
I think because the matching is inherently complex? But also because
it does high-cost things like allocating new List and Set for every
matched doc (e.g. NearSpansOrdered.shrinkToAfterShortestMatch) to hold
all p
But why is it so costly?
In a regular query we walk postings and match document numbers, in a SpanQuery
we match position numbers (or position segments), what's the principal
difference?
I think it's just that #documents << #positions.
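The difference in work can be seen in a toy model. The following is only a sketch in plain Java, not how Lucene's scorers or NearSpans classes are actually implemented: the class name, the arrays, and `maxDistance` (a simplified stand-in for slop) are all assumptions made for illustration. A document-level conjunction is one pass over two sorted doc-id lists, while a proximity check needs an additional pass over the position lists of every candidate document.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: hypothetical names, not Lucene classes. */
final class ProximitySketch {

    /** Document-level AND: one two-pointer pass over two sorted doc-id lists. */
    static List<Integer> intersectDocs(int[] docsA, int[] docsB) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < docsA.length && j < docsB.length) {
            if (docsA[i] == docsB[j]) {
                hits.add(docsA[i]);
                i++;
                j++;
            } else if (docsA[i] < docsB[j]) {
                i++;
            } else {
                j++;
            }
        }
        return hits;
    }

    /**
     * Position-level check for ONE document: true if some position of term A
     * lies within maxDistance of some position of term B (order ignored).
     * Both arrays must be sorted ascending and maxDistance must be >= 0.
     */
    static boolean withinDistance(int[] posA, int[] posB, int maxDistance) {
        int i = 0, j = 0;
        while (i < posA.length && j < posB.length) {
            if (Math.abs(posA[i] - posB[j]) <= maxDistance) {
                return true;
            }
            // Advance the side with the smaller position to shrink the gap.
            if (posA[i] < posB[j]) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] docsA = {1, 4, 7, 9};
        int[] docsB = {2, 4, 9, 12};
        // Docs containing both terms; each would still need a position check.
        System.out.println(intersectDocs(docsA, docsB));
        System.out.println(withinDistance(new int[] {3, 40}, new int[] {10, 42}, 2));
    }
}
```

In this model the position check runs once per document that survives the doc-id intersection, so its cost grows with the total number of positions rather than the number of documents.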
For "A,sg" and "A,pl" I use unordered SpanNearQueries with
On Fri, Oct 18, 2013 at 1:19 PM, Igor Shalyminov
wrote:
> OK, it turns out that DirectPostingsFormat is really an extreme thing: 8GB of
> index couldn't fit into 20+ java heap.
> I wonder if there is a postings format that works from disk the standard way
> but uses no compression?
Yes, it's v
Unfortunately, SpanNearQuery is a very costly query. What slop are you passing?
You might want to check out
https://issues.apache.org/jira/browse/LUCENE-5288 ... it adds
proximity boosting to queries, but it's still very early in the
iterating, and if you need a precise count of only those docume
Hello!
OK, it turns out that DirectPostingsFormat is really an extreme thing: 8GB of
index couldn't fit into 20+ java heap.
I wonder if there is a postings format that works from disk the standard way
but uses no compression?
--
Best Regards,
Igor
18.10.2013, 02:06, "Igor Shalyminov" :
> Mik
Mike,
For now I'm running just a SpanQuery over a ~600MB index segment in a single
thread (one segment per thread; the complete setup is 30 segments totaling
20GB).
I'm trying to use Lucene for a morphologically annotated text corpus (namely,
the Russian National Corpus).
The main query
DirectPostingsFormat holds all postings in RAM, uncompressed, as
simple java arrays. But it's quite RAM heavy...
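To make the RAM-versus-CPU trade-off concrete, here is a minimal sketch of delta plus variable-byte encoding of a sorted doc-id list. It is an illustration only: the class and method names are hypothetical, and the encodings used by Lucene's actual postings formats differ. Keeping plain `int[]` arrays costs 4 bytes per entry but needs no decoding; the encoded form is smaller but must be decoded on every traversal.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch; not the encoding of any particular Lucene codec. */
final class PostingsEncodingSketch {

    /** Delta + variable-byte encode a sorted list of non-negative doc ids. */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int delta = docId - previous;
            previous = docId;
            // Emit 7 bits per byte; the high bit marks "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Decode `count` doc ids produced by encode(). */
    static int[] decode(byte[] bytes, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int n = 0; n < count; n++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            docIds[n] = previous;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = {5, 300, 301};
        byte[] packed = encode(docIds);
        // Raw int[] storage would take docIds.length * 4 bytes.
        System.out.println(packed.length + " bytes encoded vs " + docIds.length * 4 + " raw");
    }
}
```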
The hotspots may also be in the queries you are running ... maybe you
can describe more how you're using Lucene?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Oct 17, 2013
Hello!
I've tried two approaches: 1) RAMDirectory, 2) MMapDirectory + tmpfs. Both
perform the same for me (equally badly :( ).
Thus, I think my problem is not disk access (although I always see getPayload()
at the top of the VisualVM profile).
So, maybe the hard part in the postings traversal is decompression?
Are
I don't think you want to load indexes of this size into a RAMDirectory.
The reasons have been listed multiple times here... in short, just use
MMapDirectory.
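For readers unfamiliar with what memory mapping means at the JDK level, here is a small sketch of the underlying mechanism using only `java.nio`. It is not MMapDirectory's code; the class name, helper names, and temp file are assumptions for illustration. The point is that a mapped file is served from the OS page cache rather than copied onto the Java heap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of plain JDK memory mapping; not Lucene code. */
final class MmapSketch {

    /** Maps the whole file read-only and copies its bytes out of the mapping. */
    static byte[] readViaMmap(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes, so a
            // multi-gigabyte file would have to be mapped in several chunks.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] copy = new byte[mapped.remaining()];
            mapped.get(copy);
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes text to a temp file, reads it back through a mapping. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("mmap-sketch", ".bin");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return new String(readViaMmap(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("postings"));
    }
}
```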
On Wed, Oct 9, 2013 at 3:17 PM, Igor Shalyminov
wrote:
> Hello!
>
> I need to perform an experiment of loading the entire index in RAM an
Hello!
I need to perform an experiment of loading the entire index in RAM and seeing
how the search performance changes.
My index has TermVectors with payload and position info, StoredFields, and
DocValues. It takes ~30GB on disk (the server has 48).
_indexDirectoryReader = DirectoryReader.open
Follow the instruction here:
http://lucene.apache.org/core/discussion.html
-- Jack Krupansky
-----Original Message-----
From: Noopur Julka
Sent: Monday, September 10, 2012 12:43 PM
To: java-user@lucene.apache.org
Cc: Dhananjeyan Balaretnaraja
Subject: Re: How to create a Lucene in-memory
unsubscribe me pls
Regards,
Noopur Julka
On Fri, Sep 7, 2012 at 10:40 AM, Kasun Perera wrote:
> I have a web java/jsp application running on Apache Tomcat server. In this
> web application I have used Lucene to index and calculate similarity
> between some PDF documents (PDF documents are in
You can do stuff with scopes and contexts and web.xml and whatever
(google something like tomcat application scope). Or use some static
classes or singletons to look after the single index.
--
Ian.
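The singleton suggestion above could look roughly like the following. This is a sketch under stated assumptions: `SharedIndex` and its `Map`-backed storage are hypothetical stand-ins for whatever in-memory index the application builds, not Lucene classes. It uses the initialization-on-demand holder idiom, so the instance is created once, lazily, and in a thread-safe way.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the application's single in-memory index. */
final class SharedIndex {

    // Placeholder storage; a real application would keep its index object here.
    private final Map<String, String> docs = new ConcurrentHashMap<>();

    private SharedIndex() {}

    /** The JVM initializes Holder (and INSTANCE) on first access to get(). */
    private static final class Holder {
        static final SharedIndex INSTANCE = new SharedIndex();
    }

    static SharedIndex get() {
        return Holder.INSTANCE;
    }

    void add(String id, String text) {
        docs.put(id, text);
    }

    String lookup(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        SharedIndex.get().add("doc-1", "some extracted PDF text");
        // Every caller sees the same instance and therefore the same contents.
        System.out.println(SharedIndex.get().lookup("doc-1"));
    }
}
```

Note that a static singleton is scoped to the class loader that loaded it, so in a servlet container each deployed web application would get its own instance.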
On Fri, Sep 7, 2012 at 6:10 AM, Kasun Perera wrote:
> I have a web java/jsp application running
I have a web java/jsp application running on Apache Tomcat server. In this
web application I have used Lucene to index and calculate similarity
between some PDF documents (PDF documents are in the database). My live
server doesn't allow the web-app to access files, so I have created the
in-memory lucen