Hi All
I have been experimenting with parent/child relation code in Apache
Lucene using ToParentBlockJoinQuery. Can anyone explain to me what will
happen if I don't add the parent query on the IndexSearcher and simply
search by the childQuery? I tried it using two docs and it gives an
equal score to both docs.
Thanks
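For reference, a minimal sketch of the block-join setup being asked about, assuming Lucene 4.x APIs; the field names and the choice of ScoreMode are illustrative, not from the original mail:

    import java.util.Arrays;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.join.FixedBitSetCachingWrapperFilter;
    import org.apache.lucene.search.join.ScoreMode;
    import org.apache.lucene.search.join.ToParentBlockJoinQuery;

    class BlockJoinSketch {
      // Index a child and its parent as one block; the parent must be LAST.
      static void indexBlock(IndexWriter writer) throws Exception {
        Document child = new Document();
        child.add(new StringField("skill", "java", Store.NO));
        Document parent = new Document();
        parent.add(new StringField("type", "parent", Store.NO));
        writer.addDocuments(Arrays.asList(child, parent));
      }

      static ToParentBlockJoinQuery joinQuery() {
        // Filter identifying the parent doc inside each block.
        Filter parents = new FixedBitSetCachingWrapperFilter(
            new QueryWrapperFilter(new TermQuery(new Term("type", "parent"))));
        // Child hits are joined up to their parents; ScoreMode decides how
        // the child scores are combined into the parent's score.
        return new ToParentBlockJoinQuery(
            new TermQuery(new Term("skill", "java")), parents, ScoreMode.Avg);
      }
    }

Searching with the raw child query alone (no join) simply returns the child documents themselves, scored like any ordinary term query, which would explain two identical children receiving identical scores.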
A correction on the system details posted earlier...
Windows runs on: 1 CPU, 1 core, 8 GB RAM
Linux runs on: 1 CPU, 2 cores, 8 GB RAM
Mike,
More info -
Windows on average takes 192 ms for 1 thread to index 100 JSON documents
Linux on average takes 711 ms for 1 thread to index 100 JSON documents
(same set of data)
We have set the heap size to 124 MB in both cases, and both run on JDK 7
Windows runs on : 2CPU,
I would be fine with throwing a parse exception or excluding the particular
clause. I will look at the StandardQueryNodeProcessorPipeline as well as
Hoss' suggestion. Thank you very much!
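A rough sketch of that pipeline approach, assuming the flexible query parser in Lucene 4.x; the processor class and the whitelist below are hypothetical:

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
    import org.apache.lucene.queryparser.flexible.core.nodes.FieldQueryNode;
    import org.apache.lucene.queryparser.flexible.core.nodes.QueryNode;
    import org.apache.lucene.queryparser.flexible.core.processors.QueryNodeProcessorImpl;
    import org.apache.lucene.queryparser.flexible.messages.MessageImpl;
    import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
    import org.apache.lucene.queryparser.flexible.standard.processors.StandardQueryNodeProcessorPipeline;

    // Hypothetical processor: throws a parse-time exception for fields
    // outside a whitelist (silently excluding the clause would also work).
    class FieldWhitelistProcessor extends QueryNodeProcessorImpl {
      private final Set<String> allowed;

      FieldWhitelistProcessor(Set<String> allowed) {
        this.allowed = allowed;
      }

      @Override
      protected QueryNode preProcessNode(QueryNode node) throws QueryNodeException {
        if (node instanceof FieldQueryNode) {
          String field = ((FieldQueryNode) node).getFieldAsString();
          if (!allowed.contains(field)) {
            throw new QueryNodeException(
                new MessageImpl("field not allowed: " + field));
          }
        }
        return node;
      }

      @Override
      protected QueryNode postProcessNode(QueryNode node) {
        return node;
      }

      @Override
      protected List<QueryNode> setChildrenOrder(List<QueryNode> children) {
        return children;
      }

      // Usage: append the processor to the parser's default pipeline.
      static StandardQueryParser restrictedParser() {
        StandardQueryParser parser = new StandardQueryParser();
        ((StandardQueryNodeProcessorPipeline) parser.getQueryNodeProcessor())
            .add(new FieldWhitelistProcessor(
                new HashSet<String>(Arrays.asList("title", "body"))));
        return parser;
      }
    }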
On Thu, Feb 20, 2014 at 4:20 AM, Trejkaz wrote:
> On Thu, Feb 20, 2014 at 1:43 PM, Jamie Johnson wrote:
Yes, all postings for the entire doc are held in RAM data structures
... you could make your own indexing chain to somehow change this
behavior, but I don't think that's an easy task.
Mike McCandless
http://blog.mikemccandless.com
On Thu, Feb 20, 2014 at 4:02 PM, Igor Shalyminov
wrote:
> Mike,
Mike, thank you!
So eventually this amount of data must stay entirely in RAM (as postings)
before flushing to disk?
Can it be hacked?)
The documents themselves (that I will deliver to user) are of a regular size,
but features that I generate grow combinatorially in size and blow the index up
i
Yes, in 4.x IndexWriter now takes an Iterable that enumerates the
fields one at a time.
You can also pass a Reader to a Field.
That said, there will still be massive RAM required by IW to hold the
inverted postings for that one document, likely much more RAM than the
original document's String co
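A minimal sketch of both points, assuming Lucene 4.x; the field names and the source of the Readers are made up:

    import java.io.IOException;
    import java.io.Reader;
    import java.io.StringReader;
    import java.util.Arrays;

    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexableField;

    class StreamingFieldsSketch {
      // addDocument accepts any Iterable<? extends IndexableField>, so the
      // fields can be handed over one at a time instead of materializing a
      // huge Document; a Reader-valued field avoids one big String as well.
      static void addHugeDoc(IndexWriter writer, Reader featureReader)
          throws IOException {
        Iterable<IndexableField> doc = Arrays.<IndexableField>asList(
            new TextField("body", new StringReader("regular-sized text")),
            new TextField("features", featureReader));
        writer.addDocument(doc);
      }
    }

A custom Iterable could generate the feature fields lazily, though, as noted above, the inverted postings for the whole document still accumulate in RAM before they are flushed.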
Hello!
I've faced a problem indexing huge documents. The indexing itself goes
all right, but when the document processing becomes concurrent, OutOfMemoryErrors
start appearing (even with a heap of about 32 GB).
The issue, as I see it, is that I have to create a Document instance to send it
to IndexW
Hi Geet,
I suggest you do this kind of transformation at query time only. Don't
interfere with the index. This way is more flexible: you can disable/enable it
on the fly and change your list without re-indexing.
Just an imaginary example: when the user passes the String International
Businessma
Thanks again for the help. Upon further investigation I found out we weren't
using our custom version of the analyzer, which explains why it wasn't doing
what I thought it should. When I have time to get back to it I'll reconfigure
it to use our tokenizer.
Diego Fernandez - 爱国
Software Engineer
I've created https://issues.apache.org/jira/browse/LUCENE-5461 and
attached a small test that shows the error in a setup similar to what I
would like to run.
The 1% is an overestimation - it seems to be related to concurrent commits on
the index writer.
Hans Lund
On Thu, Feb 20, 2014 at 2:04 PM, M
On Thu, Feb 20, 2014 at 7:52 AM, Hans Lund wrote:
> Ok, that's also what I expected, but not what I observed ;-)
Ahh, not good.
> For the vast majority of index updates, reopens are not an issue;
> minutes would be fine. A very few updates are done 'interactively' and
> must be in RT (or
Ok, that's also what I expected, but not what I observed ;-)
For the vast majority of index updates, reopens are not an issue;
minutes would be fine. A very few updates are done 'interactively' and
must be in RT (or as close as possible).
I don't know if this is a rare use case - but we do
Can you summarize what you observed?
What was the net throughput difference on Windows vs Linux?
Was everything else identical (same hardware, same JVM, etc.)?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Feb 20, 2014 at 5:23 AM, sree wrote:
> We tried a standalone program to index
If you already know the set of phrases you need to detect then you can
use Lucene's SynonymFilter to spot them and insert a new token.
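A small sketch of that approach, assuming Lucene 4.6-era analysis APIs; the phrase, the output token, and the analyzer chain are illustrative:

    import java.io.IOException;
    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.synonym.SynonymFilter;
    import org.apache.lucene.analysis.synonym.SynonymMap;
    import org.apache.lucene.util.CharsRef;
    import org.apache.lucene.util.Version;

    class PhraseTokenSketch {
      static Analyzer phraseAnalyzer() throws IOException {
        SynonymMap.Builder builder = new SynonymMap.Builder(true);
        // Multi-word inputs are joined with SynonymMap's word separator.
        CharsRef phrase = SynonymMap.Builder.join(
            new String[] {"international", "business", "machine"},
            new CharsRef());
        // keepOrig=true: the original words stay in the stream alongside
        // the inserted single token, so both forms remain searchable.
        builder.add(phrase, new CharsRef("ibm"), true);
        final SynonymMap map = builder.build();

        return new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String field,
                                                           Reader in) {
            StandardTokenizer source =
                new StandardTokenizer(Version.LUCENE_46, in);
            TokenStream stream = new LowerCaseFilter(Version.LUCENE_46, source);
            stream = new SynonymFilter(stream, map, true);
            return new TokenStreamComponents(source, stream);
          }
        };
      }
    }

With this chain, "International Business Machine logo" and "IBM logo" both produce the token "ibm" at the phrase position, so they match each other.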
Mike McCandless
http://blog.mikemccandless.com
On Thu, Feb 20, 2014 at 7:21 AM, Benson Margulies wrote:
> It sounds like you've been asked to implement Named E
It sounds like you've been asked to implement Named Entity Recognition.
OpenNLP has some capability here. There are also, um, commercial
alternatives.
On Thu, Feb 20, 2014 at 6:24 AM, Yann-Erwan Perio wrote:
> On Thu, Feb 20, 2014 at 10:46 AM, Geet Gangwar
> wrote:
>
> Hi,
>
> > My requirement
On Thu, Feb 20, 2014 at 10:46 AM, Geet Gangwar wrote:
Hi,
> My requirement is that it should be able to match multiple words as
> one token. For example, when a user passes the String International Business
> machine logo or IBM logo, it should return International Business Machine as
> one tok
It is intended that there are two different stale times.
When a specific generation is requested, we wait for the minStaleSec
since the last reopen; this is to prevent too-frequent reopens when
specific gens are requested.
The maxStaleSec is how long we wait between reopens for the "normal"
perio
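A minimal setup sketch showing where the two targets plug in, assuming Lucene 4.x; the numeric values are illustrative only:

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.TrackingIndexWriter;
    import org.apache.lucene.search.ControlledRealTimeReopenThread;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.SearcherManager;

    class NRTSketch {
      static void run(IndexWriter writer, Document doc)
          throws IOException, InterruptedException {
        TrackingIndexWriter tracking = new TrackingIndexWriter(writer);
        SearcherManager manager = new SearcherManager(writer, true, null);

        // maxStaleSec (5.0): longest wait between "normal" periodic reopens.
        // minStaleSec (0.025): shortest gap since the last reopen when a
        // caller waits on a specific generation, so interactive waits cannot
        // force reopens more often than this.
        ControlledRealTimeReopenThread<IndexSearcher> reopener =
            new ControlledRealTimeReopenThread<IndexSearcher>(
                tracking, manager, 5.0, 0.025);
        reopener.setDaemon(true);
        reopener.start();

        // An "interactive" update: track its generation, then block until a
        // reopen has made it visible.
        long gen = tracking.addDocument(doc);
        reopener.waitForGeneration(gen);
        IndexSearcher searcher = manager.acquire();
        try {
          // search with a searcher that is guaranteed to see `doc`
        } finally {
          manager.release(searcher);
        }
      }
    }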
We tried a standalone program to index 50 documents from 100 threads
concurrently. We executed the program for 1000 threads with a 10 min delay to
avoid the JVM warm-up issue, as you suggested in your last post.
Also we are running with a restricted heap size, i.e. 124 MB (as our product is
running in Linux with
Hi,
I have a requirement to write a custom tokenizer using the Lucene framework.
My requirement is that it should be able to match multiple words as
one token. For example, when a user passes the String International Business
machine logo or IBM logo, it should return International Business Machine
Hi all
I'm a bit unsure about the intended function of
the ControlledRealTimeReopenThread in an NRT context - especially regarding
stale times.
As of now, if you are waiting for a generation to become refreshed, it looks
like the stale time is either the min stale time or the max stale time. Is
thi
On Thu, Feb 20, 2014 at 1:43 PM, Jamie Johnson wrote:
> Is there a way to limit the fields a user can query by when using the
> standard query parser or a way to get all fields/terms that make up a query
> without writing custom code for each query subclass?
If you mean StandardQueryParser, you c