That’s great! I will look into it. Thanks a lot!
-Siraj
-Original Message-
From: Adrien Grand
Sent: Tuesday, November 5, 2024 11:19 AM
To: java-user@lucene.apache.org
Subject: Re: Indexing multiple numeric ranges
Hello Siraj,
You can do this by creating a Lucene document that has 3
org.apache.lucene.document.IntRange fields in it, one for each of the
ranges that you would like to index. Lucene will then match the document if
any of the ranges matches.
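A minimal sketch of what this can look like (the field name and bounds here
are illustrative, not from this thread):

Document doc = new Document();
// Three single-dimension IntRange fields under the same name.
doc.add(new IntRange("ranges", new int[] {1}, new int[] {3}));
doc.add(new IntRange("ranges", new int[] {12}, new int[] {20}));
doc.add(new IntRange("ranges", new int[] {13290}, new int[] {16509}));

// Matches the document if any of its ranges intersects [2, 2].
Query q = IntRange.newIntersectsQuery("ranges", new int[] {2}, new int[] {2});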
On Tue, Nov 5, 2024 at 5:16 PM Siraj Haider
wrote:
>
Hello,
Thanks Matt. I had also run a test dramatically increasing the LRU cache,
but in the end it was still better for our case to run with the previous
cache. We won't encounter the bug that the switch to LRU cache addresses
(for now). After returning to the previous cache implementation we
act
Marc,
We also ran into this problem on updating to Lucene 9.5. We found it
sufficient in our use case to just bump up LRU cache in the constructor to
a high enough value to not pose a performance problem. The default value
of 4k was way too low for our use case with millions of unique facet
valu
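For reference, a sketch of what bumping the cache size in the constructor can
look like (the size here is illustrative; tune it to your number of unique
facet values, and taxoDir is assumed to be your taxonomy directory):

DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(
    taxoDir, IndexWriterConfig.OpenMode.CREATE_OR_APPEND,
    new LruTaxonomyWriterCache(4_000_000)); // default is 4096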
Thanks for the follow-up, Marc. I'm not familiar with this part of the code,
but reading through the original issue that changed this, the rationale
was to avoid a memory leak from a thread local. The LRU cache has
synchronized blocks sprinkled all over it - again, I haven't checked but it
seems the ove
Hello,
I've done a git bisect between 9.4.2 and 9.5 and found the PR affecting my
particular setup: https://github.com/apache/lucene/pull/12093
This is the switch from UTF8TaxonomyWriterCache to an
LruTaxonomyWriterCache. I don't see a way to control the size of this
cache to never expel items and ma
Hello,
Thanks for the leads. I haven't yet gone as far as doing a git bisect, but
I have found that the big jump in time is in the call to
facetsConfig.build(taxonomyWriter, doc); I made a quick and dirty
instrumented version of the FacetsConfig class and found that calls to
TaxonomyWriter.add(Fac
Hi Marc,
You could try to git bisect the Lucene repository to pinpoint the commit that
caused what you're observing. It'll take some time to build, but it's a
logarithmic bisection and you'd know for sure where the problem is.
D.
On Thu, Apr 18, 2024 at 11:16 PM Marc Davenport
wrote:
> Hi Adrien et al
Does your application see a lot of document updates/deletes?
GITHUB#11761 could have potentially affected you. Whenever I see large
indexing times, my first suspicion is towards increased merge activity.
Regards,
Gautam Worah.
On Thu, Apr 18, 2024 at 2:14 PM Marc Davenport
wrote:
> Hi Adrien e
Hi Adrien et al,
I've been doing some investigation today and it looks like whatever the
change is, it happens between 9.4.2 and 9.5.0.
I made a smaller test set up for our code that mocks our documents and just
runs through the indexing portion of our code sending in batches of 4k
documents at a t
Hi Marc,
Nothing jumps to mind as a potential cause for this 2x regression. It would
be interesting to look at a profile.
On Wed, Apr 17, 2024 at 9:32 PM Marc Davenport
wrote:
> Hello,
> I'm finally migrating Lucene from 8.11.2 to 9.10.0 as our overall build can
> now support Java 11. The quick
Thank you for your fast response.
Yes, I have tried this. Actually, there is also a polygon created directly from
GeoJSON, which recommends doing the same, because it returns a Polygon array. But is
it the most efficient method of indexing spatial data? And the same for
MultiLine; they are also a type of Line array.
Hello!
Let's consider polygons. I imagine you are doing something like this to
index one polygon:
Polygon polygon = ...; // e.g. new Polygon(lats, lons), or parsed from GeoJSON
Document document = new Document();
Field[] fields = LatLonShape.createIndexableFields(FIELDNAME, polygon);
for (Field f : fields) {
  document.add(f);
}
So a multipolygon is
Thanks for bringing this up.
I will post it there.
Regards,
neo
Hi Neo,
You will likely find better help on the solr-user mailing-list. This
mailing list is for questions about Lucene.
On Wed, Apr 11, 2018 at 12:21 PM, neotorand wrote:
> With SolrCloud, what happens if indexing is partially completed and the ensemble
> goes down? What are the ways to resume? In one
e a transaction log in parallel to
> > indexing,
> > >> so they commit very seldom. If the system crashes, the changes are
> > replayed
> > >> from tranlog since last commit.
> > >>
> > >> Uwe
> > >>
> > >>
> >>
> >> -
> >> Uwe Schindler
> >> Achterdiek 19, D-28357 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >> > -Original Message-
> >> > From: Rob Audenaerde [mailto:rob.audenae...@gmail.c
>> > -----Original Message-
>> > From: Rob Audenaerde [mailto:rob.audenae...@gmail.com]
>> > Sent: Monday, January 29, 2018 11:29 AM
>> > To: java-user@lucene.apache.org
>> > Subject: Re: indexing performance 6.6 vs 7.1
>> >
>> >
we
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Rob Audenaerde [mailto:rob.audenae...@gmail.com]
> > Sent: Monday, January 29, 2018 11:29 AM
> > To
28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Rob Audenaerde [mailto:rob.audenae...@gmail.com]
> Sent: Monday, January 29, 2018 11:29 AM
> To: java-user@lucene.apache.org
> Subject: Re: indexing performance 6.6 vs 7.1
>
> H
Hi all,
Some follow up (sorry for the delay).
We built a benchmark in our application, and profiled it (on a smallish
data set). What we currently see in the profiler is that in Lucene 7.1 the
calls to `commit()` take much longer.
The self-time committing in 6.6: 3,215 ms
The self-time committin
Robert:
Ah, right. I keep confusing my gmail lists
"lucene dev"
and
"lucene list"
Siiih.
On Thu, Jan 18, 2018 at 9:18 AM, Adrien Grand wrote:
> If you have sparse data, I would have expected index time to *decrease*,
> not increase.
>
> Can you enable the IW info stream and share
If you have sparse data, I would have expected index time to *decrease*,
not increase.
Can you enable the IW info stream and share flush + merge times to see
where indexing time goes?
If you can run with a profiler, this might also give useful information.
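For reference, enabling the IW info stream is a one-liner on the config (a
sketch; analyzer and the rest of the setup are assumed):

IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
// Logs flush and merge activity, showing where indexing time goes.
iwc.setInfoStream(new PrintStreamInfoStream(System.out));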
On Thu, Jan 18, 2018 at 11:23 AM, Rob Aude
Erick, I don't think Solr was mentioned here.
On Thu, Jan 18, 2018 at 8:03 AM, Erick Erickson wrote:
> My first question is always "are you running the Solr CPUs flat out?".
> My guess in this case is that the indexing client is the same and the
> problem is in Solr, but it's worth checking whethe
My first question is always "are you running the Solr CPUs flat out?".
My guess in this case is that the indexing client is the same and the
problem is in Solr, but it's worth checking whether the clients are
just somehow not delivering docs as fast as they were before.
My suspicion is that the in
I converted the Date into milliseconds and stored the long in the index; this
helped me convert the searched value back into any date format later in the
output.
On Wed, Apr 5, 2017 at 6:08 PM, Frederik Van Hoyweghen <
frederik.vanhoyweg...@chapoo.com> wrote:
> Hey everyone,
>
> I'm seeing some conflicting su
;
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: aravinth thangasami [mailto:aravinththangas...@gmail.com]
> > Sent: Friday, April 7, 2017 8:54 AM
>
phi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: aravinth thangasami [mailto:aravinththangas...@gmail.com]
> Sent: Friday, April 7, 2017 8:54 AM
> To: java-user@lucene.apache.org
> Subject: Re: Indexing Numeric value in Lucene 4.10.4
>
> we don't have to sort o
We don't have to sort on that field,
so that's why we thought of that approach.
Thanks for your opinion;
we will consider improving the precision step.
Kind regards,
Aravinth
On Thu, Apr 6, 2017 at 8:51 PM, Erick Erickson
wrote:
> bq: What are your opinions on this?
>
> That this is not a sound approach. Why
bq: What are your opinions on this?
That this is not a sound approach. Why do you think Trie is expensive?
What evidence do you have at all for that? Strings are significantly
expensive relative to numeric fields. Plus, you can adjust the
precision step to reduce the "overhead" of a trie field.
I
n Hoyweghen
> [mailto:frederik.vanhoyweg...@chapoo.com]
> Sent: Wednesday, April 5, 2017 3:17 PM
> To: java-user@lucene.apache.org
> Subject: Re: Indexing a Date/DateTime/Time field in Lucene 4
>
> Let's say I want to search between 2 dates, search for a date that's
> before/after a
Let's say I want to search between 2 dates, search for a date that's
before/after another, etc. (the usual stuff ^^ ). Is this all possible with
either field type?
Thanks for your reply!
Frederik
On Wed, Apr 5, 2017 at 3:04 PM, Adrien Grand wrote:
> Hi Frederik,
>
> Both options would work but LongField (
Hi Frederik,
Both options would work but LongField (or LongPoint on Lucene 6.0+) would
indeed provide better performance for range queries. If you need to sort or
aggregate date values, you might also want to add a NumericDocValuesField.
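A sketch of that combination on Lucene 6.0+ (the field name is illustrative;
millis, fromMillis and toMillis are parsed dates as epoch milliseconds):

Document doc = new Document();
doc.add(new LongPoint("date", millis));             // range/exact queries
doc.add(new NumericDocValuesField("date", millis)); // sorting/aggregations

// "Between two dates" then becomes:
Query q = LongPoint.newRangeQuery("date", fromMillis, toMillis);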
On Wed, Apr 5, 2017 at 2:38 PM, Frederik Van Hoyweghen <
frede
Hi,
Any better architecture ideas for my use case mentioned below?
Regards,
Suriya
On Wed, 28 Dec 2016 at 11:27 PM, suriya prakash wrote:
> Hi,
>
> I have 100 thousand indexes in a Hadoop grid because 90% of my indexes will
> be inactive and I can distribute the other active indexes based on loa
So when a query arrives, you know the query is only allowed to match
either module:1 (analyzed terms) or module:2 (not analyzed) but never
both? If so, you should be fine.
Though relevance will be sort of wonky, in case that matters, because
you are polluting the unique term space; you would get
You can do this, Lucene will let you, but it's typically a bad idea
for search relevance because some documents will return only if you
search for precisely the same whole token, others if you search for an
analyzed token, giving the user a broken experience.
Mike McCandless
http://blog.mikemcca
Hi All,
Can anyone say whether it is advisable to have an index with both analyzed and
not-analyzed values in one field?
Use case: I have custom fields in my product which can be configured
differently (ANALYZED and NOT_ANALYZED) in different modules.
--
Kumaran R
On Wed, Oct 26, 2016 at 12:0
Hi Rajnish,
It is not advisable to index values with two data types in a field.
Features like phrase queries and sorting may break in those indexes.
related previous discussion :
http://www.gossamer-threads.com/lists/lucene/java-user/289159?do=post_view_flat#289159
-
Kumaran R
On Fri, Nov 4,
OK Mike, thanks for the explanation. I have another doubt:
I read in some article that we can have one stored field & doc-values field
with the same field name. Is it so?
--
Kumaran R
On Thu, Jul 28, 2016 at 9:29 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> OK, sorry, you cann
OK, sorry, you cannot change how the field is indexed for the same field
name across different field indices.
Lucene will "downgrade" that field to the lowest settings, e.g. "docs, no
positions" in your case.
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jul 28, 2016 at 9:31 AM, Kumara
Hi Mike,
For your information, I am using Lucene 4.10.4. Am I missing anything?
--
Kumaran R
On Wed, Jul 27, 2016 at 1:52 AM, Kumaran Ramasubramanian wrote:
>
> Hi Mike,
>
> 1. If we index one field as analyzed and not analyzed using the same name,
> phrase queries are not working (field "co
Hi Mike,
1. If we index one field as analyzed and not analyzed using the same name,
phrase queries are not working ("field "comp" was indexed without position
data, cannot run PhraseQuery") for analyzed terms too, because the indexed
document's term properties are not proper; even if tokenized, not able t
On Sat, Jul 23, 2016 at 4:48 AM, Kumaran Ramasubramanian wrote:
> Hi Mike,
>
> *Two different fields can be the same name*
>
> Is it so? You mean we can index one field as a doc-values field and also a stored
> field, using the same name?
>
This should be fine, yes.
> And AFAIK, We cannot index one field
Hi Mike,
*Two different fields can be the same name*
Is it so? You mean we can index one field as a doc-values field and also a stored
field, using the same name?
And AFAIK, we cannot index one field as analyzed and not analyzed using the
same name. Am I right?
Kumaran R
On Jul 21, 2016 11:50 PM, "Michae
Two different fields can be the same name.
I think the problem is that you are indexing it as doc values, which is not
searchable.
To make your numeric fields searchable, use e.g. LongPoint (as of Lucene
6.0) or LongField (before 6.0).
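A sketch of the fix (the field name is illustrative): keep the doc-values
field for sorting/faceting and add a point field with the same name for search.

doc.add(new NumericDocValuesField("price", value)); // not searchable by itself
doc.add(new LongPoint("price", value));             // makes it searchable
Query q = LongPoint.newExactQuery("price", value);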
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jul
Use threads, only commit at the end (and use a near-real-time reader if you
want to search at points-in-time), increase IW's indexing buffer.
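A sketch of those settings (the buffer size is illustrative; dir and analyzer
are assumed):

IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
iwc.setRAMBufferSizeMB(512); // larger indexing buffer
IndexWriter writer = new IndexWriter(dir, iwc);
// ... index from multiple threads; IndexWriter is thread-safe ...
DirectoryReader reader = DirectoryReader.open(writer); // near-real-time reader
writer.commit(); // only at the end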
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 4:37 PM, Nomar Morado wrote:
> Hi
>
> I am trying to write 15 million documents (a
Actually Lucene terms can be arbitrary/fully binary tokens in the
low-level postings APIs.
It's just that our analysis APIs are geared towards analyzing text,
but using StringField you can easily index an arbitrary single-token
byte[].
Mike McCandless
http://blog.mikemccandless.com
On Tue, Sep
You are correct that Lucene only works with text (no binary or primitives);
Base64 would be the way I would suggest.
On Monday, August 31, 2015 11:19 AM, Dan Smith wrote:
What's the best way to index binary data in Lucene? I'm adding a Lucene
index to a key value store, and I want t
Aha! My version of Lucene was out of date. That should work perfectly.
Thanks,
-Dan
Original message
From: Michael McCandless
Date:08/31/2015 12:57 PM (GMT-08:00)
To: Lucene Users , dsm...@pivotal.io
Cc:
Subject: Re: Indexing a binary field
StringField now also
StringField now also takes a BytesRef value to index, so you can index
a single binary token that way. Does that work?
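A sketch of indexing and looking up a binary key this way (the field name is
illustrative):

byte[] key = ...; // the raw binary value
Document doc = new Document();
doc.add(new StringField("key", new BytesRef(key), Field.Store.NO));

// Exact-match lookup:
Query q = new TermQuery(new Term("key", new BytesRef(key)));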
Mike McCandless
http://blog.mikemccandless.com
On Mon, Aug 31, 2015 at 12:19 PM, Dan Smith wrote:
> What's the best way to index binary data in Lucene? I'm adding a Lucene
>
Hi, folks!
This is not a trivial question, but I appeal to your experience with Lucene...
Lucene Implementation Version: 2.9.1
Solr Implementation Version: 1.4
Java version: 1.6
This is a legacy environment with a huge amount of indexed data. The main
question that I encountered a few days ago was
I think if you follow the Field.fieldType().numericType() chain you'll
end up with INT or DOUBLE or whatever.
But if you know you stored it as an IntField then surely you already
know it's an integer? Unless you sometimes store different things in
the one field. I wouldn't do that.
--
Ian.
O
You could store the length of the field (in terms) in a second field and
then add a MUST term to the BooleanQuery which is a RangeQuery with an
upper bound that is the maximum length that can match.
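A sketch of that idea with today's point fields (the field names and values
are hypothetical; the original suggestion predates IntPoint):

// At index time: record the token count of the field.
doc.add(new IntPoint("len", tokenCount));

// At query time: require the length alongside the term clause.
BooleanQuery.Builder b = new BooleanQuery.Builder();
b.add(new TermQuery(new Term("f", "foo")), BooleanClause.Occur.MUST);
b.add(IntPoint.newRangeQuery("len", 0, 2), BooleanClause.Occur.MUST);
Query q = b.build();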
-- Jack Krupansky
On Wed, Feb 18, 2015 at 4:54 AM, Ian Lea wrote:
> You mean you'd like a Boolea
Oops, alright, I'll probably look around for a workaround.
On Wed, Feb 18, 2015 at 3:24 PM, Ian Lea wrote:
> You mean you'd like a BooleanQuery.setMaximumNumberShouldMatch()
> method? Unfortunately that doesn't exist and I can't think of a
> simple way of doing it.
>
>
> --
> Ian.
>
>
> On Wed,
You mean you'd like a BooleanQuery.setMaximumNumberShouldMatch()
method? Unfortunately that doesn't exist and I can't think of a
simple way of doing it.
--
Ian.
On Wed, Feb 18, 2015 at 5:26 AM, Deepak Gopalakrishnan wrote:
> Thanks Ian. Also, if I have a unigram in the query, and I want to ma
Thanks Ian. Also, if I have a unigram in the query, and I want to make sure
I match only index entries that do not have more than 2 tokens, is there a
way to do that too?
Thanks
On Wed, Feb 18, 2015 at 2:23 AM, Ian Lea wrote:
> Break the query into words then add them as TermQuery instances as
Break the query into words then add them as TermQuery instances as
optional clauses to a BooleanQuery with a call to
setMinimumNumberShouldMatch(2) somewhere along the line. You may want
to do some parsing or analysis on the query terms to avoid problems of
case matching and the like.
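A sketch of that approach (the field name and naive whitespace split stand in
for real analysis; BooleanQuery.Builder is the current API):

BooleanQuery.Builder b = new BooleanQuery.Builder();
for (String word : queryText.toLowerCase().split("\\s+")) {
  b.add(new TermQuery(new Term("f", word)), BooleanClause.Occur.SHOULD);
}
b.setMinimumNumberShouldMatch(2); // at least two words must match
Query q = b.build();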
--
Ian.
Thank you Uwe!
Your reply is very useful and insightful. Your workflow matches my
requirements exactly.
My confusion came from the fact that I didn't understand what the
Analyzers are doing. Actually, I am still wondering: isn't it possible to
provide an abstraction on the Lucene side to make th
An example why you might do this is if your input is a term vector (ie
a list of unique terms with weights) rather than a text in the usual
sense. It does seem as if the best way forward in this case is to
generate a text with repeated terms. I looked at the alternative and it
is quite invol
You could consider payloads but why do you want to do this?
What's the use case here? Sounds a little like an XY problem, you're
asking us how to do something without explaining the why; there
may be other ways to accomplish your task.
For instance, there's the "termfreq" function, which can be ret
Hi,
> OK. I found the Alfresco code on GitHub. So it's open source it seems.
>
> And I found the DateTimeAnalyser, so I will just take that code as a starting
> point:
> https://github.com/lsbueno/alfresco/tree/master/root/projects/repository/
> source/java/org/alfresco/repo/search/impl/lucene/an
OK. I found the Alfresco code on GitHub. So it's open source it seems.
And I found the DateTimeAnalyser, so I will just take that code as a
starting point:
https://github.com/lsbueno/alfresco/tree/master/root/projects/repository/source/java/org/alfresco/repo/search/impl/lucene/analysis
Thank you
Thank you Barry, I really appreciate your taking the time to respond.
Let me clarify this a little bit more; I think it was not clear.
I know how to parse dates; this is not the question here. (See my previous
email: "how can I pipe my converter logic into the indexing process?")
All of your solutions guys
Hi Gergely,
Writing an analyzer would work but it is unnecessarily complicated. You
could just parse the date from the string in your input code and index it
in the LongField like this:
SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S'Z'");
format.setTimeZone(TimeZone.getTimeZone("UTC")); // assuming UTC for the 'Z' literal
Thank you for taking your time to respond Karthik,
Can you show me an example how to convert DateTime to milliseconds? I mean
how can I pipe my converter logic into the indexing process?
I suspect I need to write my own Analyzer/Tokenizer to achieve this. Is
this correct?
2015-02-09 22:58 GMT+09
Hi
A long time ago, I used to store datetimes in milliseconds.
TermRangeQuery used to work in perfect condition.
Convert all datetimes to milliseconds and index the same.
On the search side, again convert the datetime to milliseconds and use
TermRangeQuery.
With regards
Karthik
On Feb 9, 2015 1:24
Thank you for the great answer Uwe!
Sadly my department rejected the above combination of using Logstash +
Elasticsearch. According to their experience, Elasticsearch works fine on
about 3 days of log data, but slows down terribly given on the order
of 3 months of data or so.
But I will tak
Hi,
> I am in the beginning of implementing a Lucene application which would
> supposedly search through some log files.
>
> One of the requirements is to return results between a time range. Let's say
> these are two lines in a series of log files:
> 2015-02-08 00:02:06.852Z INFO...
> ...
> 2015
Basically there is a stored fork and an indexed fork.
If you specify the input should be stored, a verbatim
copy is put in a special segment file with the
extension .fdt.
This is entirely orthogonal to indexing the tokens,
which are what search operates on.
So you can store and index, store but n
6 PM
> To: java-user@lucene.apache.org
> Subject: Re: Indexing Error
>
> Looks like the version of Lucene on your runtime classpath is not the same as
> the one you compiled against. In some recent version they changed the
> naming convention in those fields. You may want:
Hi,
This generally happens if you have an older version of Lucene somewhere in
your classpath, e.g., if an older Lucene was placed outside of the webapp,
somewhere in the classpath of WebSphere itself.
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...
Looks like the version of Lucene on your runtime classpath is not the same as
the one you compiled against. In some recent version they changed the naming
convention in those fields. You may want: ‘LUCENE_4_7_0’.
https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/util/Version.html
The second solution sounds great and a lot more natural than payloads.
I know how to overwrite the Similarity class but this one would only be
called at search time and then already use the existing term frequency.
Looking up the probabilities every time a search is performed is
probably also
There are a few approaches possible here, we had a similar use case and
went for the second one below. I primarily deal with Solr, so I don't know
of Lucene-only examples, but hopefully you can dig this up..
(1) You can attach payloads to each occurrence of the tag, and modify the
scoring to use t
4 3:20 PM
> To: java-user@lucene.apache.org
> Subject: Re: indexing json
>
> Elasticsearch does what I need, but I'd like to avoid bringing all the cluster
> management bits along with it. I will take a look at siren
>
> thanks.
>
>
> On Thu, Sep 4, 2014 at
Elasticsearch does what I need, but I'd like to avoid bringing all the
cluster management bits along with it. I will take a look at siren
thanks.
On Thu, Sep 4, 2014 at 8:11 AM, Marcio Napoli
wrote:
> Hey!
>
> Elasticsearch Is a good option and uses Lucene as core :)
>
> http://www.elasticsear
Hey!
Elasticsearch is a good option and uses Lucene at its core :)
http://www.elasticsearch.org/overview/elasticsearch/
[]s
Napoli
http://numere.stela.org.br
2014-09-04 7:46 GMT-03:00 Larry White :
> Hi,
>
> Is there a way to index an entire json document automatically as one can do
> with the
On 9/4/2014 6:46 AM, Larry White wrote:
Hi,
Is there a way to index an entire json document automatically as one can do
with the new PostgreSQL json support? By automatically, I mean to create an
inverted index entry (path: value) for each element in the document without
having to specify in adv
the ngram token filter, and then a query of 512 would match by itself:
>> http://lucene.apache.org/core/4_9_0/analyzers-common/org/
>> apache/lucene/analysis/ngram/NGramTokenFilter.html
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Erick Erick
-- From: Erick Erickson
> Sent: Thursday, August 28, 2014 11:52 PM
> To: java-user
> Subject: Re: indexing all suffixes to support leading wildcard?
>
>
> The "usual" approach is to index to a second field but backwards.
> See ReverseStringFilter... Then all your
: java-user
Subject: Re: indexing all suffixes to support leading wildcard?
The "usual" approach is to index to a second field but backwards.
See ReverseStringFilter... Then all your leading wildcards
are really trailing wildcards in the reversed field.
Best,
Erick
On Thu, Aug 28,
The "usual" approach is to index to a second field but backwards.
See ReverseStringFilter... Then all your leading wildcards
are really trailing wildcards in the reversed field.
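A sketch of an analyzer for the reversed field (assuming one token per value,
as with phone numbers; field names are illustrative):

Analyzer reversed = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    KeywordTokenizer tok = new KeywordTokenizer();
    return new TokenStreamComponents(tok, new ReverseStringFilter(tok));
  }
};
// Index each value into both "phone" and "phone_rev" (the latter with this
// analyzer), then rewrite *1234 into the trailing wildcard 4321* on "phone_rev".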
Best,
Erick
On Thu, Aug 28, 2014 at 10:38 AM, Rob Nikander
wrote:
> Hi,
>
> I've got some short fields (phone nu
Again, because merging is based on byte size, you have to be careful how
you measure (hint: use LogDocMergePolicy).
Otherwise you are comparing apples and oranges.
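For reference, a sketch of switching to doc-count-based merging for such
comparisons (iwc is the IndexWriterConfig):

iwc.setMergePolicy(new LogDocMergePolicy()); // merge by doc count, not bytes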
Separately, your configuration is using experimental codecs like
"disk"/"memory" which aren't as heavily benchmarked etc. as the defaul
al Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Saturday, June 14, 2014 6:27 AM
To: java-user
Subject: Re: Indexing size increase 20% after switching from lucene 4.4 to 4.5
or 4.8 with BinaryDocValuesField
They are still encoded the same way: so likely you arent testing ap
They are still encoded the same way, so likely you aren't testing apples to
apples (e.g. different number of segments or whatever).
On Fri, Jun 13, 2014 at 8:28 PM, Zhao, Gang wrote:
>
>
> I used lucene 4.4 to create index for some documents. One of the indexing
> fields is BinaryDocValuesField.
It all depends on the statistics: how the ranges are correlated. If the
integer range is small: from 1-2, for example, you might consider
indexing every integer in each range as a separate value, especially if
most documents will only have a small number of small ranges.
If there are too
Hi,
Continuing your example, you could do the following:
Document:
range1_from:1
range1_to:3
range2_from:12
range2_to:20
range3_from:13290
range3_to:16509
... other fields...
Query (for "2"):
(+range1_from:[* TO 2] +range1_to:[2 TO *]) OR
(+range2_from:[* TO 2] +range2_to:[2 TO *]) OR
(+ran
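Programmatically, one from/to pair of that query could be sketched like this
(a hedged sketch assuming the values are indexed as IntPoint; the original
predates point fields):

BooleanQuery.Builder pair = new BooleanQuery.Builder();
pair.add(IntPoint.newRangeQuery("range1_from", Integer.MIN_VALUE, 2), BooleanClause.Occur.MUST);
pair.add(IntPoint.newRangeQuery("range1_to", 2, Integer.MAX_VALUE), BooleanClause.Occur.MUST);

BooleanQuery.Builder any = new BooleanQuery.Builder();
any.add(pair.build(), BooleanClause.Occur.SHOULD);
// ... add the range2/range3 pairs the same way ...
Query q = any.build();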
Hello Arjen van der Meijden,
If it's not too much trouble, can you point me to any sites with example
implementations on Neo4j for a problem similar to mine?
I want to check whether Neo4j resolves all my problems; as this is a new technology I
need to do a lot of research, and a few examples will be a good st
Given that he is already using Java, simply building an object tree based
on the text file may also be possible, although a 300MB file may turn
out to be fairly large in memory consumption (possibly caused by quite a
bit of object overhead).
If that turns out to consume too much memory there ar
Hello!
To me, Lucene doesn't sound like a good solution to this problem.
It seems to me that you need a classic relational database. Storing a tree
structure in relational DBs isn't a simple thing, but this presentation will
help you:
http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back (slides
Thanks Doug,
I have gone through SirenDB. Unfortunately I couldn't find enough examples
that I could match to my requirements. Could you point me to any examples
involving tree structures represented in text files?
regards,
Girish Durgasi
--
View this message in context:
http://lucene.472066.
Hey,
You might want to check out SirenDB, a set of Lucene and Solr plugins for
advanced nested/tree support. They even have a custom codec for nested docs.
We've been pretty interested in it here at OpenSource Connections
http://sirendb.com/
Sent from Windows Mail
From: kumagiris
March 2014 at 16:01
From: "Uwe Schindler"
To: java-user@lucene.apache.org
Subject: RE: Indexing and storing very large documents
Stored fields do not support Readers at the moment.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u
Ah, OK, so I cannot use PostingsHighlighter as it requires stored fields, right?
Regards
Mirko
Sent: Monday, March 24, 2014 at 16:01
From: "Uwe Schindler"
To: java-user@lucene.apache.org
Subject: RE: Indexing and storing very large documents
Stored fields do not support
Stored fields do not support Readers at the moment.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Mirko Sertic [mailto:mirko.ser...@web.de]
> Sent: Monday, March 24, 2014 3:03 PM
> To: java-user@lucene
SynonymFilter makes sense.
The planned payloads are indeed not needed. I guess a better solution would
be to turn the boost into a query-time attribute that is
consumed in the QueryParser in order to boost these n-gram terms.
Thanks for the hints.
Manuel
On Wed, Mar 12, 2014 at 12
You could also use SynonymFilter?
Why does the boost need to be encoded in the index (in a payload) vs
at query time when you create the TermQuery for that term? Does the
boost vary depending on the surrounding context / document?
Mike McCandless
http://blog.mikemccandless.com
On Wed, Mar 12,
Thanks, Mike.
Once I was that deep in the guts of the indexer, I knew things were
probably not going to go my way.
I'll check out CachingTokenFilter.
On Tue, Mar 11, 2014 at 3:09 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> You can't rely on how IndexWriter will iterate/consum
You can't rely on how IndexWriter will iterate/consume those fields;
that's an implementation detail.
Maybe you could use CachingTokenFilter to pre-process the text fields
and append the new fields? And then during indexing, replay the
cached tokens, so you don't have to tokenize twice.
Mike McC
Hi all!
A little bit more exploration :)
After indexing with multiple atomic field values, here is what I get:
indexSearcher.doc(0).getFields("gramm")
stored,indexed,tokenized,termVector,omitNorms
stored,indexed,tokenized,termVector,omitNorms
stored,indexed,tokenized,termVector,omitNorms
stor