RE: Indexing multiple numeric ranges

2024-11-05 Thread Siraj Haider
That’s great! I will look into it. Thanks a lot! -Siraj -Original Message- From: Adrien Grand Sent: Tuesday, November 5, 2024 11:19 AM To: java-user@lucene.apache.org Subject: Re: Indexing multiple numeric ranges Hello Siraj, You can do this by creating a Lucene document that has 3

Re: Indexing multiple numeric ranges

2024-11-05 Thread Adrien Grand
Hello Siraj, You can do this by creating a Lucene document that has 3 org.apache.lucene.document.IntRange fields in it, one for each of the ranges that you would like to index. Lucene will then match the document if any of the ranges matches. On Tue, Nov 5, 2024 at 5:16 PM Siraj Haider wrote: >
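
A minimal sketch of that layout, assuming a field named "hours" and single-dimension ranges; the values are illustrative, not from the thread:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.IntRange;
    import org.apache.lucene.search.Query;

    // One document, three independently indexed ranges under the same field name.
    Document doc = new Document();
    doc.add(new IntRange("hours", new int[] {1}, new int[] {5}));
    doc.add(new IntRange("hours", new int[] {9}, new int[] {12}));
    doc.add(new IntRange("hours", new int[] {14}, new int[] {18}));

    // The document matches if any of its indexed ranges intersects the query range.
    Query q = IntRange.newIntersectsQuery("hours", new int[] {4}, new int[] {4});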

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-26 Thread Marc Davenport
Hello, Thanks Matt. I also had run a test dramatically increasing the LRU cache, but in the end it was still better for our case to run with the previous cache. We won't encounter the bug that the switch to LRU cache addresses (for now). After returning to the previous cache implementation we act

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-24 Thread Matt Davis
Marc, We also ran into this problem on updating to Lucene 9.5. We found it sufficient in our use case to just bump up LRU cache in the constructor to a high enough value to not pose a performance problem. The default value of 4k was way too low for our use case with millions of unique facet valu
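
A sketch of that constructor-level workaround; the path and the cache size are illustrative, not values from the thread:

    import java.nio.file.Paths;
    import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
    import org.apache.lucene.facet.taxonomy.writercache.LruTaxonomyWriterCache;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    Directory taxoDir = FSDirectory.open(Paths.get("/path/to/taxonomy"));
    // Pass an LRU cache large enough that hot facet labels are effectively never evicted.
    DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(
        taxoDir, OpenMode.CREATE_OR_APPEND, new LruTaxonomyWriterCache(4_000_000));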

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-23 Thread Dawid Weiss
Thanks for the follow-up, Marc. I'm not familiar with this part of the code but reading through the original issue that changed this, the rationale was to avoid a memleak from a thread local. The LRU cache has synchronized blocks sprinkled all over it - again, I haven't checked but it seems the ove

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-22 Thread Marc Davenport
Hello, I've done a bisect between 9.4.2 and 9.5 and found the PR affecting my particular setup: https://github.com/apache/lucene/pull/12093 This is the switch from UTF8TaxonomyWriterCache to an LruTaxonomyWriterCache. I don't see a way to control the size of this cache to never expel items and ma

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-19 Thread Marc Davenport
Hello, Thanks for the leads. I haven't yet gone as far as doing a git bisect, but I have found that the big jump in time is in the call to facetsConfig.build(taxonomyWriter, doc); I made a quick and dirty instrumented version of the FacetsConfig class and found that calls to TaxonomyWriter.add(Fac

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-18 Thread Dawid Weiss
Hi Marc, You could try git bisect lucene repository to pinpoint the commit that caused what you're observing. It'll take some time to build but it's a logarithmic bisection and you'd know for sure where the problem is. D. On Thu, Apr 18, 2024 at 11:16 PM Marc Davenport wrote: > Hi Adrien et al

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-18 Thread Gautam Worah
Does your application see a lot of document updates/deletes? GITHUB#11761 could have potentially affected you. Whenever I see large indexing times, my first suspicion is towards increased merge activity. Regards, Gautam Worah. On Thu, Apr 18, 2024 at 2:14 PM Marc Davenport wrote: > Hi Adrien e

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-18 Thread Marc Davenport
Hi Adrien et al, I've been doing some investigation today and it looks like whatever the change is, it happens between 9.4.2 and 9.5.0. I made a smaller test set up for our code that mocks our documents and just runs through the indexing portion of our code sending in batches of 4k documents at a t

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-17 Thread Adrien Grand
Hi Marc, Nothing jumps to mind as a potential cause for this 2x regression. It would be interesting to look at a profile. On Wed, Apr 17, 2024 at 9:32 PM Marc Davenport wrote: > Hello, > I'm finally migrating Lucene from 8.11.2 to 9.10.0 as our overall build can > now support Java 11. The quick

Re: Indexing & Searching Geometries ( MultiLine & MultiPolygon )

2020-10-02 Thread thturk
Thank you for your fast response. Yes, I have tried this. Actually, there is also a Polygon created directly from GeoJSON, and it was recommended to me to do the same, because it returns a Polygon array. But is it the most efficient method of indexing spatial data? And the same for MultiLine; they are also a type of Line array

Re: Indexing & Searching Geometries ( MultiLine & MultiPolygon )

2020-10-02 Thread Ignacio Vera
Hello! Let's consider polygons. I imagine you are doing something like this to index one polygon: Polygon polygon = Document document = new Document(); Field[] fields = LatLonShape.createIndexableFields(FIELDNAME, polygon); for (Field f : fields) { document.add(f); } So a multipolygon is
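
A sketch of the multipolygon case under the same pattern, with a made-up field name and coordinates; each member polygon contributes its own indexable fields to a single document:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.LatLonShape;
    import org.apache.lucene.geo.Polygon;

    // Member polygons of the multipolygon (closed lat/lon rings; values are placeholders).
    Polygon[] multiPolygon = {
      new Polygon(new double[] {0, 0, 1, 0}, new double[] {0, 1, 1, 0}),
      new Polygon(new double[] {5, 5, 6, 5}, new double[] {5, 6, 6, 5})
    };

    Document document = new Document();
    for (Polygon polygon : multiPolygon) {
      // Same field name for every member; the document matches if any member matches.
      for (Field f : LatLonShape.createIndexableFields("geometry", polygon)) {
        document.add(f);
      }
    }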

Re: Indexing fails on the way

2018-04-11 Thread neotorand
Thanks for Bringing I will post it there Regards neo -- Sent from: http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-

Re: Indexing fails on the way

2018-04-11 Thread Adrien Grand
Hi Neo, You will likely find better help on the solr-user mailing-list. This mailing list is for questions about Lucene. Le mer. 11 avr. 2018 à 12:21, neotorand a écrit : > with Solrcloud What happens if indexing is partially completed and ensemble > goes down.What are the ways to Resume.In one

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Rob Audenaerde
e a transaction log in parallel to > > indexing, > > >> so they commit very seldom. If the system crashes, the changes are > > replayed > > >> from tranlog since last commit. > > >> > > >> Uwe > > >> > > >>

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Adrien Grand
gt; >> > >> - > >> Uwe Schindler > >> Achterdiek 19, D-28357 Bremen > >> http://www.thetaphi.de > >> eMail: u...@thetaphi.de > >> > >> > -Original Message- > >> > From: Rob Audenaerde [mailto:rob.audenae...@gmail.c

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Rob Audenaerde
>> > -----Original Message- >> > From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] >> > Sent: Monday, January 29, 2018 11:29 AM >> > To: java-user@lucene.apache.org >> > Subject: Re: indexing performance 6.6 vs 7.1 >> > >> >

Re: indexing performance 6.6 vs 7.1

2018-01-29 Thread Rob Audenaerde
we > > - > Uwe Schindler > Achterdiek 19, D-28357 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] > > Sent: Monday, January 29, 2018 11:29 AM > > To

RE: indexing performance 6.6 vs 7.1

2018-01-29 Thread Uwe Schindler
28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] > Sent: Monday, January 29, 2018 11:29 AM > To: java-user@lucene.apache.org > Subject: Re: indexing performance 6.6 vs 7.1 > > H

Re: indexing performance 6.6 vs 7.1

2018-01-29 Thread Rob Audenaerde
Hi all, Some follow up (sorry for the delay). We built a benchmark in our application, and profiled it (on a smallish data set). What we currently see in the profiler is that in Lucene 7.1 the calls to `commit()` take much longer. The self-time committing in 6.6: 3,215 ms The self-time committin

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Erick Erickson
Robert: Ah, right. I keep confusing my gmail lists "lucene dev" and "lucene list" Siiih. On Thu, Jan 18, 2018 at 9:18 AM, Adrien Grand wrote: > If you have sparse data, I would have expected index time to *decrease*, > not increase. > > Can you enable the IW info stream and share

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Adrien Grand
If you have sparse data, I would have expected index time to *decrease*, not increase. Can you enable the IW info stream and share flush + merge times to see where indexing time goes? If you can run with a profiler, this might also give useful information. Le jeu. 18 janv. 2018 à 11:23, Rob Aude

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Robert Muir
Erick, I don't think Solr was mentioned here. On Thu, Jan 18, 2018 at 8:03 AM, Erick Erickson wrote: > My first question is always "are you running the Solr CPUs flat out?". > My guess in this case is that the indexing client is the same and the > problem is in Solr, but it's worth checking whethe

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Erick Erickson
My first question is always "are you running the Solr CPUs flat out?". My guess in this case is that the indexing client is the same and the problem is in Solr, but it's worth checking whether the clients are just somehow not delivering docs as fast as they were before. My suspicion is that the in

Re: Indexing a Date/DateTime/Time field in Lucene 4

2017-04-08 Thread KARTHIK SHIVAKUMAR
I converted the Date into milliseconds and stored the long in the index; this helped me convert the searched value back into any date format later in the output. On Wed, Apr 5, 2017 at 6:08 PM, Frederik Van Hoyweghen < frederik.vanhoyweg...@chapoo.com> wrote: > Hey everyone, > > I'm seeing some conflicting su

Re: Indexing Numeric value in Lucene 4.10.4

2017-04-07 Thread aravinth thangasami
; > Uwe > > - > Uwe Schindler > Achterdiek 19, D-28357 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: aravinth thangasami [mailto:aravinththangas...@gmail.com] > > Sent: Friday, April 7, 2017 8:54 AM >

RE: Indexing Numeric value in Lucene 4.10.4

2017-04-07 Thread Uwe Schindler
phi.de eMail: u...@thetaphi.de > -Original Message- > From: aravinth thangasami [mailto:aravinththangas...@gmail.com] > Sent: Friday, April 7, 2017 8:54 AM > To: java-user@lucene.apache.org > Subject: Re: Indexing Numeric value in Lucene 4.10.4 > > we don't have to sort o

Re: Indexing Numeric value in Lucene 4.10.4

2017-04-06 Thread aravinth thangasami
We don't have to sort on that field, so that's why we thought of that approach. Thanks for your opinion; we will consider tuning the precision step. Kind regards, Aravinth On Thu, Apr 6, 2017 at 8:51 PM, Erick Erickson wrote: > bq: What are your opinions on this? > > That this is not a sound approach. Why

Re: Indexing Numeric value in Lucene 4.10.4

2017-04-06 Thread Erick Erickson
bq: What are your opinions on this? That this is not a sound approach. Why do you think Trie is expensive? What evidence do you have at all for that? Strings are significantly expensive relative to numeric fields. Plus, you can adjust the precision step to reduce the "overhead" of a trie field. I

RE: Indexing a Date/DateTime/Time field in Lucene 4

2017-04-05 Thread Uwe Schindler
n Hoyweghen > [mailto:frederik.vanhoyweg...@chapoo.com] > Sent: Wednesday, April 5, 2017 3:17 PM > To: java-user@lucene.apache.org > Subject: Re: Indexing a Date/DateTime/Time field in Lucene 4 > > Let's say I want to search between 2 dates, search for a date that's > before/after a

Re: Indexing a Date/DateTime/Time field in Lucene 4

2017-04-05 Thread Frederik Van Hoyweghen
Let's say I want to search between 2 dates, search for a date that's before/after another, etc (the usual stuff ^^ ); is this all possible with either field type? Thanks for your reply! Frederik On Wed, Apr 5, 2017 at 3:04 PM, Adrien Grand wrote: > Hi Frederik, > > Both options would work but LongField (

Re: Indexing a Date/DateTime/Time field in Lucene 4

2017-04-05 Thread Adrien Grand
Hi Frederik, Both options would work but LongField (or LongPoint on Lucene 6.0+) would indeed provide better performance for range queries. If you need to sort or aggregate date values, you might also want to add a NumericDocValuesField. Le mer. 5 avr. 2017 à 14:38, Frederik Van Hoyweghen < frede
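
A sketch of that combination for a date field, assuming a field named "timestamp" and a value already parsed to epoch milliseconds:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.LongPoint;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.search.Query;

    long epochMillis = System.currentTimeMillis();   // placeholder for the parsed date
    Document doc = new Document();
    doc.add(new LongPoint("timestamp", epochMillis));              // fast range queries
    doc.add(new NumericDocValuesField("timestamp", epochMillis));  // sorting / aggregations
    doc.add(new StoredField("timestamp", epochMillis));            // retrieval with hits

    // Range query between two instants, both bounds in epoch millis:
    Query range = LongPoint.newRangeQuery("timestamp", 0L, epochMillis);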

Re: Indexing architecture

2017-01-04 Thread suriya prakash
Hi, Any better architecture ideas for my below mentioned use case? Regards, Suriya On Wed, 28 Dec 2016 at 11:27 PM, suriya prakash wrote: > Hi, > > I have 100 thousand indexes in Hadoop grid because 90% of my indexes will > be inactive and I can distribute the other active indexes based on loa

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Michael McCandless
So when a query arrives, you know the query is only allowed to match either module:1 (analyzed terms) or module:2 (not analyzed) but never both? If so, you should be fine. Though relevance will be sort of wonky, in case that matters, because you are polluting the unique term space; you would get

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Michael McCandless
You can do this, Lucene will let you, but it's typically a bad idea for search relevance because some documents will return only if you search for precisely the same whole token, others if you search for an analyzed token, giving the user a broken experience. Mike McCandless http://blog.mikemcca

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Kumaran Ramasubramanian
Hi All, Can anyone say whether it is advisable to have an index with both analyzed and not_analyzed values in one field? Use case: I have custom fields in my product which can be configured differently (ANALYZED and NOT_ANALYZED) in different modules. -- Kumaran R On Wed, Oct 26, 2016 at 12:0

Re: Indexing values of different datatype under same field

2016-11-04 Thread Kumaran Ramasubramanian
Hi Rajnish It is not advisable to index values with two data types in a field. Features like phrase query, sorting may break in those indexes. related previous discussion : http://www.gossamer-threads.com/lists/lucene/java-user/289159?do=post_view_flat#289159 - Kumaran R On Fri, Nov 4,

Re: Indexing and storing Long fields

2016-07-28 Thread Kumaran Ramasubramanian
OK Mike, thanks for the explanation. I have another doubt: I read in an article that we can have one StoredField and one doc values field with the same field name. Is it so? -- Kumaran R On Thu, Jul 28, 2016 at 9:29 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > OK, sorry, you cann

Re: Indexing and storing Long fields

2016-07-28 Thread Michael McCandless
OK, sorry, you cannot change how the field is indexed for the same field name across different field indices. Lucene will "downgrade" that field to the lowest settings, e.g. "docs, no positions" in your case. Mike McCandless http://blog.mikemccandless.com On Thu, Jul 28, 2016 at 9:31 AM, Kumara

Re: Indexing and storing Long fields

2016-07-28 Thread Kumaran Ramasubramanian
Hi Mike, for your information, I am using Lucene 4.10.4. Am I missing anything? -- Kumaran R On Wed, Jul 27, 2016 at 1:52 AM, Kumaran Ramasubramanian wrote: > > Hi Mike, > > 1.if we index one field as analyzed and not analyzed using same name, > phrase queries are not working (field "co

Re: Indexing and storing Long fields

2016-07-26 Thread Kumaran Ramasubramanian
Hi Mike, 1. If we index one field as analyzed and not analyzed using the same name, phrase queries do not work (field "comp" was indexed without position data, cannot run PhraseQuery), even for the analyzed terms, because the indexed document's term properties are not proper; even if tokenized, not able t

Re: Indexing and storing Long fields

2016-07-23 Thread Michael McCandless
On Sat, Jul 23, 2016 at 4:48 AM, Kumaran Ramasubramanian wrote: > Hi Mike, > > *Two different fields can be the same name* > > Is it so? You mean we can index one field as docvaluefield and also stored > field, Using same name? > This should be fine, yes. > And AFAIK, We cannot index one field

Re: Indexing and storing Long fields

2016-07-23 Thread Kumaran Ramasubramanian
Hi Mike, *Two different fields can be the same name* Is it so? You mean we can index one field as a doc values field and also a stored field, using the same name? And AFAIK, we cannot index one field as analyzed and not analyzed using the same name. Am I right? Kumaran R On Jul 21, 2016 11:50 PM, "Michae

Re: Indexing and storing Long fields

2016-07-21 Thread Michael McCandless
Two different fields can be the same name. I think the problem is that you are indexing it as doc values, which is not searchable. To make your numeric fields searchable, use e.g. LongPoint (as of Lucene 6.0) or LongField (before 6.0). Mike McCandless http://blog.mikemccandless.com On Thu, Jul
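
A minimal sketch of that on Lucene 6.0+, with an assumed field name "price": the point field makes the value searchable, while a doc values field under the same name keeps it sortable:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.LongPoint;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.search.Query;

    long value = 42L;                                   // illustrative value
    Document doc = new Document();
    doc.add(new LongPoint("price", value));             // searchable (exact and range queries)
    doc.add(new NumericDocValuesField("price", value)); // sortable / doc values access

    Query exact = LongPoint.newExactQuery("price", value);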

Re: indexing 15 million documents to lucene

2016-07-06 Thread Michael McCandless
Use threads, only commit at the end (and use a near-real-time reader if you want to search at points-in-time), increase IW's indexing buffer. Mike McCandless http://blog.mikemccandless.com On Wed, Jul 6, 2016 at 4:37 PM, Nomar Morado wrote: > Hi > > I am trying to write 15 million documents (a
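
A sketch of that setup; the analyzer, buffer size, and path are placeholders rather than recommendations from the thread:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
    iwc.setRAMBufferSizeMB(512);   // larger indexing buffer, fewer flushes
    try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("/path/to/index")), iwc)) {
      // addDocument() from multiple worker threads; IndexWriter is thread-safe.
      // For point-in-time searches while indexing, open a near-real-time reader:
      //   DirectoryReader nrt = DirectoryReader.open(writer);
      writer.commit();             // one commit, at the end
    }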

Re: Indexing a binary field

2015-09-01 Thread Michael McCandless
Actually Lucene terms can be arbitrary/fully binary tokens in the low-level postings APIs. It's just that our analysis APIs are geared towards analyzing text, but using StringField you can easily index an arbitrary single-token byte[]. Mike McCandless http://blog.mikemccandless.com On Tue, Sep

Re: Indexing a binary field

2015-09-01 Thread Mark Hanfland
You are correct that Lucene only works with text (no binary or primitives), Base64 would be the way I would suggest. On Monday, August 31, 2015 11:19 AM, Dan Smith wrote: What's the best way to index binary data in Lucene? I'm adding a Lucene index to a key value store, and I want t

Re: Indexing a binary field

2015-08-31 Thread Dan Smith
Aha! My version of Lucene was out of date. That should work perfectly.  Thanks,  -Dan Original message From: Michael McCandless Date:08/31/2015 12:57 PM (GMT-08:00) To: Lucene Users , dsm...@pivotal.io Cc: Subject: Re: Indexing a binary field StringField now also

Re: Indexing a binary field

2015-08-31 Thread Michael McCandless
StringField now also takes a BytesRef value to index, so you can index a single binary token that way. Does that work? Mike McCandless http://blog.mikemccandless.com On Mon, Aug 31, 2015 at 12:19 PM, Dan Smith wrote: > What's the best way to index binary data in Lucene? I'm adding a Lucene >
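
A sketch of that, assuming the binary value is used as an exact-match key and the field is named "key":

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.util.BytesRef;

    byte[] key = {0x01, 0x02, (byte) 0xFE};   // arbitrary binary key
    Document doc = new Document();
    doc.add(new StringField("key", new BytesRef(key), Field.Store.NO));

    // Exact lookup on the single binary token:
    TermQuery q = new TermQuery(new Term("key", new BytesRef(key)));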

Java versions transition without re-indexing.

2015-03-24 Thread Bogdan Snisar
Hi, folks! This is not a trivial question, but I appeal to your experience with Lucene... Lucene Implementation Version: 2.9.1 Solr Implementation Version: 1.4 Java version: 1.6 This is a legacy environment with a huge amount of indexed data. The main question that I encountered a few days ago was

Re: Indexing an IntField but getting StoredField from found Document

2015-02-19 Thread Ian Lea
I think if you follow the Field.fieldType().numericType() chain you'll end up with INT or DOUBLE or whatever. But if you know you stored it as an IntField then surely you already know it's an integer? Unless you sometimes store different things in the one field. I wouldn't do that. -- Ian. O

Re: Indexing Query

2015-02-18 Thread Jack Krupansky
You could store the length of the field (in terms) in a second field and then add a MUST term to the BooleanQuery which is a RangeQuery with an upper bound that is the maximum length that can match. -- Jack Krupansky On Wed, Feb 18, 2015 at 4:54 AM, Ian Lea wrote: > You mean you'd like a Boolea
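
A sketch of that approach with hypothetical field names "body" (the text) and "body_len" (its token count), written against the point-field API of current Lucene; in the Lucene of this thread a NumericRangeQuery would play the same role:

    import org.apache.lucene.document.IntPoint;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    // Index time: doc.add(new IntPoint("body_len", tokenCount));
    // Query time: the usual term clauses, plus a MUST clause capping the field length at 2.
    BooleanQuery.Builder b = new BooleanQuery.Builder();
    b.add(new TermQuery(new Term("body", "foo")), Occur.SHOULD);
    b.add(IntPoint.newRangeQuery("body_len", 1, 2), Occur.MUST);
    BooleanQuery q = b.build();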

Re: Indexing Query

2015-02-18 Thread Deepak Gopalakrishnan
Oops, alright, I'll probably look around for a workaround. On Wed, Feb 18, 2015 at 3:24 PM, Ian Lea wrote: > You mean you'd like a BooleanQuery.setMaximumNumberShouldMatch() > method? Unfortunately that doesn't exist and I can't think of a > simple way of doing it. > > > -- > Ian. > > > On Wed,

Re: Indexing Query

2015-02-18 Thread Ian Lea
You mean you'd like a BooleanQuery.setMaximumNumberShouldMatch() method? Unfortunately that doesn't exist and I can't think of a simple way of doing it. -- Ian. On Wed, Feb 18, 2015 at 5:26 AM, Deepak Gopalakrishnan wrote: > Thanks Ian. Also, if I have a unigram in the query, and I want to ma

Re: Indexing Query

2015-02-17 Thread Deepak Gopalakrishnan
Thanks Ian. Also, if I have a unigram in the query, and I want to make sure I match only index entries that do not have more than 2 tokens, is there a way to do that too? Thanks On Wed, Feb 18, 2015 at 2:23 AM, Ian Lea wrote: > Break the query into words then add them as TermQuery instances as

Re: Indexing Query

2015-02-17 Thread Ian Lea
Break the query into words then add them as TermQuery instances as optional clauses to a BooleanQuery with a call to setMinimumNumberShouldMatch(2) somewhere along the line. You may want to do some parsing or analysis on the query terms to avoid problems of case matching and the like. -- Ian.
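
A sketch of that on the current BooleanQuery.Builder API, with a made-up field name and terms:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    for (String word : new String[] {"quick", "brown", "fox"}) {
      builder.add(new TermQuery(new Term("body", word)), Occur.SHOULD);  // optional clause
    }
    builder.setMinimumNumberShouldMatch(2);   // at least 2 of the optional clauses must match
    BooleanQuery query = builder.build();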

Re: Indexing and searching a DateTime range

2015-02-11 Thread Gergely Nagy
Thank you Uwe! Your reply is very useful and insightful. Your workflow matches my requirements exactly. My confusion was coming from the fact that I didn't understand what the Analyzers are doing. Actually I am still wondering, isn't it possible to provide an abstraction on Lucene side to make th

Re: Indexing documents with pre-calculated term frequencies

2015-02-11 Thread Michael Sokolov
An example why you might do this is if your input is a term vector (ie a list of unique terms with weights) rather than a text in the usual sense. It does seem as if the best way forward in this case is to generate a text with repeated terms. I looked at the alternative and it is quite invol

Re: Indexing documents with pre-calculated term frequencies

2015-02-11 Thread Erick Erickson
You could consider payloads but why do you want to do this? What's the use case here? Sounds a little like an XY problem: you're asking us how to do something without explaining the why; there may be other ways to accomplish your task. For instance, there's the "termfreq" function, which can be ret

RE: Indexing and searching a DateTime range

2015-02-10 Thread Uwe Schindler
Hi, > OK. I found the Alfresco code on GitHub. So it's open source it seems. > > And I found the DateTimeAnalyser, so I will just take that code as a starting > point: > https://github.com/lsbueno/alfresco/tree/master/root/projects/repository/ > source/java/org/alfresco/repo/search/impl/lucene/an

Re: Indexing and searching a DateTime range

2015-02-09 Thread Gergely Nagy
OK. I found the Alfresco code on GitHub. So it's open source it seems. And I found the DateTimeAnalyser, so I will just take that code as a starting point: https://github.com/lsbueno/alfresco/tree/master/root/projects/repository/source/java/org/alfresco/repo/search/impl/lucene/analysis Thank you

Re: Indexing and searching a DateTime range

2015-02-09 Thread Gergely Nagy
Thank you Barry, I really appreciate your time to respond, Let me clarify this a little bit more. I think it was not clear. I know how to parse dates, this is not the question here. (See my previous email: "how can I pipe my converter logic into the indexing process?") All of your solutions guys

Re: Indexing and searching a DateTime range

2015-02-09 Thread Barry Coughlan
Hi Gergely, Writing an analyzer would work but it is unnecessarily complicated. You could just parse the date from the string in your input code and index it in a LongField like this: SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S'Z'"); format.setTimeZone(TimeZone.getTime
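
A completed version of that sketch; the date pattern and sample value follow the log format quoted later in the thread, and the field name is an assumption:

    import java.text.SimpleDateFormat;
    import java.util.TimeZone;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.LongField;   // Lucene 4.x/5.x; use LongPoint on 6.0+

    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S'Z'");
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    long millis = format.parse("2015-02-08 00:02:06.852Z").getTime();

    Document doc = new Document();
    doc.add(new LongField("timestamp", millis, Field.Store.NO));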

Re: Indexing and searching a DateTime range

2015-02-09 Thread Gergely Nagy
Thank you for taking the time to respond, Karthik. Can you show me an example of how to convert a DateTime to milliseconds? I mean, how can I pipe my converter logic into the indexing process? I suspect I need to write my own Analyzer/Tokenizer to achieve this. Is this correct? 2015-02-09 22:58 GMT+09

Re: Indexing and searching a DateTime range

2015-02-09 Thread KARTHIK SHIVAKUMAR
Hi, a long time ago I used to store datetimes in milliseconds. TermRangeQuery used to work in perfect condition. Convert all datetimes to milliseconds and index the same. At search time, again convert the datetime to milliseconds and use TermRangeQuery. With regards Karthik On Feb 9, 2015 1:24

Re: Indexing and searching a DateTime range

2015-02-09 Thread Gergely Nagy
Thank you for the great answer, Uwe! Sadly my department rejected the above combination of Logstash + Elasticsearch. In their experience, Elasticsearch works fine on about 3 days of log data, but slows down terribly at the magnitude of 3 months of data or so. But I will tak

RE: Indexing and searching a DateTime range

2015-02-09 Thread Uwe Schindler
Hi, > I am in the beginning of implementing a Lucene application which would > supposedly search through some log files. > > One of the requirements is to return results between a time range. Let's say > these are two lines in a series of log files: > 2015-02-08 00:02:06.852Z INFO... > ... > 2015

Re: Indexing

2015-01-15 Thread Erick Erickson
Basically there is a stored fork and an indexed fork. If you specify the input should be stored, a verbatim copy is put in a special segment file with the extension .fdt. This is entirely orthogonal to indexing the tokens, which are what search operates on. So you can store and index, store but n
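
A small illustration of those per-field choices; field names and values are made up:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.TextField;

    String text = "some document text";
    Document doc = new Document();
    doc.add(new TextField("body", text, Field.Store.YES));    // indexed for search and stored verbatim (.fdt)
    doc.add(new TextField("summary", text, Field.Store.NO));  // indexed only, not retrievable as stored
    doc.add(new StoredField("raw", "{\"id\": 1}"));           // stored only, never searched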

RE: Indexing Error

2014-11-22 Thread Uwe Schindler
6 PM > To: java-user@lucene.apache.org > Subject: Re: Indexing Error > > Looks like the version of Lucene on your runtime classpath is not the same as > the one you compiled against. In some recent version they changed the > naming convention in those fields. You may want:

RE: Indexing Error

2014-11-22 Thread Uwe Schindler
Hi, This generally happens, if you have an older version of Lucene somewhere in your classpath. E.g., if older Lucene was placed outside of the webapp somewhere in the classpath of Websphere itself. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...

Re: Indexing Error

2014-11-22 Thread Robert Nikander
Looks like the version of Lucene on your runtime classpath is not the same as the one you compiled against. In some recent version they changed the naming convention in those fields. You may want: ‘LUCENE_4_7_0’. https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/util/Version.html

Re: Indexing Weighted Tags per Document

2014-10-28 Thread Ralf Bierig
The second solution sounds great and a lot more natural than payloads. I know how to override the Similarity class, but that one would only be called at search time and would already use the existing term frequency. Looking up the probabilities every time a search is performed is probably also

Re: Indexing Weighted Tags per Document

2014-10-28 Thread Ramkumar R. Aiyengar
There are a few approaches possible here, we had a similar use case and went for the second one below. I primarily deal with Solr, so I don't know of Lucene-only examples, but hopefully you can dig this up.. (1) You can attach payloads to each occurrence of the tag, and modify the scoring to use t

RE: indexing json

2014-09-04 Thread Uwe Schindler
4 3:20 PM > To: java-user@lucene.apache.org > Subject: Re: indexing json > > Elasticsearch does what I need, but I'd like to avoid bringing all the cluster > management bits along with it. I will take a look at siren > > thanks. > > > On Thu, Sep 4, 2014 at

Re: indexing json

2014-09-04 Thread Larry White
Elasticsearch does what I need, but I'd like to avoid bringing all the cluster management bits along with it. I will take a look at siren thanks. On Thu, Sep 4, 2014 at 8:11 AM, Marcio Napoli wrote: > Hey! > > Elasticsearch Is a good option and uses Lucene as core :) > > http://www.elasticsear

Re: indexing json

2014-09-04 Thread Marcio Napoli
Hey! Elasticsearch is a good option and uses Lucene at its core :) http://www.elasticsearch.org/overview/elasticsearch/ []s Napoli http://numere.stela.org.br 2014-09-04 7:46 GMT-03:00 Larry White : > Hi, > > Is there a way to index an entire json document automatically as one can do > with the

Re: indexing json

2014-09-04 Thread Michael Sokolov
On 9/4/2014 6:46 AM, Larry White wrote: Hi, Is there a way to index an entire json document automatically as one can do with the new PostgreSQL json support? By automatically, I mean to create an inverted index entry (path: value) for each element in the document without having to specify in adv

Re: indexing all suffixes to support leading wildcard?

2014-08-29 Thread Rob Nikander
the ngram token filter, and the a query of 512 would match by itself: >> http://lucene.apache.org/core/4_9_0/analyzers-common/org/ >> apache/lucene/analysis/ngram/NGramTokenFilter.html >> >> -- Jack Krupansky >> >> -Original Message- From: Erick Erick

Re: indexing all suffixes to support leading wildcard?

2014-08-29 Thread Rob Nikander
-- From: Erick Erickson > Sent: Thursday, August 28, 2014 11:52 PM > To: java-user > Subject: Re: indexing all suffixes to support leading wildcard? > > > The "usual" approach is to index to a second field but backwards. > See ReverseStringFilter... Then all your

Re: indexing all suffixes to support leading wildcard?

2014-08-28 Thread Jack Krupansky
: java-user Subject: Re: indexing all suffixes to support leading wildcard? The "usual" approach is to index to a second field but backwards. See ReverseStringFilter... Then all your leading wildcards are really trailing wildcards in the reversed field. Best, Erick On Thu, Aug 28,

Re: indexing all suffixes to support leading wildcard?

2014-08-28 Thread Erick Erickson
The "usual" approach is to index to a second field but backwards. See ReverseStringFilter... Then all your leading wildcards are really trailing wildcards in the reversed field. Best, Erick On Thu, Aug 28, 2014 at 10:38 AM, Rob Nikander wrote: > Hi, > > I've got some short fields (phone nu

Re: Indexing size increase 20% after switching from lucene 4.4 to 4.5 or 4.8 with BinaryDocValuesField

2014-06-17 Thread Robert Muir
Again, because merging is based on byte size, you have to be careful how you measure (hint: use LogDocMergePolicy). Otherwise you are comparing apples and oranges. Separately, your configuration is using experimental codecs like "disk"/"memory" which aren't as heavily benchmarked etc. as the defaul

RE: Indexing size increase 20% after switching from lucene 4.4 to 4.5 or 4.8 with BinaryDocValuesField

2014-06-14 Thread Zhao, Gang
al Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Saturday, June 14, 2014 6:27 AM To: java-user Subject: Re: Indexing size increase 20% after switching from lucene 4.4 to 4.5 or 4.8 with BinaryDocValuesField They are still encoded the same way: so likely you arent testing ap

Re: Indexing size increase 20% after switching from lucene 4.4 to 4.5 or 4.8 with BinaryDocValuesField

2014-06-14 Thread Robert Muir
They are still encoded the same way, so likely you aren't testing apples to apples (e.g. a different number of segments or whatever). On Fri, Jun 13, 2014 at 8:28 PM, Zhao, Gang wrote: > > > I used lucene 4.4 to create index for some documents. One of the indexing > fields is BinaryDocValuesField.

Re: Indexing integer ranges for point search

2014-06-05 Thread Michael Sokolov
It all depends on the statistics: how the ranges are correlated. If the integer range is small: from 1-2, for example, you might consider indexing every integer in each range as a separate value, especially if most documents will only have a small number of small ranges. If there are too

Re: Indexing integer ranges for point search

2014-06-05 Thread Mindaugas Žakšauskas
Hi, Continuing your example, you could do the following: Document: range1_from:1 range1_to:3 range2_from:12 range2_to:20 range3_from:13290 range3_to:16509 ... other fields... Query (for "2"): (+range1_from:[* TO 2] +range1_to:[2 TO *]) OR (+range2_from:[* TO 2] +range2_to:[2 TO *]) OR (+ran
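
A programmatic sketch of that query using the point-field API of current Lucene; the field-naming scheme and the query value 2 follow the example above, and three range pairs are assumed:

    import org.apache.lucene.document.IntPoint;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;

    BooleanQuery.Builder any = new BooleanQuery.Builder();
    for (int i = 1; i <= 3; i++) {
      BooleanQuery.Builder pair = new BooleanQuery.Builder();
      pair.add(IntPoint.newRangeQuery("range" + i + "_from", Integer.MIN_VALUE, 2), Occur.MUST);
      pair.add(IntPoint.newRangeQuery("range" + i + "_to", 2, Integer.MAX_VALUE), Occur.MUST);
      any.add(pair.build(), Occur.SHOULD);   // the point must fall inside at least one range
    }
    Query pointInAnyRange = any.build();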

Re: Indexing Huge tree structure represented in a Text file

2014-04-16 Thread kumagirish
Hello Arjen van der Meijden, if it's not too much trouble, can you point me to any sites with example implementations of Neo4j for a problem similar to mine? I want to check whether Neo4j resolves all my problems. As this is a new technology to me, I need to do a lot of research, and a few examples will be a good st

Re: Indexing Huge tree structure represented in a Text file

2014-04-15 Thread Arjen van der Meijden
Given that he is already using Java, simply building an object tree based on the text file may also be possible, although a 300MB file may turn out to be fairly large in memory consumption (possibly caused by quite a bit of object overhead). If that turns out to consume too much memory there ar

Re: Indexing Huge tree structure represented in a Text file

2014-04-15 Thread Ivan Krišto
Hello! To me, Lucene doesn't sound like a good solution to this problem. It seems to me that you need a classic relational database. Storing a tree structure in relational DBs isn't a simple thing, but this presentation will help you: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back (slides

Re: Indexing Huge tree structure represented in a Text file

2014-04-15 Thread kumagirish
Thanks Doug, I have gone through SIREN DB. Unfortunately I couldn't find enough examples that I could match to my requirement. Could you point me to any examples involving a tree structure represented in text files? Regards, Girish Durgasi -- View this message in context: http://lucene.472066.

Re: Indexing Huge tree structure represented in a Text file

2014-04-14 Thread dturnbull
Hey, you might want to check out SirenDB, a set of Lucene and Solr plugins for advanced nested/tree support. They even have a custom codec for nested docs. We've been pretty interested in it here at OpenSource Connections http://sirendb.com/ Sent from Windows Mail From: kumagiris

Re: Aw: RE: Indexing and storing very large documents

2014-03-24 Thread Alexandre Patry
March 2014 at 16:01 From: "Uwe Schindler" To: java-user@lucene.apache.org Subject: RE: Indexing and storing very large documents Stored fields do not support Readers at the moment. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u

Aw: RE: Indexing and storing very large documents

2014-03-24 Thread Mirko Sertic
Ah, OK, so I cannot use PostingsHighlighter as it requires stored fields, right? Regards Mirko Sent: Monday, March 24, 2014 at 16:01 From: "Uwe Schindler" To: java-user@lucene.apache.org Subject: RE: Indexing and storing very large documents Stored fields do not support

RE: Indexing and storing very large documents

2014-03-24 Thread Uwe Schindler
Stored fields do not support Readers at the moment. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mirko Sertic [mailto:mirko.ser...@web.de] > Sent: Monday, March 24, 2014 3:03 PM > To: java-user@lucene

Re: Indexing useful N-grams (phrases & entities) and adding payloads

2014-03-12 Thread Manuel Le Normand
SynonymFilter makes sense. The planned payloads are indeed not needed. I guess a better solution would be to turn the boost into a query-time attribute that is consumed in the query parser in order to boost these n-gram terms. Thanks for the hints. Manuel On Wed, Mar 12, 2014 at 12

Re: Indexing useful N-grams (phrases & entities) and adding payloads

2014-03-12 Thread Michael McCandless
You could also use SynonymFilter? Why does the boost need to be encoded in the index (in a payload) vs at query time when you create the TermQuery for that term? Does the boost vary depending on the surrounding context / document? Mike McCandless http://blog.mikemccandless.com On Wed, Mar 12,

Re: Indexing a document that modifies itself as it's being indexed

2014-03-11 Thread Stephen Green
Thanks, Mike. Once I was that deep in the guts of the indexer, I knew things were probably not going to go my way. I'll check out CachingTokenFilter. On Tue, Mar 11, 2014 at 3:09 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > You can't rely on how IndexWriter will iterate/consum

Re: Indexing a document that modifies itself as it's being indexed

2014-03-11 Thread Michael McCandless
You can't rely on how IndexWriter will iterate/consume those fields; that's an implementation detail. Maybe you could use CachingTokenFilter to pre-process the text fields and append the new fields? And then during indexing, replay the cached tokens, so you don't have to tokenize twice. Mike McC

Re: Indexing documents with multiple field values

2013-10-04 Thread Igor Shalyminov
Hi all! A little bit more of exploration:) After indexing with multiple atomic field values, here is what I get: indexSearcher.doc(0).getFields("gramm") stored,indexed,tokenized,termVector,omitNorms stored,indexed,tokenized,termVector,omitNorms stored,indexed,tokenized,termVector,omitNorms stor
