RE: Indexing multiple numeric ranges

2024-11-05 Thread Siraj Haider
That’s great! I will look into it. Thanks a lot! -Siraj -Original Message- From: Adrien Grand Sent: Tuesday, November 5, 2024 11:19 AM To: java-user@lucene.apache.org Subject: Re: Indexing multiple numeric ranges Hello Siraj, You can do this by creating a Lucene document that has 3

Re: Indexing multiple numeric ranges

2024-11-05 Thread Adrien Grand
Hello Siraj, You can do this by creating a Lucene document that has 3 org.apache.lucene.document.IntRange fields in it, one for each of the ranges that you would like to index. Lucene will then match the document if any of the ranges matches. On Tue, Nov 5, 2024 at 5:16 PM Siraj Haider wrote: >
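A minimal sketch of the matching rule Adrien describes, written in plain Java rather than against the Lucene API so that it runs without dependencies. The class `RangeDoc` and its methods are hypothetical names used only for illustration. In Lucene itself the ranges would be added as `IntRange` fields and searched with one of the `IntRange` query factories such as `IntRange.newContainsQuery`.

```java
import java.util.List;

/** Hypothetical stand-in for a document that holds several integer ranges. */
final class RangeDoc {
    /** Each entry is {min, max}, both bounds inclusive. */
    private final List<int[]> ranges;

    RangeDoc(List<int[]> ranges) {
        this.ranges = ranges;
    }

    /** True if at least one stored range fully contains [qMin, qMax]. */
    boolean containsQuery(int qMin, int qMax) {
        for (int[] r : ranges) {
            if (r[0] <= qMin && qMax <= r[1]) {
                return true;
            }
        }
        return false;
    }

    /** The three ranges from the original question: 1-5, 7-10, 13-20. */
    static RangeDoc example() {
        return new RangeDoc(List.of(
                new int[] {1, 5},
                new int[] {7, 10},
                new int[] {13, 20}));
    }
}
```

With these ranges, a query for 2-4 matches because it lies inside 1-5, while 4-8 does not match because no single stored range covers it.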

Indexing multiple numeric ranges

2024-11-05 Thread Siraj Haider
Hello there, I want to index multiple numeric ranges in a Lucene index and then perform range searches on them. For example, I want to index 3 numeric ranges 1-5, 7-10, 13-20 and then run a search with a range (e.g. 2-4) as criteria and have it return the document if the searched range is part of any o

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-26 Thread Marc Davenport
on we actually saw an improvement in our overall indexing time and some performance improvements across the board. Thanks for all the feedback. Marc On Wed, Apr 24, 2024 at 9:47 AM Matt Davis wrote: > Marc, > > We also ran into this problem on updating to Lucene 9.5. We found it > suff

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-24 Thread Matt Davis
: > > > > > >> Hi Marc, > > >> > > >> You could try git bisect lucene repository to pinpoint the commit that > > >> caused what you're observing. It'll take some time to build but it's a > > >> logarithmic bisection

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-23 Thread Dawid Weiss
bserving. It'll take some time to build but it's a > >> logarithmic bisection and you'd know for sure where the problem is. > >> > >> D. > >> > >> On Thu, Apr 18, 2024 at 11:16 PM Marc Davenport > >> wrote: > >> > >>
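The bisection Dawid suggests halves the candidate range at every step, which is why it is logarithmic. A rough Java sketch of the idea follows. `Bisect`, `firstBad` and the `isBad` predicate are hypothetical; the predicate stands in for "build this commit and run the indexing benchmark", and the commit list is assumed to be ordered so that every good commit comes before every bad one.

```java
import java.util.List;
import java.util.function.Predicate;

final class Bisect {
    /** Number of predicate evaluations made by the last call to firstBad. */
    static int steps;

    /**
     * Returns the index of the first commit for which isBad is true.
     * Assumes the first commit is known good and the last is known bad.
     */
    static <T> int firstBad(List<T> commits, Predicate<T> isBad) {
        steps = 0;
        int good = 0;                   // index of a known-good commit
        int bad = commits.size() - 1;   // index of a known-bad commit
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            steps++;
            if (isBad.test(commits.get(mid))) {
                bad = mid;              // regression is at mid or earlier
            } else {
                good = mid;             // regression is after mid
            }
        }
        return bad;
    }
}
```

For a history of about a thousand commits this needs roughly ten builds, compared with hundreds for a linear scan.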

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-22 Thread Marc Davenport
>> logarithmic bisection and you'd know for sure where the problem is. >> >> D. >> >> On Thu, Apr 18, 2024 at 11:16 PM Marc Davenport >> wrote: >> >> > Hi Adrien et al, >> > I've been doing some investigation today and it l

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-19 Thread Marc Davenport
looks like whatever the > > change is, it happens between 9.4.2 and 9.5.0. > > I made a smaller test set up for our code that mocks our documents and > just > > runs through the indexing portion of our code sending in batches of 4k > > documents at a time. This way I can

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-18 Thread Dawid Weiss
ote: > Hi Adrien et al, > I've been doing some investigation today and it looks like whatever the > change is, it happens between 9.4.2 and 9.5.0. > I made a smaller test set up for our code that mocks our documents and just > runs through the indexing portion of our code sendin

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-18 Thread Gautam Worah
Does your application see a lot of document updates/deletes? GITHUB#11761 could have potentially affected you. Whenever I see large indexing times, my first suspicion is towards increased merge activity. Regards, Gautam Worah. On Thu, Apr 18, 2024 at 2:14 PM Marc Davenport wrote: > Hi Adr

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-18 Thread Marc Davenport
Hi Adrien et al, I've been doing some investigation today and it looks like whatever the change is, it happens between 9.4.2 and 9.5.0. I made a smaller test set up for our code that mocks our documents and just runs through the indexing portion of our code sending in batches of 4k documents

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-17 Thread Adrien Grand
t Java 11. The quick first step of renaming packages and > importing the new libraries has gone well. I'm even seeing a nice > performance bump in our average query time. I am however seeing a dramatic > increase in our indexing time. We are indexing ~3.1 million documents each > wit

Indexing time increase moving from Lucene 8 to 9

2024-04-17 Thread Marc Davenport
a dramatic increase in our indexing time. We are indexing ~3.1 million documents each with about 100 attributes used for facet filter, and sorting; no lexical text search. Our indexing time has jumped from ~1k seconds to ~2k seconds. I have yet to profile the individual aspects of how we convert o

Re: Exception from the codec layer during indexing

2023-09-28 Thread Rahul Goswami
who's seen similar exceptions ? Or any > > insights on what might be going on? > > > > Thanks, > > Rahul > > > > On Wed, Sep 27, 2023 at 1:00 AM Rahul Goswami > wrote: > > > > > Hello, > > > On one of the servers running Solr 7.7.2,

Re: Exception from the codec layer during indexing

2023-09-28 Thread Adrien Grand
2023 at 4:49 PM Rahul Goswami wrote: > > Hi, > Following up on my issue...anyone who's seen similar exceptions ? Or any > insights on what might be going on? > > Thanks, > Rahul > > On Wed, Sep 27, 2023 at 1:00 AM Rahul Goswami wrote: > > > Hello, > >

Re: Exception from the codec layer during indexing

2023-09-28 Thread Rahul Goswami
Hi, Following up on my issue...anyone who's seen similar exceptions ? Or any insights on what might be going on? Thanks, Rahul On Wed, Sep 27, 2023 at 1:00 AM Rahul Goswami wrote: > Hello, > On one of the servers running Solr 7.7.2, during indexing I observe 2 > different kinds

Exception from the codec layer during indexing

2023-09-26 Thread Rahul Goswami
Hello, On one of the servers running Solr 7.7.2, during indexing I observe 2 different kinds of exceptions coming from the Lucene codec layer. I can't think of an application/data issue that could be causing this. In particular, Exception-2 seems like a potential bug since it complains

Fwd: How to retain % sign against numbers in lucene indexing/ search

2023-07-13 Thread Amitesh Kumar
*Warm Regards,* *Amitesh K* -- Forwarded message - From: Amitesh Kumar Date: Wed, Jul 12, 2023 at 7:03 AM Subject: How to retain % sign against numbers in lucene indexing/ search To: Hi Group, I am facing a requirement change to get % sign retained in searches. e.g Sample

Re: Analyzer.createComponents(String fieldname) only being called once, when indexing multiple documents

2023-06-09 Thread Michael McCandless
pass that to super() when you create your custom Analyzer. However, that is generally not a great idea -- poor indexing throughput. Another possibility is to create a Field with a pre-analyzed TokenStream, basically bypassing Analyzer entirely and making your own TokenStream chain that

Analyzer.createComponents(String fieldname) only being called once, when indexing multiple documents

2023-06-08 Thread Usman Shaikh
boost the document by setting a PayloadAttribute with the boost amount. However I've noticed when indexing several documents at once, the createComponents method is only called the first time. For all subsequent documents execution goes straight into the incrementToken method of

Re: Can I integrate Apache Lucene with Dovecot POP3/IMAP incoming mail server to perform indexing and fast searching of email messages?

2022-08-13 Thread Uwe Schindler
Uwe On 12.08.2022 at 08:43, Turritopsis Dohrnii Teo En Ming wrote: Subject: Can I integrate Apache Lucene with Dovecot POP3/IMAP incoming mail server to perform indexing and fast searching of email messages? Good day from Singapore, I have a Virtual Private Server (VPS) in Germany running Virtual

Can I integrate Apache Lucene with Dovecot POP3/IMAP incoming mail server to perform indexing and fast searching of email messages?

2022-08-11 Thread Turritopsis Dohrnii Teo En Ming
Subject: Can I integrate Apache Lucene with Dovecot POP3/IMAP incoming mail server to perform indexing and fast searching of email messages? Good day from Singapore, I have a Virtual Private Server (VPS) in Germany running Virtualmin/Webmin web hosting control panel. Virtualmin uses Dovecot

Re: NRT readers and overall indexing/querying throughput

2021-08-08 Thread Robert Muir
On Tue, Aug 3, 2021 at 10:43 PM Alexander Lukyanchikov wrote: > > Maybe I have wrong expectations, and less frequent commits with NRT refresh > were not intended to improve overall performance? > > Some details about the tests - > Base implementation commits and refreshes a regular reader every se

RE: NRT readers and overall indexing/querying throughput

2021-08-08 Thread Uwe Schindler
Hi, in general, NRT indexing throughput is always a bit slower than normal indexing, as it reopens readers and needs to flush segments more often (and therefore you should use NRTCachingDirectory). So 10% slower indexing throughput is quite normal. You can improve by parallelizing, but still

NRT readers and overall indexing/querying throughput

2021-08-03 Thread Alexander Lukyanchikov
Hello everyone, We are considering switching from regular to NRT readers, hoping it would improve overall indexing/querying throughput and also optimize the turnaround time. I did some benchmarks, mostly to understand how much benefit we can get and make sure I'm implementing everything corr

lucene indexing stuck with NFS storage mount

2021-05-10 Thread peterbasut...@gmail.com
Hi all, We are indexing documents with Apache Lucene using several parallel indexing pipelines (Java processes) writing to an NFS-mounted directory. All of them follow the same code and workflow; most of the pipelines succeed without any issue, but only a few indexing pipelines remain idle and in RUN

RE: stucked indexing process

2021-05-10 Thread peterbasut...@gmail.com
Hi, I am from the same team as Tamer, who initiated this thread. We are indexing documents with Apache Lucene using several parallel indexing pipelines (Java processes) writing to an NFS-mounted directory. All of them follow the same code and workflow; most of the pipelines succeed without any issue, but only

MMapDirectory usage during indexing and search

2020-12-14 Thread baris . kazar
Hi, are there some examples of how to use MMapDirectory during indexing (I used the constructor to create it) and search? What are the best practices? Should I repeat during search what I did during indexing for MMapDirectory, i.e. use the constructor to create the MMapDirectory object by

RE: stucked indexing process

2020-10-14 Thread Uwe Schindler
Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Sachin909 > Sent: Wednesday, October 14, 2020 9:43 AM > To: java-user@lucene.apache.org > Subject: RE: stucked indexing process > > Hi Uwe, > > I have observed the similer issue with my

RE: stucked indexing process

2020-10-14 Thread Sachin909
Hi Uwe, I have observed a similar issue with my application. Application stack: "coreLoadExecutor-4-thread-1" #86 prio=5 os_prio=0 tid=0x7fbb1c364800 *nid=0x1616* runnable [0x7fbaa96ef000] java.lang.Thread.State: RUNNABLE at sun.nio.fs.UnixNativeDispatcher.stat0(Native Met

Re: Indexing & Searching Geometries ( MultiLine & MultiPolygon )

2020-10-02 Thread thturk
Thank you for your fast response. Yes, I have tried this. Actually, there is also a polygon created directly from GeoJSON, which recommends that I do the same, because it returns a Polygon array. But is it the most efficient method of indexing spatial data? And the same for MultiLine; they are also a type of Line

Re: Indexing & Searching Geometries ( MultiLine & MultiPolygon )

2020-10-02 Thread Ignacio Vera
eated search queries for > those indexes .But i can not understand how i will index Other Geometry > Types is there any documents or code examples for Lucene Spatial Indexing > I > have seen Component2D but There is only InMemeory search as i understand. > > Lucene 8.6.0 >

Indexing & Searching Geometries ( MultiLine & MultiPolygon )

2020-10-02 Thread thturk
Other Geometry Types is there any documents or code examples for Lucene Spatial Indexing I have seen Component2D but There is only InMemeory search as i understand. Lucene 8.6.0 Jdk 12 -- Sent from: https://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html

Re: Simultaneous Indexing and searching

2020-09-09 Thread Christoph Kaser
hat are then replicated to the frontend server indexes. Best regards Christoph On 01.09.2020 08:28, Richard So wrote: Hi there, I am beginner for using Lucene especially in the area of Indexing and searching simultaneously. Our environment is that we have several webserver for the search fr

Re: Simultaneous Indexing and searching

2020-09-02 Thread Matt Davis
as your > > indexer and responds to queries from the web server(s)? > > > > On Tue, Sep 1, 2020 at 11:13 AM Richard So > > wrote: > > > > > > Hi there, > > > > > > I am beginner for using Lucene especially in the area of Indexing and >

Re: Simultaneous Indexing and searching

2020-09-02 Thread Alex K
rvice that runs on the same box as your > indexer and responds to queries from the web server(s)? > > On Tue, Sep 1, 2020 at 11:13 AM Richard So > wrote: > > > > Hi there, > > > > I am beginner for using Lucene especially in the area of Indexing and > searching

Re: Simultaneous Indexing and searching

2020-09-01 Thread Michael Sokolov
ver(s)? On Tue, Sep 1, 2020 at 11:13 AM Richard So wrote: > > Hi there, > > I am beginner for using Lucene especially in the area of Indexing and > searching simultaneously. > > Our environment is that we have several webserver for the search front-end > that submit sear

Simultaneous Indexing and searching

2020-09-01 Thread Richard So
Hi there, I am a beginner with Lucene, especially in the area of indexing and searching simultaneously. Our environment is that we have several web servers for the search front-end that submit search requests and also a backend server that does the full-text indexing; whereas the index files are

Re: N-dimensional Point Indexing

2018-11-14 Thread Adrien Grand
If you need them for scoring, then the natural choice would be to encode them in a BinaryDocValuesField. How do you plan to filter on these filter vectors? This is too many dimensions for points and doc values are not good at filtering. On Thu, Oct 18, 2018 at 2:32 AM Ken Krugler wrote: > > I’ve
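One way to read Adrien's suggestion is to pack each vector into a byte array and store those bytes as the binary doc value. The sketch below shows only the packing step in plain Java. `VectorCodec` is a hypothetical helper, and the big-endian float layout is an assumption, not something Lucene prescribes. In Lucene the resulting array would be wrapped in a `BytesRef` and passed to `BinaryDocValuesField`.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: converts a float vector to bytes and back. */
final class VectorCodec {
    /** Packs the vector as consecutive 4-byte big-endian floats. */
    static byte[] encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES);
        for (float v : vector) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    /** Reverses encode(); the input length must be a multiple of 4. */
    static float[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        float[] out = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }
}
```

A 300-dimensional float vector takes 1200 bytes per document in this layout, which is worth keeping in mind at the 300 to 2048 dimensions mentioned in the thread.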

Re: N-dimensional Point Indexing

2018-10-17 Thread Ken Krugler
I’ve been looking at directly storing feature vectors and providing scoring/filtering support. This is for vectors consisting of (typically 300 - 2048) floats or doubles. It’s following the same pattern as geospatial support - so a new field type and query/parser, plus plumbing to hook it into

Re: Indexing fails on the way

2018-04-11 Thread neotorand
Thanks for Bringing I will post it there Regards neo -- Sent from: http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html

Re: Indexing fails on the way

2018-04-11 Thread Adrien Grand
Hi Neo, You will likely find better help on the solr-user mailing-list. This mailing list is for questions about Lucene. On Wed, Apr 11, 2018 at 12:21, neotorand wrote: > with Solrcloud What happens if indexing is partially completed and ensemble > goes down.What are the ways to Res

Indexing fails on the way

2018-04-11 Thread neotorand
With SolrCloud, what happens if indexing is partially completed and the ensemble goes down? What are the ways to resume? In one of the scenarios I am using 3 ZK nodes in the ensemble. Let's say I am indexing 5 million records, I have partially indexed the data, and the ZK ensemble goes down. What should be the best

Re: N-dimensional Point Indexing

2018-02-26 Thread Luís Filipe Nassif
Thank you, Adrien. On Feb 26, 2018 21:19, "Adrien Grand" wrote: > Yes it is. > > On Tue, Feb 27, 2018 at 00:03, Luís Filipe Nassif > wrote: > >> Hi Lucene community, >> >> Is BinaryPoint limited up to 8 dimensions? >> >> Thanks, >> Luis >> >> On Feb 6, 2018 16:07, "Luís Filipe N

Re: N-dimensional Point Indexing

2018-02-26 Thread Adrien Grand
Yes it is. On Tue, Feb 27, 2018 at 00:03, Luís Filipe Nassif wrote: > Hi Lucene community, > > Is BinaryPoint limited up to 8 dimensions? > > Thanks, > Luis > > On Feb 6, 2018 16:07, "Luís Filipe Nassif" > wrote: > > Is it limited up to 8 dimensions as described at > https://www.elas

Re: N-dimensional Point Indexing

2018-02-26 Thread Luís Filipe Nassif
Hi Lucene community, Is BinaryPoint limited up to 8 dimensions? Thanks, Luis On Feb 6, 2018 16:07, "Luís Filipe Nassif" wrote: Is it limited up to 8 dimensions as described at https://www.elastic.co/blog/lucene-points-6.0? 2018-02-06 15:35 GMT-02:00 Luís Filipe Nassif : > Sorry, I wa

Re: N-dimensional Point Indexing

2018-02-06 Thread Luís Filipe Nassif
Is it limited up to 8 dimensions as described at https://www.elastic.co/blog/lucene-points-6.0? 2018-02-06 15:35 GMT-02:00 Luís Filipe Nassif : > Sorry, I was looking at the wrong place. Should I use BinaryPoint ( > https://lucene.apache.org/core/6_0_0/core/org/apache/ > lucene/document/BinaryPoi

Re: N-dimensional Point Indexing

2018-02-06 Thread Luís Filipe Nassif
Sorry, I was looking at the wrong place. Should I use BinaryPoint ( https://lucene.apache.org/core/6_0_0/core/org/apache/lucene/document/BinaryPoint.html) ? 2018-02-06 14:17 GMT-02:00 Luís Filipe Nassif : > Hi all, > > Lucene is able to index generic n-dimensional points for efficient > similarit

N-dimensional Point Indexing

2018-02-06 Thread Luís Filipe Nassif
Hi all, Is Lucene able to index generic n-dimensional points for efficient similarity or nearest neighbors search? I have looked at the spatial package in the past but it seems to be specific to geo points? The use case is to index image feature vectors to search for similar images in a corpus. Current

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Rob Audenaerde
2k documents, which is tiny. Plus it should try to > better replicate production workload, otherwise we will draw wrong > conclusions. > > I also suspect something is not quite right in your indexing code. When I > look at the IW logs, 562 out of the 642 flushes only write 1 documen

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Adrien Grand
Hi Rob, I don't think your benchmark is good. If I read it correctly, it only indexes between 21k and 22k documents, which is tiny. Plus it should try to better replicate production workload, otherwise we will draw wrong conclusions. I also suspect something is not quite right in your ind

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Rob Audenaerde
-Rob > > On Mon, Jan 29, 2018 at 12:18 PM, Uwe Schindler wrote: > >> Hi, >> >> How often do you commit? If you index the data initially (that's the case >> where indexing needs to be fast), one would call commit at the end of the >> whole job, so the actu

Re: indexing performance 6.6 vs 7.1

2018-01-29 Thread Rob Audenaerde
Hi, > > How often do you commit? If you index the data initially (that's the case > where indexing needs to be fast), one would call commit at the end of the > whole job, so the actual time it takes is not so important. > > If you have a system where the index is updated all

RE: indexing performance 6.6 vs 7.1

2018-01-29 Thread Uwe Schindler
Hi, How often do you commit? If you index the data initially (that's the case where indexing needs to be fast), one would call commit at the end of the whole job, so the actual time it takes is not so important. If you have a system where the index is updated all the time, then of c

Re: indexing performance 6.6 vs 7.1

2018-01-29 Thread Rob Audenaerde
> > Siiih. > > > > On Thu, Jan 18, 2018 at 9:18 AM, Adrien Grand wrote: > > If you have sparse data, I would have expected index time to *decrease*, > > not increase. > > > > Can you enable the IW info stream and share flush + merge times to see

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Erick Erickson
ble the IW info stream and share flush + merge times to see > where indexing time goes? > > If you can run with a profiler, this might also give useful information. > > On Thu, Jan 18, 2018 at 11:23, Rob Audenaerde > wrote: > >> Hi all, >> >> We

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Adrien Grand
If you have sparse data, I would have expected index time to *decrease*, not increase. Can you enable the IW info stream and share flush + merge times to see where indexing time goes? If you can run with a profiler, this might also give useful information. On Thu, Jan 18, 2018 at 11:23, Rob

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Robert Muir
Erick I don't think solr was mentioned here. On Thu, Jan 18, 2018 at 8:03 AM, Erick Erickson wrote: > My first question is always "are you running the Solr CPUs flat out?". > My guess in this case is that the indexing client is the same and the > problem is in Solr,

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Erick Erickson
My first question is always "are you running the Solr CPUs flat out?". My guess in this case is that the indexing client is the same and the problem is in Solr, but it's worth checking whether the clients are just somehow not delivering docs as fast as they were before. My suspic

indexing performance 6.6 vs 7.1

2018-01-18 Thread Rob Audenaerde
Hi all, We recently upgraded from Lucene 6.6 to 7.1. We see a significant drop in indexing performance. We have an atypical use of Lucene, as we (also) index some database tables and add all the values as AssociatedFacetFields as well. This allows us to create pivot tables on search results really

Re: Spatial Indexing of Polygons

2017-08-15 Thread David Smiley
looking into the spatial3D api and it appears > that there may be some ability to do this (storing multiple points under > same geo3dpoint field) but it doesn't seem to be well documented if it > exists. Is there a recommended way to support indexing and searching of > polygons (build

Spatial Indexing of Polygons

2017-08-14 Thread Tom Hirschfeld
ed if it exists. Is there a recommended way to support indexing and searching of polygons (building footprint sized polygons, not huge ones)? If so what is the currently recommended API to use? We are currently thinking about using the s2cell library from google. Best, Tom Hirschfeld

Re: stucked indexing process

2017-07-12 Thread Tamer Gur
thanks a lot for the "hack" and jstack suggestion Uwe i will try them. Unfortunately we are in the NFS mount since we don't have other choices. also might be related, in the cluster(computing farm) we are indexing parallel several size of different datasets and most them are i

RE: stucked indexing process

2017-07-12 Thread Uwe Schindler
che.org; Uwe Schindler > Subject: Re: stucked indexing process > > thanks Uwe for reply. we are indexing data in a cluster where there are > many mount points so it is possible that one them has issue or slowness > when this check first tried but now when i execute "mount" it

Re: stucked indexing process

2017-07-12 Thread Tamer Gur
thanks Uwe for reply. we are indexing data in a cluster where there are many mount points so it is possible that one them has issue or slowness when this check first tried but now when i execute "mount" it is responding all the mount points. I was wondering is there any configurati

RE: stucked indexing process

2017-07-12 Thread Uwe Schindler
ttp://www.thetaphi.de/> http://www.thetaphi.de eMail: u...@thetaphi.de From: Tamer Gur [mailto:t...@ebi.ac.uk] Sent: Wednesday, July 12, 2017 12:29 PM To: java-user@lucene.apache.org Subject: stucked indexing process Hi all, we are having an issue in our indexing pipeline time to ti

Random Index Corruption exceptions during bulk indexing

2017-06-08 Thread simon
.1). It did not happen with Solr 5.4 (which I can't go back to). Oddly enough, I ran Solr 6.3.0 uneventfully for several weeks before this problem first occurred. LuceneMatchVersion in solrconfig.xml is set to 6.3.0. Standalone (non-cloud) environment. Our indexing subsystem is a complex

Indexing strategies for metadata fields

2017-05-24 Thread José Tomás Atria
Hello all, I'm trying to come up with a reasonable indexing strategy for my document's metadata, and I'm seeing some weird undocumented behaviours. My original approach was to build fields like these: FieldType ft = new FieldType(); ft.setDocValuesType( DocVa

Re: Indexing a Date/DateTime/Time field in Lucene 4

2017-04-08 Thread KARTHIK SHIVAKUMAR
ing some conflicting suggestions concerning the type of field to use > for indexing a Date/DateTime/Time value. > > Some suggest conversion using DateTools.timeToString() and using a > StringField, > while others suggest using the long value of getTime() and using a > LongField (th

Re: Indexing Numeric value in Lucene 4.10.4

2017-04-07 Thread aravinth thangasami
> > Uwe > > - > Uwe Schindler > Achterdiek 19, D-28357 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: aravinth thangasami [mailto:aravinththangas...@gmail.com] > > Sent: Friday, April 7, 2017 8:54 AM >

RE: Indexing Numeric value in Lucene 4.10.4

2017-04-07 Thread Uwe Schindler
phi.de eMail: u...@thetaphi.de > -Original Message- > From: aravinth thangasami [mailto:aravinththangas...@gmail.com] > Sent: Friday, April 7, 2017 8:54 AM > To: java-user@lucene.apache.org > Subject: Re: Indexing Numeric value in Lucene 4.10.4 > > we don't have to sort o

Re: Indexing Numeric value in Lucene 4.10.4

2017-04-06 Thread aravinth thangasami
> > On Thu, Apr 6, 2017 at 6:32 AM, aravinth thangasami > wrote: > > Hi all, > > > > I'm searching numeric value and will not perform range query on that > field > > I thought of indexing it as String field instead of NumericField > > so that it will impr

Re: Indexing Numeric value in Lucene 4.10.4

2017-04-06 Thread Erick Erickson
Thu, Apr 6, 2017 at 6:32 AM, aravinth thangasami wrote: > Hi all, > > I'm searching numeric value and will not perform range query on that field > I thought of indexing it as String field instead of NumericField > so that it will improve indexing time by avoiding numeric tries >

Indexing Numeric value in Lucene 4.10.4

2017-04-06 Thread aravinth thangasami
Hi all, I'm searching numeric value and will not perform range query on that field I thought of indexing it as String field instead of NumericField so that it will improve indexing time by avoiding numeric tries What are your opinions on this? Kind regards, Aravinth

RE: Indexing a Date/DateTime/Time field in Lucene 4

2017-04-05 Thread Uwe Schindler
n Hoyweghen > [mailto:frederik.vanhoyweg...@chapoo.com] > Sent: Wednesday, April 5, 2017 3:17 PM > To: java-user@lucene.apache.org > Subject: Re: Indexing a Date/DateTime/Time field in Lucene 4 > > Let's say I want to search between 2 dates, search for a date that's > before/after a

Re: Indexing a Date/DateTime/Time field in Lucene 4

2017-04-05 Thread Frederik Van Hoyweghen
erik.vanhoyweg...@chapoo.com> wrote: > > > Hey everyone, > > > > I'm seeing some conflicting suggestions concerning the type of field to > use > > for indexing a Date/DateTime/Time value. > > > > Some suggest conversion using DateTools.timeToString() and

Re: Indexing a Date/DateTime/Time field in Lucene 4

2017-04-05 Thread Adrien Grand
< frederik.vanhoyweg...@chapoo.com> wrote: > Hey everyone, > > I'm seeing some conflicting suggestions concerning the type of field to use > for indexing a Date/DateTime/Time value. > > Some suggest conversion using DateTools.timeToString() and using a > StringField,

Indexing a Date/DateTime/Time field in Lucene 4

2017-04-05 Thread Frederik Van Hoyweghen
Hey everyone, I'm seeing some conflicting suggestions concerning the type of field to use for indexing a Date/DateTime/Time value. Some suggest conversion using DateTools.timeToString() and using a StringField, while others suggest using the long value of getTime() and using a LongField (th
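The two options in the question differ mainly in representation. The plain-Java sketch below is not Lucene code. `DateKeys` is a hypothetical helper, and the `yyyyMMddHHmmssSSS` UTC pattern is an assumption chosen to resemble what `DateTools` produces at millisecond resolution. It illustrates that a fixed-width timestamp string sorts in the same order as the underlying long, which is why either form can support before/after comparisons.

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

/** Hypothetical helper showing two index-friendly forms of a Date. */
final class DateKeys {
    // Fixed-width UTC pattern (assumed here, for illustration only).
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    /** Numeric form: milliseconds since the epoch. */
    static long asLong(Date d) {
        return d.getTime();
    }

    /** String form: fixed width, so lexicographic order equals time order. */
    static String asString(Date d) {
        return FMT.format(d.toInstant());
    }
}
```

The string form is human-readable in index inspection tools, while the long form is more compact and needs no parsing at query time.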

Deleting document from Lucene indexing not working in version 34

2017-02-22 Thread har...@oneit.com.au
Hi All, I am using Lucene version 3.4.0 and StandardAnalyzer. I am trying to delete a document from the index using its document id, but it does not seem to work. I tried many different ways, as follows: IndexReader.deleteDocuments(new Term("documentDocId", "LuceneObj:215487")) IndexWriter

Re: how do i improve Indexing and Searching performance of 2 billion documents over SolrCloud

2017-02-14 Thread Duke DAI
> > Hi, we have 4 solr instances running > > > > we are using solr cloud for indexing hbase table column names. > > each column in hbase will end up as a document in solr, which resulted in > > over 2 billion documents in solr. > > primary goal is to search the co

Re: how do i improve Indexing and Searching performance of 2 billion documents over SolrCloud

2017-02-14 Thread Adrien Grand
This list is for users of the Lucene Java API, maybe try solr-user instead? On Mon, Feb 13, 2017 at 21:24, yeshwanth kumar wrote: > Hi, we have 4 solr instances running > > we are using solr cloud for indexing hbase table column names. > each column in hbase will end up as a docu

how do i improve Indexing and Searching performance of 2 billion documents over SolrCloud

2017-02-13 Thread yeshwanth kumar
Hi, we have 4 solr instances running we are using solr cloud for indexing hbase table column names. each column in hbase will end up as a document in solr, which resulted in over 2 billion documents in solr. primary goal is to search the column names. we have 4 shards for the collection, queries

Re: Indexing architecture

2017-01-04 Thread suriya prakash
Hi, Any better architecture ideas for my below mentioned use case? Regards, Suriya On Wed, 28 Dec 2016 at 11:27 PM, suriya prakash wrote: > Hi, > > I have 100 thousand indexes in Hadoop grid because 90% of my indexes will > be inactive and I can distribute the other active indexes based on loa

Indexing architecture

2016-12-28 Thread suriya prakash
Hi, I have 100 thousand indexes in Hadoop grid because 90% of my indexes will be inactive and I can distribute the other active indexes based on load. Scoring will work better for each index but I don't worry about it now. What are the optimisations I need to do to Scale better? I do commit ever

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Michael McCandless
So when a query arrives, you know the query is only allowed to match either module:1 (analyzed terms) or module:2 (not analyzed) but never both? If so, you should be fine. Though relevance will be sort of wonky, in case that matters, because you are polluting the unique term space; you would get

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Michael McCandless
You can do this, Lucene will let you, but it's typically a bad idea for search relevance because some documents will return only if you search for precisely the same whole token, others if you search for an analyzed token, giving the user a broken experience. Mike McCandless http://blog.mikemcca
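Mike's point about a broken experience can be illustrated without Lucene. In the sketch below, `MixedFieldDemo` is hypothetical, the lowercase-and-split step is only a crude stand-in for a real analyzer, and the sample value is invented for the example. It shows which terms each document would contribute to the shared field.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class MixedFieldDemo {
    /** Crude stand-in for an analyzer: lowercase, split on non-alphanumerics. */
    static Set<String> analyzedTerms(String value) {
        String[] parts = value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        return new LinkedHashSet<>(Arrays.asList(parts));
    }

    /** Not analyzed: the whole value is indexed as a single term. */
    static Set<String> exactTerm(String value) {
        return Set.of(value);
    }
}
```

For the value "State Bank of India", the analyzed document is found by the term `bank` but not by the full string, while the not-analyzed document is found only by the exact full string. The same query therefore behaves differently depending on how each document happened to be indexed.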

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Kumaran Ramasubramanian
Hi All, Can anyone say, is it advisable to have index with both analyzed and not_analyzed values in one field? Use case: i have custom fields in my product which can be configured differently ( ANALYZED and NOT_ANALYZED ) in different modules -- Kumaran R On Wed, Oct 26, 2016 at 12:0

Re: Indexing values of different datatype under same field

2016-11-04 Thread Kumaran Ramasubramanian
e able to index and retrieve desired results but, > > *We could not find Lucene (5.3.1) documentation around this behavior.* > Please comment on, > 1. If we can go with this behavior and what would be the performance > implication of indexing and querying different datatype under same f

Indexing values of different datatype under same field

2016-11-03 Thread Rajnish kamboj
ior.* Please comment on, 1. If we can go with this behavior and what would be the performance implication of indexing and querying different datatype under same field? 2. How the two are stored internally (i.e. different datatype under same field)? 2. If we upgrade to new Lucene version 5.4 or to major

indexing analyzed and not_analyzed values in same field

2016-10-25 Thread Kumaran Ramasubramanian
Hi All, i have indexed 4 documents in an index where BANKNAME field is analyzed in two documents and it is not_analyzed in another two documents. i have mentioned search cases below where i am able to search using both analyzed ( using classic analyzer ) and not_analyzed ( using keyword analyzer )

Re: Approach for indexing and queryin good volume data.

2016-09-08 Thread lukes
Hi all, Can anyone please respond ? Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/Approach-for-indexing-and-queryin-good-volume-data-tp4295109p4295218.html Sent from the Lucene - Java Users mailing list archive at Nabble.com

Approach for indexing and queryin good volume data.

2016-09-07 Thread lukes
Hi all, I am planning to use Lucene(not in cluster) for indexing and querying good volume data. Use case is, 10-20 documents / second(roughly around 15-20 fields) and in parallel doing query. Below is the approach i am planning to take, can anyone please let me know from their past experience if

indexing array fields

2016-09-03 Thread Cam Bazz
Hello, I need to index arrays of long, usually of long[20], 20 in length. It's been a while since I worked with Lucene, last time was probably < version 3. I read https://lucene.apache.org/core/6_2_0/core/org/apache/lucene/document/Field.html There are SortedDocValuesField and SortedSetDocValuesF

Re: BufferedUpdateStreams breaks high performance indexing

2016-08-04 Thread Michael McCandless
> Yes IndexWriterConfig is changed from default: > >>>> > >>>> 8 > >>>> 1024 > >>>> -1 > >>>> > >>>> 8 > >>>> 100 > >>>> 512 >

Re: BufferedUpdateStreams breaks high performance indexing

2016-08-04 Thread Bernd Fehling
>>>> 8 >>>> >>> class="org.apache.lucene.index.ConcurrentMergeScheduler"/> >>>> ${solr.lock.type:native} >>>> ... >>>> >>>> >>>> A unique id as example: "ftoxfordilej:ar.1770.

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-29 Thread Michael McCandless
native} > >> ... > >> > >> > >> A unique id as example: "ftoxfordilej:ar.1770.x.x.13.x.x.u1" > >> Somewhere between 20 and 50 characters in length. > >> > >> Thanks for your help, > >> Bernd > >> > >

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-29 Thread Bernd Fehling
> Somewhere between 20 and 50 characters in length. >> >> Thanks for your help, >> Bernd >> >> >> On 28.07.2016 at 15:35, Michael McCandless wrote: >>> Hmm not good. >>> >>> If you are really only adding documents, you should be using >&

Re: Indexing and storing Long fields

2016-07-28 Thread Kumaran Ramasubramanian
Ok mike.. thanks for the explanation... i have another doubt... i read in some article like, we can have one storedfield & docvalue field with same field... is it so? -- Kumaran R On Thu, Jul 28, 2016 at 9:29 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > OK, sorry, you cann

Re: Indexing and storing Long fields

2016-07-28 Thread Michael McCandless
OK, sorry, you cannot change how the field is indexed for the same field name across different field indices. Lucene will "downgrade" that field to the lowest settings, e.g. "docs, no positions" in your case. Mike McCandless http://blog.mikemccandless.com On Thu, Jul 28, 2016 at 9:31 AM, Kumara
