Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-16 Thread Navneet Verma
Hi Uwe, Thanks for the prompt response. I have created the gh issue: https://github.com/apache/lucene/issues/13920 for more discussion. We can move all discussions to the gh issues. Thanks Navneet On Tue, Oct 15, 2024 at 3:17 AM Uwe Schindler wrote: > Hi, > > The problem with your approach is th

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-15 Thread Uwe Schindler
Hi, The problem with your approach is that you can change the madvise on a clone, but as the underlying memory is the same for the cloned index input, it won't revert back to RANDOM. Basically there's no need to clone or create a slice. We should instead allow changing the advice for an Index

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-15 Thread Navneet Verma
Hi Uwe, >> thinking about it a bit more: In 10.x we already have some ways to preload data with WILL_NEED (or similar). Maybe this can also be used on merging when we reuse an already open IndexInput. Maybe it is possible to change the madvise on an already open IndexInput and change it b

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Navneet Verma
Hi Uwe, Thanks for sharing the link and providing the useful information. I will definitely go ahead and create a gh issue. In the meantime I did some testing by changing the IOContext from RANDOM to READ for FlatVectors

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
Hi, thinking about it a bit more: In 10.x we already have some ways to preload data with WILL_NEED (or similar). Maybe this can also be used on merging when we reuse an already open IndexInput. Maybe it is possible to change the madvise on an already open IndexInput and change it before merg

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
Hi, great. I still think the difference between RANDOM and READ is huge in your case. Are you sure that you have not misconfigured your system? The most important thing for Lucene is to make sure that heap space of the Java VM is limited as much as possible (only slightly above the OOM boundary) and

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Navneet Verma
Hi Uwe, To answer your question about the RAM and heap size, here are some details: RAM: 128GB, heap: 32GB, CPUs: 16. This is where I will put some reproducible benchmarks using Lucene alone. I have currently used OpenSearch 2.17 to run these benchmarks. In general, the correct fix for this is

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
Hi, this seems to be a special case in FlatVectors, because normally there's a separate method to open an IndexInput for checksumming: https://github.com/apache/lucene/blob/524ea208c870861a719f21b1ea48943c8b7520da/lucene/core/src/java/org/apache/lucene/store/Directory.java#L155-L157 Could you o

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-30 Thread Navneet Verma
Hi Uwe and Mike, Thanks for providing such a quick response. Let me try to answer a few things here: *In addition, in Lucene 9.12 (latest 9.x) version released today there are some changes to ensure that checksumming is always done with IOContext.READ_ONCE (which uses READ behind the scenes).* I didn'

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-30 Thread Uwe Schindler
Hi, please also note: In Lucene 10 the checksum IndexInput will always be opened with IOContext.READ_ONCE. If you want to sequentially read a whole index file for other reasons than checksumming, please pass the correct IOContext. In addition, in Lucene 9.12 (latest 9.x) version released t

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-29 Thread Michael McCandless
Hi Navneet, With RANDOM IOContext, on modern OS's / Java versions, Lucene will hint the memory mapped segment that the IO will be random, using the madvise POSIX API with the MADV_RANDOM flag. For READ IOContext, Lucene maybe hints with MADV_SEQUENTIAL, I'm not sure. Or maybe it doesn't hint anything? It'
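
A minimal sketch of the two open modes under discussion, assuming a Lucene 9.x-era API (IOContext.RANDOM and IOContext.READ constants, and an MMapDirectory that can translate the context into a posix_madvise hint on supported JDK/OS combinations); the index path and file names are hypothetical:

    import java.nio.file.Paths;
    import org.apache.lucene.store.IOContext;
    import org.apache.lucene.store.IndexInput;
    import org.apache.lucene.store.MMapDirectory;

    public class AdviseDemo {
      public static void main(String[] args) throws Exception {
        try (MMapDirectory dir = new MMapDirectory(Paths.get("/path/to/index"))) {
          // RANDOM hints the kernel not to read ahead (MADV_RANDOM);
          // READ leaves normal read-ahead in place for sequential scans.
          IndexInput vec = dir.openInput("_0.vec", IOContext.RANDOM);
          IndexInput dvd = dir.openInput("_0.dvd", IOContext.READ);
          vec.close();
          dvd.close();
        }
      }
    }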

Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-29 Thread Navneet Verma
Hi Lucene Experts, I wanted to understand the performance difference between opening and reading the whole file using an IndexInput with IOContext as RANDOM vs READ. I can see .vec files (storing the flat vectors) are opened with RANDOM, whereas .dvd files are opened with READ. As per my testing wi

Re: Getting LinkageError due to Panama APIs

2023-06-30 Thread Uwe Schindler
Hi, It is not obvious what you have done, but the issue may come from custom builds, e.g., if you are not using the original Lucene JAR file but a modified one. Another reason may be Maven Shade plugin or other assemblies like Uber-JARs! Make sure that all class files and module information

Re: Getting LinkageError due to Panama APIs

2023-06-29 Thread Shubham Chaudhary
This was an internal build issue that is now fixed. Sorry for the confusion. Thanks, Shubham On Tue, Jun 27, 2023 at 12:48 AM Shubham Chaudhary wrote: > Hi everyone, > > I’m trying to build and run my software using JDK 19 which has a direct > dependency on Apache Lucene 9.6 built with JDK 17 a

Getting LinkageError due to Panama APIs

2023-06-26 Thread Shubham Chaudhary
Hi everyone, I’m trying to build and run my software using JDK 19, which has a direct dependency on Apache Lucene 9.6 built with JDK 17, and I’m running into the below exception due to the Panama APIs. Is this expected behaviour? Any help would be highly appreciated. Exception in thread "main" java.lang.Li

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-09 Thread Michael McCandless
I'd also love to understand this: > using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on Windows for our index sizes which commonly run north of 1 TB) Is this a known problem on certain versions of Windows? Normally memory mapped IO can scale to very large sizes (well beyond s

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-07 Thread Adrien Grand
I agree it's worth discussing. I opened https://github.com/apache/lucene/issues/12355 and https://github.com/apache/lucene/issues/12356. On Tue, Jun 6, 2023 at 9:17 PM Rahul Goswami wrote: > > Thanks Adrien. I spent some time trying to understand the readByte() in > ReverseRandomAccessReader (thr

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Rahul Goswami
Thanks Adrien. I spent some time trying to understand the readByte() in ReverseRandomAccessReader (through FST) and compare with 7.x. Although I don't understand ALL of the details and reasoning for always loading the FST (and in turn the term index) off-heap (as discussed in https://github.com/ap

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Adrien Grand
Yes, this changed in 8.x: - 8.0 moved the terms index off-heap for non-PK fields with MMapDirectory. https://github.com/apache/lucene/issues/9681 - Then in 8.6 the FST was moved off-heap all the time. https://github.com/apache/lucene/issues/10297 More generally, there are a few files that are no l

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Rahul Goswami
Thanks Adrien. Is this behavior of FST something that has changed in Lucene 8.x (from 7.x)? Also, is the terms index not loaded into memory anymore in 8.x? To your point on MMapDirectoryFactory, it is much faster as you anticipated, but the indexes commonly being >1 TB makes the Windows machine fr

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Adrien Grand
+Alan Woodward helped me better understand what is going on here. BufferedIndexInput (used by NIOFSDirectory and SimpleFSDirectory) doesn't play well with the fact that the FST reads bytes backwards: every call to readByte() triggers a refill of 1kB because it wants to read the byte that is just be

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Adrien Grand
My best guess based on your description of the issue is that SimpleFSDirectory doesn't like the fact that the terms index now reads data directly from the directory instead of loading the terms index in heap. Would you be able to run the same benchmark with MMapDirectory to check if it addresses th
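
A minimal sketch of the comparison suggested here; only the Directory construction differs between the two benchmark runs, and the index path is hypothetical:

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.MMapDirectory;
    import org.apache.lucene.store.SimpleFSDirectory;

    public class DirCompare {
      public static void main(String[] args) throws Exception {
        // Swap between the two implementations and re-run the benchmark.
        Directory dir = new MMapDirectory(Paths.get("/path/to/index"));
        // Directory dir = new SimpleFSDirectory(Paths.get("/path/to/index"));
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
          IndexSearcher searcher = new IndexSearcher(reader);
          // ... run the seekExact()/RealTimeGet benchmark here ...
        }
      }
    }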

Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-05 Thread Rahul Goswami
Hello, We started experiencing slowness with atomic updates in Solr after upgrading from 7.7.2 to 8.11.1. Running several tests revealed the slowness to be in RealTimeGet's SolrIndexSearcher.getFirstMatch() call, which eventually calls Lucene's SegmentTermsEnum.seekExact(). In the benchmarks I ran

Re: Getting all values for a specific dimension for SortedSetDocValues per document

2022-07-01 Thread Harald Braumann
Hi! Thanks a lot for your help. I will try both of your suggestions (taxo index and per-segment ord ranges). Thanks for clarifying that I have to iterate the ords. I wasn't sure if I had just overlooked something obvious, like some way to do an advanceExact on ords. Regards harry On 01.

Re: Getting all values for a specific dimension for SortedSetDocValues per document

2022-07-01 Thread Greg Miller
To address the last topic (building up ordinal ranges per-segment), what I'm thinking is that you'd iterate all unique ordinals in the SSDV field and "memorize" the ordinal range for each dimension up-front, but on a per-segment basis. This would be very similar to what DefaultSortedSetDocValuesRea

Re: Getting all values for a specific dimension for SortedSetDocValues per document

2022-07-01 Thread Harald Braumann
Hi! On 01.07.22 00:46, Greg Miller wrote: Have you considered taxonomy faceting for your use-case? Because the taxonomy structure is maintained in a separate index, it's (relatively) trivial to iterate all direct child ordinals of a given dimension. The cost of mapping to a global ordinal space

Re: Getting all values for a specific dimension for SortedSetDocValues per document

2022-06-30 Thread Greg Miller
Hi Harry- Have you considered taxonomy faceting for your use-case? Because the taxonomy structure is maintained in a separate index, it's (relatively) trivial to iterate all direct child ordinals of a given dimension. The cost of mapping to a global ordinal space is done when the index is merged.

Getting all values for a specific dimension for SortedSetDocValues per document

2022-06-30 Thread Harald Braumann
Hi! I'm looking for a solution for the following problem: I would like to get all the values for a specific dimension for SortedSetDocValues per document. I've basically copied SortedSetDocValuesFacetCounts, but instead of just counting, I build a map from doc to values. The problem here is,
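
A minimal sketch, assuming a Lucene 9.x-era API, of iterating all ordinals of a single document in an SSDV field; the field name and docId are hypothetical:

    import java.io.IOException;
    import org.apache.lucene.index.LeafReader;
    import org.apache.lucene.index.SortedSetDocValues;

    class SsdvPerDoc {
      // Prints every value of the "category" SSDV field for one document.
      static void printValues(LeafReader leafReader, int docId) throws IOException {
        SortedSetDocValues ssdv = leafReader.getSortedSetDocValues("category");
        if (ssdv != null && ssdv.advanceExact(docId)) {
          long ord;
          while ((ord = ssdv.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
            System.out.println(ssdv.lookupOrd(ord).utf8ToString());
          }
        }
      }
    }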

Re: How to change sorting *after* getting search results

2021-11-30 Thread Luís Filipe Nassif
one of the various > Rescorers. Have you looked at those? > > On Tue, Nov 30, 2021, 9:15 AM Luís Filipe Nassif > wrote: > >> Hi Lucene community, >> >> Our users could do very heavy searches and they are able to change the >> sorting criteria multiple times after ge

Re: How to change sorting *after* getting search results

2021-11-30 Thread Michael Sokolov
ery heavy searches and they are able to change the > sorting criteria multiple times after getting the results. We collect all > of them, this is important for our use case, disabling scoring if the > result size is too large to make the search faster. Currently we have our > own multi-thre

How to change sorting *after* getting search results

2021-11-30 Thread Luís Filipe Nassif
Hi Lucene community, Our users could do very heavy searches and they are able to change the sorting criteria multiple times after getting the results. We collect all of them, this is important for our use case, disabling scoring if the result size is too large to make the search faster. Currently

I am getting an exception in ComplexPhraseQueryParser when fuzzy searching

2021-11-12 Thread Shifflett, David [USA]
I am using Lucene 8.2, but have also verified this on 8.9 and 8.10.1. My query string is either "by~1 word~1" or "ky~1 word~1". I am looking for a phrase of these 2 words, with potential 1 character misspelling, or fuzziness. I realize that 'by' is usually a stop word, that is why I also tes

I am getting an exception in ComplexPhraseQueryParser when fuzzy searching

2021-11-01 Thread Shifflett, David [USA]
I am using Lucene 8.2, but have also verified this on 8.9 and 8.10.1. My query string is either "by~1 word~1" or "ky~1 word~1". I am looking for a phrase of these 2 words, with potential 1 character misspelling, or fuzziness. I realize that 'by' is usually a stop word, that is why I also test

Re: Getting a MaxBytesLengthExceededException for a TextField

2019-10-25 Thread Erick Erickson
t 4:28 AM, Marko Ćurlin > wrote: > > Hi everyone, > > I am getting an > org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException, while > trying to insert a list with 9 elements, of which one is 242905 bytes long, > into Solr. I am aware that StrField has a

Getting a MaxBytesLengthExceededException for a TextField

2019-10-25 Thread Marko Ćurlin
Hi everyone, I am getting an org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException, while trying to insert a list with 9 elements, of which one is 242905 bytes long, into Solr. I am aware that StrField has a hard limit of slightly less than 32k. I am using a TextField that by my

Getting Matched Text and Field for a Document

2019-09-20 Thread aravinth thangasami
other approaches for getting the matched query from the results? ScoreDocExtended scoreDocExtended = new ScoreDocExtended(doc + docBase, score); Object[] queries = getQueries(doc, baseScorer); scoreDocExtended.setQueriesMatched(queries); pq.add

Re: Getting Exception : java.nio.channels.ClosedByInterruptException

2019-04-01 Thread Robert Muir
x.IndexWriter.startCommit(IndexWriter.java:4739) > > at > org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3281) > > at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3449) > > at org.apache.lucene.index.IndexWriter.commit(Inde

Getting Exception : java.nio.channels.ClosedByInterruptException

2019-03-31 Thread Chellasamy G
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3449) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3414) After this exception I am getting the below exception subsequently while trying to commit the index, org.apache.lucene.store.AlreadyClosedException

Getting list of actual fields in index

2019-02-01 Thread Vladimir Kroz
I'm working on a tool that lists all fields in the index, including an explicit list of all dynamic fields. I tried the GET `/solr/mycore/schema/fields` Schema API, however without any luck. On the other hand there is an undocumented API used by the Solr UI: GET `/solr/mycore/admin/luke?wt=json`, which among other

Re: Not getting desired result through TermQuery

2018-07-18 Thread baris . kazar
), BooleanClause.Occur.MUST); Best regards On 7/18/18 3:22 PM, Michael Sokolov wrote: It's impossible to tell for sure from the info you provided -- attachments are not included in messages on this mailing list - but my guess is that when you use the QueryParser api you are getting a query

Re: Not getting desired result through TermQuery

2018-07-18 Thread Michael Sokolov
It's impossible to tell for sure from the info you provided -- attachments are not included in messages on this mailing list - but my guess is that when you use the QueryParser api you are getting a query that has the benefit of text processing using an Analyzer (lower-casing and other
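
A minimal sketch of the difference described here; the field name, analyzer choice, and query text are hypothetical:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class AnalysisDemo {
      public static void main(String[] args) throws Exception {
        // TermQuery matches the raw term bytes: "Freedom" will not match
        // an index term that was lower-cased to "freedom" at index time.
        Query exact = new TermQuery(new Term("body", "Freedom"));
        // QueryParser runs the text through the Analyzer first, so this
        // produces body:freedom with StandardAnalyzer.
        Query parsed = new QueryParser("body", new StandardAnalyzer()).parse("Freedom");
        System.out.println(exact + " vs " + parsed);
      }
    }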

Not getting desired result through TermQuery

2018-07-16 Thread Ashish Parab
Hi Team, I am new to Lucene and have been exploring Lucene 7.4.0 for the past few months. Please take a look at the attached IndexFiles.java, in which I am indexing all the files present under a specified folder. Also, for every document, a field named ashish is stored. Now that the indexing is ready, I am trying

Re: getting Lucene Docid from inside score()

2018-03-10 Thread Erick Erickson
I was thinking this was a Solr question rather than a Lucene one so the [docid] bit doesn't apply if you're in the lucene code. If you _are_ really going from solr, just put [docid] in your Solr "fl" list. Look in the Solr ref guide for an explanation: https://lucene.apache.org/solr/guide/6_6/trans

Re: getting Lucene Docid from inside score()

2018-03-10 Thread dwaipayan . roy
Hi Erick, Many thanks for your reply and explanation. I really want this to work. The good news for me is that the index is static; there is no chance of any modification of the index. > Luke and the like are using a point-in-time snapshot of the index. I want to get that lucene-assigned docid, th

Re: getting Lucene Docid from inside score()

2018-03-09 Thread Erick Erickson
You almost certainly do _not_ want this unless you are absolutely and totally sure that your index does not change between the time you ask for the internal Lucene doc ID and the time you use it. No docs may be added. No forceMerges are done. In fact, I'd go so far as to say you shouldn't open

Re: getting Lucene Docid from inside score()

2018-03-09 Thread dwaipayan . roy
Thank you very much for your reply. Yes, I really want this (for implementing a retrieval function that extends the LMDir function). Precisely, I want the same document numbering as we see in Lucene index viewers like Luke. I am not sure what you meant by "segment offset, held by a leaf reade

Re: getting Lucene Docid from inside score()

2018-03-09 Thread Michael Sokolov
Are you sure you want this? Lucene docids aren't generally useful outside a narrow internal context. They can change over time for example. But if you do, it sounds like maybe what you are seeing is the per segment docid. To get a global one you have to add the segment offset, held by a leaf reade

getting Lucene Docid from inside score()

2018-03-09 Thread Dwaipayan Roy
While searching, I want to get the lucene-assigned docid (which runs from 0 to the number of documents - 1) of a document having a particular query term. From inside the score(), printing 'doc' or calling docId() is returning a docid which, I think, is the internal docid of a segment in which the
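
A minimal sketch, assuming a Lucene 7.x-era API, of mapping the per-segment docid seen inside collect()/score() to the global, Luke-style docid via the segment's docBase:

    import java.io.IOException;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.search.SimpleCollector;

    class GlobalDocIdCollector extends SimpleCollector {
      private int docBase;

      @Override
      protected void doSetNextReader(LeafReaderContext context) throws IOException {
        docBase = context.docBase; // this segment's offset in the composite reader
      }

      @Override
      public void collect(int segmentDocId) throws IOException {
        int globalDocId = docBase + segmentDocId; // the Luke-style docid
        System.out.println(globalDocId);
      }

      @Override
      public boolean needsScores() {
        return false;
      }
    }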

Getting most frequent terms from single-token field values in a subset of Lucene documents

2017-08-28 Thread wilqor
Hello, In a Lucene index I have documents containing a number of single-token text fields indexed as StringField. I would like to query the most frequent terms from single-token values of each field - like a top 10 of occurrences - and be able to perform the query on a subset of documents effectiv

getting error " is not a SuggestField"

2017-08-01 Thread Vitaly Stroutchkov
Hi, I created a Lucene index with documents that have a SuggestField. When I run a completion query against it I'm getting an error. The code is like this (not so straightforward in the actual program): File sdir = new File(sgs

RE: Un-used index files are not getting released

2017-05-12 Thread Siraj Haider
-la : 236 lsof : 79 -- Regards -Siraj Haider (212) 306-0154 -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, May 11, 2017 1:34 PM To: java-user@lucene.apache.org Cc: ian@gmail.com Subject: RE: Un-used index files are not getting released

RE: Un-used index files are not getting released

2017-05-11 Thread Chris Hostetter
: We do not open any IndexReader explicitly. We keep one instance on : IndexWriter open (and never close) and for searching we use : SearcherManager. I checked the lsof and did not find any files with : delete status. what exactly does your SearchManager usage look like? is every searcher =
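
A minimal sketch of the acquire/release discipline this question is probing for; leaking the acquired searcher is the classic way deleted segment files stay open:

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.SearcherManager;

    class SearchHelper {
      static void search(SearcherManager manager) throws IOException {
        IndexSearcher searcher = manager.acquire();
        try {
          // ... run queries against searcher ...
        } finally {
          manager.release(searcher); // lets unreferenced segment files be freed
          searcher = null;           // defensive: never reuse after release
        }
      }
    }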

Re: Un-used index files are not getting released

2017-05-09 Thread Ian Lea
index folder using java > (File.listFiles()) it lists 1761 files in that folder. This count goes down > to a double digit number when I restart the tomcat. > > Thanks for looking into it. > > -- > Regards > -Siraj Haider > (212) 306-0154 > > -Original Mess

RE: Un-used index files are not getting released

2017-05-08 Thread Siraj Haider
getting released The most common cause is unclosed index readers. If you run lsof against the tomcat process id and see that some deleted files are still open, that's almost certainly the problem. Then all you have to do is track it down in your code. -- Ian. On Thu, May 4, 2017 at 10:

Re: Un-used index files are not getting released

2017-05-05 Thread Ian Lea
aider wrote: > Hi all, > We recently switched to Lucene 6.5 from 2.9 and we have an issue that the > files in index directory are not getting released after the IndexWriter > finishes up writing a batch of documents. We are using > IndexFolder.listFiles().length to check the numb

Un-used index files are not getting released

2017-05-04 Thread Siraj Haider
Hi all, We recently switched to Lucene 6.5 from 2.9 and we have an issue that the files in index directory are not getting released after the IndexWriter finishes up writing a batch of documents. We are using IndexFolder.listFiles().length to check the number of files in index folder. We have

Un-used index files are not getting released

2017-05-03 Thread Siraj Haider
Hi all, We recently switched to Lucene 6.5 from 2.9 and we have an issue that the files in index directory are not getting released after the IndexWriter finishes up writing a batch of documents. We are using IndexFolder.listFiles().length to check the number of files in index folder. We have

Re: Getting list of committed documents

2016-11-13 Thread lukes
Thanks Mike. Yeah, i saw the changelist you mentioned. Unfortunately i can't upgrade to 6.2 because of stack limitations :( . Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-list-of-committed-documents-tp4305258p4305728.html Sent from the Lucene -

Re: Getting list of committed documents

2016-11-13 Thread Michael McCandless
> Thanks a lot. > > Regards. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Getting-list-of-committed-documents-tp4305258p4305644.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > ---

Re: Getting list of committed documents

2016-11-12 Thread lukes
it count progressive over time, or the number of documents which made it into that commit only? Once you point me, I can look into it for more details. Thanks a lot. Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-list-of-committed-documents

Re: Getting list of committed documents

2016-11-11 Thread Michael McCandless
, and all other ops did not make it. You should be able to use this information to e.g. tell the channel (e.g. a kafka queue) which offset your Lucene app has "durably" consumed. Mike McCandless http://blog.mikemccandless.com On Wed, Nov 9, 2016 at 2:40 PM, lukes wrote: > Hi all, &

Re: Getting list of committed documents

2016-11-10 Thread lukes
Hi, Can anyone please suggest something or point me in some direction? Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-list-of-committed-documents-tp4305258p4305503.html Sent from the Lucene - Java Users mailing list archive at Nabble.com

Getting list of committed documents

2016-11-09 Thread lukes
Hi all, I need some feedback on getting hold of documents which got committed during a commit call on the IndexWriter. There are multiple threads which keep adding documents to the IndexWriter in parallel, and there's another thread which wakes up every n minutes and does the commit.
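
A minimal sketch of the commit-data approach Mike describes above, assuming the setLiveCommitData API added in Lucene 6.2; the key name and offset variable are hypothetical:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.IndexWriter;

    class DurableCommit {
      // Records which input offset this commit durably covers, so a
      // consumer (e.g. a queue reader) can resume from it after a crash.
      static void commitUpTo(IndexWriter writer, long offset) throws IOException {
        Map<String, String> commitData = new HashMap<>();
        commitData.put("input.offset", Long.toString(offset));
        writer.setLiveCommitData(commitData.entrySet());
        writer.commit(); // all documents added before this call are durable
      }
      // On restart, read it back via DirectoryReader.getIndexCommit().getUserData().
    }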

RE: Lucene IndexSearcher PrefixQuery search getting really slow after a while

2016-11-04 Thread Nir Barel
Try to optimize your indexes. Sent securely from my iPhone From: Jason Wu Sent: Thursday, 3 November 2016 at 22:21:55 To: java-user@lucene.apache.org Subject: Lucene IndexSearcher PrefixQuery search getting really slow after a while Hi Team, We are using lucene 4.8.1 to do some info searches

Lucene IndexSearcher PrefixQuery search getting really slow after a while

2016-11-03 Thread Jason Wu
Hi Team, We are using lucene 4.8.1 to do some info searches every day for years. However, recently we encountered some performance issues which greatly slow down the Lucene search. After the application has been running for a while, we are facing the below issues, with IndexSearcher PrefixQuery taking much lon

Re: Lucene indexes getting deleted after application restart

2016-07-06 Thread Michael McCandless
Call IW.commit on a periodic basis, e.g. every N (!= 1) docs, or every M bytes or something? Mike McCandless http://blog.mikemccandless.com On Wed, Jul 6, 2016 at 1:57 PM, Desteny Child wrote: > Hi! > > In my Spring/Lucene application I'm using Lucene IndexWriter, > TrackingIndexWriter, Search
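
A minimal sketch of that suggestion; the batch size of 1000 is arbitrary:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    class PeriodicCommit {
      static void indexAll(IndexWriter writer, Iterable<Document> docs) throws IOException {
        int count = 0;
        for (Document doc : docs) {
          writer.addDocument(doc);
          if (++count % 1000 == 0) {
            writer.commit(); // fsyncs files and writes a new segments_N
          }
        }
        writer.commit(); // cover the final partial batch
      }
    }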

Lucene indexes getting deleted after application restart

2016-07-06 Thread Desteny Child
Hi! In my Spring/Lucene application I'm using Lucene IndexWriter, TrackingIndexWriter, SearcherManager and ControlledRealTimeReopenThread. I use open mode IndexWriterConfig.OpenMode.CREATE_OR_APPEND. Right now I'm trying to index thousands of documents. For this purpose I have added Apache

Re: LockFactory issue observed in lucene while getting instance of indexWriter

2016-07-01 Thread Michael McCandless
Mike McCandless http://blog.mikemccandless.com On Wed, Jun 29, 2016 at 8:10 AM, Mukul Ranjan wrote: > Hi Mike, > > > > I observed the issue from some time and observed that there may issue with > the MultiSearcher that I’m using for searcher over multiple index folders. > > &

RE: LockFactory issue observed in lucene while getting instance of indexWriter

2016-06-16 Thread Mukul Ranjan
Hi Mike, Yes, we are getting indexReader instance from the active Directory. We are using MultiReader to obtain instance of indexSearcher. Thanks, Mukul From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Friday, June 17, 2016 12:56 AM To: Mukul Ranjan Cc: Lucene Users Subject

Re: LockFactory issue observed in lucene while getting instance of indexWriter

2016-06-16 Thread Michael McCandless
g IndexWriter > after adding document to it. > > We are not getting this issue always but it’s frequency is high in our > application. Can you please provide your suggestion? > > > > Thanks, > > Mukul > > > > *From:* Michael McCandless [mailto:luc...@mikem

RE: LockFactory issue observed in lucene while getting instance of indexWriter

2016-06-16 Thread Mukul Ranjan
Hi Michael, Thanks for your reply. I’m running it on Windows. I have checked my code; I’m closing the IndexWriter after adding documents to it. We are not always getting this issue, but its frequency is high in our application. Can you please provide your suggestion? Thanks, Mukul From: Michael

Re: LockFactory issue observed in lucene while getting instance of indexWriter

2016-06-16 Thread Michael McCandless
Mukul Ranjan wrote: > Hi, > > I'm observing below exception while getting instance of indexWriter- > > java.lang.IllegalArgumentException: Directory MMapDirectory@"directoryName" > lockFactory=org.apache.lucene.store.NativeFSLockFactory@1ec79746 still > has

Re: LockFactory issue observed in lucene while getting instance of indexWriter

2016-06-16 Thread Ian Lea
Sounds to me like it's related to the index not having been closed properly or still being updated or something. I'd worry about that. -- Ian. On Thu, Jun 16, 2016 at 11:19 AM, Mukul Ranjan wrote: > Hi, > > I'm observing below exception while getting

LockFactory issue observed in lucene while getting instance of indexWriter

2016-06-16 Thread Mukul Ranjan
Hi, I'm observing the below exception while getting an instance of IndexWriter: java.lang.IllegalArgumentException: Directory MMapDirectory@"directoryName" lockFactory=org.apache.lucene.store.NativeFSLockFactory@1ec79746 still has pending deleted files; cannot initialize IndexWriter I

Getting exception while initializing FSDirectory

2016-06-15 Thread Mukul Ranjan
Hi, I'm getting the below exception while initializing FSDirectory: Caused by: java.lang.IllegalAccessError: tried to access method org.apache.lucene.store.MMapDirectory.unmapHackImpl()Ljava/lang/Object; from class org.apache.lucene.store.MMapDirectory$$dtt &

Getting DocValues by Doc ID

2016-04-20 Thread Aravinth T
I'm trying to retrieve the DocValues for a field by giving only the doc ID. The previous approach I had was getting the AtomicReader and doc ID set, then iterating through the doc values; it seems a bit slower. Is there any quicker way in Lucene?

Getting an Exception while searching when (numHits = Large Number) in TopScoreDocCollector

2016-03-01 Thread Gimantha Bandara
I know that I am getting this exception because the priority queue allocates more memory than my PC can allocate from RAM. ERROR {org.wso2.carbon.analytics.dataservice.core.indexing.AnalyticsDataIndexer} - Error in index search: null java.lang.NegativeArraySizeException at

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
e >- >Uwe Schindler >H.-H.-Meier-Allee 63, D-28213 Bremen >http://www.thetaphi.de >eMail: u...@thetaphi.de > > >> -Original Message- >> From: Wayne Xin [mailto:wayne_...@hotmail.com] >> Sent: Friday, August 14, 2015 8:44 PM >

RE: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Uwe Schindler
; Sent: Friday, August 14, 2015 8:44 PM > To: java-user@lucene.apache.org > Subject: Re: getting full english word from tokenizing with > SmartChineseAnalyzer > > Thanks Michael. That works well. Not sure why SmartChineseAnalyzer is > final, otherwise we could overwrite createCompone

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
th Lucene Analyzer. I would like to get the full English >>tokens >> from SmartChineseAnalyzer. But I’m only getting stems. The following >>code >> has predefined the sentence in "testStr": >> String testStr = "女单方面,王适娴second seed和头号种子卫冕冠军西班牙选手马 >> 林firs

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Michael Mastroianni
, 2015 at 11:20 AM, Wayne Xin wrote: > Hi, > > > > I am new with Lucene Analyzer. I would like to get the full English tokens > from SmartChineseAnalyzer. But I’m only getting stems. The following code > has predefined the sentence in "testStr": > String testStr =

getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
Hi, I am new to Lucene analyzers. I would like to get the full English tokens from SmartChineseAnalyzer, but I’m only getting stems. The following code has predefined the sentence in "testStr": String testStr = "女单方面,王适娴second seed和头号种子卫冕冠军西班牙选手马 林first seed同处1/4区,3号种子李雪芮和韩国选手Ko
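
A minimal sketch, assuming a Lucene 5.x-era API, of dumping the tokens SmartChineseAnalyzer actually emits, which makes the reported English stemming visible; the field name and sample text are hypothetical:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class TokenDump {
      public static void main(String[] args) throws Exception {
        Analyzer analyzer = new SmartChineseAnalyzer();
        try (TokenStream ts = analyzer.tokenStream("f", "second seed and first seed")) {
          CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
          ts.reset();
          while (ts.incrementToken()) {
            System.out.println(term.toString()); // English words may come out stemmed
          }
          ts.end();
        }
      }
    }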

problem in getting data with spanTermQuery Lucene 5.2

2015-08-03 Thread Behnam Khoshsafar
Problem in getting data with SpanTermQuery, Lucene 5.2: I need an example of SpanTermQuery and getSpans in Lucene 5.2. Thanks.

Getting cosine similarity of any given two Lucene 5.1 Documents using latest APIs

2015-07-11 Thread Nitish Nitish
Hi All, Greetings, Just started with Lucene 5.1 a month ago for my research. I have a set of documents indexed with the term frequencies option enabled during indexing. For any two given documents, I would like to calculate their tf-idf cosine similarity. Could you please point me to the right direc

Re: Getting a proper ID value into every document

2015-06-05 Thread Chris Hostetter
: If you cannot do this for whatever reason, I vaguely remember someone : posting a link to a program they'd put together to do this for a : docValues field, you'd have to search the archives to find it. It was Toke - he generated DocValues for an existing index by writing an IndexReader Filter

Re: Getting a proper ID value into every document

2015-06-05 Thread Erick Erickson
My first recommendation, of course, would be to re-index the corpus with a new field. If possible, frankly, that would probably be less effort than trying to hack in an ID after the fact as well as not as error-prone. If you cannot do this for whatever reason, I vaguely remember someone posting a

Getting a proper ID value into every document

2015-06-04 Thread Trejkaz
Hi all. We had been going for the longest time abusing Lucene's doc IDs as our own IDs and of course all our filters still work like this. But at the moment, we're looking at ways to break our dependencies on this. One of the motivators for this is the outright removal of FieldCache in Lucene 5.

RE: getting exception in lucene 4.0

2015-04-30 Thread Uwe Schindler
http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Rajendra Rao [mailto:rajendra@launchship.com] > Sent: Thursday, April 30, 2015 3:52 PM > To: java-user@lucene.apache.org > Subject: getting exception in lucene 4.0 > > i am getting below exception

getting exception in lucene 4.0

2015-04-30 Thread Rajendra Rao
I am getting the below exception while using Apache Lucene in a web service with Tomcat and Axis2. This exception does not occur in a standalone application. java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene42' does not exist. You need

Getting the doc values grouped by Facets

2015-03-26 Thread Gimantha Bandara
Hi, I have some lucene documents indexed. They contain some facet fields. I wrote a drill-down query, and by using getTopChildren I can get the facet labels and the value/count. I am wondering if it is possible to get the doc values of the documents under each facet, so I can list the documents

Re: Getting most occurring words in lucene

2015-02-22 Thread Michael McCandless
: > Hi, > > I am trying to get the top occurring words by building a memory index using > lucene using the code below but I am not getting the desired results. The > text contains 'freedom' three times but it gives only 1. Where am I > committing a mistake. Is

Getting most occurring words in lucene

2015-02-22 Thread Maisnam Ns
Hi, I am trying to get the top occurring words by building a memory index using lucene with the code below, but I am not getting the desired results. The text contains 'freedom' three times but it gives only 1. Where am I making a mistake? Is there a way out? Please help. RAMDir
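
A likely confusion in cases like this is reading docFreq (the number of documents containing a term) where totalTermFreq (the number of occurrences) was intended; a minimal sketch under a Lucene 5.x-era API, with a hypothetical field name:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiFields;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;

    class TermCounts {
      static void dump(IndexReader reader) throws IOException {
        Terms terms = MultiFields.getTerms(reader, "content");
        if (terms == null) return;
        TermsEnum te = terms.iterator();
        while (te.next() != null) {
          // For one doc containing "freedom" 3 times: docFreq=1, totalTermFreq=3.
          System.out.println(te.term().utf8ToString()
              + " docFreq=" + te.docFreq()
              + " totalTermFreq=" + te.totalTermFreq());
        }
      }
    }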

Re: Indexing an IntField but getting StoredField from found Document

2015-02-19 Thread Clemens Wyss DEV
a dynamic framework ;) -Original Message- From: Ian Lea [mailto:ian@gmail.com] Sent: Thursday, 19 February 2015 16:16 To: java-user@lucene.apache.org Subject: Re: Indexing an IntField but getting StoredField from found Document I think if you follow the Field.fieldType().num

Re: Indexing an IntField but getting StoredField from found Document

2015-02-19 Thread Ian Lea
I think if you follow the Field.fieldType().numericType() chain you'll end up with INT or DOUBLE or whatever. But if you know you stored it as an IntField then surely you already know it's an integer? Unless you sometimes store different things in the one field. I wouldn't do that. -- Ian. O

Indexing an IntField but getting StoredField from found Document

2015-02-19 Thread Clemens Wyss DEV
When I index a Document with an IntField and then find that very Document, the former IntField is returned as a StoredField. How do I determine the "original" field type (IntField, LongField, DoubleField ...)? Must I? Number number = Field.numericValue(); if( number != null ) { if( number instanc
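
A minimal sketch completing the instanceof dispatch this question starts from; the field name is hypothetical:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexableField;

    class NumericTypeCheck {
      static void inspect(Document doc) {
        IndexableField f = doc.getField("price");
        Number n = (f == null) ? null : f.numericValue();
        if (n instanceof Integer) {
          System.out.println("int field: " + n.intValue());       // was an IntField
        } else if (n instanceof Long) {
          System.out.println("long field: " + n.longValue());     // LongField
        } else if (n instanceof Double) {
          System.out.println("double field: " + n.doubleValue()); // DoubleField
        }
      }
    }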

Re: getting number of terms in a document/field

2015-02-08 Thread Ahmet Arslan
Hi, Sorry for my ignorance: how do I obtain an AtomicReader from an IndexReader? I figured out the above code, but it gives me a list of atomic readers. for (AtomicReaderContext context : reader.leaves()) { NumericDocValues docValues = context.reader().getNormValues(field); if (docValues != null) normValu

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless
On Fri, Feb 6, 2015 at 8:51 AM, Ahmet Arslan wrote: > Hi Michael, > > Thanks for the explanation. I am working with a TREC dataset, > since it is static, I set size of that array experimentally. > > I followed the DefaultSimilarity#lengthNorm method a bit. > > If default similarity and no index ti

Re: getting number of terms in a document/field

2015-02-06 Thread Ahmet Arslan
Hi Michael, Thanks for the explanation. I am working with a TREC dataset; since it is static, I set the size of that array experimentally. I followed the DefaultSimilarity#lengthNorm method a bit. If DefaultSimilarity and no index-time boost is used, I assume that the norm equals 1.0 / Math.sqrt

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless
How will you know how large to allocate that array? The within-doc term freq can in general be arbitrarily large... Lucene does not directly store the total number of terms in a document, but it does store it approximately in the doc's norm value. Maybe you can use that? Alternatively, you can s
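
A minimal sketch of the norm-based approximation discussed in this thread, assuming Lucene 4.x with DefaultSimilarity and no index-time boosts (so the norm is approximately 1/sqrt(numTerms), encoded to one byte via SmallFloat):

    import java.io.IOException;
    import org.apache.lucene.index.AtomicReader;
    import org.apache.lucene.index.NumericDocValues;
    import org.apache.lucene.util.SmallFloat;

    class ApproxDocLength {
      static int approxNumTerms(AtomicReader reader, String field, int docId) throws IOException {
        NumericDocValues norms = reader.getNormValues(field);
        if (norms == null) return -1; // field was not indexed with norms
        // DefaultSimilarity encodes the norm with SmallFloat.floatToByte315,
        // so decode with the matching byte315ToFloat; the result is lossy
        // because the norm is quantized to a single byte.
        float norm = SmallFloat.byte315ToFloat((byte) norms.get(docId));
        return Math.round(1f / (norm * norm));
      }
    }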
