Re: Loading WFST to Memory Mapped File in Lucene

2022-12-27 Thread Dawid Weiss
if I'm using the MMapDirectory. The data > is on heap. > > For my use case, it's a huge waste of memory :( 90% of my data could be > correctly organised and kept in disk. > > Thanks for the support > > Best regards > Marcos Rebelo > > On Tue, 27 Dec 2022, 09:11 Dawi

Re: Loading WFST to Memory Mapped File in Lucene

2022-12-27 Thread marcos rebelo
I have the same impression, even when I'm using the MMapDirectory: the data is on heap. For my use case, it's a huge waste of memory :( 90% of my data could be correctly organised and kept on disk. Thanks for the support Best regards Marcos Rebelo On Tue, 27 Dec 2022, 09:11 Dawid Wei

Re: Loading WFST to Memory Mapped File in Lucene

2022-12-27 Thread Dawid Weiss
but I don't think there is an API in WFSTCompletionLookup that would allow you to do that. D. On Fri, Dec 23, 2022 at 5:00 PM marcos rebelo wrote: > Hey all! > > I'm loading multiple WFST with ~1.1 Gb and the JVM memory increases > proportionally. Looks like the file i

Loading WFST to Memory Mapped File in Lucene

2022-12-23 Thread marcos rebelo
Hey all! I'm loading multiple WFSTs with ~1.1 GB and the JVM memory increases proportionally. It looks like the file is stored in memory, meaning it is not using memory-mapped files at all. Example code: In the following code we set up Lucene to use /tmp/deleteme2 for the memory mapped file a

Warming up index files via cat to make it in memory index

2021-03-25 Thread baris . kazar
Hi,-  This new thread is the continuation of previous thread back in Feb 2021: Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) May i mention that i cat'ed *fdt files (largest index files among 98 index files generated) by directing to new files so that these

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread Baris Kazar
So, just cat will do this. Thanks From: Robert Muir Sent: Tuesday, February 23, 2021 4:45 PM To: Baris Kazar Cc: java-user Subject: Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) The preload isn't magical. It only "reads in
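The "just cat" warm-up discussed in this thread can be sketched in plain Java. This is a minimal illustration under assumptions, not Lucene code: `PageCacheWarmer` and `warm` are hypothetical names, and the only effect relied on is that reading a file gives the operating system a chance to keep its pages in the filesystem cache. As the thread notes, nothing guarantees the pages stay cached.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene: reads a file from start to end
// and discards the bytes, much like `cat file > /dev/null`.
final class PageCacheWarmer {

    /** Reads every byte of the file once and returns how many bytes were read. */
    static long warm(Path file) throws IOException {
        byte[] buffer = new byte[1 << 16]; // 64 KiB scratch buffer, reused for every read
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n; // the data itself is thrown away
            }
        }
        return total;
    }
}
```

Calling `PageCacheWarmer.warm(path)` on each large index file before the first query would play the same role as the shell command; the heap cost is only the 64 KiB scratch buffer, whatever the file size.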

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread Robert Muir
e url / site that i can look at for preload? > > Thanks for the explanations. This thread will be useful for many folks i > believe. > > Best regards > > > On 2/23/21 4:15 PM, Robert Muir wrote: > > > > On Tue, Feb 23, 2021 at 4:07 PM wrote: > >> What i w

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread baris . kazar
ve. Best regards On 2/23/21 4:15 PM, Robert Muir wrote: On Tue, Feb 23, 2021 at 4:07 PM <mailto:baris.ka...@oracle.com>> wrote: What i want to achieve: Problem statement: base case is disk based Lucene index with FSDirectory speedup case was supposed to be in memor

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread Robert Muir
On Tue, Feb 23, 2021 at 4:07 PM wrote: > What i want to achieve: Problem statement: > > base case is disk based Lucene index with FSDirectory > > speedup case was supposed to be in memory Lucene index with MMapDirectory > On 64-bit systems, FSDirectory just invokes MMapDirecto

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread baris . kazar
better in that, too: ie, cold start. What i want to achieve: Problem statement: base case is disk based Lucene index with FSDirectory speedup case was supposed to be in memory Lucene index with MMapDirectory Uwe mentioned tmpfs will help. i will try that next. I thought preload was not

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread baris . kazar
Lucene index with FSDirectory speedup case was supposed to be in memory Lucene index with MMapDirectory Uwe mentioned tmpfs will help. i will try that next. Thanks On 2/23/21 3:54 PM, Robert Muir wrote: speedup over what? You are probably already using MMapDirectory (it is the default). So I

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread Robert Muir
speedup over what? You are probably already using MMapDirectory (it is the default). So I don't know what you are trying to achieve, but giving lots of memory to your java process is not going to help. If you just want to prevent the first few queries to a fresh cold machine instance from

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread baris . kazar
/21 3:40 PM, Robert Muir wrote: Don't give gobs of memory to your java process, you will just make things slower. The kernel will cache your index files. On Tue, Feb 23, 2021 at 1:45 PM <mailto:baris.ka...@oracle.com>> wrote: Ok, but how is this MMapDirectory used then?

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread Robert Muir
Don't give gobs of memory to your java process, you will just make things slower. The kernel will cache your index files. On Tue, Feb 23, 2021 at 1:45 PM wrote: > Ok, but how is this MMapDirectory used then? > > Best regards > > > On 2/23/21 7:03 AM, Robert Muir wrote:

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread baris . kazar
, Robert Muir wrote: On Tue, Feb 23, 2021 at 2:30 AM <mailto:baris.ka...@oracle.com>> wrote: Hi,-   I tried MMapDirectory and i allocated as big as index size on my J2EE Container but Don't allocate java heap memory for the index, MMapDirectory does not use java heap memory!

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread baris . kazar
t Don't allocate java heap memory for the index, MMapDirectory does not use java heap memory!

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-23 Thread Robert Muir
On Tue, Feb 23, 2021 at 2:30 AM wrote: > Hi,- > > I tried MMapDirectory and i allocated as big as index size on my J2EE > Container but > > Don't allocate java heap memory for the index, MMapDirectory does not use java heap memory!
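The point that a memory-mapped index does not live on the Java heap can be illustrated with the JDK's own mapping API. This is a sketch under assumptions: `MmapNotHeap` and `mapReadOnly` are hypothetical names, and the code uses `java.nio` directly rather than Lucene's MMapDirectory.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not Lucene code.
final class MmapNotHeap {

    /**
     * Maps the whole file read-only. The returned buffer is a view of the
     * file's pages, not a copy in a heap array. A single mapping is limited
     * to Integer.MAX_VALUE bytes, so larger files need several mappings.
     */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```

For a buffer obtained this way, `hasArray()` returns false: there is no backing `byte[]` on the heap, so raising -Xmx does nothing for it. The pages are served by the operating system's filesystem cache.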

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2021-02-22 Thread baris . kazar
Hi,-  I tried MMapDirectory and i allocated as big as index size on my J2EE Container but it only gives me at most 25% speedup and even sometimes a small amount of slowdown. How can i effectively use Lucene indexes in memory? Best regards On 12/14/20 6:35 PM, baris.ka...@oracle.com

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread baris . kazar
regards On 12/14/20 5:52 PM, Robert Muir wrote: On Mon, Dec 14, 2020 at 1:59 PM Uwe Schindler wrote: Hi, as writer of the original blog post, here my comments: Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post to load everything into memory - but that does not guarantee

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Robert Muir
On Mon, Dec 14, 2020 at 1:59 PM Uwe Schindler wrote: > > Hi, > > as writer of the original blog post, here my comments: > > Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post > to load everything into memory - but that does not guarantee anything!

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread baris . kazar
: Hi, Thanks Uwe, i am not insisting on to load everything into memory but loading into memory might speed up and i would like to see how much speedup. but i have one more question and that is still not clear to me: "it is much better to open index, with MMAP directory" does t

RE: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Uwe Schindler
Hi, > Thanks Uwe, i am not insisting on to load everything into memory > > but loading into memory might speed up and i would like to see how much > speedup. > > > but i have one more question and that is still not clear to me: > > "it is much better to

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread baris . kazar
original blog post, here my comments: Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post to load everything into memory - but that does not guarantee anything! Still, I would not recommend using that function, because all it does is to just touch every page of the file, so

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread baris . kazar
Thanks Uwe, I am not insisting on loading everything into memory, but loading into memory might speed things up and I would like to see by how much. But I have one more question that is still not clear to me: "it is much better to open index, with MMAP directory" does this mea

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Jigar Shah
read-only index reducing the risk of downtime. On Mon, Dec 14, 2020 at 1:51 PM Uwe Schindler wrote: > Hi, > > as writer of the original blog post, here my comments: > > Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post > to load everything into memo

RE: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Uwe Schindler
Hi, as the writer of the original blog post, here are my comments: Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post to load everything into memory - but that does not guarantee anything! Still, I would not recommend using that function, because all it does is to just touch
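What "touching every page" of a mapped file means can be sketched as follows. This is an illustration under assumptions, not Lucene's implementation: `PagePreloader` is a hypothetical name and the 4096-byte page size is assumed (the real page size depends on the OS and hardware). The JDK offers `MappedByteBuffer.load()` as a best-effort way to do the same thing.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not Lucene code.
final class PagePreloader {
    // Assumed page size. The real value is OS- and hardware-dependent.
    static final int PAGE_SIZE = 4096;

    // Written to so the JIT cannot discard the reads below.
    static volatile byte sink;

    /** Maps the whole file read-only (files up to Integer.MAX_VALUE bytes). */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Reads one byte from each page of the mapping and returns the number of pages touched. */
    static int touchEveryPage(MappedByteBuffer buf) {
        int pages = 0;
        byte acc = 0;
        for (int pos = 0; pos < buf.limit(); pos += PAGE_SIZE) {
            acc ^= buf.get(pos); // an absolute read faults the page in if it is not resident
            pages++;
        }
        sink = acc;
        return pages;
    }
}
```

The loop only asks the kernel to bring each page in once. Whether those pages are still resident when a query arrives is up to the operating system, which is why the preload gives no guarantee.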

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread baris . kazar
GBfhVwhvr1QLg-A2u4Xd8QWD5FKapojFuxlIEAQY7H3KlnA2YBj41g$ > On Mon, Dec 14, 2020 at 11:27 AM wrote: Thanks Mike, appreciate the reply and the suggestions very much. And Your article link to concurrent search is amazing. Together with in memory and concurrent index (especially in read only mode) these will

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Jigar Shah
ry. <https://www.jamescoyle.net/how-to/943-create-a-ram-disk-in-linux> On Mon, Dec 14, 2020 at 11:27 AM wrote: > Thanks Mike, appreciate the reply and the suggestions very much. > > And Your article link to concurrent search is amazing. > > Together with in memory and concurrent inde

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread baris . kazar
Thanks Mike, appreciate the reply and the suggestions very much. And Your article link to concurrent search is amazing. Together with in memory and concurrent index (especially in read only mode) these will speed up Lucene queries very much. Happy Holidays Best regards On 12/14/20 10:12 AM

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Michael McCandless
use to load your index. Mike McCandless http://blog.mikemccandless.com On Sun, Dec 13, 2020 at 4:18 PM wrote: > Hi,- > > it would be nice to create a Lucene index in files and then effectively > load it into memory once (since i use in read-only mode). I am looking into > if th

MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-13 Thread baris . kazar
Hi,- it would be nice to create a Lucene index in files and then effectively load it into memory once (since i use in read-only mode). I am looking into if this is doable in Lucene. i wish there were an option to load whole Lucene index into memory: Both of below urls have links to the blog

Re: Need suggestions on implementing a custom query (offload R-tree filter to fully in-memory) on Lucene-8.3

2019-12-07 Thread Adrien Grand
l. which combines inverted index and R-tree distance > query(index data is fully loaded into memory), i use a bound box to do > filter and then use concise "contains" check to filter, so they are both > "distance query" (or i call it "point nearby query") > &

Re: Need suggestions on implementing a custom query (offload R-tree filter to fully in-memory) on Lucene-8.3

2019-12-04 Thread 小鱼儿
Hi, adrien As to my native impl. which combines inverted index and R-tree distance query(index data is fully loaded into memory), i use a bound box to do filter and then use concise "contains" check to filter, so they are both "distance query" (or i call it "poi

Re: Need suggestions on implementing a custom query (offload R-tree filter to fully in-memory) on Lucene-8.3

2019-12-04 Thread Adrien Grand
filter. > > The problem is, when i first build a native in-memory index which use a > simple BitSet as DocIDSet type and STRTree class from the famous JTS lib, i > get 20ms/1000qps perf metrics with 18,000 POIs on my laptop(Windows 7 x64, use > mmap codec). But when i use Luc

Need suggestions on implementing a custom query (offload R-tree filter to fully in-memory) on Lucene-8.3

2019-12-03 Thread 小鱼儿
Background: I need to implement document indexing and search for POIs (points of interest) in an LBS scenario. A POI has a name, an address, and a location (LatLonPoint), and I want to combine a text query with a geo-spatial 2D range filter. The problem is, when I first build a native in-memory index which

Re: Lucene coreClosedListeners memory issues

2019-06-04 Thread Adrien Grand
drien Grand > > <mailto:jpou...@gmail.com> wrote > > > > It looks like you are leaking readers. > > > > On Mon, Jun 3, 2019 at 9:46 AM alex stark > > <mailto:alex.st...@zoho.com.invalid> wrote: > > > > > >

Re: Lucene coreClosedListeners memory issues

2019-06-03 Thread alex stark
on, Jun 3, 2019 at 9:46 AM alex stark > <mailto:alex.st...@zoho.com.invalid> wrote: > > > > Hi experts, > > > > > > > > I recently have memory issues on Lucene. By checking heap dump, most of > > them are occupied by SegmentCoreReaders.coreC

Re: Lucene coreClosedListeners memory issues

2019-06-03 Thread Adrien Grand
wrote > > It looks like you are leaking readers. > > On Mon, Jun 3, 2019 at 9:46 AM alex stark wrote: > > > > Hi experts, > > > > > > > > I recently have memory issues on Lucene. By checking heap dump, most of > > them are occ

Re: Lucene coreClosedListeners memory issues

2019-06-03 Thread alex stark
e: > > Hi experts, > > > > I recently have memory issues on Lucene. By checking heap dump, most of them > are occupied by SegmentCoreReaders.coreClosedListeners which is about nearly > half of all. > > > > > > Dominator Tree==

Re: Lucene coreClosedListeners memory issues

2019-06-03 Thread Adrien Grand
It looks like you are leaking readers. On Mon, Jun 3, 2019 at 9:46 AM alex stark wrote: > > Hi experts, > > > > I recently have memory issues on Lucene. By checking heap dump, most of them > are occupied by SegmentCoreReaders.coreClosedListeners which is about

Lucene coreClosedListeners memory issues

2019-06-03 Thread alex stark
Hi experts, I recently have memory issues on Lucene. By checking heap dump, most of them are occupied by SegmentCoreReaders.coreClosedListeners which is about nearly half of all. Dominator Tree num retain size(bytes) percent percent(live) class Name

RE: Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-04-05 Thread thturk
Thank you for all the answers. The basic solution was to commit rarely, either with a scheduled task or manually, and to keep the heap size to a minimum so that GC runs often and in parallel to release memory. -- Sent from: http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html

RE: Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-04-04 Thread Uwe Schindler
Small correction: It's not fully true that the JVM "never" gives back memory to the operating system: The G1 collector can give back memory to the OS since the beginning, but it does this only on full GCs which it tries to prevent. But: The default collector as shipped with Jav

RE: Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-04-04 Thread Uwe Schindler
Hi, Thanks Adrien. With current JVM versions (Java 8 or Java 11), the garbage collector never gives back memory to the operating system, once it has allocated that. Due to now heavy usage of containers and similar techniques, there are efforts on the JVM front to change that: At least the G1

Re: Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-04-04 Thread Adrien Grand
I think what you are experiencing is just due to how the JVM works: it happily reserves memory from the operating system if it thinks it might need it, and then it's reluctant to give it back because it assumes that if it has needed so much memory in the past, it might need it again in the f
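The difference between heap that is in use and heap that the JVM has merely reserved can be read from inside the process. The sketch below is an illustration under assumptions (`HeapReport` is a hypothetical name); it uses the standard `java.lang.management` API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration.
final class HeapReport {

    /** Returns one snapshot of heap usage: used, committed and max, in bytes. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Formats the snapshot in MiB. A max of -1 means the limit is undefined. */
    static String describe() {
        MemoryUsage h = heap();
        return String.format("used=%d MiB, committed=%d MiB, max=%d MiB",
                h.getUsed() >> 20, h.getCommitted() >> 20, h.getMax() >> 20);
    }
}
```

An external tool such as Task Manager sees roughly the committed figure plus non-heap memory, not the used figure. A process can therefore look twice as large after a burst of allocation even when most of the committed heap is free again.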

Re: Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-04-03 Thread thturk
p size didn't decrease. Is it because it takes a while for Lucene to merge segments? After an hour memory usage decreased to around 3G; it was 3.5G after adding 15k new documents.

Re: Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-04-02 Thread Erick Erickson
GCViewer opened on the GC logs to get something more useful. Best, Erick > On Apr 2, 2019, at 4:50 AM, thturk wrote: > > I am watching via task manager. > Now i tired to handle this with hard coded way. I create new index and with > commit in small index cost low memory. but i dont

Re: Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-04-02 Thread thturk
I am watching via Task Manager. I have now tried to handle this in a hard-coded way: I create a new index, and committing to a small index costs little memory. But I don't think that is a good way to do this; it is getting harder to manage the indexes.

Re: Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-04-02 Thread Adrien Grand
How do you measure memory usage? On Mon, Apr 1, 2019 at 8:33 AM thturk wrote: > > Hello, > -For a while i am tring to figure out why ram usage incease x2 than before > after commit one single document. > > -Lucene Version 7.4.0 > -Writer Directory FSDirectory > -Reader

Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-03-31 Thread thturk
. -After commit i create new searcher for real-time data. also close older one. Closing old index decrease memory usage but not that much as app started. I know that i shouldn't commit that often but i am tring to test how it will react per each commit. Even commit per hour or day the memory u

RE: in memory lucene

2019-02-28 Thread wmartinusa
cluster. Another interesting search result link snippet was from eClubPrague. -Original Message- From: Jonathan Willis Sent: Tuesday, February 26, 2019 6:19 PM To: java-user@lucene.apache.org Subject: in memory lucene Hi, i'm looking into using Lucene 7.7.0 and noticed tha

RE: in memory lucene

2019-02-27 Thread Uwe Schindler
From: Jonathan Willis > Sent: Wednesday, February 27, 2019 12:19 AM > To: java-user@lucene.apache.org > Subject: in memory lucene > > Hi, i'm looking into using Lucene 7.7.0 and noticed that the RAMDirectory > has been deprecated because of inefficient synchronization issues and t

in memory lucene

2019-02-27 Thread Jonathan Willis
Hi, i'm looking into using Lucene 7.7.0 and noticed that the RAMDirectory has been deprecated because of inefficient synchronization issues and that we are encouraged to use MMapDirectory instead. I was hoping to use an in memory only directory and was wondering if that would be possible wi

Re: RamDirectory vs MemoryIndex vs MMapDirectory for In-Memory-Index

2018-09-25 Thread Matthias Müller
Thanks Dawid, glad I asked! Am Dienstag, den 25.09.2018, 10:46 +0200 schrieb Dawid Weiss: > Use MMapDirectory on a temporary location, Matthias. If you really > need in-memory indexes, a new Directory implementation is coming > (RAMDirectory will be deprecated, then removed), but the d

Re: RamDirectory vs MemoryIndex vs MMapDirectory for In-Memory-Index

2018-09-25 Thread Dawid Weiss
Use MMapDirectory on a temporary location, Matthias. If you really need in-memory indexes, a new Directory implementation is coming (RAMDirectory will be deprecated, then removed), but the difference compared to MMapDirectory is typically not worth the hassle. See this issue for more discussion

RamDirectory vs MemoryIndex vs MMapDirectory for In-Memory-Index

2018-09-25 Thread Matthias Müller
Hi, Lucene provides different storage options for in-memory indexes. I found three structures that would qualify for the task: * RamDirectory (which I currently use for prototyping, but wonder if it is the ideal choice for my task) * MemoryIndex, which claims to have better performance and

Re: Lucene 6.5.1 memory consumption on 64 bit linux System

2018-04-30 Thread ankur.168
Hi Adrien, I am hitting Lucene once per request and getting back one document. I have attached a sample I took some time back. Let me know if you need any other details.

Re: Lucene 6.5.1 memory consumption on 64 bit linux System

2018-04-26 Thread Adrien Grand
Hello, You didn't say how many hits you fetch per request. Would you have screenshots of a heap dump analysis to share? Le jeu. 26 avr. 2018 à 11:02, ankur.168 a écrit : > Hi Adrien and others, > > Any suggestions here? > > Thanks > Ankur Bansal > > > > -- > Sent from: > http://lucene.472066.n3

Re: Lucene 6.5.1 memory consumption on 64 bit linux System

2018-04-26 Thread ankur.168
Hi Adrien and others, Any suggestions here? Thanks Ankur Bansal

Re: Lucene 6.5.1 memory consumption on 64 bit linux System

2018-04-19 Thread ankur.168
Currently I am hitting Lucene only once per request. Yes, I have a single SearcherManager instance which I use to acquire/release per request. I have 2 indexes, hence 2 SearcherManagers for the respective IndexReaders.

Re: Lucene 6.5.1 memory consumption on 64 bit linux System

2018-04-19 Thread Adrien Grand
Sorry, I was asking about the number of hits that you ask for per request rather than the request rate. Regarding SearcherManager, I mostly wanted to make sure that you have a single SearcherManager instance that all threads acquire an IndexSearcher from, as opposed to e.g. having multiple
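The pattern asked about here (one shared manager, every acquire paired with a release) can be sketched without Lucene on the classpath. Everything below is a hypothetical stand-in: `CountedResource`, `SharedManager` and `AcquireReleaseDemo` are illustrative names and only mimic the acquire/release idea mentioned in the thread; they are not Lucene's classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a reference-counted reader; NOT Lucene's IndexReader.
final class CountedResource {
    private final AtomicInteger refCount = new AtomicInteger(1); // 1 = the manager's own reference

    void incRef() { refCount.incrementAndGet(); }

    void decRef() {
        if (refCount.decrementAndGet() < 0) {
            throw new IllegalStateException("released more often than acquired");
        }
    }

    int refCount() { return refCount.get(); }
}

// Hypothetical stand-in for a shared manager: one instance serves all threads.
final class SharedManager {
    private final CountedResource current = new CountedResource();

    CountedResource acquire() { current.incRef(); return current; }

    void release(CountedResource r) { r.decRef(); }

    /** Number of acquires that have not been released yet. */
    int outstanding() { return current.refCount() - 1; }
}

final class AcquireReleaseDemo {
    /** Simulates one request: acquire, do the work, always release in finally. */
    static int search(SharedManager manager) {
        CountedResource r = manager.acquire();
        try {
            return 1; // pretend the search returned one hit
        } finally {
            manager.release(r); // skipping this is how references leak
        }
    }
}
```

After any number of `search` calls `outstanding()` is back to 0. A single acquire without its release leaves it at 1 for good, and in a real system that keeps the underlying reader and everything it references on the heap.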

Re: Lucene 6.5.1 memory consumption on 64 bit linux System

2018-04-18 Thread ankur.168
The application currently gets 11-12 TPS on average, with a maximum of 20 TPS. Currently the SearcherManager is acquired and released per request/thread. What do you mean by sharing the SearcherManager among multiple threads? Can you please give more details on this and on how it can help me here?

Re: Lucene 6.5.1 memory consumption on 64 bit linux System

2018-04-18 Thread Adrien Grand
How many hits do you typically ask for? Do you share the SearcherManager object across threads? (you should if not) Le mer. 18 avr. 2018 à 14:34, ankur.168 a écrit : > ok, So have gone through few more searching. I have found that *MMap uses > only virtual memory not JVM allocated memory

Re: Lucene 6.5.1 memory consumption on 64 bit linux System

2018-04-18 Thread ankur.168
OK, so I have done some more searching. I have found that *MMap uses only virtual memory, not JVM-allocated memory*. But what about SearcherManager or ScoreDocs? In the heap dump I can see that most of the space is taken by either multiple ScoreDoc[] or SearcherManager objects. Can you guys help me

Lucene 6.5.1 memory consumption on 64 bit linux System

2018-04-16 Thread ankur.168
requests to the application.* When I took a look at heap dump, its either ScoreDoc or SearcherManager instances which is consuming most of the memory. Can you guys help me here to understand below things- 1. why this behaviour is happening? 2. How lucene manages memory in JVM? 3. How indexReader and

Re: Optimize FTS memory footprint

2017-12-12 Thread Michael McCandless
Try upgrading Elasticsearch -- it's up to the 6.0 release as of just a few weeks ago -- its (and Lucene's) memory usage has decreased over time. The _uid field in particular will always be costly, unfortunately. Since it's a primary key, every term will be unique, and the term index has

Re: Optimize FTS memory footprint

2017-12-12 Thread Michael McCandless
Comments below: On Tue, Nov 28, 2017 at 4:47 PM, elirev wrote: > Thanks Mike . > I did not find any clear way to know it its FST or Norm , or something > else ( unless i miss something ) the fact the FST is an in memory prefix > index lead me to think it using most of the he

Re: Optimize FTS memory footprint

2017-12-12 Thread Bingtao Yin
ld. The ramBytesUsed() method returns memory cost of the fst. 2017-12-12 1:05 GMT+08:00 elirev : > Hו yin > How do you determine the size being allocated for your _uid ? > > > > -- > Sent from: http://lucene.472066.n3.nab

Re: Optimize FTS memory footprint

2017-12-11 Thread elirev
Hi yin, how do you determine the size being allocated for your _uid?

Re: How to regulate native memory?

2017-12-04 Thread Dominique Bejean
Hi Uwe, When you are saying "MMap is NOT direct memory", I understand that we can consider that JVM can use (at least) these 3 types of memory: - Heap memory (controlled by Xmx and managed by GC) - Off-heap MMap (os cache) *which is not* Direct Memory and *is not* con

Re: Optimize FTS memory footprint

2017-12-02 Thread Bingtao Yin
Thanks, Mike. I'm facing a similar problem. I'm running an Elasticsearch 2.0 cluster and find that the FST of the _uid field takes a lot of memory. The _uid field, which is generated by Elasticsearch and not analyzed, also has high cardinality. Are there any ways to reduce memory cost for _uid fie

Re: Optimize FTS memory footprint

2017-11-28 Thread elirev
Thanks Mike. I did not find any clear way to know if it's FST or norms, or something else (unless I missed something). The fact that the FST is an in-memory prefix index led me to think it is using most of the heap. Our mapping is normal, with around 200 columns; one of the columns is nested

Re: Optimize FTS memory footprint

2017-11-20 Thread Michael McCandless
Are you sure it's FSTs using your heap? Do you have many index fields that have high cardinality? Or many suggesters? Mike McCandless http://blog.mikemccandless.com On Thu, Nov 16, 2017 at 5:03 PM, Eli Revach wrote: > Hi > I am using Elasticsearch 1.7.5 , our segment memory allocati

Optimize FTS memory footprint

2017-11-16 Thread Eli Revach
Hi, I am using Elasticsearch 1.7.5. Our segment memory allocation per node is very big; it seems to be related to FST. 1) Any way to reduce/optimize its size (I understand it's the index for the terms)? 2) Can index optimize help? 3) The fact that we used nested objects can dramatically the

Re: How to regulate native memory?

2017-08-31 Thread Erik Stephens
I did not know that mmap is not considered direct memory, so thanks for that. Now I can stop barking about why -XX:MaxDirectMemorySize isn't having any effect :) -- Erik On Wed, Aug 30, 2017 at 11:39 PM, Uwe Schindler wrote: > Hi, > > As a suggestion from my side: As a first

Re: How to regulate native memory?

2017-08-31 Thread Erik Stephens
stored in tmpfs (including /dev/shm, used for shared memory). The mmap'd, mlocked pages are stuck in the page cache. Dirty pages will for the most part swiftly be written out. Data in tmpfs will be swapped out if possible." That could've explained why processes are getting OO

RE: How to regulate native memory?

2017-08-30 Thread Uwe Schindler
using the maximum direct memory size, so I have the feeling something is using a lot direct memory and you want to limit that. MMap is NOT direct memory! MMap is also not taken into account by the OOM killer, because it's not owned by the process. To me it looks like the operating system
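The distinction drawn here between direct memory and mmap can be observed from inside the JVM, which keeps separate buffer pools for the two. The sketch below is an illustration under assumptions: `BufferPools` is a hypothetical name, and the pool names "direct" and "mapped" are the ones current HotSpot JDKs report.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration.
final class BufferPools {

    /**
     * Returns the bytes currently accounted to the named JVM buffer pool
     * ("direct" or "mapped" on HotSpot), or -1 if no such pool exists.
     */
    static long memoryUsed(String poolName) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }
}
```

`ByteBuffer.allocateDirect(...)` shows up in the "direct" pool, which is what -XX:MaxDirectMemorySize caps. `FileChannel.map(...)` shows up in the "mapped" pool, which that flag does not limit. Comparing `BufferPools.memoryUsed("direct")` with `BufferPools.memoryUsed("mapped")` in a running process is one way to see which kind of off-heap memory is actually growing.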

Re: How to regulate native memory?

2017-08-30 Thread Robert Muir
From the Lucene side, it only uses file mappings for reads and doesn't allocate any anonymous memory. The way Lucene uses cache for reads won't impact your OOM (http://www.linuxatemyram.com/play.html) At the end of the day you are running out of memory on the system either way, and

Re: How to regulate native memory?

2017-08-30 Thread Erik Stephens
ntify what I think is mostly lucene usage. Is that an accurate way to quantify that? It shows 51G with `-XX:MaxDirectMemorySize=15G`. The heap is 30G and the resident memory is reported as 82.5G. That makes a bit of sense: 30G + 51G + miscellaneous. `top` reports roughly 51G as shared whi

Re: How to regulate native memory?

2017-08-30 Thread Robert Muir
s old you may have to go through more trouble (summing up stuff from smaps or whatever) On Wed, Aug 30, 2017 at 9:58 PM, Erik Stephens wrote: > Our elasticsearch processes have been slowly consuming memory until a kernel > OOM kills it. Details are here: > > https://github.com/ela

How to regulate native memory?

2017-08-30 Thread Erik Stephens
Our elasticsearch processes have been slowly consuming memory until a kernel OOM kills it. Details are here: https://github.com/elastic/elasticsearch/issues/26269 <https://github.com/elastic/elasticsearch/issues/26269> To summarize: - Explicit GC is enabled - MaxDirectMemorySize

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-07-02 Thread David Smiley
of 50m. > > I am looking to reduce search/sort time to 10ms. I have 4g of RAM for the > java process which is more than sufficient. > > Any suggestions greatly appreciated. > > Thanks, > sc > > > > -- > View this message in context: > http://lucene.472066.

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-06-29 Thread sc
radius of 50m. I am looking to reduce search/sort time to 10ms. I have 4g of RAM for the java process which is more than sufficient. Any suggestions greatly appreciated. Thanks, sc -- View this message in context: http://lucene.472066.n3.nabble.com/Term-Dictionary-taking-up-lots-of-memory

Clarification on Multiple calls to Off heap Memory & on combining storedFields

2017-06-28 Thread aravinth thangasami
educe the number of calls to the off-heap memory. Correct me if I'm wrong. Please clarify the following questions: Is accessing off-heap memory costly? Is there any overhead in using readField multiple times? Thanks, Aravinth

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-06-14 Thread David Smiley
eap, although you have large indexes with many terms! You > > can easily run a query on a 100 Gig index with less than 4 gigs of heap. > > The memory used by Lucene is filesystem cache through MMapDirectory, so > you > > need lots of that free, not heap space. Too large heaps ar

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-06-13 Thread Tom Hirschfeld
something else?: > > - too large heap? Heaps greater than 31 gigs are bad by default. Lucene > needs only few heap, although you have large indexes with many terms! You > can easily run a query on a 100 Gig index with less than 4 gigs of heap. > The memory used by Lucene is filesystem

Re: Term Dictionary taking up lots of heap memory, looking for solutions, lucene 5.3.1

2017-06-06 Thread David Smiley
m super pleased with the performance. ~ David On Wed, May 17, 2017 at 10:59 PM Tom Hirschfeld wrote: > Hey! > > I am working on a lucene based service for reverse geocoding. We have a > large index with lots of unique terms (550 million) and it appears that > we're running in

Re: Memory footprint of individual indices at runtime

2017-06-05 Thread Adrien Grand
then cast the reader to a CodecReader). https://lucene.apache.org/core/6_5_1/core/org/apache/lucene/index/CodecReader.html#ramBytesUsed-- Le lun. 5 juin 2017 à 19:09, Florian Buetow a écrit : > Hi, > > > > I would like to know (or estimate) how much memory an opened index > consu

Memory footprint of individual indices at runtime

2017-06-05 Thread Florian Buetow
Hi, I would like to know (or estimate) how much memory an opened index consumes inside the JVM (heap) and outside the JVM (fs buffers?). To my understanding the amount of memory inside the JVM depends on performed searches and search results which might be cached by Lucene. However, I am not

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-05-18 Thread Uwe Schindler
have large indexes with many terms! You can easily run a query on a 100 Gig index with less than 4 gigs of heap. The memory used by Lucene is filesystem cache through MMapDirectory, so you need lots of that free, not heap space. Heaps that are too large are counterproductive. - Could it be that you t
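
The point in this snippet, that mapped index data lives in the filesystem cache rather than on the Java heap, can be illustrated with plain java.nio. This is only a minimal sketch and not Lucene's MMapDirectory code: the class name MmapSketch, the method sumViaMmap, and the temp-file contents are all made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene.
final class MmapSketch {

    // Writes `size` bytes (values i % 7) to a temp file, maps the file
    // read-only, and returns the sum of the bytes read through the mapping.
    static long sumViaMmap(int size) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".bin");
            file.toFile().deleteOnExit(); // best-effort cleanup at JVM exit

            byte[] data = new byte[size];
            for (int i = 0; i < size; i++) {
                data[i] = (byte) (i % 7);
            }
            Files.write(file, data);

            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                // The mapping is backed by the OS page cache; no byte[] copy
                // of the file contents is allocated on the Java heap here.
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                long sum = 0;
                while (mapped.hasRemaining()) {
                    sum += mapped.get();
                }
                return sum;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumViaMmap(14));
    }
}
```

Reads through the mapped buffer are served from pages the operating system loads on demand, which is why free RAM outside the heap matters more than heap size for this access pattern.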

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-05-18 Thread Michael McCandless
That sounds like a fun amount of terms! Note that Lucene does not load all terms into memory; only the "prefix trie", stored as an FST ( http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html), mapping term prefixes to on-disk blocks of terms. FSTs are very co
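
The idea of keeping only a small in-memory index that points at blocks of terms can be sketched as follows. This is a deliberately simplified stand-in, not Lucene's FST-based terms index: it uses a sorted map keyed by the first term of each block instead of a prefix trie, and in-memory lists play the role of on-disk blocks. The names BlockIndexSketch, indexEntries, and contains are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not Lucene's block tree or FST.
final class BlockIndexSketch {

    // Small "in-memory" index: first term of each block -> block number.
    private final TreeMap<String, Integer> firstTermToBlock = new TreeMap<>();

    // Stand-in for on-disk blocks of terms (kept in memory only for the sketch).
    private final List<List<String>> blocks = new ArrayList<>();

    // sortedTerms must already be sorted; blockSize must be >= 1.
    BlockIndexSketch(List<String> sortedTerms, int blockSize) {
        for (int start = 0; start < sortedTerms.size(); start += blockSize) {
            int end = Math.min(start + blockSize, sortedTerms.size());
            List<String> block = new ArrayList<>(sortedTerms.subList(start, end));
            firstTermToBlock.put(block.get(0), blocks.size());
            blocks.add(block);
        }
    }

    // Number of entries the index itself holds (one per block, not one per term).
    int indexEntries() {
        return firstTermToBlock.size();
    }

    // Finds the single block that could hold the term, then scans only that block.
    boolean contains(String term) {
        Map.Entry<String, Integer> entry = firstTermToBlock.floorEntry(term);
        if (entry == null) {
            return false; // term sorts before the first block
        }
        return blocks.get(entry.getValue()).contains(term);
    }
}
```

With five terms and a block size of two, the index holds three entries rather than five; the gap grows with the block size, which is the reason an index of this kind can stay small even when the number of terms is very large.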

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-05-17 Thread Adrien Grand
Is upgrading to Lucene 6 and using points rather than terms an option? Points typically have lower memory usage (see GeoPoint which is based on terms vs LatLonPoint which is based on points at http://people.apache.org/~mikemccand/geobench.html#reader-heap). On Thu, 18 May 2017 at 02:35, Tom

Term Dictionary taking up lots of heap memory, looking for solutions, lucene 5.3.1

2017-05-17 Thread Tom Hirschfeld
Hey! I am working on a Lucene-based service for reverse geocoding. We have a large index with lots of unique terms (550 million) and it appears that we're running into issues with memory on our leaf servers as the term dictionary for the entire index is being loaded into heap space. If we all

Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-05-17 Thread Tom Hirschfeld
Hey! I am working on a Lucene-based service for reverse geocoding. We have a large index with lots of unique terms (550 million) and it appears that we're running into issues with memory on our leaf servers as the term dictionary for the entire index is being loaded into heap space. If we all

large set of memory consumed by array init

2016-12-15 Thread Vincent Sevel
weekly indexes (around 15 indexes). I sort with Sort.INDEXORDER by default. When running a search with a high limit (e.g. 1 million), I sometimes ended up running out of memory because of the empty data structures that were initialized. I looked at the objects and in the CollectorManager. reduce

DocValues Field - Memory consumption

2016-11-28 Thread Chitra R
Hi, I would like to enable doc values on all fields that I need to sort or aggregate on. At search time, I sort on a single field. Is the information from the whole doc values files (.dvd & .dvm) loaded into memory, or only that particular field's information from the file(s

Re: Crazy increase of MultiPhraseQuery memory usage in Lucene 5 (compared with 3)

2016-10-05 Thread Trejkaz
Thought I would try some thread necromancy here, because nobody replied about this a year ago. Now we're on 5.4.1 and the numbers changed a bit again. Recording best times for each operation. Indexing: 5.723 s SpanQuery: 25.13 s MultiPhraseQuery: (waited 10 minutes and it hasn't compl

Re: Crazy increase of MultiPhraseQuery memory usage in Lucene 5 (compared with 3)

2015-08-23 Thread Trejkaz
I spent some time carving out a quick test of the bits that matter and put them up here: https://gist.github.com/trejkaz/a72b87277b1aec800c2e The tests index 1,000,000 docs with just one instance of the field/sub-field trick we're using, plus one unique value. So it's a bit of an artificial test,

Crazy increase of MultiPhraseQuery memory usage in Lucene 5 (compared with 3)

2015-08-23 Thread Trejkaz
ngs reader which is ultimately (unsurprisingly) being held by the MultiPhraseQuery. What I'm wondering is: - Why the increase in memory cost? - Is our performance hack of using MultiPhraseQuery over SpanQuery really warranted anymore? - Is there a better way to do this particular query? Also, just in case

Re: Re: memory cost in forceMerge(1)

2015-08-11 Thread Duke DAI
10 minute(5 minute???). The server is so common on hardware, 4G heap assigned. Best regards, Duke If not now, when? If not me, who? On Tue, Aug 11, 2015 at 7:00 PM, Phaneendra N wrote: > There could be other applications running on the machine with 24 GB memory? > Which would result
