Re: How to find RAM/disk usage of each vector field

2024-11-06 Thread Michael McCandless
On Tue, Nov 5, 2024 at 5:17 PM Adrien Grand wrote:
> Why is it important to break down per field as opposed to scaling based on the total volume of vector data?
It's really for internal planning purposes / service telemetry ... at Amazon product search team (where I also work w/ Tanmay -- hi Ta

Re: How to find RAM/disk usage of each vector field

2024-11-06 Thread Michael McCandless
On Tue, Nov 5, 2024 at 7:31 PM Patrick Zhai wrote:
> I wouldn't call this a good way, but as the last resort you can parse the metadata files yourself, as it is not so hard to parse (yet)
Yeah ... the Lucene codec itself knows precisely how much disk is used for each field, and indeed stores it

Re: How to find RAM/disk usage of each vector field

2024-11-05 Thread Patrick Zhai
I wouldn't call this a good way, but as the last resort you can parse the metadata files yourself, as they are not so hard to parse (yet). The logic is in Lucene99HnswVectorsFormat.java and Lucene99FlatVectorsFormat.java. The risk for sure is that whenever the format is changed the parsing logic will ne

Re: How to find RAM/disk usage of each vector field

2024-11-05 Thread Adrien Grand
I cannot think of good ways to do this. Why is it important to break down per field as opposed to scaling based on the total volume of vector data?
On Tue, Nov 5, 2024 at 10:58 PM Tanmay Goel wrote:
> Hi Rui
>
> Thanks for your response and the snippet that you shared is great but not exactly

Re: How to find RAM/disk usage of each vector field

2024-11-05 Thread Tanmay Goel
Hi Rui, thanks for your response. The snippet that you shared is great but not exactly what I was looking for. With this snippet we are able to find the total size of the .vec files, but I want to see inside the .vec files and try to compute a map of vector_field_name to the number of bytes on d
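For a rough idea of what such a per-field map could look like without reading the .vec files at all, here is a back-of-the-envelope sketch. It is not something proposed in the thread, and it rests on assumptions: vectors are stored unquantized and back to back, so each field costs roughly count x dimension x bytes per component. It ignores the HNSW graph, any quantized copies, and per-field metadata. The function name and the field names are hypothetical.

```python
def estimate_raw_vector_bytes(fields):
    """Estimate raw vector bytes per field.

    fields maps a (hypothetical) field name to a tuple of
    (num_vectors, dimension, bytes_per_component), for example
    4 bytes per component for float32 and 1 for byte vectors.
    This is an estimate only: it does not read the index, and it
    leaves out graph, quantization, and metadata overhead.
    """
    return {
        name: num_vectors * dimension * width
        for name, (num_vectors, dimension, width) in fields.items()
    }


# Hypothetical fields: 1000 float32 vectors of 128 dims, 500 byte vectors of 64 dims.
estimate = estimate_raw_vector_bytes({
    "title_embedding": (1000, 128, 4),
    "image_embedding": (500, 64, 1),
})
print(estimate)  # {'title_embedding': 512000, 'image_embedding': 32000}
```

Comparing the sum of these estimates with the measured total size of the .vec files would show how far the assumptions are off for a given index.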

Re: How to find RAM/disk usage of each vector field

2024-10-30 Thread Rui Wu
Hi Tanmay, are you bothered by the .vec files hidden within the compound files? If yes, I have a snippet that can sum up the .vec files inside and outside compound files.
https://gist.github.com/wurui90/28de20d46079108d7ae5ed181ba939d4
On Tue, Oct 29, 2024 at 12:08 PM Tanmay Goel wrote:
> Hi al
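For readers without access to the gist, here is a much simpler sketch of the "outside compound files" half of that idea. It is not Rui's snippet: it only sums files that sit directly in an index directory and cannot see .vec data packed inside compound files, which would need Lucene's own directory APIs. The function name and the directory layout are assumptions for illustration.

```python
from pathlib import Path


def total_bytes_by_extension(index_dir, extensions=(".vec",)):
    """Sum on-disk sizes of files directly inside index_dir, per extension.

    Only standalone files are counted. Data packed inside compound
    files is NOT included, so treat the result as a lower bound when
    compound files are in use.
    """
    totals = {ext: 0 for ext in extensions}
    for path in Path(index_dir).iterdir():
        if path.is_file() and path.suffix in totals:
            totals[path.suffix] += path.stat().st_size
    return totals
```

Calling `total_bytes_by_extension("/path/to/index")` returns a dict such as `{".vec": <bytes>}`; passing more extensions reports each one separately. As the later replies point out, this gives a total per file type, not a per-field breakdown.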

How to find RAM/disk usage of each vector field

2024-10-29 Thread Tanmay Goel
Hi all, I recently joined the Lucene team at Amazon and this is my first time working with Lucene, so any help will be appreciated. One of my first tasks is to *add a metric in production to track the RAM / disk usage of vector fields*. We want to use this metric to decide when to scale our deploym