On Tue, Nov 5, 2024 at 5:17 PM Adrien Grand wrote:
> Why is it important to break down per field as opposed to scaling based on
> the total volume of vector data?
>
It's really for internal planning purposes / service telemetry ... at
Amazon product search team (where I also work w/ Tanmay -- hi Ta
On Tue, Nov 5, 2024 at 7:31 PM Patrick Zhai wrote:
> I wouldn't call this a good way, but as the last resort you can parse the
> metadata files yourself, as it is not so hard to parse (yet)
Yeah ... the Lucene codec itself knows precisely how much disk is used for
each field, and indeed stores it
I wouldn't call this a good way, but as the last resort you can parse the
metadata files yourself, as it is not so hard to parse (yet). The logic
is in:
Lucene99HnswVectorsFormat.java
Lucene99FlatVectorsFormat.java
The risk for sure is that whenever the format is changed the parsing logic
will ne
I cannot think of good ways to do this. Why is it important to break down
per field as opposed to scaling based on the total volume of vector data?
On Tue, Nov 5, 2024 at 10:58 PM Tanmay Goel wrote:
> Hi Rui
>
> Thanks for your response and the snippet that you shared is great but not
> exactly
Hi Rui
Thanks for your response and the snippet that you shared is great but not
exactly what I was looking for. With this snippet we are able to find the
total size of the .vec files, but I want to see inside the .vec files and
try to compute a map of vector_field_name to the number of bytes on d
Hi Tanmay,
Are you bothered by the .vec files hidden within the compound files? If
yes, I have a snippet that can sum up the .vec files inside and outside
compound files.
https://gist.github.com/wurui90/28de20d46079108d7ae5ed181ba939d4
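For readers who only need the simpler half of that idea, here is a minimal
sketch (not the gist above, and not a Lucene API) that sums the standalone
.vec files in an index directory using only the Python standard library.
The function name standalone_vec_bytes is hypothetical. It assumes the index
is a plain directory on local disk, and it deliberately does NOT look inside
compound (.cfs) files, so it will undercount for segments that use them.

```python
from pathlib import Path


def standalone_vec_bytes(index_dir):
    """Return {file_name: size_in_bytes} for every standalone .vec file.

    Assumption: index_dir is a local Lucene index directory. Vector data
    packed inside compound (.cfs) files is not visible here and is skipped.
    """
    sizes = {}
    for path in Path(index_dir).iterdir():
        # Only regular files whose extension is exactly ".vec" are counted.
        if path.is_file() and path.suffix == ".vec":
            sizes[path.name] = path.stat().st_size
    return sizes


def total_vec_bytes(index_dir):
    """Total bytes across all standalone .vec files in index_dir."""
    return sum(standalone_vec_bytes(index_dir).values())
```

This gives a per-file total, not the per-field breakdown asked about later
in the thread; a .vec file may hold data for more than one vector field.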
On Tue, Oct 29, 2024 at 12:08 PM Tanmay Goel wrote:
> Hi al
Hi all
I recently joined the Lucene team at Amazon and this is my first time
working with Lucene so any help will be appreciated.
One of my first tasks is to *add a metric in production to track the RAM /
disk usage of vector fields*. We want to use this metric to decide when to
scale our deploym