Store byte array in StoredField using zlib compression

Prashant Saxena Sat, 26 Oct 2024 03:45:10 -0700

PyLucene 10.0.0

I'm trying to store a long text by compressing it first with zlib:


*doc.add(StoredField("contents", zlib.compress(ftext.encode('utf-8'))))*

The resulting index size is *~83 MB*. When reading its value back using

*c = doc.getBinaryValue("contents")*

It returns None, and when using

*c = doc.get("contents")*

It returns a string that cannot be decompressed.

When using

*doc.add(StoredField("contents",
JArray('byte')(zlib.compress(ftext.encode('utf-8')))))*

The resulting index size is *~160 MB*. There is no problem getting its
value back using



*c = doc.getBinaryValue("contents")*
*cc = zlib.decompress(c.bytes.bytes_).decode('utf-8')*
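
For reference, the zlib round trip on its own, without Lucene involved, can be sketched as below. This is only a minimal check that the compressed payload is a plain Python bytes object and decompresses back to the original text. The helper names pack and unpack and the sample text are hypothetical stand-ins, not part of PyLucene.

```python
import zlib


def pack(text: str) -> bytes:
    """Encode text as UTF-8 and compress it with zlib (hypothetical helper)."""
    return zlib.compress(text.encode("utf-8"))


def unpack(blob: bytes) -> str:
    """Reverse pack(): decompress and decode back to str (hypothetical helper)."""
    return zlib.decompress(blob).decode("utf-8")


# Stand-in for the real document text; assumed to be long and repetitive.
ftext = "some long, repetitive text. " * 1000

blob = pack(ftext)

# The round trip should be lossless before the bytes ever reach the index.
assert unpack(blob) == ftext

# Compare raw and compressed sizes to see what is being handed to StoredField.
print(len(ftext.encode("utf-8")), "->", len(blob), "bytes")
```

If this check passes but reading from the index fails, the problem is in how the bytes are passed to or returned from StoredField rather than in zlib itself.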

*Question 1:* Why does the index size almost double when using JArray?
*Question 2:* How do you correctly create and store compressed binary data
in a StoredField?

I am using PyLucene in my current project. Please advise me if I should
post my questions on the java-user list instead of here.

Prashant
