> On Oct 26, 2024, at 14:50, Prashant Saxena <animator...@gmail.com> wrote:
> 
> I just need to store compressed strings to save space. If it can be done in
> any other way, I'm OK with that.

Wrapping the bytes in JArray('byte') is the way.

Andi..
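
A minimal sketch of the full round trip with the JArray('byte') wrapping. It assumes PyLucene exposes the public BytesRef fields bytes, offset and length as attributes; the thread itself only shows c.bytes.bytes_. The helper names compress_text, decompress_window and decompress_stored are hypothetical. Slicing by offset and length is a precaution: a BytesRef is a window onto a byte array that may be larger than the stored value, so handing the whole backing array to zlib is only safe when offset is 0 and length spans the entire array.

```python
import zlib


def compress_text(text: str) -> bytes:
    """UTF-8 encode and zlib-compress text before storing it."""
    return zlib.compress(text.encode("utf-8"))


def decompress_window(buf: bytes, offset: int, length: int) -> str:
    """Decompress only the [offset, offset + length) window of buf."""
    return zlib.decompress(buf[offset:offset + length]).decode("utf-8")


def decompress_stored(ref) -> str:
    """Decode the BytesRef returned by doc.getBinaryValue(...).

    Assumption: ref.bytes.bytes_ yields Python bytes (as used in the
    thread) and ref.offset / ref.length are readable as attributes.
    """
    if ref is None:
        # getBinaryValue returns None when no binary field has that name,
        # e.g. when the value was added without the JArray('byte') wrapper.
        raise ValueError("field has no binary value")
    return decompress_window(ref.bytes.bytes_, ref.offset, ref.length)
```

Usage, assuming lucene.initVM() has been called and JArray (from the lucene module) and StoredField (from org.apache.lucene.document) are imported: doc.add(StoredField("contents", JArray('byte')(compress_text(ftext)))) when indexing, then text = decompress_stored(doc.getBinaryValue("contents")) when reading the document back.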

> 
> 
>> On Sat, Oct 26, 2024 at 6:11 PM Andi Vajda <va...@apache.org> wrote:
>> 
>> 
>>> On Sat, 26 Oct 2024, Prashant Saxena wrote:
>>> 
>>> PyLucene 10.0.0
>>> 
>>> I'm trying to store a long text by compressing it first using zlib
>>> 
>>> doc.add(StoredField("contents", zlib.compress(ftext.encode('utf-8'))))
>>> 
>>> The resulting index size is ~83 MB. When reading its value back using
>>> 
>>> c = doc.getBinaryValue("contents")
>>> 
>>> it returns None, and when using
>>> 
>>> c = doc.get("contents")
>>> 
>>> it returns a string that cannot be decompressed.
>>> 
>>> When using
>>> 
>>> doc.add(StoredField("contents", JArray('byte')(zlib.compress(ftext.encode('utf-8')))))
>>> 
>>> The resulting index size is ~160 MB. There is no problem getting its
>>> value back using
>>> 
>>> c = doc.getBinaryValue("contents")
>>> cc = zlib.decompress(c.bytes.bytes_).decode('utf-8')
>>> 
>>> Question 1: Why does the index size almost double when using JArray?
>> 
>> Because the value you're passing is actually processed correctly?
>> 
>>> Question 2: How do you correctly create and store compressed binary data
>>> in a StoredField?
>> 
>> If you want a Python bytes object, like b'abcd', to be seen by Lucene (Java)
>> as a byte array, you should wrap it with JArray('byte') like you did.
>> Otherwise, it's seen as a string (I need to double-check) and not handled
>> correctly.
>> 
>>> I am using PyLucene in my current project. Please advise me if I should
>>> post my questions on the java-user list instead of here.
>> 
>> This particular question is specific to PyLucene and should be asked here,
>> like you did ;-)
>> 
>> Andi..
>> 
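
Why a compressed value cannot come back intact through doc.get() is easier to see in plain Python. The exact conversion PyLucene applies to an unwrapped bytes object is not confirmed in the thread (Andi still needs to double-check), so this sketch uses a UTF-8 decode with replacement as a stand-in for "treated as a string". The helper name text_detour is hypothetical. The point it shows is general: zlib output is arbitrary binary data, usually not valid UTF-8, so any lossy trip through text corrupts it.

```python
import zlib


def text_detour(data: bytes) -> bytes:
    """Push raw bytes through a str and back, as a text-only path might.

    Stand-in only: UTF-8 with replacement is one possible lossy
    conversion, not necessarily the one PyLucene performs.
    """
    return data.decode("utf-8", errors="replace").encode("utf-8")


compressed = zlib.compress("some long text ".encode("utf-8") * 100)
mangled = text_detour(compressed)

# Bytes that are not valid UTF-8 (including the zlib header byte 0x9c)
# are replaced with U+FFFD, so the stream no longer matches the original.
print(mangled == compressed)  # False

try:
    zlib.decompress(mangled)
except zlib.error as exc:
    # The corrupted header no longer passes zlib's header check.
    print("cannot decompress:", exc)
```

Plain ASCII would survive this detour unchanged, which is why the problem only shows up once the stored value is compressed.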
