On Feb 25, 2011, at 5:57, Roman Chyla <roman.ch...@gmail.com> wrote:

Hi Andi,

Thanks, JArray_byte() does what I needed. I was (wrongly) passing a
bytestring (which I think got automatically converted to unicode), and
trying to get the bytes of that string back did not give me the original data.

Still, it would be interesting to find out whether it is possible to pass a
string and get its original bytes back in Java.

A Java String is not made of bytes but of 16-bit Unicode chars. The no-argument String.getBytes() encodes those chars with the platform's default charset, so its output depends on the platform's encoding (only the getBytes(int, int, byte[], int) overload is actually deprecated). Whenever a Python string (type str, made of bytes) is passed to Java, it is assumed to be UTF-8 encoded and is converted to 16-bit Unicode on the fly.

Andi..
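To make the UTF-8 point concrete, here is a minimal plain-Python 3 sketch (no JCC involved) of what a UTF-8 decode step does to compressed data. It assumes the 14 compressed bytes from Roman's session below; the helper names decode_as_utf8 and strict_decode_fails are hypothetical, and the sketch does not show how JCC itself reports invalid UTF-8.

```python
# Compressed bytes copied from the zlib.compress("python") session in this thread.
DATA = b"x\x9c+\xa8,\xc9\xc8\xcf\x03\x00\tW\x02\xa3"


def decode_as_utf8(raw: bytes, errors: str = "strict") -> str:
    """Hypothetical helper: treat raw bytes as UTF-8 text, as a str -> String bridge would."""
    return raw.decode("utf-8", errors=errors)


def strict_decode_fails(raw: bytes) -> bool:
    """Return True if raw is not valid UTF-8 (0x9c, for example, cannot start a character)."""
    try:
        decode_as_utf8(raw)
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    print(strict_decode_fails(DATA))  # True: compressed data is not valid UTF-8
    # A lenient decode substitutes U+FFFD for bad bytes, so re-encoding cannot restore DATA.
    lossy = decode_as_utf8(DATA, errors="replace").encode("utf-8")
    print(lossy == DATA)  # False
```

Either way the original bytes are lost, which is why binary data should travel as a byte array rather than as a string.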

I don't know what conversion is happening on the JNI side, or whether
it happens only in Java; I shall do some reading.

Example in Python:

In [4]: s = zlib.compress("python")

In [5]: repr(s)
Out[5]: "'x\\x9c+\\xa8,\\xc9\\xc8\\xcf\\x03\\x00\\tW\\x02\\xa3'"

In [6]: lucene.JArray_byte(s)
Out[6]: JArray<byte>(120, -100, 43, -88, 44, -55, -56, -49, 3, 0, 9, 87, 2, -93)
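The JArray<byte> values are the same 14 bytes shown by repr(s); they only look different because Java's byte type is signed (-128 to 127). A small plain-Python sketch of that mapping follows; the helper names as_java_bytes and from_java_bytes are hypothetical and not part of JCC.

```python
from typing import List

# Bytes from the IPython session above (repr of zlib.compress("python")).
DATA = b"x\x9c+\xa8,\xc9\xc8\xcf\x03\x00\tW\x02\xa3"


def as_java_bytes(raw: bytes) -> List[int]:
    """Map each unsigned byte (0..255) to Java's signed byte range (-128..127)."""
    return [b - 256 if b > 127 else b for b in raw]


def from_java_bytes(values: List[int]) -> bytes:
    """Inverse mapping: signed Java byte values back to Python bytes."""
    return bytes(v & 0xFF for v in values)


if __name__ == "__main__":
    print(as_java_bytes(DATA))
    # [120, -100, 43, -88, 44, -55, -56, -49, 3, 0, 9, 87, 2, -93]
```

Because the mapping is lossless in both directions, nothing is corrupted on the way into Java when a byte array is used.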

The same thing in Jython:

>>> import zlib
>>> from java.lang import String
>>> s = zlib.compress("python")
>>> s
'x\x9c+\xa8,\xc9\xc8\xcf\x03\x00\tW\x02\xa3'
>>> repr(s)
"'x\\x9c+\\xa8,\\xc9\\xc8\\xcf\\x03\\x00\\tW\\x02\\xa3'"
>>> String(s).getBytes()
array('b', [120, -62, -100, 43, -62, -88, 44, -61, -119, -61, -120,
-61, -113, 3, 0, 9, 87, 2, -62, -93])
>>> String(s).getBytes('utf8')
array('b', [120, -62, -100, 43, -62, -88, 44, -61, -119, -61, -120,
-61, -113, 3, 0, 9, 87, 2, -62, -93])
>>> String(s).getBytes('utf16')
array('b', [-2, -1, 0, 120, 0, -100, 0, 43, 0, -88, 0, 44, 0, -55, 0,
-56, 0, -49, 0, 3, 0, 0, 0, 9, 0, 87, 0, 2, 0, -93])
>>> String(s).getBytes('ascii')
array('b', [120, 63, 43, 63, 44, 63, 63, 63, 3, 0, 9, 87, 2, 63])
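The Jython numbers can be reproduced in plain Python 3 under one assumption: that Jython's str holds one character per byte (code points 0 to 255), so String(s) behaves like a Latin-1 decode. Encoding that text with UTF-8, with big-endian UTF-16 plus a byte-order mark, or with ASCII and '?' substitution yields the three arrays above, while encoding it with Latin-1 (ISO-8859-1) gives back the original bytes. That suggests String(s).getBytes('ISO-8859-1') would recover the data in Jython, but that call was not tried in this thread. The helper name as_java_bytes is hypothetical; this is a simulation, not Jython or JCC code.

```python
from typing import List

# Compressed bytes from the Jython session above.
DATA = b"x\x9c+\xa8,\xc9\xc8\xcf\x03\x00\tW\x02\xa3"


def as_java_bytes(raw: bytes) -> List[int]:
    """Show bytes the way Java/Jython print a byte[] (signed, -128..127)."""
    return [b - 256 if b > 127 else b for b in raw]


# Assumption: one char per byte, i.e. the Java String holds U+0000..U+00FF.
text = DATA.decode("latin-1")

utf8 = as_java_bytes(text.encode("utf-8"))
# Java's "UTF-16" output starts with a big-endian byte-order mark (FE FF).
utf16 = as_java_bytes(b"\xfe\xff" + text.encode("utf-16-be"))
# Unmappable characters become '?' (63), as in the ASCII result above.
ascii_ = as_java_bytes(text.encode("ascii", errors="replace"))
# Latin-1 maps every char back to its single original byte.
latin1 = text.encode("latin-1")

if __name__ == "__main__":
    print(utf8)
    print(utf16)
    print(ascii_)
    print(latin1 == DATA)  # True
```

The UTF-8 result is longer than the input because every byte above 0x7F becomes a two-byte sequence, which is exactly the -62/-61 prefixes visible in the Jython output.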




Roman

On Thu, Feb 24, 2011 at 3:42 AM, Andi Vajda <va...@apache.org> wrote:

On Thu, 24 Feb 2011, Roman Chyla wrote:

I would like to transfer results from Python to Java:

hello = zlib.compress("hello")

and on the Java side do:

byte[] data = string.getBytes();

But it does not work. Is there some translation going on somewhere?

Can you be more specific?
Actual lines of code, errors, expected results, actual results...

An array of bytes in JCC is not created from a plain string but with
JArray('byte')(len or str):

 >>> import lucene
 >>> lucene.initVM()
 <jcc.JCCEnv object at 0x1004100d8>
 >>> lucene.JArray('byte')(10)
 JArray<byte>(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
 >>> lucene.JArray('byte')("abcd")
 JArray<byte>(97, 98, 99, 100)
 >>>

Andi..
