Thanks a lot.

On Sat, Sep 1, 2018 at 12:08 AM Jacques Nadeau <jacq...@apache.org> wrote:
> Slight correction on code:
>
> int recordIndexToRead = ...
> ListVector lv = ...
> ArrowBuf offsetVector = lv.getOffsetBuffer();
> VarCharVector vc = (VarCharVector) lv.getDataVector();
> int listStart = offsetVector.getInt(recordIndexToRead * 4);
> int listEnd = offsetVector.getInt((recordIndexToRead + 1) * 4);
> NullableVarCharHolder nvh = new NullableVarCharHolder();
> for (int i = listStart; i < listEnd; i++) {
>   vc.get(i, nvh);
>   // do something with data.
> }
>
> On Fri, Aug 31, 2018 at 9:04 AM Jacques Nadeau <jacq...@apache.org> wrote:
>
>> Adding the Arrow dev list.
>>
>> Yes, VarCharVector.get(int index, NullableVarCharHolder holder) is a
>> cheaper method.
>>
>> You can get the offsets from the list vector and then use the holder to
>> retrieve pointers into the existing memory. That memory is off-heap, so
>> you'll have to do a copy if you want a byte array.
>>
>> Pseudo code:
>>
>> int recordIndexToRead = ...
>> ListVector lv = ...
>> ArrowBuf offsetVector = lv.getOffsetBuffer();
>> VarCharVector vc = lv.getDataVector();
>> int listStart = lv.offsetBuffer.getInt((recordIndexToRead ) * 4) ;
>> int listEnd = lv.offsetBuffer.getInt((recordIndexToRead + 1) * 4);
>> NullableVarCharHolder nvh = new NullableVarCharHolder();
>> for(int i = listStart; i < listEnd; i++){
>>   vc.get(i, nvh);
>>   // do something with data.
>> }
>>
>> On Fri, Aug 31, 2018 at 2:08 AM Xu,Wenjian <zero...@gmail.com> wrote:
>>
>>> Hi Jacques,
>>>
>>> I have a question about ListVector in the Arrow Java API. Thanks for your
>>> kind help.
>>>
>>> I would like to iterate through *array<string>* in SQL semantics.
>>>
>>> I understand that, in order to represent *array<string>* in Arrow
>>> format, I could use ListVector with VarCharVector as the inner list. My
>>> question is: how do I efficiently access all the elements (i.e., each
>>> byte[] as a string)?
>>>
>>> By checking the test code:
>>>
>>> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestListVector.java
>>>
>>> one option is to use ListVector.getObject(int index) to get each
>>> ArrayList<Text>, and then access each element in ArrayList<Text>. But this
>>> method is expensive because:
>>>
>>> 1) it calls VarCharVector.get(int index), which involves a memory copy
>>> 2) it calls Text.set(byte[]), which assembles the Text from the byte array.
>>>
>>> My goal is just to retrieve each byte[] and do some filtering. Is there
>>> a less expensive way to achieve this? For example,
>>> VarCharVector.get(int index, NullableVarCharHolder holder) seems to be a
>>> less expensive operation. But how do I use this method in my case?
>>>
>>> Thanks again.
>>>
>>> Best regards,
>>> Wenjian
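
For anyone finding this thread later, here is a rough, dependency-free sketch of the offset arithmetic discussed above. It does not use the Arrow API: StringListColumn, filterRow and the three buffers are hypothetical stand-ins for a ListVector over a VarCharVector, built on plain java.nio direct buffers, and validity (null) bitmaps are left out. The point it tries to show is that the list offsets give the element range for a row, the value offsets give the byte range for each element (what the holder's start/end would point at), and bytes are only copied for elements that pass the filter.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a list-of-strings column; not an Arrow class. */
final class StringListColumn {
    final ByteBuffer listOffsets;   // (rows + 1) int32 values: element range per row
    final ByteBuffer valueOffsets;  // (elements + 1) int32 values: byte range per string
    final ByteBuffer data;          // concatenated UTF-8 bytes of all strings

    StringListColumn(String[][] rows) {
        int elements = 0;
        int bytes = 0;
        for (String[] row : rows) {
            for (String s : row) {
                elements++;
                bytes += s.getBytes(StandardCharsets.UTF_8).length;
            }
        }
        // Direct (off-heap) buffers, little-endian, 4 bytes per offset.
        listOffsets = ByteBuffer.allocateDirect((rows.length + 1) * 4).order(ByteOrder.LITTLE_ENDIAN);
        valueOffsets = ByteBuffer.allocateDirect((elements + 1) * 4).order(ByteOrder.LITTLE_ENDIAN);
        data = ByteBuffer.allocateDirect(bytes);

        listOffsets.putInt(0);
        valueOffsets.putInt(0);
        int written = 0;
        for (String[] row : rows) {
            for (String s : row) {
                data.put(s.getBytes(StandardCharsets.UTF_8));
                valueOffsets.putInt(data.position()); // end offset of this string
                written++;
            }
            listOffsets.putInt(written); // end offset of this row's list
        }
    }

    /** Copies out only those elements of the given row whose first byte matches. */
    List<byte[]> filterRow(int row, byte firstByte) {
        int listStart = listOffsets.getInt(row * 4);
        int listEnd = listOffsets.getInt((row + 1) * 4);
        List<byte[]> matches = new ArrayList<>();
        for (int i = listStart; i < listEnd; i++) {
            int start = valueOffsets.getInt(i * 4);
            int end = valueOffsets.getInt((i + 1) * 4);
            // Inspect the bytes in place; no copy is made for rejected elements.
            if (end > start && data.get(start) == firstByte) {
                byte[] copy = new byte[end - start];
                ByteBuffer view = data.duplicate();
                view.position(start);
                view.get(copy);
                matches.add(copy);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        StringListColumn col = new StringListColumn(
            new String[][] {{"apple", "bob", "avocado"}, {}, {"cat"}});
        for (byte[] b : col.filterRow(0, (byte) 'a')) {
            System.out.println(new String(b, StandardCharsets.UTF_8));
        }
    }
}
```

With the real vectors, the inner two getInt calls would presumably be replaced by vc.get(i, nvh), and the filter would look at the bytes between the holder's start and end before deciding whether to copy.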