Thanks Wes for your help. Based on some code reading, I managed to put together a basic working example. The code is here: https://github.com/animeshtrivedi/ArrowExample/tree/master/src/main/java/com/github/animeshtrivedi/arrowexample .
However, I do have some questions about the concepts in Arrow:

1. ArrowBlock is the unit of reading/writing. One ArrowBlock is essentially the amount of data one must hold in memory at a time. Is my understanding correct?

2. There are Base[Reader/Writer] interfaces as well as Mutator/Accessor classes in the ValueVector interface; both are implemented by all supported data types. What is the relationship between these two, and when is one supposed to use one over the other? I only use the Mutator/Accessor classes in my code.

3. What are the "safe" variant functions in the Mutator code? I could not understand what they are meant to achieve.

4. What are MinorTypes?

5. For a writer, what is a dictionary provider? For example, in the Integration.java code, the reader is given as the dictionary provider for the writer. But is it something more than just:

    DictionaryProvider.MapDictionaryProvider provider = new DictionaryProvider.MapDictionaryProvider();
    ArrowFileWriter arrowWriter = new ArrowFileWriter(root, provider, fileOutputStream.getChannel());

6. I am not sure about the sequence of calls one needs to make to write through the mutators. For example, if I code something like

    NullableIntVector intVector = (NullableIntVector) fieldVector;
    NullableIntVector.Mutator mutator = intVector.getMutator();
    [.write num values]
    mutator.setValueCount(num);

then this works for primitive types, but not for the VarBinary type. There I have to set the capacity first:

    NullableVarBinaryVector varBinaryVector = (NullableVarBinaryVector) fieldVector;
    varBinaryVector.setInitialCapacity(items);
    varBinaryVector.allocateNew();
    NullableVarBinaryVector.Mutator mutator = varBinaryVector.getMutator();

Examples of these are here: https://github.com/animeshtrivedi/ArrowExample/blob/master/src/main/java/com/github/animeshtrivedi/arrowexample/ArrowWrite.java (writeField[???] functions).
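One possible reading of the difference in question 6, sketched in plain Java with no Arrow dependency: a variable-width column keeps an offsets buffer plus a data buffer, so a write that does no capacity check fails unless both buffers were allocated beforehand, while a capacity-checking write grows them on demand. This is only a toy model under that assumption. The class ToyVarBinaryVector and all of its methods are hypothetical names chosen for illustration; they are not Arrow classes and do not reproduce Arrow's actual implementation.

```java
import java.util.Arrays;

/** Toy model of a variable-width column. Hypothetical; NOT Arrow's API. */
public class ToyVarBinaryVector {
    // Value i occupies data[offsets[i] .. offsets[i + 1]).
    private int[] offsets = new int[1];
    private byte[] data = new byte[0];
    private int valueCount = 0;

    /** Reserve room for `items` values of about `bytesPerItem` bytes each. */
    public void allocate(int items, int bytesPerItem) {
        offsets = new int[items + 1];
        data = new byte[items * bytesPerItem];
        valueCount = 0;
    }

    /**
     * Unchecked write. Assumes values are written in index order and that
     * the buffers are already large enough; otherwise it throws an
     * IndexOutOfBoundsException.
     */
    public void set(int index, byte[] value) {
        int start = offsets[index];
        System.arraycopy(value, 0, data, start, value.length);
        offsets[index + 1] = start + value.length;
    }

    /** Checked write: grows both buffers if needed, then delegates to set. */
    public void setSafe(int index, byte[] value) {
        if (index + 1 >= offsets.length) {
            offsets = Arrays.copyOf(offsets, Math.max(index + 2, offsets.length * 2));
        }
        int needed = offsets[index] + value.length;
        if (needed > data.length) {
            data = Arrays.copyOf(data, Math.max(needed, data.length * 2));
        }
        set(index, value);
    }

    /** Record how many values have been written. */
    public void setValueCount(int count) {
        valueCount = count;
    }

    public int getValueCount() {
        return valueCount;
    }

    /** Copy out value `index` using its two surrounding offsets. */
    public byte[] get(int index) {
        return Arrays.copyOfRange(data, offsets[index], offsets[index + 1]);
    }
}
```

In this toy model, calling set on a freshly constructed vector fails because nothing has been allocated, whereas either calling allocate first or using setSafe makes the same sequence of writes succeed. Whether this matches what Arrow's Mutator does is exactly what questions 3 and 6 are asking.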
Thank you very much,
--
Animesh

On Thu, Dec 14, 2017 at 6:15 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi Animesh,
>
> I suggest you try the ArrowStreamReader/Writer or
> ArrowFileReader/Writer classes. See
> https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
> for example working code for this
>
> - Wes
>
> On Thu, Dec 14, 2017 at 8:30 AM, Animesh Trivedi
> <animesh.triv...@gmail.com> wrote:
> > Hi all,
> >
> > It might be a trivial question, so please let me know if I am missing
> > something.
> >
> > I am trying to write and read files in the Arrow format in Java. My data is
> > simple flat schema with primitive types. I already have the data in Java.
> > So my questions are:
> > 1. Is this possible or am I fundamentally missing something what Arrow can
> > or cannot do (or is designed to do). I assume that an efficient in-memory
> > columnar data format should work with files too.
> > 2. Can you point me out to a working example? or a starting example.
> > Intuitively I am looking for a way to define schema, write/read column
> > vectors to/from files as one does with Parquet or ORC.
> >
> > I try to locate some working examples with ArrowFile[Reader/Writer] classes
> > in the maven tests but so far not sure where to start.
> >
> > Thanks,
> > --
> > Animesh