hi,

These are great questions. Note that in 0.8.0 the Java API has changed with regard to some of your questions.
Could someone more experienced in the Java library create some JIRAs about creating user documentation to help people get started using the Java API (even if just some simple example programs to point people to)? I think this would help adoption and growth of the user base.

Thank you,
Wes

On Sat, Dec 16, 2017 at 12:17 PM, Animesh Trivedi <animesh.triv...@gmail.com> wrote:
> Thanks Wes for your help.
>
> Based upon some code reading, I managed to code up a basic working example.
> The code is here:
> https://github.com/animeshtrivedi/ArrowExample/tree/master/src/main/java/com/github/animeshtrivedi/arrowexample
> .
>
> However, I do have some questions about the concepts in Arrow:
>
> 1. ArrowBlock is the unit of reading/writing. One ArrowBlock essentially is
> the amount of data one must hold in memory at a time. Is my
> understanding correct?
>
> 2. There are Base[Reader/Writer] interfaces as well as Mutator/Accessor
> classes in the ValueVector interface - both are implemented by all
> supported data types. What is the relationship between these two? Or when
> is one supposed to use one over the other? I only use Mutator/Accessor
> classes in my code.
>
> 3. What are the "safe" variant functions in the Mutator's code? I could not
> understand what they are meant to achieve.
>
> 4. What are MinorTypes?
>
> 5. For a writer, what is a dictionary provider? For example, in the
> Integration.java code, the reader is given as the dictionary provider for
> the writer. But is it something more than just:
> DictionaryProvider.MapDictionaryProvider provider = new
> DictionaryProvider.MapDictionaryProvider();
> ArrowFileWriter arrowWriter = new ArrowFileWriter(root, provider,
> fileOutputStream.getChannel());
>
> 6. I am not entirely sure about the sequence of calls that one needs to
> make to write with mutators. For example, if I code something like
> NullableIntVector intVector = (NullableIntVector) fieldVector;
> NullableIntVector.Mutator mutator = intVector.getMutator();
> [.write num values]
> mutator.setValueCount(num)
> then this works for primitive types, but not for the VarBinary type. There I
> have to set the capacity first,
>
> NullableVarBinaryVector varBinaryVector = (NullableVarBinaryVector)
> fieldVector;
> varBinaryVector.setInitialCapacity(items);
> varBinaryVector.allocateNew();
> NullableVarBinaryVector.Mutator mutator = varBinaryVector.getMutator();
>
> Examples of these are here:
> https://github.com/animeshtrivedi/ArrowExample/blob/master/src/main/java/com/github/animeshtrivedi/arrowexample/ArrowWrite.java
> (writeField[???] functions).
>
> Thank you very much,
> --
> Animesh
>
>
>
> On Thu, Dec 14, 2017 at 6:15 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
>> hi Animesh,
>>
>> I suggest you try the ArrowStreamReader/Writer or
>> ArrowFileReader/Writer classes. See
>> https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
>> for example working code for this
>>
>> - Wes
>>
>> On Thu, Dec 14, 2017 at 8:30 AM, Animesh Trivedi
>> <animesh.triv...@gmail.com> wrote:
>> > Hi all,
>> >
>> > It might be a trivial question, so please let me know if I am missing
>> > something.
>> >
>> > I am trying to write and read files in the Arrow format in Java. My data
>> > is a simple flat schema with primitive types. I already have the data in
>> > Java. So my questions are:
>> > 1. Is this possible, or am I fundamentally missing something about what
>> > Arrow can or cannot do (or is designed to do)? I assume that an efficient
>> > in-memory columnar data format should work with files too.
>> > 2. Can you point me to a working example, or a starting example?
>> > Intuitively I am looking for a way to define a schema and write/read
>> > column vectors to/from files, as one does with Parquet or ORC.
>> >
>> > I tried to locate some working examples with ArrowFile[Reader/Writer]
>> > classes in the maven tests, but so far I am not sure where to start.
>> >
>> > Thanks,
>> > --
>> > Animesh
>>
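
On question 3 in the quoted message (the "safe" variants in the Mutator code): as far as I understand it, the plain set methods assume the underlying buffer is already large enough, while the setSafe variants check the capacity first and grow the buffer if needed. The sketch below only illustrates that idea with a toy class. ToyIntVector and SafeSetSketch are hypothetical names made up for this note; they are not part of Arrow, and the doubling growth policy is an assumption of the sketch, not a statement about Arrow's allocator.

```java
import java.util.Arrays;

// Demo entry point; prints what happens with a plain set vs. a "safe" set.
class SafeSetSketch {
    public static void main(String[] args) {
        ToyIntVector v = new ToyIntVector();
        v.allocateNew(2);           // room for two values
        v.set(0, 10);
        v.set(1, 20);
        try {
            v.set(2, 30);           // index 2 is past the allocated capacity
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("plain set failed past capacity");
        }
        v.setSafe(2, 30);           // grows the buffer, then writes
        v.setValueCount(3);
        System.out.println(v.getValueCount() + " values, capacity " + v.getValueCapacity());
    }
}

// Hypothetical toy stand-in for a fixed-width int vector. NOT the Arrow API.
final class ToyIntVector {
    private int[] data = new int[0];
    private int valueCount = 0;

    // Reserve room for `capacity` values up front.
    public void allocateNew(int capacity) {
        data = new int[capacity];
        valueCount = 0;
    }

    public int getValueCapacity() {
        return data.length;
    }

    // Plain set: assumes index < capacity and throws if that is not true.
    public void set(int index, int value) {
        data[index] = value;
    }

    // "Safe" set: doubles the buffer until the index fits, then writes.
    public void setSafe(int index, int value) {
        while (index >= data.length) {
            data = Arrays.copyOf(data, Math.max(1, data.length * 2));
        }
        data[index] = value;
    }

    // Record how many values have been written; call once writing is done.
    public void setValueCount(int count) {
        valueCount = count;
    }

    public int getValueCount() {
        return valueCount;
    }

    public int get(int index) {
        return data[index];
    }
}
```

In this toy model the call sequence is allocate, write with set or setSafe, then setValueCount. If the plain set is used, the capacity has to be reserved beforehand, which is presumably why the VarBinary case in question 6 needed setInitialCapacity and allocateNew first.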