Thanks, Wes, for your help.

Based on some code reading, I managed to code up a basic working example.
The code is here:
https://github.com/animeshtrivedi/ArrowExample/tree/master/src/main/java/com/github/animeshtrivedi/arrowexample

However, I do have some questions about the concepts in Arrow:

1. ArrowBlock is the unit of reading/writing. One ArrowBlock is essentially
the amount of data one must hold in memory at a time. Is my
understanding correct?

2. There are Base[Reader/Writer] interfaces as well as Mutator/Accessor
classes in the ValueVector interface. Both are implemented by all
supported data types. What is the relationship between these two? When
is one supposed to use one over the other? I only use the Mutator/Accessor
classes in my code.

3. What are the "safe" variant functions in the Mutator's code? I could not
understand what they are meant to achieve.

4. What are MinorTypes?

5. For a writer, what is a dictionary provider? For example in the
Integration.java code, the reader is given as the dictionary provider for
the writer. But is it anything more than just:
DictionaryProvider.MapDictionaryProvider provider = new DictionaryProvider.MapDictionaryProvider();
ArrowFileWriter arrowWriter = new ArrowFileWriter(root, provider, fileOutputStream.getChannel());

6. I am not sure about the sequence of calls one needs to make to
write through mutators. For example, if I code something like
NullableIntVector intVector = (NullableIntVector) fieldVector;
NullableIntVector.Mutator mutator = intVector.getMutator();
[.write num values]
mutator.setValueCount(num)
then this works for primitive types, but not for the VarBinary type. There I
have to set the capacity and allocate first:

NullableVarBinaryVector varBinaryVector = (NullableVarBinaryVector) fieldVector;
varBinaryVector.setInitialCapacity(items);
varBinaryVector.allocateNew();
NullableVarBinaryVector.Mutator mutator = varBinaryVector.getMutator();

Examples of these are here:
https://github.com/animeshtrivedi/ArrowExample/blob/master/src/main/java/com/github/animeshtrivedi/arrowexample/ArrowWrite.java
(writeField[???] functions).

Thank you very much,
--
Animesh



On Thu, Dec 14, 2017 at 6:15 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi Animesh,
>
> I suggest you try the ArrowStreamReader/Writer or
> ArrowFileReader/Writer classes. See
> https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
> for example working code for this
>
> - Wes
>
> On Thu, Dec 14, 2017 at 8:30 AM, Animesh Trivedi
> <animesh.triv...@gmail.com> wrote:
> > Hi all,
> >
> > It might be a trivial question, so please let me know if I am missing
> > something.
> >
> > I am trying to write and read files in the Arrow format in Java. My data
> > is simple flat schema with primitive types. I already have the data in
> > Java. So my questions are:
> > 1. Is this possible or am I fundamentally missing something what Arrow
> > can or cannot do (or is designed to do). I assume that an efficient
> > in-memory columnar data format should work with files too.
> > 2. Can you point me out to a working example? or a starting example.
> > Intuitively I am looking for a way to define schema, write/read column
> > vectors to/from files as one does with Parquet or ORC.
> >
> > I try to locate some working examples with ArrowFile[Reader/Writer]
> > classes in the maven tests but so far not sure where to start.
> >
> > Thanks,
> > --
> > Animesh
>
