Hi Wes,

Thanks a lot for your help! I have been looking at that blog for the last
couple of days, but I haven't been able to achieve what I want :(
Do you know if there is any actual documentation, test cases, or some
code I can look at?
Anyway, this is what I have so far:
parquet::Int32Writer* int32_writer1 =
    static_cast<parquet::Int32Writer*>(rg_writer->NextColumn());
int32_t value = 1000;
int16_t definition_level = 2;  // element is present
int16_t repetition_level = 0;  // first element of the row's list
int32_writer1->WriteBatch(1, &definition_level, &repetition_level, &value);

int16_t rpl = 1;  // second element of the same list
int32_writer1->WriteBatch(1, &definition_level, &rpl, &value);
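
For reference, here is a minimal sketch of how the definition and repetition levels for the my_array column could be derived row by row. It assumes the schema quoted further down in this thread (required my_array group, repeated list group, optional element) and C++17. The names LeafLevels, Int32List and ComputeListLevels are hypothetical helpers for illustration, not part of parquet-cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Levels and values for one leaf column, in the order they would be
// handed to a column writer.
struct LeafLevels {
  std::vector<int16_t> def_levels;
  std::vector<int16_t> rep_levels;
  std::vector<int32_t> values;  // only non-null elements are stored
};

// One row of my_array: a list whose elements may be null.
using Int32List = std::vector<std::optional<int32_t>>;

// Hypothetical helper (not a parquet-cpp API). Assumed schema:
//   required group my_array (LIST) { repeated group list {
//     optional int32 element; } }
// so the maximum definition level is 2 and the maximum repetition level is 1.
LeafLevels ComputeListLevels(const std::vector<Int32List>& rows) {
  LeafLevels out;
  for (const Int32List& row : rows) {
    if (row.empty()) {
      // Empty list: the repeated "list" group has no entries, so the
      // definition level is 0. Repetition level 0 starts a new row.
      out.def_levels.push_back(0);
      out.rep_levels.push_back(0);
      continue;
    }
    for (std::size_t i = 0; i < row.size(); ++i) {
      // 0 starts a new row; 1 continues the list of the current row.
      out.rep_levels.push_back(i == 0 ? 0 : 1);
      if (row[i].has_value()) {
        out.def_levels.push_back(2);  // list entry and element both defined
        out.values.push_back(*row[i]);
      } else {
        out.def_levels.push_back(1);  // list entry defined, element is null
      }
    }
  }
  // Every level entry has both a definition and a repetition level.
  assert(out.def_levels.size() == out.rep_levels.size());
  return out;
}
```

For the single row [1000, 1000] this gives definition levels {2, 2} and repetition levels {0, 1}, matching the two WriteBatch calls above. The three vectors could presumably also be passed to one WriteBatch call, with the number of levels as the first argument.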

This works better (reading the file with the parquet reader no longer
yields NULL values), but I still can't read the resulting Parquet file from
Presto/Athena.
This is the final result I would like to get when querying from Presto/Athena:
id          my_array
1           array[1000, 1000]

What I currently get is
id          my_array
1

Regarding the parquet::arrow API, are there any docs that I can look at to
get started? Also, are there any performance penalties from using
parquet::arrow instead of the lower-level Parquet API?

2017-12-09 1:13 GMT+01:00 Wes McKinney <wesmck...@gmail.com>:

> Didn't realize this question was on the Arrow mailing list instead of
> the Parquet mailing list!
>
> You can make things much easier on yourself by putting your data in
> Arrow arrays and using the parquet::arrow APIs.
>
> If you want to write the data using the lower-level Parquet column
> writer API, you will have to be careful with the repetition/definition
> levels. In your case, I believe the values you write need to have
> definition level 2 (the repeated node and optional node both increment
> the definition level by 1).
>
> I find this blog helpful for this:
> https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet.html
> There is also the Google Dremel paper.
>
> - Wes
>
> On Fri, Dec 8, 2017 at 6:19 PM, Renato Marroquín Mogrovejo
> <renatoj.marroq...@gmail.com> wrote:
> > Thanks Wes! So I create it this way, but I still don't know how to
> > populate and
> >
> > auto element = PrimitiveNode::Make("element", Repetition::OPTIONAL,
> > Type::INT32);
> > auto list = GroupNode::Make("list", Repetition::REPEATED, {element});
> > auto my_array = GroupNode::Make("my_array", Repetition::REQUIRED, {list},
> > LogicalType::LIST);
> > fields.push_back(PrimitiveNode::Make("id", Repetition::REQUIRED,
> > Type::INT32, LogicalType::NONE));
> > fields.push_back(my_array);
> > auto my_schema = GroupNode::Make("schema", Repetition::REQUIRED, fields);
> >
> > I tried populating it this way:
> >
> >        parquet::Int32Writer* int32_writer1 =
> > static_cast<parquet::Int32Writer*>(rg_writer->NextColumn());
> >        for (int i = 0; i < NROWS_GROUP; i++) {
> >          int32_t value = i;
> >          int16_t definition_level = 1;
> >          int16_t repetition_level = 0;
> >          if ((i+1)%2 == 0) {
> >            repetition_level = 1;  // start of a new record
> >          }
> >          int32_writer1->WriteBatch(1, &definition_level, &repetition_level,
> >                                    &value);
> >       }
> >
> > That seems to work, but I can't use the generated file on Athena, and using
> > the parquet_reader from parquet_cpp returns NULLs on the elements. Is it
> > that I have to get a handle to the list element? Thanks again for the help!
> >
>
