Hi Wes,

Thanks a lot for your help! I have been looking at that blog for the last couple of days, but I haven't been able to achieve what I want :(
Do you know if there is any actual documentation, test cases or some code I can look at? Anyway, this is what I have so far:

parquet::Int32Writer* int32_writer1 = static_cast<parquet::Int32Writer*>(rg_writer->NextColumn());
int32_t value = 1;
value = 1000;
int16_t definition_level = 2;
int16_t repetition_level = 0;
int32_writer1->WriteBatch(1, &definition_level, &repetition_level, &value);
int16_t rpl = 1;
int32_writer1->WriteBatch(1, &definition_level, &rpl, &value);

This works better (the parquet reader no longer returns NULL values), but I still can't read the resulting Parquet file from Presto/Athena. The final result I would like to get when querying from Presto/Athena is:

id  my_array
1   array[1000, 1000]

What I currently get is:

id  my_array
1

Regarding the parquet::arrow API, are there any docs I can look at to get started? Also, are there any performance penalties when using parquet::arrow instead of the lower-level Parquet API?

2017-12-09 1:13 GMT+01:00 Wes McKinney <wesmck...@gmail.com>:

> Didn't realize this question was on the Arrow mailing list instead of
> the Parquet mailing list!
>
> You can make things much easier on yourself by putting your data in
> Arrow arrays and using the parquet::arrow APIs.
>
> If you want to write the data using the lower-level Parquet column
> writer API, you will have to be careful with the repetition/definition
> levels. In your case, I believe the values you write need to have
> definition level 2 (the repeated node and optional node both increment
> the definition level by 1).
>
> I find this blog helpful for this
> https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet.html.
> There is also the Google Dremel paper
>
> - Wes
>
> On Fri, Dec 8, 2017 at 6:19 PM, Renato Marroquín Mogrovejo
> <renatoj.marroq...@gmail.com> wrote:
> > Thanks Wes!
> > So I create it this way, but I still don't know how to populate
> > and
> >
> > auto element = PrimitiveNode::Make("element", Repetition::OPTIONAL, Type::INT32);
> > auto list = GroupNode::Make("list", Repetition::REPEATED, {element});
> > auto my_array = GroupNode::Make("my_array", Repetition::REQUIRED, {list}, LogicalType::LIST);
> > fields.push_back(PrimitiveNode::Make("id", Repetition::REQUIRED, Type::INT32, LogicalType::NONE));
> > fields.push_back(my_array);
> > auto my_schema = GroupNode::Make("schema", Repetition::REQUIRED, fields);
> >
> > I tried populating it this way:
> >
> > parquet::Int32Writer* int32_writer1 =
> >     static_cast<parquet::Int32Writer*>(rg_writer->NextColumn());
> > for (int i = 0; i < NROWS_GROUP; i++) {
> >   int32_t value = i;
> >   int16_t definition_level = 1;
> >   int16_t repetition_level = 0;
> >   if ((i+1)%2 == 0) {
> >     repetition_level = 1; // start of a new record
> >   }
> >   int32_writer1->WriteBatch(1, &definition_level, &repetition_level, &value);
> > }
> >
> > That seems to work, but I can't use the generated file on Athena, and using
> > the parquet_reader from parquet_cpp returns NULLs on the elements. Is it
> > that I have to get a handle to the list element? Thanks again for the help!
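
For reference, the level rule Wes describes (the repeated node and the optional node each add 1 to the definition level) can be sketched without any Parquet dependency. This is only an illustration for the my_array.list.element column of the schema quoted above, not parquet-cpp code: Element, Shredded and ShredLists are hypothetical names made up for this sketch. It assumes the usual Dremel convention that repetition level 0 starts a new row and 1 continues the current list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one slot of a list, either a value or a null.
struct Element {
  bool is_null;
  int32_t value;
};

// Hypothetical container for the levels and values of one column.
struct Shredded {
  std::vector<int16_t> def_levels;
  std::vector<int16_t> rep_levels;
  std::vector<int32_t> values;  // non-null elements only
};

// Shred rows for the column path my_array.list.element, where my_array is
// REQUIRED, list is REPEATED and element is OPTIONAL. That gives
// max definition level = 2 and max repetition level = 1.
Shredded ShredLists(const std::vector<std::vector<Element>>& rows) {
  Shredded out;
  for (const auto& row : rows) {
    if (row.empty()) {
      // Empty list: no list entry is defined, but the row still needs
      // one (definition, repetition) pair.
      out.def_levels.push_back(0);
      out.rep_levels.push_back(0);
      continue;
    }
    for (std::size_t i = 0; i < row.size(); ++i) {
      // Repetition level 0 starts a new row; 1 continues the same list.
      out.rep_levels.push_back(i == 0 ? 0 : 1);
      if (row[i].is_null) {
        out.def_levels.push_back(1);  // list entry exists, element is null
      } else {
        out.def_levels.push_back(2);  // element is present
        out.values.push_back(row[i].value);
      }
    }
  }
  return out;
}
```

For a single row holding [1000, 1000], this yields definition levels 2, 2 and repetition levels 0, 1, which matches the two WriteBatch calls in the reply at the top of the thread. The three vectors could presumably be handed to the column writer in one WriteBatch call (number of levels, then pointers to the definition levels, repetition levels and values), but that part is an assumption and is not shown here.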