> Union format strings share enough properties that having them in the > same switch case doesn't result in additional complexity...lists and > list views are completely different types (for the purposes of parsing > the format string).
Dense and sparse union differ a bit more than list and list-view. Not starting with `+l` for list-views would be a deviation from this pattern started by unions. +------------------------+---------------------------------------------------+------------+ | ``+ud:I,J,...`` | dense union with type ids I,J... | | +------------------------+---------------------------------------------------+------------+ | ``+us:I,J,...`` | sparse union with type ids I,J... | | +------------------------+---------------------------------------------------+------------+ Is sharing prefixes an issue? To make this more concrete, these are the parser changes for supporting `+lv` and `+Lv` as I proposed in the beginning: @@ -1097,9 +1101,9 @@ struct SchemaImporter { RETURN_NOT_OK(f_parser_.CheckHasNext()); switch (f_parser_.Next()) { case 'l': - return ProcessListLike<ListType>(); + return ProcessVarLengthList<false>(); case 'L': - return ProcessListLike<LargeListType>(); + return ProcessVarLengthList<true>(); case 'w': return ProcessFixedSizeList(); case 's': @@ -1195,12 +1199,30 @@ struct SchemaImporter { return CheckNoChildren(type); } - template <typename ListType> - Status ProcessListLike() { - RETURN_NOT_OK(f_parser_.CheckAtEnd()); - RETURN_NOT_OK(CheckNumChildren(1)); - ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0)); - type_ = std::make_shared<ListType>(field); + template <bool is_large_variation> + Status ProcessVarLengthList() { + if (f_parser_.AtEnd()) { + RETURN_NOT_OK(CheckNumChildren(1)); + ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0)); + if constexpr (is_large_variation) { + type_ = large_list(field); + } else { + type_ = list(field); + } + } else { + if (f_parser_.Next() == 'v') { + RETURN_NOT_OK(CheckNumChildren(1)); + ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0)); + if constexpr (is_large_variation) { + type_ = large_list_view(field); + } else { + type_ = list_view(field); + } + } else { + return f_parser_.Invalid(); + } + } + return Status::OK(); } -- Felipe On Thu, Oct 5, 2023 at 5:26 PM Antoine Pitrou <anto...@python.org> wrote: > > I don't think the parsing will be a problem even in C. It's not like you > have to backtrack anyway. > > +1 from me on Felipe's proposal. > > Regards > > Antoine. > > > Le 05/10/2023 à 20:33, Felipe Oliveira Carvalho a écrit : > > This mailing list thread is going to be the discussion. > > > > The union types also use two characters, so I didn’t think it would be a > > problem. > > > > — > > Felipe > > > > On Thu, 5 Oct 2023 at 15:26 Dewey Dunnington > <de...@voltrondata.com.invalid> > > wrote: > > > >> I'm sorry for missing earlier discussion on this or a PR into the > >> format where this discussion may have occurred...is there a reason > >> that +lv and +Lv were chosen over a single-character version (i.e., > >> maybe +v and +V)? A single-character version is (slightly) easier to > >> parse in C. > >> > >> On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho > >> <felipe...@gmail.com> wrote: > >>> > >>> Hello, > >>> > >>> I'm writing to propose "+lv" and "+Lv" as format strings for list-view > >> and > >>> large list-view arrays passing through the Arrow C data interface [1]. > >>> > >>> The vote will be open for at least 72 hours. > >>> > >>> [ ] +1 - I'm in favor of this new C Data Format string > >>> [ ] +0 > >>> [ ] -1 - I'm against adding this new format string because.... > >>> > >>> Thanks everyone! > >>> > >>> -- > >>> Felipe > >>> > >>> [1] https://arrow.apache.org/docs/format/CDataInterface.html > >> > > >