Re: [Vote][Format] C data interface format string for ListView and LargeListView arrays

Felipe Oliveira Carvalho Thu, 05 Oct 2023 14:40:26 -0700

> Union format strings share enough properties that having them in the
> same switch case doesn't result in additional complexity...lists and
> list views are completely different types (for the purposes of parsing
> the format string).


Dense and sparse union differ a bit more than list and list-view.

Not starting with `+l` for list-views would be a deviation from this
pattern started by unions.

+------------------------+---------------------------------------------------+------------+
| ``+ud:I,J,...``        | dense union with type ids I,J...
 |            |
+------------------------+---------------------------------------------------+------------+
| ``+us:I,J,...``        | sparse union with type ids I,J...
  |            |
+------------------------+---------------------------------------------------+------------+

Is sharing prefixes an issue?

To make this more concrete, these are the parser changes for supporting
`+lv` and `+Lv` as I proposed in the beginning:

@@ -1097,9 +1101,9 @@ struct SchemaImporter {
     RETURN_NOT_OK(f_parser_.CheckHasNext());
     switch (f_parser_.Next()) {
       case 'l':
-        return ProcessListLike<ListType>();
+        return ProcessVarLengthList<false>();
       case 'L':
-        return ProcessListLike<LargeListType>();
+        return ProcessVarLengthList<true>();
       case 'w':
         return ProcessFixedSizeList();
       case 's':
@@ -1195,12 +1199,30 @@ struct SchemaImporter {
     return CheckNoChildren(type);
   }

-  template <typename ListType>
-  Status ProcessListLike() {
-    RETURN_NOT_OK(f_parser_.CheckAtEnd());
-    RETURN_NOT_OK(CheckNumChildren(1));
-    ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0));
-    type_ = std::make_shared<ListType>(field);
+  template <bool is_large_variation>
+  Status ProcessVarLengthList() {
+    if (f_parser_.AtEnd()) {
+      RETURN_NOT_OK(CheckNumChildren(1));
+      ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0));
+      if constexpr (is_large_variation) {
+        type_ = large_list(field);
+      } else {
+        type_ = list(field);
+      }
+    } else {
+      if (f_parser_.Next() == 'v') {
+        RETURN_NOT_OK(CheckNumChildren(1));
+        ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0));
+        if constexpr (is_large_variation) {
+          type_ = large_list_view(field);
+        } else {
+          type_ = list_view(field);
+        }
+      } else {
+        return f_parser_.Invalid();
+      }
+    }
+
     return Status::OK();
   }

--
Felipe


On Thu, Oct 5, 2023 at 5:26 PM Antoine Pitrou <anto...@python.org> wrote:

>
> I don't think the parsing will be a problem even in C. It's not like you
> have to backtrack anyway.
>
> +1 from me on Felipe's proposal.
>
> Regards
>
> Antoine.
>
>
> Le 05/10/2023 à 20:33, Felipe Oliveira Carvalho a écrit :
> > This mailing list thread is going to be the discussion.
> >
> > The union types also use two characters, so I didn’t think it would be a
> > problem.
> >
> > —
> > Felipe
> >
> > On Thu, 5 Oct 2023 at 15:26 Dewey Dunnington
> <de...@voltrondata.com.invalid>
> > wrote:
> >
> >> I'm sorry for missing earlier discussion on this or a PR into the
> >> format where this discussion may have occurred...is there a reason
> >> that +lv and +Lv were chosen over a single-character version (i.e.,
> >> maybe +v and +V)? A single-character version is (slightly) easier to
> >> parse in C.
> >>
> >> On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho
> >> <felipe...@gmail.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I'm writing to propose "+lv" and "+Lv" as format strings for list-view
> >> and
> >>> large list-view arrays passing through the Arrow C data interface [1].
> >>>
> >>> The vote will be open for at least 72 hours.
> >>>
> >>> [ ] +1 - I'm in favor of this new C Data Format string
> >>> [ ] +0
> >>> [ ] -1 - I'm against adding this new format string because....
> >>>
> >>> Thanks everyone!
> >>>
> >>> --
> >>> Felipe
> >>>
> >>> [1] https://arrow.apache.org/docs/format/CDataInterface.html
> >>
> >
>

Re: [Vote][Format] C data interface format string for ListView and LargeListView arrays

Reply via email to