Hi,

Thank you for the feedback. As discussed in the last meeting, here are the
changes that have been incorporated into this APE:
1. Names of the schema inference functions
2. Schema inference functionality

The changes are summarized as follows:

   1. query_schema (aggregate function that takes all records of the
   subquery and generates a JSON Schema)
   2. collection_schema (JSON Schema translation of the datatypes defined
   in the metadata node)
   3. current_schema (for columnar stores; converts the schema inferred
   for storage compaction to JSON Schema)
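
To make the idea behind query_schema concrete, here is a minimal Python
sketch of folding a set of records into one JSON Schema. It is only an
illustration under my own assumptions, not the AsterixDB implementation:
the helper names (infer_schema, merge_schemas, and the Python function
query_schema itself) are hypothetical, and the merge policy (anyOf for
conflicting types, "required" as the intersection of fields) is just one
possible choice.

```python
def infer_schema(value):
    """Return a JSON Schema fragment describing a single value."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        items = None
        for element in value:
            items = merge_schemas(items, infer_schema(element))
        schema = {"type": "array"}
        if items is not None:
            schema["items"] = items
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"unsupported value: {value!r}")


def _alternatives(schema):
    """Split a schema into its per-type alternatives."""
    return schema["anyOf"] if "anyOf" in schema else [schema]


def _merge_same_type(a, b):
    """Merge two schemas that share the same "type"."""
    if a["type"] == "object":
        props = dict(a["properties"])
        for key, sub in b["properties"].items():
            props[key] = merge_schemas(props.get(key), sub)
        # A field stays required only if every record seen so far has it.
        required = sorted(set(a["required"]) & set(b["required"]))
        return {"type": "object", "properties": props, "required": required}
    if a["type"] == "array":
        items = merge_schemas(a.get("items"), b.get("items"))
        merged = {"type": "array"}
        if items is not None:
            merged["items"] = items
        return merged
    return a  # identical scalar types need no further merging


def merge_schemas(a, b):
    """Combine two schemas; None stands for "nothing seen yet"."""
    if a is None:
        return b
    if b is None:
        return a
    merged = []
    for alt in _alternatives(a) + _alternatives(b):
        for i, existing in enumerate(merged):
            if existing["type"] == alt["type"]:
                merged[i] = _merge_same_type(existing, alt)
                break
        else:
            merged.append(alt)
    return merged[0] if len(merged) == 1 else {"anyOf": merged}


def query_schema(records):
    """Aggregate step: fold every record's schema into one JSON Schema."""
    schema = None
    for record in records:
        schema = merge_schemas(schema, infer_schema(record))
    return schema
```

For example, query_schema([{"id": 1, "name": "a"}, {"id": "2"}]) describes
"id" as either an integer or a string and keeps only "id" as required,
since "name" is missing from the second record.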


Regards
Calvin Dani


On Fri, Oct 4, 2024 at 10:28 AM Mike Carey <dtab...@gmail.com> wrote:

> Great feature!  I wasn't able to understand the query example(s),
> though...  Could those be cleaned up a little and clarified?
>
> Also, I think we might want two functions at the user level - one that
> takes an expression as input and reports its schema, and another that
> takes a dataset/collection name as input and reports its schema.  The
> first one would scan the results and say what the schema is; the other
> would use a more efficient approach (accessing and combining the
> metadata from the collection's most recent LSM components in each of its
> partitions).
>
> Cheers,
>
> Mike
>
> On 10/4/24 10:13 AM, Calvin Dani wrote:
> > Initiating the discussion thread proposing a new aggregate function in
> > AsterixDB.
> > *Feature:* aggregate function to infer schema
> > *Details:* This feature introduces schema inference as an SQL++ function
> > directly integrated into AsterixDB. It is the first approach to offer
> > schema inference as a native SQL++ function, allowing users to infer
> > schemas for not only any dataset but also for queries and subqueries. Its
> > output in JSON Schema, the industry standard, produces both human and
> > machine-readable results, suitable for user interpretation or integration
> > into other queries or programs.
> >
> > Utilizing the template of array_avg() in the Built-in Function and
> > Function collection file the array_schema() was implemented. During self
> > review, a lot of defined aggregate functions, for example
> > SerializableAvgAggregateFunction and IntermediateAvgAggregateFunction,
> > are not being utilised during array_schema() query. Is it due to
> > different use cases or am I utilising it incorrectly?
> >
> > Are there any resources to understand the functionality of aggregate
> > functions in the implementation?
> >
> > *APE*
> > https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+8%3A+Schema+Inference+Aggregate+Functions
> >
