Re: Schema Aggregate Function

Mike Carey Fri, 04 Oct 2024 10:28:41 -0700

Great feature! I wasn't able to understand the query example(s), though... Could those be cleaned up a little and clarified?

Also, I think we might want two functions at the user level - one that takes an expression as input and reports its schema, and another that takes a dataset/collection name as input and reports its schema. The first one would scan the results and say what the schema is; the other would use a more efficient approach (accessing and combining the metadata from the collection's most recent LSM components in each of its partitions).
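To make the two-function idea concrete, here is a minimal Python sketch, not AsterixDB code. It assumes both entry points can share one schema-merge step: the expression-level one infers a schema per record and folds, while the dataset-level one folds over schemas that are assumed to be already stored per LSM component. All names (infer, merge, schema_of_results, schema_of_components) are hypothetical, and the output is only a simplified JSON-Schema-like shape (no "required" tracking).

```python
from functools import reduce


def infer(value):
    """Describe a single value as a simplified JSON-Schema-like dict."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        items = reduce(merge, (infer(v) for v in value), None)
        return {"type": "array"} if items is None else {"type": "array", "items": items}
    if isinstance(value, dict):
        return {"type": "object",
                "properties": {k: infer(v) for k, v in value.items()}}
    raise TypeError(f"unsupported value: {value!r}")


def _alternatives(schema):
    """A union is stored as {"anyOf": [...]}; anything else is one alternative."""
    return schema["anyOf"] if "anyOf" in schema else [schema]


def _merge_same_type(a, b):
    """Merge two schemas that share the same "type"."""
    if a["type"] == "object":
        pa, pb = a.get("properties", {}), b.get("properties", {})
        keys = list(pa) + [k for k in pb if k not in pa]
        return {"type": "object",
                "properties": {k: merge(pa.get(k), pb.get(k)) for k in keys}}
    if a["type"] == "array":
        items = merge(a.get("items"), b.get("items"))
        return {"type": "array"} if items is None else {"type": "array", "items": items}
    return a  # identical scalar types


def merge(a, b):
    """Combine two schemas; None acts as the empty schema."""
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    merged = []
    for alt in _alternatives(a) + _alternatives(b):
        for i, seen in enumerate(merged):
            if seen["type"] == alt["type"]:
                merged[i] = _merge_same_type(seen, alt)
                break
        else:
            merged.append(alt)
    return merged[0] if len(merged) == 1 else {"anyOf": merged}


def schema_of_results(records):
    """Expression-level variant: scan every result record and fold."""
    return reduce(merge, (infer(r) for r in records), None)


def schema_of_components(component_schemas):
    """Dataset-level variant: combine schemas assumed to be kept per component."""
    return reduce(merge, component_schemas, None)
```

In this sketch, merging the schemas of two halves of the data gives the same result as scanning all of it, which is what would let the dataset-level function avoid a full scan.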


Cheers,

Mike

On 10/4/24 10:13 AM, Calvin Dani wrote:

Initiating the discussion thread proposing a new aggregate function in
AsterixDB.
*Feature:* aggregate function to infer schema
*Details:* This feature introduces schema inference as an SQL++ function
directly integrated into AsterixDB. It is the first approach to offer
schema inference as a native SQL++ function, allowing users to infer
schemas not only for any dataset but also for queries and subqueries. Its
output is in JSON Schema, the industry standard, so the results are both
human- and machine-readable, suitable for user interpretation or for
integration into other queries or programs.

array_schema() was implemented using array_avg() in the Built-in Function
and Function Collection files as a template. During self-review, I noticed
that many of the defined aggregate functions, for example
SerializableAvgAggregateFunction and IntermediateAvgAggregateFunction, are
not used when an array_schema() query runs. Is this because they serve
different use cases, or am I using them incorrectly?

Are there any resources to understand the functionality of aggregate
functions in the implementation?

*APE*
https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+8%3A+Schema+Inference+Aggregate+Functions
