Rohan Garg created CALCITE-5084:
-----------------------------------

Summary: Support ROWS syntax with TABLESAMPLE
Key: CALCITE-5084
URL: https://issues.apache.org/jira/browse/CALCITE-5084
Project: Calcite
Issue Type: Task
Reporter: Rohan Garg
Currently, Calcite provides a useful TABLESAMPLE syntax that lets users sample the data being processed. It takes two main parameters:
1. the sampling algorithm (BERNOULLI or SYSTEM)
2. the sampling percentage (a value between 0 and 100 indicating the sampling rate)

A percentage works well in many cases. However, users who do not know the row count cannot always choose a percentage that yields a sample of the desired size. Subqueries make this even harder, assuming the underlying system supports TABLESAMPLE on subqueries, because their row count is harder still to estimate.

The 'n ROWS' syntax is most likely not part of the SQL standard, which is probably why it was not included in the default Calcite grammar. However, several systems have implemented it in their dialects:
1. MS SQL Server: [https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql?view=sql-server-ver15#tablesample-clause]
2. Snowflake: [https://docs.snowflake.com/en/sql-reference/constructs/sample.html]
3. Google Spanner: [https://cloud.google.com/spanner/docs/reference/standard-sql/query-syntax#tablesample_operator]
4. Apache Spark: [https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-sampling.html]

So it would be a useful addition to Calcite.

Derived from https://issues.apache.org/jira/browse/CALCITE-5074
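The following minimal Java sketch contrasts the two forms. It is hypothetical and is not Calcite's implementation or API.

`bernoulli` keeps each row independently with probability percent/100, so the sample size depends on the table size and on the random draws. `fixedRows` uses reservoir sampling (Algorithm R) to return exactly min(n, row count) rows in a single pass, without knowing the row count in advance.

The class and method names are assumptions made for illustration. SYSTEM (block-level) sampling is not modeled, and how an engine should actually implement 'n ROWS' remains an open design question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch for illustration only; not part of Calcite.
public class TableSampleSketch {

    // Percentage-based sampling in the style of TABLESAMPLE BERNOULLI(percent):
    // every row is kept independently with probability percent / 100,
    // so the output size is random and scales with the input size.
    static <T> List<T> bernoulli(List<T> rows, double percent, Random rnd) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        double p = percent / 100.0;
        List<T> out = new ArrayList<>();
        for (T row : rows) {
            if (rnd.nextDouble() < p) { // nextDouble() is uniform in [0, 1)
                out.add(row);
            }
        }
        return out;
    }

    // Fixed-size sampling in the style of the proposed "n ROWS" form, using
    // reservoir sampling (Algorithm R): one pass over rows of unknown count
    // that returns min(n, rowCount) rows, every subset of that size equally likely.
    static <T> List<T> fixedRows(Iterable<T> rows, int n, Random rnd) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(n);
        int seen = 0; // rows read so far (an int counter keeps the sketch simple)
        for (T row : rows) {
            seen++;
            if (reservoir.size() < n) {
                reservoir.add(row); // fill the reservoir with the first n rows
            } else {
                int j = rnd.nextInt(seen); // uniform in [0, seen)
                if (j < n) {
                    reservoir.set(j, row); // replace a slot with probability n / seen
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> table = IntStream.range(0, 1_000).boxed().collect(Collectors.toList());
        Random rnd = new Random(42);
        // About 10 rows on average (1% of 1,000); the exact count depends on the draws.
        System.out.println("BERNOULLI(1): " + bernoulli(table, 1.0, rnd).size() + " rows");
        // Always exactly 10 rows here, because the table has at least 10 rows.
        System.out.println("10 ROWS: " + fixedRows(table, 10, rnd).size() + " rows");
    }
}
```

Reservoir sampling is one plausible fit for the situation described above: it needs a single pass and memory proportional to n, and it does not require knowing the input's row count (for example, the output of a subquery).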