Sorry for the typo in my last mail.
Compared with Query-2, we have questions about Query-1 and Query-3.
Also, may I know the difference between CollectLimit and BaseLimit?
Thanks so much.
Best,
Liz
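
If it helps: as far as we can tell from the Spark 2.0 planner, CollectLimit shows up when the limit is the last operator before results are collected, while LocalLimit/GlobalLimit (the BaseLimit implementations) are planned when further operators follow. A quick sketch to see both in spark-shell (the DataFrame here is made up, only the operator names come from Spark itself):

  // Synthetic data just to inspect the plans.
  val df = spark.range(1000000L)

  // A limit collected directly is planned as CollectLimit (output roughly):
  df.limit(500).explain()
  //   CollectLimit 500
  //   +- *Range (0, 1000000, ...)

  // A limit followed by more work goes through the BaseLimit variants:
  df.limit(500).filter("id > 100").explain()
  //   *Filter (id > 100)
  //   +- GlobalLimit 500
  //      +- Exchange SinglePartition
  //         +- LocalLimit 500
  //            +- *Range (0, 1000000, ...)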
Hi all,
We used Parquet and Spark 2.0 to do the testing. The table below summarises
what we have found about the `Limit` keyword. Query-2 reveals that Spark SQL
stops early upon getting adequate results. But we are curious about Query-1 and
Query-2. It seems that either writing the result RDD a
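
Concretely, the two consumption paths being compared look roughly like this (illustrative shapes only, not the exact Query-1/2/3; the output path is made up):

  // Same limited query, consumed two ways.
  val limited = spark.sql("select * from A Limit 500")

  // Collecting stops early once 500 rows are in hand (the Query-2 behaviour).
  limited.collect()

  // Writing the limited result out is where the full scan was observed.
  limited.write.parquet("/tmp/limit-out")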
Hi all,
Let me clarify the problem:
Suppose we have a simple table `A` with 100 000 000 records.
Problem:
When we execute the SQL query `select * from A Limit 500`,
it scans through all 100 000 000 records.
The normal behaviour should be that once 500 records are found, the engine stops scanning.
Detailed ob
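
A minimal setup to reproduce this in a Spark 2.0 shell; the table name `A`, the record count, and the query come from the description above, while the Parquet path is made up:

  // Materialise a 100 000 000-record table A backed by Parquet.
  spark.range(100000000L).write.parquet("/tmp/A")
  spark.read.parquet("/tmp/A").createOrReplaceTempView("A")

  // Expected: the scan stops after 500 records; observed: all records are read.
  spark.sql("select * from A Limit 500").collect()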
Hi there,
I have a question about writing Parquet using Spark SQL. Spark 1.4 already
supports writing DataFrames as Parquet files with `partitionBy(colNames:
String*)`, since SPARK-6561 was fixed.
Is there any method or plan to write Parquet with dynamic partitions? For
example, instead of partiti
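
For reference, the static form that has worked since Spark 1.4 looks like this (the column names and output path are made up):

  // Hive-style partitioned Parquet write; each distinct (year, month)
  // pair lands in its own year=.../month=... directory.
  df.write
    .partitionBy("year", "month")
    .parquet("/data/table_a")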