Hello,
I have two Parquet datasets (each containing a single file):
- parquet-wide - schema has 25 top-level columns + 1 array column
- parquet-narrow - schema has 3 top-level columns
Both files contain the same data for the shared columns.
When I read from parquet-wide, Spark reports reading 52.6 KB; from
parquet-narrow, only 2.6 KB.
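For context, a minimal sketch of the kind of comparison described above (the paths, column names, and SparkSession setup below are placeholders, not the actual datasets):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("parquet-width-test").getOrCreate()

  // Parquet is columnar, so selecting only the shared columns from the
  // wide file should let Spark prune the remaining columns at scan time.
  val wide   = spark.read.parquet("/tmp/parquet-wide").select("a", "b", "c")
  val narrow = spark.read.parquet("/tmp/parquet-narrow")

  // Trigger a scan of each dataset, then compare the bytes read reported
  // for the two scans in the SQL tab of the Spark UI.
  wide.count()
  narrow.count()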
Hi,
Consider the following statements:
1)
> scala> val df = spark.read.format("com.shubham.MyDataSource").load
> scala> df.show
> +---+---+
> | i| j|
> +---+---+
> | 0| 0|
> | 1| -1|
> | 2| -2|
> | 3| -3|
> | 4| -4|
> +---+---+
2)
> scala> val df1 = df.filter("i < 3")
> scala> df1.show
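One way to see how the filter in 2) reaches the source (a suggestion, not part of the snippet above) is to print the extended query plan:

  scala> df1.explain(true)

If com.shubham.MyDataSource implements the DataSource V2 filter-pushdown interface, the predicate i < 3 should appear as a pushed filter on the scan node; otherwise Spark evaluates it in a separate Filter operator after the scan.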
Hello all,
We are Martin and Mats from Neo4j and we're working on the Spark Graph SPIP
(https://issues.apache.org/jira/browse/SPARK-25994).
We are also +1 for a Spark 3.0 preview release and setting a timeline for
the actual release.
The SPIP was accepted at the beginning of this year and we've m