Yeah, definitely not. The only requirement is that the DataReader/WriterFactory
must support at least one DataFormat.
> how are we going to express the capability of a given reader for its
> supported format(s), or specific support for each of "real-time data in row
> format, and history data in columnar form"?
Is it required for a DataReader to support all known DataFormats?
Hopefully not, as assumed by the 'throw' in the interface. Then, specifically,
how are we going to express the capability of a given reader for its supported
format(s), or specific support for each of "real-time data in row format, and
history data in columnar form"?
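To make the 'throw' pattern concrete, here is a hypothetical sketch (the trait and method names below are made up for illustration and do not match the actual DataSourceV2 interfaces): each format-specific factory method gets a default implementation that throws, and a concrete factory overrides at least one of them, which is how it advertises the format(s) it supports.

// Hypothetical illustration only; names are not the real proposal's.
trait DataReader[T] {
  def next(): Boolean
  def get(): T
}

trait DataReaderFactory[Row, Batch] {
  // Defaults throw; a concrete factory must override at least one.
  // A row-only source would override createRowReader and keep the columnar default.
  def createRowReader(): DataReader[Row] =
    throw new UnsupportedOperationException("row format not supported")
  def createColumnarReader(): DataReader[Batch] =
    throw new UnsupportedOperationException("columnar format not supported")
}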
Yes, it sounds good to me. We can upgrade both Parquet from 1.8.2 to 1.8.3 and
ORC from 1.4.1 to 1.4.3 in our upcoming Spark 2.3.1 release.
Thanks for your efforts! @Henry and @Dongjoon
Xiao
2018-04-16 14:41 GMT-07:00 Henry Robinson:
> Seems like there aren't any objections. I'll pick this thread back up when
> a Parquet maintenance release has happened.
Seems like there aren't any objections. I'll pick this thread back up when
a Parquet maintenance release has happened.
Henry
On 11 April 2018 at 14:00, Dongjoon Hyun wrote:
> Great.
>
> If we can upgrade the parquet dependency from 1.8.2 to 1.8.3 in Apache
> Spark 2.3.1, let's upgrade the orc dependency from 1.4.1 to 1.4.3 as well.
Hello,
Thank you very much for your response, Anastasie! Today I think I managed it
by dropping partitions in runJob or submitJob (I don't remember exactly which)
in DAGScheduler.
If it doesn’t work properly after some tests, I will follow your approach.
Thank you,
Thodoris
> On 16 Apr 2018, a
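For reference (not from the original thread), a similar effect can often be achieved without touching DAGScheduler: SparkContext.runJob has a public overload that takes the list of partitions to compute, so only those partitions are ever scheduled. A minimal sketch, assuming a running SparkContext named sc:

// Compute only the listed partitions; the others are never scheduled.
val rdd = sc.parallelize(0 until 1000, numSlices = 10)
val keepPartitions = Seq(0, 2, 5)
val sums: Array[Int] = sc.runJob(rdd, (it: Iterator[Int]) => it.sum, keepPartitions)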
Hi all,
I think this is doable using the mapPartitionsWithIndex method of RDD.
Example:
val partitionIndex = 0 // Your favorite partition index here
val rdd = spark.sparkContext.parallelize(Array.range(0, 1000))
// Replace the elements of partition `partitionIndex` with [-10, ..., 0]
val fixed = rdd.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == partitionIndex) (-10 to 0).iterator else iter
}
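(Not part of the original message, but a quick way to check the result with the same RDD API:)

fixed.glom().collect()  // the partition at partitionIndex should now be Array(-10, ..., 0)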