Sandeep Joshi wrote:
So, as you see, I managed to create the required code to return a
valid schema, and was also able to write unit tests for it.
I copied "protected[spark]" from the CSV implementation, but I
commented it out because it prevents compilation from succeeding.
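For context, a minimal sketch of why that modifier fails outside Spark's own source tree (the package and method names below are hypothetical, not from the actual patch): protected[spark] only compiles when the enclosing code sits inside a package named "spark" (e.g. org.apache.spark.sql), so a data source built in its own package has to drop the qualifier.

package com.example.datasource  // hypothetical package, outside org.apache.spark

import org.apache.spark.sql.types.{StringType, StructField, StructType}

object MySchemaInference {
  // protected[spark] def inferSchema(...)  <- rejected here: "spark" is not an
  // enclosing package of com.example.datasource, so the qualified modifier fails.
  def inferSchema(fieldNames: Seq[String]): StructType =
    StructType(fieldNames.map(name => StructField(name, StringType, nullable = true)))
}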
Hi all,
I have code like below:
Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
//Logger.getLogger("org.apache.spark.streaming.dstream").setLevel(Level.DEBUG)
val conf = new SparkConf().setAppName("testDstream").setMaster("local[4]")
//val sc = SparkContext.getOrCreate(conf)
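For reference, a minimal self-contained sketch of what this setup might look like end to end, assuming the intent is a local DStream job; the StreamingContext, the 1-second batch interval, and the socket source on localhost:9999 are illustrative assumptions, not taken from the original message:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TestDstream {
  def main(args: Array[String]): Unit = {
    // Silence most Spark logging; re-enable DStream debug output only if needed.
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    // Logger.getLogger("org.apache.spark.streaming.dstream").setLevel(Level.DEBUG)

    val conf = new SparkConf().setAppName("testDstream").setMaster("local[4]")
    // 1-second batch interval, chosen arbitrarily for illustration.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical input: a socket text stream on localhost:9999.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}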
(adding holden and bryan cutler to the CC on this)
we're currently fixing this. i've installed pyarrow 0.4.0 in the
default conda environment used by the spark tests (py3k), and either
bryan or holden will be removing the pip install from run-pip-tests
and adding the arrow tests to the regular py
the discussion about this is located here:
https://github.com/apache/spark/pull/15821
On Tue, Jun 27, 2017 at 12:32 PM, shane knapp wrote:
> (adding holden and bryan cutler to the CC on this)
>
> we're currently fixing this. i've installed pyarrow 0.4.0 in the
> default conda environment used by
PR being tested now:
https://github.com/apache/spark/pull/18443
On Tue, Jun 27, 2017 at 12:33 PM, shane knapp wrote:
> the discussion about this is located here:
> https://github.com/apache/spark/pull/15821
>
> On Tue, Jun 27, 2017 at 12:32 PM, shane knapp wrote:
>> (adding holden and bryan cutler to the CC on this)
Hi All,
The code (from RangePartitioner):
// This is the sample size we need to have roughly balanced output partitions, capped at 1M.
val sampleSize = math.min(20.0 * partitions, 1e6)
// Assume the input partitions are roughly balanced and over-sample a little bit.
val sampleSizePerPartition = math.ceil(3.0 * sampleSize / rdd.partitions.length).toInt
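To see how those constants play out, here is a small self-contained sketch; the concrete numbers and the numInputPartitions name are illustrative, not from RangePartitioner itself:

object SampleSizeSketch {
  def main(args: Array[String]): Unit = {
    val partitions = 200          // desired number of output partitions (illustrative)
    val numInputPartitions = 1000 // number of partitions in the input RDD (illustrative)

    // Total sample size: 20 samples per output partition, capped at 1M.
    val sampleSize = math.min(20.0 * partitions, 1e6)

    // Over-sample each input partition at 3x its even share, in case the
    // input partitions are unevenly sized.
    val sampleSizePerPartition = math.ceil(3.0 * sampleSize / numInputPartitions).toInt

    println(s"sampleSize = $sampleSize")                         // 4000.0
    println(s"sampleSizePerPartition = $sampleSizePerPartition") // 12
  }
}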