Re: [How-To][SQL] Create a dataframe inside the TableScan.buildScan method of a relation

2017-06-27 Thread OBones
Sandeep Joshi wrote: So, as you see, I managed to create the required code to return a valid schema, and was also able to write unit tests for it. I copied "protected[spark]" from the CSV implementation, but I commented it out because it prevents compilation from being successful
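A minimal sketch of the kind of relation being discussed (class and column names here are hypothetical, not from the thread). The `protected[spark]` modifier seen in the CSV implementation is Scala package-qualified access: it only compiles for code living inside the `org.apache.spark` package, which is why copying it into an external data source fails; dropping the modifier, as the poster did, is the usual fix:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types._

// Hypothetical external relation: returns a fixed schema and builds
// an RDD[Row] matching that schema in buildScan().
class MyRelation(override val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  // The schema the data source exposes to Spark SQL.
  override def schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)))

  // Rows must line up with the schema above, column by column.
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row(1, "a"), Row(2, "b")))
}
```

This is only a sketch of the TableScan contract; a real source would read its rows from external storage rather than `parallelize`.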

Could you please help me understand the functions countByValueAndWindow and foreachRDD in DStream?

2017-06-27 Thread ??????????
Hi all, I have code like below: Logger.getLogger("org.apache.spark").setLevel( Level.ERROR) //Logger.getLogger("org.apache.spark.streaming.dstream").setLevel( Level.DEBUG) val conf = new SparkConf().setAppName("testDstream").setMaster("local[4]") //val sc = SparkContext.getOrCrea
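To answer the question above in plain terms: `countByValueAndWindow(windowDuration, slideDuration)` counts how many times each value occurs across all batches that fall inside a sliding window, and `foreachRDD` is the generic output operation that hands you each batch's RDD so you can act on it (print, save, push to a store). A pure-Python sketch of the windowed-count semantics, with batches standing in for DStream micro-batches and window/slide expressed in batch counts rather than durations:

```python
from collections import Counter

def count_by_value_and_window(batches, window_len, slide):
    """Emulate DStream.countByValueAndWindow: every `slide` batches,
    count occurrences of each value over the last `window_len` batches."""
    results = []
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - window_len)
        window = [v for batch in batches[start:end] for v in batch]
        results.append(dict(Counter(window)))
    return results

batches = [["a", "b"], ["b"], ["a", "a"]]
# Window of 2 batches, sliding forward 1 batch at a time:
print(count_by_value_and_window(batches, 2, 1))
# → [{'a': 1, 'b': 1}, {'a': 1, 'b': 2}, {'b': 1, 'a': 2}]
```

In a real DStream program the per-window counts would then typically be consumed via `foreachRDD`, e.g. to log or persist each window's result.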

Re: Is there something wrong with jenkins?

2017-06-27 Thread shane knapp
(adding holden and bryan cutler to the CC on this) we're currently fixing this. i've installed pyarrow 0.4.0 in the default conda environment used by the spark tests (py3k), and either bryan or holden will be removing the pip install from run-pip-tests and adding the arrow tests to the regular py

Re: Is there something wrong with jenkins?

2017-06-27 Thread shane knapp
the discussion about this is located here: https://github.com/apache/spark/pull/15821 On Tue, Jun 27, 2017 at 12:32 PM, shane knapp wrote: > (adding holden and bryan cutler to the CC on this) > > we're currently fixing this. i've installed pyarrow 0.4.0 in the > default conda environment used by

Re: Is there something wrong with jenkins?

2017-06-27 Thread shane knapp
PR being tested now: https://github.com/apache/spark/pull/18443 On Tue, Jun 27, 2017 at 12:33 PM, shane knapp wrote: > the discussion about this is located here: > https://github.com/apache/spark/pull/15821 > > On Tue, Jun 27, 2017 at 12:32 PM, shane knapp wrote: >> (adding holden and bryan cu

Some question for range

2017-06-27 Thread raintung li
Hi all, the code in RangePartitioner: // This is the sample size we need to have roughly balanced output partitions, capped at 1M. val sampleSize = math.min(20.0 * partitions, 1e6) // Assume the input partitions are roughly balanced and over-sample a little bit. val sampleSize
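The arithmetic in the snippet above is small enough to work through directly. RangePartitioner aims for about 20 sampled keys per output partition, capped at one million total, and then over-samples each input partition (by a factor of 3 in Spark's source, included here as an assumption since the quoted line is cut off) to tolerate imbalanced input partitions:

```python
import math

def range_partitioner_sample_sizes(partitions, input_partitions):
    # Total sample size: ~20 keys per output partition, capped at 1M
    # (matches the quoted snippet).
    sample_size = min(20.0 * partitions, 1e6)
    # Over-sample each input partition 3x (assumed factor) so skewed
    # inputs still yield enough samples per partition.
    per_partition = math.ceil(3.0 * sample_size / input_partitions)
    return sample_size, per_partition

print(range_partitioner_sample_sizes(100, 50))      # → (2000.0, 120)
print(range_partitioner_sample_sizes(100000, 10))   # cap kicks in at 1e6
```

The cap matters for very wide shuffles: at 100,000 output partitions the uncapped figure would be 2 million samples, so the 1M ceiling bounds driver-side memory for the collected sample.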