Re: New Feature Request

2015-08-05 Thread Sandeep Giri
Hi Jonathan,

Does that guarantee a result? I do not see how it is really optimized.

Hi Carsten,


How does the following code work:

data.filter(qualifying_function).take(n).length >= n


Also, as per my understanding, in both of the approaches you mentioned the
qualifying function will be executed on the whole dataset even if the value was
already found in the first element of the RDD:


   - data.filter(qualifying_function).take(n).length >= n
   - val contains1MatchingElement = !(data.filter(qualifying_function).isEmpty())

Isn't that so? Or am I missing something?


Regards,
Sandeep Giri,
+1 347 781 4573 (US)
+91-953-899-8962 (IN)

www.KnowBigData.com. 
Phone: +1-253-397-1945 (Office)



On Fri, Jul 31, 2015 at 3:37 PM, Jonathan Winandy <jonathan.wina...@gmail.com> wrote:

> Hello!
>
> You could try something like that:
>
> def exists[T](rdd:RDD[T])(f:T=>Boolean, n:Int):Boolean = {
>   rdd.filter(f).countApprox(timeout = 1).getFinalValue().low > n
> }
>
> It would work for large datasets and large values of n.
>
> Have a nice day,
>
> Jonathan
>
>
>
> On 31 July 2015 at 11:29, Carsten Schnober <schno...@ukp.informatik.tu-darmstadt.de> wrote:
>
>> Hi,
>> the RDD class does not have an exists() method (in the Scala API), but
>> the functionality you need seems easy to reproduce with the existing
>> methods:
>>
>> val containsNMatchingElements =
>>   data.filter(qualifying_function).take(n).length >= n
>>
>> Note: I am not sure whether the intermediate take(n) really increases
>> performance, but the idea is to arbitrarily reduce the number of
>> elements in the RDD before counting because we are not interested in the
>> full count.
>>
>> If you need to check specifically whether there is at least one matching
>> occurrence, it is probably preferable to use isEmpty() instead of
>> count() and check whether the result is false:
>>
>> val contains1MatchingElement =
>> !(data.filter(qualifying_function).isEmpty())
>>
>> Best,
>> Carsten
>>
>>
>>
>> On 31.07.2015 at 11:11, Sandeep Giri wrote:
>> > Dear Spark Dev Community,
>> >
>> > I am wondering if there is already a function to solve my problem. If
>> > not, then should I work on this?
>> >
>> > Say you just want to check if a word exists in a huge text file. I could
>> > not find better ways than those mentioned here:
>> > http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-2#q6
>> >
>> > So, I was proposing that we have a function called exists in RDD with
>> > the following signature:
>> >
>> > # Returns true if n elements exist which qualify our criteria.
>> > # The qualifying function would receive the element and its index and
>> > # return true or false.
>> > def exists(qualifying_function, n):
>> >
>> >
>> > Regards,
>> > Sandeep Giri,
>> > +1 347 781 4573 (US)
>> > +91-953-899-8962 (IN)
>> >
>> > www.KnowBigData.com. 
>> > Phone: +1-253-397-1945 (Office)
>> >
>> >
>>
>> --
>> Carsten Schnober
>> Doctoral Researcher
>> Ubiquitous Knowledge Processing (UKP) Lab
>> FB 20 / Computer Science Department
>> Technische Universität Darmstadt
>> Hochschulstr. 10, D-64289 Darmstadt, Germany
>> phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
>> schno...@ukp.informatik.tu-darmstadt.de
>> www.ukp.tu-darmstadt.de
>>
>> Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
>> GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources
>> (AIPHES): www.aiphes.tu-darmstadt.de
>> PhD program: Knowledge Discovery in Scientific Literature (KDSL)
>> www.kdsl.tu-darmstadt.de
>>


Re: New Feature Request

2015-08-05 Thread Sean Owen
I don't think countApprox is appropriate here unless approximation is OK.
But more generally, counting everything matching a filter requires applying
the filter to the whole data set, which seems like the thing to be avoided
here.

The take approach is better since it would stop after finding n matching
elements (it might do a little extra work given partitioning and
buffering). It would not filter the whole data set.

The only downside there is that it would copy n elements to the driver.
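
For concreteness, a minimal sketch of that take-based check wrapped as a helper
(the helper name and the >= semantics are illustrative; this is not an existing
RDD method):

import org.apache.spark.rdd.RDD

// Returns true if at least n elements of the RDD satisfy f.
// filter() is lazy and take(n) stops scanning partitions once n matching
// elements have been collected, so the predicate is not necessarily applied
// to the whole data set.
def existsAtLeast[T](rdd: RDD[T])(f: T => Boolean, n: Int): Boolean =
  rdd.filter(f).take(n).length >= n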


[no subject]

2015-08-05 Thread Sandeep Giri
Yes, but in the take() approach we will be bringing the data to the driver,
and it is no longer distributed.

Also, take() takes only a count as its argument, which means that every time
we would be transferring redundant elements.


Regards,
Sandeep Giri,
+1 347 781 4573 (US)
+91-953-899-8962 (IN)

www.KnowBigData.com. 
Phone: +1-253-397-1945 (Office)




Re:

2015-08-05 Thread Sean Owen
take only brings n elements to the driver, which is probably still a win if
n is small. I'm not sure what you mean by only taking a count argument --
what else would be an arg to take?



Re:

2015-08-05 Thread Sandeep Giri
Okay, I think I got it now. Yes, take() does not need to be called more than
once. I had the impression that we wanted to bring elements to the driver node
and then run our qualifying_function on the driver node.

Now, I am back to the question I started with: could there be an approach
where the qualifying_function() does not get called after an element has been
found?


Regards,
Sandeep Giri,
+1 347 781 4573 (US)
+91-953-899-8962 (IN)

www.KnowBigData.com. 
Phone: +1-253-397-1945 (Office)





Re: Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'

2015-08-05 Thread Guru Medasani
Following up on this thread to see if anyone has some thoughts or opinions on 
the mentioned approach.
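
For concreteness, the two spark-submit invocations being compared would look
like the following (the application class and jar names are just placeholders):

# proposed form, consistent with the other cluster managers
spark-submit --class org.example.MyApp --master yarn --deploy-mode cluster my-app.jar

# current equivalent short form
spark-submit --class org.example.MyApp --master yarn-cluster my-app.jar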


Guru Medasani
gdm...@gmail.com



> On Aug 3, 2015, at 10:20 PM, Guru Medasani  wrote:
> 
> Hi,
> 
> I was looking at the spark-submit and spark-shell --help output on both
> versions (Spark 1.3.1 and the Spark 1.5 snapshot) and at the Spark
> documentation for submitting Spark applications to YARN. It seems that there
> is some mismatch between the preferred syntax and the documentation.
> 
> The Spark documentation says that we need to specify either yarn-cluster or
> yarn-client to connect to a YARN cluster:
> 
> 
> yarn-client   Connect to a YARN cluster in client mode. The cluster location
>               will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
> yarn-cluster  Connect to a YARN cluster in cluster mode. The cluster location
>               will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
> In the spark-submit --help, on the other hand, the stated options are
> --master yarn with --deploy-mode cluster or client.
> 
> Usage: spark-submit [options] <app jar | python file> [app arguments]
> Usage: spark-submit --kill [submission ID] --master [spark://...]
> Usage: spark-submit --status [submission ID] --master [spark://...]
>
> Options:
>   --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
>   --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally
>                               ("client") or on one of the worker machines
>                               inside the cluster ("cluster") (Default: client).
> 
> I want to bring this to your attention as it is a bit confusing for someone
> running Spark on YARN. For example, they look at the spark-submit help output
> and start using that syntax, but when they look at the online documentation
> or the user-group mailing list, they see a different spark-submit syntax.
> 
> From a quick discussion with other engineers at Cloudera, it seems that
> --deploy-mode is preferred as it is more consistent with the way things are
> done with other cluster managers, i.e. there are no standalone-cluster or
> standalone-client masters. This applies to Mesos as well.
> 
> Either syntax works, but I would like to propose that we use '--master yarn
> --deploy-mode x' instead of '--master yarn-cluster' or '--master yarn-client',
> as it is consistent with the other cluster managers. This would require
> updating all Spark pages related to submitting Spark applications to YARN.
> 
> So far I've identified the following pages:
>
> 1) http://spark.apache.org/docs/latest/running-on-yarn.html
> 2) http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
> 
> There is a JIRA to track the progress on this as well:
>
> https://issues.apache.org/jira/browse/SPARK-9570
> The option we choose dictates whether we update the documentation or the
> spark-submit and spark-shell help pages.
>
> Any thoughts on which direction we should go?
> 
> Guru Medasani
> gdm...@gmail.com 
> 
> 
> 



Re:

2015-08-05 Thread Feynman Liang
qualifying_function() will be executed on each partition in parallel; stopping
all parallel execution after the first instance satisfying qualifying_function()
would mean that you would effectively have to make the computation sequential.
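
To make that concrete, here is a small sketch (not from this thread; the helper
name is made up) in which each partition's iterator short-circuits on its first
match, but every partition is still scheduled and scanned in parallel:

import org.apache.spark.rdd.RDD

// Each task stops scanning its own partition at the first match, because
// Iterator.exists short-circuits, but Spark still runs one task per partition.
def anyMatch[T](rdd: RDD[T])(f: T => Boolean): Boolean =
  rdd.mapPartitions(it => Iterator(it.exists(f))).collect().exists(identity)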



Re:

2015-08-05 Thread Jonathan Winandy
Hello!

You could try something like that:

import scala.util.Random

import org.apache.spark.{Accumulator, SparkContext, SparkException}
import org.apache.spark.rdd.RDD

// Returns true once at least n elements satisfying f have been seen.
def exists[T](rdd: RDD[T])(f: T => Boolean, n: Long): Boolean = {

  val context: SparkContext = rdd.sparkContext
  // Use a unique job group so that only this job gets cancelled.
  val grp: String = Random.alphanumeric.take(10).mkString
  context.setJobGroup(grp, "exists")
  val count: Accumulator[Long] = context.accumulator(0L)

  // Count the matches in one partition and add them to the shared accumulator
  // (the accumulator is only updated once the whole partition has been processed).
  val iteratorToInt: Iterator[T] => Int = { iterator =>
    val i: Int = iterator.count(f)
    count += i.toLong
    i
  }

  // Watcher thread: once enough matches have been accumulated, cancel the job group.
  val t = new Thread {
    override def run(): Unit = {
      try {
        while (count.value < n) Thread.sleep(10) // poll rather than busy-spin
        context.cancelJobGroup(grp)
      } catch {
        case _: InterruptedException => // the job finished first; nothing to cancel
      }
    }
  }
  t.start()
  try {
    // runJob returns the per-partition counts; sum them for the full count.
    context.runJob(rdd, iteratorToInt).sum >= n
  } catch {
    // The job was cancelled by the watcher thread; fall back to the accumulator.
    case e: SparkException => count.value >= n
  } finally {
    t.interrupt()
  }
}



It stops the computation once enough elements satisfying the condition have
been witnessed.

It performs well if the RDD is well partitioned. If that is a problem, you
could change iteratorToInt to:

// Alternative: increment the accumulator per matching element rather than per
// partition, so the watcher thread can cancel the job group sooner.
val iteratorToInt: Iterator[T] => Int = { iterator =>
  iterator.count { x =>
    if (f(x)) {
      count += 1L
      true
    } else false
  }
}
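
For reference, a hypothetical call site, assuming data is an RDD[String] and we
want at least five lines containing a given word (the values are illustrative):

// illustrative usage of the exists sketch above
val found: Boolean = exists(data)(_.contains("spark"), 5L)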


I am interested in a safer way to perform partial computation in Spark.

Cheers,
Jonathan



Avoiding unnecessary build changes until tests are in better shape

2015-08-05 Thread Patrick Wendell
Hey All,

Was wondering if people would be willing to avoid merging build
changes until we have put the tests in better shape. The reason is
that build changes are the most likely to cause downstream issues with
the test matrix and it's very difficult to reverse engineer which
patches caused which problems when the tests are not in a stable
state. For instance, the updates to Hive 1.2.1 caused cascading
failures that have lasted several days now, and in the meantime a few
other build-related patches were also merged - as these pile up, it
gets harder for us to have confidence that those other patches didn't
introduce problems.

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/

- Patrick




Re: Avoiding unnecessary build changes until tests are in better shape

2015-08-05 Thread Josh Rosen
+1.  I've been holding off on reviewing / merging patches like the 
run-tests-jenkins Python refactoring for exactly this reason.





