Structured Streaming with Kafka Source, does it work??

2016-11-06 Thread shyla
I am trying to do Structured Streaming with Kafka Source. Please let me know
where I can find some sample code for this. Thanks






Re: Structured Streaming with Kafka Source, does it work??

2016-11-06 Thread Jayaradha Natarajan
Shyla!

Check
https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html

Thanks,
Jayaradha



Using mention-bot to automatically ping potential reviewers

2016-11-06 Thread Nicholas Chammas
Howdy folks,

I wonder if anybody has ever used Facebook's mention-bot in a project:

https://github.com/facebook/mention-bot

Seems like a useful tool to help address the problem of figuring out who to
ping for review.

If you've used it, what was your experience? Do you think it would be
helpful for Spark?

Nick


Re: Structured Streaming with Kafka Source, does it work??

2016-11-06 Thread Matei Zaharia
The Kafka source will only appear in 2.0.2; see this thread for the current
release candidate:
https://lists.apache.org/thread.html/597d630135e9eb3ede54bb0cc0b61a2b57b189588f269a64b58c9243@%3Cdev.spark.apache.org%3E
You can try that right now if you want from the staging Maven repo shown
there. The vote looks likely to pass, so an actual release should hopefully
also be out soon.

Matei
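
For readers looking for the sample code requested in this thread, here is a
minimal sketch of reading from Kafka with Structured Streaming. It follows
the option names from the Structured Streaming + Kafka integration guide of
later 2.x releases ("kafka.bootstrap.servers", "subscribe"); details in the
2.0.2 release candidate may differ. The broker address and topic name are
placeholders, the helper names are hypothetical, and the Kafka connector
package (spark-sql-kafka-0-10) is assumed to be on the classpath.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Build the option map for Spark's "kafka" streaming source."""
    return {
        # Comma-separated host:port list of Kafka brokers.
        "kafka.bootstrap.servers": bootstrap_servers,
        # Topic (or comma-separated topics) to subscribe to.
        "subscribe": topic,
    }


def read_kafka_stream(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame backed by Kafka.

    `spark` is an existing SparkSession created by the caller.
    """
    options = kafka_source_options(bootstrap_servers, topic)
    return spark.readStream.format("kafka").options(**options).load()
```

With a real session this would be called as
read_kafka_stream(spark, "localhost:9092", "events"); the key and value
columns of the result arrive as binary and usually need a cast to string
before further processing.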

> On Nov 6, 2016, at 5:25 PM, shyla deshpande  wrote:
> 
> Hi Jaya!
> 
> Thanks for the reply. Structured Streaming works fine for me with a socket
> text stream. I think Structured Streaming with a Kafka source is not yet
> supported.
> 
> If anyone has got it working with a Kafka source, please provide me with
> some sample code or direction.
> 
> Thanks
> 
> 



Re: Using mention-bot to automatically ping potential reviewers

2016-11-06 Thread Holden Karau
So according to the documentation it mostly uses blame lines, which _might_
not be the best fit for Spark (since many of the people in the blame lines
aren't going to have permission to commit the code). (Although it's
possible that the algorithm that is actually used does more than the one
described in the documentation.)

I'd love to know what people's experiences have been of using this with
other projects with a similar structure to Spark (a small set of committers
to a large set of contributors).

Even with that reservation it could be interesting to try out, since this is
a really big problem for new contributors looking to participate in Spark
and the spark-prs dashboard doesn't seem to be catching everything.
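
To make the concern above concrete, here is a rough sketch of blame-based
reviewer suggestion with an extra filter for commit rights. This is not
mention-bot's actual algorithm; it is a simplified illustration, and the
function name, its parameters, and the sample names are all hypothetical.

```python
from collections import Counter


def suggest_reviewers(blame_authors, committers, pr_author, limit=3):
    """Rank people from blame lines, keeping only those who can merge.

    blame_authors: one author name per blamed line touched by the PR.
    committers: set of names with commit rights.
    pr_author: excluded, since authors should not review their own PR.
    """
    counts = Counter(
        author
        for author in blame_authors
        if author in committers and author != pr_author
    )
    # most_common() orders by number of blamed lines, highest first.
    return [name for name, _ in counts.most_common(limit)]
```

Without the committer filter, the top suggestion for a project shaped like
Spark would often be a contributor who cannot merge the change, which is
exactly the reservation raised above.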



-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: Handling questions in the mailing lists

2016-11-06 Thread Reynold Xin
OK I've checked on the ASF member list (which is private so there is no
public archive).

It is not against any ASF rule to recommend StackOverflow as a place for
users to ask questions. I don't think we can or should delete the existing
user@spark list either, but we can certainly make SO more visible than it
is.



On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin  wrote:

> Actually after talking with more ASF members, I believe the only policy is
> that development decisions have to be made and announced on ASF properties
> (dev list or jira), but user questions don't have to.
>
> I'm going to double check this. If it is true, I would actually recommend
> us moving entirely over the Q&A part of the user list to stackoverflow, or
> at least make that the recommended way rather than the existing user list
> which is not very scalable.
>
>
> On Wednesday, November 2, 2016, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> We’ve discussed several times upgrading our communication tools, as far
>> back as 2014 and maybe even before that too. The bottom line is that we
>> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>>
>> For some history, see this discussion:
>>
>>    - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>>    - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>>
>> (It’s ironic that it’s difficult to follow the past discussion on why we
>> can’t change our official communication tools due to those very tools…)
>>
>> Nick
>>
>> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <
>> ricardo.alme...@actnowib.com> wrote:
>>
>>> I feel Assaf's point is quite relevant if we want to move this project
>>> forward from the Spark user perspective (as I do). In fact, we're still
>>> using 20th century tools (mailing lists) with some add-ons (like Stack
>>> Overflow).
>>>
>>> As usual, Sean and Cody's contributions are very much to the point.
>>> I feel it is indeed a matter of culture (hard to enforce) and tools
>>> (much easier). Isn't it?
>>>
>>> On 2 November 2016 at 16:36, Cody Koeninger  wrote:
>>>
>>>> So concrete things people could do
>>>>
>>>> - users could tag subject lines appropriately to the component they're
>>>> asking about
>>>>
>>>> - contributors could monitor user@ for tags relating to components
>>>> they've worked on.
>>>> I'd be surprised if my miss rate for any mailing list questions
>>>> well-labeled as Kafka was higher than 5%
>>>>
>>>> - committers could be more aggressive about soliciting and merging PRs
>>>> to improve documentation.
>>>> It's a lot easier to answer even poorly-asked questions with a link to
>>>> relevant docs.
>>>>
>>>> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen  wrote:
>>>> > There's already reviews@ and issues@. dev@ is for project development
>>>> > itself and I think is OK. You're suggesting splitting up user@ and I
>>>> > sympathize with the motivation. Experience tells me that we'll have a
>>>> > beginner@ that's then totally ignored, and people will quickly learn
>>>> > to post to advanced@ to get attention, and we'll be back where we
>>>> > started. Putting it in JIRA doesn't help. I don't think this a problem
>>>> > that is merely down to lack of process. It actually requires
>>>> > cultivating a culture change on the community list.
>>>> >
>>>> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <assaf.mendel...@rsa.com>
>>>> > wrote:
>>>> >>
>>>> >> What I am suggesting is basically to fix that.
>>>> >>
>>>> >> For example, we might say that mailing list A is only for voting,
>>>> >> mailing list B is only for PR and have something like stack overflow
>>>> >> for developer questions (I would even go as far as to have beginner,
>>>> >> intermediate and advanced mailing list for users and
>>>> >> beginner/advanced for dev).
>>>> >>
>>>> >> This can easily be done using stack overflow tags, however, that
>>>> >> would probably be harder to manage.
>>>> >>
>>>> >> Maybe using special jira tags and manage it in jira?
>>>> >>
>>>> >> Anyway as I said, the main issue is not user questions (except maybe
>>>> >> advanced ones) but more for dev questions. It is so easy to get lost
>>>> >> in the chatter that it makes it very hard for people to learn spark
>>>> >> internals…
>>>> >>
>>>> >> Assaf.
>>>> >>
>>>> >> From: Sean Owen [mailto:so...@cloudera.com]
>>>> >> Sent: Wednesday, November 02, 2016 2:07 PM
>>>> >> To: Mendelson, Assaf;

Re: Handling questions in the mailing lists

2016-11-06 Thread Maciej Szymkiewicz
You have to remember that the Stack Overflow crowd (like me) is highly
opinionated, so many questions which could be just fine on the mailing
list will be quickly downvoted and/or closed as off-topic. Just
saying...

-- 
Best, 
Maciej


Re: Handling questions in the mailing lists

2016-11-06 Thread Reynold Xin
You have substantially underestimated how opinionated people can be on
mailing lists too :)

Re: Handling questions in the mailing lists

2016-11-06 Thread Maciej Szymkiewicz
Damn, I always thought that the mailing list is only for nice and welcoming
people and there is nothing for me to do here >:)

To be serious though, there are many questions on the users list which
would fit just fine on SO, but that is not true in general. There are
dozens of questions which are too broad, opinion-based, ask for external
resources and so on. If you want to direct users to SO you have to help
them decide if it is the right channel. Otherwise it will just create
a really bad experience for both those seeking help and active answerers.
The former will be downvoted and bashed, the latter will have to deal
with handling all the junk, and the number of active Spark users with
moderation privileges is really low (with only Massg and me being able
to directly close duplicates).

Believe me, I've seen this before.

Re: Handling questions in the mailing lists

2016-11-06 Thread Reynold Xin
This is an excellent point. If we do go ahead and feature SO more
prominently as a way for users to ask questions, would you, as someone who
knows SO very well, be willing to help write a short guideline (ideally the
shorter the better, which makes it hard) to direct what goes to user@ and
what goes to SO?


RE: Handling questions in the mailing lists

2016-11-06 Thread assaf.mendelson
There are other options as well, for example hosting AnswerHub
(www.answerhub.com) or another similar separate Q&A service.

BTW, I believe the main issue is not how opinionated people are but who is
answering questions. Today there are already people asking (and getting
answers) on SO (including myself). The problem is that many people do not
go to SO.

The problem I see is how to “bump” up questions which are not being answered
to someone more likely to be able to answer them. Simple questions can be
answered by many people, many of them even newbies who ran into the issue
themselves. The main issue is that the more complex the question, the fewer
people there are who can answer it, and those people’s bandwidth is already
clogged by other questions.

We could for example try to create tags on SO for “basic questions”,
“medium”, “advanced”. Provide guidelines to ask first on basic, and if not
answered after X days then add the medium tag, etc. Downvote people who
don’t go by the process. This would mean that committers, for example, can
look at the advanced tag only and have a manageable number of questions they
can help with, while others can answer medium and basic.

I agree that some things are not good for SO. Basically, anything that asks
for an opinion is such, but most cases on the mailing list are either “how
do I solve this bug” or “how do I do X”. Either of those is good for SO.
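
The escalation process proposed above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the tag names, the
three-day default standing in for "X days", and the function itself are
hypothetical, and nothing like this exists on SO today.

```python
from datetime import datetime, timedelta

# Hypothetical tag names, ordered from least to most specialised.
LEVELS = ["basic", "medium", "advanced"]


def next_tag(current_tag, tagged_at, now, answered, wait_days=3):
    """Return the tag a question should carry at time `now`.

    A question moves up exactly one level when it is still unanswered and
    has waited at least `wait_days` since `tagged_at`, the moment its
    current tag was applied. Otherwise it keeps its current tag.
    """
    index = LEVELS.index(current_tag)
    waited_long_enough = now - tagged_at >= timedelta(days=wait_days)
    at_top_level = index == len(LEVELS) - 1
    if answered or at_top_level or not waited_long_enough:
        return current_tag
    return LEVELS[index + 1]
```

For example, an unanswered "basic" question tagged on 7 November would stay
"basic" on 8 November and become "medium" on 10 November with the default
wait of three days.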


Assaf.




Re: Handling questions in the mailing lists

2016-11-06 Thread Matei Zaharia
Even for the mailing list, I'd love to have a short set of instructions on how 
to submit your questions (maybe on http://spark.apache.org/community.html or 
maybe in the welcome email when you subscribe). It would be great if someone 
added that. After all, we have such instructions for contributing PRs, for 
example.

Matei

> On Nov 6, 2016, at 11:09 PM, assaf.mendelson  wrote:
> 
> There are other options as well. For example hosting an answerhub 
> (www.answerhub.com ) or other similar separate Q&A 
> service.
> 
> BTW, I believe the main issue is not how opinionated people are but who is 
> answering questions.
> 
> Today there are already people asking (and getting answers) on SO (including 
> myself). The problem is that many people do not go to SO.
> 
> The problem I see is how to “bump” up questions which are not being answered 
> to someone more likely to be able to answer them. Simple questions can be 
> answered by many people, many of them even newbies who ran into the issue 
> themselves.
> 
> The main issue is that the more complex the question, the fewer people there 
> are who can answer it, and those people’s bandwidth is already clogged by 
> other questions.
> 
> We could, for example, try to create tags on SO for “basic questions”, 
> “medium”, and “advanced”, and provide guidelines to ask first under basic and, 
> if not answered after X days, add the medium tag, etc. Downvote people who 
> don’t go by the process. This would mean that committers, for example, can 
> look only at the advanced tag and have a manageable number of questions they 
> can help with, while others can answer medium and basic.
> 
> I agree that some things are not good for SO, basically anything that asks 
> for opinion. But most cases on the mailing list are either “how do I solve 
> this bug” or “how do I do X”, and either of those is a good fit for SO.
> 
> Assaf.
> 
> From: rxin [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
> Sent: Monday, November 07, 2016 8:33 AM
> To: Mendelson, Assaf
> Subject: Re: Handling questions in the mailing lists
> 
> This is an excellent point. If we do go ahead and feature SO as a way for 
> users to ask questions more prominently, as someone who knows SO very well, 
> would you be willing to help write a short guideline (ideally the shorter the 
> better, which makes it hard) to direct what goes to user@ and what goes to SO?
> 
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:
> 
> Damn, I always thought that the mailing list is only for nice and welcoming 
> people and there is nothing for me to do here >:)
> 
> To be serious though, there are many questions on the users list which would 
> fit just fine on SO, but that is not true in general. There are dozens of 
> questions which are too broad, opinion based, ask for external resources and 
> so on. If you want to direct users to SO you have to help them decide if 
> it is the right channel. Otherwise it will just create a really bad 
> experience for both those seeking help and active answerers. The former will 
> be downvoted and bashed, the latter will have to deal with all the junk, and 
> the number of active Spark users with moderation privileges is 
> really low (with only Massg and me being able to directly close duplicates).
> 
> Believe me, I've seen this before.
> 
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
> 
> You have substantially underestimated how opinionated people can be on 
> mailing lists too :)
> 
> On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]> wrote:
> 
> You have to remember that the Stack Overflow crowd (like me) is highly 
> opinionated, so many questions which would be just fine on the mailing list 
> will be quickly downvoted and / or closed as off-topic. Just saying...
> 
> -- 
> Best, 
> Maciej
> 
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
> 
> OK I've checked on the ASF member list (which is private so there is no 
> public archive).
> 
> It is not against any ASF rule to recommend StackOverflow as a place for 
> users to ask questions. I don't think we can or should delete the existing 
> user@spark list either, but we can certainly make SO more visible than it is.
> 
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]> wrote:
> 
> Actually after talking with more ASF members, I believe the only policy is 
> that development decisions have to be made and announced on ASF properties 
> (dev list or jira), but user questions don't have to. 
> 
> I'm going to double check this. If it is true, I would actually recommend 
> moving the Q&A part of the user list over to Stack Overflow entirely, or at 
> least making that the recommended way rather than the existing user list, 
> which is not very scalable. 
> 
> On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]> wrote:
> 
>