Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Sean Owen
It's worth calling attention to:

https://issues.apache.org/jira/browse/SPARK-17418
https://issues.apache.org/jira/browse/SPARK-17422

It looks like we need to at least stop publishing the kinesis *assembly*
Maven artifact, because it directly contains code under the Amazon Software
License.

However there's a reasonably strong reason to believe that we'd have
to remove the non-assembly Kinesis artifact too, as well as the
Ganglia one. This doesn't mean it goes away from the project, just
means it would no longer be published as a Maven artifact. (These have
never been bundled in the main Spark artifacts.)

I wanted to give a heads up to see if anyone a) believes this
conclusion is wrong or b) wants to take it up with legal@? I'm
inclined to believe we have to remove them given the interpretation
Luciano has put forth.

Sean

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



implement UDF/UDAF supporting whole stage codegen

2016-09-07 Thread assaf.mendelson
Hi,
I want to write a UDF/UDAF that achieves native processing performance.
Currently, a UDF/UDAF created in the normal manner takes a performance hit
because it breaks optimizations such as whole-stage codegen.
As a simple example, I wanted to create a UDF that tests whether a value is
smaller than 10. I tried something like this:

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.util.TypeUtils
import org.apache.spark.sql.types._
import org.apache.spark.util.Utils
import org.apache.spark.sql.catalyst.expressions._

// A catalyst expression that evaluates to true when its (integer) child is < 10.
case class genf(child: Expression) extends UnaryExpression with Predicate
  with ImplicitCastInputTypes {

  override def inputTypes: Seq[AbstractDataType] = Seq(IntegerType)

  override def toString: String = s"$child < 10"

  // Interpreted (non-codegen) evaluation path.
  override def eval(input: InternalRow): Any = {
    val value = child.eval(input)
    if (value == null) {
      false
    } else {
      child.dataType match {
        case IntegerType => value.asInstanceOf[Int] < 10
      }
    }
  }

  // Code-generated evaluation path used by whole-stage codegen.
  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    defineCodeGen(ctx, ev, c => s"($c) < 10")
  }
}


However, this doesn't work, because some of the underlying classes/traits are
private (e.g. AbstractDataType), which makes it problematic to define such a
case class outside of Spark itself.
Is there a way to do it? The idea is to provide a couple of jars with a bunch 
of functions our team needs.
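If the case class above did compile, my plan for using it from the DataFrame
API was to wrap it in a Column, roughly like this (untested; lessThanTen is
just a hypothetical helper):

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

// Wrap the catalyst expression so it can be used from the DataFrame API.
def lessThanTen(c: Column): Column = new Column(genf(c.expr))

val df = spark.range(20).selectExpr("cast(id as int) as value")
df.filter(lessThanTen(col("value"))).show()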
Thanks,
Assaf.





--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/implement-UDF-UDAF-supporting-whole-stage-codegen-tp18874.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

How to get 2 years prior date from currentdate using Spark Sql

2016-09-07 Thread farman.bsse1855
I need to derive the date 2 years prior to the current date using a query in
Spark SQL. For example: today's date is 2016-09-07, and I need to get the date
exactly 2 years before this date in the above format (yyyy-MM-dd).

Please let me know if there are multiple approaches and which one would be
better.

Thanks



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-get-2-years-prior-date-from-currentdate-using-Spark-Sql-tp18875.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: How to get 2 years prior date from currentdate using Spark Sql

2016-09-07 Thread Yong Zhang
https://issues.apache.org/jira/browse/SPARK-8185






From: farman.bsse1855 
Sent: Wednesday, September 7, 2016 7:27 AM
To: dev@spark.apache.org
Subject: How to get 2 years prior date from currentdate using Spark Sql

I need to derive the date 2 years prior to the current date using a query in
Spark SQL. For example: today's date is 2016-09-07, and I need to get the date
exactly 2 years before this date in the above format (yyyy-MM-dd).

Please let me know if there are multiple approaches and which one would be
better.

Thanks



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-get-2-years-prior-date-from-currentdate-using-Spark-Sql-tp18875.html


Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: How to get 2 years prior date from currentdate using Spark Sql

2016-09-07 Thread Yong Zhang
sorry, should be date_sub


https://issues.apache.org/jira/browse/SPARK-8187






From: Yong Zhang 
Sent: Wednesday, September 7, 2016 9:13 AM
To: farman.bsse1855; dev@spark.apache.org
Subject: Re: How to get 2 years prior date from currentdate using Spark Sql


https://issues.apache.org/jira/browse/SPARK-8185






From: farman.bsse1855 
Sent: Wednesday, September 7, 2016 7:27 AM
To: dev@spark.apache.org
Subject: How to get 2 years prior date from currentdate using Spark Sql

I need to derive the date 2 years prior to the current date using a query in
Spark SQL. For example: today's date is 2016-09-07, and I need to get the date
exactly 2 years before this date in the above format (yyyy-MM-dd).

Please let me know if there are multiple approaches and which one would be
better.

Thanks



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-get-2-years-prior-date-from-currentdate-using-Spark-Sql-tp18875.html


Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: How to get 2 years prior date from currentdate using Spark Sql

2016-09-07 Thread Herman van Hövell tot Westerflier
This is more of a user@ question.

You can write the following in SQL: select date '2016-09-07' - interval 2 years
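
Or, starting from the current date and using the DataFrame API, something along
these lines should work (a rough, untested sketch using the built-in functions;
the column alias is just for illustration):

import org.apache.spark.sql.functions.{add_months, current_date, date_format}

// Go back 24 months from today and render the result as yyyy-MM-dd.
spark.range(1)
  .select(date_format(add_months(current_date(), -24), "yyyy-MM-dd")
    .as("two_years_prior"))
  .show()

// Equivalent in SQL:
// spark.sql("SELECT add_months(current_date(), -24) AS two_years_prior").show()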

HTH

On Wed, Sep 7, 2016 at 3:14 PM, Yong Zhang  wrote:

> sorry, should be date_sub
>
>
> https://issues.apache.org/jira/browse/SPARK-8187
>
>
>
> --
> *From:* Yong Zhang 
> *Sent:* Wednesday, September 7, 2016 9:13 AM
> *To:* farman.bsse1855; dev@spark.apache.org
> *Subject:* Re: How to get 2 years prior date from currentdate using Spark
> Sql
>
>
> https://issues.apache.org/jira/browse/SPARK-8185
>
>
>
> --
> *From:* farman.bsse1855 
> *Sent:* Wednesday, September 7, 2016 7:27 AM
> *To:* dev@spark.apache.org
> *Subject:* How to get 2 years prior date from currentdate using Spark Sql
>
> I need to derive the date 2 years prior to the current date using a query in
> Spark SQL. For example: today's date is 2016-09-07, and I need to get the date
> exactly 2 years before this date in the above format (yyyy-MM-dd).
>
> Please let me know if there are multiple approaches and which one would be
> better.
>
> Thanks
>
>
>
> --
> View this message in context: http://apache-spark-
> developers-list.1001551.n3.nabble.com/How-to-get-2-years-
> prior-date-from-currentdate-using-Spark-Sql-tp18875.html
>
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Discuss SparkR executors/workers support virtualenv

2016-09-07 Thread Shivaram Venkataraman
I think this makes sense -- making it easier to use additional R
packages would be a good feature. I am not sure we need Packrat for
this use case though. Let's continue the discussion on the JIRA at
https://issues.apache.org/jira/browse/SPARK-17428

Thanks
Shivaram

On Tue, Sep 6, 2016 at 11:36 PM, Yanbo Liang  wrote:
> Hi All,
>
>
> Many users need to use third-party R packages in executors/workers, but
> SparkR cannot satisfy this requirement elegantly. For example, you have to
> go through the IT/administrators of the cluster to deploy these R packages
> on each executor/worker node, which is very inflexible.
>
> I think we should support third-party R packages for SparkR users, as we do
> for jar packages, in the following two scenarios:
> 1. Users can install R packages from CRAN or a custom CRAN-like repository
> on each executor.
> 2. Users can load their local R packages and install them on each executor.
>
> To achieve this goal, the first step is to make SparkR executors support a
> virtualenv-like mechanism, similar to Python's conda. I have investigated and
> found that packrat (http://rstudio.github.io/packrat/) is one of the
> candidates for supporting a virtualenv for R. Packrat is a dependency
> management system for R that can isolate the dependent R packages in its own
> private package space. SparkR users could then install third-party packages
> at application scope (destroyed after the application exits) and wouldn't
> need to bother IT/administrators to install these packages manually.
>
> I would like to know whether this makes sense.
>
>
> Thanks
>
> Yanbo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Cody Koeninger
I don't see a reason to remove the non-assembly artifact; why would
you? You're not distributing copies of Amazon-licensed code, and the
Amazon license goes out of its way not to over-reach regarding
derivative works.

This seems pretty clearly to fall in the spirit of

http://www.apache.org/legal/resolved.html#optional

I certainly think the majority of Spark users will still want to use
Spark without adding Kinesis.

On Wed, Sep 7, 2016 at 3:29 AM, Sean Owen  wrote:
> It's worth calling attention to:
>
> https://issues.apache.org/jira/browse/SPARK-17418
> https://issues.apache.org/jira/browse/SPARK-17422
>
> It looks like we need to at least not publish the kinesis *assembly*
> Maven artifact because it contains Amazon Software Licensed-code
> directly.
>
> However there's a reasonably strong reason to believe that we'd have
> to remove the non-assembly Kinesis artifact too, as well as the
> Ganglia one. This doesn't mean it goes away from the project, just
> means it would no longer be published as a Maven artifact. (These have
> never been bundled in the main Spark artifacts.)
>
> I wanted to give a heads up to see if anyone a) believes this
> conclusion is wrong or b) wants to take it up with legal@? I'm
> inclined to believe we have to remove them given the interpretation
> Luciano has put forth.
>
> Sean
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Mridul Muralidharan
I agree, we should not be publishing both of them.
Thanks for bringing this up !

Regards,
Mridul


On Wed, Sep 7, 2016 at 1:29 AM, Sean Owen  wrote:
> It's worth calling attention to:
>
> https://issues.apache.org/jira/browse/SPARK-17418
> https://issues.apache.org/jira/browse/SPARK-17422
>
> It looks like we need to at least not publish the kinesis *assembly*
> Maven artifact because it contains Amazon Software Licensed-code
> directly.
>
> However there's a reasonably strong reason to believe that we'd have
> to remove the non-assembly Kinesis artifact too, as well as the
> Ganglia one. This doesn't mean it goes away from the project, just
> means it would no longer be published as a Maven artifact. (These have
> never been bundled in the main Spark artifacts.)
>
> I wanted to give a heads up to see if anyone a) believes this
> conclusion is wrong or b) wants to take it up with legal@? I'm
> inclined to believe we have to remove them given the interpretation
> Luciano has put forth.
>
> Sean
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Sean Owen
(Credit to Luciano for pointing it out)

Yes, it's clear why the assembly can't be published, but I had the same
question about the non-assembly Kinesis (and Ganglia) artifact,
because the published artifact has no code from Kinesis.

See the related discussion at
https://issues.apache.org/jira/browse/LEGAL-198 ; the point I took
from there is that the Spark Kinesis artifact is optional with respect
to Spark, but still something published by Spark, and it requires the
Amazon-licensed code non-optionally.

I'll just ask that question to confirm or deny.

(It also has some background on why the Amazon License is considered
"Category X" in ASF policy due to field of use restrictions. I myself
take that as read rather than know the details of that decision.)

On Wed, Sep 7, 2016 at 6:44 PM, Cody Koeninger  wrote:
> I don't see a reason to remove the non-assembly artifact, why would
> you?  You're not distributing copies of Amazon licensed code, and the
> Amazon license goes out of its way not to over-reach regarding
> derivative works.
>
> This seems pretty clearly to fall in the spirit of
>
> http://www.apache.org/legal/resolved.html#optional
>
> I certainly think the majority of Spark users will still want to use
> Spark without adding Kinesis
>
> On Wed, Sep 7, 2016 at 3:29 AM, Sean Owen  wrote:
>> It's worth calling attention to:
>>
>> https://issues.apache.org/jira/browse/SPARK-17418
>> https://issues.apache.org/jira/browse/SPARK-17422
>>
>> It looks like we need to at least not publish the kinesis *assembly*
>> Maven artifact because it contains Amazon Software Licensed-code
>> directly.
>>
>> However there's a reasonably strong reason to believe that we'd have
>> to remove the non-assembly Kinesis artifact too, as well as the
>> Ganglia one. This doesn't mean it goes away from the project, just
>> means it would no longer be published as a Maven artifact. (These have
>> never been bundled in the main Spark artifacts.)
>>
>> I wanted to give a heads up to see if anyone a) believes this
>> conclusion is wrong or b) wants to take it up with legal@? I'm
>> inclined to believe we have to remove them given the interpretation
>> Luciano has put forth.
>>
>> Sean
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Matei Zaharia
I think you should ask legal about how to have some Maven artifacts for these. 
Both Ganglia and Kinesis are very widely used, so it's weird to ask users to 
build them from source. Maybe the Maven artifacts can be marked as being under 
a different license?

In the initial discussion for LEGAL-198, we were told the following:

"If the component that uses this dependency is not required for the rest of 
Spark to function then you can have a subproject to build the component. See 
http://www.apache.org/legal/resolved.html#optional. This means you will have to 
provide instructions for users to enable the optional component (which IMO 
should provide pointers to the licensing)."

It's not clear whether "enable the optional component" means "every user must 
build it from source", or whether we could tell users "here's a Maven 
coordinate you can add to your project if you're okay with the licensing".

Matei

> On Sep 7, 2016, at 11:35 AM, Sean Owen  wrote:
> 
> (Credit to Luciano for pointing it out)
> 
> Yes it's clear why the assembly can't be published but I had the same
> question about the non-assembly Kinesis (and ganglia) artifact,
> because the published artifact has no code from Kinesis.
> 
> See the related discussion at
> https://issues.apache.org/jira/browse/LEGAL-198 ; the point I took
> from there is that the Spark Kinesis artifact is optional with respect
> to Spark, but still something published by Spark, and it requires the
> Amazon-licensed code non-optionally.
> 
> I'll just ask that question to confirm or deny.
> 
> (It also has some background on why the Amazon License is considered
> "Category X" in ASF policy due to field of use restrictions. I myself
> take that as read rather than know the details of that decision.)
> 
> On Wed, Sep 7, 2016 at 6:44 PM, Cody Koeninger  wrote:
>> I don't see a reason to remove the non-assembly artifact, why would
>> you?  You're not distributing copies of Amazon licensed code, and the
>> Amazon license goes out of its way not to over-reach regarding
>> derivative works.
>> 
>> This seems pretty clearly to fall in the spirit of
>> 
>> http://www.apache.org/legal/resolved.html#optional
>> 
>> I certainly think the majority of Spark users will still want to use
>> Spark without adding Kinesis
>> 
>> On Wed, Sep 7, 2016 at 3:29 AM, Sean Owen  wrote:
>>> It's worth calling attention to:
>>> 
>>> https://issues.apache.org/jira/browse/SPARK-17418
>>> https://issues.apache.org/jira/browse/SPARK-17422
>>> 
>>> It looks like we need to at least not publish the kinesis *assembly*
>>> Maven artifact because it contains Amazon Software Licensed-code
>>> directly.
>>> 
>>> However there's a reasonably strong reason to believe that we'd have
>>> to remove the non-assembly Kinesis artifact too, as well as the
>>> Ganglia one. This doesn't mean it goes away from the project, just
>>> means it would no longer be published as a Maven artifact. (These have
>>> never been bundled in the main Spark artifacts.)
>>> 
>>> I wanted to give a heads up to see if anyone a) believes this
>>> conclusion is wrong or b) wants to take it up with legal@? I'm
>>> inclined to believe we have to remove them given the interpretation
>>> Luciano has put forth.
>>> 
>>> Sean
>>> 
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> 
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Sean Owen
Agree, I've asked the question on that thread and will follow it up.
I'd prefer not to pull these unless it's fairly clear it's going to be
against policy.


On Wed, Sep 7, 2016 at 7:57 PM, Matei Zaharia  wrote:
> I think you should ask legal about how to have some Maven artifacts for 
> these. Both Ganglia and Kinesis are very widely used, so it's weird to ask 
> users to build them from source. Maybe the Maven artifacts can be marked as 
> being under a different license?
>
> In the initial discussion for LEGAL-198, we were told the following:
>
> "If the component that uses this dependency is not required for the rest of 
> Spark to function then you can have a subproject to build the component. See 
> http://www.apache.org/legal/resolved.html#optional. This means you will have 
> to provide instructions for users to enable the optional component (which IMO 
> should provide pointers to the licensing)."
>
> It's not clear whether "enable the optional component" means "every user must 
> build it from source", or whether we could tell users "here's a Maven 
> coordinate you can add to your project if you're okay with the licensing".
>
> Matei

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Mridul Muralidharan
It is good to get clarification, but the way I read it, the issue is
whether we publish it as official Apache artifacts (in maven, etc).

Users can of course build it directly (and we can make it easy to do so) -
as they are explicitly agreeing to additional licenses.

Regards
Mridul


On Wednesday, September 7, 2016, Matei Zaharia 
wrote:

> I think you should ask legal about how to have some Maven artifacts for
> these. Both Ganglia and Kinesis are very widely used, so it's weird to ask
> users to build them from source. Maybe the Maven artifacts can be marked as
> being under a different license?
>
> In the initial discussion for LEGAL-198, we were told the following:
>
> "If the component that uses this dependency is not required for the rest
> of Spark to function then you can have a subproject to build the component.
> See http://www.apache.org/legal/resolved.html#optional. This means you
> will have to provide instructions for users to enable the optional
> component (which IMO should provide pointers to the licensing)."
>
> It's not clear whether "enable the optional component" means "every user
> must build it from source", or whether we could tell users "here's a Maven
> coordinate you can add to your project if you're okay with the licensing".
>
> Matei
>
> > On Sep 7, 2016, at 11:35 AM, Sean Owen  > wrote:
> >
> > (Credit to Luciano for pointing it out)
> >
> > Yes it's clear why the assembly can't be published but I had the same
> > question about the non-assembly Kinesis (and ganglia) artifact,
> > because the published artifact has no code from Kinesis.
> >
> > See the related discussion at
> > https://issues.apache.org/jira/browse/LEGAL-198 ; the point I took
> > from there is that the Spark Kinesis artifact is optional with respect
> > to Spark, but still something published by Spark, and it requires the
> > Amazon-licensed code non-optionally.
> >
> > I'll just ask that question to confirm or deny.
> >
> > (It also has some background on why the Amazon License is considered
> > "Category X" in ASF policy due to field of use restrictions. I myself
> > take that as read rather than know the details of that decision.)
> >
> > On Wed, Sep 7, 2016 at 6:44 PM, Cody Koeninger  > wrote:
> >> I don't see a reason to remove the non-assembly artifact, why would
> >> you?  You're not distributing copies of Amazon licensed code, and the
> >> Amazon license goes out of its way not to over-reach regarding
> >> derivative works.
> >>
> >> This seems pretty clearly to fall in the spirit of
> >>
> >> http://www.apache.org/legal/resolved.html#optional
> >>
> >> I certainly think the majority of Spark users will still want to use
> >> Spark without adding Kinesis
> >>
> >> On Wed, Sep 7, 2016 at 3:29 AM, Sean Owen  > wrote:
> >>> It's worth calling attention to:
> >>>
> >>> https://issues.apache.org/jira/browse/SPARK-17418
> >>> https://issues.apache.org/jira/browse/SPARK-17422
> >>>
> >>> It looks like we need to at least not publish the kinesis *assembly*
> >>> Maven artifact because it contains Amazon Software Licensed-code
> >>> directly.
> >>>
> >>> However there's a reasonably strong reason to believe that we'd have
> >>> to remove the non-assembly Kinesis artifact too, as well as the
> >>> Ganglia one. This doesn't mean it goes away from the project, just
> >>> means it would no longer be published as a Maven artifact. (These have
> >>> never been bundled in the main Spark artifacts.)
> >>>
> >>> I wanted to give a heads up to see if anyone a) believes this
> >>> conclusion is wrong or b) wants to take it up with legal@? I'm
> >>> inclined to believe we have to remove them given the interpretation
> >>> Luciano has put forth.
> >>>
> >>> Sean
> >>>
> >>> -
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> >>>
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> >
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
>
>


Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Luciano Resende
On Wed, Sep 7, 2016 at 11:57 AM, Matei Zaharia 
wrote:

> I think you should ask legal about how to have some Maven artifacts for
> these. Both Ganglia and Kinesis are very widely used, so it's weird to ask
> users to build them from source. Maybe the Maven artifacts can be marked as
> being under a different license?
>
>
As long as they are not part of an "Apache licensed" distribution. Note
that Ganglia seems to have changed its license to BSD, so we might be able to
better support that.


> In the initial discussion for LEGAL-198, we were told the following:
>
> "If the component that uses this dependency is not required for the rest
> of Spark to function then you can have a subproject to build the component.
> See http://www.apache.org/legal/resolved.html#optional. This means you
> will have to provide instructions for users to enable the optional
> component (which IMO should provide pointers to the licensing)."
>
> It's not clear whether "enable the optional component" means "every user
> must build it from source", or whether we could tell users "here's a Maven
> coordinate you can add to your project if you're okay with the licensing".
>

I think the key here is "optional": while Kinesis is optional for Spark
(which makes it OK to have it in Spark), it is not optional for the Kinesis
extension itself, which then, IMHO, does not allow us to publish the Kinesis
artifact either.

But let's wait on the response from Legal before we actually implement a
solution.


>
> Matei
>
> > On Sep 7, 2016, at 11:35 AM, Sean Owen  wrote:
> >
> > (Credit to Luciano for pointing it out)
> >
> > Yes it's clear why the assembly can't be published but I had the same
> > question about the non-assembly Kinesis (and ganglia) artifact,
> > because the published artifact has no code from Kinesis.
> >
> > See the related discussion at
> > https://issues.apache.org/jira/browse/LEGAL-198 ; the point I took
> > from there is that the Spark Kinesis artifact is optional with respect
> > to Spark, but still something published by Spark, and it requires the
> > Amazon-licensed code non-optionally.
> >
> > I'll just ask that question to confirm or deny.
> >
> > (It also has some background on why the Amazon License is considered
> > "Category X" in ASF policy due to field of use restrictions. I myself
> > take that as read rather than know the details of that decision.)
> >
> > On Wed, Sep 7, 2016 at 6:44 PM, Cody Koeninger 
> wrote:
> >> I don't see a reason to remove the non-assembly artifact, why would
> >> you?  You're not distributing copies of Amazon licensed code, and the
> >> Amazon license goes out of its way not to over-reach regarding
> >> derivative works.
> >>
> >> This seems pretty clearly to fall in the spirit of
> >>
> >> http://www.apache.org/legal/resolved.html#optional
> >>
> >> I certainly think the majority of Spark users will still want to use
> >> Spark without adding Kinesis
> >>
> >> On Wed, Sep 7, 2016 at 3:29 AM, Sean Owen  wrote:
> >>> It's worth calling attention to:
> >>>
> >>> https://issues.apache.org/jira/browse/SPARK-17418
> >>> https://issues.apache.org/jira/browse/SPARK-17422
> >>>
> >>> It looks like we need to at least not publish the kinesis *assembly*
> >>> Maven artifact because it contains Amazon Software Licensed-code
> >>> directly.
> >>>
> >>> However there's a reasonably strong reason to believe that we'd have
> >>> to remove the non-assembly Kinesis artifact too, as well as the
> >>> Ganglia one. This doesn't mean it goes away from the project, just
> >>> means it would no longer be published as a Maven artifact. (These have
> >>> never been bundled in the main Spark artifacts.)
> >>>
> >>> I wanted to give a heads up to see if anyone a) believes this
> >>> conclusion is wrong or b) wants to take it up with legal@? I'm
> >>> inclined to believe we have to remove them given the interpretation
> >>> Luciano has put forth.
> >>>
> >>> Sean
> >>>
> >>> -
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>>
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Luciano Resende
On Wed, Sep 7, 2016 at 12:20 PM, Mridul Muralidharan 
wrote:

>
> It is good to get clarification, but the way I read it, the issue is
> whether we publish it as official Apache artifacts (in maven, etc).
>
> Users can of course build it directly (and we can make it easy to do so) -
> as they are explicitly agreeing to additional licenses.
>
> Regards
> Mridul
>
>
+1. By providing instructions on how the user would build it, and attaching
the license details to those instructions, we are then safe on the legal
aspects of it.



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Cody Koeninger
To be clear, "safe" has very little to do with this.

It's pretty clear that there's very little risk of the Spark module
for Kinesis being considered a derivative work, much less all of
Spark.

The use limitation in section 3.3 that caused the Amazon license to be put on
the Apache Category X list also doesn't have anything to do with a legal
safety risk here. Really, what are you going to use a Kinesis connector for,
except connecting to Kinesis?


On Wed, Sep 7, 2016 at 2:41 PM, Luciano Resende  wrote:
>
>
> On Wed, Sep 7, 2016 at 12:20 PM, Mridul Muralidharan 
> wrote:
>>
>>
>> It is good to get clarification, but the way I read it, the issue is
>> whether we publish it as official Apache artifacts (in maven, etc).
>>
>> Users can of course build it directly (and we can make it easy to do so) -
>> as they are explicitly agreeing to additional licenses.
>>
>> Regards
>> Mridul
>>
>
> +1, by providing instructions on how the user would build, and attaching the
> license details on the instructions, we are then safe on the legal aspects
> of it.
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Unable to run docker jdbc integrations test ?

2016-09-07 Thread Luciano Resende
It looks like nobody is running these tests, and after some dependency
upgrades in Spark 2.0 they have stopped working. I have tried to bring them
back up, but I am having some issues getting the right dependencies loaded and
satisfying the docker-client expectations.

The question then is: does the community find value in having these tests
available? If so, we can focus on bringing them up and I can push my previous
experiments as a WIP PR. Otherwise we should just get rid of these tests.

Thoughts?


On Tue, Sep 6, 2016 at 4:05 PM, Suresh Thalamati  wrote:

> Hi,
>
>
> I am getting the following error when I am trying to run the JDBC docker
> integration tests on my laptop. Any ideas what I might be doing wrong?
>
> build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0  -Phive-thriftserver
> -Phive -DskipTests clean install
> build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11
> compile test
>
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> MaxPermSize=512m; support was removed in 8.0
> Discovery starting.
> Discovery completed in 200 milliseconds.
> Run starting. Expected test count is: 10
> MySQLIntegrationSuite:
>
> Error:
> 16/09/06 11:52:00 INFO BlockManagerMaster: Registered BlockManager
> BlockManagerId(driver, 9.31.117.25, 51868)
> *** RUN ABORTED ***
>   java.lang.AbstractMethodError:
>   at org.glassfish.jersey.model.internal.CommonConfig.configureAutoDiscoverableProviders(CommonConfig.java:622)
>   at org.glassfish.jersey.client.ClientConfig$State.configureAutoDiscoverableProviders(ClientConfig.java:357)
>   at org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:392)
>   at org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>   at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>   at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>   at org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>   at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>   at org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>   at org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>   ...
> 16/09/06 11:52:00 INFO SparkContext: Invoking stop() from shutdown hook
> 16/09/06 11:52:00 INFO MapOutputTrackerMasterEndpoint:
> MapOutputTrackerMasterEndpoint stopped!
>
>
>
> Thanks
> -suresh
>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Matei Zaharia
The question is just whether the metadata and instructions involving these
Maven packages count as sufficient to tell the user that they have different
licensing terms. For example, our Ganglia package was called spark-ganglia-lgpl
(so you'd notice it's a different license even from its name), our Kinesis one
was called spark-streaming-kinesis-asl, and our docs mentioned that both were
under different licensing terms. But is that enough? That's the question.

Matei

> On Sep 7, 2016, at 2:05 PM, Cody Koeninger  wrote:
> 
> To be clear, "safe" has very little to do with this.
> 
> It's pretty clear that there's very little risk of the spark module
> for kinesis being considered a derivative work, much less all of
> spark.
> 
> The use limitation in 3.3 that caused the amazon license to be put on
> the apache X list also doesn't have anything to do with a legal safety
> risk here.  Really, what are you going to use a kinesis connector for,
> except for connecting to kinesis?
> 
> 
> On Wed, Sep 7, 2016 at 2:41 PM, Luciano Resende  wrote:
>> 
>> 
>> On Wed, Sep 7, 2016 at 12:20 PM, Mridul Muralidharan 
>> wrote:
>>> 
>>> 
>>> It is good to get clarification, but the way I read it, the issue is
>>> whether we publish it as official Apache artifacts (in maven, etc).
>>> 
>>> Users can of course build it directly (and we can make it easy to do so) -
>>> as they are explicitly agreeing to additional licenses.
>>> 
>>> Regards
>>> Mridul
>>> 
>> 
>> +1, by providing instructions on how the user would build, and attaching the
>> license details on the instructions, we are then safe on the legal aspects
>> of it.
>> 
>> 
>> 
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Unable to run docker jdbc integrations test ?

2016-09-07 Thread Josh Rosen
I think that these tests are valuable so I'd like to keep them. If
possible, though, we should try to get rid of our dependency on the Spotify
docker-client library, since it's a dependency hell nightmare. Given our
relatively simple use of Docker here, I wonder whether we could just write
some simple scripting over the `docker` command-line tool instead of
pulling in such a problematic library.
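For example, for the MySQL suite something as simple as shelling out might be
enough (a rough, untested sketch; the image tag, port, and credentials below
are just placeholders):

import scala.sys.process._

// Start a throwaway MySQL container and capture its container id.
val containerId = Seq(
  "docker", "run", "-d",
  "-e", "MYSQL_ROOT_PASSWORD=rootpass",  // placeholder credentials
  "-p", "3306:3306",
  "mysql:5.7"                            // placeholder image/tag
).!!.trim

try {
  // ... run the JDBC integration tests against localhost:3306 ...
} finally {
  // Tear the container down regardless of the test outcome.
  Seq("docker", "rm", "-f", containerId).!
}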

On Wed, Sep 7, 2016 at 2:36 PM Luciano Resende  wrote:

> It looks like there is nobody running these tests, and after some
> dependency upgrades in Spark 2.0 this has stopped working. I have tried to
> bring up this but I am having some issues with getting the right
> dependencies loaded and satisfying the docker-client expectations.
>
> The question then is: Does the community find value on having these tests
> available ? Then we can focus on bringing them up and I can go push my
> previous experiments as a WIP PR. Otherwise we should just get rid of these
> tests.
>
> Thoughts ?
>
>
> On Tue, Sep 6, 2016 at 4:05 PM, Suresh Thalamati <
> suresh.thalam...@gmail.com> wrote:
>
>> Hi,
>>
>>
>> I am getting the following error , when I am trying to run jdbc docker
>> integration tests on my laptop.   Any ideas , what I might be be doing
>> wrong ?
>>
>> build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0  -Phive-thriftserver
>> -Phive -DskipTests clean install
>> build/mvn -Pdocker-integration-tests -pl
>> :spark-docker-integration-tests_2.11  compile test
>>
>> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
>> MaxPermSize=512m; support was removed in 8.0
>> Discovery starting.
>> Discovery completed in 200 milliseconds.
>> Run starting. Expected test count is: 10
>> MySQLIntegrationSuite:
>>
>> Error:
>> 16/09/06 11:52:00 INFO BlockManagerMaster: Registered BlockManager
>> BlockManagerId(driver, 9.31.117.25, 51868)
>> *** RUN ABORTED ***
>>   java.lang.AbstractMethodError:
>>   at
>> org.glassfish.jersey.model.internal.CommonConfig.configureAutoDiscoverableProviders(CommonConfig.java:622)
>>   at
>> org.glassfish.jersey.client.ClientConfig$State.configureAutoDiscoverableProviders(ClientConfig.java:357)
>>   at
>> org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:392)
>>   at
>> org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>>   at
>> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>>   at
>> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>>   at
>> org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>>   at
>> org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>>   at
>> org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>>   at
>> org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>>   ...
>> 16/09/06 11:52:00 INFO SparkContext: Invoking stop() from shutdown hook
>> 16/09/06 11:52:00 INFO MapOutputTrackerMasterEndpoint:
>> MapOutputTrackerMasterEndpoint stopped!
>>
>>
>>
>> Thanks
>> -suresh
>>
>>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: Unable to run docker jdbc integrations test ?

2016-09-07 Thread Luciano Resende
That might be a reasonable and much simpler approach to try... but if we
resolve these issues, we should make the tests part of some regularly run
build so that neither the build nor the actual functionality regresses. Let me
look into this again...

On Wed, Sep 7, 2016 at 2:46 PM, Josh Rosen  wrote:

> I think that these tests are valuable so I'd like to keep them. If
> possible, though, we should try to get rid of our dependency on the Spotify
> docker-client library, since it's a dependency hell nightmare. Given our
> relatively simple use of Docker here, I wonder whether we could just write
> some simple scripting over the `docker` command-line tool instead of
> pulling in such a problematic library.
>
> On Wed, Sep 7, 2016 at 2:36 PM Luciano Resende 
> wrote:
>
>> It looks like there is nobody running these tests, and after some
>> dependency upgrades in Spark 2.0 this has stopped working. I have tried to
>> bring up this but I am having some issues with getting the right
>> dependencies loaded and satisfying the docker-client expectations.
>>
>> The question then is: Does the community find value on having these tests
>> available ? Then we can focus on bringing them up and I can go push my
>> previous experiments as a WIP PR. Otherwise we should just get rid of these
>> tests.
>>
>> Thoughts ?
>>
>>
>> On Tue, Sep 6, 2016 at 4:05 PM, Suresh Thalamati <
>> suresh.thalam...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>>
>>> I am getting the following error , when I am trying to run jdbc docker
>>> integration tests on my laptop.   Any ideas , what I might be be doing
>>> wrong ?
>>>
>>> build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0
>>> -Phive-thriftserver -Phive -DskipTests clean install
>>> build/mvn -Pdocker-integration-tests -pl 
>>> :spark-docker-integration-tests_2.11
>>> compile test
>>>
>>> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
>>> MaxPermSize=512m; support was removed in 8.0
>>> Discovery starting.
>>> Discovery completed in 200 milliseconds.
>>> Run starting. Expected test count is: 10
>>> MySQLIntegrationSuite:
>>>
>>> Error:
>>> 16/09/06 11:52:00 INFO BlockManagerMaster: Registered BlockManager
>>> BlockManagerId(driver, 9.31.117.25, 51868)
>>> *** RUN ABORTED ***
>>>   java.lang.AbstractMethodError:
>>>   at org.glassfish.jersey.model.internal.CommonConfig.
>>> configureAutoDiscoverableProviders(CommonConfig.java:622)
>>>   at org.glassfish.jersey.client.ClientConfig$State.
>>> configureAutoDiscoverableProviders(ClientConfig.java:357)
>>>   at org.glassfish.jersey.client.ClientConfig$State.
>>> initRuntime(ClientConfig.java:392)
>>>   at org.glassfish.jersey.client.ClientConfig$State.access$000(
>>> ClientConfig.java:88)
>>>   at org.glassfish.jersey.client.ClientConfig$State$3.get(
>>> ClientConfig.java:120)
>>>   at org.glassfish.jersey.client.ClientConfig$State$3.get(
>>> ClientConfig.java:117)
>>>   at org.glassfish.jersey.internal.util.collection.Values$
>>> LazyValueImpl.get(Values.java:340)
>>>   at org.glassfish.jersey.client.ClientConfig.getRuntime(
>>> ClientConfig.java:726)
>>>   at org.glassfish.jersey.client.ClientRequest.getConfiguration(
>>> ClientRequest.java:285)
>>>   at org.glassfish.jersey.client.JerseyInvocation.
>>> validateHttpMethodAndEntity(JerseyInvocation.java:126)
>>>   ...
>>> 16/09/06 11:52:00 INFO SparkContext: Invoking stop() from shutdown hook
>>> 16/09/06 11:52:00 INFO MapOutputTrackerMasterEndpoint:
>>> MapOutputTrackerMasterEndpoint stopped!
>>>
>>>
>>>
>>> Thanks
>>> -suresh
>>>
>>>
>>
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


FileStreamSource source checks path eagerly?

2016-09-07 Thread Jacek Laskowski
Hi,

I'm wondering what the rationale is for checking the path option eagerly in
FileStreamSource. My thinking is that until start is called there's no
processing going on, and the processing that needs the path available is
supposed to happen on the executors (not the driver).

I could (and perhaps should) use dfs, but IMHO that just hides the real
question of the text source's eagerness.

Please help me understand the rationale of the choice. Thanks!

scala> spark.version
res0: String = 2.1.0-SNAPSHOT

scala> spark.readStream.format("text").load("/var/logs")
org.apache.spark.sql.AnalysisException: Path does not exist: /var/logs;
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:229)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:81)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:81)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:142)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:153)
  ... 48 elided

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org