date:20171107

Re: [ML] Migrating transformers from mllib to ml

2017-11-07 Thread Yan Facai

Hi, I have migrated HashingTF from mllib to ml, and the change is waiting for review.

see:
[SPARK-21748][ML] Migrate the implementation of HashingTF from MLlib to ML
#18998
https://github.com/apache/spark/pull/18998
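
For readers unfamiliar with the transformer under discussion: HashingTF turns a bag of terms into a fixed-length term-frequency vector by hashing each term to an index. The Python sketch below illustrates only that idea. The function name hashing_tf, the num_features value, and the use of zlib.crc32 are assumptions made for this illustration; none of it is taken from Spark's code or from the pull request above.

```python
import zlib
from collections import Counter


def hashing_tf(terms, num_features=16):
    """Map a list of terms to a sparse term-frequency vector (hashing trick)."""
    counts = Counter()
    for term in terms:
        # crc32 is a stand-in chosen because it is deterministic across runs;
        # it is NOT the hash function Spark uses.
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] += 1
    # Sparse representation: (index, count) pairs sorted by index.
    return sorted(counts.items())


# Hypothetical usage: repeated terms accumulate in the same bucket.
vector = hashing_tf(["spark", "ml", "spark"], num_features=8)
```

Distinct terms can collide in one bucket, which is the usual trade-off of the hashing trick: a larger num_features lowers the collision rate at the cost of a wider vector.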



On Mon, Nov 6, 2017 at 10:58 PM, Marco Gaido  wrote:

> Hello,
>
> I saw that there are several TODOs to migrate some transformers (like
> HashingTF and IDF) to use only ml.Vector in order to avoid the overhead of
> converting them to the mllib ones and back.
>
> Is there any reason why this has not been done so far? Is it to avoid code
> duplication? If so, is it still an issue since we are going to deprecate
> mllib from 2.3 (at least this is what I read on Spark docs)? If no, I can
> work on this.
>
> Thanks,
> Marco
>
>
>

Re: Jenkins upgrade/Test Parallelization & Containerization

2017-11-07 Thread Holden Karau

Oh, and to be clear, part of my +1 is that dockerized Spark builds would
simplify a lot of the headaches we face trying to coordinate changes on the
PySpark side (it's not just oooh shiny faster build times, although that's
pretty compelling in itself).

On Tue, Nov 7, 2017 at 11:20 AM, Holden Karau  wrote:

> Just wanting to +1 this idea. One potential option would be to look at
> migrating away from the AMP Lab Jenkins infra into the ASF infra. I've
> added Josh, Shane, and Sean to the CC line explicitly since I think they
> might have opinions about this.
>
>
> On Tue, Oct 31, 2017 at 11:05 PM, Xin Lu  wrote:
>
>> Hi everyone,
>>
>> I tried sending emails to this list and I'm not sure if it went through
>> so I'm trying again.  Anyway, a couple months ago before I left Databricks
>> I was working on a proof of concept that parallelized Spark tests on
>> jenkins.  The way it worked was basically it build the spark jars and then
>> ran all the tests in a docker container on a bunch of slaves in parallel.
>> This cut the testing time down from 4 hours to approximately 1.5 hours.
>> This required a newer version of jenkins and the Jenkins Pipeline plugin.
>> I am wondering if it is possible to do this on amplab jenkins.  It looks
>> like https://builds.apache.org/ has upgraded so Amplabs jenkins is a
>> year or so behind.  I am happy to help with this project if it is something
>> that people think is worthwhile.
>>
>> Thanks
>>
>> Xin
>>
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>



-- 
Twitter: https://twitter.com/holdenkarau

Re: Jenkins upgrade/Test Parallelization & Containerization

2017-11-07 Thread Holden Karau

Just wanting to +1 this idea. One potential option would be to look at
migrating away from the AMP Lab Jenkins infra into the ASF infra. I've
added Josh, Shane, and Sean to the CC line explicitly since I think they
might have opinions about this.


On Tue, Oct 31, 2017 at 11:05 PM, Xin Lu  wrote:

> Hi everyone,
>
> I tried sending emails to this list and I'm not sure if it went through so
> I'm trying again.  Anyway, a couple months ago before I left Databricks I
> was working on a proof of concept that parallelized Spark tests on
> jenkins.  The way it worked was basically it build the spark jars and then
> ran all the tests in a docker container on a bunch of slaves in parallel.
> This cut the testing time down from 4 hours to approximately 1.5 hours.
> This required a newer version of jenkins and the Jenkins Pipeline plugin.
> I am wondering if it is possible to do this on amplab jenkins.  It looks
> like https://builds.apache.org/ has upgraded so Amplabs jenkins is a year
> or so behind.  I am happy to help with this project if it is something that
> people think is worthwhile.
>
> Thanks
>
> Xin
>



-- 
Twitter: https://twitter.com/holdenkarau

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-07 Thread Joseph Bradley

+1

On Mon, Nov 6, 2017 at 5:11 PM, Michael Armbrust 
wrote:

> +1
>
> On Sat, Nov 4, 2017 at 11:02 AM, Xiao Li  wrote:
>
>> +1
>>
>> 2017-11-04 11:00 GMT-07:00 Burak Yavuz :
>>
>>> +1
>>>
>>> On Fri, Nov 3, 2017 at 10:02 PM, vaquar khan 
>>> wrote:
>>>
>>>> +1
>>>>
>>>> On Fri, Nov 3, 2017 at 8:14 PM, Weichen Xu wrote:
>>>>
>>>>> +1.
>>>>>
>>>>> On Sat, Nov 4, 2017 at 8:04 AM, Matei Zaharia wrote:
>>>>>
>>>>>> +1 from me too.
>>>>>>
>>>>>> Matei
>>>>>>
>>>>>> > On Nov 3, 2017, at 4:59 PM, Wenchen Fan wrote:
>>>>>> >
>>>>>> > +1.
>>>>>> >
>>>>>> > I think this architecture makes a lot of sense to let executors
>>>>>> > talk to source/sink directly, and bring very low latency.
>>>>>> >
>>>>>> > On Thu, Nov 2, 2017 at 9:01 AM, Sean Owen wrote:
>>>>>> > +0 simply because I don't feel I know enough to have an opinion. I
>>>>>> > have no reason to doubt the change though, from a skim through the doc.
>>>>>> >
>>>>>> >
>>>>>> > On Wed, Nov 1, 2017 at 3:37 PM Reynold Xin wrote:
>>>>>> > Earlier I sent out a discussion thread for CP in Structured
>>>>>> > Streaming:
>>>>>> >
>>>>>> > https://issues.apache.org/jira/browse/SPARK-20928
>>>>>> >
>>>>>> > It is meant to be a very small, surgical change to Structured
>>>>>> > Streaming to enable ultra-low latency. This is great timing because we
>>>>>> > are also designing and implementing data source API v2. If designed
>>>>>> > properly, we can have the same data source API working for both
>>>>>> > streaming and batch.
>>>>>> >
>>>>>> >
>>>>>> > Following the SPIP process, I'm putting this SPIP up for a vote.
>>>>>> >
>>>>>> > +1: Let's go ahead and design / implement the SPIP.
>>>>>> > +0: Don't really care.
>>>>>> > -1: I do not think this is a good idea for the following reasons.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>> -
>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Vaquar Khan
>>>> +1 -224-436-0783
>>>> Greater Chicago
>>>>
>>>
>>>
>>
>


-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.


Re: [ML] Migrating transformers from mllib to ml

2017-11-07 Thread Joseph Bradley

Hi, we do still want to do this migration; it's just been a bit stalled due
to low bandwidth.  There are still a few feature parity items which need to
be completed, so the deprecation will likely not happen until after 2.3.
Joseph

On Tue, Nov 7, 2017 at 12:38 AM, 颜发才(Yan Facai)  wrote:

> Hi, I have migrated HashingTF from mllib to ml, and wait for review.
>
> see:
> [SPARK-21748][ML] Migrate the implementation of HashingTF from MLlib to ML
> #18998
> https://github.com/apache/spark/pull/18998
>
>
>
> On Mon, Nov 6, 2017 at 10:58 PM, Marco Gaido 
> wrote:
>
>> Hello,
>>
>> I saw that there are several TODOs to migrate some transformers (like
>> HashingTF and IDF) to use only ml.Vector in order to avoid the overhead of
>> converting them to the mllib ones and back.
>>
>> Is there any reason why this has not been done so far? Is it to avoid
>> code duplication? If so, is it still an issue since we are going to
>> deprecate mllib from 2.3 (at least this is what I read on Spark docs)? If
>> no, I can work on this.
>>
>> Thanks,
>> Marco
>>
>>
>>
>


-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.


Re: Jenkins upgrade/Test Parallelization & Containerization

2017-11-07 Thread Sean Owen

Faster tests would be great. I recall that the straightforward ways to
parallelize via Maven haven't worked because many tests collide with one
another. Is this about running each module's tests in a container? That
should work.
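
As a rough illustration of the per-module approach asked about above, the Python sketch below assigns test modules to a fixed number of workers, longest first, so that each worker could run its share in its own container. The module names and durations are invented for the example (they are not measurements from the Spark build), and shard_modules is a hypothetical helper, not part of Jenkins or Spark.

```python
import heapq


def shard_modules(durations, num_workers):
    """Greedy longest-first assignment of test modules to workers.

    durations: dict of module name -> estimated minutes (assumed known).
    Returns a list of (total_minutes, worker_index, modules) per worker.
    """
    # Min-heap keyed on accumulated minutes; worker_index breaks ties.
    heap = [(0.0, worker, []) for worker in range(num_workers)]
    heapq.heapify(heap)
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, minutes in ordered:
        # Give the next-longest module to the least-loaded worker.
        total, worker, assigned = heapq.heappop(heap)
        assigned.append(name)
        heapq.heappush(heap, (total + minutes, worker, assigned))
    return sorted(heap, key=lambda entry: entry[1])


# Made-up durations in minutes, for illustration only.
durations = {"core": 90, "sql": 80, "mllib": 40, "streaming": 30}
plan = shard_modules(durations, num_workers=3)
wall_clock = max(total for total, _, _ in plan)
```

With these made-up durations the serial total is 240 minutes, while the busiest of the three workers carries 90 minutes of tests. Real gains depend on how evenly the modules split and on container start-up cost.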

I can see how this is becoming essential for repeatable and reliable
Python/R builds, which depend on the environment to a much greater extent
than the JVM does.

I don't have a strong preference for AMPLab vs ASF builds. I suppose using
the ASF machinery is a little tidier. If it's got a later Jenkins that's
required, also a plus, but I assume updating AMPLab isn't so hard here
either. I think the key issue is which environment is easier to control and
customize over time.

On Wed, Nov 1, 2017 at 6:05 AM Xin Lu  wrote:

> Hi everyone,
>
> I tried sending emails to this list and I'm not sure if it went through so
> I'm trying again.  Anyway, a couple months ago before I left Databricks I
> was working on a proof of concept that parallelized Spark tests on
> jenkins.  The way it worked was basically it build the spark jars and then
> ran all the tests in a docker container on a bunch of slaves in parallel.
> This cut the testing time down from 4 hours to approximately 1.5 hours.
> This required a newer version of jenkins and the Jenkins Pipeline plugin.
> I am wondering if it is possible to do this on amplab jenkins.  It looks
> like https://builds.apache.org/ has upgraded so Amplabs jenkins is a year
> or so behind.  I am happy to help with this project if it is something that
> people think is worthwhile.
>
> Thanks
>
> Xin
>

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-07 Thread Shixiong(Ryan) Zhu

+1

On Tue, Nov 7, 2017 at 1:34 PM, Joseph Bradley 
wrote:

> +1
>
> On Mon, Nov 6, 2017 at 5:11 PM, Michael Armbrust 
> wrote:
>
>> +1
>>
>> On Sat, Nov 4, 2017 at 11:02 AM, Xiao Li  wrote:
>>
>>> +1
>>>
>>> 2017-11-04 11:00 GMT-07:00 Burak Yavuz :
>>>
>>>> +1
>>>>
>>>> On Fri, Nov 3, 2017 at 10:02 PM, vaquar khan wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Fri, Nov 3, 2017 at 8:14 PM, Weichen Xu wrote:
>>>>>
>>>>>> +1.
>>>>>>
>>>>>> On Sat, Nov 4, 2017 at 8:04 AM, Matei Zaharia <
>>>>>> matei.zaha...@gmail.com> wrote:
>>>>>>
>>>>>>> +1 from me too.
>>>>>>>
>>>>>>> Matei
>>>>>>>
>>>>>>> > On Nov 3, 2017, at 4:59 PM, Wenchen Fan wrote:
>>>>>>> >
>>>>>>> > +1.
>>>>>>> >
>>>>>>> > I think this architecture makes a lot of sense to let executors
>>>>>>> > talk to source/sink directly, and bring very low latency.
>>>>>>> >
>>>>>>> > On Thu, Nov 2, 2017 at 9:01 AM, Sean Owen wrote:
>>>>>>> > +0 simply because I don't feel I know enough to have an opinion. I
>>>>>>> > have no reason to doubt the change though, from a skim through the doc.
>>>>>>> >
>>>>>>> >
>>>>>>> > On Wed, Nov 1, 2017 at 3:37 PM Reynold Xin wrote:
>>>>>>> > Earlier I sent out a discussion thread for CP in Structured
>>>>>>> > Streaming:
>>>>>>> >
>>>>>>> > https://issues.apache.org/jira/browse/SPARK-20928
>>>>>>> >
>>>>>>> > It is meant to be a very small, surgical change to Structured
>>>>>>> > Streaming to enable ultra-low latency. This is great timing because we
>>>>>>> > are also designing and implementing data source API v2. If designed
>>>>>>> > properly, we can have the same data source API working for both
>>>>>>> > streaming and batch.
>>>>>>> >
>>>>>>> >
>>>>>>> > Following the SPIP process, I'm putting this SPIP up for a vote.
>>>>>>> >
>>>>>>> > +1: Let's go ahead and design / implement the SPIP.
>>>>>>> > +0: Don't really care.
>>>>>>> > -1: I do not think this is a good idea for the following reasons.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -
>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Vaquar Khan
>>>>> +1 -224-436-0783
>>>>> Greater Chicago
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
>
> Joseph Bradley
>
> Software Engineer - Machine Learning
>
> Databricks, Inc.
>
> [image: http://databricks.com] 
>

Re: Jenkins upgrade/Test Parallelization & Containerization

2017-11-07 Thread Holden Karau

True, I think we've seen that the AMP Lab Jenkins needs to be more focused
on running AMP Lab projects, and while I don't know how difficult the ASF
Jenkins is to work with, I assume it might be an easier place to make changes
going forward? (Of course this could be a case of the grass being greener on
the other side, and I don't mean to say it's been hard to make changes on the
AMP Lab hardware; folks have been amazingly helpful. It's just that the
projects on each have different needs.)

On Tue, Nov 7, 2017 at 12:52 PM, Sean Owen  wrote:

> Faster tests would be great. I recall that the straightforward ways to
> parallelize via Maven haven't worked because many tests collide with one
> another. Is this about running each module's tests in a container? that
> should work.
>
> I can see how this is becoming essential for repeatable and reliable
> Python/R builds, which depend on the environment to a much greater extent
> than the JVM does.
>
> I don't have a strong preference for AMPLab vs ASF builds. I suppose using
> the ASF machinery is a little tidier. If it's got a later Jenkins that's
> required, also a plus, but I assume updating AMPLab isn't so hard here
> either. I think the key issue is which environment is easier to control and
> customize over time.
>
>
> On Wed, Nov 1, 2017 at 6:05 AM Xin Lu  wrote:
>
>> Hi everyone,
>>
>> I tried sending emails to this list and I'm not sure if it went through
>> so I'm trying again.  Anyway, a couple months ago before I left Databricks
>> I was working on a proof of concept that parallelized Spark tests on
>> jenkins.  The way it worked was basically it build the spark jars and then
>> ran all the tests in a docker container on a bunch of slaves in parallel.
>> This cut the testing time down from 4 hours to approximately 1.5 hours.
>> This required a newer version of jenkins and the Jenkins Pipeline plugin.
>> I am wondering if it is possible to do this on amplab jenkins.  It looks
>> like https://builds.apache.org/ has upgraded so Amplabs jenkins is a
>> year or so behind.  I am happy to help with this project if it is something
>> that people think is worthwhile.
>>
>> Thanks
>>
>> Xin
>>
>


-- 
Twitter: https://twitter.com/holdenkarau

Re: Jenkins upgrade/Test Parallelization & Containerization

2017-11-07 Thread Reynold Xin

My understanding is that AMP can actually provide more resources or adapt
to changes, while ASF needs to manage 200+ projects, so it's hard for it to
accommodate much. I could be wrong though.


On Tue, Nov 7, 2017 at 2:14 PM, Holden Karau  wrote:

> True, I think we've seen that the Amp Lab Jenkins needs to be more focused
> on running AMP Lab projects, and while I don't know how difficult the ASF
> Jenkins is I assume it might be an easier place to make changes going
> forward? (Of course this could be the grass is greener on the other side
> and I don't mean to say it's been hard to make changes on the AMP lab
> hardware, folks have been amazingly helpful - its just the projects on each
> have different needs).
>
> On Tue, Nov 7, 2017 at 12:52 PM, Sean Owen  wrote:
>
>> Faster tests would be great. I recall that the straightforward ways to
>> parallelize via Maven haven't worked because many tests collide with one
>> another. Is this about running each module's tests in a container? that
>> should work.
>>
>> I can see how this is becoming essential for repeatable and reliable
>> Python/R builds, which depend on the environment to a much greater extent
>> than the JVM does.
>>
>> I don't have a strong preference for AMPLab vs ASF builds. I suppose
>> using the ASF machinery is a little tidier. If it's got a later Jenkins
>> that's required, also a plus, but I assume updating AMPLab isn't so hard
>> here either. I think the key issue is which environment is easier to
>> control and customize over time.
>>
>>
>> On Wed, Nov 1, 2017 at 6:05 AM Xin Lu  wrote:
>>
>>> Hi everyone,
>>>
>>> I tried sending emails to this list and I'm not sure if it went through
>>> so I'm trying again.  Anyway, a couple months ago before I left Databricks
>>> I was working on a proof of concept that parallelized Spark tests on
>>> jenkins.  The way it worked was basically it build the spark jars and then
>>> ran all the tests in a docker container on a bunch of slaves in parallel.
>>> This cut the testing time down from 4 hours to approximately 1.5 hours.
>>> This required a newer version of jenkins and the Jenkins Pipeline plugin.
>>> I am wondering if it is possible to do this on amplab jenkins.  It looks
>>> like https://builds.apache.org/ has upgraded so Amplabs jenkins is a
>>> year or so behind.  I am happy to help with this project if it is something
>>> that people think is worthwhile.
>>>
>>> Thanks
>>>
>>> Xin
>>>
>>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>

Re: Jenkins upgrade/Test Parallelization & Containerization

2017-11-07 Thread Holden Karau

That makes sense. In that case, do we know how hard it would be to make the
necessary changes to the AMP Lab Jenkins to support this?

On Tue, Nov 7, 2017 at 4:04 PM Reynold Xin  wrote:

> My understanding is that AMP actually can provide more resources or adapt
> changes, while ASF needs to manage 200+ projects and it's hard to
> accommodate much. I could be wrong though.
>
>
> On Tue, Nov 7, 2017 at 2:14 PM, Holden Karau  wrote:
>
>> True, I think we've seen that the Amp Lab Jenkins needs to be more
>> focused on running AMP Lab projects, and while I don't know how difficult
>> the ASF Jenkins is I assume it might be an easier place to make changes
>> going forward? (Of course this could be the grass is greener on the other
>> side and I don't mean to say it's been hard to make changes on the AMP lab
>> hardware, folks have been amazingly helpful - its just the projects on each
>> have different needs).
>>
>> On Tue, Nov 7, 2017 at 12:52 PM, Sean Owen  wrote:
>>
>>> Faster tests would be great. I recall that the straightforward ways to
>>> parallelize via Maven haven't worked because many tests collide with one
>>> another. Is this about running each module's tests in a container? that
>>> should work.
>>>
>>> I can see how this is becoming essential for repeatable and reliable
>>> Python/R builds, which depend on the environment to a much greater extent
>>> than the JVM does.
>>>
>>> I don't have a strong preference for AMPLab vs ASF builds. I suppose
>>> using the ASF machinery is a little tidier. If it's got a later Jenkins
>>> that's required, also a plus, but I assume updating AMPLab isn't so hard
>>> here either. I think the key issue is which environment is easier to
>>> control and customize over time.
>>>
>>>
>>> On Wed, Nov 1, 2017 at 6:05 AM Xin Lu  wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I tried sending emails to this list and I'm not sure if it went through
>>>> so I'm trying again.  Anyway, a couple months ago before I left Databricks
>>>> I was working on a proof of concept that parallelized Spark tests on
>>>> jenkins.  The way it worked was basically it build the spark jars and then
>>>> ran all the tests in a docker container on a bunch of slaves in parallel.
>>>> This cut the testing time down from 4 hours to approximately 1.5 hours.
>>>> This required a newer version of jenkins and the Jenkins Pipeline plugin.
>>>> I am wondering if it is possible to do this on amplab jenkins.  It looks
>>>> like https://builds.apache.org/ has upgraded so Amplabs jenkins is a
>>>> year or so behind.  I am happy to help with this project if it is something
>>>> that people think is worthwhile.
>>>>
>>>> Thanks
>>>>
>>>> Xin
>>>>
>>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
> --
Twitter: https://twitter.com/holdenkarau

Re: Jenkins upgrade/Test Parallelization & Containerization

2017-11-07 Thread Felix Cheung

We actually have some immediate needs for custom config for some upcoming 
integration tests.

I don't know if such changes are possible in ASF Jenkins but the work is in 
progress in RISELab Jenkins :)



From: holden.ka...@gmail.com  on behalf of Holden Karau 

Sent: Tuesday, November 7, 2017 2:14:18 PM
To: Sean Owen
Cc: Xin Lu; dev@spark.apache.org
Subject: Re: Jenkins upgrade/Test Parallelization & Containerization

True, I think we've seen that the Amp Lab Jenkins needs to be more focused on 
running AMP Lab projects, and while I don't know how difficult the ASF Jenkins 
is I assume it might be an easier place to make changes going forward? (Of 
course this could be the grass is greener on the other side and I don't mean to 
say it's been hard to make changes on the AMP lab hardware, folks have been 
amazingly helpful - its just the projects on each have different needs).

On Tue, Nov 7, 2017 at 12:52 PM, Sean Owen wrote:
Faster tests would be great. I recall that the straightforward ways to 
parallelize via Maven haven't worked because many tests collide with one 
another. Is this about running each module's tests in a container? that should 
work.

I can see how this is becoming essential for repeatable and reliable Python/R 
builds, which depend on the environment to a much greater extent than the JVM 
does.

I don't have a strong preference for AMPLab vs ASF builds. I suppose using the 
ASF machinery is a little tidier. If it's got a later Jenkins that's required, 
also a plus, but I assume updating AMPLab isn't so hard here either. I think 
the key issue is which environment is easier to control and customize over time.


On Wed, Nov 1, 2017 at 6:05 AM Xin Lu wrote:
Hi everyone,

I tried sending emails to this list and I'm not sure if it went through so I'm 
trying again.  Anyway, a couple months ago before I left Databricks I was 
working on a proof of concept that parallelized Spark tests on jenkins.  The 
way it worked was basically it build the spark jars and then ran all the tests 
in a docker container on a bunch of slaves in parallel.  This cut the testing 
time down from 4 hours to approximately 1.5 hours.  This required a newer 
version of jenkins and the Jenkins Pipeline plugin.  I am wondering if it is 
possible to do this on amplab jenkins.  It looks like 
https://builds.apache.org/ has upgraded so Amplabs jenkins is a year or so 
behind.  I am happy to help with this project if it is something that people 
think is worthwhile.

Thanks

Xin



--
Twitter: https://twitter.com/holdenkarau

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-07 Thread Reynold Xin

The vote has passed with the following +1s:

Reynold Xin*
Debasish Das
Noman Khan
Wenchen Fan*
Matei Zaharia*
Weichen Xu
Vaquar Khan
Burak Yavuz
Xiao Li
Tom Graves*
Michael Armbrust*
Joseph Bradley*
Shixiong Zhu*


And the following +0s:

Sean Owen*


Thanks for the feedback!


On Wed, Nov 1, 2017 at 8:37 AM, Reynold Xin  wrote:

> Earlier I sent out a discussion thread for CP in Structured Streaming:
>
> https://issues.apache.org/jira/browse/SPARK-20928
>
> It is meant to be a very small, surgical change to Structured Streaming to
> enable ultra-low latency. This is great timing because we are also
> designing and implementing data source API v2. If designed properly, we can
> have the same data source API working for both streaming and batch.
>
>
> Following the SPIP process, I'm putting this SPIP up for a vote.
>
> +1: Let's go ahead and design / implement the SPIP.
> +0: Don't really care.
> -1: I do not think this is a good idea for the following reasons.
>
>
>

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-07 Thread Saisai Shao

+1, looking forward to more design details of this feature.

Thanks
Jerry

On Wed, Nov 8, 2017 at 6:40 AM, Shixiong(Ryan) Zhu 
wrote:

> +1
>
> On Tue, Nov 7, 2017 at 1:34 PM, Joseph Bradley 
> wrote:
>
>> +1
>>
>> On Mon, Nov 6, 2017 at 5:11 PM, Michael Armbrust 
>> wrote:
>>
>>> +1
>>>
>>> On Sat, Nov 4, 2017 at 11:02 AM, Xiao Li  wrote:
>>>
>>>> +1
>>>>
>>>> 2017-11-04 11:00 GMT-07:00 Burak Yavuz :
>>>>
>>>>> +1
>>>>>
>>>>> On Fri, Nov 3, 2017 at 10:02 PM, vaquar khan wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Fri, Nov 3, 2017 at 8:14 PM, Weichen Xu wrote:
>>>>>>
>>>>>>> +1.
>>>>>>>
>>>>>>> On Sat, Nov 4, 2017 at 8:04 AM, Matei Zaharia <
>>>>>>> matei.zaha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 from me too.
>>>>>>>>
>>>>>>>> Matei
>>>>>>>>
>>>>>>>> > On Nov 3, 2017, at 4:59 PM, Wenchen Fan wrote:
>>>>>>>> >
>>>>>>>> > +1.
>>>>>>>> >
>>>>>>>> > I think this architecture makes a lot of sense to let executors
>>>>>>>> > talk to source/sink directly, and bring very low latency.
>>>>>>>> >
>>>>>>>> > On Thu, Nov 2, 2017 at 9:01 AM, Sean Owen wrote:
>>>>>>>> > +0 simply because I don't feel I know enough to have an opinion.
>>>>>>>> > I have no reason to doubt the change though, from a skim through the
>>>>>>>> > doc.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Wed, Nov 1, 2017 at 3:37 PM Reynold Xin wrote:
>>>>>>>> > Earlier I sent out a discussion thread for CP in Structured
>>>>>>>> > Streaming:
>>>>>>>> >
>>>>>>>> > https://issues.apache.org/jira/browse/SPARK-20928
>>>>>>>> >
>>>>>>>> > It is meant to be a very small, surgical change to Structured
>>>>>>>> > Streaming to enable ultra-low latency. This is great timing because we
>>>>>>>> > are also designing and implementing data source API v2. If designed
>>>>>>>> > properly, we can have the same data source API working for both
>>>>>>>> > streaming and batch.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Following the SPIP process, I'm putting this SPIP up for a vote.
>>>>>>>> >
>>>>>>>> > +1: Let's go ahead and design / implement the SPIP.
>>>>>>>> > +0: Don't really care.
>>>>>>>> > -1: I do not think this is a good idea for the following reasons.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -
>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Vaquar Khan
>>>>>> +1 -224-436-0783
>>>>>> Greater Chicago
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Joseph Bradley
>>
>> Software Engineer - Machine Learning
>>
>> Databricks, Inc.
>>
>> [image: http://databricks.com] 
>>
>
>
