The repr() trick is neat when working in a notebook. When working in a
library, I used to use an evaluate(dataframe) -> DataFrame function that
simply forces the materialization of a dataframe. As Reynold mentions, this
is very convenient when working with a lot of chained UDFs, and it is a
standard
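Here is a minimal sketch of such a helper in PySpark; the name evaluate
and the cache-then-count strategy are illustrative, not the exact code
referenced above:

from pyspark.sql import DataFrame

def evaluate(df: DataFrame) -> DataFrame:
    """Force the materialization of a DataFrame and return it.

    Caching first means the work triggered by count() is not thrown
    away: downstream actions reuse the materialized data.
    """
    df = df.cache()
    df.count()  # an action, so all chained transformations (and UDFs) run now
    return df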
Hello all,
following up on the last Summit, there will be a couple of exciting talks
about deep learning and Spark at the next Spark Summit in Dublin:
- Deep Dive Into Deep Learning Pipelines, in which we will go even
deeper into the technical aspects for an hour-long session
- Apache Spark and TensorFlow
> On Sep 23, 2017, at 7:27 AM, Yanbo Liang wrote:
>
> +1
>
> On Sat, Sep 23, 2017 at 7:08 PM, Noman Khan wrote:
> +1
>
> Regards
> Noman
Hello community,
I would like to call for a vote on SPARK-21866. It is a short proposal that
has important applications for image processing and deep learning. Joseph
Bradley has offered to be the shepherd.
JIRA ticket: https://issues.apache.org/jira/browse/SPARK-21866
PDF version: https://issues
Hello community,
I would like to start a discussion about adding support for images in
Spark. We will follow up with a formal vote in two weeks. Please feel free
to comment on the JIRA ticket too.
JIRA ticket: https://issues.apache.org/jira/browse/SPARK-21866
PDF version:
https://issues.apache.or
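For context, the core of the proposal is a standard DataFrame schema for
images. Here is a sketch of what such a schema looks like in PySpark; the
field names here are illustrative, see the SPIP for the exact definition:

from pyspark.sql.types import (BinaryType, IntegerType, StringType,
                               StructField, StructType)

# One image per row: metadata plus the raw bytes in a single struct.
image_schema = StructType([
    StructField("origin", StringType(), True),      # source URI
    StructField("height", IntegerType(), False),
    StructField("width", IntegerType(), False),
    StructField("nChannels", IntegerType(), False),
    StructField("mode", IntegerType(), False),      # OpenCV-style type code
    StructField("data", BinaryType(), False),       # packed pixel bytes
])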
Hello Enzo,
since this question is also relevant to Spark, I will answer it here. The
goal of GraphFrames is to provide graph capabilities along with excellent
integration with the rest of the Spark ecosystem (using modern APIs such as
DataFrames). As you seem to be well aware, a large number of gra
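To make the DataFrame integration concrete, here is a minimal sketch,
assuming the graphframes package is on the path and an active SparkSession
named spark; the toy data is illustrative:

from graphframes import GraphFrame

# Vertices need an "id" column; edges need "src" and "dst" columns.
vertices = spark.createDataFrame(
    [("a", "Alice"), ("b", "Bob")], ["id", "name"])
edges = spark.createDataFrame(
    [("a", "b", "follows")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()  # graph queries return plain DataFrames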
Regarding logging, GraphFrames provides a simple wrapper this way:
https://github.com/graphframes/graphframes/blob/master/src/main/scala/org/graphframes/Logging.scala
Regarding the UDTs, they have been hidden to be reworked for Datasets, the
reasons being detailed here [1]. Can you describe your us
As Sean wrote very nicely above, the changes made to Spark are decided in
an organic fashion based on the interests and motivations of the committers
and contributors. The case of deep learning is a good example. There is a
lot of interest, and the core algorithms could be implemented without too
m
Hi Brad,
this task is focusing on moving the existing algorithms, so that we
are not held up by parity issues.
Do you have some paper suggestions for cardinality? I do not think
there is a feature request on JIRA either.
Tim
On Thu, Feb 16, 2017 at 2:21 PM, bradc wrote:
> Hi,
>
> While it is also
Hello all,
I have been looking at some of the missing items for complete feature
parity between spark.ml and spark.mllib. Here is a proposal for
porting mllib.stats, the descriptive statistics package:
https://docs.google.com/document/d/1ELVpGV3EBjc2KQPLN9_9_Ge9gWchPZ6SGtDW5tTm_50/edit?usp=sharin
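For context, here is a quick sketch of the existing RDD-based API that
the proposal would port, assuming an active SparkContext named sc:

from pyspark.mllib.linalg import Vectors
from pyspark.mllib.stat import Statistics

rdd = sc.parallelize([Vectors.dense([1.0, 2.0]),
                      Vectors.dense([3.0, 4.0])])
summary = Statistics.colStats(rdd)  # column-wise descriptive statistics
print(summary.mean(), summary.variance(), summary.count())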
The doc looks good to me.
Ryan, the role of the shepherd is to make sure that someone
knowledgeable about Spark processes is involved: this person can advise
on technical and procedural considerations for people outside the
community. Also, if no one is willing to be a shepherd, the proposed
idea i
Hi Cody,
thank you for bringing up this topic; I agree it is very important to keep
a cohesive community around some common, fluid goals. Here are a few
comments about the current document:
1. name: it should not overlap with an existing one such as SIP. Can you
imagine someone trying to discuss a
Hello all,
I have released version 0.2.0 of the GraphFrames package. Apart from a few
bug fixes, it is the first release published for Spark 2.0 and for both
Scala 2.10 and 2.11. Please let us know if you have any comments or questions.
It is available as a Spark package:
https://spark-packages.org/pac
+1 This release passes all tests on the graphframes and tensorframes
packages.
On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger wrote:
> If we're considering backporting changes for the 0.8 kafka
> integration, I am sure there are people who would like to get
>
> https://issues.apache.org/jira/br
Tim Hunter
Hello community,
Joseph and I would like to introduce a new Spark package that should
be useful for Python users who depend on scikit-learn.
Among other tools, it can (see the sketch after this list):
- train and evaluate multiple scikit-learn models in parallel.
- convert Spark's DataFrames seamlessly into NumPy arrays
- (experiment
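Here is a minimal sketch of the first bullet, training several
scikit-learn models in parallel with Spark; it illustrates the idea only
and is not the package's actual API (assumes an active SparkContext
named sc):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
params = [{"C": c} for c in (0.01, 0.1, 1.0, 10.0)]

def fit_and_score(p):
    # Each Spark task trains one scikit-learn model; the small dataset
    # is shipped to the executors inside the closure.
    model = LogisticRegression(**p).fit(X, y)
    return p, model.score(X, y)

# Distribute the parameter grid; scikit-learn runs inside each task.
results = sc.parallelize(params, len(params)).map(fit_and_score).collect()
print(max(results, key=lambda r: r[1]))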