Hi Reynold,
> i'd make this as consistent as to_json / from_json as possible
Sure, the new function from_csv() has the same signature as from_json().
> how would this work in sql? i.e. how would passing options in work?
The options are passed to the function via a map, for example:
select from_csv('2
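A minimal sketch of how an options map could be passed in SQL, driven here through PySpark's spark.sql; the schema string and the timestampFormat option are illustrative assumptions, and it presumes a Spark version in which from_csv is available from SQL:

# Sketch only: pass CSV parser options to from_csv as a SQL map().
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql(
    "SELECT from_csv('26/08/2015', 'time TIMESTAMP', "
    "map('timestampFormat', 'dd/MM/yyyy')) AS parsed"
).show(truncate=False)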
We could also deprecate Py2 already in the 2.4.0 release.
On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson wrote:
> In case this didn't make it onto this thread:
>
> There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and remove
> it entirely on a later 3.x release.
>
> On Sat, Sep 15
It's not splitting hairs, Erik. It's actually very close to something that
I think deserves some discussion (perhaps on a separate thread.) What I've
been thinking about also concerns API "friendliness" or style. The original
RDD API was very intentionally modeled on the Scala parallel collections
I wrote code to connect Kafka with Spark using Python, and I run the code on Jupyter.
My code:
import os
#os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /home/hadoop/Desktop/spark-program/kafka/spark-streaming-kafka-0-8-assembly_2.10-2.0.0-preview.jar pyspark-shell'
os.environ['PYSPARK_SUBMIT_ARGS'] = "--pack
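For what it's worth, here is a minimal runnable sketch of what the complete setup could look like; the --packages coordinate, broker address, and topic name are assumptions for illustration, and PYSPARK_SUBMIT_ARGS has to be set before pyspark is imported:

# Hedged sketch, not the poster's actual code: put the Kafka 0-8 integration
# on the classpath via PYSPARK_SUBMIT_ARGS before importing pyspark.
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 '  # assumed coordinate
    'pyspark-shell'
)

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName='kafka-jupyter-demo')
ssc = StreamingContext(sc, batchDuration=5)

# Broker address and topic are placeholders.
stream = KafkaUtils.createDirectStream(
    ssc, ['my-topic'], {'metadata.broker.list': 'localhost:9092'})
stream.pprint()

ssc.start()
ssc.awaitTermination()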
My 2 cents on this is that the biggest room for improvement in Python is
similarity to Pandas. We already made the Python DataFrame API different from
Scala/Java in some respects, but if there’s anything we can do to make it more
obvious to Pandas users, that will help the most. The other issue
Most of those are pretty difficult to add, though, because they are
fundamentally difficult to do in a distributed setting and with lazy
execution.
We should add some, but at some point there are fundamental differences
in the underlying execution engines that are pretty difficult to
reconcile.
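As a small illustration of that gap (the data and column names are made up for the example), pandas executes each step eagerly in one process, while the Spark DataFrame below only records a plan until an action forces distributed execution:

import pandas as pd
from pyspark.sql import SparkSession

pdf = pd.DataFrame({'x': [1, 2, 3]})
pdf['doubled'] = pdf['x'] * 2                  # pandas: runs immediately, in one process

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(pdf[['x']])
sdf = sdf.withColumn('doubled', sdf['x'] * 2)  # Spark: only builds a lazy plan
sdf.show()                                     # execution happens at the action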
>
> difficult to reconcile
>
That's a big chunk of what I'm getting at: How much is it even possible to
do this kind of reconciliation from the underlying implementation to a more
normal/expected/friendly API for a given programming environment? How much
more work is it for us to maintain multiple
I don’t think we should remove any API even in a major release without
deprecating it first...
From: Mark Hamstra
Sent: Sunday, September 16, 2018 12:26 PM
To: Erik Erlandson
Cc: u...@spark.apache.org; dev
Subject: Re: Should python-2 be supported in Spark 3.0?
I am not involved with the design or development of the V2 API - so these could
be naïve comments/thoughts.
Just as Dataset is meant to abstract away from RDD, which otherwise requires a little
more intimate knowledge of Spark internals, I am guessing the absence of
partition operations is either d
I'm +1 for this proposal: "Extend SessionConfigSupport to support passing
specific white-listed configuration values"
One goal of the data source v2 API is to not depend on any high-level APIs like
SparkSession, SQLConf, etc. If users do want to access these high-level
APIs, there is a workaround: cal
I think we can deprecate it in 3.x.0 and remove it in Spark 4.0.0. Many
people still use Python 2. Also, technically 2.7 support is not officially
dropped yet - https://pythonclock.org/
On Mon, Sep 17, 2018 at 9:31 AM, Aakash Basu wrote:
> Removing support for an API in a major release makes poor sense
+1 for this idea, since text parsing of CSV/JSON is quite common.
One thing to consider is schema inference, as with the JSON functionality. For
JSON, we added schema_of_json for this, and the same thing should apply
to CSV too.
If we see more need for it, we can consider a function lik
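For reference, a minimal sketch of the JSON counterpart described above, where schema_of_json infers a schema from a sample record and feeds it to from_json; a CSV analogue (e.g. a hypothetical schema_of_csv-style helper) would presumably follow the same shape:

from pyspark.sql import SparkSession
from pyspark.sql.functions import schema_of_json, from_json, lit

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('{"a": 1, "b": 0.8}',)], ['json'])

# Infer a schema from a sample record, then use it to parse the whole column.
df.select(
    from_json('json', schema_of_json(lit('{"a": 1, "b": 0.8}'))).alias('parsed')
).show(truncate=False)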
It seems the same thing is happening again.
For instance,
- https://issues.apache.org/jira/browse/SPARK-25440 /
https://github.com/apache/spark/pull/22429
- https://issues.apache.org/jira/browse/SPARK-25429 /
https://github.com/apache/spark/pull/22420
On Thu, Aug 3, 2017 at 9:06 AM, Hyukjin Kwon wrote:
> I t
Please vote on releasing the following candidate as Apache Spark version
2.4.0.
The vote is open until September 20 PST and passes if a majority of +1 PMC
votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 2.4.0
[ ] -1 Do not release this package because ...
T
A few preliminary notes:
Wenchen, for some weird reason, when I import your key with gpg --import, it
asks for a passphrase. When I skip it, it's fine; gpg can still verify
the signature. No real issue there.
The staging repo gives a 404:
https://repository.apache.org/content/repositories/orgapachespa
Ah, I missed the Scala 2.12 build. Do you mean we should publish a Scala
2.12 build this time? Currently for Scala 2.11 we have 3 builds: with Hadoop
2.7, with Hadoop 2.6, and without Hadoop. Shall we do the same thing for Scala
2.12?
On Mon, Sep 17, 2018 at 11:14 AM Sean Owen wrote:
> A few preliminar
I think one build is enough, but haven't thought it through. The
Hadoop 2.6/2.7 builds are already nearly redundant. 2.12 is probably
best advertised as a 'beta'. So maybe publish a no-hadoop build of it?
Really, whatever's the easy thing to do.
On Sun, Sep 16, 2018 at 10:28 PM Wenchen Fan wrote:
I confirmed that
https://repository.apache.org/content/repositories/orgapachespark-1285 is
not accessible. I did it via ./dev/create-release/do-release-docker.sh -d
/my/work/dir -s publish, and I'm not sure what's going wrong. I didn't see any
error message during the process.
Any insights are appreciated! So tha