I added @since version tags for all public DataFrame/SQL methods/classes in
this patch: https://github.com/apache/spark/pull/6101/files
From now on, if you merge anything related to DF/SQL, please make sure the
public functions have @since tags. Thanks.
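For anyone merging such patches, a minimal sketch (illustrative only, not
the exact code in that PR) of what a since-style tag can look like on the
Python side:

    def since(version):
        """Hypothetical decorator noting the version an API was added in."""
        def deco(f):
            f.__doc__ = (f.__doc__ or "") + "\n.. versionadded:: %s" % version
            return f
        return deco

    class DataFrame(object):
        @since("1.4.0")
        def sort(self, *cols):
            """Returns a new DataFrame sorted by the given columns."""
            pass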
Hey Kevin and Ron,
So is the main shortcoming of the launcher library the inability to
get an app ID back from YARN? Or are there other issues here that
fundamentally regress things for you?
It seems like adding a way to get back the appID would be a reasonable
addition to the launcher.
- Patrick
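For reference, a hedged sketch of the kind of workaround the missing app ID
forces today: launch spark-submit as a subprocess and scan its output for
the YARN application ID pattern. The jar path and master below are
placeholders, and parsing log output like this is fragile by design:

    import re
    import subprocess

    proc = subprocess.Popen(
        ["spark-submit", "--master", "yarn-cluster", "/path/to/app.jar"],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

    app_id = None
    for raw in proc.stdout:
        # YARN app IDs follow the pattern application_<timestamp>_<seq>.
        match = re.search(r"application_\d+_\d+", raw.decode("utf-8", "replace"))
        if match:
            app_id = match.group(0)
            break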
The class (called Row) for rows from Spark SQL is created on the fly and is
different from pyspark.sql.Row (which is a public API for users to create
Rows). The reason we did it this way is that we want better performance
when accessing the columns. Basically, the rows are just named tuples.
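To make the distinction concrete, a short example with the public API (the
internal on-the-fly class is separate from this):

    from pyspark.sql import Row

    # pyspark.sql.Row behaves like a named tuple: fields are accessible
    # by name, and a Row really is a tuple underneath.
    row = Row(name="Alice", age=1)
    print(row.name)                 # 'Alice'
    print(isinstance(row, tuple))   # True
    # Note: in these Spark versions, kwargs-created Rows sort their
    # fields alphabetically, so this is Row(age=1, name='Alice').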
Due to an ASF infrastructure change (bug?) [1] the default JIRA
resolution status has switched to "Pending Closed". I've made a change
to our merge script to coerce the correct status of "Fixed" when
resolving [2]. Please upgrade the merge script to master.
I've manually corrected JIRAs that were
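For the curious, the fix boils down to passing an explicit resolution when
transitioning the issue. A rough sketch with the jira Python client; the
server, credentials, issue key, and transition name here are placeholders,
not the merge script's actual values:

    from jira.client import JIRA

    asf_jira = JIRA({"server": "https://issues.apache.org/jira"},
                    basic_auth=("user", "password"))
    transitions = asf_jira.transitions("SPARK-0000")
    resolve = [t for t in transitions if t["name"] == "Resolve Issue"][0]
    # Coerce the resolution to "Fixed" instead of the new default.
    asf_jira.transition_issue("SPARK-0000", resolve["id"],
                              resolution={"name": "Fixed"})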
Hi there,
Which version are you using? Actually, the problem seems to be gone after we
changed our Spark version from 1.2.0 to 1.3.0.
Not sure what internal changes did it.
Best,
Sun.
fightf...@163.com
From: Night Wolf
Date: 2015-05-12 22:05
To: fightf...@163.com
CC: Patrick Wendell; user; dev
Su
i will need to restart jenkins to finish a plugin install and resolve
https://issues.apache.org/jira/browse/SPARK-7561
this will be very brief, and i'll retrigger any errant jobs i kill.
please let me know if there are any comments/questions/concerns.
thanks!
shane
On Tue, May 12, 2015 at 11:34 AM, Kevin Markey
wrote:
> I understand that SparkLauncher was supposed to address these issues, but
> it really doesn't. Yarn already provides indirection and an arm's length
> transaction for starting Spark on a cluster. The launcher introduces yet
> another layer
Hello Spark community,
I am currently trying to implement a proof-of-concept RDD that will allow
integrating Apache Spark with Apache Ignite (incubating) [1]. My original
idea was to embed an Ignite node in Spark's worker process, in order for
the user code to have direct access to in-memory data
We have the same issue. As a result, we are stuck back on 1.0.2.
Not being able to programmatically interface directly with the Yarn
client to obtain the application id is a show stopper for us, which is a
real shame given the Yarn enhancements in 1.2, 1.3, and 1.4.
I understand that SparkLauncher
We have a small Mesos cluster, and these slaves need a VFS set up on them so
that they can pull down the data they need from S3 when Spark runs.
There doesn't seem to be any obvious guidance online on how to do this or how
to easily accomplish it. Does anyone have some best practices or so
> I tend to find that any large project has a lot of walking dead JIRAs, and
> pretending they are simply Open causes problems. Any state is better for
> these, so I favor this.
Agreed.
1. Inactive: A way to clear out inactive/dead JIRAs without
indicating a decision has been made one way or the other
It could also be that your hash function is expensive. What is the key class
you have for the reduceByKey / groupByKey?
Matei
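As a hypothetical illustration of that point (not code from this thread): a
key whose hash walks a large payload on every call makes each probe of the
shuffle map expensive, and caching the hash once is the usual fix.

    class SlowKey(object):
        """Key whose hash rescans a large payload on every call."""
        def __init__(self, parts):
            self.parts = parts
        def __eq__(self, other):
            return self.parts == other.parts
        def __hash__(self):
            return hash(tuple(self.parts))  # recomputed on every probe

    class FastKey(SlowKey):
        """Same key, but the hash is computed once and cached."""
        def __init__(self, parts):
            SlowKey.__init__(self, parts)
            self._hash = hash(tuple(parts))
        def __hash__(self):
            return self._hash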
> On May 12, 2015, at 10:08 AM, Night Wolf wrote:
>
> I'm seeing a similar thing with a slightly different stack trace. Ideas?
>
> org.apache.spark.util.collection.App
I'm seeing a similar thing with a slightly different stack trace. Ideas?
org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:150)
org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
org.apache.spark.util.collection.E
Seeing similar issues; did you find a solution? One option would be to
increase the number of partitions if you're doing lots of object creation.
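If it helps, the partition count can be raised right at the shuffle. A small
PySpark sketch; the data and the count of 400 are made-up values:

    from pyspark import SparkContext

    sc = SparkContext(appName="partition-sketch")
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

    # More partitions means each reduce task builds a smaller
    # in-memory map during the shuffle.
    counts = pairs.reduceByKey(lambda a, b: a + b, numPartitions=400)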
On Thu, Feb 12, 2015 at 7:26 PM, fightf...@163.com
wrote:
> Hi, Patrick
>
> Really glad to get your reply.
> Yes, we are doing group by operations for our wo
Maybe you should check where exactly it's throwing the permission-denied
error (possibly while trying to write to some directory). You can also try
manually cloning the git repo to a directory and then opening that in Eclipse.
Thanks
Best Regards
On Tue, May 12, 2015 at 3:46 PM, Chandrashekhar Kotekar <
s
Hi,
I am trying to clone the Spark source using Eclipse. After providing the
Spark source URL, Eclipse downloads some code, which I can see in the
download location, but as soon as downloading reaches 99%, Eclipse throws a
"Git repository clone failed. Access is denied" error.
Has anyone encountered such a problem?
I tend to find that any large project has a lot of walking dead JIRAs, and
pretending they are simply Open causes problems. Any state is better for
these, so I favor this.
The possible objection is that this will squash or hide useful issues, but
in practice we have the opposite problem. Resolved
In Spark we sometimes close issues as something other than "Fixed",
and this is an important part of maintaining our JIRA.
The current resolution types we use are the following:
Won't Fix - bug fix or (more often) feature we don't want to add
Invalid - issue is underspecified or not appropriate f