Please go for it!
On Friday, June 17, 2016, Pedro Rodriguez wrote:
> I would be open to working on Dataset documentation if no one else is
> already working on it. Thoughts?
>
> On Fri, Jun 17, 2016 at 11:44 PM, Cheng Lian wrote:
>
>> As mentioned in the PR description, this is just an initial PR to bring
>> existing contents up to date, so that people can add more contents
>> incrementally.
I would be open to working on Dataset documentation if no one else is
already working on it. Thoughts?
On Fri, Jun 17, 2016 at 11:44 PM, Cheng Lian wrote:
> As mentioned in the PR description, this is just an initial PR to bring
> existing contents up to date, so that people can add more contents
> incrementally.
As mentioned in the PR description, this is just an initial PR to bring
existing contents up to date, so that people can add more contents
incrementally.
We should definitely cover more about Dataset.
Cheng
On 6/17/16 10:28 PM, Pedro Rodriguez wrote:
The updates look great!
Looks like many…
The updates look great!
Looks like many places are updated to the new APIs, but there still isn't a
section for working with Datasets (most of the docs work with Dataframes).
Are you planning on adding more? I am thinking something that would address
common questions like the one I posted on the u…
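(A minimal sketch of the kind of example such a Dataset section might include, assuming the Spark 2.0 SparkSession API; the Person case class and the data are made up.)

  import org.apache.spark.sql.SparkSession

  case class Person(name: String, age: Long)   // hypothetical schema

  val spark = SparkSession.builder().appName("dataset-example").getOrCreate()
  import spark.implicits._

  // A Dataset is typed, unlike a DataFrame (which is Dataset[Row] in 2.0).
  val people = Seq(Person("Ana", 34), Person("Bo", 28)).toDS()
  people.filter(_.age >= 30).show()   // filters on Person's fields, checked at compile time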
Hey Pedro,
SQL programming guide is being updated. Here's the PR, but not merged
yet: https://github.com/apache/spark/pull/13592
Cheng
On 6/17/16 9:13 PM, Pedro Rodriguez wrote:
Hi All,
At my workplace we are starting to use Datasets in 1.6.1 and even more
with Spark 2.0 in place of Dataframes…
Dear all,
I have three questions about equality of org.apache.spark.sql.Row.
(1) If a Row has a complex type (e.g. Array), is the following behavior
expected?
If two Rows have the same array instance, Row.equals returns true in the
second assert. If two Rows have different array instances (a1 and …
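(A sketch of the scenario being asked about; a1 comes from the truncated mail, a2 and the values are assumed. In Scala, == on arrays compares references, so Rows wrapping distinct but element-wise-equal arrays can compare unequal, depending on how Row.equals walks the values in a given Spark version.)

  import org.apache.spark.sql.Row

  val a1 = Array(1, 2, 3)
  val a2 = Array(1, 2, 3)      // same contents, different instance (assumed)

  assert(Row(a1) == Row(a1))   // same array instance: equal
  println(Row(a1) == Row(a2))  // can be false: arrays compare by reference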
I am going to take a guess that this means that your partitions within an
RDD are not balanced (one or more partitions are much larger than the
rest). This would mean a single core would need to do much more work than
the rest leading to poor performance. In general, the way to fix this is to
spread…
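(A sketch of one way to confirm and fix such an imbalance; the input path and partition count are placeholders.)

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("skew-check"))
  val rdd = sc.textFile("hdfs:///some/input")   // placeholder input

  // Per-partition record counts; a few huge values mean skewed partitions.
  println(rdd.mapPartitions(it => Iterator(it.size)).collect().mkString(", "))

  // repartition shuffles records evenly across a fixed number of partitions.
  val balanced = rdd.repartition(100)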
Hi All,
At my workplace we are starting to use Datasets in 1.6.1 and even more with
Spark 2.0 in place of Dataframes. I looked at the 1.6.1 documentation then
the 2.0 documentation and it looks like not much time has been spent
writing a Dataset guide/tutorial.
Preview Docs:
https://home.apache.o
I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT
(commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's
log4j.properties is not getting picked up in the executor classpath (and
driver classpath for yarn-cluster mode), so Hadoop's log4j.properties file
is taking precedence in the YARN…
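(For reference, the commonly cited workaround is to ship a log4j.properties with the job and point both JVMs at it, roughly as below; this may not address the classpath-precedence problem being debugged here.)

  spark-submit \
    --master yarn \
    --files log4j.properties \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    ...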
Another good signal is the "target version" (which by convention is only
set by committers). When I set this for the upcoming version, it means I
think it's important enough that I will prioritize reviewing a patch for it.
On Fri, Jun 17, 2016 at 3:22 PM, Pedro Rodriguez wrote:
> What is the best way to determine what the library maintainers believe is
> important work to be done?
You can use a JIRA filter to find JIRAs of the component(s) you're
interested in.
Then sort by Priority.
Maybe comment on the JIRA if you want to work on it.
On Fri, Jun 17, 2016 at 3:22 PM, Pedro Rodriguez wrote:
> What is the best way to determine what the library maintainers believe is
> important work to be done?
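(For illustration, a filter along those lines might look like this in JQL; the component is just an example.)

  project = SPARK AND component = "SQL" AND resolution = Unresolved ORDER BY priority DESC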
What is the best way to determine what the library maintainers believe is
important work to be done?
I have looked through the JIRA and it's unclear which items are priorities
to work on. I am guessing this is in part because things are a little
hectic with final work for 2.0, but it would b…
Docker Integration Tests failed on Linux:
http://pastebin.com/Ut51aRV3
Here was the command I used:
mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Psparkr
-Dhadoop.version=2.7.0 package
Has anyone seen a similar error?
Thanks
On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.2!
-1 (non-binding)
SPARK-16017 shows a severe perf regression in YARN compared to 1.6.1.
On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.2!
>
> The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if a
> majority of at least 3 +1 PMC votes are cast.
+1 (non-binding)
On Thu, Jun 16, 2016 at 9:49 PM Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.2!
>
> The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release…
Here are some guidelines about contributing to Spark:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
There is also a section specific to MLlib:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-MLlib-specificContributionGuideline
If you have a clean test case demonstrating the desired behavior, and
a change which makes it work that way, yes make a JIRA and PR.
On Fri, Jun 17, 2016 at 1:35 AM, Luyi Wang wrote:
> Hey there:
>
> The frequent item function in the DataFrame stat package seems inaccurate.
> In the documentation, it did mention…
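(For context, freqItems implements an approximate algorithm that is allowed to return false positives, which can look like inaccuracy. A minimal sketch of the call, with made-up data and support level.)

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("freq-items").getOrCreate()
  import spark.implicits._

  val df = Seq(1, 1, 1, 2, 3).toDF("a")
  // Items appearing in at least 40% of rows; false positives are possible.
  df.stat.freqItems(Array("a"), 0.4).show()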
I think that's OK to change, yes. I don't see why it's necessary to
init log_ the way it is now. initializeLogIfNecessary() has a purpose
though.
On Fri, Jun 17, 2016 at 2:39 AM, Prajwal Tuladhar wrote:
> Hi,
>
> The way the log instance inside the Logger trait is currently being
> initialized doesn't seem…
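(A sketch of the lazy-initialization pattern being discussed, with names following the mail; Spark's actual Logging trait differs in detail.)

  import org.slf4j.{Logger, LoggerFactory}

  trait Logging {
    // Initialized lazily so the logging system can be configured once, first.
    @transient private var log_ : Logger = null

    protected def initializeLogIfNecessary(isInterpreter: Boolean): Unit = {
      // one-time logging-system setup would go here
    }

    protected def log: Logger = {
      if (log_ == null) {
        initializeLogIfNecessary(false)
        log_ = LoggerFactory.getLogger(getClass.getName)
      }
      log_
    }
  }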
Cody has graciously worked on a new connector for dstream for Kafka 0.10.
Can people that use Kafka test this connector out? The patch is at
https://github.com/apache/spark/pull/11863
Although we have stopped merging new features into branch-2.0, this
connector is very decoupled from the rest of Spark…
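(For anyone testing, a sketch of what using the 0.10 dstream connector can look like; broker, group id, and topic are placeholders, and the API in the linked PR may differ from this.)

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010._

  val ssc = new StreamingContext(new SparkConf().setAppName("kafka010-test"), Seconds(5))

  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092",   // placeholder broker
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "test-group"                 // placeholder group id
  )

  val stream = KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

  stream.map(record => record.value).print()
  ssc.start()
  ssc.awaitTermination()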
The issue has been fixed. After lots of R&D, I finally found a pretty simple
thing causing this problem.
It was related to a permission issue on the Python libraries. The user I was
logged in as did not have enough permission to read/execute the following
Python libraries:
/usr/lib/python2.7/site-pack
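(A sketch of the kind of check and fix described; the full site-packages path is assumed, since the mail is truncated.)

  # Verify the login user can read and traverse the directory (path assumed).
  ls -ld /usr/lib/python2.7/site-packages

  # Recursively grant read, plus execute on directories only (capital X).
  sudo chmod -R a+rX /usr/lib/python2.7/site-packages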