Please go for it!
On Friday, June 17, 2016, Pedro Rodriguez wrote:
> I would be open to working on Dataset documentation if no one else is
> already working on it. Thoughts?
>
> On Fri, Jun 17, 2016 at 11:44 PM, Cheng Lian wrote:
>
>> As mentioned in the PR description, this is just an initial PR to bring
>> existing contents up to date, so that people can add more contents
>> incrementally.
I would be open to working on Dataset documentation if no one else is
already working on it. Thoughts?
On Fri, Jun 17, 2016 at 11:44 PM, Cheng Lian wrote:
> As mentioned in the PR description, this is just an initial PR to bring
> existing contents up to date, so that people can add more contents
> incrementally.
As mentioned in the PR description, this is just an initial PR to bring
existing contents up to date, so that people can add more contents
incrementally.
We should definitely cover more about Dataset.
Cheng
On 6/17/16 10:28 PM, Pedro Rodriguez wrote:
The updates look great!
Looks like many…
The updates look great!
Looks like many places are updated to the new APIs, but there still isn't a
section for working with Datasets (most of the docs work with Dataframes).
Are you planning on adding more? I am thinking something that would address
common questions like the one I posted on the u…
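(A minimal sketch of the kind of example such a Dataset section might include, assuming the Spark 2.0 SparkSession API; the Person case class and the data are made up.)

  import org.apache.spark.sql.SparkSession

  case class Person(name: String, age: Long)   // hypothetical schema

  val spark = SparkSession.builder().appName("dataset-example").getOrCreate()
  import spark.implicits._

  // A Dataset is typed, unlike a DataFrame (which is Dataset[Row] in 2.0).
  val people = Seq(Person("Ana", 34), Person("Bo", 28)).toDS()
  people.filter(_.age >= 30).show()   // filters on Person's fields, checked at compile time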
Hey Pedro,
SQL programming guide is being updated. Here's the PR, but not merged
yet: https://github.com/apache/spark/pull/13592
Cheng
On 6/17/16 9:13 PM, Pedro Rodriguez wrote:
Hi All,
At my workplace we are starting to use Datasets in 1.6.1 and even more
with Spark 2.0 in place of Dataframes…
Dear all,
I have three questions about equality of org.apache.spark.sql.Row.
(1) If a Row has a complex type (e.g. Array), is the following behavior
expected?
If two Rows have the same array instance, Row.equals returns true in the
second assert. If two Rows have different array instances (a1 and …
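(A sketch of the scenario being asked about; a1 comes from the truncated mail, a2 and the values are assumed. In Scala, == on arrays compares references, so Rows wrapping distinct but element-wise-equal arrays can compare unequal, depending on how Row.equals walks the values in a given Spark version.)

  import org.apache.spark.sql.Row

  val a1 = Array(1, 2, 3)
  val a2 = Array(1, 2, 3)      // same contents, different instance (assumed)

  assert(Row(a1) == Row(a1))   // same array instance: equal
  println(Row(a1) == Row(a2))  // can be false: arrays compare by reference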
I am going to take a guess that this means that your partitions within an
RDD are not balanced (one or more partitions are much larger than the
rest). This would mean a single core would need to do much more work than
the rest leading to poor performance. In general, the way to fix this is to
spread…
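(A sketch of one way to confirm and fix such an imbalance; the input path and partition count are placeholders.)

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("skew-check"))
  val rdd = sc.textFile("hdfs:///some/input")   // placeholder input

  // Per-partition record counts; a few huge values mean skewed partitions.
  println(rdd.mapPartitions(it => Iterator(it.size)).collect().mkString(", "))

  // repartition shuffles records evenly across a fixed number of partitions.
  val balanced = rdd.repartition(100)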
Hi All,
At my workplace we are starting to use Datasets in 1.6.1 and even more with
Spark 2.0 in place of Dataframes. I looked at the 1.6.1 documentation then
the 2.0 documentation and it looks like not much time has been spent
writing a Dataset guide/tutorial.
Preview Docs:
https://home.apache.o
I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT
(commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's
log4j.properties is not getting picked up in the executor classpath (and
driver classpath for yarn-cluster mode), so Hadoop's log4j.properties file
is taking precedence in the YARN…
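(For reference, the commonly cited workaround is to ship a log4j.properties with the job and point both JVMs at it, roughly as below; this may not address the classpath-precedence problem being debugged here.)

  spark-submit \
    --master yarn \
    --files log4j.properties \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    ...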
Another good signal is the "target version" (which by convention is only
set by committers). When I set this for the upcoming version, it means I
think it's important enough that I will prioritize reviewing a patch for it.
On Fri, Jun 17, 2016 at 3:22 PM, Pedro Rodriguez wrote:
> What is the best way to determine what the library maintainers believe is
> important work to be done?
You can use a JIRA filter to find JIRAs of the component(s) you're
interested in.
Then sort by Priority.
Maybe comment on the JIRA if you want to work on it.
On Fri, Jun 17, 2016 at 3:22 PM, Pedro Rodriguez wrote:
> What is the best way to determine what the library maintainers believe is
> important work to be done?
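(For illustration, a filter along those lines might look like this in JQL; the component is just an example.)

  project = SPARK AND component = "SQL" AND resolution = Unresolved ORDER BY priority DESC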
What is the best way to determine what the library maintainers believe is
important work to be done?
I have looked through the JIRA and it's unclear which items are priorities
to work on. I am guessing this is in part because things are a little
hectic with final work for 2.0, but it would b…
Docker Integration Tests failed on Linux:
http://pastebin.com/Ut51aRV3
Here was the command I used:
mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Psparkr
-Dhadoop.version=2.7.0 package
Has anyone seen a similar error?
Thanks
On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.2!
-1 (non-binding)
SPARK-16017 shows a severe perf regression in YARN compared to 1.6.1.
On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.2!
>
> The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if a
> majority of at least 3 +1 PMC votes are cast.
+1 (non-binding)
On Thu, Jun 16, 2016 at 9:49 PM Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.2!
>
> The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release…
Here are some guidelines about contributing to Spark:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
There is also a section specific to MLlib:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-MLlib-specificContributionGuideline
If you have a clean test case demonstrating the desired behavior, and
a change which makes it work that way, yes make a JIRA and PR.
On Fri, Jun 17, 2016 at 1:35 AM, Luyi Wang wrote:
> Hey there:
>
> The frequent item function in the DataFrame stat package seems inaccurate.
> In the documentation, it did mention…
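(For context, freqItems implements an approximate algorithm that is allowed to return false positives, which can look like inaccuracy. A minimal sketch of the call, with made-up data and support level.)

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("freq-items").getOrCreate()
  import spark.implicits._

  val df = Seq(1, 1, 1, 2, 3).toDF("a")
  // Items appearing in at least 40% of rows; false positives are possible.
  df.stat.freqItems(Array("a"), 0.4).show()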
I think that's OK to change, yes. I don't see why it's necessary to
init log_ the way it is now. initializeLogIfNecessary() has a purpose
though.
On Fri, Jun 17, 2016 at 2:39 AM, Prajwal Tuladhar wrote:
> Hi,
>
> The way the log instance inside the Logger trait is currently being
> initialized doesn't seem…
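(A sketch of the lazy-initialization pattern being discussed, with names following the mail; Spark's actual Logging trait differs in detail.)

  import org.slf4j.{Logger, LoggerFactory}

  trait Logging {
    // Initialized lazily so the logging system can be configured once, first.
    @transient private var log_ : Logger = null

    protected def initializeLogIfNecessary(isInterpreter: Boolean): Unit = {
      // one-time logging-system setup would go here
    }

    protected def log: Logger = {
      if (log_ == null) {
        initializeLogIfNecessary(false)
        log_ = LoggerFactory.getLogger(getClass.getName)
      }
      log_
    }
  }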
Cody has graciously worked on a new connector for dstream for Kafka 0.10.
Can people that use Kafka test this connector out? The patch is at
https://github.com/apache/spark/pull/11863
Although we have stopped merging new features into branch-2.0, this
connector is very decoupled from the rest of Spark…
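(For anyone testing, a sketch of what using the 0.10 dstream connector can look like; broker, group id, and topic are placeholders, and the API in the linked PR may differ from this.)

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010._

  val ssc = new StreamingContext(new SparkConf().setAppName("kafka010-test"), Seconds(5))

  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092",   // placeholder broker
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "test-group"                 // placeholder group id
  )

  val stream = KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

  stream.map(record => record.value).print()
  ssc.start()
  ssc.awaitTermination()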
The issue has been fixed. After lots of R&D, I finally found a pretty simple
thing causing this problem.
It was related to a permission issue on the Python libraries. The user I was
logged in as did not have enough permission to read/execute the following
Python libraries:
/usr/lib/python2.7/site-pack
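(A sketch of the kind of check and fix described; the full site-packages path is assumed, since the mail is truncated.)

  # Verify the login user can read and traverse the directory (path assumed).
  ls -ld /usr/lib/python2.7/site-packages

  # Recursively grant read, plus execute on directories only (capital X).
  sudo chmod -R a+rX /usr/lib/python2.7/site-packages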