So how do I run the check locally?
On the master tree, sbt mimaReportBinaryIssues seems to report a lot of errors.
Do we need to modify SparkBuild.scala etc. to run it locally? I could not
figure out how Jenkins runs the check from its console output.
Best Regards,
Raymond Liu
You can take a look at
https://github.com/apache/spark/blob/master/dev/run-tests
and dev/mima
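For what it's worth, dev/run-tests calls the dev/mima script, which generates
an ignore list before invoking the sbt check; that setup step is probably why
running sbt mimaReportBinaryIssues on its own reports so many spurious errors,
so running ./dev/mima from the repo root should be the closest local
equivalent. If a reported incompatibility is intentional, the usual fix is to
add an exclusion. A minimal sketch in the style of MiMa's filter API follows;
the class and method names are placeholders, not real entries:

  // Hypothetical exclusion in the style of project/MimaExcludes.scala.
  // The excluded class/method below is made up for illustration.
  import com.typesafe.tools.mima.core._

  object MimaExcludes {
    val excludes: Seq[ProblemFilter] = Seq(
      // Tell MiMa this method was removed on purpose, so it should not
      // be reported as a binary incompatibility.
      ProblemFilters.exclude[MissingMethodProblem](
        "org.apache.spark.example.SomeClass.removedMethod")
    )
  }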
On Thu, Jul 10, 2014 at 12:21 AM, Liu, Raymond
wrote:
> So how do I run the check locally?
>
> On the master tree, sbt mimaReportBinaryIssues seems to report a lot of
> errors. Do we need to modify SparkBuild.scala etc. to run it locally?
When I insert data into a table that lives on Tachyon (the data is small, so
it will not be partitioned automatically), how can I control the data
placement? That is, how can I specify which machine the data should reside on?
If we cannot control it, what is the data assignment strategy of Tachyon or Spark?
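To make the scenario concrete, here is a minimal sketch of the kind of write
being described, assuming the Tachyon Hadoop-compatible filesystem client is
on the classpath; the master address and table path are placeholders:

  // Hypothetical sketch: write a small dataset to a table directory on Tachyon.
  // "master:19998" and the path are placeholders, not from the question above.
  import org.apache.spark.{SparkConf, SparkContext}

  object TachyonInsert {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("tachyon-insert"))
      val rows = sc.parallelize(Seq("row1", "row2"))
      // With a single small partition and no explicit locality hints, the
      // caller does not control which worker's Tachyon node stores the block.
      rows.saveAsTextFile("tachyon://master:19998/tables/my_table")
      sc.stop()
    }
  }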
I went ahead and created JIRAs.
JIRA for Hierarchical Clustering:
https://issues.apache.org/jira/browse/SPARK-2429
JIRA for Standardized Clustering APIs:
https://issues.apache.org/jira/browse/SPARK-2430
Before submitting a PR for the standardized API, I want to implement a
few clustering algorithms.
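To seed discussion of what a standardized API could look like, here is one
possible shape as a pure sketch; the trait and method names are hypothetical,
not taken from the JIRAs:

  // Hypothetical sketch of a shared clustering interface; all names are
  // illustrative, not from SPARK-2430.
  import org.apache.spark.rdd.RDD
  import org.apache.spark.mllib.linalg.Vector

  trait ClusteringModel extends Serializable {
    // Assign a single point to a cluster index.
    def predict(point: Vector): Int
    // Assign many points by mapping the single-point method over the RDD.
    def predict(points: RDD[Vector]): RDD[Int] = points.map(p => predict(p))
  }

  trait ClusteringAlgorithm[M <: ClusteringModel] {
    // Fit a model; hyperparameters would live on the concrete implementation.
    def run(data: RDD[Vector]): M
  }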
Might be worth checking out scikit-learn and Mahout to get some broad ideas.
On Thu, Jul 10, 2014 at 4:25 PM, RJ Nowling wrote:
> I went ahead and created JIRAs.
> JIRA for Hierarchical Clustering:
> https://issues.apache.org/jira/browse/SPARK-2429
> JIRA for Standardized Clustering APIs:
> https://issues.apache.org/jira/browse/SPARK-2430
Hi,
I've implemented a class that does Chi-squared feature selection for
RDD[LabeledPoint]. It also computes basic class/feature occurrence statistics,
and other methods such as mutual information or information gain can easily be
implemented. I would like to make a pull request. However, MLlib master
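For readers unfamiliar with the technique, here is a minimal, self-contained
sketch of per-feature chi-squared scoring over RDD[LabeledPoint]. This is not
the implementation described above; it assumes binary 0/1 labels and treats a
feature as "present" when its value is non-zero:

  // Minimal chi-squared scoring sketch; assumptions: binary labels,
  // feature presence = non-zero value. Not the class described above.
  import org.apache.spark.SparkContext._  // pair-RDD functions (pre-1.3 style)
  import org.apache.spark.rdd.RDD
  import org.apache.spark.mllib.regression.LabeledPoint

  object ChiSquaredSketch {
    // Pearson chi-squared statistic for an observed contingency table.
    def chiSquared(observed: Array[Array[Double]]): Double = {
      val total = observed.map(_.sum).sum
      val rowSums = observed.map(_.sum)
      val colSums = observed.transpose.map(_.sum)
      var chi2 = 0.0
      for (i <- observed.indices; j <- observed(i).indices) {
        val expected = rowSums(i) * colSums(j) / total
        if (expected > 0) {
          val diff = observed(i)(j) - expected
          chi2 += diff * diff / expected
        }
      }
      chi2
    }

    // Score one feature: build a 2x2 table of label x (feature non-zero).
    def scoreFeature(data: RDD[LabeledPoint], featureIndex: Int): Double = {
      val counts = data.map { p =>
        ((p.label.toInt, if (p.features(featureIndex) != 0.0) 1 else 0), 1L)
      }.reduceByKey(_ + _).collectAsMap()
      val table = Array.tabulate(2, 2) { (label, present) =>
        counts.getOrElse((label, present), 0L).toDouble
      }
      chiSquared(table)
    }
  }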
Just a heads up, we merged Prashant's work on having the sbt build read all
dependencies from Maven. Please report any issues you find on the dev list
or on JIRA.
One note here for developers: going forward, the sbt build will use the same
configuration style as the maven build (-D for options and -P for profiles).
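For example, under that convention a YARN build against a specific Hadoop
version would be selected with something like
sbt/sbt -Pyarn -Dhadoop.version=2.3.0 assembly; the profile and property names
here follow the Maven build's documented ones, so treat the exact invocation
as illustrative rather than authoritative.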
Woot!
On Thu, Jul 10, 2014 at 11:15 AM, Patrick Wendell
wrote:
> Just a heads up, we merged Prashant's work on having the sbt build read all
> dependencies from Maven. Please report any issues you find on the dev list
> or on JIRA.
>
> One note here for developers, going forward the sbt build will use the same
> configuration style as the maven build (-D for options and -P for profiles).
Cool~
On Thu, Jul 10, 2014 at 1:29 PM, Sandy Ryza wrote:
> Woot!
>
>
> On Thu, Jul 10, 2014 at 11:15 AM, Patrick Wendell
> wrote:
>
> > Just a heads up, we merged Prashant's work on having the sbt build read all
> > dependencies from Maven. Please report any issues you find on the dev list
Hi devs!
Right now it takes a non-trivial amount of time to launch EC2 clusters.
Part of this time is spent starting the EC2 instances, which is out of our
control. Another part of this time is spent installing stuff on and
configuring the instances. This, we can control.
I’d like to explore approaches
Had a few quick questions...
Just wondering: is Spark SQL expected to be thread safe on master right now?
Doing a simple Hadoop file -> RDD -> schema RDD -> write Parquet
will fail in reflection code if I run these in a thread pool.
The SparkSqlSerializer seems to create a new Kryo instance
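To make the failing pattern concrete, here is a hypothetical reproduction
using the Spark 1.0-era SQL API; the case class, paths, and pool size are all
made up:

  // Hypothetical reproduction of the threaded write described above.
  // All names and paths are illustrative.
  import java.util.concurrent.{Executors, TimeUnit}
  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  case class Record(key: Int, value: String)

  object ThreadedParquetWrite {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("threaded-sql"))
      val sqlContext = new SQLContext(sc)
      import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

      val pool = Executors.newFixedThreadPool(4)
      (1 to 4).foreach { i =>
        pool.submit(new Runnable {
          def run(): Unit = {
            val rdd = sc.textFile(s"hdfs:///input/part-$i")
              .map(line => Record(i, line))
            // Schema derivation uses Scala reflection here; this is the
            // step reported to fail under concurrency.
            rdd.saveAsParquetFile(s"hdfs:///output/part-$i")
          }
        })
      }
      pool.shutdown()
      pool.awaitTermination(10, TimeUnit.MINUTES)
      sc.stop()
    }
  }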
Hey Ian,
Thanks for bringing these up! Responses in-line:
> Just wondering: is Spark SQL expected to be thread safe on master right now?
> Doing a simple Hadoop file -> RDD -> schema RDD -> write Parquet
> will fail in reflection code if I run these in a thread pool.
>
You are probably hitting
You are partially correct.
It's not terribly complex, but also not easy to accomplish. Sounds like you
want to manage some partially/fully baked AMIs with the core Spark libs and
dependencies already on the image. The main issues that crop up are:
1) image sprawl, as libs/config/defaults/etc. change
-1. I honestly do not know the voting rules for the Spark community, so
please excuse me if I am out of line or if Mesos compatibility is not a
concern at this point.
We just tried to run this version, built against 2.3.0-cdh5.0.2, on Mesos
0.18.2. All of our jobs with data above a few gigabytes hung
Just realized the deadline was Monday, my apologies. The issue
nevertheless stands.
On Thu, Jul 10, 2014 at 9:28 PM, Gary Malouf wrote:
> -1 I honestly do not know the voting rules for the Spark community, so
> please excuse me if I am out of line or if Mesos compatibility is not a
> concern at this point.
Hey Gary,
The vote technically doesn't close until I send the vote summary
e-mail, but I was planning to close and package this tonight. It's too
bad if there is a regression; it might be worth holding the release,
but it really requires narrowing down the issue to get more
information about the scope
The function run on the worker is serialized in the driver, so the driver and
worker should run the same Python interpreter.
If you do not need C extension support, then Jython will be better than
CPython, because the cost of serialization is much lower.
Davies
There are two differences:
1. We publish hive with a shaded protobuf dependency to avoid
conflicts with some Hadoop versions.
2. We publish a proper hive-exec jar that only includes hive packages.
The upstream version of hive-exec bundles a bunch of other random
dependencies in it.
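As an illustration of what the shaded protobuf means in practice, here is the
kind of relocation rule a build can apply. This uses sbt-assembly's ShadeRule
syntax purely as an example; it is not how the Spark hive artifact is actually
published, and the target package name is made up:

  // build.sbt fragment (requires the sbt-assembly plugin). Relocates protobuf
  // classes into a private namespace so the published artifact cannot clash
  // with a Hadoop-provided protobuf. Target package name is hypothetical.
  assemblyShadeRules in assembly := Seq(
    ShadeRule.rename("com.google.protobuf.**" -> "shaded.com.google.protobuf.@1").inAll
  )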