So essentially the driver/client program needs to explicitly have two
threads to ensure concurrency?
What happens when the program is sequential, i.e. I execute function A
and then function B? Does this mean that each RDD first goes through
function A, and then stream X is persisted, but process
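A minimal sketch of the two-thread pattern being asked about, assuming a plain
batch job in spark-shell (so `sc` already exists) rather than the streaming case:
actions submitted from separate threads can run as concurrent Spark jobs, while
calling them one after the other on the driver thread runs them sequentially.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

val data = sc.parallelize(1 to 1000000).cache()

// "function A" and "function B" submitted from two separate threads,
// so the resulting jobs can overlap on the cluster
val jobA = Future { data.map(_ * 2).count() }
val jobB = Future { data.filter(_ % 3 == 0).count() }

Await.result(jobA, Duration.Inf)
Await.result(jobB, Duration.Inf)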
Thanks, I will read up on that.
On Sat, Oct 24, 2015 at 12:53 PM, Ted Yu wrote:
> The code below was introduced by SPARK-7673 / PR #6225
>
> See item #1 in the description of the PR.
>
> Cheers
>
> On Sat, Oct 24, 2015 at 12:59 AM, Koert Kuipers wrote:
>
>> the code that seems to flatMap director
If you run sparkR in yarn-client mode, it fails with
Exception in thread "main" java.io.FileNotFoundException:
/usr/hdp/2.3.2.1-12/spark/R/lib/sparkr.zip (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:27
I have not been able to start the Spark Scala shell since 1.5, as it was not able
to create the sqlContext during startup. It complains that the metastore_db
is already locked: "Another instance of Derby may have already booted the
database". The Derby log is attached.
I only have this problem with star
Hi Bilnmek,
Spark 1.5.x does not support Scala 2.11.7, so the easiest thing to do is to
build it yourself, as you are trying. Here are the steps I followed to build it on a
Mac OS X 10.10.5 environment; it should be very similar on Ubuntu.
1. Set the JAVA_HOME environment variable in my bash session via export
JA
Have you taken a look at the fix for SPARK-11000, which is in the upcoming
1.6.0 release?
Cheers
On Sun, Oct 25, 2015 at 8:42 AM, Yao wrote:
> I have not been able to start Spark scala shell since 1.5 as it was not
> able
> to create the sqlContext during the startup. It complains the metastore
Thanks. I wonder why this is not widely reported in the user forum. The REPL
shell is basically broken in 1.5.0 and 1.5.1.
-Yao
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, October 25, 2015 12:01 PM
To: Ge, Yao (Y.)
Cc: user
Subject: Re: Spark scala REPL - Unable to create sqlContext
Hm, why do you say it doesn't support 2.11? It does.
It is not even this difficult; you just need a source distribution,
and then run "./dev/change-scala-version.sh 2.11" as you say. Then
build as normal
On Sun, Oct 25, 2015 at 4:00 PM, Todd Nist wrote:
> Hi Bilnmek,
>
> Spark 1.5.x does not sup
In zipRLibraries():
// create a zip file from scratch, do not append to existing file.
val zipFile = new File(dir, name)
I guess instead of creating sparkr.zip in the same directory as R lib, the
zip file can be created under some directory writable by the user launching
the app and acces
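A rough sketch of that alternative (not the actual SPARK-7673 code): write
sparkr.zip into a scratch directory that the submitting user can always write
to, instead of into the R lib directory. The method name here is hypothetical
and the zipping itself is elided.

import java.io.File
import java.nio.file.Files

def zipRLibrariesToScratch(rLibDir: File, name: String): File = {
  // per-user temp directory, e.g. /tmp/sparkr-..., always writable by the launcher
  val scratchDir = Files.createTempDirectory("sparkr-").toFile
  val zipFile = new File(scratchDir, name)
  // ... zip the contents of rLibDir into zipFile from scratch, as before ...
  zipFile
}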
Dear All,
I have a program (below) which leaves me very much confused, as it is quite
inscrutable. It is about a multi-dimensional linear regression model: the weight / coefficient
is always perfect while the dimension is smaller than 4, but otherwise it is wrong
all the time. Or, whether the LinearRegressionWi
Thank you for the quick reply. You are a godsend. I have not been
programming in Java for a long time, and I know nothing about the Maven, Scala, sbt and Spark stuff.
I used Java 7 since the build failed with Java 8. Which Java version do you
advise in general for using Spark? I can downgrade the Scala version as well. Can
you adv
A dependency couldn't be downloaded:
[INFO] +- com.h2database:h2:jar:1.4.183:test
Have you checked your network settings ?
Cheers
On Sun, Oct 25, 2015 at 10:22 AM, Bilinmek Istemiyor
wrote:
> Thank you for the quick reply. You are God Send. I have long not been
> programming in java, nothing
Sorry Sean, you are absolutely right, it supports 2.11; all I meant is that there
is no release available as a standard download and that one has to build
it. Thanks for the clarification.
-Todd
On Sunday, October 25, 2015, Sean Owen wrote:
> Hm, why do you say it doesn't support 2.11? It does.
>
> It
When I try to start up sbt for the Spark build, or if I try to import it
in IntelliJ IDEA as an sbt project, it fails with a "No such file or
directory" error when it attempts to "git clone" sbt-pom-reader into
.sbt/0.13/staging/some-sha1-hash.
If I manually create the expected directory before r
By "it works", I mean, "It gets past that particular error". It still fails
several minutes later with a different error:
java.lang.IllegalStateException: impossible to get artifacts when data has
not been loaded. IvyNode = org.scala-lang#scala-library;2.10.3
On Sun, Oct 25, 2015 at 3:38 PM, Ric
Also, if I run the Maven build on Windows or Linux without setting
-DskipTests=true, it hangs indefinitely when it gets to
org.apache.spark.JavaAPISuite.
It's hard to test patches when the build doesn't work. :-/
On Sun, Oct 25, 2015 at 3:41 PM, Richard Eggert
wrote:
> By "it works", I mean, "I
If you have a pull request, Jenkins can test your change for you.
FYI
> On Oct 25, 2015, at 12:43 PM, Richard Eggert wrote:
>
> Also, if I run the Maven build on Windows or Linux without setting
> -DskipTests=true, it hangs indefinitely when it gets to
> org.apache.spark.JavaAPISuite.
>
>
Yes, I know, but it would be nice to be able to test things myself before I
push commits.
On Sun, Oct 25, 2015 at 3:50 PM, Ted Yu wrote:
> If you have a pull request, Jenkins can test your change for you.
>
> FYI
>
> On Oct 25, 2015, at 12:43 PM, Richard Eggert
> wrote:
>
> Also, if I run the M
No, 2.11 artifacts are in fact published:
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-parent_2.11%22
On Sun, Oct 25, 2015 at 7:37 PM, Todd Nist wrote:
> Sorry Sean you are absolutely right it supports 2.11 all o meant is there is
> no release available as a standard download and that
LinearRegressionWithSGD is not stable. Please use the linear regression in the
ML package instead.
http://spark.apache.org/docs/latest/ml-linear-methods.html
Sincerely,
DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Sun, Oct 25,
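For reference, a minimal sketch of the ml-package API being recommended,
assuming Spark 1.5.x and a DataFrame `training` with the usual "label" and
"features" columns:

import org.apache.spark.ml.regression.LinearRegression

val lr = new LinearRegression()
  .setMaxIter(100)
  .setRegParam(0.0)
  .setElasticNetParam(0.0)
val model = lr.fit(training)
// in 1.5.x the fitted model exposes weights and intercept
println(s"weights: ${model.weights}, intercept: ${model.intercept}")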
I have the issue resolved. In this case the hostname of my machine was configured
to a public domain that resolves to the EC2 machine's public IP, and it's not allowed to
bind to an elastic IP. I changed the hostname to Amazon's private hostname
(ip-72-xxx-xxx) and then it works.
This might be related to https://issues.apache.org/jira/browse/SPARK-10500
On Sun, Oct 25, 2015 at 9:57 AM -0700, "Ted Yu" wrote:
In zipRLibraries():
// create a zip file from scratch, do not append to existing file.
val zipFile = new File(dir, name)
I guess instead of creating spark
Ted Yu,
Agree that either picking up sparkr.zip if it already exists, or creating a
zip in a local scratch directory will work. This code is called by the
client side job submission logic and the resulting zip is already added to
the local resources for the YARN job, so I don't think the directory
Felix,
Missed your reply - agree looks like the same issue, resolved mine as
Duplicate.
Thanks!
Ram
On Sun, Oct 25, 2015 at 2:47 PM, Felix Cheung
wrote:
>
>
> This might be related to https://issues.apache.org/jira/browse/SPARK-10500
>
>
>
> On Sun, Oct 25, 2015 at 9:57 AM -0700, "Ted Yu"
> w
So yes, the individual artifacts are released; however, there is no
deployable bundle prebuilt for Spark 1.5.1 and Scala 2.11.7, something
like spark-1.5.1-bin-hadoop-2.6_scala-2.11.tgz. The Spark site even
states this:
*Note: Scala 2.11 users should download the Spark source package and
build wi
Hi guys,
After waiting for a day, it actually causes an OOM on the Spark driver. I
configured the driver to have 6GB. Note that I didn't call refresh myself;
the method was called when saving the DataFrame in Parquet format. Also I'm
using partitionBy() on the DataFrameWriter to generate over 1 millio
Hi spark guys,
I think I hit the same issue as SPARK-8890
(https://issues.apache.org/jira/browse/SPARK-8890). It is marked as resolved;
however, it is not. I have over a million output directories for a single
column in partitionBy. Not sure if this is a regression issue? Do I need to
set some parameter
Hi Jerry,
Do you have speculation enabled? A write which produces one million files /
output partitions might be using tons of driver memory via the
OutputCommitCoordinator's bookkeeping data structures.
On Sun, Oct 25, 2015 at 5:50 PM, Jerry Lam wrote:
> Hi spark guys,
>
> I think I hit the sa
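To make that concrete, a small sketch of checking the speculation setting and
of the write pattern under discussion; `sc` and the DataFrame `df` are assumed
from the surrounding thread, and the partition column name is made up.

// speculative attempts multiply the commit bookkeeping the driver
// has to hold for a job with ~1M output partitions
println(sc.getConf.getBoolean("spark.speculation", defaultValue = false))

df.write
  .partitionBy("eventId")   // ~1M distinct values => ~1M output directories
  .parquet("hdfs:///tmp/example-output")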
Hi DB Tsai,
Thanks very much for your kind reply and help.
As for your comment, I just modified and tested the key part of the code:
LinearRegression lr = new LinearRegression()
    .setMaxIter(1)
    .setRegParam(0)
    .setElasticNetParam(0); // the number could be reset
final Linear
As documented in
http://spark.apache.org/docs/latest/configuration.html#available-properties,
Note for “spark.driver.memory”:
Note: In client mode, this config must not be set through the SparkConf
directly in your application, because the driver JVM has already started at
that point. Instead, p
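A short illustration of the point quoted above: in client mode the value must
be supplied before the driver JVM starts, so setting it on the SparkConf inside
the application is too late.

import org.apache.spark.SparkConf

// has no effect in client mode -- the driver JVM is already running
val conf = new SparkConf().set("spark.driver.memory", "6g")

// instead pass it at launch time, e.g.
//   spark-submit --driver-memory 6g ...
// or set spark.driver.memory in conf/spark-defaults.conf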
Column 4 is always constant, so it has no predictive power, resulting in a zero weight.
On Sunday, October 25, 2015, Zhiliang Zhu wrote:
> Hi DB Tsai,
>
> Thanks very much for your kind reply help.
>
> As for your comment, I just modified and tested the key part of the codes:
>
> LinearRegression lr = new L
Hi Josh,
No, I don't have speculation enabled. The driver ran for a few hours until
it went OOM. Interestingly, all partitions were generated successfully
(the _SUCCESS file is written in the output directory). Is there a reason why
the driver needs so much memory? The jstack revealed that it called ref
Hi guys,
I mentioned that the partitions are generated, so I tried to read the
partition data from them. The driver went OOM after a few minutes. The stack
trace is below. It looks very similar to the jstack above (note the
refresh method). Thanks!
Name: java.lang.OutOfMemoryError
Message: GC ove
Hi DB Tsai,
Thanks very much for your kind help. I get it now.
I am sorry that there is another issue: the weight/coefficient result is
perfect while A is a triangular matrix; however, while A is not a triangular matrix
(but is transformed from a triangular matrix, and is still invertible), the result seems
Hi,
Does the use of a custom partitioner in Streaming affect performance?
On Mon, Oct 5, 2015 at 1:06 PM, Adrian Tanase wrote:
> Great article, especially the use of a custom partitioner.
>
> Also, sorting by multiple fields by creating a tuple out of them is an
> awesome, easy to miss, Scala fea
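For concreteness, a minimal custom Partitioner of the kind such an article
describes, assuming String keys; whether it helps performance in Streaming
mostly depends on how much downstream shuffling it avoids.

import org.apache.spark.Partitioner

class FirstCharPartitioner(override val numPartitions: Int) extends Partitioner {
  // route keys by their first character; non-String or empty keys go to partition 0
  def getPartition(key: Any): Int = key match {
    case s: String if s.nonEmpty => s.head.toInt % numPartitions
    case _ => 0
  }
}

// usage on a keyed RDD/DStream, e.g.:
//   pairs.reduceByKey(new FirstCharPartitioner(8), _ + _)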
Please add "setFitIntercept(false)" to your LinearRegression.
LinearRegression by default includes an intercept in the model, e.g.
label = intercept + features dot weight
To get the result you want, you need to force the intercept to be zero.
Just curious, are you trying to solve systems of line
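A Scala rendering of the earlier snippet with that change applied:

import org.apache.spark.ml.regression.LinearRegression

// intercept forced to zero so the model fits label = features dot weight
val lr = new LinearRegression()
  .setMaxIter(1)          // as in the earlier snippet; normally set higher
  .setRegParam(0.0)
  .setElasticNetParam(0.0)
  .setFitIntercept(false)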
We are using Spark 1.5.1 with `--master yarn`; the YARN RM is running in HA mode.
[Screenshots/logs referenced here: direct visit, click ApplicationMaster link, YARN RM log]
Embedded Derby, which Hive/Spark SQL uses as the default metastore, only
supports a single user at a time. Until this issue is fixed, you could use
another metastore that supports multiple concurrent users (e.g. networked
Derby or MySQL) to get around it.
On 25 October 2015 at 16:15, Ge, Yao (Y.) w
1. You can call any API that returns the hostname in your map
function. Here's a simplified example; you would generally use
mapPartitions, as it will save the overhead of retrieving the hostname multiple
times (a completed sketch follows below):

import scala.sys.process._
val distinctHosts = sc.paralleli
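A completed version of the sketch above, assuming `sc` from spark-shell; the
hostname is shelled out once per partition via scala.sys.process.

import scala.sys.process._

val distinctHosts = sc.parallelize(1 to 1000, 8)
  .mapPartitions { iter =>
    val host = "hostname".!!.trim   // runs once per partition, not per record
    iter.map(_ => host)
  }
  .distinct()
  .collect()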