So essentially the driver/client program needs to explicitly have two
threads to ensure concurrency?
What happens when the program is sequential, i.e. I execute function A
and then function B? Does this mean that each RDD first goes through
function A, and then stream X is persisted, but process
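A minimal sketch of the two-thread pattern being asked about, assuming a plain
batch job in spark-shell (so `sc` already exists) rather than the streaming case:
actions submitted from separate threads can run as concurrent Spark jobs, while
calling them one after the other on the driver thread runs them sequentially.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

val data = sc.parallelize(1 to 1000000).cache()

// "function A" and "function B" submitted from two separate threads,
// so the resulting jobs can overlap on the cluster
val jobA = Future { data.map(_ * 2).count() }
val jobB = Future { data.filter(_ % 3 == 0).count() }

Await.result(jobA, Duration.Inf)
Await.result(jobB, Duration.Inf)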
Thanks, I will read up on that.
On Sat, Oct 24, 2015 at 12:53 PM, Ted Yu wrote:
> The code below was introduced by SPARK-7673 / PR #6225
>
> See item #1 in the description of the PR.
>
> Cheers
>
> On Sat, Oct 24, 2015 at 12:59 AM, Koert Kuipers wrote:
>
>> the code that seems to flatMap director
If you run sparkR in yarn-client mode, it fails with
Exception in thread "main" java.io.FileNotFoundException:
/usr/hdp/2.3.2.1-12/spark/R/lib/sparkr.zip (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:27
I have not been able to start the Spark Scala shell since 1.5, as it was not able
to create the sqlContext during startup. It complains that the metastore_db
is already locked: "Another instance of Derby may have already booted the
database". The Derby log is attached.
I only have this problem with star
Hi Bilnmek,
Spark 1.5.x does not support Scala 2.11.7, so the easiest thing to do is to
build it yourself, as you are trying. Here are the steps I followed to build it on a
Mac OS X 10.10.5 environment; it should be very similar on Ubuntu.
1. Set the JAVA_HOME environment variable in my bash session via export
JA
Have you taken a look at the fix for SPARK-11000, which is in the upcoming
1.6.0 release?
Cheers
On Sun, Oct 25, 2015 at 8:42 AM, Yao wrote:
> I have not been able to start Spark scala shell since 1.5 as it was not
> able
> to create the sqlContext during the startup. It complains the metastore
Thanks. I wonder why this is not widely reported in the user forum. The REPL
shell is basically broken in 1.5.0 and 1.5.1.
-Yao
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, October 25, 2015 12:01 PM
To: Ge, Yao (Y.)
Cc: user
Subject: Re: Spark scala REPL - Unable to create sqlContext
Hm, why do you say it doesn't support 2.11? It does.
It is not even this difficult; you just need a source distribution,
and then run "./dev/change-scala-version.sh 2.11" as you say. Then
build as normal
On Sun, Oct 25, 2015 at 4:00 PM, Todd Nist wrote:
> Hi Bilnmek,
>
> Spark 1.5.x does not sup
In zipRLibraries():
// create a zip file from scratch, do not append to existing file.
val zipFile = new File(dir, name)
I guess instead of creating sparkr.zip in the same directory as R lib, the
zip file can be created under some directory writable by the user launching
the app and acces
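A rough sketch of that alternative (not the actual SPARK-7673 code): write
sparkr.zip into a scratch directory that the submitting user can always write
to, instead of into the R lib directory. The method name here is hypothetical
and the zipping itself is elided.

import java.io.File
import java.nio.file.Files

def zipRLibrariesToScratch(rLibDir: File, name: String): File = {
  // per-user temp directory, e.g. /tmp/sparkr-..., always writable by the launcher
  val scratchDir = Files.createTempDirectory("sparkr-").toFile
  val zipFile = new File(scratchDir, name)
  // ... zip the contents of rLibDir into zipFile from scratch, as before ...
  zipFile
}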
Dear All,
I have a program (below) which leaves me very much confused, as it is quite
inscrutable. It is about a multi-dimensional linear regression model: the weight / coefficient
is always perfect while the dimension is smaller than 4, but otherwise it is wrong
all the time. Or, whether the LinearRegressionWi
Thank you for the quick reply. You are a godsend. I have not been
programming in Java for a long time, and I know nothing about the Maven, Scala, sbt and Spark stuff.
I used Java 7 since the build failed with Java 8. Which Java version do you
advise in general for using Spark? I can downgrade the Scala version as well. Can
you adv
A dependency couldn't be downloaded:
[INFO] +- com.h2database:h2:jar:1.4.183:test
Have you checked your network settings ?
Cheers
On Sun, Oct 25, 2015 at 10:22 AM, Bilinmek Istemiyor
wrote:
> Thank you for the quick reply. You are God Send. I have long not been
> programming in java, nothing
Sorry Sean, you are absolutely right, it supports 2.11; all I meant is that there
is no release available as a standard download and that one has to build
it. Thanks for the clarification.
-Todd
On Sunday, October 25, 2015, Sean Owen wrote:
> Hm, why do you say it doesn't support 2.11? It does.
>
> It
When I try to start up sbt for the Spark build, or if I try to import it
in IntelliJ IDEA as an sbt project, it fails with a "No such file or
directory" error when it attempts to "git clone" sbt-pom-reader into
.sbt/0.13/staging/some-sha1-hash.
If I manually create the expected directory before r
By "it works", I mean, "It gets past that particular error". It still fails
several minutes later with a different error:
java.lang.IllegalStateException: impossible to get artifacts when data has
not been loaded. IvyNode = org.scala-lang#scala-library;2.10.3
On Sun, Oct 25, 2015 at 3:38 PM, Ric
Also, if I run the Maven build on Windows or Linux without setting
-DskipTests=true, it hangs indefinitely when it gets to
org.apache.spark.JavaAPISuite.
It's hard to test patches when the build doesn't work. :-/
On Sun, Oct 25, 2015 at 3:41 PM, Richard Eggert
wrote:
> By "it works", I mean, "I
If you have a pull request, Jenkins can test your change for you.
FYI
> On Oct 25, 2015, at 12:43 PM, Richard Eggert wrote:
>
> Also, if I run the Maven build on Windows or Linux without setting
> -DskipTests=true, it hangs indefinitely when it gets to
> org.apache.spark.JavaAPISuite.
>
>
Yes, I know, but it would be nice to be able to test things myself before I
push commits.
On Sun, Oct 25, 2015 at 3:50 PM, Ted Yu wrote:
> If you have a pull request, Jenkins can test your change for you.
>
> FYI
>
> On Oct 25, 2015, at 12:43 PM, Richard Eggert
> wrote:
>
> Also, if I run the M
No, 2.11 artifacts are in fact published:
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-parent_2.11%22
On Sun, Oct 25, 2015 at 7:37 PM, Todd Nist wrote:
> Sorry Sean you are absolutely right it supports 2.11 all o meant is there is
> no release available as a standard download and that
LinearRegressionWithSGD is not stable. Please use the linear regression in the
ML package instead.
http://spark.apache.org/docs/latest/ml-linear-methods.html
Sincerely,
DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Sun, Oct 25,
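For reference, a minimal sketch of the ml-package API being recommended,
assuming Spark 1.5.x and a DataFrame `training` with the usual "label" and
"features" columns:

import org.apache.spark.ml.regression.LinearRegression

val lr = new LinearRegression()
  .setMaxIter(100)
  .setRegParam(0.0)
  .setElasticNetParam(0.0)
val model = lr.fit(training)
// in 1.5.x the fitted model exposes weights and intercept
println(s"weights: ${model.weights}, intercept: ${model.intercept}")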
I have the issue resolved. In this case the hostname of my machine was configured
to a public domain that resolves to the EC2 machine's public IP, and it's not allowed to
bind to an elastic IP. I changed the hostname to Amazon's private hostname
(ip-72-xxx-xxx) and then it works.
This might be related to https://issues.apache.org/jira/browse/SPARK-10500
On Sun, Oct 25, 2015 at 9:57 AM -0700, "Ted Yu" wrote:
In zipRLibraries():
// create a zip file from scratch, do not append to existing file.
val zipFile = new File(dir, name)
I guess instead of creating spark
Ted Yu,
Agree that either picking up sparkr.zip if it already exists, or creating a
zip in a local scratch directory will work. This code is called by the
client side job submission logic and the resulting zip is already added to
the local resources for the YARN job, so I don't think the directory
Felix,
Missed your reply - agree looks like the same issue, resolved mine as
Duplicate.
Thanks!
Ram
On Sun, Oct 25, 2015 at 2:47 PM, Felix Cheung
wrote:
>
>
> This might be related to https://issues.apache.org/jira/browse/SPARK-10500
>
>
>
> On Sun, Oct 25, 2015 at 9:57 AM -0700, "Ted Yu"
> w
So yes, the individual artifacts are released; however, there is no
deployable bundle prebuilt for Spark 1.5.1 and Scala 2.11.7, something
like spark-1.5.1-bin-hadoop-2.6_scala-2.11.tgz. The Spark site even
states this:
*Note: Scala 2.11 users should download the Spark source package and
build wi
Hi guys,
After waiting for a day, it actually causes an OOM on the Spark driver. I
configured the driver to have 6GB. Note that I didn't call refresh myself;
the method was called when saving the DataFrame in Parquet format. Also I'm
using partitionBy() on the DataFrameWriter to generate over 1 millio
Hi spark guys,
I think I hit the same issue as SPARK-8890
(https://issues.apache.org/jira/browse/SPARK-8890). It is marked as resolved;
however, it is not. I have over a million output directories for a single
column in partitionBy. Not sure if this is a regression issue? Do I need to
set some parameter
Hi Jerry,
Do you have speculation enabled? A write which produces one million files /
output partitions might be using tons of driver memory via the
OutputCommitCoordinator's bookkeeping data structures.
On Sun, Oct 25, 2015 at 5:50 PM, Jerry Lam wrote:
> Hi spark guys,
>
> I think I hit the sa
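To make that concrete, a small sketch of checking the speculation setting and
of the write pattern under discussion; `sc` and the DataFrame `df` are assumed
from the surrounding thread, and the partition column name is made up.

// speculative attempts multiply the commit bookkeeping the driver
// has to hold for a job with ~1M output partitions
println(sc.getConf.getBoolean("spark.speculation", defaultValue = false))

df.write
  .partitionBy("eventId")   // ~1M distinct values => ~1M output directories
  .parquet("hdfs:///tmp/example-output")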
Hi DB Tsai,
Thanks very much for your kind reply and help.
As for your comment, I just modified and tested the key part of the code:
LinearRegression lr = new LinearRegression()
    .setMaxIter(1)
    .setRegParam(0)
    .setElasticNetParam(0); // the number could be reset
final Linear
As documented in
http://spark.apache.org/docs/latest/configuration.html#available-properties,
Note for “spark.driver.memory”:
Note: In client mode, this config must not be set through the SparkConf
directly in your application, because the driver JVM has already started at
that point. Instead, p
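A short illustration of the point quoted above: in client mode the value must
be supplied before the driver JVM starts, so setting it on the SparkConf inside
the application is too late.

import org.apache.spark.SparkConf

// has no effect in client mode -- the driver JVM is already running
val conf = new SparkConf().set("spark.driver.memory", "6g")

// instead pass it at launch time, e.g.
//   spark-submit --driver-memory 6g ...
// or set spark.driver.memory in conf/spark-defaults.conf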
Column 4 is always constant, so it has no predictive power, resulting in a zero weight.
On Sunday, October 25, 2015, Zhiliang Zhu wrote:
> Hi DB Tsai,
>
> Thanks very much for your kind reply help.
>
> As for your comment, I just modified and tested the key part of the codes:
>
> LinearRegression lr = new L
Hi Josh,
No, I don't have speculation enabled. The driver ran for a few hours until
it went OOM. Interestingly, all partitions were generated successfully
(the _SUCCESS file is written in the output directory). Is there a reason why
the driver needs so much memory? The jstack revealed that it called ref
Hi guys,
I mentioned that the partitions are generated, so I tried to read the
partition data from them. The driver went OOM after a few minutes. The stack
trace is below. It looks very similar to the jstack above (note the
refresh method). Thanks!
Name: java.lang.OutOfMemoryError
Message: GC ove
Hi DB Tsai,
Thanks very much for your kind help. I get it now.
I am sorry that there is another issue: the weight/coefficient result is
perfect while A is a triangular matrix; however, while A is not a triangular matrix
(but is transformed from a triangular matrix, and is still invertible), the result seems
Hi,
Does the use of a custom partitioner in Streaming affect performance?
On Mon, Oct 5, 2015 at 1:06 PM, Adrian Tanase wrote:
> Great article, especially the use of a custom partitioner.
>
> Also, sorting by multiple fields by creating a tuple out of them is an
> awesome, easy to miss, Scala fea
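For concreteness, a minimal custom Partitioner of the kind such an article
describes, assuming String keys; whether it helps performance in Streaming
mostly depends on how much downstream shuffling it avoids.

import org.apache.spark.Partitioner

class FirstCharPartitioner(override val numPartitions: Int) extends Partitioner {
  // route keys by their first character; non-String or empty keys go to partition 0
  def getPartition(key: Any): Int = key match {
    case s: String if s.nonEmpty => s.head.toInt % numPartitions
    case _ => 0
  }
}

// usage on a keyed RDD/DStream, e.g.:
//   pairs.reduceByKey(new FirstCharPartitioner(8), _ + _)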
Please add "setFitIntercept(false)" to your LinearRegression.
LinearRegression by default includes an intercept in the model, e.g.
label = intercept + features dot weight
To get the result you want, you need to force the intercept to be zero.
Just curious, are you trying to solve systems of line
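A Scala rendering of the earlier snippet with that change applied:

import org.apache.spark.ml.regression.LinearRegression

// intercept forced to zero so the model fits label = features dot weight
val lr = new LinearRegression()
  .setMaxIter(1)          // as in the earlier snippet; normally set higher
  .setRegParam(0.0)
  .setElasticNetParam(0.0)
  .setFitIntercept(false)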
We are using Spark 1.5.1 with `--master yarn`; the YARN RM is running in HA mode.
[Screenshots/logs referenced here: direct visit, click ApplicationMaster link, YARN RM log]
Embedded Derby, which Hive/Spark SQL uses as the default metastore, only
supports a single user at a time. Until this issue is fixed, you could use
another metastore that supports multiple concurrent users (e.g. networked
Derby or MySQL) to get around it.
On 25 October 2015 at 16:15, Ge, Yao (Y.) w
1. You can call any API that returns the hostname in your map
function. Here's a simplified example; you would generally use
mapPartitions, as it will save the overhead of retrieving the hostname multiple
times (a completed sketch follows below):

import scala.sys.process._
val distinctHosts = sc.paralleli
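A completed version of the sketch above, assuming `sc` from spark-shell; the
hostname is shelled out once per partition via scala.sys.process.

import scala.sys.process._

val distinctHosts = sc.parallelize(1 to 1000, 8)
  .mapPartitions { iter =>
    val host = "hostname".!!.trim   // runs once per partition, not per record
    iter.map(_ => host)
  }
  .distinct()
  .collect()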