Error when running Spark on Mesos

2014-04-02 Thread felix
I deployed Mesos and tested it using the example/test-framework script; Mesos seems OK. But when running Spark on the Mesos cluster, the Mesos slave nodes report the following exception. Can anyone help me fix this? Thanks in advance: 14/04/03 11:24:39 INFO Slf4jLogger: Slf4jLogger started 14/04/03

Re: Error when running Spark on Mesos

2014-04-03 Thread felix
You can download this tarball to replace the 0.9.0 one: wget https://github.com/apache/spark/archive/v0.9.1-rc3.tar.gz Just compile it and test it! 2014-04-03 18:41 GMT+08:00 Gino Mathews [via Apache Spark User List] < ml-node+s1001560n3702...@n3.nabble.com>: > Hi, > > > > I have installed S

what does SPARK_EXECUTOR_URI in spark-env.sh do ?

2014-04-03 Thread felix
I deployed Spark on Mesos successfully by copying the Spark tarball to all the slave nodes. I have tried removing the SPARK_EXECUTOR_URI setting in spark-env.sh and everything still works. So I want to know what the SPARK_EXECUTOR_URI setting does: does Spark 0.9.1 not use it any more, or is it my

Re: what does SPARK_EXECUTOR_URI in spark-env.sh do ?

2014-04-03 Thread felix
So, if I set this parameter, there is no need to copy the Spark tarball to every Mesos slave node? Am I right?

Fwd: Announcing ApacheCon @Home 2020

2020-07-01 Thread Felix Cheung
-- Forwarded message - We are pleased to announce that ApacheCon @Home will be held online, September 29 through October 1. More event details are available at https://apachecon.com/acah2020 but there’s a few things that I want to highlight for you, the members. Yes, the CFP has

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-05 Thread Felix Cheung
Congrats and thanks! From: Hyukjin Kwon Sent: Wednesday, March 3, 2021 4:09:23 PM To: Dongjoon Hyun Cc: Gabor Somogyi ; Jungtaek Lim ; angers zhu ; Wenchen Fan ; Kent Yao ; Takeshi Yamamuro ; dev ; user @spark Subject: Re: [ANNOUNCE] Announcing Apache Spark

SparkR parallelize not found with 1.4.1?

2015-06-24 Thread Felix C
Hi, It must be something very straightforward... Not working: parallelize(sc) Error: could not find function "parallelize" Working: df <- createDataFrame(sqlContext, localDF) What did I miss? Thanks

Re: SparkR parallelize not found with 1.4.1?

2015-06-25 Thread Felix C
Thanks! It's good to know --- Original Message --- From: "Eskilson,Aleksander" Sent: June 25, 2015 5:57 AM To: "Felix C" , user@spark.apache.org Subject: Re: SparkR parallelize not found with 1.4.1? Hi there, Parallelize is part of the RDD API which was made private

Re: installing packages with pyspark

2016-03-19 Thread Felix Cheung
mes:graphframes:0.1.0-spark1.5 which starts and gives me a REPL, but when I try from graphframes import * I get No module named graphframes. Without '--master yarn' it works as expected, thanks On 18 March 2016 at 12:59, Felix Cheung wrote: > For some, like graphframes

Re: installing packages with pyspark

2016-03-19 Thread Felix Cheung
For some, like graphframes that are Spark packages, you could also use --packages in the command line of spark-submit or pyspark. See http://spark.apache.org/docs/latest/submitting-applications.html _ From: Jakob Odersky Sent: Thursday, March 17, 2016 6:40 PM Subj

Re: GraphFrames and IPython notebook issue - No module named graphframes

2016-04-30 Thread Felix Cheung
Please see http://stackoverflow.com/questions/36397136/importing-pyspark-packages On Mon, Apr 25, 2016 at 2:39 AM -0700, "Camelia Elena Ciolac" wrote: Hello, I work locally on my laptop, not using DataBricks Community edition. I downloaded graphframes-0.1.0-spark1.6.jar from http:

RE: GraphX Java API

2016-06-08 Thread Felix Cheung
You might want to check out GraphFrames graphframes.github.io On Sun, Jun 5, 2016 at 6:40 PM -0700, "Santoshakhilesh" wrote: Ok , thanks for letting me know. Yes Since Java and scala programs ultimately runs on JVM. So the APIs written in one language can be called from other. When I h

Re: Set the node the spark driver will be started

2016-06-28 Thread Felix Massem
Hey Mich, thx for the fast reply. We are using it in cluster mode and Spark version 1.5.2 Greets Felix Felix Massem | IT-Consultant | Karlsruhe mobil: +49 (0) 172.2919848 www.codecentric.de | blog.codecentric.de

Re: Set the node the spark driver will be started

2016-06-29 Thread Felix Massem
submit a new job I got OOM exceptions which took down my Cassandra service, only to start the driver on the same node where all the other 13 drivers were running. Thx and best regards Felix Felix Massem | IT-Consultant | Karlsruhe mobil: +49 (0) 172.2919848 www.codecentric.de

Re: Set the node the spark driver will be started

2016-06-29 Thread Felix Massem
In addition, we are not using YARN; we are using standalone mode and the driver will be started with deploy-mode cluster. Thx Felix Felix Massem | IT-Consultant | Karlsruhe mobil: +49 (0) 172.2919848 www.codecentric.de | blog.codecentr

Re: Set the node the spark driver will be started

2016-06-30 Thread Felix Massem
Hey Bryan, yes this definitely sounds like the issue I have :-) Thx a lot and best regards Felix Felix Massem | IT-Consultant | Karlsruhe mobil: +49 (0) 172.2919848 www.codecentric.de | blog.codecentric.de | www

Re: Graphframe Error

2016-07-04 Thread Felix Cheung
It looks like either the extracted Python code is corrupted or there is a mismatched Python version. Are you using Python 3? stackoverflow.com/questions/514371/whats-the-bad-magic-number-error On Mon, Jul 4, 2016 at 1

Re: Graphframe Error

2016-07-05 Thread Felix Cheung
This could be the workaround: http://stackoverflow.com/a/36419857 On Tue, Jul 5, 2016 at 5:37 AM -0700, "Arun Patel" mailto:arunp.bigd...@gmail.com>> wrote: Thanks Yanbo and Felix. I tried these commands on CDH Quickstart VM and also on "Spark 1.6 pre-built for H

Re: Graphframe Error

2016-07-08 Thread Felix Cheung
I ran it with Python 2. On Thu, Jul 7, 2016 at 4:13 AM -0700, "Arun Patel" mailto:arunp.bigd...@gmail.com>> wrote: I have tied this already. It does not work. What version of Python is needed for this package? On Wed, Jul 6, 2016 at 12:45 AM, Felix Cheung m

Re: XLConnect in SparkR

2016-07-20 Thread Felix Cheung
From looking at the XLConnect package, its loadWorkbook() function only supports reading from a local file path, so you might need a way to call an HDFS command to get the file from HDFS first. SparkR currently does not support this - you could read it in as a text file (I don't think .xlsx is a t

Re: SparkR error when repartition is called

2016-08-09 Thread Felix Cheung
I think it's saying a string isn't being sent properly from the JVM side. Does it work for you if you change the dapply UDF to something simpler? Do you have any log from YARN? _ From: Shane Lee mailto:shane_y_...@yahoo.com.invalid>> Sent: Tuesday, August 9, 2016 12

Re: UDF in SparkR

2016-08-17 Thread Felix Cheung
This is supported in Spark 2.0.0 as dapply and gapply. Please see the API doc: https://spark.apache.org/docs/2.0.0/api/R/ Feedback welcome and appreciated! _ From: Yogesh Vyas mailto:informy...@gmail.com>> Sent: Tuesday, August 16, 2016 11:39 PM Subject: UDF in SparkR
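As a rough sketch of gapply under that 2.0.0 API (the grouping column, aggregate, and schema names here are only illustrative):

    library(SparkR)
    sparkR.session()

    df <- createDataFrame(iris)   # note: "." in R column names becomes "_"
    out <- gapply(df, "Species",
                  function(key, part) {
                    # key is the group value, part is a local R data.frame for that group
                    data.frame(key, mean(part$Sepal_Length), stringsAsFactors = FALSE)
                  },
                  structType(structField("Species", "string"),
                             structField("avg_sepal_length", "double")))
    head(collect(out))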

Re: pyspark unable to create UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp

2016-08-18 Thread Felix Cheung
Do you have a file called tmp at / on HDFS? On Thu, Aug 18, 2016 at 2:57 PM -0700, "Andy Davidson" mailto:a...@santacruzintegration.com>> wrote: For unknown reason I can not create UDF when I run the attached notebook on my cluster. I get the following error Py4JJavaError: An error occurr

Re: pyspark unable to create UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp

2016-08-18 Thread Felix Cheung
te UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp To: Felix Cheung mailto:felixcheun...@hotmail.com>>, user @spark mailto:user@spark.apache.org>> NICE CATCH!!! Many thanks. I spent all day on this bug The error

Re: Do existing R packages work with SparkR data frames

2015-12-23 Thread Felix Cheung
Hi, SparkR has some support for machine learning algorithms like glm. For existing R packages, currently you would need to collect to convert into an R data.frame - assuming it fits into the memory of the driver node, though that would be required to work with an R package in any case.

Re: number of executors in sparkR.init()

2015-12-25 Thread Felix Cheung
The equivalent for spark-submit --num-executors should be spark.executor.instances when used in SparkConf: http://spark.apache.org/docs/latest/running-on-yarn.html Could you try setting that with sparkR.init()? _ From: Franc Carter Sent: Friday, December 25, 2015 9
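A sketch of doing that with the pre-2.0 SparkR API on YARN (the executor count and memory below are only example values):

    library(SparkR)
    # spark.executor.instances is a YARN-only setting
    sc <- sparkR.init(master = "yarn-client",
                      sparkEnvir = list(spark.executor.instances = "4",
                                        spark.executor.memory = "2g"))
    sqlContext <- sparkRSQL.init(sc)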

Re: how to use sparkR or spark MLlib load csv file on hdfs then calculate covariance

2015-12-28 Thread Felix Cheung
Make sure you add the csv Spark package as in this example, so that the source parameter in R read.df works: https://spark.apache.org/docs/latest/sparkr.html#from-data-sources _ From: Andy Davidson Sent: Monday, December 28, 2015 10:24 AM Subject: Re: how
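For reference, a minimal sketch in the 1.x API; the spark-csv version, file path, and column names are placeholders, and the shell has to be started with --packages for the source to resolve:

    # sparkR --packages com.databricks:spark-csv_2.10:1.3.0
    library(SparkR)
    sc <- sparkR.init()
    sqlContext <- sparkRSQL.init(sc)
    df <- read.df(sqlContext, "hdfs:///path/to/data.csv",
                  source = "com.databricks.spark.csv",
                  header = "true", inferSchema = "true")
    # covariance on the driver after collecting the two numeric columns of interest
    cov(collect(select(df, "x", "y")))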

Re: sparkR ORC support.

2016-01-05 Thread Felix Cheung
Firstly I don't have ORC data to verify but this should work: df <- loadDF(sqlContext, "data/path", "orc") Secondly, could you check if sparkR.stop() was called? sparkRHive.init() should be called after sparkR.init() - please check if there is any error message there. __

Re: pyspark Dataframe and histogram through ggplot (python)

2016-01-05 Thread Felix Cheung
Hi, select() returns a new Spark DataFrame; I would imagine ggplot would not work with it. Could you try df.select("something").toPandas()? _ From: Snehotosh Banerjee Sent: Tuesday, January 5, 2016 4:32 AM Subject: pyspark Dataframe and histogram through ggplot (

Re: sparkR ORC support.

2016-01-06 Thread Felix Cheung
sparkRHive.init(sc) 2016-01-06 20:35 GMT+08:00 Sandeep Khurana : > Felix > > I tried the option suggested by you. It gave below error. I am going to > try the option suggested by Prem . > > Error in writeJobj(con, object) : invalid jobj 1 > 8 > stop("invalid jobj

RE: sparkR ORC support.

2016-01-12 Thread Felix Cheung
c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths())) library(SparkR) sc <<- sparkR.init() sc <<- sparkRHive.init() hivecontext <<- sparkRHive.init(sc) df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc") #Vie

Re: sparkR ORC support.

2016-01-12 Thread Felix Cheung
would need to call the line hivecontext <- sparkRHive.init(sc) again. _ From: Sandeep Khurana Sent: Tuesday, January 12, 2016 5:20 AM Subject: Re: sparkR ORC support. To: Felix Cheung Cc: spark users , Prem Sure , Deepak Sharma , Yanbo Liang It worked for so

RE: SparkContext SyntaxError: invalid syntax

2016-01-17 Thread Felix Cheung
Do you still need help on the PR? btw, does this apply to YARN client mode? From: andrewweiner2...@u.northwestern.edu Date: Sun, 17 Jan 2016 17:00:39 -0600 Subject: Re: SparkContext SyntaxError: invalid syntax To: cutl...@gmail.com CC: user@spark.apache.org Yeah, I do think it would be worth exp

Re: SparkR with Hive integration

2016-01-19 Thread Felix Cheung
You might need hive-site.xml _ From: Peter Zhang Sent: Monday, January 18, 2016 9:08 PM Subject: Re: SparkR with Hive integration To: Jeff Zhang Cc: Thanks,  I will try. Peter --  Google Sent with

Re: SparkContext SyntaxError: invalid syntax

2016-01-19 Thread Felix Cheung
-Pygments sudo apt-get install python-sphinx sudo gem install pygments.rb Hope that helps! If not, I can try putting together a doc change but I'd rather you could make progress :) On Mon, Jan 18, 2016 at 6:36 AM -0800, "Andrew Weiner" wrote: Hi Felix, Yeah, when I try to build the docs us

Re: NA value handling in sparkR

2016-01-27 Thread Felix Cheung
That's correct - and because spark-csv as a Spark package is not specifically aware of R's notion of NA, it interprets it as a string value. On the other hand, R native NA is converted to NULL on Spark when creating a Spark DataFrame from an R data.frame. https://eradiating.wordpress.com/2016/01/0
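A small sketch of that second point, assuming the 1.x API: an R-side NA becomes NULL once the local data.frame is turned into a Spark DataFrame:

    library(SparkR)
    sc <- sparkR.init()
    sqlContext <- sparkRSQL.init(sc)

    local_df <- data.frame(x = c(1, 2, NA), y = c("a", NA, "c"),
                           stringsAsFactors = FALSE)
    df <- createDataFrame(sqlContext, local_df)   # NA -> NULL on the Spark side
    collect(where(df, isNull(df$y)))              # rows whose y is NULL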

Re: cannot coerce class "data.frame" to a DataFrame - with spark R

2016-02-18 Thread Felix Cheung
Doesn't DESeqDataSetFromMatrix work with data.frame only? It wouldn't work with Spark's DataFrame - try collect(countMat) and others to convert them into data.frame? _ From: roni Sent: Thursday, February 18, 2016 4:55 PM Subject: cannot coerce class "data.frame
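In other words, something along these lines, assuming countMat is a SparkDataFrame small enough to fit on the driver:

    library(SparkR)
    count_local <- collect(countMat)   # pull the distributed data back as a plain R data.frame
    class(count_local)                 # "data.frame", usable by DESeqDataSetFromMatrix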

Re: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-10-06 Thread Felix Cheung
Is it possible that your user does not have permission to write the temp file? On Tue, Oct 6, 2015 at 10:26 AM -0700, "akhandeshi" wrote: It seems it is failing at path <- tempfile(pattern = "backend_port") I do not see the backend_port directory created...

Re: SparkR in yarn-client mode needs sparkr.zip

2015-10-25 Thread Felix Cheung
This might be related to https://issues.apache.org/jira/browse/SPARK-10500 On Sun, Oct 25, 2015 at 9:57 AM -0700, "Ted Yu" wrote: In zipRLibraries(): // create a zip file from scratch, do not append to existing file. val zipFile = new File(dir, name) I guess instead of creating spark

RE: HiveContext ignores ("skip.header.line.count"="1")

2015-10-26 Thread Felix Cheung
Please open a JIRA? Date: Mon, 26 Oct 2015 15:32:42 +0200 Subject: HiveContext ignores ("skip.header.line.count"="1") From: daniel.ha...@veracity-group.com To: user@spark.apache.org Hi, I have a csv table in Hive which is configured to skip the header row using TBLPROPERTIES("skip.header.line.c

Re: thought experiment: use spark ML to real time prediction

2015-11-12 Thread Felix Cheung
+1 on that. It would be useful to use the model outside of Spark. _ From: DB Tsai Sent: Wednesday, November 11, 2015 11:57 PM Subject: Re: thought experiment: use spark ML to real time prediction To: Nirmal Fernando Cc: Andy Davidson , Adrian Tanase , user @spa

RE: possible bug spark/python/pyspark/rdd.py portable_hash()

2015-11-27 Thread Felix Cheung
May I ask how you are starting Spark? It looks like PYTHONHASHSEED is being set: https://github.com/apache/spark/search?utf8=%E2%9C%93&q=PYTHONHASHSEED Date: Thu, 26 Nov 2015 11:30:09 -0800 Subject: possible bug spark/python/pyspark/rdd.py portable_hash() From: a...@santacruzintegration.com To:

Re: possible bug spark/python/pyspark/rdd.py portable_hash()

2015-11-28 Thread Felix Cheung
Ah, it's there in spark-submit and pyspark. Seems like it should be added for spark_ec2 _ From: Ted Yu Sent: Friday, November 27, 2015 11:50 AM Subject: Re: possible bug spark/python/pyspark/rdd.py portable_hash() To: Felix Cheung Cc: Andy Davidson ,

RE: possible bug spark/python/pyspark/rdd.py portable_hash()

2015-11-29 Thread Felix Cheung
HSEED")? Date: Sun, 29 Nov 2015 09:48:19 -0800 Subject: Re: possible bug spark/python/pyspark/rdd.py portable_hash() From: a...@santacruzintegration.com To: felixcheun...@hotmail.com; yuzhih...@gmail.com CC: user@spark.apache.org Hi Felix and Ted This is how I am starting spark Should I

Re: Python API Documentation Mismatch

2015-12-03 Thread Felix Cheung
Please open an issue in JIRA, thanks! On Thu, Dec 3, 2015 at 3:03 AM -0800, "Roberto Pagliari" wrote: Hello, I believe there is a mismatch between the API documentation (1.5.2) and the software currently available. Not all functions mentioned here http://spark.apache.org/docs/latest/

Re: SparkR in Spark 1.5.2 jsonFile Bug Found

2015-12-03 Thread Felix Cheung
It looks like this has been broken around Spark 1.5. Please see JIRA SPARK-10185. This has been fixed in pyspark but unfortunately SparkR was missed. I have confirmed this is still broken in Spark 1.6. Could you please open a JIRA? On Thu, Dec 3, 2015 at 2:08 PM -0800, "tomasr3" wrote:

Re: SparkR read.df failed to read file from local directory

2015-12-08 Thread Felix Cheung
Have you tried flightsDF <- read.df(sqlContext, "/home/myuser/test_data/sparkR/flights.csv", source = "com.databricks.spark.csv", header = "true")     _ From: Boyu Zhang Sent: Tuesday, December 8, 2015 8:47 AM Subject: SparkR read.df failed to read file from loc

RE: SparkR csv without headers

2015-08-21 Thread Felix Cheung
You could also rename them with names(). Unfortunately the API doc doesn't show an example of that: https://spark.apache.org/docs/latest/api/R/index.html On Thu, Aug 20, 2015 at 7:43 PM -0700, "Sun, Rui" wrote: Hi, You can create a DataFrame using load.df() with a specified schema. Something like
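A sketch of the rename approach in the 1.x API; the file path, spark-csv version, and column names are placeholders:

    # sparkR shell started with --packages com.databricks:spark-csv_2.10:1.3.0
    library(SparkR)
    sc <- sparkR.init()
    sqlContext <- sparkRSQL.init(sc)
    df <- read.df(sqlContext, "hdfs:///path/to/no_header.csv",
                  source = "com.databricks.spark.csv")   # header defaults to false
    names(df) <- c("speed", "dist")                       # assign your own column names
    head(df)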

RE: SparkR: exported functions

2015-08-26 Thread Felix Cheung
I believe that is done explicitly while the final API is being figured out. For the moment you could use DataFrame read.df() > From: csgilles...@gmail.com > Date: Tue, 25 Aug 2015 18:26:50 +0100 > Subject: SparkR: exported functions > To: user@spark.apache.org > > Hi, > > I've just started play

Fwd: Issue with building Spark v1.4.1-rc4 with Scala 2.11

2015-08-26 Thread Felix Neutatz
xception Thank you for your help. Best regards, Felix

Re: Best way to read XML data from RDD

2016-08-19 Thread Felix Cheung
Have you tried https://github.com/databricks/spark-xml ? On Fri, Aug 19, 2016 at 1:07 PM -0700, "Diwakar Dhanuskodi" mailto:diwakar.dhanusk...@gmail.com>> wrote: Hi, There is a RDD with json data. I could read json data using rdd.read.json . The json data has XML data in couple of key-valu

Re: Best way to read XML data from RDD

2016-08-19 Thread Felix Cheung
Ah. Have you tried Jackson? https://github.com/FasterXML/jackson-dataformat-xml/blob/master/README.md _ From: Diwakar Dhanuskodi mailto:diwakar.dhanusk...@gmail.com>> Sent: Friday, August 19, 2016 9:41 PM Subject: Re: Best way to read XML data from RDD To:

Re: Disable logger in SparkR

2016-08-22 Thread Felix Cheung
You should be able to do that with log4j.properties http://spark.apache.org/docs/latest/configuration.html#configuring-logging Or programmatically https://spark.apache.org/docs/2.0.0/api/R/setLogLevel.html _ From: Yogesh Vyas mailto:informy...@gmail.com>> Sent: Monday,
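A sketch of the programmatic route, assuming the Spark 2.0 SparkR API linked above:

    library(SparkR)
    sparkR.session()
    setLogLevel("WARN")   # raise the driver-side log threshold so INFO messages are dropped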

Re: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big")

2016-08-22 Thread Felix Cheung
How big is the output from score()? Also could you elaborate on what you want to broadcast? On Mon, Aug 22, 2016 at 11:58 AM -0700, "Cinquegrana, Piero" mailto:piero.cinquegr...@neustar.biz>> wrote: Hello, I am using the new R API in SparkR spark.lapply (spark 2.0). I am defining a comple
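For reference, a minimal sketch of spark.lapply in Spark 2.0; the parameter list and the summary returned are illustrative - the point being that whatever the function returns is serialized back to the driver, so it should stay small:

    library(SparkR)
    sparkR.session()

    params <- list(0.1, 0.5, 1.0)
    res <- spark.lapply(params, function(p) {
      # do the expensive work on the worker, return only a small summary
      c(param = p, value = p * 2)
    })
    str(res)   # a plain R list collected on the driver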

Re: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big")

2016-08-25 Thread Felix Cheung
rom: Cinquegrana, Piero mailto:piero.cinquegr...@neustar.biz>> Sent: Wednesday, August 24, 2016 10:37 AM Subject: RE: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big") To: Cinquegrana, Piero mailto:piero.cinquegr...@neustar.biz>>, Felix Cheung mailto:felixcheun...

Re: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big")

2016-08-25 Thread Felix Cheung
endian = "big") To: mailto:user@spark.apache.org>>, Felix Cheung mailto:felixcheun...@hotmail.com>> I tested both in local and cluster mode and the ‘<<-‘ seemed to work at least for small data. Or am I missing something? Is there a way for me to test? If that does not w

Re: PySpark: preference for Python 2.7 or Python 3.5?

2016-09-02 Thread Felix Cheung
There is an Anaconda parcel one could readily install on CDH https://docs.continuum.io/anaconda/cloudera As Sean says it is Python 2.7.x. Spark should work for both 2.7 and 3.5. _ From: Sean Owen mailto:so...@cloudera.com>> Sent: Friday, September 2, 2016 12:41 AM Su

Re: No SparkR on Mesos?

2016-09-07 Thread Felix Cheung
This is correct - SparkR is not quite working completely on Mesos. JIRAs and contributions welcome! On Wed, Sep 7, 2016 at 10:21 AM -0700, "Michael Gummelt" mailto:mgumm...@mesosphere.io>> wrote: Quite possibly. I've never used it. I know Python was "unsupported" for a while, which turne

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-10 Thread Felix Cheung
You should be able to get it to work with 2.0 as an uber jar. What type of cluster are you running on? YARN? And what distribution? On Sun, Sep 4, 2016 at 8:48 PM -0700, "Holden Karau" mailto:hol...@pigscanfly.ca>> wrote: You really shouldn't mix different versions of Spark between the master and

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Felix Cheung
Could you include code snippets you are running? On Sat, Sep 10, 2016 at 1:44 AM -0700, "Bene" mailto:benedikt.haeu...@outlook.com>> wrote: Hi, I am having a problem with the SparkR API. I need to subset a distributed data so I can extract single values from it on which I can then do calcul

Re: Assign values to existing column in SparkR

2016-09-10 Thread Felix Cheung
If you are to set a column to 0 (essentially remove and replace the existing one) you would need to put a column on the right hand side: > df <- as.DataFrame(iris) > head(df) Sepal_Length Sepal_Width Petal_Length Petal_Width Species 1 5.1 3.5 1.4 0.2 setosa 2 4.9 3.0 1.4 0.2 setosa 3 4.7 3.2 1
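A compact sketch of the same idea in the 2.0 API: the right-hand side has to be a Column expression, so multiplying an existing column by 0 works where a bare scalar would not:

    library(SparkR)
    sparkR.session()
    df <- as.DataFrame(iris)
    df$Petal_Width <- df$Petal_Width * 0   # column expression on the RHS; every value becomes 0
    head(df)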

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Felix Cheung
How are you calling dirs()? What would x be? Is dat a SparkDataFrame? With SparkR, i in dat[i, 4] should be a logical expression for rows, e.g. df[df$age %in% c(19, 30), 1:2] On Sat, Sep 10, 2016 at 11:02 AM -0700, "Bene" mailto:benedikt.haeu...@outlook.com>> wrote: Here are a few code snip

Re: questions about using dapply

2016-09-10 Thread Felix Cheung
You might need MARGIN capitalized, this example works though: c <- as.DataFrame(cars) # rename the columns to c1, c2 c <- selectExpr(c, "speed as c1", "dist as c2") cols_in <- dapplyCollect(c, function(x) {apply(x[, paste("c", 1:2, sep = "")], MARGIN=2, FUN = function(y){ y %in% c(61, 99)})}) # d

Re: SparkR error: reference is ambiguous.

2016-09-10 Thread Felix Cheung
Could you provide more information on how df in your example is created? Also please include the output from printSchema(df)? This example works: > c <- createDataFrame(cars) > c SparkDataFrame[speed:double, dist:double] > c$speed <- c$dist*0 > c SparkDataFrame[speed:double, dist:double] > head(c)

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-18 Thread Felix Cheung
ay nicely on a 1.5 standalone cluster. On Saturday, September 10, 2016, Felix Cheung mailto:felixcheun...@hotmail.com>> wrote: You should be able to get it to work with 2.0 as uber jar. What type cluster you are running on? YARN? And what distribution? On Sun, Sep 4, 2016 at 8:48 PM -0700, &

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
HBase has released support for Spark hbase.apache.org/book.html#spark And if you search you should find several alternative approaches. On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" mailto:bbuil...@gmail.com>> wrote: Does anyone know if Spark

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
tab_sql_10). _ From: Benjamin Kim mailto:bbuil...@gmail.com>> Sent: Saturday, October 8, 2016 10:40 AM Subject: Re: Spark SQL Thriftserver with HBase To: Felix Cheung mailto:felixcheun...@hotmail.com>> Cc: mailto:user@spark.apache.org>> Felix, The only alternative way is to create

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
! _ From: Benjamin Kim mailto:bbuil...@gmail.com>> Sent: Saturday, October 8, 2016 11:00 AM Subject: Re: Spark SQL Thriftserver with HBase To: Felix Cheung mailto:felixcheun...@hotmail.com>> Cc: mailto:user@spark.apache.org>> Felix, My goal is to use Spark SQL JDBC Thri

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
rsion, Kerberos support etc) _ From: Benjamin Kim mailto:bbuil...@gmail.com>> Sent: Saturday, October 8, 2016 11:26 AM Subject: Re: Spark SQL Thriftserver with HBase To: Mich Talebzadeh mailto:mich.talebza...@gmail.com>> Cc: mailto:user@spark.apache.or

Re: SparkR execution hang on when handle a RDD which is converted from DataFrame

2016-10-13 Thread Felix Cheung
How big is the metrics_moveing_detection_cube table? On Thu, Oct 13, 2016 at 8:51 PM -0700, "Lantao Jin" mailto:jinlan...@gmail.com>> wrote: sqlContext <- sparkRHive.init(sc) sqlString<- "SELECT key_id, rtl_week_beg_dt rawdate, gmv_plan_rate_amt value FROM metrics_moveing_detection_cube " df

Re: Substitute Certain Rows a data Frame using SparkR

2016-10-19 Thread Felix Cheung
It's a bit less concise but this works: > a <- as.DataFrame(cars) > head(a) speed dist 1 4 2 2 4 10 3 7 4 4 7 22 5 8 16 6 9 10 > b <- withColumn(a, "speed", ifelse(a$speed > 15, a$speed, 3)) > head(b) speed dist 1 3 2 2 3 10 3 3 4 4 3 22 5 3 16 6 3 10 I think your example could be something

Re: Issue Running sparkR on YARN

2016-11-09 Thread Felix Cheung
It may be that the Spark executor is running as a different user and it can't see where Rscript is. You might want to try adding the Rscript path to PATH. Also please see this for the config property to set for the R command to use: https://spark.apache.org/docs/latest/configuration.html#sparkr
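One way to try the config route, sketched with the 2.x session API; the property name is taken from the SparkR section of the configuration page and the path is only an example:

    library(SparkR)
    sparkR.session(sparkConfig = list(spark.r.command = "/usr/local/bin/Rscript"))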

Re: Strongly Connected Components

2016-11-10 Thread Felix Cheung
It is possible it is dead. Could you check the Spark UI to see if there is any progress? _ From: Shreya Agarwal mailto:shrey...@microsoft.com>> Sent: Thursday, November 10, 2016 12:45 AM Subject: RE: Strongly Connected Components To: mailto:user@spark.apache.org>> B

Re: How to propagate R_LIBS to sparkr executors

2016-11-17 Thread Felix Cheung
Have you tried spark.executorEnv.R_LIBS? spark.apache.org/docs/latest/configuration.html#runtime-environment _ From: Rodrick Brown mailto:rodr...@orchard-app.com>> Sent: Wednesday, November 16, 2016 1:01 PM Subject: How to propagate R_LIBS to sparkr executors To: mailt

Re: PySpark to remote cluster

2016-11-30 Thread Felix Cheung
Spark 2.0.1 is running with a different py4j library than Spark 1.6. You will probably run into other problems mixing versions though - is there a reason you can't run Spark 1.6 on the client? _ From: Klaus Schaefers mailto:klaus.schaef...@philips.com>> Sent: Wednes

Re: [GraphFrame, Pyspark] Weighted Edge in PageRank

2016-12-01 Thread Felix Cheung
That's correct - currently GraphFrame does not compute PageRank with weighted edges. _ From: Weiwei Zhang mailto:wzhan...@dons.usfca.edu>> Sent: Thursday, December 1, 2016 2:41 PM Subject: [GraphFrame, Pyspark] Weighted Edge in PageRank To: user mailto:user@spark.apac

Re: How to load edge with properties file useing GraphX

2016-12-15 Thread Felix Cheung
Have you checked out https://github.com/graphframes/graphframes? It might be easier to work with DataFrame. From: zjp_j...@163.com Sent: Thursday, December 15, 2016 7:23:57 PM To: user Subject: How to load edge with properties file useing GraphX Hi, I want t

Re: Spark Dataframe: Save to hdfs is taking long time

2016-12-15 Thread Felix Cheung
What is the format? From: KhajaAsmath Mohammed Sent: Thursday, December 15, 2016 7:54:27 PM To: user @spark Subject: Spark Dataframe: Save to hdfs is taking long time Hi, I am facing an issue while saving the dataframe back to HDFS. It's taking a long time to run.

Re: GraphFrame not init vertices when load edges

2016-12-18 Thread Felix Cheung
Can you clarify? Vertices should be another DataFrame as you can see in the example here: https://github.com/graphframes/graphframes/blob/master/docs/quick-start.md From: zjp_j...@163.com Sent: Sunday, December 18, 2016 6:25:50 PM To: user Subject: GraphFrame n

Re: GraphFrame not init vertices when load edges

2016-12-18 Thread Felix Cheung
Or this is a better link: http://graphframes.github.io/quick-start.html _ From: Felix Cheung mailto:felixcheun...@hotmail.com>> Sent: Sunday, December 18, 2016 8:46 PM Subject: Re: GraphFrame not init vertices when load edges To: mailto:zjp_j...@163.com&g

Re: GraphFrame not init vertices when load edges

2016-12-18 Thread Felix Cheung
There is not a GraphLoader for GraphFrames but you could load and convert from GraphX: http://graphframes.github.io/user-guide.html#graphx-to-graphframe From: zjp_j...@163.com Sent: Sunday, December 18, 2016 9:39:49 PM To: Felix Cheung; user Subject: Re: Re

Re: Issue with SparkR setup on RStudio

2016-12-29 Thread Felix Cheung
Any reason you are setting HADOOP_HOME? From the error it seems you are running into an issue with Hive config, likely with trying to load hive-site.xml. Could you try not setting HADOOP_HOME? From: Md. Rezaul Karim Sent: Thursday, December 29, 2016 10:24:57 AM To

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread Felix Cheung
Have you tried the spark-csv package? https://spark-packages.org/package/databricks/spark-csv From: Raymond Xie Sent: Friday, December 30, 2016 6:46:11 PM To: user@spark.apache.org Subject: How to load a big csv to dataframe in Spark 1.6 Hello, I see there is

Re: Difference in R and Spark Output

2016-12-30 Thread Felix Cheung
Could you elaborate more on the huge difference you are seeing? From: Saroj C Sent: Friday, December 30, 2016 5:12:04 AM To: User Subject: Difference in R and Spark Output Dear All, For the attached input file, there is a huge difference between the Clusters i

Re: Spark Graphx with Database

2016-12-30 Thread Felix Cheung
You might want to check out GraphFrames - to load database data (as Spark DataFrame) and build graphs with them https://github.com/graphframes/graphframes _ From: balaji9058 mailto:kssb...@gmail.com>> Sent: Monday, December 26, 2016 9:27 PM Subject: Spark Graphx with

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-31 Thread Felix Cheung
csv to dataframe in Spark 1.6 To: Felix Cheung mailto:felixcheun...@hotmail.com>> Cc: mailto:user@spark.apache.org>> Hello Felix, I followed the instruction and ran the command: > $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0 and I received the fo

Re: Issue with SparkR setup on RStudio

2017-01-02 Thread Felix Cheung
set in the Windows tests. _ From: Md. Rezaul Karim mailto:rezaul.ka...@insight-centre.org>> Sent: Monday, January 2, 2017 7:58 AM Subject: Re: Issue with SparkR setup on RStudio To: Felix Cheung mailto:felixcheun...@hotmail.com>> Cc: spark users mailto

Re: Spark GraphFrame ConnectedComponents

2017-01-04 Thread Felix Cheung
Do you have more of the exception stack? From: Ankur Srivastava Sent: Wednesday, January 4, 2017 4:40:02 PM To: user@spark.apache.org Subject: Spark GraphFrame ConnectedComponents Hi, I am trying to use the ConnectedComponent algorithm of GraphFrames but by de

Re: Spark GraphFrame ConnectedComponents

2017-01-05 Thread Felix Cheung
day, January 4, 2017 9:23 PM Subject: Re: Spark GraphFrame ConnectedComponents To: Felix Cheung mailto:felixcheun...@hotmail.com>> Cc: mailto:user@spark.apache.org>> This is the exact trace from the driver logs Exception in thread "main" java.lang.IllegalArgumentExceptio

Re: Spark GraphFrame ConnectedComponents

2017-01-05 Thread Felix Cheung
Right, I'd agree, it seems to be only with delete. Could you by chance run just the delete to see if it fails FileSystem.get(sc.hadoopConfiguration) .delete(new Path(somepath), true) From: Ankur Srivastava Sent: Thursday, January 5, 2017 10:05:03 AM To:

Re: Spark GraphFrame ConnectedComponents

2017-01-05 Thread Felix Cheung
. From: Ankur Srivastava Sent: Thursday, January 5, 2017 3:45:59 PM To: Felix Cheung; d...@spark.apache.org Cc: user@spark.apache.org Subject: Re: Spark GraphFrame ConnectedComponents Adding DEV mailing list to see if this is a defect with ConnectedComponent or if they can recommend any solution

Re: what does dapply actually do?

2017-01-18 Thread Felix Cheung
With Spark, the processing is performed lazily. This means nothing much is really happening until you call an "action" - an example of that is collect(). Another way is to write the output in a distributed manner - see write.df() in R. With SparkR dapply() passing the data from Spark to R to proce
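A sketch tying those together, assuming the 2.x API; the filter and the output path are placeholders:

    library(SparkR)
    sparkR.session()

    df  <- createDataFrame(iris)
    out <- dapply(df, function(part) part[part$Sepal_Length > 5, ], schema(df))

    # nothing has executed yet; these actions trigger the work:
    head(collect(out))                                          # bring results to the driver
    write.df(out, path = "/tmp/iris_out", source = "parquet")   # or write out in a distributed way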

Re: Creating UUID using SparksSQL

2017-01-18 Thread Felix Cheung
spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.functions.monotonically_increasing_id ? From: Ninad Shringarpure Sent: Wednesday

Re: Examples in graphx

2017-01-29 Thread Felix Cheung
Which graph are you thinking about? Here's one for neo4j: https://neo4j.com/blog/neo4j-3-0-apache-spark-connector/ From: Deepak Sharma Sent: Sunday, January 29, 2017 4:28:19 AM To: spark users Subject: Examples in graphx Hi There, Are there any examples of usi

Re: Getting exit code of pipe()

2017-02-11 Thread Felix Cheung
Do you want the job to fail if there is an error exit code? You could set checkCode to True spark.apache.org/docs/latest/api/python/pyspark.html?highlight=pipe#pyspark.RDD.pipe Otherwise maybe you want to

Re: Getting exit code of pipe()

2017-02-12 Thread Felix Cheung
ode of pipe() To: Felix Cheung mailto:felixcheun...@hotmail.com>> Cc: mailto:user@spark.apache.org>> Cool that's exactly what I was looking for! Thanks! How does one output the status into stdout? I mean, how does one capture the status output of pipe() command? On Sat, Feb 11,

Re: Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-02 Thread Felix Cheung
Interesting! From: Robert Yokota Sent: Sunday, April 2, 2017 9:40:07 AM To: user@spark.apache.org Subject: Graph Analytics on HBase with HGraphDB and Spark GraphFrames Hi, In case anyone is interested in analyzing graphs in HBase with Apache Spark GraphFrames,

Re: [sparkR] [MLlib] : Is word2vec implemented in SparkR MLlib ?

2017-04-21 Thread Felix Cheung
Not currently - how are you planning to use the output from word2vec? From: Radhwane Chebaane Sent: Thursday, April 20, 2017 4:30:14 AM To: user@spark.apache.org Subject: [sparkR] [MLlib] : Is word2vec implemented in SparkR MLlib ? Hi, I've been experimenting wi

Re: Spark SQL - Global Temporary View is not behaving as expected

2017-04-22 Thread Felix Cheung
Cross-session in this context means multiple Spark sessions from the same Spark context. Since you are running two shells, you have different Spark contexts. Do you have to use a temp view? Could you create a table? _ From: Hemanth Gudela mailto:hemanth.gud...@qva

Re: how to create List in pyspark

2017-04-28 Thread Felix Cheung
Why not use the SQL functions explode and split? They would perform better and be more stable than a UDF From: Yanbo Liang Sent: Thursday, April 27, 2017 7:34:54 AM To: Selvam Raman Cc: user Subject: Re: how to create List in pyspark You can try with UDF, like the following code s
