Re: Spark-shell's performance

2017-04-19 Thread Yan Facai
Hi, Hanson. Perhaps I'm digressing here. If I'm wrong or mistaken, please correct me. SPARK_WORKER_* is the configuration for the whole cluster, and it's fine to write those global variables in spark-env.sh. However, SPARK_DRIVER_* and SPARK_EXECUTOR_* are configuration for the application (your code), p

Re: how to add new column using regular expression within pyspark dataframe

2017-04-19 Thread Yan Facai
How about using `withColumn` and UDF? example: + https://gist.github.com/zoltanctoth/2deccd69e3d1cde1dd78 + https://ragrawal.wordpress.com/2015/10/02/spark-custom-udf-example/ On Mon, Apr 17, 2017 at 8:25 PM, Zeming Yu wrote: > I've g
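The `withColumn` + UDF approach from the linked gists can be sketched like this. The column name "description" and the price pattern are invented examples, not from the original question; the Spark-specific wiring is shown as comments since it needs a live SparkSession.

```python
import re

# A plain-Python extractor that a pyspark UDF would wrap.
def extract_price(s):
    """Pull the first $-prefixed number out of a string, or None."""
    if s is None:
        return None
    m = re.search(r"\$(\d+(?:\.\d+)?)", s)
    return m.group(1) if m else None

# With Spark available, wrap it and attach the result as a new column:
# from pyspark.sql.functions import udf
# from pyspark.sql.types import StringType
# price_udf = udf(extract_price, StringType())
# df = df.withColumn("price", price_udf(df["description"]))
```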

JDBC write error of Pyspark dataframe

2017-04-19 Thread Cinyoung Hur
Hi, I'm trying to write a dataframe to MariaDB. I got this error message, but I have no clue. Please give some advice. Py4JJavaError Traceback (most recent call last) in () > 1 result1.filter(result1["gnl_nm_set"] == "").count() /usr/local/linewalks/spark/spark/python/pyspark/sql/dataframe.pyc
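For reference, a JDBC write to MariaDB typically looks like the sketch below. The host, database, table, and credentials are placeholders. One frequent cause of an opaque Py4JJavaError in this situation is a missing JDBC driver jar, which must be on both the driver and executor classpaths (e.g. via `--jars`).

```python
# Connection details (all placeholder assumptions, not from the thread).
url = "jdbc:mariadb://dbhost:3306/mydb"
properties = {
    "user": "spark",
    "password": "secret",
    # The MariaDB Connector/J driver class; its jar must be on the classpath.
    "driver": "org.mariadb.jdbc.Driver",
}

# With a live SparkSession (not runnable here):
# result1.write.jdbc(url=url, table="results", mode="append", properties=properties)
```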

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
works now! thanks much! On Wed, Apr 19, 2017 at 2:05 PM, kant kodali wrote: > oops my bad. I see it now! sorry. > > On Wed, Apr 19, 2017 at 1:56 PM, Marcelo Vanzin > wrote: > >> I see a bunch of getOrCreate methods in that class. They were all >> added in SPARK-6752, a long time ago. >> >> On W

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
oops my bad. I see it now! sorry. On Wed, Apr 19, 2017 at 1:56 PM, Marcelo Vanzin wrote: > I see a bunch of getOrCreate methods in that class. They were all > added in SPARK-6752, a long time ago. > > On Wed, Apr 19, 2017 at 1:51 PM, kant kodali wrote: > > There is no getOrCreate for JavaStream

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread Marcelo Vanzin
I see a bunch of getOrCreate methods in that class. They were all added in SPARK-6752, a long time ago. On Wed, Apr 19, 2017 at 1:51 PM, kant kodali wrote: > There is no getOrCreate for JavaStreamingContext however I do use > JavaStreamingContext inside createStreamingContext() from my code in th

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
There is no *getOrCreate* for JavaStreamingContext; however, I do use JavaStreamingContext inside createStreamingContext() in my code from the previous email. On Wed, Apr 19, 2017 at 1:46 PM, Marcelo Vanzin wrote: > Why are you not using JavaStreamingContext if you're writing Java? > > On Wed, Apr

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread Marcelo Vanzin
Why are you not using JavaStreamingContext if you're writing Java? On Wed, Apr 19, 2017 at 1:42 PM, kant kodali wrote: > Hi All, > > I get the following errors whichever way I try, either lambda or generics. I > am using > Spark 2.1 and Scala 2.11.8 > > > StreamingContext ssc = StreamingContext.g

Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
Hi All, I get the following errors whichever way I try, either lambda or generics. I am using Spark 2.1 and Scala 2.11.8. StreamingContext ssc = StreamingContext.getOrCreate(hdfsCheckpointDir, () -> {return createStreamingContext();}, null, false); ERROR StreamingContext ssc = StreamingContext.
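The resolution reached later in the thread is to call `JavaStreamingContext.getOrCreate(checkpointDir, () -> createStreamingContext())`, whose Java-friendly signature accepts a plain lambda instead of a Scala `Function0`/`ClassTag`. The contract behind getOrCreate is worth seeing in miniature; this is a pure-Python illustration of the pattern, not Spark code:

```python
# The getOrCreate contract in miniature: the second argument is a zero-arg
# factory that is invoked only when no prior context exists for the
# checkpoint directory.
_checkpoints = {}

def get_or_create(checkpoint_dir, factory):
    if checkpoint_dir in _checkpoints:
        return _checkpoints[checkpoint_dir]  # recover the existing context
    ctx = factory()                          # factory runs only on first call
    _checkpoints[checkpoint_dir] = ctx
    return ctx
```

The factory must be side-effect free until invoked, which is why both the Scala and Java APIs take a function rather than an already-constructed context.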

Re: java.lang.java.lang.UnsupportedOperationException

2017-04-19 Thread Nicholas Hakobian
CDH 5.5 only provides Spark 1.5. Are you managing your pySpark install separately? For something like your example, you will get significantly better performance using coalesce with a lit, like so: from pyspark.sql.functions import lit, coalesce def replace_empty(icol): return coalesce(col(i
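The suggestion above replaces a Python UDF with the built-in `coalesce` plus `lit`, which stays inside the JVM and avoids Python serialization overhead. The semantics of SQL COALESCE (first non-null argument) can be mirrored in plain Python; the Spark column version is shown as a comment since it needs a SparkSession:

```python
# Pure-Python mirror of SQL COALESCE semantics: return the first
# argument that is not null/None.
def coalesce_py(*values):
    for v in values:
        if v is not None:
            return v
    return None

# The Spark column equivalent of the suggestion above (sketch):
# from pyspark.sql.functions import coalesce, col, lit
# df = df.withColumn("c", coalesce(col("c"), lit("")))
```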

Re: Handling skewed data

2017-04-19 Thread Richard Siebeling
I'm also interested in this; does anyone know? On 17 April 2017 at 17:17, Vishnu Viswanath wrote: > Hello All, > > Does anyone know if the skew handling code mentioned in this talk > https://www.youtube.com/watch?v=bhYV0JOPd9Y was added to spark? > > If so can I know where to look for more info,
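Whether or not that code landed, a common manual workaround for skewed joins and aggregations is key salting: spread a hot key across several synthetic sub-keys so no single partition receives all of its rows. This is an illustration of the general technique only, not necessarily the mechanism described in the talk:

```python
import random

# Number of sub-keys to spread each hot key across (an arbitrary choice).
NUM_SALTS = 4

def salt_key(key, num_salts=NUM_SALTS):
    # Append a random salt so rows with the same hot key land in
    # different partitions.
    return (key, random.randrange(num_salts))

def unsalt_key(salted):
    # Drop the salt to recover the original key when re-aggregating.
    key, _ = salted
    return key
```

After a first aggregation on the salted key, a second, much smaller aggregation on the unsalted key combines the partial results.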

Re: java.lang.java.lang.UnsupportedOperationException

2017-04-19 Thread issues solution
Pyspark 1.6 on Cloudera 5.5 (YARN) 2017-04-19 13:42 GMT+02:00 issues solution : > Hi, > can someone tell me why I get the following error when applying a udf like this: > > def replaceCempty(x): > if x is None : > return "" > else : > return x.encode('utf-8') > udf_replaceCe

Real time incremental Update to Spark Graphs.

2017-04-19 Thread Siddharth Ubale
Hi, We have a scenario where we want to build a graph and perform analytics on it. However, once the graph is built and cached in Spark memory, analytics are only possible on the already-present graph. We would like the graph to be updated incrementally in real time as and when we receive new edges and vertices for th
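One caveat worth noting: GraphX graphs are immutable, so "incremental" updates are usually implemented by buffering incoming edges and vertices and periodically rebuilding (or unioning) the cached graph. A minimal bookkeeping sketch of that buffering, in plain Python rather than Spark:

```python
# Accumulate incoming edges/vertices between graph rebuilds.
class GraphBuffer:
    def __init__(self):
        self.vertices = set()
        self.edges = set()

    def add_edge(self, src, dst):
        # New endpoints become vertices automatically.
        self.vertices.update((src, dst))
        self.edges.add((src, dst))

    def snapshot(self):
        # In Spark this is where you would build a new Graph from the
        # accumulated vertex/edge RDDs and re-cache it.
        return frozenset(self.vertices), frozenset(self.edges)
```

The trade-off is rebuild frequency versus staleness; truly streaming graph updates are outside what GraphX offers out of the box.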

java.lang.java.lang.UnsupportedOperationException

2017-04-19 Thread issues solution
Hi, can someone tell me why I get the following error when applying a udf like this: def replaceCempty(x): if x is None : return "" else : return x.encode('utf-8') udf_replaceCempty = F.udf(replaceCempty,StringType()) dfTotaleNormalize53 = dfTotaleNormalize52.select([i if i not
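A defensive rewrite of that UDF body is shown below. On old Spark versions this kind of UnsupportedOperationException often comes from the UDF receiving a non-string value, so the sketch guards the input type; the `x.encode('utf-8')` in the original is a Python 2 idiom (unicode to str) and is dropped here. The registration lines are comments because they need pyspark.

```python
# Null-safe, type-safe version of the UDF body (pure Python).
def replace_c_empty(x):
    if x is None:
        return ""
    if not isinstance(x, str):
        # Guard against non-string inputs instead of raising.
        x = str(x)
    return x

# Registration sketch (needs pyspark):
# from pyspark.sql import functions as F
# from pyspark.sql.types import StringType
# udf_replace = F.udf(replace_c_empty, StringType())
```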