Hi All,
I want to read MySQL from Spark. Please let me know how many threads will be
used to read the RDBMS after setting numPartitions = 10 in Spark JDBC. What is
the best practice for reading a large dataset from an RDBMS into Spark?
Thanks,
Jingyu
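With partitionColumn, lowerBound and upperBound set, Spark JDBC builds
numPartitions range queries, and each partition is read by its own task over
its own JDBC connection, so numPartitions = 10 means at most 10 concurrent
reads (bounded by the executor cores actually available); without a partition
column the whole table comes through a single connection. A sketch, with
made-up table, column and bounds:

import java.util.Properties

val props = new Properties()
props.setProperty("user", "dbuser")                    // placeholder credentials
props.setProperty("password", "dbpass")
props.setProperty("driver", "com.mysql.jdbc.Driver")

val df = sqlContext.read.jdbc(
  "jdbc:mysql://dbhost:3306/mydb",   // placeholder URL
  "mytable",                         // placeholder table
  "id",                              // partitionColumn: a numeric column to split on
  1L,                                // lowerBound: rough minimum of the column
  1000000L,                          // upperBound: rough maximum (not a filter)
  10,                                // numPartitions: up to 10 parallel JDBC connections
  props)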
GraphX does not support Python yet.
http://spark.apache.org/docs/latest/graphx-programming-guide.html
The workaround is to use GraphFrames (a 3rd-party API),
https://issues.apache.org/jira/browse/SPARK-3789
but some features in Python are not the same as in Scala,
https://github.com/graphframes/gr
Hi All,
I am using Eclipse with Maven for developing Spark applications. I got an
error when reading from S3 in Scala, but it works fine in Java when I run
them in the same project in Eclipse. The Scala/Java code and the error are
below.
Scala
val uri = URI.create("s3a://" + key + ":" + seckey +
>>> don't put your secret in the URI, it'll only creep out in the logs.
>>>
>>> Use the specific properties covered in
>>> http://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html,
>>> which you can set in your spark context by p
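A sketch of what this reply suggests (property names from the hadoop-aws page
above; the bucket and path are placeholders): set the credentials on the
Hadoop configuration instead of embedding them in the URI.

sc.hadoopConfiguration.set("fs.s3a.access.key", key)
sc.hadoopConfiguration.set("fs.s3a.secret.key", seckey)

val lines = sc.textFile("s3a://" + bucket + "/path/to/file")   // no credentials in the URI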
I am trying to filter out vertices that do not have any connecting links to
others. How do I filter those isolated vertices in GraphX?
Thanks,
Jingyu
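One way to do this, as a sketch (the toy vertex and edge data are made up):
graph.degrees only contains vertices that appear in at least one edge, so
joining on it and keeping degree > 0 drops the isolated vertices.

import org.apache.spark.graphx.{Edge, Graph}

// Toy graph: vertex 4 has no edges, so it is isolated.
val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d")))
val edges    = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
val graph    = Graph(vertices, edges)

val nonIsolated = graph
  .outerJoinVertices(graph.degrees) { (_, attr, deg) => (attr, deg.getOrElse(0)) }
  .subgraph(vpred = (_, v) => v._2 > 0)        // keep only vertices with degree > 0
  .mapVertices((_, v) => v._1)                 // restore the original vertex attribute

nonIsolated.vertices.collect().foreach(println)  // (1,a), (2,b), (3,c)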
Hi, what is the best way to unpersist the RDDs in GraphX to release memory?
RDD.unpersist
or
RDD.unpersistVertices and RDD.edges.unpersist
I studied the source code of Pregel.scala; both of the above are used between
line 148 and line 150. Can anyone please tell me what the difference is? In
addition, what
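For reference, the two options look like this in code (a sketch, assuming a
cached Graph g):

// Option 1: uncache the vertex and edge RDDs in one call.
g.unpersist(blocking = false)

// Option 2: what Pregel.scala does between iterations -- release the old
// vertices (and their replicated copies) and the old edges separately, once
// the next graph in the iteration has been materialised. (Pregel also
// unpersists the old message RDD in the same place.)
g.unpersistVertices(blocking = false)
g.edges.unpersist(blocking = false)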
I am running a Pregel function on 37 nodes in EMR Hadoop. After an hour the
logs show the following. Can anyone please tell me what the problem is and
how I can solve it? Thanks
16/02/05 14:02:46 WARN BlockManagerMaster: Failed to remove broadcast
2 with removeFromMaster = true - Cannot receive any reply in 120
You can set "maximizeResourceAllocation"; then EMR will work out the best
configuration for you.
http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-spark-configure.html
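For reference, this is a cluster-level setting supplied when the EMR cluster
is created; per the page above, the configuration JSON would look roughly
like this (a sketch, not tested here):

[
  {
    "Classification": "spark",
    "Properties": { "maximizeResourceAllocation": "true" }
  }
]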
Jingyu
On 18 February 2016 at 12:02, wrote:
> Hi All,
>
> I have been facing memory issues in Spark. I'm using Spark SQL on AWS
It is not a problem to use JavaRDD.cache() for 200M of data (all objects read
from JSON format). But when I try to use DataFrame.cache(), it throws the
exception below.
My machine can cache 1G of data in Avro format without any problem.
15/10/29 13:26:23 INFO GeneratePredicate: Code generated in 154.531
ith just a single row?
> Do you rows have any columns with null values?
> Can you post a code snippet here on how you load/generate the dataframe?
> Does dataframe.rdd.cache work?
>
> *Romi Kuntsman*, *Big Data Engineer*
> http://www.totango.com
>
> On Thu, Oct 29, 2015 at 4:
Try s3://aws_key:aws_secret@bucketName/folderName with your access key and
secret to save the data.
On 30 October 2015 at 10:55, William Li wrote:
> Hi – I have a simple app running fine with Spark, it reads data from S3
> and performs calculation.
>
> When reading data from S3, I use hadoopConf
There is no problem in Spark SQL 1.5.1, but the error "key not found:
sportingpulse.com" shows up when I use 1.5.0.
I have to use version 1.5.0 because that is the one AWS EMR supports.
Can anyone tell me why Spark uses "sportingpulse.com" and how to fix it?
Thanks.
Caused by: java.util.
hing due to the columnar compression. I've seen similar
> intermittent issues when caching DataFrames. "sportingpulse.com" is a
> value in one of the columns of the DF.
> --
> From: Ted Yu
> Sent: 10/30/2015 6:33 PM
> To: Zhang, Jingyu
A small CSV file in S3. I use s3a://key:seckey@bucketname/a.csv
It works with the SparkContext:
pixelsStr = ctx.textFile(s3pathOrg);
It works for the Java spark-csv reader as well.
Java code: DataFrame careerOneDF = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema",
I want to pass two parameters into a new Java class from rdd.mapPartitions();
the code is like the following.
---Source Code
Main method:
/*the parameters that I want to pass into the PixelGenerator.class for
selecting any items between the startTime and the endTime.
*/
int startTime, endTime;
Ja
Thanks, that worked in the local environment but not in the Spark cluster.
On 16 November 2015 at 16:05, Fengdong Yu wrote:
> Can you try : new PixelGenerator(startTime, endTime) ?
>
>
>
> On Nov 16, 2015, at 12:47 PM, Zhang, Jingyu
> wrote:
>
> I want to pass two parame
e:
> If you get a "cannot be serialized" exception, then you need to make
> PixelGenerator a static class.
>
> On Nov 16, 2015, at 1:10 PM, Zhang, Jingyu
> wrote:
>
> Thanks, that worked in the local environment but not in the Spark cluster.
>
>
> On 16 Novembe
On 16 November 2015 at 16:34, Fengdong Yu wrote:
> Just make PixelGenerator a nested static class?
>
> On Nov 16, 2015, at 1:22 PM, Zhang, Jingyu
> wrote:
>
> Fengdong
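The underlying rule, sketched here in Scala for brevity (Pixel, the field
names and the filter logic are hypothetical stand-ins for the code in this
thread): whatever object is handed to mapPartitions must be serializable and
must not capture a non-serializable enclosing instance, which is exactly what
making the Java class a static nested class achieves.

// Hypothetical record type standing in for the real data.
case class Pixel(time: Int, value: String)

// A top-level function class: serializable, and it captures only its own
// constructor parameters, not an enclosing object.
class PixelGenerator(startTime: Int, endTime: Int)
    extends (Iterator[Pixel] => Iterator[Pixel]) with Serializable {
  override def apply(pixels: Iterator[Pixel]): Iterator[Pixel] =
    pixels.filter(p => p.time >= startTime && p.time <= endTime)
}

// Assuming pixels: RDD[Pixel] and the two Int bounds from the main method.
val selected = pixels.mapPartitions(new PixelGenerator(startTime, endTime))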
I am using spark-csv to save files to S3, and it fails with "Size exceeds".
Please let me know how to fix it. Thanks.
df.write()
.format("com.databricks.spark.csv")
.option("header", "true")
.save("s3://newcars.csv");
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at
rk.sql.execution.Sort$$anonfun$doExecute$5.apply(basicOperators.scala:190)
at
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
... 21 more
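The "Size exceeds Integer.MAX_VALUE" error usually means a single block (one
partition's worth of data) has grown past 2 GB. A common workaround, as a
sketch (Scala here; the partition count is a placeholder to tune and the
bucket is made up), is to spread the rows over more partitions before
writing:

df.repartition(200)
  .write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("s3a://" + bucket + "/newcars.csv")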
On 16 November 2015 at 21:16, Zhang, Jingyu
wrote:
> I am using spark-csv to save files in s3, it shown Size e
Can anyone please let me know how to print all the nodes in each connected
component, one by one?
graph.connectedComponents()
e.g.
Connected component ID    Node IDs
1                         1,2,3
6                         6,7,8,9
Thanks
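One way to do this, as a sketch: the vertices of the connected-components
graph are (vertexId, componentId) pairs, so grouping by the component id
gives the membership of each component.

val cc = graph.connectedComponents().vertices   // RDD of (vertexId, componentId)

cc.map { case (vertexId, componentId) => (componentId, vertexId) }
  .groupByKey()
  .collect()
  .foreach { case (componentId, members) =>
    println("component " + componentId + ": " + members.mkString(","))
  }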
Spark spills data to disk when there is more data shuffled onto a single
executor machine than can fit in memory. However, it flushes out the data
to disk one key at a time - so if a single key has more key-value pairs
than can fit in memory, an out of memory exception occurs.
Cheers,
Jingyu
On
Hi All,
I want to extract the "value" RDD from a PairRDD in Java.
Please let me know how I can get it easily.
Thanks
Jingyu
I have A and B DataFrames
A has columns a11,a12, a21,a22
B has columns b11,b12, b21,b22
I persist them in cache:
1. A.cache()
2. B.cache()
Then, I persist the subsets in cache later:
3. DataFrame A1 (a11,a12).cache()
4. DataFrame B1 (b11,b12).cache()
5. DataFrame AB1 (a11,a12,b11,b12).cah
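A sketch of steps 1-4 in code (column names as above; how AB1 is built is cut
off, so it is left out). Each cache() call gets its own entry in the Storage
tab, so the cached projections take memory on top of the cached A and B.

A.cache()
B.cache()

val A1 = A.select("a11", "a12").cache()   // a separate cached copy of just these columns
val B1 = B.select("b11", "b12").cache()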
Does your SQL work if it does not run a regex on the strings and concatenate
them, I mean if you just select the columns without string operations?
On 24 September 2015 at 10:11, java8964 wrote:
> Try to increase the partition count; that will make each partition hold
> less data.
>
> Yong
>
> ---
>> you would need extra memory that would be equivalent to half the memory of
>> A/B.
>>
>> You can check the storage that a dataFrame is consuming in the Spark UI's
>> Storage tab. http://host:4040/storage/
>>
>>
>>
>> On Thu, Sep 24, 2015 at 5:
I got the following exception when I run
JavaPairRDD.values().saveAsTextFile("s3n://bucket");
Can anyone help me out? Thanks
15/09/25 12:24:32 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.NoClassDefFoundError:
org/jets3t/service/ServiceException
at org.apa