Re: problem with data locality api

2014-09-27 Thread qinwei
Thank you for your reply. I understand your explanation, but I wonder what the correct usage of the API is: new SparkContext(config: SparkConf, preferredNodeLocationData: Map[String, Set[SplitInfo]]). How should the second parameter, preferredNodeLocationData, be constructed? hope for yo…
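
A minimal sketch of how the two parameters could fit together, assembled from the example quoted later in this digest. Exact signatures vary across Spark versions (some releases take the path as a plain String rather than a Path), so treat it as illustrative rather than canonical:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.mapred.TextInputFormat
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.scheduler.InputFormatInfo

    val hadoopConf = new Configuration()                            // Hadoop conf: HDFS block lookup
    val sparkConf  = new SparkConf().setAppName("locality-example") // Spark conf: app settings

    // host -> splits on that host, i.e. the Map[String, Set[SplitInfo]] the constructor wants
    val locData = InputFormatInfo.computePreferredLocations(
      Seq(new InputFormatInfo(hadoopConf, classOf[TextInputFormat], new Path("myfile.txt"))))

    val sc = new SparkContext(sparkConf, locData)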

Re: How to use multithreading in an RDD map function?

2014-09-27 Thread Sean Owen
If increasing executors really isn't enough, then you can consider using mapPartitions to process whole partitions at a time. Within that you can multithread your processing of the elements in the partition. (And you should probably use more like one worker per machine then.) The question is how…
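
A sketch of that suggestion, assuming a hypothetical per-element process() function and an input rdd: run one task per partition and fan its elements out to a local thread pool.

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration

    val result = rdd.mapPartitions { iter =>
      val pool = Executors.newFixedThreadPool(8)    // threads per task; tune to the machine
      implicit val ec = ExecutionContext.fromExecutorService(pool)
      val futures = iter.map(elem => Future(process(elem))).toList // submit every element
      val out = futures.map(f => Await.result(f, Duration.Inf))    // collect results in order
      pool.shutdown()
      out.iterator
    }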

RE: problem with data locality api

2014-09-27 Thread Shao, Saisai
Hi, the first conf is used by Hadoop to determine the locality distribution of the HDFS file. The second conf is used by Spark; though the two share the same name, they are actually two different classes. Thanks, Jerry

Re: How to use multithreading in an RDD map function?

2014-09-27 Thread myasuka
Thank you for your reply. Actually, we have already used this parameter. Our cluster is a standalone cluster with 16 nodes, and every node has 16 cores. We have 256 pairs of matrices along with 256 tasks; when we set --total-executor-cores to 64, each node can launch 4 tasks simultaneously, and each task…

problem with partitioning

2014-09-27 Thread qinwei
Hi, everyone. I have come across a problem with changing the partition number of an RDD. My code is as below:

    val rdd1 = sc.textFile(path1)
    val rdd2 = sc.textFile(path2)
    val rdd3 = sc.textFile(path3)
    val imeiList = parseParam(job.jobParams)
    val broadcastVar = sc.broadc…
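
The preview is cut off, so as a generic sketch only (not qinwei's actual code), these are the two standard knobs for changing an RDD's partition count:

    val rdd1 = sc.textFile(path1)        // partition count initially follows the input splits
    println(rdd1.partitions.length)

    val widened  = rdd1.repartition(64)  // shuffles the data into exactly 64 partitions
    val narrowed = rdd1.coalesce(4)      // merges partitions, avoiding a full shuffle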

problem with data locality api

2014-09-27 Thread qinwei
Hi, everyone. I have come across a problem with data locality. I found this example code in "Spark-on-YARN-A-Deep-Dive-Sandy-Ryza.pdf":

    val locData = InputFormatInfo.computePreferredLocations(
      Seq(new InputFormatInfo(conf, classOf[TextInputFormat], new Path("myfile.txt"))))
…

Re: How to use multithreading in an RDD map function?

2014-09-27 Thread qinwei
In the options of spark-submit there are two options that may be helpful for your problem: "--total-executor-cores NUM" (standalone and Mesos only) and "--executor-cores" (YARN only). qinwei
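
For illustration, hypothetical invocations showing where each flag applies (the jar, class name, and master URL are placeholders):

    # Standalone/Mesos: cap the total cores the application may take across the cluster
    ./bin/spark-submit --master spark://master:7077 \
      --total-executor-cores 64 --class example.Main myapp.jar

    # YARN: cores are requested per executor instead
    ./bin/spark-submit --master yarn-client \
      --num-executors 16 --executor-cores 4 --class example.Main myapp.jar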

How to use multithreading in an RDD map function?

2014-09-27 Thread myasuka
Hi, everyone. I have come across a problem with increasing concurrency. In the program, after the shuffle write, each node should fetch 16 pairs of matrices to do matrix multiplication, such as:

    import breeze.linalg.{DenseMatrix => BDM}
    pairs.map(t => {
      val b1 = t._2._1.asInstanceOf[BDM[Double]]…
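
A guess at the shape of that computation, since the preview is truncated; pairs is assumed here to be an RDD of (key, (matrix, matrix)) tuples:

    import breeze.linalg.{DenseMatrix => BDM}

    val products = pairs.map { t =>
      val b1 = t._2._1.asInstanceOf[BDM[Double]]
      val b2 = t._2._2.asInstanceOf[BDM[Double]]
      (t._1, b1 * b2)   // breeze dense matrix multiply, single-threaded per element
    }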

PageRank execution imbalance, might hurt performance by 6x

2014-09-27 Thread Larry Xiao
Hi all! I'm running PageRank on GraphX, and I find that some tasks on one machine can take 5 to 6 times longer than the others, while the rest are perfectly balanced (around 1 second to finish). And since the time for a stage (iteration) is determined by the slowest task, the performance is undesirable. I…

Re: Build Spark with IntelliJ IDEA 13

2014-09-27 Thread maddenpj
I actually got this same exact issue compiling an unrelated project (not using Spark). Maybe it's a protobuf issue?

Re: New user question on Spark SQL: can I really use Spark SQL like a normal DB?

2014-09-27 Thread jamborta
Hi, yes, I have been using Spark SQL extensively that way. I have just tried it, and saveAsTable() works OK on 1.1.0. Alternatively, you can write the data from the SchemaRDD to HDFS using saveAsTextFile and create an external table on top of it. Thanks,
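
A rough sketch of both approaches with made-up table, column, and path names, using the Spark 1.1-era API in which SQL results are SchemaRDDs:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    val people = hiveContext.sql("SELECT name, age FROM people")
    people.saveAsTable("people_copy")   // option 1: a managed table via saveAsTable()

    // option 2: write to HDFS, then overlay an external table on the files
    people.map(_.mkString("\t")).saveAsTextFile("/data/people_tsv")
    hiveContext.sql(
      "CREATE EXTERNAL TABLE people_ext (name STRING, age INT) " +
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/data/people_tsv'")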

Re: How to run Hive scripts programmatically in Spark 1.1.0?

2014-09-27 Thread jamborta
Hi, you can create a Spark context in your Python or Scala environment and use it to run your Hive queries, pretty much the same way as you'd do it in the shell. Thanks,
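
A minimal sketch of that pattern in Scala, assuming a Hive-enabled Spark build; the table and query are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("hive-scripts"))
    val hiveContext = new HiveContext(sc)

    hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hiveContext.sql("SELECT key, count(*) FROM src GROUP BY key")
      .collect()
      .foreach(println)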

yarn does not accept job in cluster mode

2014-09-27 Thread jamborta
Hi all, I have a job that works OK in yarn-client mode, but when I try yarn-cluster mode it returns the following: "WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory". The cluster has plenty…
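
One sanity check, since the warning points at resources: request them explicitly and compare against what YARN reports as available (all values below are placeholders):

    ./bin/spark-submit --master yarn-cluster --class example.Main \
      --num-executors 4 --executor-memory 2g --executor-cores 2 myapp.jar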

Re: IPython notebook ec2 cluster matplotlib not found?

2014-09-27 Thread Nicholas Chammas
Can you first confirm that the regular PySpark shell works on your cluster, without upgrading to 2.7? That is, you log on to your master using spark-ec2 login and run bin/pyspark successfully without any special flags. And as far as I can tell, you should be able to use IPython on Python 2.6, so I'd next…
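
Roughly, the check suggested above (the key pair and cluster name are placeholders):

    ./spark-ec2 -k mykeypair -i ~/mykeypair.pem login my-cluster   # log on to the master
    spark/bin/pyspark                                              # plain shell, no IPython flags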

IPython notebook ec2 cluster matplotlib not found?

2014-09-27 Thread Andy Davidson
Hi, I am having a heck of a time trying to get Python to work correctly on my cluster created using the spark-ec2 script. The following link was really helpful: https://issues.apache.org/jira/browse/SPARK-922. I am still running into problems with matplotlib (it works fine on my Mac). I cannot fig…

Re: Retrieve dataset of Big Data Benchmark

2014-09-27 Thread Tom
Hi, I was able to download the dataset this way (and just reconfirmed it by doing so again):

    // Before starting Spark:
    export AWS_ACCESS_KEY_ID=*key_id*
    export AWS_SECRET_ACCESS_KEY=*access_key*
    // Start Spark:
    ./spark-shell
    // In the Spark shell:
    val dataset = sc.textFile("s3n://big-data-ben…

MLlib 1.2 New & Interesting Features

2014-09-27 Thread Krishna Sankar
Guys,
- Need help in terms of the interesting features coming up in MLlib 1.2.
- I have a two-part, ~3 hr hands-on tutorial at the Big Data Tech Con, "The Hitchhiker's Guide to Machine Learning with Python & Apache Spark" [2].
- At minimum, it would be good to take the last 30 mi…

Re: Is it possible to use Parquet with Dremel encoding

2014-09-27 Thread Michael Armbrust
Based on your first example, it looks like what you want is actually run-length encoding (which Parquet does support). Repetition and definition levels are used to reconstruct nested or repeated (array) data that has been shredded…
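
A toy run-length encoder to illustrate the idea Michael is pointing at (this has nothing to do with Parquet's actual internals or API):

    // collapse runs of identical values into (value, count) pairs
    def rle[A](xs: Seq[A]): List[(A, Int)] =
      xs.foldRight(List.empty[(A, Int)]) {
        case (x, (y, n) :: rest) if x == y => (y, n + 1) :: rest
        case (x, acc)                      => (x, 1) :: acc
      }

    rle(Seq("A", "A", "A", "B", "B"))   // List((A,3), (B,2))

Repetition and definition levels solve a different problem: for two records whose repeated field holds [a, b] and [c], the values a, b, c are stored flat alongside repetition levels 0, 1, 0 that mark where each new record begins.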

RDD logic and control

2014-09-27 Thread pop1998
Hello, I'm examining Spark RDDs and trying to understand how the RDD flow works. Can anyone please tell me how an RDD decides to (and where I can find the relevant code): 1. re-split into a new RDD? 2. move to a new PC? 3. perform PC selection? 4. perform a union of multiple RDDs? 5. how…

Re: Log hdfs blocks sending

2014-09-27 Thread Andrew Ash
Hi Alexey, you're looking in the right place in the first log, from the driver. Specifically, the locality is on the TaskSetManager INFO log level and looks like this: "14/09/26 16:57:31 INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID 10, 10.54.255.191, ANY, 1341 bytes)". The ANY there means…
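
For interpreting that column: the locality levels run PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY, from best to worst, and ANY means the data is pulled across the network. A sketch of the knob that controls how long the scheduler holds out for a better level (the value is illustrative, in milliseconds for this era's configs):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.locality.wait", "3000")  // wait up to 3s before relaxing the locality level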

Re: flume spark streaming receiver host random

2014-09-27 Thread Sean Owen
I don't think you control which host the receiver runs on, right? That is so that Spark can handle the failure of that node and reassign the receiver. On Sep 27, 2014 2:43 AM, "centerqi hu" wrote: > the receiver is not running on the machine I expect > 2014-09-26 14:09 GMT+08:00 Sean Owen: > I th…