Unsubscribe

2016-06-19 Thread Ram Krishna
Hi Sir, Please unsubscribe me -- Regards, Ram Krishna KT

Re: Spark - “min key = null, max key = null” while reading ORC file

2016-06-19 Thread Mohanraj Ragupathiraj
Hi Mich, Thank you for your reply. Let me explain more clearly. A file with 100 records needs to be joined with a big lookup file created in ORC format (500 million records). The Spark process I wrote returns the matching records and is working fine. My concern is that it loads the entire fi
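
For a 100-row input against a 500-million-row ORC table, a broadcast join keeps the big side from being shuffled; a minimal Scala sketch against the Spark 1.6 API (paths and the join key are illustrative, not from the thread):

    import org.apache.spark.sql.functions.broadcast

    // the big ORC lookup table and the small 100-row input (placeholder paths)
    val orcDf   = hiveContext.read.format("orc").load("/data/lookup_orc")
    val smallDf = hiveContext.read.format("orc").load("/data/small_input")

    // broadcast() ships the small side to every executor, so the 500M-row
    // ORC table is scanned in place instead of being shuffled for the join
    val matched = orcDf.join(broadcast(smallDf), "key")
    matched.show()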

Re: Spark - “min key = null, max key = null” while reading ORC file

2016-06-19 Thread Mich Talebzadeh
Hi, To start, when you store the data in an ORC file, can you verify that the data is there? For example, register it as a temp table: processDF.registerTempTable("tmp"); sql("select count(1) from tmp").show. Also, what do you mean by an index file in ORC? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedi

Spark - “min key = null, max key = null” while reading ORC file

2016-06-19 Thread Mohanraj Ragupathiraj
I am trying to join a DataFrame (say 100 records) with an ORC file with 500 million records through Spark (can increase to 4-5 billion, 25 bytes each record). I used the Spark hiveContext API. *ORC File Creation Code* //fsdtRdd is JavaRDD, fsdtSchema is StructType schema DataFrame fsdtDf = hiveContext
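
The creation code is truncated above; a hedged Scala rendering of the Java snippet it describes (fsdtRdd and fsdtSchema as named in the mail, the output path a placeholder):

    // fsdtRdd: RDD[Row], fsdtSchema: StructType, as in the original mail
    val fsdtDf = hiveContext.createDataFrame(fsdtRdd, fsdtSchema)

    // write the 500M-row lookup table out in ORC format
    fsdtDf.write.format("orc").save("/data/lookup_orc")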

Re: Running Spark in local mode

2016-06-19 Thread Ashok Kumar
Thank you all. Mich, your clarification is appreciated. On Sunday, 19 June 2016, 19:31, Mich Talebzadeh wrote: Thanks Jonathan for your points. I am aware of the fact that yarn-client and yarn-cluster are both deprecated (they still work in 1.6.1), hence the new nomenclature. Bear in mind this

Re: Update Batch DF with Streaming

2016-06-19 Thread Amit Assudani
Please help. From: amit assudani Date: Thursday, June 16, 2016 at 6:11 PM To: "user@spark.apache.org" Subject: Update Batch DF with Streaming Hi All, Can I update batch data frames loaded in memory with streaming data? For example, I have an employee DF registered as a temporary table; it has
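
One hedged way to get this effect with the Spark 1.6 APIs: rebuild and re-register the temp table on every micro-batch (the employee path and employeeStream, assumed here to be a DStream of Rows, are illustrative):

    import org.apache.spark.sql.DataFrame

    // batch side: load the employee table once and register it
    var employeeDf: DataFrame = sqlContext.read.parquet("/data/employee")
    employeeDf.registerTempTable("employee")

    // streaming side: fold each micro-batch into the batch DF and re-register
    // under the same name, so later SQL sees the merged view (foreachRDD runs
    // on the driver, so mutating the var is safe)
    employeeStream.foreachRDD { rdd =>
      val updates = sqlContext.createDataFrame(rdd, employeeDf.schema)
      employeeDf = employeeDf.unionAll(updates)
      employeeDf.registerTempTable("employee")
    }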

Re: Running Spark in local mode

2016-06-19 Thread Mich Talebzadeh
Thanks Jonathan for your points. I am aware of the fact that yarn-client and yarn-cluster are both deprecated (they still work in 1.6.1), hence the new nomenclature. Bear in mind this is what I stated in my notes: "YARN Cluster Mode: the Spark driver runs inside an application master process which is mana

Re: Running Spark in local mode

2016-06-19 Thread Jonathan Kelly
Mich, what Jacek is saying is not that you implied that YARN relies on two masters. He's just clarifying that yarn-client and yarn-cluster modes are really both using the same (type of) master (simply "yarn"). In fact, if you specify "--master yarn-client" or "--master yarn-cluster", spark-submit w
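
Concretely, and hedged as illustrative invocations (jar and class names are placeholders), the old and new spark-submit forms line up like this:

    # deprecated forms, still accepted in 1.6.1
    spark-submit --master yarn-client  --class com.example.App app.jar
    spark-submit --master yarn-cluster --class com.example.App app.jar

    # new nomenclature: a single "yarn" master plus an explicit deploy mode
    spark-submit --master yarn --deploy-mode client  --class com.example.App app.jar
    spark-submit --master yarn --deploy-mode cluster --class com.example.App app.jar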

Re: How to cause a stage to fail (using spark-shell)?

2016-06-19 Thread Jacek Laskowski
Mind sharing code? I think only shuffle failures lead to stage failures and retries. Jacek On 19 Jun 2016 4:35 p.m., "Ted Yu" wrote: > You can utilize a counter in external storage (NoSQL e.g.) > When the counter reaches 2, stop throwing exception so that the task > passes. > > FYI > > On Sun,

Re: Accessing system environment on Spark Worker

2016-06-19 Thread Ted Yu
Have you looked at http://spark.apache.org/docs/latest/ec2-scripts.html ? There is a description there of setting AWS_SECRET_ACCESS_KEY. On Sun, Jun 19, 2016 at 4:46 AM, Mohamed Taher AlRefaie wrote: > Hello all: > > I have an application that requires accessing DynamoDB tables. Each worker > establish

Re: How to cause a stage to fail (using spark-shell)?

2016-06-19 Thread Ted Yu
You can utilize a counter in external storage (e.g. NoSQL). When the counter reaches 2, stop throwing the exception so that the task passes. FYI On Sun, Jun 19, 2016 at 3:22 AM, Jacek Laskowski wrote: > Hi, > > Thanks Burak for the idea, but it *only* fails the tasks that > eventually fail the entir
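
A minimal sketch of this idea for spark-shell, with a local file standing in for the external NoSQL counter (illustrative names; local mode only, since driver and executor must share a filesystem, and tasks need retries enabled, e.g. --master local[1,3]):

    import java.io.{File, PrintWriter}
    import scala.io.Source

    // crude persistent counter: a file holding the number of attempts so far
    def bumpCounter(f: File): Int = {
      val n = if (f.exists) Source.fromFile(f).mkString.trim.toInt else 0
      val w = new PrintWriter(f); w.print(n + 1); w.close()
      n + 1
    }

    sc.parallelize(1 to 1, 1).map { i =>
      val attempt = bumpCounter(new File("/tmp/fail-counter"))
      // fail the first two attempts, then let the task pass on retry
      if (attempt <= 2) throw new RuntimeException(s"forced failure #$attempt")
      i
    }.count()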

Re: Switching broadcast mechanism from torrent

2016-06-19 Thread Ted Yu
I think it is good practice not to hold on to the SparkContext in mapFunction. On Sun, Jun 19, 2016 at 7:10 AM, Takeshi Yamamuro wrote: > How about using `transient` annotations? > > // maropu > > On Sun, Jun 19, 2016 at 10:51 PM, Daniel Haviv < > daniel.ha...@veracity-group.com> wrote: > >> Hi, >> Jus

Re: Switching broadcast mechanism from torrent

2016-06-19 Thread Takeshi Yamamuro
How about using `transient` annotations? // maropu On Sun, Jun 19, 2016 at 10:51 PM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote: > Hi, > Just updating on my findings for future reference. > The problem was that after refactoring my code I ended up with a scala > object which held Spar

Re: Switching broadcast mechanism from torrent

2016-06-19 Thread Daniel Haviv
Hi, Just updating on my findings for future reference. The problem was that after refactoring my code I ended up with a Scala object which held the SparkContext as a member, e.g.: object A { val sc: SparkContext = new SparkContext; def mapFunction = {} } and when I called rdd.map(A.mapFunction) it
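
For future readers, a sketch of the pattern together with the `transient` fix Takeshi suggested upthread (names and the local master are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    object A {
      // Per the thread, holding the SparkContext as a plain member caused it
      // to be pulled into serialization when A.mapFunction was shipped in a
      // closure; @transient (Takeshi's suggestion) keeps sc out of that.
      @transient lazy val sc: SparkContext =
        new SparkContext(new SparkConf().setAppName("demo").setMaster("local[2]"))

      def mapFunction(s: String): Int = s.length
    }

    object Main extends App {
      val lengths = A.sc.parallelize(Seq("a", "bb")).map(A.mapFunction).collect()
      println(lengths.mkString(","))   // 1,2
    }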

Accessing system environment on Spark Worker

2016-06-19 Thread Mohamed Taher AlRefaie
Hello all: I have an application that requires accessing DynamoDB tables. Each worker establishes a connection with the database on its own. I have added both `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` to both the master's and the workers' `spark-env.sh` files. I have also run the file using `sh` to m
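
Besides spark-env.sh, one hedged alternative is to forward the variables programmatically with SparkConf.setExecutorEnv (everything except the variable names is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("dynamodb-app")
      // forward the driver's credentials to every executor's environment
      .setExecutorEnv("AWS_ACCESS_KEY_ID",     sys.env("AWS_ACCESS_KEY_ID"))
      .setExecutorEnv("AWS_SECRET_ACCESS_KEY", sys.env("AWS_SECRET_ACCESS_KEY"))
    val sc = new SparkContext(conf)

    // on the worker side, each task sees them as ordinary environment variables
    sc.parallelize(1 to 1).foreach { _ =>
      println(sys.env.get("AWS_ACCESS_KEY_ID").isDefined)
    }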

Re: Running Java-Based Implementation of StreamingKMeans

2016-06-19 Thread Biplob Biswas
Hi, Thanks for that input. I tried doing that, but apparently that's not working either. I thought I was having problems with my Spark installation, so I ran a simple word count and that works, so I am not really sure what the problem is now. Is my translation of the Scala code correct? I don't unders
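
For reference while translating, this is roughly the Scala shape of the streaming k-means example (a sketch against the Spark 1.6 MLlib API; the path, dimensions, and an existing SparkContext sc are assumptions):

    import org.apache.spark.mllib.clustering.StreamingKMeans
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(5))
    // text files with lines of the form "[1.0,2.0,3.0]"
    val training = ssc.textFileStream("/data/train").map(Vectors.parse)

    val model = new StreamingKMeans()
      .setK(2)
      .setDecayFactor(1.0)
      .setRandomCenters(3, 0.0)   // 3-dimensional data, zero initial weight

    model.trainOn(training)
    ssc.start()
    ssc.awaitTermination()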

Re: Running Spark in local mode

2016-06-19 Thread Mich Talebzadeh
Good points, but I am an experimentalist. In local mode with --master local, Spark will start with one thread, equivalent to --master local[1]. You can also start with more than one thread by specifying the number of threads k in --master local[k]. You can also start us
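
As a runnable counterpart to the notes above, a minimal Scala sketch (app name illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // --master local      : one worker thread
    // --master local[4]   : k = 4 worker threads
    val sc = new SparkContext(
      new SparkConf().setAppName("local-demo").setMaster("local[4]"))

    println(sc.master)                // local[4]
    println(sc.defaultParallelism)    // 4 -- matches k in local[k]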

Re: Running Spark in local mode

2016-06-19 Thread Jacek Laskowski
On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh wrote: > Spark Local - Spark runs on the local host. This is the simplest set up and > best suited for learners who want to understand different concepts of Spark > and those performing unit testing. There are also the less-common master URLs: *
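
The list above is cut off; two of the less-common local forms, hedged from the standard docs, are local[*] (one thread per logical core) and local[N,F], where F is how many task failures to tolerate -- handy for the retry experiments elsewhere in this digest:

    import org.apache.spark.{SparkConf, SparkContext}

    // local[2,4]: two worker threads, and each task may fail up to 4 times
    // before the job is aborted (plain local[k] gives tasks a single attempt)
    val sc = new SparkContext(
      new SparkConf().setAppName("retry-demo").setMaster("local[2,4]"))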

Re: Running Spark in local mode

2016-06-19 Thread Mich Talebzadeh
Spark works in different modes: local (neither Spark nor anything else manages resources) and standalone (Spark itself manages resources), plus others (see below). These are from my notes, excluding Mesos, which I have not used. - Spark Local - Spark runs on the local host. This is the sim

Re: How to cause a stage to fail (using spark-shell)?

2016-06-19 Thread Jacek Laskowski
Hi, Thanks Burak for the idea, but it *only* fails the tasks that eventually fail the entire job, not a particular stage (just once or twice) before the entire job is failed. The idea is to see the attempts in the web UI, as there's special handling for cases where a stage failed once or twice before

Re: Running Spark in local mode

2016-06-19 Thread Takeshi Yamamuro
There are many technical differences inside, though; from the user's point of view they are used almost the same way. Yes, in standalone mode, Spark runs as a cluster: see http://spark.apache.org/docs/1.6.1/cluster-overview.html // maropu On Sun, Jun 19, 2016 at 6:14 PM, Ashok Kumar wrote: > thank you > > W

Re: Running Spark in local mode

2016-06-19 Thread Ashok Kumar
Thank you. What are the main differences between local mode and standalone mode? I understand local mode does not support a cluster. Is that the only difference? On Sunday, 19 June 2016, 9:52, Takeshi Yamamuro wrote: Hi, In local mode, Spark runs in a single JVM that has a master an

Re: Running Spark in local mode

2016-06-19 Thread Takeshi Yamamuro
Hi, In local mode, Spark runs in a single JVM that has a master and one executor with `k` threads. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/local/LocalSchedulerBackend.scala#L94 // maropu On Sun, Jun 19, 2016 at 5:39 PM, Ashok Kumar wrote: >
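
A quick, hedged way to confirm the single-JVM point from spark-shell or an app: the driver and a task report the same pid@host.

    import java.lang.management.ManagementFactory
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("one-jvm").setMaster("local[2]"))

    val driverJvm = ManagementFactory.getRuntimeMXBean.getName   // "pid@host"
    sc.parallelize(1 to 1).foreach { _ =>
      // in local mode this prints the same pid@host as the driver
      println(ManagementFactory.getRuntimeMXBean.getName + " vs " + driverJvm)
    }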

plot important variables in pyspark

2016-06-19 Thread pseudo oduesp
Hi, how can I get a score for each row from classification algorithms, and how can I plot the feature importances of variables, as in scikit-learn? Thanks.
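
The question is asked for PySpark, but the same API exists in Scala; a hedged sketch on a toy DataFrame using the Spark 1.6 ML API (data and column names illustrative):

    import org.apache.spark.ml.classification.RandomForestClassifier
    import org.apache.spark.ml.feature.StringIndexer
    import org.apache.spark.mllib.linalg.Vectors

    // toy training set: (label, features)
    val training = sqlContext.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 1.1)),
      (1.0, Vectors.dense(2.0, 1.0)),
      (1.0, Vectors.dense(2.2, 1.3))
    )).toDF("label", "features")

    // tree classifiers need label metadata, which StringIndexer provides
    val indexed = new StringIndexer().setInputCol("label").setOutputCol("idxLabel")
      .fit(training).transform(training)

    val model = new RandomForestClassifier().setLabelCol("idxLabel").fit(indexed)

    println(model.featureImportances)   // per-feature weights, like scikit-learn's feature_importances_
    model.transform(indexed).select("probability").show()   // per-row class scores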

Running Spark in local mode

2016-06-19 Thread Ashok Kumar
Hi, I have been told Spark in local mode is simplest for testing. The Spark documentation covers little on local mode except the cores used in --master local[k]. Where are the driver program, executor, and resources? Do I need to start worker threads, and how many apps can I use safely without exceeding

Re: sparkR.init() can not load sparkPackages.

2016-06-19 Thread Sun Rui
Hi, Joseph, This is a known issue but not a bug. The issue does not occur when you use an interactive SparkR session, but it does occur when you execute an R file. The reason is that when you execute an R file, the R backend launches before the R interpreter, so there is no oppo