Re: Create one DB connection per executor

2016-03-26 Thread Manas
Thanks much Gerard & Manas for your inputs. I'll keep in mind the connection pooling part. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Create-one-DB-connection-per-executor-tp26588p26601.html Sent from the Apache Spark User List mailing list a

Create one DB connection per executor

2016-03-24 Thread Manas
I understand that using foreachPartition I can create one DB connection per partition. Is there a way to create a DB connection at the executor level and share it across all partitions/tasks running within that executor? One approach I am thinking of is to have a singleton with, say, a getConnection metho
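The singleton approach described above can be sketched in plain Scala (this is not code from the thread; `Connection` is a placeholder for a real `java.sql.Connection`, and a real `getConnection` would call `DriverManager.getConnection`). The key point is that a `lazy val` in an `object` is initialized once per JVM, and each Spark executor is one JVM, so every task that references the object reuses the same connection:

```scala
// Placeholder for java.sql.Connection; the id only demonstrates identity.
case class Connection(id: Long)

object ConnectionHolder {
  // Initialized on first access, once per JVM (i.e. once per executor).
  lazy val connection: Connection = Connection(System.nanoTime())
}

// Two "tasks" in the same JVM see the very same instance.
val c1 = ConnectionHolder.connection
val c2 = ConnectionHolder.connection
println(c1 eq c2) // true
```

Inside `foreachPartition` one would simply refer to `ConnectionHolder.connection`; since Scala objects are not serialized with the closure but resolved per-JVM, each executor builds its own single connection.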

Re: mapwithstate Hangs with Error cleaning broadcast

2016-03-15 Thread manas kar
will be less) I hope to get some guidance as to what parameter I can use in order to totally avoid this issue. I am guessing spark.shuffle.io.preferDirectBufs = false but I am not sure. ..Manas On Tue, Mar 15, 2016 at 2:30 PM, Iain Cundy wrote: > Hi Manas > > > > I saw a very

Re: mapwithstate Hangs with Error cleaning broadcast

2016-03-15 Thread manas kar
I am using Spark 1.6. I am not using any broadcast variable. This broadcast variable is probably used by the state management of mapWithState. ...Manas On Tue, Mar 15, 2016 at 10:40 AM, Ted Yu wrote: > Which version of Spark are you using ? > > Can you show the code snippet w.r.t.

How to handle Option[Int] in dataframe

2015-11-02 Thread manas kar
[Int] from a row instead of Int from a dataframe? ...Manas Some more description /*My case class*/ case class Student(name: String, age: Option[Int]) val s = new Student("Manas", Some(35)) val s1 = new Student("Manas1", None) val student = sc.makeRDD(List(s, s1)).toDF /*Now w
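The safe handling of the `Option[Int]` field above can be sketched in plain Scala (the `Student` case class is taken from the message; the `Row` guard in the trailing comment is a hypothetical illustration, not run here):

```scala
case class Student(name: String, age: Option[Int])

val s  = Student("Manas", Some(35))
val s1 = Student("Manas1", None)

// Never call .get on a possibly-None age; use getOrElse or pattern matching.
def ageOrDefault(st: Student): Int = st.age.getOrElse(-1)

println(ageOrDefault(s))  // 35
println(ageOrDefault(s1)) // -1

// Against a DataFrame Row, the analogous null guard (illustrative only) is:
//   val age: Option[Int] = if (row.isNullAt(1)) None else Some(row.getInt(1))
```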

newAPIHadoopRDD file name

2015-04-18 Thread Manas Kar
classOf[AvroKeyInputFormat[myObject]], classOf[AvroKey[myObject]], classOf[NullWritable]) Basically I would like to end up having a tuple of (FileName, AvroKey[MyObject, NullWritable]) Any help is appreciated. .Manas

Re: Spark unit test fails

2015-04-06 Thread Manas Kar
Trying to bump up the rank of the question. Can someone point to an example on GitHub? ..Manas On Fri, Apr 3, 2015 at 9:39 AM, manasdebashiskar wrote: > Hi experts, > I am trying to write unit tests for my spark application which fails with > javax.servlet.FilterRegistration error.

Spark unit test fails

2015-04-03 Thread Manas Kar
nknown Source) [info] at java.net.URLClassLoader.defineClass(Unknown Source) [info] at java.net.URLClassLoader.access$100(Unknown Source) [info] at java.net.URLClassLoader$1.run(Unknown Source) [info] at java.net.URLClassLoader$1.run(Unknown Source) [info] at java.security.AccessController.doPrivileged(Native Method) [info] at java.net.URLClassLoader.findClass(Unknown Source) Thanks Manas

Re: Cannot run spark-shell "command not found".

2015-03-30 Thread Manas Kar
/content/cloudera/en/downloads.html) super easy. Currently they are on Spark 1.2. ..Manas On Mon, Mar 30, 2015 at 1:34 PM, vance46 wrote: > Hi all, > > I'm a newbie trying to set up spark for my research project on a RedHat system. > I've downloaded spark-1.3.0.tgz and unt

Re: PairRDD serialization exception

2015-03-11 Thread Manas Kar
= "com.github.scopt" %% "scopt" % V.scopt val breeze = "org.scalanlp" %% "breeze" % V.breeze val breezeNatives = "org.scalanlp" %% "breeze-natives" % V.breeze val config = "com.typesafe&

Re: Joining data using Latitude, Longitude

2015-03-11 Thread Manas Kar
If you want to ask questions like "what is near me", these are the basic steps. 1) Index your geometry data using an R-Tree. 2) Write your joiner logic to take advantage of the index tree for faster access. Thanks Manas On Wed, Mar 11, 2015 at 5:55 AM, Andrew Musselman < andrew.mus
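The two steps above can be sketched with a minimal grid index (a stand-in for a real R-Tree such as JTS `STRtree`; every name and coordinate here is illustrative, and a fuller version would also scan neighbouring cells):

```scala
case class Point(id: String, lat: Double, lon: Double)

// Step 1: index points by a coarse cell key (the R-Tree analogue).
def cell(lat: Double, lon: Double, size: Double = 1.0): (Int, Int) =
  ((lat / size).toInt, (lon / size).toInt)

val points = Seq(Point("a", 43.45, -80.49), Point("b", 10.0, 10.0))
val index: Map[(Int, Int), Seq[Point]] = points.groupBy(p => cell(p.lat, p.lon))

// Step 2: a "near me" lookup scans only the query's cell instead of all points.
def nearby(lat: Double, lon: Double): Seq[Point] =
  index.getOrElse(cell(lat, lon), Seq.empty)

println(nearby(43.5, -80.5).map(_.id)) // List(a)
```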

java.io.InvalidClassException: org.apache.spark.rdd.PairRDDFunctions; local class incompatible: stream classdesc

2015-03-10 Thread Manas Kar
.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Thanks Manas

Re: RDDs

2015-03-03 Thread Manas Kar
The above is a great example using threads. Does anyone have an example doing the same with a Scala/Akka Future? I am looking for an example like that which uses an Akka Future and does something if the Future times out. On Tue, Mar 3, 2015 at 9:16 AM, Manas Kar wrote: > The abo
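A timeout-aware Future can be sketched with the standard library alone (scala.concurrent rather than Akka, since the mechanics are the same; the durations and fallback value are illustrative):

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Wait on a Future for at most `limit`; fall back to `onTimeout` if it expires.
def withTimeout[T](f: Future[T], limit: FiniteDuration)(onTimeout: => T): T =
  try Await.result(f, limit)
  catch { case _: TimeoutException => onTimeout }

val slow = Future { Thread.sleep(2000); "done" }
println(withTimeout(slow, 100.millis)("timed out")) // timed out
```

With Akka proper, the usual equivalents are `after`/`Future.firstCompletedOf` on the actor system's scheduler, but the `Await`-with-timeout shape above is the simplest form.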

Re: RDDs

2015-03-03 Thread Manas Kar
The above is a great example using threads. Does anyone have an example doing the same with a Scala/Akka Future? I am looking for an example like that which uses an Akka Future and does something if the Future times out. On Tue, Mar 3, 2015 at 7:00 AM, Kartheek.R wrote: > Hi TD, > "You can always

How to debug a Hung task

2015-02-27 Thread Manas Kar
on is executed does not help as the actual message gets buried in the logs. How does one go about debugging such a case? Also, is there a way I can wrap my function inside some sort of timer-based environment, so that if it takes too long I can print a stack trace or something of the sort? Thanks Manas
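The timer-based wrapper asked about above can be sketched with a watchdog built on `java.util.concurrent` (a plain-Scala sketch, not Spark-specific; the time budget and printed frame count are arbitrary): run the body on a worker thread, and if it exceeds its budget, dump that thread's stack trace before continuing to wait.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

def runWithWatchdog[T](limitMs: Long)(body: => T): T = {
  val pool = Executors.newSingleThreadExecutor()
  @volatile var worker: Thread = null
  val task = pool.submit(new Callable[T] {
    def call(): T = { worker = Thread.currentThread(); body }
  })
  try task.get(limitMs, TimeUnit.MILLISECONDS)
  catch {
    case _: TimeoutException =>
      // Budget exceeded: show where the task is stuck, then keep waiting.
      Option(worker).foreach(_.getStackTrace.take(5).foreach(println))
      task.get()
  } finally pool.shutdown()
}

println(runWithWatchdog(50) { Thread.sleep(200); "slow but finished" })
```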

How to print more lines in spark-shell

2015-02-23 Thread Manas Kar
Hi experts, I am using Spark 1.2 from CDH5.3. When I issue commands like myRDD.take(10) the result gets truncated after 4-5 records. Is there a way to configure it to show more items? ..Manas

Re: Master dies after program finishes normally

2015-02-12 Thread Manas Kar
I have 5 workers, each with 8 GB of executor memory. My driver memory is 8 GB as well. They are all 8-core machines. To answer Imran's question, my configuration is thus: executor_total_max_heapsize = 18GB. This problem happens at the end of my program. I don't have to run a lot of jobs to see

Re: Master dies after program finishes normally

2015-02-12 Thread Manas Kar
jobs to see this behaviour. I can see my output correctly in HDFS and all. I will give it one more try after increasing the master's memory (from the default 296 MB to 512 MB). ..manas On Thu, Feb 12, 2015 at 2:14 PM, Arush Kharbanda wrote: > How many nodes do you have in your cluster, how many

Re: Master dies after program finishes normally

2015-02-12 Thread Manas Kar
Hi Arush, Mine is CDH5.3 with Spark 1.2. The only changes to my spark programs are -Dspark.driver.maxResultSize=3g and -Dspark.akka.frameSize=1000. ..Manas On Thu, Feb 12, 2015 at 2:05 PM, Arush Kharbanda wrote: > What is your cluster configuration? Did you try looking at the Web UI? > The
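The two `-D` flags mentioned above can equivalently live in `conf/spark-defaults.conf` (values copied from the message; where the file sits depends on the CDH layout):

```shell
# spark-defaults.conf equivalents of the -D flags above
spark.driver.maxResultSize   3g
spark.akka.frameSize         1000
```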

Master dies after program finishes normally

2015-02-12 Thread Manas Kar
org.apache.spark.deploy.master.Master.finishApplication(Master.scala:653) at org.apache.spark.deploy.master.Master$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$29.apply(Master.scala:399) Can anyone help? ..Manas

Spark 1.2 + Avro file does not work in HDP2.2

2014-12-12 Thread Manas Kar
Hi Experts, I have recently installed HDP2.2 (depends on hadoop 2.6). My spark 1.2 is built with the hadoop 2.4 profile. My program has the following dependencies: val avro = "org.apache.avro" % "avro-mapred" % "1.7.7" val spark = "org.apache.spark" % "spark-core_2.10" % "1.2.0" % "pr

Asymmetric spark cluster memory utilization

2014-10-25 Thread Manas Kar
Hi, I have a spark cluster that has 5 machines with 32 GB memory each and 2 machines with 24 GB each. I believe spark.executor.memory assigns the executor memory for all executors. How can I use 32 GB of memory on the first 5 machines and 24 GB on the other 2? Thanks ..Manas
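In standalone mode, each node reads its own `conf/spark-env.sh`, so the worker memory can differ per machine (a sketch; the exact values are illustrative and leave headroom for the OS, and `spark.executor.memory` must still fit on the smallest worker):

```shell
# spark-env.sh on each 32 GB machine (Spark standalone)
SPARK_WORKER_MEMORY=28g

# spark-env.sh on each 24 GB machine
SPARK_WORKER_MEMORY=20g
```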

How to create Track per vehicle using spark RDD

2014-10-14 Thread Manas Kar
of memory every time because of the volume of data. ...Manas *For some reason I have never got any reply to my emails to the user group. I am hoping to break that trend this time. :)*

Null values in Date field only when RDD is saved as File.

2014-10-03 Thread Manas Kar
Hi, I am using a library that parses AIS messages. My code, which follows these simple steps, gives me null values in the Date field. 1) Get the message from a file. 2) Parse the message. 3) Map the message RDD to keep only (Date, SomeInfo). 4) Take the top 100 elements. Result = the Date field appears fine

Spark Streaming Example with CDH5

2014-06-17 Thread manas Kar
sbt dist gives the following error: object SecurityManager is not a member of package org.apache.spark [error] import org.apache.spark.{SparkConf, SecurityManager} build.scala <http://apache-spark-user-list.1001560.n3.nabble.com/file/n7796/build.scala> Appreciate the great work the spark community is d

Can I share RDD between a pyspark and spark API

2014-05-05 Thread manas Kar
) In doing so I don't want to push the parsed data to disk and then re-obtain it via the scala class. Is there a way I can achieve what I want in an efficient way? ..Manas -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-I-share-RDD-between-a-pyspark

Re: Shark on cloudera CDH5 error

2014-05-05 Thread manas Kar
-core folder of step 2. Hope this saves some time for someone who has a similar problem. ..Manas -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shark-on-cloudera-CDH5-error-tp5226p5374.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

ETL for postgres to hadoop

2014-04-08 Thread Manas Kar
as to how to do it easily? Thanks Manas Manas Kar Intermediate Software Developer, Product Development | exactEarth Ltd. 60 Struck Ct. Cambridge, Ontario N1R 8L2 office. +1.519.622.4445 ext. 5869 | direct: +1.519.620