I am using Spark 1.6.
I am not using any broadcast variable.
This broadcast variable is probably used by the state management of
mapWithState.
...Manas
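For reference, below is a minimal mapWithState sketch for Spark 1.6 (not code from this application; the socket source, checkpoint path, and word-count state are assumptions). The state store and its checkpointed RDDs are managed internally by Spark, so no user-defined broadcast variable appears anywhere:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object MapWithStateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("mapWithState-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/mapwithstate-checkpoint") // hypothetical checkpoint directory

    val lines = ssc.socketTextStream("localhost", 9999) // hypothetical source
    val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

    // State function: keep a running count per key; Spark stores the State internally.
    val trackState = (word: String, count: Option[Int], state: State[Int]) => {
      val sum = count.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (word, sum)
    }

    val stateStream = pairs.mapWithState(StateSpec.function(trackState))
    stateStream.print()

    ssc.start()
    ssc.awaitTermination()
  }
}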
On Tue, Mar 15, 2016 at 10:40 AM, Ted Yu wrote:
> Which version of Spark are you using?
>
> Can you show the code snippet w.r.t. broadcast vari
klog or other issue to fix maybe you’ll get lucky
> too :).
>
>
>
> Cheers
>
> Iain
>
>
>
> *From:* manas kar [mailto:poorinsp...@gmail.com]
> *Sent:* 15 March 2016 14:49
> *To:* Ted Yu
> *Cc:* user
> *Subject:* [MARKETING] Re: mapWithState Hangs with Error
Hi,
I have a case class with many columns that are Option[Int],
Option[Array[Byte]], and the like.
I would like to save it to a Parquet file and later read it back into my case
class.
I found that an Option[Int] field comes back as 0 when the underlying value is null.
My question:
Is there a way to get Option[In
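A round-trip sketch (field names are made up; assumes Spark 1.4+ for DataFrameReader/Writer): Option fields are written as nullable Parquet columns, and checking isNullAt while mapping the rows back restores None instead of the surprising 0 that Row.getInt returns for a null cell.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Record(id: Int, score: Option[Int], payload: Option[Array[Byte]])

object OptionParquetSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("option-parquet").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val path = "/tmp/records.parquet" // hypothetical output path
    sc.parallelize(Seq(Record(1, Some(42), None), Record(2, None, Some(Array[Byte](1, 2)))))
      .toDF().write.parquet(path)

    // Row.getInt on a null cell returns 0, which is the behaviour described above;
    // testing isNullAt first keeps None as None when rebuilding the case class.
    val readBack = sqlContext.read.parquet(path).map { row =>
      def opt[T](name: String)(get: Int => T): Option[T] = {
        val i = row.fieldIndex(name)
        if (row.isNullAt(i)) None else Some(get(i))
      }
      Record(row.getInt(row.fieldIndex("id")),
             opt("score")(row.getInt),
             opt("payload")(i => row.getAs[Array[Byte]](i)))
    }
    readBack.collect().foreach(println)
  }
}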
Hi,
I have a Hidden Markov Model running with 200 MB of data.
Once the program finishes (i.e. all stages/jobs are done), the program
hangs for 20 minutes or so before killing the master.
In the Spark master, the following log appears.
2015-02-12 13:00:05,035 ERROR akka.actor.ActorSystemImpl: Uncaught fat
> There are many tips here
>
> http://spark.apache.org/docs/1.2.0/tuning.html
>
> Did you try these?
>
> On Fri, Feb 13, 2015 at 12:09 AM, Manas Kar
> wrote:
>
>> Hi,
>> I have a Hidden Markov Model running with 200MB data.
>> Once the program finishes (i.e
cores, what is the
> size of the memory?
>
> On Fri, Feb 13, 2015 at 12:42 AM, Manas Kar
> wrote:
>
>> Hi Arush,
>> Mine is CDH 5.3 with Spark 1.2.
>> The only changes to my Spark programs are
>> -Dspark.driver.maxResultSize=3g and -Dspark.akka.frameSize=1000.
I have 5 workers, each with an executor memory of 8 GB. My driver memory is
8 GB as well. They are all 8-core machines.
To answer Imran's question, my configuration is as follows:
executor_total_max_heapsize = 18GB
This problem happens at the end of my program.
I don't have to run a lot of jobs to see
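For what it's worth, here is a small sketch (the values are the ones quoted above, the app name is made up) of setting the same properties on the SparkConf instead of passing -D JVM flags, so they show up in the application UI and are picked up consistently:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("hmm-job") // hypothetical application name
  .set("spark.driver.maxResultSize", "3g")
  .set("spark.akka.frameSize", "1000") // in MB; spark.akka.* settings apply to Spark 1.x only
val sc = new SparkContext(conf)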
Hi experts,
I am using Spark 1.2 from CDH5.3.
When I issue commands like
myRDD.take(10), the result gets truncated after 4-5 records.
Is there a way to configure this to show more items?
..Manas
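A quick workaround sketch (assuming the cut-off comes from the shell's result rendering rather than Spark dropping records): print the taken elements explicitly instead of letting the REPL render the returned Array.

// Each of the 10 elements is printed on its own line by the driver.
myRDD.take(10).foreach(println)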
Hi,
I have a Spark application that hangs on just one task (the remaining 200-300
tasks complete in a reasonable time).
I can see in the thread dump which function gets stuck; however, I don't
have a clue as to what value is causing that behaviour.
Also, logging the inputs before the function is exe
The above is a great example using a thread.
Does anyone have an example using a Scala/Akka Future to do the same?
I am looking for an example like that which uses an Akka Future and does
something if the Future times out.
On Tue, Mar 3, 2015 at 7:00 AM, Kartheek.R wrote:
> Hi TD,
> "You can always
The above is a great example using a thread.
Does anyone have an example using a Scala/Akka Future to do the same?
I am looking for an example like that which uses an Akka Future and does
something if the Future times out.
On Tue, Mar 3, 2015 at 9:16 AM, Manas Kar
wrote:
> The abo
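A minimal sketch along those lines (assumes Akka 2.3 on the classpath; the slow task and the fallback value are made up): race the real work against akka.pattern.after and react when the timeout future wins.

import akka.actor.ActorSystem
import akka.pattern.after
import scala.concurrent.{Await, Future, TimeoutException}
import scala.concurrent.duration._

object FutureTimeoutSketch extends App {
  val system = ActorSystem("timeout-sketch")
  import system.dispatcher // ExecutionContext for the futures below

  val work: Future[String] = Future { Thread.sleep(5000); "done" } // hypothetical slow task

  // A future that fails after 2 seconds; firstCompletedOf takes whichever finishes first.
  val timeout: Future[String] =
    after(2.seconds, system.scheduler)(Future.failed(new TimeoutException("took too long")))

  val result = Future.firstCompletedOf(Seq(work, timeout))
    .recover { case _: TimeoutException => "fallback value" }

  println(Await.result(result, 10.seconds))
  system.shutdown() // Akka 2.3 API; use terminate() on 2.4+
}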
Hi,
I have a CDH 5.3.2 (Spark 1.2) cluster.
I am getting a local class incompatible exception for my Spark
application during an action.
All my classes are case classes (to the best of my knowledge).
Appreciate any help.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to sta
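One mitigation sketch (this is an assumption about the cause, not a confirmed fix): the usual root cause is different application or Spark jars on the driver and the executors, which only a consistent redeploy fixes; but if the same case classes are being recompiled separately on each side, pinning serialVersionUID keeps Java serialization compatible as long as the fields do not change.

// Hypothetical case class; the annotation pins the serialVersionUID that the
// "local class incompatible" message complains about.
@SerialVersionUID(1L)
case class Position(vesselId: String, timestamp: Long, lat: Double, lon: Double)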
There are a few techniques currently available.
GeoMesa, which uses GeoHash, can also prove useful
(https://github.com/locationtech/geomesa).
Another potential candidate is
https://github.com/Esri/gis-tools-for-hadoop, especially
https://github.com/Esri/geometry-api-java for inner customization.
If
putStream.java:1915)
> > at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> > at
> >
> org.apache.spark.serializer.JavaDe
If you are only interested in getting hands-on with Spark and not in
building it against a specific version of Hadoop, use one of the bundle
providers like Cloudera.
It will give you a very easy way to install and monitor your services. (I
find installing via Cloudera Manager
http://www.cloudera.com/
Hi experts,
I am trying to write unit tests for my Spark application, which fails with a
javax.servlet.FilterRegistration error.
I am using CDH 5.3.2 Spark, and below is my dependency list.
val spark = "1.2.0-cdh5.3.2"
val esriGeometryAPI = "1.2"
val csvWriter = "1.0.0"
Unknown Source)
> [info] at java.lang.ClassLoader.defineClass(Unknown Source)
> [info] at java.security.SecureClassLoader.defineClass(Unknown Source)
> [info] at java.net.URLClassLoader.defineClass(Unknown Source)
> [info] at java.net.URLClassLoader.access$100(Unknown Source)
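A build.sbt sketch (not the original file; the Esri coordinates are an assumption and the csvWriter dependency is omitted because its coordinates are not shown). The FilterRegistration failure in tests is usually a servlet-api 2.5 vs 3.0 clash, so the common workaround is to exclude the old servlet-api that the Hadoop/CDH artifacts drag onto the test classpath:

scalaVersion := "2.10.4"

resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % "1.2.0-cdh5.3.2" % "provided")
    .excludeAll(ExclusionRule(organization = "javax.servlet"),
                ExclusionRule(organization = "org.mortbay.jetty", name = "servlet-api")),
  "com.esri.geometry" % "esri-geometry-api" % "1.2" // assumed coordinates for the Esri geometry API
)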
I would like to get the file name along with the associated objects so that
I can do further mapping on it.
My code below gives me (AvroKey[myObject], NullWritable) pairs, but I don't know how
to get the file that produced those objects.
sc.newAPIHadoopRDD(job.getConfiguration,
classOf[AvroKeyInputFo
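One way to do this (a sketch, with GenericRecord standing in for the real Avro class and a hypothetical input directory): sc.newAPIHadoopRDD actually returns a NewHadoopRDD, whose developer API mapPartitionsWithInputSplit exposes the input split, and therefore the file, behind every record.

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.NewHadoopRDD

object AvroWithFileNamesSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avro-filenames").setMaster("local[2]"))
    val job = Job.getInstance()
    FileInputFormat.addInputPath(job, new Path("/data/avro")) // hypothetical input directory

    val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
      classOf[AvroKeyInputFormat[GenericRecord]],
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable])

    // Cast to NewHadoopRDD to reach mapPartitionsWithInputSplit, then read the file
    // name off the FileSplit and pair it with each deserialized Avro record.
    val withFileNames = rdd.asInstanceOf[NewHadoopRDD[AvroKey[GenericRecord], NullWritable]]
      .mapPartitionsWithInputSplit { (split, iter) =>
        val file = split.asInstanceOf[FileSplit].getPath.toString
        iter.map { case (key, _) => (file, key.datum()) }
      }

    withFileNames.map(_._1).distinct().collect().foreach(println)
  }
}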
Hi Spark Gurus,
I am trying to compile a Spark Streaming example with CDH5 and am having
problems compiling it.
Has anyone created an example Spark Streaming application using CDH5 (preferably Spark
0.9.1) and would be kind enough to share the build.sbt (or Build.scala) file, or
point to their example on GitHub? I know
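A build.sbt sketch (the version strings are assumptions for a CDH 5.0.x parcel, which shipped Spark 0.9.0; adjust to whatever the cluster actually reports):

name := "spark-streaming-example"

scalaVersion := "2.10.4"

resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "0.9.0-cdh5.0.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "0.9.0-cdh5.0.1" % "provided"
)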
Hi,
I am using a library that parses AIS messages. My code, which follows these
simple steps, gives me null values in the Date field:
1) Get the message from the file.
2) Parse the message.
3) Map the message RDD to keep only (Date, SomeInfo).
4) Take the top 100 elements.
Result: the Date field appears fine
Hi,
I have an RDD containing vehicle number, timestamp, and position.
I want the equivalent of the "lag" function for my RDD, to be able to create
track segments for each vehicle.
Any help?
PS: I have tried reduceByKey and then splitting the list of positions into
tuples. For me it runs out of memory eve
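A sketch of one way to get a lag(1) per vehicle on a plain RDD (field names and sample data are made up): group per vehicle, sort by timestamp, and pair consecutive pings with sliding(2).

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical record layout for a vehicle position report.
case class Ping(vehicle: String, timestamp: Long, lat: Double, lon: Double)

object LagSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lag-sketch").setMaster("local[2]"))

    val pings = sc.parallelize(Seq(
      Ping("V1", 1L, 43.0, -80.0), Ping("V1", 2L, 43.1, -80.1),
      Ping("V2", 1L, 44.0, -81.0), Ping("V2", 3L, 44.2, -81.2)))

    // Caveat: groupByKey assumes a single vehicle's track fits in executor memory;
    // if it does not, a secondary sort (repartitionAndSortWithinPartitions) is needed.
    val segments = pings
      .keyBy(_.vehicle)
      .groupByKey()
      .flatMapValues { track =>
        track.toSeq.sortBy(_.timestamp)
          .sliding(2)
          .collect { case Seq(prev, curr) => (prev, curr) }
      }

    segments.collect().foreach(println)
  }
}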
Hi,
I have a Spark cluster that has 5 machines with 32 GB of memory each and 2
machines with 24 GB each.
I believe spark.executor.memory assigns the executor memory for all
executors.
How can I use 32 GB of memory on the first 5 machines and 24 GB on the
other 2 machines?
Thanks
..Manas
Hi Experts,
I have recently installed HDP 2.2 (which depends on Hadoop 2.6).
My Spark 1.2 is built with the hadoop-2.4 profile.
My program has the following dependencies:
val avro = "org.apache.avro" % "avro-mapred" % "1.7.7"
val spark = "org.apache.spark" % "spark-core_2.10" % "1.2.0" %
"pr
as to how to do it easily?
Thanks
Manas
Manas Kar
Intermediate Software Developer, Product Development | exactEarth Ltd.
60 Struck Ct. Cambridge, Ontario N1R 8L2
office. +1.519.622.4445 ext. 5869 | direct: +1.519.620
No replies yet. I guess everyone who had this problem knew the obvious reason
why the error occurred.
It took me some time to figure out the workaround, though.
It seems Shark depends on
/var/lib/spark/shark-0.9.1/lib_managed/jars/org.apache.hadoop/hadoop-core/hadoop-core.jar
for client-server com
Hi experts.
I have some pre-built Python parsers that I am planning to use, just
because I don't want to write them again in Scala. However, after the data is
parsed, I would like to take the RDD and use it in a Scala program. (Yes, I
like Scala more than Python and am more comfortable in Scala. :)
In d
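One option for this (a sketch; parser.py, the paths, and the tab-separated output format are all assumptions): keep the Python parser as an external process and drive it from Scala with RDD.pipe, then carry on with the resulting RDD[String] in Scala.

import org.apache.spark.{SparkConf, SparkContext}

object PipeParserSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pipe-parser").setMaster("local[2]"))

    val raw = sc.textFile("/data/raw_messages.txt") // hypothetical input path

    // parser.py is a hypothetical script that reads lines on stdin and writes one
    // parsed record per line on stdout; it must be present on every worker
    // (ship it with --files or SparkContext.addFile).
    val parsed = raw.pipe("python /opt/parsers/parser.py")

    // From here on it is an ordinary RDD[String] that can be split into case classes.
    parsed.map(_.split('\t')).take(5).foreach(fields => println(fields.mkString(", ")))
  }
}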