How to consider HTML files in Spark

2015-03-12 Thread yh18190
Hi. I am very much fascinated by the Spark framework. I am trying to use PySpark + BeautifulSoup to parse HTML files, but I am facing problems loading an HTML file into BeautifulSoup. Example: filepath = file:///path to html directory; def readhtml(inputhtml): soup = BeautifulSoup(inputhtml)  # to load HTML content

Request for help in writing to Textfile

2014-08-25 Thread yh18190
Hi guys, I am currently working with huge data. I have an RDD of type RDD[List[(tuples)]], and I need only the tuples to be written to the text-file output using the saveAsTextFile function. Example: val mod = modify.saveAsTextFile() returns List((20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.
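
A minimal spark-shell sketch of one way to do this (sc is the shell's SparkContext; the data below is a made-up stand-in for the real RDD[List[...]]): flatten the lists so each tuple becomes its own record, format it as a plain line, and only then call saveAsTextFile.

```scala
// Hypothetical stand-in for the RDD[List[(...)]] from the question
val modify = sc.parallelize(Seq(
  List((20140813, "HY", 144.00), (20140814, "KN", 12.50))))

val lines = modify
  .flatMap(identity)                                            // RDD[List[T]] -> RDD[T]
  .map { case (date, code, amount) => s"$date,$code,$amount" }  // drop the tuple wrapper

lines.saveAsTextFile("output/tuples")                           // hypothetical output path
```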

Request for Help

2014-08-25 Thread yh18190
Hi guys, I just want to know whether there is any way to determine which file is being handled by Spark from a group of files given as input inside a directory. Suppose I have 1000 files as input; I want to determine which file is currently being handled by the Spark program, so that if any error
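
A hedged sketch of one approach, assuming a Spark version that provides SparkContext.wholeTextFiles: it yields (filePath, fileContent) pairs, so the path of the file currently being processed is available for logging or error handling (the directory path below is hypothetical).

```scala
val files = sc.wholeTextFiles("/path/to/input/dir")   // RDD[(path, content)]

val perFile = files.map { case (path, content) =>
  // `path` identifies the file being handled, so errors can be reported per file
  (path, content.split("\\s+").length)
}

perFile.collect().foreach { case (path, n) => println(s"$path -> $n tokens") }
```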

Unable to ship external Python libraries in PYSPARK

2014-09-12 Thread yh18190
Hi all, I am currently working with PySpark for NLP processing, etc. I am using the TextBlob Python library. Normally, in standalone mode it is easy to install external Python libraries; in cluster mode I am facing problems installing these libraries on the worker nodes remotely. I cannot access each a

Re: Unable to ship external Python libraries in PYSPARK

2014-10-07 Thread yh18190
Hi David, thanks for the reply and the effort you put into explaining the concepts. Thanks for the example; it worked.

Regarding Successive operation on elements and recursively

2014-03-18 Thread yh18190
Hi, I am new to the Spark/Scala environment. Currently I am working on discrete wavelet transformation algorithms on time-series data. I have to perform recursive additions on successive elements in RDDs. For example: list of elements (RDD) -- a1 a2 a3 a4; level-1 transformation -- a1+a2 a3+a4 a
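
A sketch of one transformation level, assuming a Spark release that provides RDD.zipWithIndex: neighbours get the same key (index / 2), so reduceByKey produces a1+a2, a3+a4, and so on; the same step can be re-applied to the result for further levels.

```scala
val data = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0))

val level1 = data
  .zipWithIndex()                    // keep the original position explicit
  .map { case (v, i) => (i / 2, v) } // successive elements share a key
  .reduceByKey(_ + _)                // a1+a2, a3+a4, ...
  .sortByKey()
  .values

println(level1.collect().mkString(", ")) // 3.0, 7.0, 11.0, 15.0
```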

Splitting RDD and Grouping together to perform computation

2014-03-24 Thread yh18190
Hi, I have a large data set of numbers, i.e. an RDD, and I want to perform a computation only on a group of two values at a time. For example, 1,2,3,4,5,6,7... is an RDD. Can I group the RDD into (1,2),(3,4),(5,6)...? And perform the respective computations in an efficient manner? As we don't have a way to index e
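
A sketch along the same lines, again assuming RDD.zipWithIndex is available: consecutive elements are keyed by index / 2 so each key holds exactly one pair, and any pairwise computation can then be applied.

```scala
val nums = sc.parallelize(1 to 8)

val pairs = nums
  .zipWithIndex()
  .map { case (v, i) => (i / 2, (i, v)) }        // neighbours share the key i/2
  .groupByKey()
  .mapValues(_.toSeq.sortBy(_._1).map(_._2))     // restore original order inside each pair
  .sortByKey()
  .values                                        // RDD of size-2 groups: Seq(1,2), Seq(3,4), ...

val results = pairs.map { case Seq(a, b) => a * b } // example pairwise computation
println(results.collect().mkString(", "))           // 2, 12, 30, 56
```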

Re: Splitting RDD and Grouping together to perform computation

2014-03-24 Thread yh18190
We need someone who can explain this with a short code snippet on the given example so that we get a clear idea of RDD indexing. Guys, please help us.

Re: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread yh18190
Hi, thanks Nanzhu. I tried to implement your suggestion in the following scenario. I have an RDD of, say, 24 elements. When I partitioned it into two groups of 12 elements each, there is a loss of order of the elements within a partition; elements are partitioned randomly. I need to preserve the order such that the first 1

RE: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread yh18190
Hi, here is my code for the given scenario. Could you please let me know where to sort? I mean, on what basis do we have to sort, so that the partitions maintain the order of the original sequence? val res2 = reduced_hccg.map(_._2) // which gives an RDD of numbers; res2.foreach(println); val result = res2.ma
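
A small sketch of the sort key being asked about, with a hypothetical stand-in for reduced_hccg: the basis for sorting is each element's original position, which zipWithIndex can attach before sortByKey.

```scala
val reduced_hccg = sc.parallelize(Seq(("k1", 4.0), ("k2", 1.0), ("k3", 7.0))) // stand-in

val res2    = reduced_hccg.map(_._2)            // RDD of numbers, as in the question
val keyed   = res2.zipWithIndex().map(_.swap)   // (originalIndex, value)
val ordered = keyed.sortByKey(true, 2)          // sorted by original position, 2 partitions
```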

RE: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread yh18190
Hi Andriana, thanks for the suggestion. Could you please modify the part of my code where I need to do so? I apologise for the inconvenience; because I am new to Spark, I couldn't apply it appropriately. I would be thankful to you.

RE: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread yh18190
Hi Andriana, of course you can sortByKey, but after that, when you perform mapPartitions, it doesn't guarantee that the first partition has all those elements in the order of the original sequence. I think we need a partitioner that partitions the sequence while maintaining order... Could anyone help me in defining
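
A sketch of such a partitioner, with the 24-element, two-way split from earlier in the thread as an assumption: elements are first keyed by their original index (zipWithIndex), the partitioner sends the first half of the indices to partition 0 and the rest to partition 1, and a sort inside each partition restores the original order.

```scala
import org.apache.spark.Partitioner

// Splits a sequence keyed by original index into `parts` contiguous, ordered blocks.
class OrderPreservingPartitioner(numElements: Long, parts: Int) extends Partitioner {
  private val perPartition = math.ceil(numElements.toDouble / parts).toLong
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int =
    (key.asInstanceOf[Long] / perPartition).toInt
}

val data    = sc.parallelize(1 to 24)
val indexed = data.zipWithIndex().map(_.swap)   // (originalIndex, value)

val blocks = indexed
  .partitionBy(new OrderPreservingPartitioner(24, 2))
  .mapPartitions(it => Iterator(it.toSeq.sortBy(_._1).map(_._2))) // re-sort inside each block

blocks.collect().foreach(block => println(block.mkString(", ")))  // 1..12 and 13..24
```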

Zip or map elements to create new RDD

2014-03-29 Thread yh18190
Hi, I have an RDD of elements and want to create a new RDD by zipping it with another RDD in order: result[RDD] with the sequence of 10,20,30,40,50... elements. I am facing problems as the index is not an RDD; it gives an error. Could anyone help me with how we can zip it or map it in order to obtain the following result. (
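
A hedged spark-shell sketch of RDD.zip, assuming the two RDDs have the same number of partitions and the same number of elements per partition (which zip requires):

```scala
val values  = sc.parallelize(Seq("a", "b", "c", "d", "e"), 2)
val indices = sc.parallelize(Seq(10, 20, 30, 40, 50), 2)

val zipped = indices.zip(values)        // RDD[(Int, String)]: (10,a), (20,b), ...
zipped.collect().foreach(println)
```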

Re: Zip or map elements to create new RDD

2014-03-29 Thread yh18190
Thanks Sonal. Is there any other way to map values with increasing indexes, so that I can do map(t => (i, t)) where the value of 'i' increases after each map operation on an element? Please help me in this aspect.

How to index each map operation????

2014-03-29 Thread yh18190
Hi, I want to perform a map operation on an RDD of elements such that the resulting RDD is a key-value pair (counter, value). For example: var k: RDD[Int] = 10,20,30,40,40,60...; k.map(t => (i, t)) where the value of 'i' should be like a counter whose value increments after each map operation. Please help me. I tried
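
A sketch assuming RDD.zipWithIndex is available: it attaches each element's position, which plays the role of the counter 'i' without any mutable state in the closure.

```scala
val k = sc.parallelize(Seq(10, 20, 30, 40, 40, 60))

val counted = k.zipWithIndex().map { case (t, i) => (i, t) } // (counter, value)
counted.collect().foreach(println)                           // (0,10), (1,20), ...
```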

Can we convert scala.collection.ArrayBuffer[(Int,Double)] to org.spark.RDD[(Int,Double])

2014-03-30 Thread yh18190
Hi, can we directly convert a Scala collection to a Spark RDD data type without using the parallelize method? Is there any way to create a custom converted RDD datatype from a Scala type using some typecast like that? Please suggest.
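
A short sketch of the usual answer: there is no direct cast from a local Scala collection to an RDD; the SparkContext has to build it, for example with makeRDD (which for a plain Seq behaves like parallelize).

```scala
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer((1, 1.5), (2, 2.5), (3, 3.5))
val rdd = sc.makeRDD(buf)        // org.apache.spark.rdd.RDD[(Int, Double)]
println(rdd.count())             // 3
```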

Re: How to index each map operation????

2014-04-02 Thread yh18190
Hi Therry, thanks for the above responses. I implemented it using RangePartitioner; we need to use one of the custom partitioners in order to perform this task. Normally you can't maintain a counter, because count operations would have to be performed on each partitioned block of data.
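
A sketch of the per-partition-counter idea described above, using mapPartitionsWithIndex: each partition's size is collected first, and the local position is then offset by the sizes of the preceding partitions to get a global index.

```scala
val k = sc.parallelize(Seq(10, 20, 30, 40, 40, 60), 3)

// size of each partition, gathered on the driver
val sizes = k.mapPartitionsWithIndex((i, it) => Iterator((i, it.size))).collect().toMap

// number of elements that come before each partition
val offsets = sizes.keys.map(i => (i, (0 until i).map(sizes(_)).sum)).toMap

val indexed = k.mapPartitionsWithIndex { (i, it) =>
  it.zipWithIndex.map { case (v, j) => (offsets(i) + j, v) } // global (counter, value)
}
indexed.collect().foreach(println)
```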

Need suggestions

2014-04-02 Thread yh18190
Hi guys, currently I am facing this issue and am not able to find the errors. Here is the sbt file: name := "Simple Project"; version := "1.0"; scalaVersion := "2.10.3"; resolvers += "bintray/meetup" at "http://dl.bintray.com/meetup/maven"; resolvers += "Akka Repository" at "http://repo.akka.io/releases/"; r
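
For comparison, a hedged sketch of a minimal build.sbt for a Spark 0.9 / Scala 2.10 project (the spark-core version below is an assumption; the full dependency list from the original build is not visible in the truncated message):

```scala
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"
```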

Re: Need suggestions

2014-04-02 Thread yh18190
Hi, thanks for the response. Could you please look into my repo? Here, Utils is the class in question; I cannot paste the entire code, that's why. I have another class from which I would be calling the Utils class for object creation. package main.scala; import org.apache.spark.SparkContext; import org.apache.spark.S

Re: Need suggestions

2014-04-02 Thread yh18190
It's working under local mode, but not under cluster mode with 4 slaves.

Re: Need suggestions

2014-04-02 Thread yh18190
Hi, here is the SparkContext setup. Do I need to ship any extra jars to the slaves separately, or is this enough? I am able to see the created jar in my target directory. val sc = new SparkContext("spark://spark-master-001:7077", "Simple App", utilclass.spark_home, List("target/sc

Regarding Sparkcontext object

2014-04-02 Thread yh18190
Hi, is it always required that the SparkContext object be created in the main method of a class? Is it necessary? Can we create the "sc" object in another class and use it by passing the object through a function? Please clarify.
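
It does not have to be the main method; a sketch (names are hypothetical) of keeping the SparkContext in a separate singleton object that other driver-side classes reference:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkHolder {
  // created once, shared by any driver-side code that needs it
  lazy val sc = new SparkContext(new SparkConf().setAppName("holder-demo").setMaster("local[2]"))
}

object SomeOtherJob {
  def run(): Long = SparkHolder.sc.parallelize(1 to 100).count()
}

object Main {
  def main(args: Array[String]): Unit = {
    println(SomeOtherJob.run())
    SparkHolder.sc.stop()
  }
}
```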

How to use addJar for adding external jars in spark-0.9?

2014-04-03 Thread yh18190
Hi, I guess there is a problem with the Spark 0.9 version, because when I tried to add the external jar jerkson_2.9.1 (version 0.5.0), with the Scala version being 2.10.3, in the cluster, I am facing a java NoClassDefFoundError because these jars are not being sent to the worker nodes. Please let me know how to resolve this issue.
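
A sketch of the usual way to ship extra jars in Spark 0.9: list them when constructing the SparkContext (or call sc.addJar afterwards) so they are distributed to the workers. The paths and jar names below are hypothetical; note also that a library built for Scala 2.9.1 is not binary-compatible with Scala 2.10.3, so a 2.10 build of the library would be needed in any case.

```scala
import org.apache.spark.SparkContext

val sc = new SparkContext(
  "spark://spark-master-001:7077",                     // master, as elsewhere in this thread
  "Simple App",
  "/path/to/spark-0.9.0",                              // hypothetical SPARK_HOME
  List("target/scala-2.10/simple-project_2.10-1.0.jar",
       "lib/some-json-library_2.10-0.5.0.jar"))        // hypothetical external jar, shipped to workers

// jars can also be added after the context is created:
sc.addJar("lib/another-dependency.jar")
```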

Problem with KryoSerializer

2014-04-15 Thread yh18190
Hi, I have a problem when I want to use the Spark KryoSerializer by extending the KryoRegistrator class to register custom classes in order to create objects. I am getting the following exception when I run the following program. Please let me know what the problem could be... ] (run-main) org.apache.spark.SparkEx
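
A sketch of the registration wiring, in case it helps isolate the problem (the Point class and other names are made up; the truncated exception itself is not visible here). The registrator must be reachable by its fully qualified class name on every node.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator
import org.apache.spark.{SparkConf, SparkContext}

case class Point(x: Double, y: Double)

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Point])          // register every custom class shipped in closures/RDDs
  }
}

object KryoDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-demo").setMaster("local[2]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyRegistrator") // use the fully qualified name if packaged
    val sc = new SparkContext(conf)
    println(sc.parallelize(Seq(Point(1, 2), Point(3, 4))).count())
    sc.stop()
  }
}
```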

Regarding Partitioner

2014-04-16 Thread yh18190
Hi, I have a large dataset of elements [RDD] and I want to divide it into two exactly equal-sized partitions while maintaining the order of elements. I tried using RangePartitioner like var data = partitionedFile.partitionBy(new RangePartitioner(2, partitionedFile)). This doesn't give satisfactory results beco
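
A sketch of an alternative that avoids RangePartitioner, assuming zipWithIndex is available: attach the original index, then split by index into two exactly equal, order-preserving halves.

```scala
val data = sc.parallelize(1 to 24)
val n    = data.count()

val indexed = data.zipWithIndex()                                  // (value, originalIndex)

val firstHalf  = indexed.filter(_._2 <  n / 2).map(_.swap).sortByKey().values
val secondHalf = indexed.filter(_._2 >= n / 2).map(_.swap).sortByKey().values

println(firstHalf.collect().mkString(", "))   // 1..12
println(secondHalf.collect().mkString(", "))  // 13..24
```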

Job failed: java.io.NotSerializableException: org.apache.spark.SparkContext

2014-05-12 Thread yh18190
Hi, I am facing the above exception when I am trying to apply a method (ComputeDwt) on an RDD[(Int, ArrayBuffer[(Int, Double)])] input. I am even using the extends Serializable option to serialize objects in Spark. Here is the code snippet. Could anyone suggest what the problem could be and what should be d
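
A sketch of the usual cause and fix for this exception: the closure passed to map must not capture an object that holds the SparkContext (computeDwt here is a hypothetical stand-in for the method in the question, and the transform body is a placeholder).

```scala
import scala.collection.mutable.ArrayBuffer

// A self-contained object: it captures no SparkContext, so the closure serializes cleanly.
object Dwt extends Serializable {
  def computeDwt(series: ArrayBuffer[(Int, Double)]): ArrayBuffer[(Int, Double)] =
    series.map { case (i, v) => (i, v / 2.0) }       // placeholder for the real transform
}

val input = sc.parallelize(Seq(
  (1, ArrayBuffer((0, 1.0), (1, 2.0))),
  (2, ArrayBuffer((0, 3.0), (1, 4.0)))))

val out = input.map { case (key, buf) => (key, Dwt.computeDwt(buf)) } // only Dwt is referenced, never sc
out.collect().foreach(println)
```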

Is their a way to Create SparkContext object?

2014-05-12 Thread yh18190
Hi, could anyone suggest an idea of how we can create a SparkContext object in other classes or functions where we need to convert a Scala collection to an RDD using the sc object, like sc.makeRDD(list), instead of using the Main class's SparkContext object? Is there a way to pass the sc object as a parameter to functio
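
A sketch of passing sc as an ordinary parameter to driver-side code (class and method names are made up): this works as long as the call happens on the driver, not inside an RDD transformation.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object Converter {
  // called on the driver; sc is just an argument here
  def toRdd(sc: SparkContext, list: List[(Int, Double)]): RDD[(Int, Double)] =
    sc.makeRDD(list)
}

val rdd = Converter.toRdd(sc, List((1, 1.0), (2, 4.0), (3, 9.0)))
println(rdd.count())
```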

Re: Is their a way to Create SparkContext object?

2014-05-13 Thread yh18190
Thanks Matei Zaharia. Can I pass it as a parameter as part of a closure? For example, RDD.map(t => compute(sc, t._2)): can I use sc inside the map function? Please let me know.
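
The SparkContext only lives on the driver and is not serializable, so it cannot be used inside map. A sketch of the usual workaround when the closure needs extra data: ship that data with a broadcast variable instead of sc (compute and the lookup table below are hypothetical).

```scala
// Driver side: broadcast the data the closure needs, instead of passing sc into it.
val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b", 3 -> "c"))

def compute(table: Map[Int, String], key: Int): String =
  table.getOrElse(key, "unknown")

val rdd = sc.parallelize(Seq((10, 1), (20, 2), (30, 4)))
val out = rdd.map(t => compute(lookup.value, t._2))   // no sc inside the closure
out.collect().foreach(println)                        // a, b, unknown
```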