Hi. I am very much fascinated by the Spark framework. I am trying to use PySpark +
BeautifulSoup to parse HTML files. I am facing problems loading an HTML file
into BeautifulSoup.
Example:
from bs4 import BeautifulSoup

filepath = "file:///path to html directory"   # placeholder path from the original post

def readhtml(inputhtml):
    soup = BeautifulSoup(inputhtml)   # parse the HTML content into a BeautifulSoup tree
    return soup
Hi Guys,
I am currently playing with huge data. I have an RDD of type
RDD[List[(tuples)]]. I need only the tuples to be written to the text file output
using the saveAsTextFile function.
Example: val mod = modify.saveAsTextFile(...) currently returns lines like
List((20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.
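A minimal sketch of one way to get rid of the List wrapper before saving, assuming modify is the RDD[List[...]] from above; the output path is a placeholder:

import org.apache.spark.rdd.RDD

// flatten RDD[List[tuple]] so each tuple becomes its own output record
val flattened = modify.flatMap(list => list)
// saveAsTextFile writes each tuple with its toString, e.g. (20140813,4,...)
flattened.saveAsTextFile("hdfs:///path/to/output")   // hypothetical output path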
Hi Guys,
I just want to know whether there is any way to determine which file is
being handled by Spark from a group of files given as input inside a
directory. Suppose I have 1000 files given as input; I want to
determine which file is currently being handled by the Spark program, so that if
any error
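A sketch of one way to keep the file name attached to each record, assuming Spark 1.0+ where SparkContext.wholeTextFiles is available; the directory path is a placeholder:

// wholeTextFiles returns (filePath, fileContent) pairs, so every record
// knows which file it came from
val files = sc.wholeTextFiles("hdfs:///input/dir")   // hypothetical directory
files.foreach { case (path, content) =>
  println("processing file: " + path)                // or log it next to any error
}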
Hi all,
I am currently working with PySpark for NLP processing, etc. I am using the TextBlob
Python library. Normally, in standalone mode it is easy to install external
Python libraries. In cluster mode I am facing problems installing
these libraries on the worker nodes remotely. I cannot access each a
Hi David,
Thanks for the reply and the effort you put into explaining the concepts. Thanks for
the example. It worked.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-ship-external-Python-libraries-in-PYSPARK-tp14074p15844.html
Sent from the Apache Spark User List
Hi,
> I am new to the Spark Scala environment. Currently I am working on discrete
> wavelet transformation algorithms on time series data.
> I have to perform recursive additions on successive elements in RDDs.
> For example:
> List of elements (RDD): a1 a2 a3 a4
> Level-1 transformation: a1+a2 a3+a4 a
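A sketch of one level of this pairwise reduction, assuming zipWithIndex (Spark 1.0+) is available; level0 and its contents are placeholders standing in for a1 a2 a3 a4:

import org.apache.spark.SparkContext._   // pair-RDD operations on older releases

// tag each element with its position, bucket consecutive pairs, and sum them
val level0 = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))   // stands in for a1 a2 a3 a4
val level1 = level0.zipWithIndex()                      // (value, index)
  .map { case (v, i) => (i / 2, v) }                    // indices 0&1 -> key 0, 2&3 -> key 1, ...
  .reduceByKey(_ + _)                                   // a1+a2, a3+a4
  .sortByKey()
  .values                                               // back to a plain RDD, in level order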
Hi, I have a large data set of numbers, i.e. an RDD, and I want to perform a
computation on a group of two values at a time. For
example, 1,2,3,4,5,6,7... is an RDD. Can I group the RDD into
(1,2),(3,4),(5,6)...? and perform the respective computations in an
efficient manner? As we don't have a way to index e
We need someone who can explain this with a short code snippet on the given example,
so that we get a clear-cut idea of RDD indexing (see the sketch below).
Guys, please help us.
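A sketch of one way to form the consecutive pairs, assuming zipWithIndex is available, the RDD order is the order you care about, and the element count is even:

import org.apache.spark.SparkContext._   // pair-RDD operations (groupByKey, etc.)

val nums = sc.parallelize(1 to 8)
val pairs = nums.zipWithIndex()                    // (value, index)
  .map { case (v, i) => (i / 2, v) }               // same key for (1,2), (3,4), (5,6), ...
  .groupByKey()
  .mapValues(_.toList)                             // e.g. key 0 -> List(1, 2)
// run the per-pair computation, e.g. a product of the two values
val results = pairs.mapValues { case List(a, b) => a * b }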
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Splitting-RDD-and-Grouping-together-to-perform-computation-tp31
Hi,
Thanks Nanzhu. I tried to implement your suggestion on the following scenario. I
have an RDD of, say, 24 elements. When I partitioned it into two groups of 12
elements each, there is a loss of order of the elements in the partitions. Elements are
partitioned randomly. I need to preserve the order such that the first 1
Hi,
Here is my code for the given scenario. Could you please let me know where to
sort? I mean, on what basis do we have to sort, so that the elements maintain the
order of the original sequence within each partition?
val res2 = reduced_hccg.map(_._2)   // which gives an RDD of numbers
res2.foreach(println)
val result= res2.ma
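One possible answer to "what basis to sort on" is each element's original position, attached before repartitioning; a sketch assuming zipWithIndex is available (reduced_hccg is from the snippet above):

import org.apache.spark.SparkContext._   // sortByKey on older releases

// attach each number's original position, then sort on that position
val indexed = reduced_hccg.map(_._2)     // RDD of numbers, as above
  .zipWithIndex()                        // (number, originalIndex)
  .map(_.swap)                           // (originalIndex, number)
  .sortByKey()                           // partition 0 then holds the lowest indices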
Hi Andriana,
Thanks for the suggestion. Could you please modify the part of my code where I need to
do so? I apologise for the inconvenience; because I am new to Spark I couldn't apply it
appropriately. I would be thankful to you.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/S
Hi Andriana,
Of course you can sortByKey, but after that, when you perform mapPartitions it doesn't
guarantee that the first partition has all of those elements in the order of the original
sequence. I think we need a partitioner such that it partitions the sequence while
maintaining order.
Could anyone help me in defining such a partitioner?
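A sketch of an index-based partitioner, under the assumption that each element is first tagged with its original position; the class name and the RDD name data are hypothetical:

import org.apache.spark.Partitioner
import org.apache.spark.SparkContext._   // partitionBy on older releases

// sends the first half of the indices to partition 0 and the rest to partition 1
class OrderedHalfPartitioner(totalCount: Long) extends Partitioner {
  override def numPartitions: Int = 2
  override def getPartition(key: Any): Int = {
    val index = key.asInstanceOf[Long]
    if (index < totalCount / 2) 0 else 1
  }
}

val count = data.count()
val halves = data.zipWithIndex()                 // (element, originalIndex)
  .map(_.swap)                                   // (originalIndex, element)
  .partitionBy(new OrderedHalfPartitioner(count))
// note: within each partition you may still need to sort by the index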
Hi,
I have an RDD of elements and want to create a new RDD by zipping it with another RDD
in order:
a result RDD paired with the sequence 10, 20, 30, 40, 50, ... elements.
I am facing problems because the index is not an RDD; it gives an error. Could
anyone help me with how we can zip it or map it in order to obtain the following
result. (
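A sketch of two ways this is often done, assuming result is the RDD from the post and zipWithIndex is available:

// option 1: derive the 10, 20, 30, ... values from each element's position
val withSeq = result.zipWithIndex().map { case (elem, i) => ((i + 1) * 10, elem) }

// option 2: zip with a second RDD; zip requires both RDDs to have the same
// number of partitions and the same number of elements per partition
val tens = result.zipWithIndex().map { case (_, i) => (i + 1) * 10 }
val zipped = tens.zip(result)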
Thanks Sonal. Is there any other way to map values with increasing
indexes, so that I can do map(t => (i, t)) where the value of 'i' increases after
each map operation on an element?
Please help me in this aspect.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Zi
Hi,
I want to perform a map operation on an RDD of elements such that the resulting
RDD is a key-value pair (counter, value).
For example, with var k: RDD[Int] = 10, 20, 30, 40, 40, 60, ...
I want k.map(t => (i, t)) where 'i' acts like a counter whose value
increments after each map operation.
Please help me.
I tried
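A sketch using zipWithIndex (Spark 1.0+), which attaches exactly this kind of counter without mutable state inside map:

val k = sc.parallelize(Seq(10, 20, 30, 40, 40, 60))
// zipWithIndex pairs each element with its position: (value, index);
// swapping gives the requested (counter, value) shape
val counted = k.zipWithIndex().map { case (value, counter) => (counter, value) }
// counted contains (0,10), (1,20), (2,30), (3,40), (4,40), (5,60)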
Hi,
Can we convert a Scala collection directly to a Spark RDD data type without
using the parallelize method?
Is there any way to create a custom converted RDD data type from a Scala type
using some typecast like that?
Please suggest.
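For reference, a small sketch showing makeRDD, which is essentially an alias for parallelize; distributing a local collection always goes through the SparkContext, so a plain typecast is not possible:

val localList = List(1, 2, 3, 4)
// makeRDD delegates to the same machinery as parallelize
val rdd = sc.makeRDD(localList)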
--
View this message in context:
http://apache-spark-user-list.1001
Hi Therry,
Thanks for the above responses. I implemented it using RangePartitioner; we
need to use one of the custom partitioners in order to perform this
task. Normally you can't maintain a counter, because count operations would have to
be performed on each partitioned block of data.
--
View this message in c
Hi Guys,
Currently I am facing this issue and am not able to find the errors.
Here is the sbt file:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.3"
resolvers += "bintray/meetup" at "http://dl.bintray.com/meetup/maven";
resolvers += "Akka Repository" at "http://repo.akka.io/releases/";
r
Hi,
Thanks for the response. Could you please look into my repo? Here, Utils is
the class in question. I cannot paste the entire code, that's why.
I have another class from which I would be calling the Utils class for object
creation.
package main.scala
import org.apache.spark.SparkContext
import org.apache.spark.S
It's working under local mode, but not under cluster mode with 4 slaves.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Need-suggestions-tp3650p3653.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi,
Here is the SparkContext setup. Do I need to ship any more extra jars to the slaves
separately, or is this enough?
I am able to see the created jar in my target directory.
val sc = new SparkContext("spark://spark-master-001:7077", "Simple App",
utilclass.spark_home,
List("target/sc
Hi
Is it always required that the SparkContext object be created in the main method of a
class? Is it necessary? Can we create the "sc" object in another class and
use it by passing the object through a function?
Please clarify.
--
View this message in context:
http://apache-spark-user-list.100
Hi,
I guess there is a problem with the Spark 0.9 version, because I tried to add the
external jar jerkson_2.9.1 (version 0.5.0) with the Scala version being 2.10.3 in the
cluster.
I am facing a java.lang.NoClassDefFoundError because these jars are not being sent to the
worker nodes.
Please let me know how to resolve this issue.
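A sketch of the two usual ways to ship an extra jar to the workers in this Spark generation; the master URL is taken from the SparkContext example elsewhere in the thread and the jar path is a placeholder. Note also that a _2.9.1 artifact is generally not binary compatible with Scala 2.10.

// option 1: list extra jars when constructing the SparkContext
val sc = new SparkContext("spark://spark-master-001:7077", "Simple App",
  System.getenv("SPARK_HOME"),
  List("/path/to/external.jar"))        // placeholder path to the jar

// option 2: add a jar after the context has been created
sc.addJar("/path/to/external.jar")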
Hi,
I have a problem when I want to use the Spark KryoSerializer by extending the
KryoRegistrator class to register custom classes in order to create objects. I
am getting the following exception when I run the following program. Please let me
know what the problem could be.
] (run-main) org.apache.spark.SparkEx
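A minimal sketch of the registrator pattern, assuming Spark 0.9+/1.x and a placeholder class MyClass; a common cause of exceptions here is the registrator class not being visible on the workers or a wrong fully qualified name in the configuration:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoRegistrator

// MyClass stands in for whatever custom class needs to be serialized
class MyClass(val id: Int)

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass])
  }
}

val conf = new SparkConf()
  .setAppName("Kryo example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")   // use the fully qualified name if packaged
val sc = new SparkContext(conf)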
Hi,
I have a large dataset of elements [RDD] and I want to divide it into two
exactly equal-sized partitions while maintaining the order of the elements. I tried using
RangePartitioner like var data = partitionedFile.partitionBy(new
RangePartitioner(2, partitionedFile)).
This doesn't give satisfactory results beco
Hi,
I am facing the above exception when I am trying to apply a method (ComputeDwt)
on an RDD[(Int, ArrayBuffer[(Int, Double)])] input.
I am even using the extends Serializable option to serialize objects in
Spark. Here is the code snippet.
Could anyone suggest what the problem could be and what should be d
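A sketch of one common workaround for closure-serialization problems, under the assumption that the exception is a NotSerializableException caused by an enclosing non-serializable class being captured; the object name, the placeholder method body, and the RDD name input are hypothetical:

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.SparkContext._   // mapValues on older releases

// putting the method in a standalone object means the closure captures only
// the (serializable) object, not an outer non-serializable class
object DwtFunctions extends Serializable {
  def computeDwt(series: ArrayBuffer[(Int, Double)]): Seq[(Int, Double)] = {
    // placeholder body: a single level of pairwise sums
    series.grouped(2).map(_.reduce((a, b) => (a._1, a._2 + b._2))).toSeq
  }
}

// input: RDD[(Int, ArrayBuffer[(Int, Double)])], as in the post
val transformed = input.mapValues(DwtFunctions.computeDwt)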
Hi,
Could anyone suggest an idea of how we can create a SparkContext object in other
classes or functions, where we need to convert a Scala collection to an RDD
using the sc object, like sc.makeRDD(list), instead of using the main class's
SparkContext object?
Is there a way to pass the sc object as a parameter to functio
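A sketch of passing the driver-side SparkContext into another class as a plain constructor parameter (the class and method names are hypothetical); this works only in driver code, not inside RDD transformations:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// a helper that receives the already-created context from the main class
class Utils(sc: SparkContext) {
  def toRdd(list: List[Int]): RDD[Int] = sc.makeRDD(list)
}

// in the main class:
// val sc = new SparkContext(...)
// val utils = new Utils(sc)
// val rdd = utils.toRdd(List(1, 2, 3))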
Thanks Matei Zaharia. Can I pass it as a parameter as part of closures?
For example:
RDD.map(t => compute(sc, t._2))
Can I use sc inside a map function? Please let me know.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Is-their-a-way-to-Create-SparkContext-object-tp56