Hi,
I am trying to cluster the words of some articles. I used TFIDF and Word2Vec in
Spark to get a vector for each word, and I used KMeans to cluster the
words. Now, is there any way to get back the words from the vectors? I want
to know what words are there in each cluster.
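In case it helps, here is a minimal sketch of one way to map cluster ids back to words: keep (word, vector) pairs instead of bare vectors, then group the words by the predicted cluster. The data is made up and it assumes the spark-shell (so sc exists).

import org.apache.spark.SparkContext._   // pair-RDD implicits in older Spark versions
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Toy (word, vector) pairs standing in for the real Word2Vec/TFIDF output.
val wordVectors = sc.parallelize(Seq(
  ("spark",  Vectors.dense(0.1, 0.2)),
  ("hadoop", Vectors.dense(0.1, 0.3)),
  ("cat",    Vectors.dense(0.9, 0.8))))

// Train on the vectors only, then predict a cluster id for every vector
// and group the words by that id.
val model = KMeans.train(wordVectors.values, 2, 20)
val wordsByCluster = wordVectors
  .map { case (word, vec) => (model.predict(vec), word) }
  .groupByKey()
wordsByCluster.collect().foreach { case (id, words) =>
  println(s"cluster $id: ${words.mkString(", ")}")
}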
I am aware that TFIDF does
serve
> more meaningful trends and speedups.
>
> Joseph
>
> On Sat, Feb 28, 2015 at 7:26 AM, Deep Pradhan
> wrote:
>
>> Hi,
>> I am running Spark applications in GCE. I set up clusters with different
>> numbers of nodes, varying from 1 to 7.
Hi,
I am running Spark applications in GCE. I set up clusters with different
numbers of nodes, varying from 1 to 7. The machines are single-core machines.
I set spark.default.parallelism to the number of nodes in the cluster
for each cluster. I ran the four applications available in Spark Examples
Hi,
I have four single-core machines as slaves in my cluster. I set
spark.default.parallelism to 4 and ran SparkTC, given in the examples. It took
around 26 sec.
Now, I increased spark.default.parallelism to 8, but the performance
deteriorated. The same application takes 32 sec now.
I have read
What should be the expected performance of Spark Applications with the
increase in the number of nodes in a cluster, other parameters being
constant?
Thank You
Regards,
Deep
Has the KNN classification algorithm been implemented in MLlib?
Thank You
Regards,
Deep
ng purposes.
> :)
>
> Thanks
> Best Regards
>
> On Tue, Feb 24, 2015 at 8:25 PM, Deep Pradhan
> wrote:
>
>> Hi,
>> I have just signed up for Amazon AWS because I learnt that it provides
>> service for free for the first 12 months.
>> I want to run Spark on EC2 cluster. Will they charge me for this?
>>
>> Thank You
>>
>
>
ndalone mode on a
> cluster, you can find more details here:
> https://spark.apache.org/docs/latest/spark-standalone.html
>
> Cheers
> Gen
>
>
> On Tue, Feb 24, 2015 at 4:07 PM, Deep Pradhan
> wrote:
>
>> Kindly bear with my questions as I am new to this.
>>
paying the ~$0.07/hour to play with an
> m3.medium, which ought to be pretty OK for basic experimentation.
>
> On Tue, Feb 24, 2015 at 3:14 PM, Deep Pradhan
> wrote:
> > Thank You Sean.
> > I was just trying to experiment with the performance of Spark
> Applications
>
that is at all CPU
> intensive. It's for, say, running a low-traffic web service.
>
> On Tue, Feb 24, 2015 at 2:55 PM, Deep Pradhan
> wrote:
> > Hi,
> > I have just signed up for Amazon AWS because I learnt that it provides
> > service for free for the first 12 months.
> > I want to run Spark on EC2 cluster. Will they charge me for this?
> >
> > Thank You
>
e types of machine that you
> launched, but not on the utilisation of machine.
>
> Hope it would help.
>
> Cheers
> Gen
>
>
> On Tue, Feb 24, 2015 at 3:55 PM, Deep Pradhan
> wrote:
>
>> Hi,
>> I have just signed up for Amazon AWS because I learnt that it prov
Hi,
I have just signed up for Amazon AWS because I learnt that it provides
service for free for the first 12 months.
I want to run Spark on EC2 cluster. Will they charge me for this?
Thank You
Here, I wanted to ask a different thing though.
Let me put it this way.
What is the relationship between the performance of a Spark job and the
number of cores in a standalone Spark single-node cluster?
Thank You
On Tue, Feb 24, 2015 at 8:39 AM, Deep Pradhan
wrote:
> You m
ally over subscribe this. So if you have 10 free CPU cores,
> set num_cores to 20.
>
>
> On Monday, February 23, 2015, Deep Pradhan
> wrote:
>
>> How is task slot different from # of Workers?
>>
>>
>> >> so don't read into any performance metrics you
t to the total # of task slots in the Executors.
>
> If you're running on a single node, shuffle operations become almost free
> (because there's no network movement), so don't read into any performance
> metrics you've collected to extrapolate what may happen at scale.
>
Hi,
If I repartition my data by a factor equal to the number of worker
instances, will the performance be better or worse?
As far as I understand, the performance should be better, but in my case it
is becoming worse.
I have a single-node standalone cluster; is it because of this?
Am I guaranteed t
Has anyone done any work on that?
On Sun, Feb 22, 2015 at 9:57 AM, Deep Pradhan
wrote:
> Yes, exactly.
>
> On Sun, Feb 22, 2015 at 9:10 AM, Ognen Duzlevski <
> ognen.duzlev...@gmail.com> wrote:
>
>> On Sat, Feb 21, 2015 at 8:54 AM, Deep Pradhan
>> wrote:
>
the same way?
Thank You
On Sun, Feb 22, 2015 at 10:02 AM, Deep Pradhan
wrote:
> >> So increasing Executors without increasing physical resources
> If I have a 16 GB RAM system and then I allocate 1 GB for each executor,
> and give number of executors as 8, then I am increasing the
close. The actual observed improvement is very algorithm-dependent,
> though; for instance, some ML algorithms become hard to scale out past a
> certain point because the increase in communication overhead outweighs the
> increase in parallelism.
>
> On Sat, Feb 21, 2015 at 8:1
Yes, exactly.
On Sun, Feb 22, 2015 at 9:10 AM, Ognen Duzlevski
wrote:
> On Sat, Feb 21, 2015 at 8:54 AM, Deep Pradhan
> wrote:
>
>> No, I am talking about some work parallel to prediction works that are
>> done on GPUs. Like say, given the data for smaller number of nodes
So, if I keep the number of instances constant and increase the degree of
parallelism in steps, can I expect the performance to increase?
Thank You
On Sat, Feb 21, 2015 at 9:07 PM, Deep Pradhan
wrote:
> So, with the increase in the number of worker instances, if I also
> increase the deg
in performance, right?
Thank You
On Sat, Feb 21, 2015 at 8:52 PM, Deep Pradhan
wrote:
> Yes, I have decreased the executor memory.
> But, if I have to do this, then I have to tweak around with the code
> corresponding to each configuration, right?
>
> On Sat, Feb 21, 2015 at 8:4
?
>
> Bottom line, you wouldn't use multiple workers on one small standalone
> node. This isn't a good way to estimate performance on a distributed
> cluster either.
>
> On Sat, Feb 21, 2015 at 3:11 PM, Deep Pradhan
> wrote:
> > No, I just have a single node stan
e you to pay more overhead of managing so many small
> >> tasks, for no speed up in execution time.
> >>
> >> Can you provide any more specifics though? you haven't said what
> >> you're running, what mode, how many workers, how long it takes, etc.
> >
Like, without having the 10-node cluster, I can know the behavior of
the application on a 10-node cluster by having a single node with 10
workers. The time taken may vary, but I am talking about the behavior. Can
we say that?
On Sat, Feb 21, 2015 at 8:21 PM, Deep Pradhan
wrote:
> Yes, I am talk
8:22 PM, Ted Yu wrote:
> Can you be a bit more specific ?
>
> Are you asking about performance across Spark releases ?
>
> Cheers
>
> On Sat, Feb 21, 2015 at 6:38 AM, Deep Pradhan
> wrote:
>
>> Hi,
>> Has some performance prediction work been done on Spark?
>>
>> Thank You
>>
>>
>
takes, etc.
>
> On Sat, Feb 21, 2015 at 2:37 PM, Deep Pradhan
> wrote:
> > Hi,
> > I have been running some jobs in my local single node stand alone
> cluster. I
> > am varying the worker instances for the same job, and the time taken for
> the
> > job to comp
Hi,
Has some performance prediction work been done on Spark?
Thank You
Hi,
I have been running some jobs in my local single-node standalone cluster.
I am varying the worker instances for the same job, and the time taken for
the job to complete increases with the number of workers. I
repeated some experiments varying the number of nodes in a cluster too an
o. Your executor probably takes as many threads as
> cores in both cases, 4.
>
>
> On Sat, Feb 7, 2015 at 10:14 AM, Deep Pradhan
> wrote:
> > Hi,
> > I am using YourKit tool to profile Spark jobs that is run in my Single
> Node
> > Spark Cluster.
> &
Hi,
I am using the YourKit tool to profile Spark jobs that run in my single-node
Spark cluster.
When I look at the YourKit UI Performance Charts, the thread count always
remains at:
All threads: 34
Daemon threads: 32
Here are my questions:
1. My system can run only 4 threads simultaneously, and obvious
Hi,
Is the implementation of All Pairs Shortest Path in GraphX for directed
graphs or for undirected graphs? When I use the algorithm with a dataset, it
assumes that the graph is undirected.
Has anyone come across that earlier?
Thank you
Hi,
When we submit a PR on GitHub, various tests are performed,
like the RAT test, the Scala style test, and beyond these, many other tests which run
for a longer time.
Could anyone please direct me to the details of the tests that are
performed there?
Thank You
I have a single node Spark standalone cluster. Will this also work for my
cluster?
Thank You
On Fri, Feb 6, 2015 at 11:02 AM, Mark Hamstra
wrote:
>
> https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit
>
> On Thu, Feb 5, 2015 at 9:18 PM,
job is slow? Gatling seems to be a load generating framework so I'm not
> sure how you'd use it (i've never used it before). Spark runs on the JVM so
> you can use any JVM profiling tools like YourKit.
>
> Kostas
>
> On Thu, Feb 5, 2015 at 9:03 PM, Deep Pradhan
>
I read somewhere about Gatling. Can that be used to profile Spark jobs?
On Fri, Feb 6, 2015 at 10:27 AM, Kostas Sakellis
wrote:
> Which Spark Job server are you talking about?
>
> On Thu, Feb 5, 2015 at 8:28 PM, Deep Pradhan
> wrote:
>
>> Hi,
>> Can Spark Job Server
Hi,
Can Spark Job Server be used for profiling Spark jobs?
wrote:
> Hi Deep,
>
> What is your configuration and what is the size of the 2 data sets?
>
> Thanks
> Arush
>
> On Mon, Feb 2, 2015 at 11:56 AM, Deep Pradhan
> wrote:
>
>> I did not check the console because once the job starts I cannot run
>> anything els
, 2015 at 11:53 AM, Jerry Lam wrote:
> Hi Deep,
>
> How do you know the cluster is not responsive because of "Union"?
> Did you check the spark web console?
>
> Best Regards,
>
> Jerry
>
>
> On Mon, Feb 2, 2015 at 1:21 AM, Deep Pradhan
> wrote:
>
>
The cluster hangs.
On Mon, Feb 2, 2015 at 11:25 AM, Jerry Lam wrote:
> Hi Deep,
>
> what do you mean by stuck?
>
> Jerry
>
>
> On Mon, Feb 2, 2015 at 12:44 AM, Deep Pradhan
> wrote:
>
>> Hi,
>> Is there any better operation than Union. I am using unio
Hi,
Is there any better operation than union? I am using union and the cluster
is getting stuck with a large data set.
Thank you
Hi All,
Gordon SC has Spark installed in it. Has anyone tried to run Spark jobs on
Gordon?
Thank You
Hi,
Is there a better programming construct than a while loop in Spark?
Thank You
> On Mon, Jan 19, 2015 at 8:33 PM, Deep Pradhan
> wrote:
>
>> Hi Ted,
>> When I am running the same job with small data, I am able to run. But
>> when I run it with relatively bigger set of data, it is giving me
>> OutOfMemoryError: GC overhead limit exceeded.
>
: stopped
o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
15/01/17 14:33:39 INFO ContextHandler: stopped
o.e.j.s.ServletContextHandler{/,null}
15/01/17 14:33:39 INFO ContextHandler: stopped
o.e.j.s.ServletContextHandler{/static,null}
..
On Tue, Jan 20, 2015 at 9:52 AM, Deep Pradhan
wrote
I had the Spark shell running throughout. Is it because of that?
On Tue, Jan 20, 2015 at 9:47 AM, Ted Yu wrote:
> Was there another instance of Spark running on the same machine ?
>
> Can you pastebin the full stack trace ?
>
> Cheers
>
> On Mon, Jan 19, 2015 at 8:11 PM,
Hi,
I am running a Spark job. I get the output correctly, but when I look at the
log file I see the following:
AbstractLifeCycle: FAILED.: java.net.BindException: Address already in
use...
What could be the reason for this?
Thank You
The error in the log file says:
*java.lang.OutOfMemoryError: GC overhead limit exceeded*
with a certain task ID, and the error repeats for further task IDs.
What could be the problem?
On Sun, Jan 18, 2015 at 2:45 PM, Deep Pradhan
wrote:
> Updating the Spark version means setting up the ent
n 17, 2015 at 2:40 PM, Deep Pradhan
> wrote:
>
>> Hi,
>> I am using Spark-1.0.0 in a single node cluster. When I run a job with
>> small data set it runs perfectly but when I use a data set of 350 KB, no
>> output is being produced and when I try to run it the second t
Hi,
I am using Spark 1.0.0 on a single-node cluster. When I run a job with a
small data set it runs perfectly, but when I use a data set of 350 KB, no
output is produced, and when I try to run it a second time it
gives me an exception saying that the SparkContext was shut down.
Can anyone help
This gives me two pair RDDs: one is the edges RDD and the other is the vertices RDD,
with each vertex padded with the value null. But I have to take a three-way
join of these two RDDs, and I have only one common attribute in these two
RDDs. How can I go about doing the three-way join?
Hi,
I have two RDDs, vertices and edges. Vertices is an RDD and edges is a pair
RDD. I want to take a three-way join of these two. Joins work only when both
the RDDs are pair RDDs, right? So, how am I supposed to take a three-way
join of these RDDs?
Thank You
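Not sure if this is exactly what is being asked, but one common workaround is to join the edge list with the vertex attributes twice, once on each endpoint. A rough sketch with made-up data, run in the spark-shell:

import org.apache.spark.SparkContext._   // pair-RDD implicits in older Spark versions

// Made-up data: vertices as (id, attr), edges as (srcId, dstId).
val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
val edges    = sc.parallelize(Seq((1L, 2L), (2L, 3L)))

// Join on the source id, re-key by the destination id, then join again.
val withSrc = edges.join(vertices)                      // srcId -> (dstId, srcAttr)
  .map { case (src, (dst, srcAttr)) => (dst, (src, srcAttr)) }
val triplets = withSrc.join(vertices)                   // dstId -> ((srcId, srcAttr), dstAttr)
  .map { case (dst, ((src, srcAttr), dstAttr)) => (src, dst, srcAttr, dstAttr) }
triplets.collect().foreach(println)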
Is there any tool to profile GraphX code in a cluster? Is there a way to
know the messages exchanged among the nodes in a cluster?
The Web UI does not give all the information.
Thank You
Hi,
I have a graph and I want to create RDDs equal in number to the nodes in
the graph. How can I do that?
If I have 10 nodes then I want to create 10 RDDs. Is that possible in
GraphX?
Like in the C language we have arrays of pointers. Do we have arrays of RDDs in
Spark?
Can we create such an array and t
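For what it is worth, the driver program can hold a plain Scala array whose elements are RDDs; a small sketch with made-up numbers:

import org.apache.spark.rdd.RDD

// One RDD per "node", held in an ordinary Array on the driver.
val rdds: Array[RDD[Int]] = (0 until 10).map(i => sc.parallelize(Seq(i))).toArray
println(rdds.length)               // 10
println(rdds(3).collect().toList)  // List(3)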
Dave wrote:
> At 2014-12-03 02:13:49 -0800, Deep Pradhan
> wrote:
> > We cannot do sc.parallelize(List(VertexRDD)), can we?
>
> There's no need to do this, because every VertexRDD is also a pair RDD:
>
> class VertexRDD[VD] extends RDD[(VertexId, VD)]
>
> Y
case?
We cannot do *sc.parallelize(List(VertexRDD))*, can we?
On Wed, Dec 3, 2014 at 3:32 PM, Ankur Dave wrote:
> At 2014-12-02 22:01:20 -0800, Deep Pradhan
> wrote:
> > I have a graph which returns the following on doing graph.vertices
> > (1, 1.0)
> > (2, 1.0)
> >
And one more thing, the given tuples
(1, 1.0)
(2, 1.0)
(3, 2.0)
(4, 2.0)
(5, 0.0)
are part of an RDD; they are not just tuples.
graph.vertices returns me the above tuples, which are part of a VertexRDD.
On Wed, Dec 3, 2014 at 3:43 PM, Deep Pradhan
wrote:
> This is just an example but if
Hi,
I have a graph which returns the following on doing graph.vertices
(1, 1.0)
(2, 1.0)
(3, 2.0)
(4, 2.0)
(5, 0.0)
Here 5 is the root node of the graph. This is a VertexRDD. I want to group
all the vertices with the same attribute together, like into one RDD or
something. I want all the vertices
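A rough sketch of one way to do the grouping, with a toy graph standing in for the real one: swap each pair so the attribute becomes the key, then groupByKey.

import org.apache.spark.SparkContext._   // pair-RDD implicits in older Spark versions
import org.apache.spark.graphx._

// Toy graph with the vertex attributes from above; the edges are made up.
val vs = sc.parallelize(Seq((1L, 1.0), (2L, 1.0), (3L, 2.0), (4L, 2.0), (5L, 0.0)))
val es = sc.parallelize(Seq(Edge(5L, 1L, 1), Edge(5L, 3L, 1)))
val graph = Graph(vs, es)

// graph.vertices is an RDD[(VertexId, Double)], so group by the attribute.
val byAttribute = graph.vertices
  .map { case (id, attr) => (attr, id) }
  .groupByKey()                          // attr -> all vertex ids with that attribute
byAttribute.collect().foreach(println)   // e.g. (1.0, {1, 2}), (2.0, {3, 4}), (0.0, {5})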
Hi,
I was just going through two codes in GraphX, namely SVDPlusPlus and
TriangleCount. In the first I see an RDD as an input to run, i.e., run(edges:
RDD[Edge[Double]],...), and in the other I see run(VD:..., ED:...).
Can anyone explain to me the difference between these two? In fact, SVDPlusPlus
is the
Hi,
I was going through the paper on Pregel titled "Pregel: A System for
Large-Scale Graph Processing". In the second section, named Model of
Computation, it says that the input to a Pregel computation is a directed
graph.
Is it the same in the Pregel abstraction of GraphX too? Do we always ha
Could it be because my edge list file is in the form (1 2), where there
is an edge between node 1 and node 2?
On Tue, Nov 18, 2014 at 4:13 PM, Ankur Dave wrote:
> At 2014-11-18 15:51:52 +0530, Deep Pradhan
> wrote:
> > Yes the above command works, but there is this problem.
Hi,
Is it necessary for every vertex to have an attribute when we load a graph
into GraphX?
In other words, I have an edge list file containing pairs of vertices,
i.e., <1 2> means that there is an edge between node 1 and node 2. Now,
when I run PageRank on this data it returns NaN.
Can I use th
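In case the loading step is the issue, a minimal sketch of loading a plain edge list (the path is hypothetical):

import org.apache.spark.SparkContext._   // double-RDD implicits in older Spark versions
import org.apache.spark.graphx.GraphLoader

// GraphLoader parses whitespace-separated "srcId dstId" pairs and gives every
// vertex the attribute 1 by default, so no separate vertex attribute file is needed.
val graph = GraphLoader.edgeListFile(sc, "/path/to/edges.txt")  // hypothetical path
val ranks = graph.pageRank(0.0001).vertices
println(ranks.map(_._2).sum())   // total rank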
Yes, the above command works, but there is this problem: most of the time,
the total rank is NaN (Not a Number). Why is that?
Thank You
On Tue, Nov 18, 2014 at 3:48 PM, Deep Pradhan
wrote:
> What command should I use to run the LiveJournalPageRank.scala?
>
> > If you want to writ
Tue, Nov 18, 2014 at 3:35 PM, Ankur Dave wrote:
> At 2014-11-18 14:51:54 +0530, Deep Pradhan
> wrote:
> > I am using Spark-1.0.0. There are two GraphX directories that I can see
> here
> >
> > 1. spark-1.0.0/examples/src/main/scala/org/apache/sprak/examples/gra
I meant to ask whether it gives the solution faster than other algorithms.
What do you mean by distributed algorithms? Can we not use any algorithm in
a distributed environment?
Thank You
On Tue, Nov 18, 2014 at 3:41 PM, Ankur Dave wrote:
> At 2014-11-18 15:29:08 +0530, Deep Pradhan
>
=EdgePartition2D*
Now, how do I run the LiveJournalPageRank.scala that is there in 1?
On Tue, Nov 18, 2014 at 2:51 PM, Deep Pradhan
wrote:
> Hi,
> I am using Spark-1.0.0. There are two GraphX directories that I can see
> here
>
> 1. spark-1.0.0/examples/src/main/scala/org/apache/sprak/
Does Bellman-Ford give the best solution?
On Tue, Nov 18, 2014 at 3:27 PM, Ankur Dave wrote:
> At 2014-11-18 14:59:20 +0530, Deep Pradhan
> wrote:
> > So "landmark" can contain just one vertex right?
>
> Right.
>
> > Which algorithm has been used t
There are no vertices of zero outdegree.
The total rank for the graph with numIter = 10 is 4.99, and for the graph
with numIter = 100 it is 5.99.
I do not know why there is so much variation.
On Tue, Nov 18, 2014 at 3:22 PM, Ankur Dave wrote:
> At 2014-11-18 12:02:52 +0530, Deep Pradhan
> wrote:
>
So "landmark" can contain just one vertex right?
Which algorithm has been used to compute the shortest path?
Thank You
On Tue, Nov 18, 2014 at 2:53 PM, Ankur Dave wrote:
> At 2014-11-17 14:47:50 +0530, Deep Pradhan
> wrote:
> > I was going through the graphx secti
Hi,
I am using Spark 1.0.0. There are two GraphX directories that I can see here:
1. spark-1.0.0/examples/src/main/scala/org/apache/spark/examples/graphx,
which contains LiveJournalPageRank.scala
2. spark-1.0.0/graphx/src/main/scala/org/apache/spark/graphx/lib, which
contains Analy
Hi,
I just ran the PageRank code in GraphX with some sample data. What I am
seeing is that the total rank changes drastically if I change the number of
iterations from 10 to 100. Why is that so?
Thank You
Hi,
I was going through the graphx section in the Spark API in
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.lib.ShortestPaths$
Here, I find the word "landmark". Can anyone explain to me what landmark
means? Is it a simple English word or does it mean somethi
Hi,
Is there any way to know which of my functions performs better in Spark? In
other words, say I have achieved the same thing using two different
implementations. How do I judge which implementation is better than
the other? Is processing time the only metric that we can use to claim the
goodnes
. Is there another way to do this?
Thank you
On Fri, Nov 14, 2014 at 3:39 PM, Deep Pradhan
wrote:
> How to create an empty RDD in Spark?
>
> Thank You
>
How to create an empty RDD in Spark?
Thank You
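One way, as a sketch; newer releases also have sc.emptyRDD, if I remember correctly:

// Parallelizing an empty collection gives an empty RDD of the chosen type.
val empty = sc.parallelize(Seq.empty[Int])
println(empty.count())   // 0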
; method of an existing RDD if you have one.
>
> - Patrick
>
> On Thu, Nov 13, 2014 at 10:21 PM, Deep Pradhan
> wrote:
> > Hi,
> >
> > I am using Spark 1.0.0 and Scala 2.10.3.
> >
> > I want to use toLocalIterator in a code but the spark shell tells
Hi,
I am using Spark 1.0.0 and Scala 2.10.3.
I want to use toLocalIterator in my code, but the Spark shell says
*not found: value toLocalIterator*
I also did import org.apache.spark.rdd, but even after this the shell says
*object toLocalIterator is not a member of package org.apache.spark.rdd*
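For reference, toLocalIterator is a method on an RDD instance rather than something to import; a quick sketch:

// Call it on the RDD itself; it streams one partition at a time to the driver.
val rdd = sc.parallelize(1 to 10, 2)
val it: Iterator[Int] = rdd.toLocalIterator
it.foreach(println)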
Hi,
Can we pass RDDs to functions?
Like, can we do the following?
*def func (temp: RDD[String]):RDD[String] = {*
*//body of the function*
*}*
Thank You
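As far as I know, yes; a minimal sketch (the function name is made up):

import org.apache.spark.rdd.RDD

// An RDD can be passed to and returned from an ordinary Scala function;
// the transformation inside stays lazy until an action runs.
def addSuffix(temp: RDD[String]): RDD[String] = temp.map(_ + "!")

val words  = sc.parallelize(Seq("a", "b", "c"))
val marked = addSuffix(words)
marked.collect().foreach(println)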
Has anyone implemented Queues using RDDs?
Thank You
Hi,
The collect method returns an Array. If I have a huge set of data and I do
something like the following:
*val rdd2 = rdd1.mapValues(v => 0).collect* // where rdd1 is some key-value
pair RDD
As per my understanding, this will return an Array[(String, Int)], and if my
data is huge this will return
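If the concern is pulling everything back to the driver, a small sketch (with made-up data) of keeping the result distributed instead of collecting it:

import org.apache.spark.SparkContext._   // pair-RDD implicits in older Spark versions

val rdd1 = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
val rdd2 = rdd1.mapValues(v => 0)     // still an RDD; nothing is moved to the driver
rdd2.take(2).foreach(println)         // inspect only a few records
rdd2.saveAsTextFile("out")            // or write the full result out (hypothetical path)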
Hi,
Can Spark achieve whatever GraphX can?
Keeping aside the performance comparison between Spark and GraphX, if I
want to implement any graph algorithm and I do not want to use GraphX, can
I get the work done with Spark?
Thank You
Can we iterate over an RDD of Iterable[String]? How do we do that?
Because each entire Iterable[String] seems to be a single element in the RDD.
Thank You
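A small sketch of two ways to handle it, with made-up data:

// Each element is a whole Iterable[String]; either work on it as a unit with
// map, or flatten it into an RDD[String] with flatMap.
val grouped = sc.parallelize(Seq[Iterable[String]](Seq("a", "b"), Seq("c")))
val sizes = grouped.map(_.size)         // one size per group
val flat  = grouped.flatMap(identity)   // "a", "b", "c"
flat.collect().foreach(println)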
Hi,
Is it always possible to get one RDD from another?
For example, if I do a *top(K)(Ordering)*, I get an Int, right? (In my
example the type is Int.) I do not get an RDD.
Can anyone explain this to me?
Thank You
Hi,
I want to make the following change to an RDD (create a new RDD from the
existing one to reflect some transformation):
In an RDD of key-value pairs, I want to get the keys for which the values
are 1.
How do I do this using map()?
Thank You
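A minimal sketch with made-up data; a filter followed by keys is usually simpler than map here:

import org.apache.spark.SparkContext._   // pair-RDD implicits in older Spark versions

val pairs = sc.parallelize(Seq(("a", 1), ("b", 0), ("c", 1)))
val keysWithOne = pairs.filter { case (_, v) => v == 1 }.keys
keysWithOne.collect().foreach(println)   // a, c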
Hi,
We all know that RDDs are immutable.
There are not enough operations to achieve anything and everything on
RDDs.
Take this for example:
I want an Array of Bytes filled with zeros, which should change during the
program. Some elements of that Array should change to 1.
If I make an RDD with
)*
*val rootNode = nodeSizeTuple.top(1)(Ordering.by(f => f._2))*
The nodeSizeTuple is an RDD, but rootNode is an array. Here I have used only
RDD operations, but I am getting an array.
What about this case?
On Sat, Sep 13, 2014 at 11:45 AM, Deep Pradhan
wrote:
> Is it always true that whenever we apply op
2/technical-sessions/presentation/zaharia
>
> On Sat, Sep 13, 2014 at 12:06 AM, Deep Pradhan
> wrote:
>
>> Take for example this:
>> I have declared one queue *val queue = Queue.empty[Int]*, which is a
>> pure scala line in the program. I actually want the queue to be a
ark is an application as far as
> scala is concerned - there is no compilation (except of course, the scala,
> JIT compilation etc).
>
> On Fri, Sep 12, 2014 at 8:04 PM, Deep Pradhan
> wrote:
>
>> I know that unpersist is a method on RDD.
>> But my confusion is that, w
e abstractions introduced by Spark.
>
> An Int is just a Scala Int. You can't call unpersist on Int in Scala, and
> that doesn't change in Spark.
>
> On Fri, Sep 12, 2014 at 12:33 PM, Deep Pradhan
> wrote:
>
>> There is one thing that I am confused about.
>> Spar
There is one thing that I am confused about.
Spark has code that has been implemented in Scala. Now, can we run any
Scala code on the Spark framework? What will be the difference in the
execution of the Scala code on normal systems and on Spark?
The reason for my question is the following:
I had
Best Regards
>
> On Thu, Sep 11, 2014 at 3:26 PM, Deep Pradhan
> wrote:
>
>> I want to create temporary variables in Spark code.
>> Can I do this?
>>
>> for (i <- num)
>> {
>> val temp = ..
>>{
>>do something
>>}
>> temp.unpersist()
>> }
>>
>> Thank You
>>
>
>
I want to create temporary variables in Spark code.
Can I do this?
for (i <- num)
{
val temp = ..
{
do something
}
temp.unpersist()
}
Thank You
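As far as I understand this pattern works; a small sketch with made-up numbers:

// A temporary RDD built inside the loop can be cached for that iteration and
// released explicitly once it is no longer needed.
for (i <- 1 to 3) {
  val temp = sc.parallelize(1 to 100).map(_ * i).cache()
  println(temp.reduce(_ + _))   // do something with temp
  temp.unpersist()
}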
Hi,
I have "s" as an Iterable of String.
I also have "arr" as an array of bytes. I want to set the 's' position of
the array 'arr' to 1.
In short, I want to do
arr(s) = 1 // algorithmic notation
I tried the above, but I am getting a type mismatch error.
How should I do this?
Thank You
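If 's' holds string indices, the mismatch is probably String vs Int; a rough sketch, assuming the strings are numeric:

// An array index must be an Int, so convert each String before using it.
val arr = Array.fill[Byte](10)(0)
val s: Iterable[String] = Seq("2", "5", "7")
s.foreach(pos => arr(pos.toInt) = 1)
println(arr.toList)   // positions 2, 5 and 7 are now 1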
Hi,
I have an array of bytes and I have filled the array with 0 in all the
positions.
*var Array = Array.fill[Byte](10)(0)*
Now, if certain conditions are satisfied, I want to change some elements of
the array to 1 instead of 0. If I run,
*if (Array.apply(index)==0) Array.apply(index) = 1*
it
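Two things might be tripping this up: the variable is named Array, which shadows scala.Array, and element assignment goes through update rather than apply. A minimal sketch:

// Use a non-shadowing name and plain index syntax; flags(i) = x desugars to flags.update(i, x).
val flags = Array.fill[Byte](10)(0)
val index = 3
if (flags(index) == 0) flags(index) = 1
println(flags.toList)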
Hi,
I have an input file which consists of
I have created an RDD consisting of key-value pairs where the key is the node
id and the values are the children of that node.
Now I want to associate a byte with each node. For that I have created a
byte array.
Every time I print out the key-value pair in th
Hi,
Does Spark support recursive calls?
Hi,
I have the following ArrayBuffer
*ArrayBuffer(5,3,1,4)*
Now, I want to iterate over the ArrayBuffer.
What is the way to do it?
Thank You
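A quick sketch of the usual ways:

import scala.collection.mutable.ArrayBuffer

// ArrayBuffer is an ordinary Scala collection, so the usual iteration works.
val buf = ArrayBuffer(5, 3, 1, 4)
buf.foreach(println)
for (x <- buf) println(x * 2)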
> val a = ArrayBuffer(5,3,1,4)
> a: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(5, 3, 1, 4)
>
> scala> a.head
> res2: Int = 5
>
> scala> a.tail
> res3: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(3, 1, 4)
>
> scala> a.length
> res4: Int = 4
>
>
Hi,
I have the following ArrayBuffer:
*ArrayBuffer(5,3,1,4)*
Now, I want to get the number of elements in this ArrayBuffer and also the
first element of the ArrayBuffer. I used .length and .size but they are
returning 1 instead of 4.
I also used .head and .last for getting the first and the last