Maybe connected components is what you need?
On Oct 5, 2015 19:02, "Robineast" wrote:
> GraphX has a Shortest Paths algorithm implementation which will tell you,
> for
> all vertices in the graph, the shortest distance to a specific ('landmark')
> vertex. The returned value is '/a graph where eac
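The quoted description can be exercised with a short sketch; the graph `g: Graph[Int, Int]` and the landmark vertex id `1L` below are assumed for illustration:

```scala
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ShortestPaths

// Assumed: an existing graph `g` built elsewhere; vertex 1L is the landmark.
// ShortestPaths.run returns a graph where each vertex carries a map
// from landmark id to its shortest hop distance to that landmark.
val result = ShortestPaths.run(g, Seq(1L))
result.vertices.collect().foreach { case (id, spMap) =>
  println(s"vertex $id -> ${spMap.getOrElse(1L, -1)} hops to landmark 1")
}
```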
Helena,
The CassandraInputDStream sounds interesting. I don't find much in the
JIRA though. Do you have more details on what it tries to achieve?
Thanks,
Anwar.
On Tue, Mar 24, 2015 at 2:39 PM, Helena Edelson wrote:
> Streaming _from_ cassandra, CassandraInputDStream, is coming BTW
> ht
It looks similar to (though simpler than) the connected components
implementation in GraphX.
Have you checked that?
I have a question though: in your example, the graph is a tree. What is the
behavior if it is a more general graph?
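For comparison, a minimal sketch of the GraphX call, assuming an existing graph `g`:

```scala
import org.apache.spark.graphx._

// connectedComponents labels every vertex with the smallest vertex id
// reachable from it, i.e. a canonical id for its component.
val cc = g.connectedComponents()
cc.vertices.collect().foreach { case (id, comp) =>
  println(s"vertex $id is in component $comp")
}
```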
Cheers,
Anwar Rizal.
On Mon, Jan 12, 2015 at 1:02 AM, dizzy5112
Any reason why mapPartitionsWithInputSplit has the DeveloperApi
annotation? Is it possible to remove it?
Best regards,
Anwar Rizal.
On Sun, Dec 21, 2014 at 10:47 PM, Shuai Zheng wrote:
> I just found a possible answer:
>
>
> http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/2
On Tue, Jun 17, 2014 at 5:39 PM, Chen Song wrote:
> Hey
>
> I am new to spark streaming and apologize if these questions have been
> asked.
>
> * In StreamingContext, reduceByKey() seems to only work on the RDDs of the
> current batch interval, not including RDDs of previous batches. Is my
> unde
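On the quoted point: reduceByKey does operate on each batch interval independently. To aggregate across batches, a windowed reduce is the usual route; a hedged sketch, assuming a DStream[(String, Int)] named `pairs`:

```scala
import org.apache.spark.streaming.Seconds

// Window of 60s, sliding every 10s: each emitted RDD covers the last
// 60 seconds' worth of batches, not just the current interval.
val windowed = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b,
  Seconds(60), Seconds(10))
```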
Hi Niko,
I executed the script on 0.9/CDH5 using spark-shell, and it does not
generate a ClassCastException. Which version are you using, and can you
give more of the stack trace?
Cheers,
a.
On Tue, Mar 25, 2014 at 7:55 PM, Niko Stahl wrote:
> Ok, so I've been able to narrow down the problem to this
I presume that you need access to the path of each file you are
reading.
I don't know whether there is a good way to do that for HDFS; I needed to
read the files myself, something like:
def openWithPath(inputPath: String, sc: SparkContext) = {
  // Resolve the FileSystem (e.g. HDFS) backing this path
  val fs = (new Path(inputPath)).getFileSystem(sc.hadoopConfiguration)
  fs.open(new Path(inputPath))  // FSDataInputStream; caller closes it
}
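An alternative worth considering, if the files are reasonably small: sc.wholeTextFiles keeps the originating path attached to each record, so no FileSystem plumbing is needed (the directory below is a placeholder):

```scala
// RDD[(pathOfFile, fullContentOfFile)] -- one record per file.
val withPaths = sc.wholeTextFiles("hdfs:///path/to/dir")
withPaths.foreach { case (path, content) =>
  println(s"$path has ${content.length} characters")
}
```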
Can you clarify what you're trying to achieve here?
If you want to take only the top 10 of each RDD, why not sort followed by
take(10) on every RDD?
Or do you want to take the top 10 over five minutes?
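The per-RDD variant can be sketched with transform; the DStream `counts: DStream[(String, Long)]` below is assumed:

```scala
// Within each batch, sort descending by value and keep the 10 largest.
val top10 = counts.transform { rdd =>
  rdd.context.parallelize(rdd.sortBy(_._2, ascending = false).take(10))
}
```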
Cheers,
On Thu, May 29, 2014 at 2:04 PM, nilmish wrote:
> I have a DSTREAM which consists of RDD