Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Anwar Rizal
Maybe connected components is what you need?
On Oct 5, 2015 19:02, "Robineast" wrote:
> GraphX has a Shortest Paths algorithm implementation which will tell you, for
> all vertices in the graph, the shortest distance to a specific ('landmark')
> vertex. The returned value is '/a graph where eac
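A minimal sketch of the connected-components suggestion, in case it helps. The helper name `areConnected` and its parameters are hypothetical, not taken from the thread. It assumes edge direction does not matter, since GraphX's `connectedComponents()` labels each vertex with the lowest vertex id in its component regardless of direction.

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx.{Graph, VertexId}

// Hypothetical helper: true if vertices a and b end up in the same
// connected component (edge direction is ignored).
def areConnected[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED], a: VertexId, b: VertexId): Boolean = {
  // Each vertex is labelled with the smallest vertex id in its component.
  val labels = graph.connectedComponents().vertices
    .filter { case (id, _) => id == a || id == b }
    .collect()
    .toMap
  // Both vertices must exist in the graph and share a component label.
  labels.contains(a) && labels.contains(b) && labels(a) == labels(b)
}
```

If many pairs need checking, it is probably cheaper to compute `connectedComponents()` once and reuse the resulting vertex labels than to call a helper like this per pair.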

Re: Spark streaming alerting

2015-03-24 Thread Anwar Rizal
Helena, The CassandraInputDStream sounds interesting. I can't find much about it in the JIRA, though. Do you have more details on what it tries to achieve? Thanks, Anwar.
On Tue, Mar 24, 2015 at 2:39 PM, Helena Edelson wrote:
> Streaming _from_ cassandra, CassandraInputDStream, is coming BTW
> ht

Re: propogating edges

2015-01-11 Thread Anwar Rizal
This looks similar to, but simpler than, the connected components implementation in GraphX. Have you checked that? I have a question, though: in your example the graph is a tree. What is the behavior for a more general graph? Cheers, Anwar Rizal.
On Mon, Jan 12, 2015 at 1:02 AM, dizzy5112

Re: Find the file info of when load the data into RDD

2014-12-21 Thread Anwar Rizal
Is there any reason why mapPartitionsWithInputSplit has the DeveloperApi annotation? Is it possible to remove it? Best regards, Anwar Rizal.
On Sun, Dec 21, 2014 at 10:47 PM, Shuai Zheng wrote:
> I just found a possible answer:
>
> http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/2

Re: spark streaming questions

2014-06-17 Thread Anwar Rizal
On Tue, Jun 17, 2014 at 5:39 PM, Chen Song wrote:
> Hey
>
> I am new to spark streaming and apologize if these questions have been
> asked.
>
> * In StreamingContext, reduceByKey() seems to only work on the RDDs of the
> current batch interval, not including RDDs of previous batches. Is my
> unde

Re: ClassCastException when using saveAsTextFile

2014-06-05 Thread Anwar Rizal
Hi Niko, I ran the script on 0.9/CDH5 using spark-shell, and it does not throw a ClassCastException. Which version are you using, and can you share more of the stack trace? Cheers, a.
On Tue, Mar 25, 2014 at 7:55 PM, Niko Stahl wrote:
> Ok, so I've been able to narrow down the problem to this

Re: sc.textFileGroupByPath("*/*.txt")

2014-06-01 Thread Anwar Rizal
I presume that you need access to the path of each file you are reading. I don't know whether there is a good way to do that for HDFS; I need to read the files myself, something like:
def openWithPath(inputPath: String, sc: SparkContext) = {
  val fs = (new Path(inputPath)).getFile

Re: Selecting first ten values in a RDD/partition

2014-05-29 Thread Anwar Rizal
Can you clarify what you're trying to achieve here? If you want only the top 10 of each RDD, why not sort and then take(10) on every RDD? Or do you want the top 10 over five minutes? Cheers,
On Thu, May 29, 2014 at 2:04 PM, nilmish wrote:
> I have a DSTREAM which consists of RDD
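A rough sketch of both readings of the question. The quoted message is cut off, so the element type (`Int`) and the helper names `printTop10PerBatch` and `printTop10PerWindow` are assumptions made purely for illustration.

```scala
import org.apache.spark.streaming.Minutes
import org.apache.spark.streaming.dstream.DStream

// Assumption: the stream carries Int values; the real element type is
// not visible in the thread.

// Top 10 of each batch RDD: sort descending, then take(10).
def printTop10PerBatch(stream: DStream[Int]): Unit =
  stream.foreachRDD { rdd =>
    val top10 = rdd.sortBy(x => x, ascending = false).take(10)
    println(top10.mkString(", "))
  }

// Top 10 over the last five minutes: apply the same logic to a windowed
// stream. The window length must be a multiple of the batch interval.
def printTop10PerWindow(stream: DStream[Int]): Unit =
  printTop10PerBatch(stream.window(Minutes(5)))
```

`rdd.top(10)` should give the same ten values without a full sort, which may be preferable when the RDDs are large.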