You could implement the receiver as a Spark Streaming Receiver
<https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers>;
the data received would then be available to any streaming application that
operates on DStreams (e.g. Streaming KMeans
<https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means>).
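A minimal sketch of such a custom receiver in Scala (the class name and the polling helper are illustrative; a real implementation would wrap your actual source, e.g. a RabbitMQ consumer, and needs a running StreamingContext):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical custom receiver: Spark calls onStart()/onStop(), and each
// record handed to store() becomes part of the resulting DStream.
class QueueReceiver(host: String, port: Int)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // Receive on a background thread so onStart() returns immediately.
    new Thread("Queue Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {}  // the receive loop exits once isStopped() is true

  private def receive(): Unit = {
    while (!isStopped()) {
      val message = pollSource()  // hypothetical: read one message
      store(message)              // hand the record to Spark
    }
  }

  private def pollSource(): String = ???  // plug in your real source here
}
```

Usage would then look like `val stream = ssc.receiverStream(new QueueReceiver("localhost", 5672))`, and the resulting DStream can be mapped to vectors and fed to `StreamingKMeans.trainOn(...)`.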
On Tue, Jul 14, 2015 at 8:31 AM, Oded Maimon <o...@scene53.com> wrote:

> Hi,
> Thanks for all the help.
> I'm still missing something very basic.
>
> If I won't use SparkR, which doesn't support streaming (I will use MLlib
> instead, as Debasish suggested), and I have my Scala receiver working, how
> should the receiver save the data in memory? I do see the store() method,
> so if I use it, how can I read the data from a different Spark Scala/Java
> application? How do I find/query this data?
>
> Regards,
> Oded Maimon
> Scene53.
>
> On Tue, Jul 14, 2015 at 12:35 AM, Feynman Liang <fli...@databricks.com>
> wrote:
>
>> Sorry; I think I may have used poor wording. SparkR will let you use R to
>> analyze the data, but it has to be loaded into memory using SparkR (see
>> SparkR DataSources
>> <http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html>).
>> You will still have to write a Java receiver to store the data into some
>> tabular datastore (e.g. Hive) before loading it as SparkR DataFrames and
>> performing the analysis.
>>
>> R-specific questions such as windowing in R should go to R-help@; you
>> won't be able to use window() since that is a Spark Streaming method.
>>
>> On Mon, Jul 13, 2015 at 2:23 PM, Oded Maimon <o...@scene53.com> wrote:
>>
>>> You are helping me understand things here a lot.
>>>
>>> I believe I have 3 last questions:
>>>
>>> If I use a Java receiver to get the data, how should I save it in
>>> memory? Using the store() command or another command?
>>>
>>> Once stored, how can R read that data?
>>>
>>> Can I use the window command in R? I guess not, because it is a
>>> streaming command, right? Any other way to window the data?
>>>
>>> Sent from iPhone
>>>
>>> On Mon, Jul 13, 2015 at 2:07 PM -0700, "Feynman Liang" <
>>> fli...@databricks.com> wrote:
>>>
>>>> If you use SparkR then you can analyze the data that's currently in
>>>> memory with R; otherwise you will have to write to disk (e.g. HDFS).
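To make receiver output visible to a separate application, as suggested above, one option is to persist each micro-batch to a tabular store that the other application can query. A hedged sketch in Scala (Spark 1.4-era APIs; assumes a DStream named `stream` from the custom receiver, and the HDFS path is made up):

```scala
import org.apache.spark.sql.SQLContext

// Inside the receiver application: write every micro-batch out as Parquet
// so other Spark (or SparkR) applications can load it later.
stream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    val sqlContext = new SQLContext(rdd.sparkContext)
    import sqlContext.implicits._
    rdd.toDF("message")        // single-column DataFrame of raw messages
       .write
       .mode("append")
       .parquet("hdfs:///data/messages")  // illustrative path
  }
}
```

A separate SparkR session could then load the same data with `read.df(sqlContext, "hdfs:///data/messages", source = "parquet")` and analyze it as a DataFrame.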
>>>>
>>>> On Mon, Jul 13, 2015 at 1:45 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>
>>>>> Thanks again.
>>>>> What I'm missing is where I can store the data. Can I store it in
>>>>> Spark memory and then use R to analyze it? Or should I use HDFS? Any
>>>>> other places where I can save the data?
>>>>>
>>>>> What would you suggest?
>>>>>
>>>>> Thanks...
>>>>>
>>>>> Sent from iPhone
>>>>>
>>>>> On Mon, Jul 13, 2015 at 1:41 PM -0700, "Feynman Liang" <
>>>>> fli...@databricks.com> wrote:
>>>>>
>>>>>> If you don't require true streaming processing and need to use R for
>>>>>> analysis, SparkR on a custom data source seems to fit your use case.
>>>>>>
>>>>>> On Mon, Jul 13, 2015 at 1:06 PM, Oded Maimon <o...@scene53.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, thanks for replying!
>>>>>>> I want to do the entire process in stages: get the data using Java
>>>>>>> or Scala, because they are the only languages that support custom
>>>>>>> receivers; keep the data <somewhere>; use R to analyze it; keep the
>>>>>>> results <somewhere>; output the data to different systems.
>>>>>>>
>>>>>>> I thought that <somewhere> could be Spark memory, using RDDs or
>>>>>>> DStreams. But could it be that I need to keep it in HDFS to make the
>>>>>>> entire process work in stages?
>>>>>>>
>>>>>>> Sent from iPhone
>>>>>>>
>>>>>>> On Mon, Jul 13, 2015 at 12:07 PM -0700, "Feynman Liang" <
>>>>>>> fli...@databricks.com> wrote:
>>>>>>>
>>>>>>>> Hi Oded,
>>>>>>>>
>>>>>>>> I'm not sure I completely understand your question, but it sounds
>>>>>>>> like you could have the READER receiver produce a DStream which is
>>>>>>>> windowed/processed in Spark Streaming, then use foreachRDD to do
>>>>>>>> the OUTPUT. However, streaming in SparkR is not currently supported
>>>>>>>> (SPARK-6803 <https://issues.apache.org/jira/browse/SPARK-6803>),
>>>>>>>> so I'm not too sure how the ANALYZER would fit in.
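The windowed/foreachRDD shape described above can be sketched like this (a rough Scala sketch; the interval lengths are placeholders, and it assumes `stream` is the DStream produced by the custom receiver):

```scala
import org.apache.spark.streaming.Seconds

// Re-slice the raw stream into one-minute windows computed every minute,
// matching the "analyze it every minute" requirement.
val windowed = stream.window(Seconds(60), Seconds(60))

// OUTPUT step: push each windowed batch to external systems.
windowed.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // Open one connection per partition to the external system
    // (connection handling is illustrative, not shown).
    records.foreach { r => /* send r downstream */ }
  }
}
```

Opening the connection inside `foreachPartition` rather than per record is the usual pattern, since the closure runs on the executors and connections are not serializable.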
>>>>>>>>
>>>>>>>> Feynman
>>>>>>>>
>>>>>>>> On Sun, Jul 12, 2015 at 11:23 PM, Oded Maimon <o...@scene53.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Any help / idea will be appreciated :)
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Oded Maimon
>>>>>>>>> Scene53.
>>>>>>>>>
>>>>>>>>> On Sun, Jul 12, 2015 at 4:49 PM, Oded Maimon <o...@scene53.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>> We are evaluating Spark for real-time analytics. What we are
>>>>>>>>>> trying to do is the following:
>>>>>>>>>>
>>>>>>>>>> - READER APP - use a custom receiver to get data from RabbitMQ
>>>>>>>>>> (written in Scala)
>>>>>>>>>> - ANALYZER APP - use a SparkR application to read the data
>>>>>>>>>> (windowed), analyze it every minute, and save the results inside
>>>>>>>>>> Spark
>>>>>>>>>> - OUTPUT APP - use a Spark application (Scala/Java/Python) to
>>>>>>>>>> read the results from R every X minutes and send the data to a
>>>>>>>>>> few external systems
>>>>>>>>>>
>>>>>>>>>> Basically, at the end I would like to have the READER COMPONENT
>>>>>>>>>> as an app that always consumes the data and keeps it in Spark,
>>>>>>>>>> have as many ANALYZER COMPONENTS as my data scientists want, and
>>>>>>>>>> have one OUTPUT APP that will read the ANALYZER results and send
>>>>>>>>>> them to any relevant system.
>>>>>>>>>>
>>>>>>>>>> What is the right way to do it?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Oded.
>>>>>>>>>
>>>>>>>>> *This email and any files transmitted with it are confidential and
>>>>>>>>> intended solely for the use of the individual or entity to whom
>>>>>>>>> they are addressed. Please note that any disclosure, copying or
>>>>>>>>> distribution of the content of this information is strictly
>>>>>>>>> forbidden.
>>>>>>>>> If you have received this email message in error, please destroy
>>>>>>>>> it immediately and notify its sender.*