You could implement the receiver as a Spark Streaming Receiver
<https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers>;
the data received would then be available to any streaming application that
operates on DStreams (e.g. Streaming KMeans
<https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means>).
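A minimal sketch of such a custom receiver in Scala (the class name and the polling helper are illustrative; a real implementation would wrap your actual source, e.g. a RabbitMQ consumer, and needs a running StreamingContext):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical custom receiver: Spark calls onStart()/onStop(), and each
// record handed to store() becomes part of the resulting DStream.
class QueueReceiver(host: String, port: Int)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // Receive on a background thread so onStart() returns immediately.
    new Thread("Queue Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {}  // the receive loop exits once isStopped() is true

  private def receive(): Unit = {
    while (!isStopped()) {
      val message = pollSource()  // hypothetical: read one message
      store(message)              // hand the record to Spark
    }
  }

  private def pollSource(): String = ???  // plug in your real source here
}
```

Usage would then look like `val stream = ssc.receiverStream(new QueueReceiver("localhost", 5672))`, and the resulting DStream can be mapped to vectors and fed to `StreamingKMeans.trainOn(...)`.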
On Tue, Jul 14, 2015 at 8:31 AM, Oded Maimon <o...@scene53.com> wrote:

> Hi,
> Thanks for all the help.
> I'm still missing something very basic.
>
> If I won't use SparkR, which doesn't support streaming (I will use MLlib
> instead, as Debasish suggested), and I have my Scala receiver working, how
> should the receiver save the data in memory? I do see the store() method,
> so if I use it, how can I read the data from a different Spark Scala/Java
> application? How do I find/query this data?
>
> Regards,
> Oded Maimon
> Scene53.
>
> On Tue, Jul 14, 2015 at 12:35 AM, Feynman Liang <fli...@databricks.com>
> wrote:
>
>> Sorry; I think I may have used poor wording. SparkR will let you use R to
>> analyze the data, but it has to be loaded into memory using SparkR (see
>> SparkR DataSources
>> <http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html>).
>> You will still have to write a Java receiver to store the data into some
>> tabular datastore (e.g. Hive) before loading it as SparkR DataFrames and
>> performing the analysis.
>>
>> R-specific questions such as windowing in R should go to R-help@; you
>> won't be able to use window() since that is a Spark Streaming method.
>>
>> On Mon, Jul 13, 2015 at 2:23 PM, Oded Maimon <o...@scene53.com> wrote:
>>
>>> You are helping me understand things here a lot.
>>>
>>> I believe I have 3 last questions:
>>>
>>> If I use a Java receiver to get the data, how should I save it in
>>> memory? Using the store() command or another command?
>>>
>>> Once stored, how can R read that data?
>>>
>>> Can I use the window command in R? I guess not, because it is a
>>> streaming command, right? Any other way to window the data?
>>>
>>> Sent from iPhone
>>>
>>> On Mon, Jul 13, 2015 at 2:07 PM -0700, "Feynman Liang" <
>>> fli...@databricks.com> wrote:
>>>
>>>> If you use SparkR then you can analyze the data that's currently in
>>>> memory with R; otherwise you will have to write to disk (e.g. HDFS).
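To make receiver output visible to a separate application, as suggested above, one option is to persist each micro-batch to a tabular store that the other application can query. A hedged sketch in Scala (Spark 1.4-era APIs; assumes a DStream named `stream` from the custom receiver, and the HDFS path is made up):

```scala
import org.apache.spark.sql.SQLContext

// Inside the receiver application: write every micro-batch out as Parquet
// so other Spark (or SparkR) applications can load it later.
stream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    val sqlContext = new SQLContext(rdd.sparkContext)
    import sqlContext.implicits._
    rdd.toDF("message")        // single-column DataFrame of raw messages
       .write
       .mode("append")
       .parquet("hdfs:///data/messages")  // illustrative path
  }
}
```

A separate SparkR session could then load the same data with `read.df(sqlContext, "hdfs:///data/messages", source = "parquet")` and analyze it as a DataFrame.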
>>>>
>>>> On Mon, Jul 13, 2015 at 1:45 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>
>>>>> Thanks again.
>>>>> What I'm missing is where I can store the data. Can I store it in
>>>>> Spark memory and then use R to analyze it? Or should I use HDFS? Any
>>>>> other places where I can save the data?
>>>>>
>>>>> What would you suggest?
>>>>>
>>>>> Thanks...
>>>>>
>>>>> Sent from iPhone
>>>>>
>>>>> On Mon, Jul 13, 2015 at 1:41 PM -0700, "Feynman Liang" <
>>>>> fli...@databricks.com> wrote:
>>>>>
>>>>>> If you don't require true streaming processing and need to use R for
>>>>>> analysis, SparkR on a custom data source seems to fit your use case.
>>>>>>
>>>>>> On Mon, Jul 13, 2015 at 1:06 PM, Oded Maimon <o...@scene53.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, thanks for replying!
>>>>>>> I want to do the entire process in stages: get the data using Java
>>>>>>> or Scala, because they are the only languages that support custom
>>>>>>> receivers; keep the data <somewhere>; use R to analyze it; keep the
>>>>>>> results <somewhere>; output the data to different systems.
>>>>>>>
>>>>>>> I thought that <somewhere> could be Spark memory, using RDDs or
>>>>>>> DStreams. But could it be that I need to keep it in HDFS to make the
>>>>>>> entire process work in stages?
>>>>>>>
>>>>>>> Sent from iPhone
>>>>>>>
>>>>>>> On Mon, Jul 13, 2015 at 12:07 PM -0700, "Feynman Liang" <
>>>>>>> fli...@databricks.com> wrote:
>>>>>>>
>>>>>>>> Hi Oded,
>>>>>>>>
>>>>>>>> I'm not sure I completely understand your question, but it sounds
>>>>>>>> like you could have the READER receiver produce a DStream which is
>>>>>>>> windowed/processed in Spark Streaming, then use foreachRDD to do
>>>>>>>> the OUTPUT. However, streaming in SparkR is not currently supported
>>>>>>>> (SPARK-6803 <https://issues.apache.org/jira/browse/SPARK-6803>),
>>>>>>>> so I'm not too sure how the ANALYZER would fit in.
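The windowed/foreachRDD shape described above can be sketched like this (a rough Scala sketch; the interval lengths are placeholders, and it assumes `stream` is the DStream produced by the custom receiver):

```scala
import org.apache.spark.streaming.Seconds

// Re-slice the raw stream into one-minute windows computed every minute,
// matching the "analyze it every minute" requirement.
val windowed = stream.window(Seconds(60), Seconds(60))

// OUTPUT step: push each windowed batch to external systems.
windowed.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // Open one connection per partition to the external system
    // (connection handling is illustrative, not shown).
    records.foreach { r => /* send r downstream */ }
  }
}
```

Opening the connection inside `foreachPartition` rather than per record is the usual pattern, since the closure runs on the executors and connections are not serializable.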
>>>>>>>>
>>>>>>>> Feynman
>>>>>>>>
>>>>>>>> On Sun, Jul 12, 2015 at 11:23 PM, Oded Maimon <o...@scene53.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Any help / idea will be appreciated :)
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Oded Maimon
>>>>>>>>> Scene53.
>>>>>>>>>
>>>>>>>>> On Sun, Jul 12, 2015 at 4:49 PM, Oded Maimon <o...@scene53.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>> We are evaluating Spark for real-time analytics. What we are
>>>>>>>>>> trying to do is the following:
>>>>>>>>>>
>>>>>>>>>> - READER APP - use a custom receiver to get data from RabbitMQ
>>>>>>>>>> (written in Scala)
>>>>>>>>>> - ANALYZER APP - use a SparkR application to read the data
>>>>>>>>>> (windowed), analyze it every minute, and save the results inside
>>>>>>>>>> Spark
>>>>>>>>>> - OUTPUT APP - use a Spark application (Scala/Java/Python) to
>>>>>>>>>> read the results from R every X minutes and send the data to a
>>>>>>>>>> few external systems
>>>>>>>>>>
>>>>>>>>>> Basically, at the end I would like to have the READER COMPONENT
>>>>>>>>>> as an app that always consumes the data and keeps it in Spark,
>>>>>>>>>> have as many ANALYZER COMPONENTS as my data scientists want, and
>>>>>>>>>> have one OUTPUT APP that will read the ANALYZER results and send
>>>>>>>>>> them to any relevant system.
>>>>>>>>>>
>>>>>>>>>> What is the right way to do it?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Oded.
>>>>>>>>>
>>>>>>>>> *This email and any files transmitted with it are confidential and
>>>>>>>>> intended solely for the use of the individual or entity to whom
>>>>>>>>> they are addressed. Please note that any disclosure, copying or
>>>>>>>>> distribution of the content of this information is strictly
>>>>>>>>> forbidden.
>>>>>>>>> If you have received this email message in error, please destroy
>>>>>>>>> it immediately and notify its sender.*