Hi,
Thanks for all the help.
I'm still missing something very basic.

If I won't use SparkR, which doesn't support streaming (I'll use MLlib
instead, as Debasish suggested), and I have my Scala receiver working, how
should the receiver save the data in memory? I do see the store method, so
if I use it, how can I read the data from a different Spark Scala/Java
application? How do I find/query this data?
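
For reference, my receiver is roughly shaped like this (a simplified sketch;
`QueueReceiver` and `nextMessage()` are placeholder names standing in for the
real RabbitMQ consumer code):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Simplified sketch of the custom receiver; nextMessage() stands in for
// the real RabbitMQ polling code.
class QueueReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    new Thread("Queue Receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store(nextMessage()) // hands each record to Spark's block manager
        }
      }
    }.start()
  }

  def onStop(): Unit = {}

  private def nextMessage(): String = ??? // placeholder for queue polling
}
```

As I understand it, whatever I pass to store() only becomes part of the
DStream inside my own streaming job, which is why I don't see how a separate
application could query it.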


Regards,
Oded Maimon
Scene53.

On Tue, Jul 14, 2015 at 12:35 AM, Feynman Liang <fli...@databricks.com>
wrote:

> Sorry; I think I may have used poor wording. SparkR will let you use R to
> analyze the data, but it has to be loaded into memory using SparkR (see SparkR
> DataSources
> <http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html>).
> You will still have to write a Java receiver to store the data into some
> tabular datastore (e.g. Hive) before loading it as SparkR DataFrames and
> performing the analysis.
>
> R-specific questions, such as windowing in R, should go to R-help@; you
> won't be able to use window, since that is a Spark Streaming method.
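
For what it's worth, the semantics of Spark Streaming's window are easy to
picture in plain Scala: with a 1-minute batch interval, window(Minutes(3),
Minutes(1)) emits, on every batch, the union of the last three batches. A
plain-Scala illustration (not the Spark API):

```scala
// Plain-Scala illustration of window() semantics, not the Spark API:
// each streaming batch is a Seq, and a 3-batch window with a 1-batch
// slide is the flattened union of each run of three consecutive batches.
val batches = Seq(Seq(1), Seq(2), Seq(3), Seq(4), Seq(5))
val windows = batches.sliding(3).map(_.flatten).toList
// windows: List(List(1, 2, 3), List(2, 3, 4), List(3, 4, 5))
```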
>
> On Mon, Jul 13, 2015 at 2:23 PM, Oded Maimon <o...@scene53.com> wrote:
>
>> You are helping me understanding stuff here a lot.
>>
>> I believe I have three last questions:
>>
>> If I use a Java receiver to get the data, how should I save it in memory?
>> Using the store command or some other command?
>>
>> Once stored, how can R read that data?
>>
>> Can I use the window command in R? I guess not, because it is a streaming
>> command, right? Is there any other way to window the data?
>>
>> Sent from iPhone
>>
>>
>>
>>
>> On Mon, Jul 13, 2015 at 2:07 PM -0700, "Feynman Liang" <
>> fli...@databricks.com> wrote:
>>
>>> If you use SparkR, then you can analyze the data that's currently in
>>> memory with R; otherwise you will have to write to disk (e.g. HDFS).
>>>
>>> On Mon, Jul 13, 2015 at 1:45 PM, Oded Maimon <o...@scene53.com> wrote:
>>>
>>>> Thanks again.
>>>> What I'm missing is where I can store the data. Can I store it in Spark
>>>> memory and then use R to analyze it? Or should I use HDFS? Are there any
>>>> other places where I can save the data?
>>>>
>>>> What would you suggest?
>>>>
>>>> Thanks...
>>>>
>>>> Sent from iPhone
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jul 13, 2015 at 1:41 PM -0700, "Feynman Liang" <
>>>> fli...@databricks.com> wrote:
>>>>
>>>>> If you don't require true streaming processing and need to use R for
>>>>> analysis, SparkR on a custom data source seems to fit your use case.
>>>>>
>>>>> On Mon, Jul 13, 2015 at 1:06 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>>
>>>>>> Hi, thanks for replying!
>>>>>> I want to do the entire process in stages: get the data using Java or
>>>>>> Scala, because they are the only languages that support custom
>>>>>> receivers; keep the data <somewhere>; use R to analyze it; keep the
>>>>>> results <somewhere>; output the data to different systems.
>>>>>>
>>>>>> I thought that <somewhere> could be Spark memory, using RDDs or
>>>>>> DStreams. But could it be that I need to keep it in HDFS to run the
>>>>>> entire process in stages?
>>>>>>
>>>>>> Sent from iPhone
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 13, 2015 at 12:07 PM -0700, "Feynman Liang" <
>>>>>> fli...@databricks.com> wrote:
>>>>>>
>>>>>>> Hi Oded,
>>>>>>>
>>>>>>> I'm not sure I completely understand your question, but it sounds
>>>>>>> like you could have the READER receiver produce a DStream which is
>>>>>>> windowed/processed in Spark Streaming, with foreachRDD doing the
>>>>>>> OUTPUT. However, streaming in SparkR is not currently supported
>>>>>>> (SPARK-6803 <https://issues.apache.org/jira/browse/SPARK-6803>), so
>>>>>>> I'm not too sure how the ANALYZER would fit in.
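
A rough sketch of that READER-to-OUTPUT wiring (a hypothetical QueueReceiver
stands in for your RabbitMQ receiver; the intervals and output path are just
examples, not a definitive implementation):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

// Hypothetical stand-in for the real RabbitMQ receiver.
class QueueReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  def onStart(): Unit = () // real code would start a consumer thread
  def onStop(): Unit = ()
}

val conf = new SparkConf().setAppName("reader-output")
val ssc = new StreamingContext(conf, Seconds(60)) // 1-minute batches

// READER: records passed to the receiver's store() feed this DStream.
val stream = ssc.receiverStream(new QueueReceiver())

// Window + OUTPUT: every minute, process the last three minutes of data.
stream.window(Minutes(3), Minutes(1)).foreachRDD { rdd =>
  // Write somewhere a separate application (or SparkR) can read it;
  // in practice the path would need to vary per batch.
  rdd.saveAsTextFile("hdfs:///path/to/output") // example destination
}

ssc.start()
ssc.awaitTermination()
```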
>>>>>>>
>>>>>>> Feynman
>>>>>>>
>>>>>>> On Sun, Jul 12, 2015 at 11:23 PM, Oded Maimon <o...@scene53.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> any help / idea will be appreciated :)
>>>>>>>> thanks
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Oded Maimon
>>>>>>>> Scene53.
>>>>>>>>
>>>>>>>> On Sun, Jul 12, 2015 at 4:49 PM, Oded Maimon <o...@scene53.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>> We are evaluating Spark for real-time analytics. What we are trying
>>>>>>>>> to do is the following:
>>>>>>>>>
>>>>>>>>>    - READER APP - use a custom receiver to get data from RabbitMQ
>>>>>>>>>    (written in Scala)
>>>>>>>>>    - ANALYZER APP - use a SparkR application to read the data
>>>>>>>>>    (windowed), analyze it every minute, and save the results inside
>>>>>>>>>    Spark
>>>>>>>>>    - OUTPUT APP - use a Spark application (Scala/Java/Python) to
>>>>>>>>>    read the results from R every X minutes and send the data to a
>>>>>>>>>    few external systems
>>>>>>>>>
>>>>>>>>> Basically, at the end I would like to have the READER component as
>>>>>>>>> an app that always consumes the data and keeps it in Spark, have as
>>>>>>>>> many ANALYZER components as my data scientists want, and have one
>>>>>>>>> OUTPUT app that will read the ANALYZER results and send them to any
>>>>>>>>> relevant system.
>>>>>>>>>
>>>>>>>>> What is the right way to do it?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Oded.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> *This email and any files transmitted with it are confidential and
>>>>>>>> intended solely for the use of the individual or entity to whom they 
>>>>>>>> are
>>>>>>>> addressed. Please note that any disclosure, copying or distribution of 
>>>>>>>> the
>>>>>>>> content of this information is strictly forbidden. If you have received
>>>>>>>> this email message in error, please destroy it immediately and notify 
>>>>>>>> its
>>>>>>>> sender.*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>

