Doh! 
Wrong email account again! 

> Begin forwarded message:
> 
> From: Michael Segel <michael_se...@hotmail.com>
> Subject: Re: Spark support for Complex Event Processing (CEP)
> Date: April 27, 2016 at 7:16:55 PM CDT
> To: Mich Talebzadeh <mich.talebza...@gmail.com>
> Cc: Esa Heikkinen <esa.heikki...@student.tut.fi>, "user@spark" 
> <user@spark.apache.org>
> 
> Uhm… 
> I think you need to clarify a couple of things…
> 
> First, there is this thing called analog signal processing… Is that continuous
> enough for you?
> 
> But more to the point, Spark Streaming does micro-batching, so if you're
> processing a continuous stream of tick data, you will see more than 50K ticks per
> second while markets are open and trading. Even at 50K a second, that means one
> tick every 0.02 ms, or 50 ticks per ms.
> 
> And you don't want to wait until you have a batch to start processing; you want
> to process the data as soon as it hits the queue, pulling it from the queue as
> quickly as possible.
> 
> Spark Streaming can pull batches at intervals as small as 500ms. So if you pull
> a batch at t0 and a tick lands in your queue immediately afterwards, you won't
> process that data until t0 + 500ms. And at 50K ticks a second, that batch would
> contain 25,000 entries.
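> 
> To make that latency concrete, here is a minimal sketch (Scala, Spark 1.x
> streaming API; the socket source, host and port are made up for illustration) of
> a job whose batch interval is set to that 500ms floor:
> 
>   import org.apache.spark.SparkConf
>   import org.apache.spark.streaming.{Milliseconds, StreamingContext}
> 
>   object TickCount {
>     def main(args: Array[String]): Unit = {
>       val conf = new SparkConf().setAppName("TickCount").setMaster("local[2]")
>       // Each micro-batch covers 500 ms of data; a tick that arrives just after
>       // a batch boundary waits up to ~500 ms before the job even sees it.
>       val ssc = new StreamingContext(conf, Milliseconds(500))
> 
>       // Hypothetical tick feed exposed as text lines on a socket.
>       val ticks = ssc.socketTextStream("localhost", 9999)
>       ticks.count().print()   // number of ticks in each 500 ms batch
> 
>       ssc.start()
>       ssc.awaitTermination()
>     }
>   }
> 
> At 50K ticks a second that count comes out around the 25,000 per batch mentioned
> above.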
> 
> Depending on what you are doing… that 500ms delay can be enough to be fatal 
> to your trading process. 
> 
> If you don't like stock data, there are other examples, particularly when
> pulling data from real-time embedded systems.
> 
> 
> If you go back and read what I said: if the interval between events in your data
> flow is >> 500ms (much slower), and/or the time to process is >> 500ms (much
> longer), you could use Spark Streaming. If not… and there are applications which
> require that type of speed… then you shouldn't use Spark Streaming.
> 
> If you do have that constraint, then you can look at systems like Storm, Flink,
> Samza or whatever, where you have a continuous queue and listener and no
> micro-batch delays.
> Then for each bolt (in Storm) you can have a Spark context for processing the
> data, depending on what sort of processing you want to do; a sketch of that
> follows.
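> 
> A rough sketch of that "Spark context per bolt" idea (Scala; the package names
> assume Storm 1.x, i.e. org.apache.storm, while older releases used
> backtype.storm; the class name, batch threshold and output fields are made up
> for illustration):
> 
>   import java.util.{Map => JMap}
>   import org.apache.storm.task.{OutputCollector, TopologyContext}
>   import org.apache.storm.topology.OutputFieldsDeclarer
>   import org.apache.storm.topology.base.BaseRichBolt
>   import org.apache.storm.tuple.{Fields, Tuple, Values}
>   import org.apache.spark.{SparkConf, SparkContext}
>   import scala.collection.mutable.ArrayBuffer
> 
>   class SparkScoringBolt extends BaseRichBolt {
>     private var collector: OutputCollector = _
>     @transient private var sc: SparkContext = _
>     private val pending = ArrayBuffer.empty[String]
> 
>     override def prepare(conf: JMap[_, _], ctx: TopologyContext,
>                          out: OutputCollector): Unit = {
>       collector = out
>       // One local SparkContext per worker; tuples still arrive continuously,
>       // with no micro-batch delay on the Storm side.
>       sc = new SparkContext(
>         new SparkConf().setAppName("bolt-spark").setMaster("local[*]"))
>     }
> 
>     override def execute(tuple: Tuple): Unit = {
>       pending += tuple.getString(0)
>       if (pending.size >= 1000) {   // hypothetical threshold for handing work to Spark
>         val counts = sc.parallelize(pending).countByValue()
>         counts.foreach { case (k, n) =>
>           collector.emit(tuple, new Values(k, Long.box(n)))
>         }
>         pending.clear()
>       }
>       collector.ack(tuple)
>     }
> 
>     override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
>       declarer.declare(new Fields("key", "count"))
>   }
> 
> The bolt keeps Storm's continuous, per-tuple delivery and only hands accumulated
> work off to Spark when it chooses to.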
> 
> To put this in perspective… if you're using Spark Streaming / Akka / Storm /
> etc. to handle real-time requests from the web, 500ms of added delay can be a
> long time.
> 
> Choose the right tool. 
> 
> As for the OP's problem: sure, tracking public transportation could be done
> using Spark Streaming. It could also be done using half a dozen other tools,
> because the data is generated at intervals much longer than 500ms.
> 
> HTH
> 
> 
>> On Apr 27, 2016, at 4:34 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> 
>> A couple of things.
>> 
>> There is no such thing as Continuous Data Streaming, just as there is no such
>> thing as Continuous Availability.
>> 
>> There is such a thing as Discrete Data Streaming and High Availability, but
>> they only reduce the finite unavailability to a minimum. In terms of business
>> needs, 5 SIGMA is good enough and acceptable. Even candles set to a predefined
>> time interval, say 2, 4 or 15 seconds, overlap. No FX-savvy trader makes a sell
>> or buy decision on the basis of a 2-second candlestick.
>> 
>> The calculation itself is subject to a finite measurement error, as defined by
>> its Confidence Level (CL) using the standard deviation.
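>> (For instance, at a 95% confidence level the error band is roughly the mean ±
>> 1.96 × σ/√n for n measurements.)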
>> 
>> OK, so far I have never come across a tool that requires that level of
>> granularity. That stuff from Flink etc. is, in practical terms, of little value
>> and does not make commercial sense.
>> 
>> Now with regard to your needs, Spark micro batching is perfectly adequate.
>> 
>> HTH
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> 
>> http://talebzadehmich.wordpress.com
>>  
>> 
>> On 27 April 2016 at 22:10, Esa Heikkinen <esa.heikki...@student.tut.fi> wrote:
>> 
>> Hi
>> 
>> Thanks for the answer.
>> 
>> I have developed a log file analyzer for an RTPIS (Real Time Passenger
>> Information System), where buses drive routes and the system tries to estimate
>> arrival times at the bus stops. There are many different log files (and
>> events), and the analysis situation can be very complex. Spatial data can also
>> be included in the log data.
>> 
>> The analyzer also has a query (or analysis) language, which describes an
>> expected behaviour. This can be a requirement of the system. The analyzer can
>> also be thought of as a test oracle.
>> 
>> I have published a paper (SPLST'15 conference) about my analyzer and its
>> language. The paper is perhaps too technical, but it can be found at:
>> http://ceur-ws.org/Vol-1525/paper-19.pdf
>> 
>> I do not know yet where it belongs. I think it could be called "CEP with
>> delays", or do you know a better term?
>> My analyzer can also do somewhat more complex and time-consuming analyses;
>> there is no need for real time.
>> 
>> And is it possible to do "CEP with delays" reasonably with some existing
>> analyzer (for example Spark)?
>> 
>> Regards
>> PhD student at Tampere University of Technology, Finland, www.tut.fi
>> Esa Heikkinen
>> 
>> On 27.4.2016 at 15:51, Michael Segel wrote:
>>> Spark and CEP? It depends… 
>>> 
>>> Ok, I know that's not the answer you want to hear, but it's a bit more
>>> complicated…
>>> 
>>> If you consider Spark Streaming, you have some issues.
>>> Spark Streaming isn't a real-time solution because it is a micro-batch
>>> solution; the smallest window is 500ms. This means that if your compute time
>>> is >> 500ms and/or the interval between events is >> 500ms, this could work
>>> (e.g. "real time" image processing on a system that is capturing at 60 FPS,
>>> because the processing time per frame is >> 500ms).
>>> 
>>> So if you need lower latency than that, Spark Streaming wouldn't be the best
>>> solution…
>>> 
>>> However, you can combine Spark with other technologies like Storm, Akka,
>>> etc., where you have continuous streaming.
>>> So you could instantiate a Spark context per worker in Storm…
>>> 
>>> I think that if there are no class collisions between Akka and Spark, you
>>> could use Akka, which may have better potential for communication between
>>> workers.
>>> That is where you can handle the CEP events.
>>> 
>>> HTH
>>> 
>>>> On Apr 27, 2016, at 7:03 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>> 
>>>> Please see my other reply.
>>>> 
>>>> Dr Mich Talebzadeh
>>>>  
>>>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> 
>>>> http://talebzadehmich.wordpress.com
>>>>  
>>>> 
>>>> On 27 April 2016 at 10:40, Esa Heikkinen <esa.heikki...@student.tut.fi> wrote:
>>>> Hi
>>>> 
>>>> I have followed with interest the discussion about CEP and Spark. It is
>>>> quite close to my research, which is complex analysis of log files and
>>>> "history" data (not actually real-time streams).
>>>> 
>>>> I have a few questions:
>>>> 
>>>> 1) Is CEP only for (real-time) stream data and not for "history" data?
>>>> 
>>>> 2) Is it possible to search "backward" (upstream) with CEP over a given time
>>>> window, i.e. if the start time of the window is earlier than the current
>>>> stream time?
>>>> 
>>>> 3) Do you know any good tools or software for "CEP" on log data?
>>>> 
>>>> 4) Do you know any good (scientific) papers I should read about CEP?
>>>> 
>>>> 
>>>> Regards
>>>> PhD student at Tampere University of Technology, Finland, www.tut.fi
>>>> Esa Heikkinen
>>>> 
>>> 
>>> 
>> 
>> 
> 
> The opinions expressed here are mine, while they may reflect a cognitive 
> thought, that is purely accidental. 
> Use at your own risk. 
> Michael Segel
> michael_segel (AT) hotmail.com
> 

