Hi Alex,

We went down this path already :) This is the reason we are trying other approaches. The recursion makes it very inefficient for some cases. For details, this paper describes it very well: https://people.cs.umass.edu/%7Eyanlei/publications/sase-sigmod08.pdf which is the same paper referenced in the Flink ticket.
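As a rough illustration of one alternative to per-element recursion, the pattern 'A.(B|C)*' from the nPath example quoted further down can be evaluated in a single left-to-right pass that carries explicit state, automaton-style. This is only a sketch under stated assumptions: it is not the algorithm from the paper and not Flink's implementation, it finds greedy non-overlapping matches rather than nPath's OVERLAPPING mode, and `Click`, `PathScan` and `scan` are hypothetical names introduced here.

```scala
// Hypothetical event type; the field names are assumptions for illustration.
final case class Click(ts: Long, pageid: Int, category: Int)

object PathScan {
  // Symbols as defined in the nPath example:
  // A = page 50, B = page 80, C = any page other than 80 in category 9 or 10.
  private def isA(c: Click): Boolean = c.pageid == 50
  private def isB(c: Click): Boolean = c.pageid == 80
  private def isC(c: Click): Boolean =
    c.pageid != 80 && (c.category == 9 || c.category == 10)

  // One pass over a time-ordered session for the pattern A.(B|C)*.
  // State = (match currently being extended, matches already finished).
  // Matches are greedy and non-overlapping; a click that is both A and C
  // extends the open match instead of starting a new one.
  def scan(clicks: Seq[Click]): Vector[Vector[Click]] = {
    val start = (Option.empty[Vector[Click]], Vector.empty[Vector[Click]])
    val (open, done) = clicks.foldLeft(start) {
      case ((Some(cur), acc), c) if isB(c) || isC(c) => (Some(cur :+ c), acc)        // extend
      case ((Some(cur), acc), c) if isA(c)           => (Some(Vector(c)), acc :+ cur) // close, reopen
      case ((Some(cur), acc), _)                     => (None, acc :+ cur)            // close
      case ((None, acc), c) if isA(c)                => (Some(Vector(c)), acc)        // open
      case ((None, acc), _)                          => (None, acc)                   // skip
    }
    done ++ open.toList // flush a match still open at the end of the input
  }
}
```

The caller would apply `scan` once per session after sorting by `ts` and dropping clicks with `category < 0`, mirroring the PARTITION BY, ORDER BY and WHERE clauses of the query. Each click is visited once and the stack depth stays constant regardless of session length.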
Please let me know if I have overlooked something. Thank you for sharing this!

Best Regards,

Jerry

On Tue, Mar 1, 2016 at 11:58 AM, Alex Kozlov <ale...@gmail.com> wrote:

> For the purpose of full disclosure, I think Scala offers a much more
> efficient pattern matching paradigm. Using nPath is like using assembler
> to program distributed systems. Cannot tell much here today, but the
> pattern would look like:
>
> def matchSessions(h: Seq[Session[PageView]], id: String,
>     p: Seq[PageView]): Seq[Session[PageView]] = {
>   p match {
>     case Nil => Nil
>     case PageView(ts1, "company.com>homepage") ::
>         PageView(ts2, "company.com>plus>products landing") :: tail
>         if ts2 > ts1 + 600 =>
>       matchSessions(h, id, tail).+:(new Session(id, p))
>     case _ => matchSessions(h, id, p.tail)
>   }
> }
>
> Look for Scala case statements with guards and upcoming book releases.
>
> http://docs.scala-lang.org/tutorials/tour/pattern-matching
>
> https://www.safaribooksonline.com/library/view/scala-cookbook/9781449340292/ch03s14.html
>
> On Tue, Mar 1, 2016 at 8:34 AM, Henri Dubois-Ferriere <henr...@gmail.com>
> wrote:
>
>> fwiw Apache Flink just added CEP. Queries are constructed
>> programmatically rather than in SQL, but the underlying functionality is
>> similar.
>>
>> https://issues.apache.org/jira/browse/FLINK-3215
>>
>> On 1 March 2016 at 08:19, Jerry Lam <chiling...@gmail.com> wrote:
>>
>>> Hi Herman,
>>>
>>> Thank you for your reply!
>>> This functionality usually finds its place in financial services which
>>> use CEP (complex event processing) for correlation and pattern matching.
>>> Many commercial products have this including Oracle and Teradata Aster Data
>>> MR Analytics. I do agree the syntax is a bit awkward but after you understand
>>> it, it is actually very compact for expressing something that is very
>>> complex. Esper has this feature partially implemented (
>>> http://www.espertech.com/esper/release-5.1.0/esper-reference/html/match-recognize.html
>>> ).
>>>
>>> I found that the Teradata Analytics documentation describes its usage
>>> best. For example (note npath is similar to match_recognize):
>>>
>>> SELECT last_pageid, MAX( count_page80 )
>>> FROM nPath(
>>>   ON ( SELECT * FROM clicks WHERE category >= 0 )
>>>   PARTITION BY sessionid
>>>   ORDER BY ts
>>>   PATTERN ( 'A.(B|C)*' )
>>>   MODE ( OVERLAPPING )
>>>   SYMBOLS ( pageid = 50 AS A,
>>>             pageid = 80 AS B,
>>>             pageid <> 80 AND category IN (9,10) AS C )
>>>   RESULT ( LAST ( pageid OF ANY ( A,B,C ) ) AS last_pageid,
>>>            COUNT ( * OF B ) AS count_page80,
>>>            COUNT ( * OF ANY ( A,B,C ) ) AS count_any )
>>> )
>>> WHERE count_any >= 5
>>> GROUP BY last_pageid
>>> ORDER BY MAX( count_page80 )
>>>
>>> The above means:
>>> Find user click-paths starting at pageid 50 and passing exclusively
>>> through either pageid 80 or pages in category 9 or category 10. Find the
>>> pageid of the last page in the path and count the number of times page 80
>>> was visited. Report the maximum count for each last page, and sort the
>>> output by the latter. Restrict to paths containing at least 5 pages. Ignore
>>> pages in the sequence with category < 0.
>>>
>>> If this query is written in pure SQL (if possible at all), it requires
>>> several self-joins. The interesting thing about this feature is that it
>>> integrates SQL+Streaming+ML in one (and potentially graph too).
>>>
>>> Best Regards,
>>>
>>> Jerry
>>>
>>>
>>> On Tue, Mar 1, 2016 at 9:39 AM, Herman van Hövell tot Westerflier <
>>> hvanhov...@questtec.nl> wrote:
>>>
>>>> Hi Jerry,
>>>>
>>>> This is not on any roadmap. I (briefly) browsed through this, and this
>>>> looks like some sort of a window function with very awkward syntax. I think
>>>> Spark provides better constructs for this using dataframes/datasets/nested
>>>> data...
>>>>
>>>> Feel free to submit a PR.
>>>>
>>>> Kind regards,
>>>>
>>>> Herman van Hövell
>>>>
>>>> 2016-03-01 15:16 GMT+01:00 Jerry Lam <chiling...@gmail.com>:
>>>>
>>>>> Hi Spark developers,
>>>>>
>>>>> Would you consider adding support for "Pattern matching in
>>>>> sequences of rows"? More specifically, I'm referring to this:
>>>>> http://web.cs.ucla.edu/classes/fall15/cs240A/notes/temporal/row-pattern-recogniton-11.pdf
>>>>>
>>>>> This is a very cool/useful feature for pattern matching over live
>>>>> stream/archived data. It is sort of related to machine learning because
>>>>> this is usually used in clickstream analysis or path analysis. Also it is
>>>>> related to streaming because of the nature of the processing (time series
>>>>> data mostly). It is SQL because there is a good way to express and
>>>>> optimize the query.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Jerry
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Alex Kozlov
> (408) 507-4987
> (650) 887-2135 efax
> ale...@gmail.com
>