> On Jan 29, 2015, at 4:38 PM, Yi Pan <nickpa...@gmail.com> wrote:
> 
> I am wondering whether I can get per-30-minute-window averages.
> I.e., the following are the input events in a stream:
>  {10:01, ORCL, 10, 10}
>  {10:02, MSFT, 30, 30}
>  {10:03, ORCL, 100, 110}
>  {10:17, MSFT, 45, 75}
>  {10:59, ORCL, 20, 130}
>  {11:02, ORCL, 50, 50}
> Can I get the non-overlapping window averages for 10:00-10:29,
> 10:30-10:59, ... ? Could you give an example of how to define that window
> operation in your model? Note that in this use case, although I may have
> one trading record per minute in the stream, I only generate two average
> records for 10:00-11:00.
That takes a bit of messy date arithmetic, made messier by the fact that you 
can't treat a SQL timestamp as a "milliseconds" value as you would in C or 
Java, nor can you divide it. Hence the trick of subtracting a constant 
timestamp, which yields an interval value, and then doing integer division on 
that interval to find the number of 30-minute periods since the epoch.

select stream rowtime, ticker, amount,
  sum(amount) over (
    -- bucket rows by ticker and by 30-minute period since the epoch
    partition by ticker,
      (rowtime - TIMESTAMP '1970-01-01 00:00:00') MINUTE / 30
    order by rowtime)
from StockTrades;
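
For what it's worth, the same bucketing expression should also work in a GROUP 
BY if you want exactly one output row per ticker per 30-minute period, as in 
your example. A rough sketch only (untested, and interval-division semantics 
vary by dialect):

select stream ticker,
  -- same epoch-subtraction trick, now as a grouping key
  (rowtime - TIMESTAMP '1970-01-01 00:00:00') MINUTE / 30 as period,
  avg(amount) as avg_amount
from StockTrades
group by ticker,
  (rowtime - TIMESTAMP '1970-01-01 00:00:00') MINUTE / 30;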

If I were doing this kind of calculation often I'd define a UDF, or even use 
that user-defined window SPI I mentioned earlier.
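Something like the following, say. (TUMBLE_INDEX is a name I've invented here 
for illustration; it isn't a real built-in function.)

select stream rowtime, ticker, amount,
  sum(amount) over (
    -- hypothetical UDF returning the 30-minute period number of a timestamp
    partition by ticker, TUMBLE_INDEX(rowtime, INTERVAL '30' MINUTE)
    order by rowtime)
from StockTrades;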

> {quote}
> CREATE TABLE Emp(empno INTEGER, name VARCHAR(20), department VARCHAR(20))
>  PARTITION BY HASHCODE (department);
> {quote}
> That's good! At least it resolves my question of "which field is the
> partition key". However, I still have a question about the number of
> partitions. As I stated, since the system currently does not have an
> "auto-scaling" feature, the number of partitions for a stream has to be
> explicitly specified. Where do you suggest putting this information
> without breaking SQL syntax?

I imagine you could start the system on Tuesday with 10 partitions per stream 
and restart it on Wednesday with 8 or 12? You wouldn't want to change the SQL, 
because that's in the application. But you could change the definition of the 
stream, either in the DDL or in some other system configuration. Then the 
partitioning function (applied by the system to route each record) could, say, 
take the value of p modulo the current number of partitions.
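
To make that concrete, a sketch only (HASHCODE is the function from the Emp 
DDL above; MOD and the partition count of 12 are assumptions of mine):

select empno, name, department,
  -- where a record would be routed if there are currently 12 partitions
  MOD(HASHCODE(department), 12) as target_partition
from Emp;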

Julian
