Re: SQL with Spark Streaming

Jason Dai Wed, 11 Mar 2015 07:48:57 -0700

Sorry, typo; it should be https://github.com/intel-spark/stream-sql


Thanks,
-Jason

On Wed, Mar 11, 2015 at 10:19 PM, Irfan Ahmad <ir...@cloudphysics.com>
wrote:

> Got a 404 on that link: https://github.com/Intel-bigdata/spark-streamsql
>
>
> *Irfan Ahmad*
> CTO | Co-Founder | *CloudPhysics* <http://www.cloudphysics.com>
> Best of VMworld Finalist
> Best Cloud Management Award
> NetworkWorld 10 Startups to Watch
> EMA Most Notable Vendor
>
> On Wed, Mar 11, 2015 at 6:41 AM, Jason Dai <jason....@gmail.com> wrote:
>
>> Yes, a previous prototype is available at
>> https://github.com/Intel-bigdata/spark-streamsql, and a talk was given at
>> last year's Spark Summit (
>> http://spark-summit.org/2014/talk/streamsql-on-spark-manipulating-streams-by-sql-using-spark
>> )
>>
>> We are currently porting the prototype to use the latest DataFrame API,
>> and will provide a stable version for people to try soon.
>>
>> Thanks,
>> -Jason
>>
>>
>> On Wed, Mar 11, 2015 at 9:12 AM, Tobias Pfeiffer <t...@preferred.jp>
>> wrote:
>>
>>> Hi,
>>>
>>> On Wed, Mar 11, 2015 at 9:33 AM, Cheng, Hao <hao.ch...@intel.com> wrote:
>>>
>>>> Intel has a prototype for doing this; SaiSai and Jason are the
>>>> authors. You can probably ask them for some materials.
>>>>
>>>
>>> The github repository is here: https://github.com/intel-spark/stream-sql
>>>
>>> Also, what I did was write a wrapper class SchemaDStream that
>>> internally holds a DStream[Row] and a DStream[StructType] (the latter
>>> having just one element in every RDD) and then allows you to do
>>> - operations SchemaRDD => SchemaRDD using
>>> `rowStream.transformWith(schemaStream, ...)`
>>> - in particular you can register this stream's data as a table this way
>>> - and via a companion object with a method `fromSQL(sql: String):
>>> SchemaDStream` you can get a new stream from previously registered tables.
>>>
>>> However, you are limited to batch-internal operations, i.e., you can't
>>> aggregate across batches.
>>>
>>> I am not able to share the code at the moment, but will be able to in
>>> the coming months. It is not very advanced code, though, and should be
>>> easy to replicate. Also, I have no idea about the performance of
>>> transformWith...
>>>
>>> Tobias
>>>
>>>
>>
>
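
The batch-internal limitation Tobias describes can be illustrated with a small
sketch. This is not Spark code and not his SchemaDStream; it is a plain-Python
toy model, assuming sqlite3 as a stand-in SQL engine, and the names
`SchemaStream`, `register` and `from_sql` are hypothetical. Each batch carries
its own column list (mirroring the one-element schema stream), and the query
is run once per batch against a fresh in-memory database, so nothing is
carried over between batches.

```python
import sqlite3


class SchemaStream:
    """Toy model: a list of micro-batches, each a (columns, rows) pair."""

    # Hypothetical registry that plays the role of the registered tables.
    registry = {}

    def __init__(self, batches):
        self.batches = list(batches)

    def register(self, name):
        """Make this stream's batches queryable under `name`."""
        SchemaStream.registry[name] = self
        return self

    @classmethod
    def from_sql(cls, sql):
        """Run `sql` once per batch over all registered streams."""
        streams = cls.registry
        n_batches = min((len(s.batches) for s in streams.values()), default=0)
        results = []
        for i in range(n_batches):
            # A fresh database per batch: no state survives into the next one.
            conn = sqlite3.connect(":memory:")
            for name, stream in streams.items():
                columns, rows = stream.batches[i]
                conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
                marks = ", ".join("?" for _ in columns)
                conn.executemany(f"INSERT INTO {name} VALUES ({marks})", rows)
            cursor = conn.execute(sql)
            out_columns = [d[0] for d in cursor.description]
            results.append((out_columns, cursor.fetchall()))
            conn.close()
        return cls(results)


# Two batches of made-up sensor readings.
events = SchemaStream([
    (["sensor", "reading"], [("a", 1), ("b", 2), ("a", 3)]),
    (["sensor", "reading"], [("a", 10)]),
]).register("events")

totals = SchemaStream.from_sql(
    "SELECT sensor, SUM(reading) AS total FROM events "
    "GROUP BY sensor ORDER BY sensor"
)
for columns, rows in totals.batches:
    print(columns, rows)
```

In this model the sum for sensor "a" is computed separately in each batch and
never combined into a single running total, which is the "can't aggregate
across batches" restriction mentioned above.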
