On 5/16/2016 9:53 AM, Yuval Itzchakov wrote:
AFAIK, the underlying data represented under the DataSet[T]
abstraction will be formatted in Tachyon under the hood, but as with
RDDs, they will be spilled to local disk on the worker if
needed.
There is another option in case of RDD
On Mon, May 16, 2016, 19:47 Benjamin Kim wrote:
I have a curiosity question. These forever/unlimited DataFrames/DataSets will
persist and be queryable. I am still foggy about how this data will be
stored. As far as I know, memory is finite. Will the data be spilled to disk
and be retrievable if the query spans data not in memory? Is Tachy
Oh, that looks neat! Thx, will read up on that.
On Mon, May 16, 2016, 14:10 Ofir Manor wrote:
Yuval,
Not sure what is in scope to land in 2.0, but there is another new infra bit
to manage state more efficiently called State Store, whose initial version
is already committed:
SPARK-13809 - State Store: A new framework for state management for
computing Streaming Aggregates
https://issues.apach
Also, re-reading the relevant part from the Structured Streaming
documentation (
https://docs.google.com/document/d/1NHKdRSNCbCmJbinLmZuqNA1Pt6CGpFnLVRbzuDUcZVM/edit#heading=h.335my4b18x6x
):
Discretized streams (aka dstream)
Unlike Storm, dstream exposes a higher level API similar to RDDs. There
Hi Ofir,
Thanks for the elaborate answer. I have read both documents, which touch
lightly on infinite DataFrames/Datasets. However, they do not go into
depth regarding how existing transformations on DStreams, for example,
will be transformed into the Dataset APIs. I've been browsing the
Ofir,
Thanks for the clarification. I was confused for a moment. The links will be
very helpful.
> On May 15, 2016, at 2:32 PM, Ofir Manor wrote:
Ben,
I'm just a Spark user - but at least at the March Spark Summit, that was the
main term used.
Taking a step back from the details, maybe this new post from Reynold is a
better intro to Spark 2.0 highlights
https://databricks.com/blog/2016/05/11/spark-2-0-technical-preview-easier-faster-and-smar
Hi Ofir,
I just recently saw the webinar with Reynold Xin. He mentioned the
SparkSession unification efforts, but I don’t remember him mentioning the DataSet
for Structured Streaming, aka Continuous Applications as he put it. He did mention
streaming or unlimited DataFrames for Structured Streaming so one can
Hi Yuval,
let me share my understanding based on similar questions I had.
First, Spark 2.x aims to replace a whole bunch of its APIs with just two
main ones - SparkSession (replacing Hive/SQL/Spark Context) and Dataset
(merging of Dataset and DataFrame - which is why it inherits all the
SparkSQL go
I've been reading/watching videos about the upcoming Spark 2.0 release which
brings us Structured Streaming. One thing I've yet to understand is how this
relates to the current state of working with Streaming in Spark with the
DStream abstraction.
All examples I can find, in the Spark repository/d