Hi all,
My team uses Spark Streaming to implement the batch processing component of a
lambda architecture with 5 min intervals. We process roughly 15 TB/day using
three discrete Spark clusters and about 250 receivers per cluster. We've been
having some issues migrating our platform from Spark 1
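A minimal sketch of the kind of setup described here, assuming the stated 5-minute batch interval and a hypothetical MyReceiver class standing in for the real input source:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Minutes, StreamingContext}
    import org.apache.spark.streaming.receiver.Receiver

    // Placeholder receiver; a real one would connect to the source in
    // onStart() and push records to Spark with store().
    class MyReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
      def onStart(): Unit = {}
      def onStop(): Unit = {}
    }

    val conf = new SparkConf().setAppName("lambda-batch-layer")
    val ssc = new StreamingContext(conf, Minutes(5))   // 5-minute batches

    // Union many receiver streams into a single DStream; the post above
    // mentions roughly 250 receivers per cluster.
    val streams = (1 to 250).map(_ => ssc.receiverStream(new MyReceiver()))
    val unified = ssc.union(streams)
    unified.count().print()   // e.g. per-batch record counts

    ssc.start()
    ssc.awaitTermination()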
Hi all,
We are using Spark Streaming to ETL a large volume of time series datasets. In our
current design, each dataset we ETL will have a corresponding Spark Streaming
context + process running on our cluster. Each of these processes will be
passed configuration options specifying the data source
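A sketch of what one of these per-dataset driver processes could look like; the configuration keys (spark.etl.dataset, spark.etl.source) and the file-based source are invented for illustration:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, StreamingContext}

    object DatasetEtl {
      def main(args: Array[String]): Unit = {
        // One process + streaming context per dataset, configured entirely
        // from options supplied at submit time (e.g. via --conf).
        val conf = new SparkConf()
        val dataset = conf.get("spark.etl.dataset")
        val source  = conf.get("spark.etl.source")

        val ssc = new StreamingContext(conf.setAppName(s"etl-$dataset"), Minutes(1))
        ssc.textFileStream(source).foreachRDD { rdd =>
          // parse and persist this dataset's time series here
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }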
l parser I'd check out
SqlParser.scala. Though it is likely we will abandon that code in the next
release for something more complete.
On Thu, Jul 31, 2014 at 11:16 AM, Budde, Adam <bu...@amazon.com> wrote:
I’m working with a dataset where each row is stored as a single-line flat JSON
object. I want to leverage Spark SQL to run relational queries on this data.
Many of the object keys in this dataset have dots in them, e.g.:
{ "key.number1": "value1", "key.number2": "value2" … }
I can successfully
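One workaround, sketched assuming a Spark build with SQLContext.jsonRDD (JSON support landed around Spark 1.1), an existing SparkContext sc, and a made-up input path: rewrite the dotted keys before the schema is inferred, since the SQL parser reads a dot as struct-field access.

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Flat, single-line JSON objects as shown above. Replace the dot in
    // each quoted key with an underscore; crude, and it only handles one
    // dot per key, but it leaves the values untouched.
    val raw = sc.textFile("dataset.json")
    val cleaned = raw.map(_.replaceAll("\"([^\"]*)\\.([^\"]*)\"\\s*:", "\"$1_$2\":"))

    val rows = sqlContext.jsonRDD(cleaned)
    rows.registerTempTable("rows")
    sqlContext.sql("SELECT key_number1 FROM rows").collect()

With a HiveContext, backtick-quoting the column name (SELECT `key.number1` FROM rows) may also work, since HiveQL accepts backticked identifiers; the plain SqlParser mentioned upthread likely does not at this point.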