Slow activation using Spark Streaming's new receiver scheduling mechanism

2015-10-21 Thread Budde, Adam
Hi all, My team uses Spark Streaming to implement the batch processing component of a lambda architecture with 5 min intervals. We process roughly 15 TB/day using three discrete Spark clusters and about 250 receivers per cluster. We've been having some issues migrating our platform from Spark 1

Stop streaming context gracefully when SIGTERM is passed

2014-12-15 Thread Budde, Adam
Hi all, We are using Spark Streaming to ETL a large volume of time series datasets. In our current design, each dataset we ETL will have a corresponding Spark Streaming context + process running on our cluster. Each of these processes will be passed configuration options specifying the data source

Re: Inconsistent Spark SQL behavior when column names contain dots

2014-07-31 Thread Budde, Adam
l parser I'd check out SqlParser.scala. Though it is likely we will abandon that code in the next release for something more complete. On Thu, Jul 31, 2014 at 11:16 AM, Budde, Adam <bu...@amazon.com> wrote: I’m working with a dataset where each row is stored as

Inconsistent Spark SQL behavior when column names contain dots

2014-07-31 Thread Budde, Adam
I’m working with a dataset where each row is stored as a single-line flat JSON object. I want to leverage Spark SQL to run relational queries on this data. Many of the object keys in this dataset have dots in them, e.g.: { "key.number1": "value1", "key.number2": "value2" … } I can successfully