Hello there,
I have a quick question for the following case:
Situation:
A Spark consumer is able to process 5 batches in 10 sec (where the batch
interval is zero by default - correct me if this is wrong). The window size
is 10 sec (non-overlapping sliding).
There are some fluctuations in the inc
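For reference, a minimal sketch of the setup described above, assuming the DStream API and a hypothetical socket source; note that in this API the batch interval is passed explicitly to StreamingContext rather than defaulting to zero, and the 2-second interval below is just an assumption:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("windowed-consumer")
    // Batch interval must be specified; 2 seconds is an illustrative choice.
    val ssc = new StreamingContext(conf, Seconds(2))

    // Hypothetical source for illustration only.
    val lines = ssc.socketTextStream("localhost", 9999)

    // 10-second window with a 10-second slide, i.e. non-overlapping windows.
    val windowedCounts = lines.window(Seconds(10), Seconds(10)).count()
    windowedCounts.print()

    ssc.start()
    ssc.awaitTermination()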
@holdenK et al ping on next steps.
Sent from my iPhone
On Jul 12, 2018, at 3:47 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
Thanks so much, Maximiliano, for responding; I didn't want this discussion to
disappear into the wilderness of dev emails :). Here's what I would like to see
or
I'd agree it might make sense to bundle this into an API. We'd have to
think about whether it's a common enough use case to justify the API
complexity.
It might be worth exploring decoupling state and partitions, but I wouldn't
want to start making decisions based on it without a clearer design
pi
Coalesce might work.
Say "spark.sql.shuffle.partitions" = 200; then
"input.readStream.map.filter.groupByKey(..).coalesce(2)" would still
create 200 instances for state but execute just 2 tasks.
However, I think further groupByKey operations downstream would need a similar
coalesce.
And thi
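A minimal sketch of that idea; the source, host, and port are placeholders, and the 200-state-instances-vs-2-tasks behaviour is the claim from the message above, not something this snippet verifies:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("coalesce-after-state").getOrCreate()
    import spark.implicits._

    // 200 shuffle partitions -> 200 state store instances for the aggregation.
    spark.conf.set("spark.sql.shuffle.partitions", "200")

    val input = spark.readStream
      .format("socket")                 // hypothetical source
      .option("host", "localhost")
      .option("port", "9999")
      .load()
      .as[String]

    val counts = input
      .map(_.trim)
      .filter(_.nonEmpty)
      .groupByKey(v => v)               // stateful aggregation over 200 partitions
      .count()
      .coalesce(2)                      // downstream execution collapsed to 2 tasks

    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .start()
    query.awaitTermination()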
I'd like to propose adding a plugin API for Executors, primarily for
instrumentation and debugging
(https://issues.apache.org/jira/browse/SPARK-24918). The changes are small,
but as it's adding a new API, it might be SPIP-worthy. I mentioned it as
well in a recent email I sent about memory monitoring
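For context, a rough sketch of the kind of plugin this could enable; the ExecutorPlugin name and the init/shutdown hooks are my reading of the SPARK-24918 proposal, so treat the exact interface and the spark.executor.plugins config as assumptions and check the JIRA for the real API:

    import org.apache.spark.ExecutorPlugin

    // Illustrative only: logs executor heap usage every 10 seconds.
    class MemoryMonitorPlugin extends ExecutorPlugin {
      @volatile private var running = true

      private val monitor = new Thread(new Runnable {
        override def run(): Unit = {
          while (running) {
            val rt = Runtime.getRuntime
            val usedMb = (rt.totalMemory - rt.freeMemory) / (1024 * 1024)
            println(s"executor heap used: $usedMb MB")
            Thread.sleep(10000)
          }
        }
      }, "executor-memory-monitor")

      override def init(): Unit = {
        monitor.setDaemon(true)
        monitor.start()
      }

      override def shutdown(): Unit = {
        running = false
      }
    }

    // Hypothetical usage: --conf spark.executor.plugins=com.example.MemoryMonitorPlugin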
A coalesced RDD will definitely maintain any within-partition invariants
that the original RDD maintained. It pretty much just runs its input
partitions sequentially.
There'd still be some DataFrame API work needed to get the coalesce
operation where you want it to be, but this is much simpler tha
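A tiny illustration of that point with made-up data: coalesce without shuffle concatenates whole input partitions, so an ordering that held within each original partition still holds afterwards.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("coalesce-invariants").getOrCreate()

    // 4 partitions, each sorted internally (the within-partition invariant).
    val rdd = spark.sparkContext
      .parallelize(1 to 12, numSlices = 4)
      .mapPartitions(it => it.toSeq.sorted.iterator)

    // No shuffle: each output partition is its input partitions run back to back,
    // so every original partition's contents stay contiguous and in order.
    val coalesced = rdd.coalesce(2, shuffle = false)
    coalesced.glom().collect().foreach(p => println(p.mkString(", ")))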
I'm afraid I don't know the details of coalesce(), but from some resources I
found about coalesce, it looks like it helps reduce the number of actual
partitions.
For streaming aggregation, state for all partitions (200 by default) must
be initialized and committed even if it is unchanged. Otherwise an error
occ
Scheduling multiple partitions in the same task is basically what
coalesce() does. Is there a reason that doesn't work here?
On Fri, Aug 3, 2018 at 5:55 AM, Jungtaek Lim wrote:
> Here's a link for Google docs (anyone can comment):
> https://docs.google.com/document/d/1DEOW3WQcPUq0YFgazkZx6Ei6EOd
Here's a link to the Google doc (anyone can comment):
https://docs.google.com/document/d/1DEOW3WQcPUq0YFgazkZx6Ei6EOdj_3pXEsyq4LGpyNs/edit?usp=sharing
Please note that I just copied the content into the Google doc, so someone
could point out a lack of details. I would like to start with an explanation of
Hi everyone,
I am building a framework on top of Spark in which users specify SQL
queries and we analyze them in order to extract some metadata. Moreover,
SQL queries can be composed, meaning that if a user writes a query X to
build a dataset, another user can use X in his own query to refer to it
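One hedged illustration of the composition part; the table and view names are made up, and this only shows temp views as the composition mechanism plus the schema and plans as examples of extractable metadata, not the framework itself:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("composed-queries").getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for a user-provided dataset.
    Seq(("u1", 150.0), ("u2", 90.0), ("u1", 300.0))
      .toDF("user_id", "amount")
      .createOrReplaceTempView("sales")

    // User A's query X, registered under a name other users can refer to.
    val x = spark.sql("SELECT user_id, amount FROM sales WHERE amount > 100")
    x.createOrReplaceTempView("big_sales")

    // User B's query composed on top of X.
    val y = spark.sql("SELECT user_id, SUM(amount) AS total FROM big_sales GROUP BY user_id")

    // Examples of metadata that can be extracted by analyzing the composed query.
    println(y.schema.treeString)    // output columns and their types
    y.explain(true)                 // parsed/analyzed/optimized plans for deeper analysis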
Hi All,
I have a scenario where my streaming application is running, and in between
the application has restarted. So basically I want to start from the most recent
data and go back to
old data. Is there any way in Spark we can do this, or are we planning to
provide such kind of flexibility in the future
Can you share this in a Google doc to make the discussions easier?
Thanks for coming up with ideas to improve upon the current restrictions
with the SS state store.
If I understood correctly, the plan is to introduce a logical partitioning
scheme for state storage (based on keys) independent