Re: Asking for reviewing PRs regarding structured streaming

2018-07-25 Thread Jungtaek Lim
I'd like to bump this again, since only one of the 6 pull requests has been merged (5 remaining), and the others haven't received any (non-code-style) review from committers. https://github.com/apache/spark/pulls/HeartSaVioR All pull requests are related to Structured Streaming, and most of them have already been reviewed by co

Re: Asking for reviewing PRs regarding structured streaming

2018-07-15 Thread Jungtaek Lim
Bump. I got a couple of review comments from contributors, including a soft LGTM, but I still haven't gotten any (non-code-style) review from committers, so technically there has been no progress toward merging. I'm planning to work on adding a new feature as well, but it's not easy for me to concentrate on some

Re: Asking for reviewing PRs regarding structured streaming

2018-07-12 Thread Jungtaek Lim
I recently added more test results to SPARK-24763 [1], which show that the proposal reduces state size depending on the ratio of key size to value size, with no performance hit and sometimes even a slight boost. Please refer to the latest comment in the JIRA issue [2] to see the numbers from the perf. tests

Re: Asking for reviewing PRs regarding structured streaming

2018-07-09 Thread Jungtaek Lim
Now I'm adding one more issue (SPARK-24763 [1]), which proposes a new option to optimize state size in streaming aggregation without hurting performance. The idea is to remove the key fields' data from the value, since that data is duplicated between the key and the value in each state row. This requires additio
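To make the key/value deduplication idea concrete, here is a minimal sketch that models a state row as a plain Python dict. It is not Spark's StateStore API: the helper names `split_state_row` and `restore_state_row` and the example field names are hypothetical, chosen only to illustrate storing the value without the key columns and merging the two back together on read.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a state row is a mapping from column name to value.
Row = Dict[str, object]


def split_state_row(row: Row, key_fields: List[str]) -> Tuple[Row, Row]:
    """Split a full aggregation row into (key, value), dropping key columns from the value."""
    key = {f: row[f] for f in key_fields}
    # Only non-key columns are kept in the value, so key data is stored once.
    value = {f: v for f, v in row.items() if f not in key_fields}
    return key, value


def restore_state_row(key: Row, value: Row, field_order: List[str]) -> Row:
    """Rebuild the full row on read by merging key and value in the original column order."""
    merged = {**key, **value}
    return {f: merged[f] for f in field_order}


if __name__ == "__main__":
    # Hypothetical aggregation row grouped by (user, window).
    row = {"user": "a", "window": 10, "count": 3, "sum": 7}
    key, value = split_state_row(row, ["user", "window"])
    print(key, value)
    print(restore_state_row(key, value, list(row)) == row)
```

The trade-off in this sketch is that each read has to merge key and value again, in exchange for not storing the key columns twice; how much space that saves depends on how large the key columns are relative to the rest of the row.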

Re: Asking for reviewing PRs regarding structured streaming

2018-07-05 Thread Jungtaek Lim
Ted Yu suggested posting the improved numbers to this thread, and I think it's a good idea, so I'm posting them here too. I also think explaining the rationale behind my issues would help in understanding why I'm submitting a couple of patches, so I'll explain that first. (Sorry for the wall of text.) tl;dr. SP

Re: Asking for reviewing PRs regarding structured streaming

2018-07-05 Thread Jungtaek Lim
Bump. I have been having a hard time making additional PRs, since some of them rely on unmerged PRs, so I'm spending extra time decoupling them where possible. https://github.com/apache/spark/pulls/HeartSaVioR There are 5 pending PRs so far, and I may add more sooner or later. Thanks, Jung

Re: Asking for reviewing PRs regarding structured streaming

2018-06-30 Thread Jungtaek Lim
A kind reminder, since around 2 weeks have passed. I've added more PRs during those 2 weeks and am planning to add even more. On Tue, Jun 19, 2018 at 6:34 PM, Jungtaek Lim wrote: > Hi Spark devs, > > I have a couple of pull requests for structured streaming which are getting > older and fading out from earlier pages in PR

Asking for reviewing PRs regarding structured streaming

2018-06-19 Thread Jungtaek Lim
Hi Spark devs, I have a couple of pull requests for structured streaming which are getting older and fading out of the first pages of the PR list. https://github.com/apache/spark/pull/21469 https://github.com/apache/spark/pull/21357 https://github.com/apache/spark/pull/21222 Two of them are in a kind