Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-22 Thread Cody Innowhere
Hi JB, Glad to hear that. Still, I'm thinking about adding support for Meters & Histograms (maybe extending Distribution). As the discussion mentions, the problem is that a Meter/Histogram cannot be updated directly in the current way because its internal data decays over time. Do you plan to refactor curre

Re: Looking for a good "write-here-if-fails" pattern

2017-06-22 Thread Kenneth Knowles
Using provenance to explain bad data in a general manner requires deep support from your data processing engine and is still a research topic (for example, https://blog.acolyer.org/2017/02/01/explaining-outputs-in-modern-data-analytics/) so I wouldn't go down that path. I expect that putting in the

Re: SQL in Stream Computing: MERGE or INSERT?

2017-06-22 Thread James
Hi Tyler, I think UPSERT is a good alternative: as concise as INSERT, with valid semantics. It's just that users seem to rarely use UPSERT either (perhaps because there's no UPDATE in batch big data processing). By *"INSERT will behave differently in batch & stream processing"* I mean, if we use the "IN

Re: SQL in Stream Computing: MERGE or INSERT?

2017-06-22 Thread James
Hi Jesse, Yeah, I know the insert...select grammar. In my scenario, each of the value columns is calculated separately (possibly from different data sources), so insert...select might not be sufficient. On Thu, Jun 22, 2017 at 10:35 PM, Jesse Anderson wrote: > If I'm understanding correctly, Hive does that

Re: Fwd: [Report] Eagle - June 2017

2017-06-22 Thread 上海_中台研发部_数据平台部_基础数据部_唐觊隽
I am working on an alert engine based on Apache Beam. I can volunteer to help. -Original Message- From: Jyotirmoy Sundi [mailto:sundi...@gmail.com] Sent: June 22, 2017 10:23 To: JingsongLee; dev@beam.apache.org Subject: Re: Fwd: [Report] Eagle - June 2017 Would like to help have worked on we beam apps On Wed, Jun

Re: Bundling multiple TestPipeline tests into one pipeline

2017-06-22 Thread Eugene Kirpichov
Another advantage of the "custom runner" approach is that we can convert existing ValidatesRunner test classes one by one, switching them from RunWith(Junit4.class) to RunWith(BundledTestPipelines.class) or whatever (and making other necessary changes). On Thu, Jun 22, 2017 at 3:48 PM Kenneth Knowles

Re: Bundling multiple TestPipeline tests into one pipeline

2017-06-22 Thread Kenneth Knowles
This is a great idea! Your suggestion to do it via a JUnit test runner makes it very concrete. Kenn On Thu, Jun 22, 2017 at 3:27 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hi folks and especially runner developers, > > https://issues.apache.org/jira/browse/BEAM-2506 - quoting

Bundling multiple TestPipeline tests into one pipeline

2017-06-22 Thread Eugene Kirpichov
Hi folks and especially runner developers, https://issues.apache.org/jira/browse/BEAM-2506 - quoting from there: Currently ValidatesRunner test suites run 1 pipeline per unit test. That's a lot of small pipelines, and consumes a lot of resources especially in case of a pretty heavyweight runner l

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-06-22 Thread Ahmet Altay
+1 For Python, there are 2 hard blocking issues (and 2 nice to haves) all tagged as blocking 2.1.0 [1]. Ahmet [1] https://issues.apache.org/jira/browse/BEAM-2497?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202

Re: reading from s3 file in aws

2017-06-22 Thread Lukasz Cwik
Filed BEAM-2500 as a feature request. On Thu, Jun 22, 2017 at 9:00 AM, tarush grover wrote: > Hi All, > > Can we add a module s3-file-system in beam to directly support and have > integration with s3? > > Regards, > Tarush > > On Thu, 22 Jun 2017 at 9:21 PM, Lukasz Cwik > wrote: > > > You want

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-06-22 Thread Davor Bonaci
+1 On Thu, Jun 22, 2017 at 5:42 AM, Etienne Chauchot wrote: > Besides, there are some minor fixes/enhancements that are lacking in > Spark. > > For info, below are the ones raised by the Nexmark test suite: > > https://issues.apache.org/jira/browse/BEAM-2499 > > https://issues.apache.org/jira/browse/BEAM-211

Re: reading from s3 file in aws

2017-06-22 Thread tarush grover
Hi All, Can we add a module s3-file-system in Beam to directly support and have integration with S3? Regards, Tarush On Thu, 22 Jun 2017 at 9:21 PM, Lukasz Cwik wrote: > You want to depend on the Hadoop File System module[1] and configure > HadoopFileSystemOptions[2] with an S3 configuration[3]

Re: reading from s3 file in aws

2017-06-22 Thread Lukasz Cwik
You want to depend on the Hadoop File System module[1] and configure HadoopFileSystemOptions[2] with an S3 configuration[3]. 1: https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system 2: https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-06-22 Thread Etienne Chauchot
Besides, there are some minor fixes/enhancements that are lacking in Spark. For info, below are the ones raised by the Nexmark test suite: https://issues.apache.org/jira/browse/BEAM-2499 https://issues.apache.org/jira/browse/BEAM-2112 https://issues.apache.org/jira/browse/BEAM-2409 https://issues.apache

Re: SQL in Stream Computing: MERGE or INSERT?

2017-06-22 Thread Tyler Akidau
Calcite appears to have UPSERT support, can we just use that instead? Also, I don't understand your statement that "INSERT will behave differently in batch & stream processing". Can you explain further? -Tyler On Thu, Jun 22, 2017 at 7:35 AM J

Re: SQL in Stream Computing: MERGE or INSERT?

2017-06-22 Thread Jesse Anderson
If I'm understanding correctly, Hive does that with an insert into followed by a select statement that does the aggregation. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries On Thu, Jun 22, 2017 at 1:32 AM James wrote: >

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-22 Thread Jean-Baptiste Onofré
Hi, I agree with Aviem, and yes, I'm actually working on a generic metric sink. I created a Jira about that. I'm off today; I will send some details asap. Regards JB On Jun 22, 2017, at 15:16, Aviem Zur wrote: >Hi Cody, > >Some of the runners have their own metrics sink, for example Spark >r

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-22 Thread Aviem Zur
Hi Cody, Some of the runners have their own metrics sink; for example, the Spark runner uses Spark's metrics sink, which you can configure to send the metrics to backends such as Graphite. There have been ideas floating around for a Beam metrics sink extension which will allow users to send Beam metric

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-06-22 Thread Aviem Zur
+1 There are important bug fixes that need to be released. On Thu, Jun 22, 2017 at 11:42 AM Etienne Chauchot wrote: > +1 on Ismaël's words, but not a blocking point indeed, maybe more a nice > to have. > > > On 22/06/2017 at 06:59, Ismaël Mejía wrote: > > Thanks JB for keeping the time based rel

[DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-22 Thread Cody Innowhere
Hi guys, Currently metrics are implemented in runners/core as CounterCell, GaugeCell, DistributionCell, etc. If we want to send metrics to external systems via a metrics reporter, we would have to define another set of metrics, say, codahale metrics, and update codahale metrics periodically with beam

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-06-22 Thread Etienne Chauchot
+1 on Ismaël's words, but not a blocking point indeed, maybe more a nice to have. On 22/06/2017 at 06:59, Ismaël Mejía wrote: Thanks JB for keeping the time based release agenda. I really don't have any blocker but I would like to have the Hadoop version alignment PR merged before this one and

Re: Reduced Availability from 17.6. - 24.6

2017-06-22 Thread Etienne Chauchot
Enjoy, Aljoscha! On 17/06/2017 at 07:03, Aljoscha Krettek wrote: Hi, I’ll be on vacation next week, just in case anyone is wondering why I’m not responding. :-) Best, Aljoscha

SQL in Stream Computing: MERGE or INSERT?

2017-06-22 Thread James
Hi team, I am thinking about a problem related to SQL and stream computing, and want to hear your opinions. In stream computing, there is a typical case like this: *We want to calculate a big wide result table, which has one rowkey and ten value columns:* *create table result (* *rowkey varchar(127

Jenkins build became unstable: beam_Release_NightlySnapshot #455

2017-06-22 Thread Apache Jenkins Server
See