My apologies, I thought we had a consensus already.
Regards
JB
On 12/04/2017 11:22 PM, Eugene Kirpichov wrote:
Thanks JB for sending the detailed notes about new stuff in 2.2.0! A lot of
exciting things indeed.
Regarding Java 8: I thought our consensus was to have the release notes say that
we're *considering* going Java8-only, and use that to get more opinions from the
user community…
I believe you can provide ordering if you decide to put any unconsumed
records into state. Each time a record arrives, read the state and check
whether it is the next corresponding id. If so, emit the new sum; otherwise
push it back into state until you get the missing ids, allowing you to
backfill all the prior…
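A minimal sketch of that buffering idea as a stateful DoFn, assuming input
keyed by parent_id with (id, amount) values and ids running 1, 2, 3, ...
(all names below are mine, not from this thread):
```
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.beam.sdk.coders.DoubleCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Input: KV<parentId, KV<id, amount>>; output: the running sum per parent,
// emitted strictly in id order. Out-of-order records are pushed back into
// state until the missing ids arrive.
public class OrderedSumFn
    extends DoFn<KV<String, KV<Integer, Double>>, KV<String, Double>> {

  @StateId("nextId")
  private final StateSpec<ValueState<Integer>> nextIdSpec =
      StateSpecs.value(VarIntCoder.of());

  @StateId("sum")
  private final StateSpec<ValueState<Double>> sumSpec =
      StateSpecs.value(DoubleCoder.of());

  @StateId("buffer")
  private final StateSpec<BagState<KV<Integer, Double>>> bufferSpec =
      StateSpecs.bag(KvCoder.of(VarIntCoder.of(), DoubleCoder.of()));

  @ProcessElement
  public void process(
      ProcessContext c,
      @StateId("nextId") ValueState<Integer> nextId,
      @StateId("sum") ValueState<Double> sum,
      @StateId("buffer") BagState<KV<Integer, Double>> buffer) {
    Integer storedNext = nextId.read();
    int expected = storedNext == null ? 1 : storedNext;
    Double storedSum = sum.read();
    double total = storedSum == null ? 0.0 : storedSum;

    // Gather the new record plus everything previously pushed back.
    List<KV<Integer, Double>> pending = new ArrayList<>();
    pending.add(c.element().getValue());
    for (KV<Integer, Double> kv : buffer.read()) {
      pending.add(kv);
    }
    buffer.clear();

    // Drain in id order; whatever is still out of order goes back into state.
    Collections.sort(pending, Comparator.comparing(KV::getKey));
    for (KV<Integer, Double> kv : pending) {
      if (kv.getKey() == expected) {
        total += kv.getValue();
        expected++;
        c.output(KV.of(c.element().getKey(), total));
      } else if (kv.getKey() > expected) {
        buffer.add(kv); // still missing an earlier id; push back
      } // ids < expected are duplicates here and are dropped
    }
    nextId.write(expected);
    sum.write(total);
  }
}
```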
This would be absolutely great! It seems somewhat similar to the changes
that were made to the BigQuery sink to support WriteResult (
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.java
).
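For reference, a rough sketch of consuming that WriteResult today
(getFailedInserts() is the accessor I have in mind; double-check it against
the BigQueryIO version you are on):
```
// Inside an existing pipeline, with the usual Beam and BigQuery imports;
// rows is a PCollection<TableRow>.
WriteResult result =
    rows.apply(
        "WriteToBQ",
        BigQueryIO.writeTableRows().to("project:dataset.table"));

// "Do something after the data has been written" - here, react to failures.
result
    .getFailedInserts()
    .apply(
        "LogFailedInserts",
        ParDo.of(
            new DoFn<TableRow, Void>() {
              @ProcessElement
              public void process(ProcessContext c) {
                System.err.println("Failed insert: " + c.element());
              }
            }));
```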
I find it helpful…
On Mon, Dec 4, 2017 at 3:22 PM, Lukasz Cwik wrote:
> Since processing can happen out of order, for example if the input was:
> ```
> {"id": "2", parent_id: "a", "timestamp": 2, "amount": 3}
> {"id": "1", parent_id: "a", "timestamp": 1. "amount": 1}
> {"id": "1", parent_id: "a", "timestamp": 3, "a
It makes sense to consider how this maps onto existing kinds of sinks.
E.g.:
- Something that just makes an RPC per record, e.g. MqttIO.write(): that
will emit 1 result per bundle (either a bogus value or number of records
written) that will be Combine'd into 1 result per pane of input. A user can
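To make that first case concrete, here is a rough sketch (my names, not
actual MqttIO code) of a write that emits one count per bundle and then
combines the counts into one result per pane; it assumes a globally
windowed input:
```
// A "write" DoFn that makes an RPC per record and emits the number of
// records written when the bundle finishes.
class RpcWriteFn extends DoFn<String, Long> {
  private transient long writtenInBundle;

  @StartBundle
  public void startBundle() {
    writtenInBundle = 0;
  }

  @ProcessElement
  public void process(ProcessContext c) {
    // sendRpc(c.element());  // hypothetical per-record RPC
    writtenInBundle++;
  }

  @FinishBundle
  public void finishBundle(FinishBundleContext c) {
    // One element per bundle; assumes the global window.
    c.output(writtenInBundle, BoundedWindow.TIMESTAMP_MIN_VALUE, GlobalWindow.INSTANCE);
  }
}

// One result per pane of the input: combine the per-bundle counts.
// records is the PCollection<String> being written.
PCollection<Long> writeResult =
    records
        .apply("Write", ParDo.of(new RpcWriteFn()))
        .apply("TotalWritten", Sum.longsGlobally());
```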
I agree that the proper API for enabling the use case "do something after
the data has been written" is to return a PCollection of objects where each
object represents the result of writing some identifiable subset of the
data. Then one can apply a ParDo to this PCollection, in order to "do
something after the data has been written".
Since processing can happen out of order, for example if the input was:
```
{"id": "2", parent_id: "a", "timestamp": 2, "amount": 3}
{"id": "1", parent_id: "a", "timestamp": 1. "amount": 1}
{"id": "1", parent_id: "a", "timestamp": 3, "amount": 2}
```
would the output be 3 and then 5 or would you st
I also believe we were still in the investigatory phase for dropping
support for Java 7.
On Mon, Dec 4, 2017 at 2:22 PM, Eugene Kirpichov
wrote:
> Thanks JB for sending the detailed notes about new stuff in 2.2.0! A lot
> of exciting things indeed.
>
> Regarding Java 8: I thought our consensus was to have the release notes say
> that we're *considering* going Java8-only, and use that to get more opinions
> from the user community…
Thanks JB for sending the detailed notes about new stuff in 2.2.0! A lot of
exciting things indeed.
Regarding Java 8: I thought our consensus was to have the release notes say
that we're *considering* going Java8-only, and use that to get more
opinions from the user community - but I can't find th
I'm super excited about this release! Great work everyone involved!
On Mon, Dec 4, 2017 at 10:58 AM, Jean-Baptiste Onofré
wrote:
> Just an important note that we forgot to mention.
>
> !! The 2.2.0 release will be the last one supporting Spark 1.x and Java 7
> !!
>
> Starting from Beam 2.3.0, the Spark runner will work only with Spark 2.x and
> we will focus only on Java 8.
Hi all!
First of all, great work on the 2.2.0 release! Really excited to start using
it.
I have a problem with how I should construct a pipeline that emits a sum of
the latest values; I hope someone might have ideas on how to solve it.
Here is what I have:
I have a stateful stream of events…
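Not knowing the exact shape of the data, here is one hedged sketch of "sum
of latest values": take the latest amount per id (Latest.perKey keys off
element timestamps), re-key by parent, and sum. Event and its fields are
stand-ins of mine, not from this thread:
```
// Assumes PCollection<Event> events with fields id, parentId, amount, and
// timestamps already attached to the elements. Windowing/triggering decides
// when (and how often) the sums are emitted.
PCollection<KV<String, Double>> sums =
    events
        .apply("KeyById",
            WithKeys.of((Event e) -> e.id).withKeyType(TypeDescriptors.strings()))
        .apply("LatestPerId", Latest.perKey())
        .apply("KeyByParent",
            MapElements.via(
                new SimpleFunction<KV<String, Event>, KV<String, Double>>() {
                  @Override
                  public KV<String, Double> apply(KV<String, Event> kv) {
                    return KV.of(kv.getValue().parentId, kv.getValue().amount);
                  }
                }))
        .apply("SumPerParent", Sum.doublesPerKey());
```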
+1
At the very least an empty PCollection could be produced with no
promises about its contents but the ability to be followed (e.g. as a
side input), which is forward compatible with whatever actual metadata
one may decide to produce in the future.
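A sketch of what "followable" could look like from the user's side, assuming
the write returns some PCollection<Void> signal (writeSignal and afterWrite
are names of mine):
```
// Use the (possibly empty) result purely to create a causal dependency:
// the ParDo below cannot run for a window until the side input for that
// window is ready, i.e. until the write's output has been produced.
final PCollectionView<Iterable<Void>> writeDone =
    writeSignal.apply("AsSignalView", View.<Void>asIterable());

afterWrite.apply(
    "RunAfterWrite",
    ParDo.of(
            new DoFn<String, String>() {
              @ProcessElement
              public void process(ProcessContext c) {
                c.sideInput(writeDone); // touch it just to establish the dependency
                c.output(c.element());
              }
            })
        .withSideInputs(writeDone));
```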
On Mon, Dec 4, 2017 at 11:06 AM, Kenneth Knowles wrote:
Hi,
I second Luke here: you have to use Spark 1.x or use the PR supporting Spark
2.x.
Regards
JB
On 12/04/2017 08:14 PM, Lukasz Cwik wrote:
It seems like you're trying to use Spark 2.1.0. Apache Beam currently relies on
users using Spark 1.6.3. There is an open pull request[1] to migrate to Spark 2.2.0.
It seems like you're trying to use Spark 2.1.0. Apache Beam currently relies
on users using Spark 1.6.3. There is an open pull request[1] to migrate to
Spark 2.2.0.
1: https://github.com/apache/beam/pull/4208/
On Mon, Dec 4, 2017 at 10:58 AM, Opitz, Daniel A
wrote:
> We are trying to submit a Spark job through YARN with the following command…
+dev@
I am in complete agreement with Luke. Data dependencies are easy to
understand and a good way for an IO to communicate and establish causal
dependencies. Converting an IO from PDone to real output may spur further
useful thoughts based on the design decisions about what sort of output is
most…
We are trying to submit a Spark job through YARN with the following command:
spark-submit --conf spark.yarn.stagingDir=/path/to/stage --verbose --class
com.my.class --jars /path/to/jar1,/path/to/jar2 /path/to/main/jar/application.jar
The application is being populated in the YARN scheduler; however…
I think all sinks actually do have valuable information to output which can
be used after a write (file names, transaction/commit/row ids, table names,
...). In addition to this metadata, having a PCollection of all successful
writes and all failed writes is useful for users so they can chain an
action…
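As a hypothetical illustration (this transform and its internals do not
exist in Beam), a file-style sink that returns the written filenames instead
of PDone, so users can chain an action on each successful write:
```
// Hypothetical sink: writes each element somewhere and outputs the filename
// it wrote, instead of returning PDone.
class WriteWithMetadata extends PTransform<PCollection<String>, PCollection<String>> {
  @Override
  public PCollection<String> expand(PCollection<String> input) {
    return input.apply(
        "WriteAndEmitFilename",
        ParDo.of(
            new DoFn<String, String>() {
              @ProcessElement
              public void process(ProcessContext c) {
                // Stand-in for the real write; a real sink would batch per bundle.
                String filename = "/tmp/out-" + java.util.UUID.randomUUID();
                c.output(filename);
              }
            }));
  }
}

// Chaining an action on each successfully written file.
records
    .apply("Write", new WriteWithMetadata())
    .apply(
        "NotifyAfterWrite",
        ParDo.of(
            new DoFn<String, Void>() {
              @ProcessElement
              public void process(ProcessContext c) {
                // e.g. publish the filename, kick off a downstream load, ...
              }
            }));
```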
Just an important note that we forgot to mention.
!! The 2.2.0 release will be the last one supporting Spark 1.x and Java 7 !!
Starting from Beam 2.3.0, the Spark runner will work only with Spark 2.x and we
will focus only on Java 8.
Regards
JB
On 12/04/2017 10:15 AM, Jean-Baptiste Onofré wrote:
Thanks Reuven !
I would like to emphasize some highlights in the 2.2.0 release:
- New IOs have been introduced:
* TikaIO leveraging Apache Tika, allowing you to deal with a lot of different
data formats
* RedisIO to read and write key/value pairs from a Redis server. This IO will
soon be extended…
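For the curious, a hedged sketch of what using RedisIO looks like (builder
method names from memory; please check the RedisIO javadoc for the exact
API):
```
// Read key/value pairs matching a pattern, then write them back elsewhere.
PCollection<KV<String, String>> pairs =
    pipeline.apply(
        "ReadFromRedis",
        RedisIO.read().withEndpoint("localhost", 6379).withKeyPattern("beam:*"));

pairs.apply("WriteToRedis", RedisIO.write().withEndpoint("localhost", 6380));
```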