I think Marton has some good points here.
1) Is KeyedDataStream a better name if this is only a renaming?
2) The discretize semantics are indeed unclear. Are we operating on a single
dataset or a sequence of datasets? If the latter, why not call it something else
(dstream). How are joins and other binary ope
Concerning your comments:
1) In the new design, there is no grouping without windowing. The
KeyedDataStream subsumes the grouping and key-ing for partitioned state.
The keyBy() + window() makes a parallel grouped window;
keyBy() alone allows access to partitioned state.
My thought was
If we only want to have either keyBy or groupBy, why not keep groupBy? That
would be more consistent with the batch api.
On Tue, Jul 14, 2015 at 10:35 AM Stephan Ewen wrote:
> Concerning your comments:
>
> 1) In the new design, there is no grouping without windowing. The
> KeyedDataStream subsume
keyBy() does not do any grouping. Grouping in streams is not defined
without windows.
On Tue, Jul 14, 2015 at 10:48 AM, Gyula Fóra wrote:
> If we only want to have either keyBy or groupBy, why not keep groupBy? That
> would be more consistent with the batch api.
> On Tue, Jul 14, 2015 at 10:35 A
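To make the distinction discussed above concrete, here is a minimal sketch in plain Python. It is not Flink code and does not reflect the actual DataStream API; the function names `keyed_running_count` and `keyed_tumbling_window_groups`, the `(key, timestamp)` event shape, and the window size are all assumptions made for illustration. The first function only keeps per-key state and never forms a group. The second shows that a finite group exists only once a window bounds it.

```python
from collections import defaultdict


def keyed_running_count(events):
    """keyBy-style processing: update per-key state for every element.

    No group is ever built; each element yields the current count for its key.
    """
    counts = defaultdict(int)
    for key, _timestamp in events:
        counts[key] += 1
        yield key, counts[key]


def keyed_tumbling_window_groups(events, size):
    """keyBy + window style: collect elements per (key, window start).

    The input here is a finite list, so all windows are returned at once;
    a real stream would emit each window when it closes.
    """
    groups = defaultdict(list)
    for key, timestamp in events:
        window_start = timestamp - timestamp % size
        groups[(key, window_start)].append(timestamp)
    return dict(groups)


# Hypothetical sample events: (key, timestamp)
events = [("a", 1), ("b", 2), ("a", 4), ("a", 11), ("b", 12)]

running = list(keyed_running_count(events))
windows = keyed_tumbling_window_groups(events, size=10)
```

On an unbounded input the first function can keep emitting forever, while a "group" for key `a` would never be complete without the window boundary.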
It is now a bit different from the batch API, because streaming semantics
are a bit different ;-)
One good thing is that we can make things better that were sub-optimal in
the Batch API.
On Tue, Jul 14, 2015 at 10:55 AM, Stephan Ewen wrote:
> keyBy() does not do any grouping. Grouping in stream
I agree, the groupBy in the batch API is misleading, since a
ds.groupBy().reduce() does not really build any groups, it is really a
ds.keyBy().reduceByKey(). In the streaming API we can still fix this, IMHO.
On Tue, 14 Jul 2015 at 10:56 Stephan Ewen wrote:
> It is not a bit different than the b
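As a rough illustration of the point about groupBy().reduce(), the following plain-Python sketch contrasts the two readings. It is not the batch or streaming API; `group_then_reduce`, `reduce_by_key`, and the sample records are hypothetical names and data chosen for the example. Both functions return the same result, but only the first one ever materializes a group.

```python
from collections import defaultdict
from functools import reduce


def group_then_reduce(records, key_fn, reduce_fn):
    """Build every group in full, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record)
    return {key: reduce(reduce_fn, values) for key, values in groups.items()}


def reduce_by_key(records, key_fn, reduce_fn):
    """Keep a single running value per key; no group is ever built."""
    state = {}
    for record in records:
        key = key_fn(record)
        state[key] = reduce_fn(state[key], record) if key in state else record
    return state


# Hypothetical sample data: (key, value) pairs
records = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]


def key_of(record):
    return record[0]


def add_values(left, right):
    return (left[0], left[1] + right[1])


grouped = group_then_reduce(records, key_of, add_values)
keyed = reduce_by_key(records, key_of, add_values)
```

The second form needs memory proportional to the number of keys rather than the number of records, which is why the name reduceByKey describes the behavior more accurately.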
I see your point, reduceByKey is much clearer.
The question is whether we want to introduce this inconsistency across the
two APIs or stick with what we have.
On Tue, Jul 14, 2015 at 10:57 AM Aljoscha Krettek
wrote:
> I agree, the groupBy, in the batch API is misleading, since a
> ds.groupBy().
I think the thought was to explicitly not have the same terminology as the
batch API, so as not to confuse people.
But this is a minor naming issue IMO.
On Tue, Jul 14, 2015 at 12:40 PM, Gyula Fóra wrote:
> I see your point, reduceByKey is much clearer.
>
> The question is whether we want to introduce
There is no inconsistency between the Batch and Streaming API. They have
different semantics - the batch API is implicitly always windowed.
There is a naming difference between the two APIs.
There is a strong inconsistency within the Streaming API right now.
Grouping and aggregating without windo
Ufuk Celebi created FLINK-2354:
--
Summary: Recover running jobs on JobManager failure
Key: FLINK-2354
URL: https://issues.apache.org/jira/browse/FLINK-2354
Project: Flink
Issue Type: Sub-task
William Saar created FLINK-2355:
---
Summary: Job hanging in collector, waiting for request buffer
Key: FLINK-2355
URL: https://issues.apache.org/jira/browse/FLINK-2355
Project: Flink
Issue Type:
Ufuk Celebi created FLINK-2356:
--
Summary: Resource leak in checkpoint coordinator
Key: FLINK-2356
URL: https://issues.apache.org/jira/browse/FLINK-2356
Project: Flink
Issue Type: Bug
C
Stephan Ewen created FLINK-2357:
---
Summary: New JobManager Runtime Web Frontend
Key: FLINK-2357
URL: https://issues.apache.org/jira/browse/FLINK-2357
Project: Flink
Issue Type: New Feature
Stephan Ewen created FLINK-2358:
---
Summary: Add Netty-HTTP based server and server handlers
Key: FLINK-2358
URL: https://issues.apache.org/jira/browse/FLINK-2358
Project: Flink
Issue Type: Sub-t
Gabor Gevay created FLINK-2359:
--
Summary: Add factory methods to the Java TupleX types
Key: FLINK-2359
URL: https://issues.apache.org/jira/browse/FLINK-2359
Project: Flink
Issue Type: Improvemen
Andra Lungu created FLINK-2360:
--
Summary: EOFException
Key: FLINK-2360
URL: https://issues.apache.org/jira/browse/FLINK-2360
Project: Flink
Issue Type: Bug
Components: Local Runtime
Andra Lungu created FLINK-2361:
--
Summary: flatMap + distinct gives erroneous results for big data sets
Key: FLINK-2361
URL: https://issues.apache.org/jira/browse/FLINK-2361
Project: Flink
Issue Ty
Fabian Hueske created FLINK-2362:
Summary: distinct is missing in DataSet API documentation
Key: FLINK-2362
URL: https://issues.apache.org/jira/browse/FLINK-2362
Project: Flink
Issue Type: Bu
Hi,
Sorry for the brief hiatus. I was preparing for my GRE exam, but I am back.
I am starting to build Flink and have one question: is a single-node
cluster configuration of Hadoop enough? I assume Hadoop is needed since it
is listed on the build page.
On Sat, Jun 27, 2015 at 8:02 PM, Chiwan
Hi,
Hadoop is not a necessity for running Flink, but rather an option. Try the
steps of the setup guide. [1]
If you really need HDFS, though, to get the best IO performance I would
suggest having Hadoop on all your machines running Flink.
[1]
https://ci.apache.org/projects/flink/flink-docs-release-0
Ok, thanks for the clarification. Let us try to document it in a way that
reflects those thoughts, then. Discretization will not happen upfront, so
we can wait with that.
On Tue, Jul 14, 2015 at 12:49 PM, Stephan Ewen wrote:
> There is no inconsistency between the Batch and Streaming API. They h