Github user asfgit closed the pull request at:
https://github.com/apache/flink/pull/1084
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enab
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-141411612
Looks good, will merge this!
---
Github user rmetzger commented on a diff in the pull request:
https://github.com/apache/flink/pull/1084#discussion_r39640862
--- Diff: .travis.yml ---
@@ -19,9 +19,9 @@ matrix:
- jdk: "oraclejdk7" # this will also deploy a uberjar to s3 at some
point
env: PROFIL
Github user aljoscha commented on a diff in the pull request:
https://github.com/apache/flink/pull/1084#discussion_r39640472
--- Diff: .travis.yml ---
@@ -19,9 +19,9 @@ matrix:
- jdk: "oraclejdk7" # this will also deploy a uberjar to s3 at some
point
env: PROFIL
Github user rmetzger commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-140732373
I think the pull request has grown quite a lot. We should merge it now and then improve it from there.
---
Github user rmetzger commented on a diff in the pull request:
https://github.com/apache/flink/pull/1084#discussion_r39624668
--- Diff: .travis.yml ---
@@ -19,9 +19,9 @@ matrix:
- jdk: "oraclejdk7" # this will also deploy a uberjar to s3 at some
point
env: PROFIL
Github user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-140523411
I fixed the bug and now I'm confident that it should work.
Also, I updated the travis build matrix because the sink does not work with
hadoop 2.0.0-alpha. We ha
Github user aminouvic commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-140325179
Yeah, you're right, better to have an operational version of the sink first.
Followup JIRA created: https://issues.apache.org/jira/browse/FLINK-2672
---
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-140048228
Yes, let's merge it before extending it.
@aminouvic If you want, can you create a JIRA with a description of that
behavior, as a followup task?
As a
Github user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-139995006
Yes, I would be very much in favor of adding it in the current incarnation
before adding more stuff.
It still has a bug right now with exactly-once, somehow some
Github user fhueske commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-139994388
Yes, we just had a discussion on the user mailing list about a partitioned
output format for batch processing. Definitely a good addition for both DataSet
and DataStream
Github user aminouvic commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-139911603
Since the RollingSink was a little inspired by Flume's HDFS Sink, it would
be nice to include other really valuable features that could make it more
complete.
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-138268930
Let's add this after we merge this, but it sounds quite valuable to me...
---
Github user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-138252252
It could be done by circumventing FileSystem and just using the regular
Java File I/O API to perform the truncate if we detect that the FileSystem
works on file "file:/
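The idea above, bypassing the Hadoop `FileSystem` abstraction and truncating directly with Java file I/O when the path is local, could be sketched roughly as follows. This is not code from the PR; the class and method names (`LocalTruncate`, `truncateIfLocal`) and the scheme check are assumptions for illustration:

```java
import java.io.IOException;
import java.net.URI;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class LocalTruncate {

    /**
     * Truncates the file behind the given URI to validLength, but only if the
     * URI points at the local file system ("file" scheme). Returns true when
     * the truncate was performed, false when the caller must fall back to
     * another strategy (e.g. an HDFS version without truncate support).
     */
    static boolean truncateIfLocal(URI uri, long validLength) throws IOException {
        if (!"file".equals(uri.getScheme())) {
            return false;
        }
        Path path = Paths.get(uri);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE)) {
            channel.truncate(validLength);
        }
        return true;
    }
}
```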
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-138237480
I mean can we get the "truncate()" behavior for local file systems even if
you build for Hadoop versions earlier than 2.7?
---
Github user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137962730
What do you mean? This is using the Hadoop FileSystem for everything. Is
your suggestion to abstract away the filesystems behind our own FileSystem
class again?
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137782086
How hard is it to support the truncate file code path also for regular Unix
file systems (rather than only HDFS 2.7+)?
The reason is that this way we would s
Github user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137764994
I (almost) completely reworked the sink. It is now called `RollingSink` and
the module is called `flink-connector-filesystem` to show that it works with
any Hadoop File
Github user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137396268
The question is: should we do exactly-once now, or put it in as it is (more
or less)?
---
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137393554
I think using truncate for exactly once is the way to go. To support users
with older HDFS versions, how about this:
1. We consider only valid what was writt
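The recovery logic being discussed here, consider only data written up to the last checkpoint as valid, then either truncate the file or record the valid length for readers, might look roughly like this. This is a hedged sketch, not the PR's code: it uses plain java.nio instead of the Hadoop FileSystem API for self-containment, and the names `ExactlyOnceRestore`, `restore`, and the `.valid-length` marker-file convention are assumptions:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ExactlyOnceRestore {

    /**
     * Restores a part file to the state of the last successful checkpoint.
     * If the file system supports truncate, the file is cut back directly;
     * otherwise the valid length is published in a marker file so downstream
     * readers know to ignore any trailing bytes.
     */
    static void restore(Path partFile, long validLength, boolean truncateSupported)
            throws IOException {
        if (truncateSupported) {
            try (FileChannel ch = FileChannel.open(partFile, StandardOpenOption.WRITE)) {
                ch.truncate(validLength);
            }
        } else {
            Path marker = partFile.resolveSibling(partFile.getFileName() + ".valid-length");
            Files.write(marker, Long.toString(validLength).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```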
Github user rmetzger commented on a diff in the pull request:
https://github.com/apache/flink/pull/1084#discussion_r38627730
--- Diff: docs/apis/streaming_guide.md ---
@@ -1836,6 +1837,110 @@ More about information about Elasticsearch can be
found [here](https://elastic.c
Github user rmetzger commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137390414
I'm currently trying out the module. Some comments:
- Why do we name the module `flink-connector-hdfs`? I think a name such as
`flink-connector-filesystems` or `flin
Github user tillrohrmann commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137148970
What about renaming the files upon checkpointing? I mean not triggering the
rolling mechanism but to write the current temp file as the latest part and
start a new
Github user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137144072
Should be, but then we can also just keep them in memory and write when
checkpointing. But this has more potential of blowing up with an OOM.
---
Github user tillrohrmann commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137143368
Concerning the second short term option: Is the problem there that the
checkpointing is not aligned with the rolling? Thus, you can take a checkpoint
but you still
Github user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137125224
If it fails in the middle of writing or before sync/flush is called on the
writer then the data can be in an inconsistent state. I see three ways of
dealing with this,
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1084#issuecomment-137094936
Looks very good from a first glance!
Can you explain how the writing behaves in case of failures? Will it start
a new file? Will it try to append to the prev
Github user rmetzger commented on a diff in the pull request:
https://github.com/apache/flink/pull/1084#discussion_r38435351
--- Diff:
flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-hdfs/pom.xml
---
@@ -0,0 +1,107 @@
+
+
+http://maven.apache.o
GitHub user aljoscha opened a pull request:
https://github.com/apache/flink/pull/1084
[FLINK-2583] Add Stream Sink For Rolling HDFS Files
Note: The rolling sink is not yet integrated with
checkpointing/fault-tolerance.
You can merge this pull request into a Git repository by runni