Thanks Max and Timo for the explanation. :)
It should be keyBy(0) for the DataStream API (since Flink 0.9).
It's groupBy() in the DataSet API.
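For illustration (this sketch is not from the thread), the same per-key aggregation in both APIs; the example tuples and field positions are made up:

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class GroupingExample {
    public static void main(String[] args) throws Exception {
        // DataStream API (Flink 0.9+): grouping is expressed with keyBy()
        StreamExecutionEnvironment senv = StreamExecutionEnvironment.createLocalEnvironment();
        senv.fromElements(Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3)) // example data
            .keyBy(0)   // key on the first tuple field
            .sum(1)     // rolling sum per key
            .print();
        senv.execute("keyBy example");

        // DataSet API: the equivalent grouping is groupBy()
        ExecutionEnvironment benv = ExecutionEnvironment.getExecutionEnvironment();
        benv.fromElements(Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3))
            .groupBy(0)
            .sum(1)
            .print();   // print() triggers execution of the batch job
    }
}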
On Fri, Jul 22, 2016 at 1:27 AM, wrote:
> Hi,
> today I am rewriting my Spark project with Flink. In Spark, data is an
> RDD, and it has many transformations and actions, but in Flink, the
> DataStream does not
Hi, today I am rewriting my Spark project with Flink. In Spark, data is an RDD,
and it has many transformations and actions, but in Flink, the DataStream does not
have groupBy and foreach. For example:
val env = StreamExecutionEnvironment.createLocalEnvironment()
val data = List(("1" -> "a
I took another look at this and it occurred to me that the S3a directory issue
is actually localized to Cloudera's hadoop-aws version, which is stuck at
2.6.0. Apparently the zeroed-out directory timestamps are in the
Flink-recommended version. So, Flink/Yarn/S3a will work, just not with CDH5. I
I have a fix and test for a recursive HDFSCopyToLocal. I also added similar
code to Yarn application staging. However, even though all files and resources
now copy correctly, S3A still fails on Flink session creation. The failure
stems from the lib folder being registered as an application resou
I saw the source code of
flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java.
Flink ships the FLINK_LIB_DIR but adds only JAR files to the classpath.
I want to know how to add a resource file to the classpath.
Best Regards,
Dong-iL, Kim.
> On Jul 22, 2016, at 4:28 AM, Dong iL,
Can't you use a KeyedStream, I mean keyBy with the same key? Something like
this:
source.flatMap(new Tokenizer()).keyBy(0).sum(2).project(2).print();
Assuming the Tokenizer emits a Tuple3 where:
1 -> always the same key, say "test"
2 -> the actual word
3 -> 1
There might be some other good choices bu
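A runnable version of that idea, sketched under the assumption that the Tokenizer emits (constant key, word, 1) triples; the class and sample input are made up for the example:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class TotalWordCount {

    // Hypothetical Tokenizer: emits (constant key, word, 1) for every word in a line.
    public static class Tokenizer
            implements FlatMapFunction<String, Tuple3<String, String, Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple3<String, String, Integer>> out) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    out.collect(Tuple3.of("test", word, 1)); // field 0 is always the same key
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
        env.fromElements("to be or not to be")
            .flatMap(new Tokenizer())
            .keyBy(0)    // every record has the key "test", so all go to one group
            .sum(2)      // rolling total of field 2, i.e. the word count so far
            .project(2)  // keep only the running count
            .print();
        env.execute("total word count");
    }
}

Because the key is constant, the sum runs on a single parallel instance, which is fine for a total count but means no parallelism for that step.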
Was trying to write a simple streaming Flink program that counts the total
words (not the frequency) in a file.
I was thinking along the lines of:
counts = text.flatMap(new Tokenizer())
.count(); // count() isn't part of the streaming APIs (but is supported for batch)
Any suggestions on how to do this?
Aljoscha - Thanks, it works exactly as you said. I found out why my windows
were firing twice. I was making the error of adding the
AutoWatermarkInterval to the existing watermark each time the watermark was
sampled from the source just to fire a window if one of the sources was
delayed substantial
Hello.
I have a Flink cluster on YARN.
I want to add FLINK_LIB_DIR to the classpath,
because hibernate.cfg.xml needs to be on the classpath.
When using a standalone cluster, I just add FLINK_LIB_DIR to
FLINK_CLASSPATH,
but on YARN, fixing config.sh, yarn-session.sh, and flink-daemon.sh is not
working.
Be
Aljoscha,
Awesome. Exactly the behavior I was hoping would be exhibited. Thank you
for the quick answer :)
Thanks,
David
On Thu, Jul 21, 2016 at 2:17 AM, Aljoscha Krettek
wrote:
> Hi David,
> windows are being processed in order of their end timestamp. So if you
> specify an allowed lateness o
I meant to respond to this thread yesterday, but I got busy with work and it
slipped my mind.
This is possibly doable using Flink Streaming; others can correct me here.
*Assumption:* Both the batch and streaming processes are reading from a
single Kafka topic, and by "batched data" I am assuming it's the sa
At this point in time, IMO, batch processing is not why you should be
considering Flink.
That said, I predict that stream processing (and event processing) will
become the dominant methodology as we begin to gravitate towards the "I can't
wait; I want it now" phenomenon. In that methodology, I
I think so.
I’ll test it on EMR and then reply.
I am truly grateful for your support.
> On Jul 21, 2016, at 8:49 PM, Stephan Ewen wrote:
>
> I don't know that answer, sorry. Maybe one of the others can chime in here.
>
> Did you deactivate checkpointing (then it should not write to S3) and di
Stream 2 does send watermarks only after it sees elements C and D. It sends
the watermark (5) 20 seconds after Stream 1 sends it.
From what I understand, Flink merges watermarks from both streams on the
reduce side. But does it wait a certain pre-configured amount of time (for
watermarks from both strea
Yes, that is to be expected. Stream 2 should only send the watermark once
the elements with a timestamp lower than the watermark have been sent as
well.
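As a concrete illustration (not from the thread): a minimal periodic watermark assigner, assuming events are Tuple2<String, Long> with the event timestamp in field f1 (the type and field layout are made up for the example). Each stream's watermark trails the highest timestamp it has seen; when the two streams are combined, the downstream operator advances its event time with the minimum of the incoming watermarks, which is why the window cannot fire before the slower stream catches up.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

// Emits watermarks that trail the highest event timestamp seen so far by a fixed bound.
public class BoundedLatenessAssigner
        implements AssignerWithPeriodicWatermarks<Tuple2<String, Long>> {

    private final long maxOutOfOrdernessMs;
    private long currentMaxTimestamp = Long.MIN_VALUE;

    public BoundedLatenessAssigner(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    @Override
    public long extractTimestamp(Tuple2<String, Long> element, long previousElementTimestamp) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, element.f1);
        return element.f1;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // No progress until this stream has actually seen elements; afterwards
        // the watermark trails the max timestamp by the allowed out-of-orderness.
        return currentMaxTimestamp == Long.MIN_VALUE
                ? new Watermark(Long.MIN_VALUE)
                : new Watermark(currentMaxTimestamp - maxOutOfOrdernessMs);
    }
}

It would be attached with stream.assignTimestampsAndWatermarks(new BoundedLatenessAssigner(20000)).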
On Thu, 21 Jul 2016 at 13:10 Sameer W wrote:
> Thanks, Aljoscha,
>
> This what I am seeing when I use Ascending timestamps as watermarks-
>
> C
I don't know that answer, sorry. Maybe one of the others can chime in here.
Did you deactivate checkpointing (then it should not write to S3) and did
that resolve the leak?
Best,
Stephan
On Thu, Jul 21, 2016 at 12:52 PM, 김동일 wrote:
> Dear Stephan.
>
> I also suspect the s3.
> I’ve tried s3n,
Thanks, Aljoscha,
This is what I am seeing when I use ascending timestamps as watermarks:
Consider a window of 1-5 seconds.
Stream 1 - sends elements A, B
Stream 2 (20 seconds later) - sends elements C, D
I see window (1-5) fires first with just A, B. After 20 seconds window (1-5)
fires again but this
Dear Stephan.
I also suspect S3.
I've tried s3n and s3a.
What is the suitable library? I'm using aws-java-sdk-1.7.4 and hadoop-aws-2.7.2.
Best regards.
> On Jul 21, 2016, at 5:54 PM, Stephan Ewen wrote:
>
> Hi!
>
> There is a memory debugging logger, you can activate it like that:
> https://ci.
Hello all,
My task is to cluster the stream of points around the centroids. I am using
DataStreamUtils to collect the stream and pass it on to the map function to
perform the necessary action. Below is the code:
DataStream points = newDataStream.map(new getPoints());
DataStream centroids = newCentro
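Since the code above is cut off, here is a minimal, self-contained sketch of the DataStreamUtils.collect() pattern; the point type and values are made up, and it assumes the flink-streaming-contrib module is on the classpath:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.contrib.streaming.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CollectExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();

        // Hypothetical stream of 2-D points (x, y), standing in for the "points" stream.
        DataStream<Tuple2<Double, Double>> points =
                env.fromElements(Tuple2.of(1.0, 2.0), Tuple2.of(3.0, 4.0));

        // DataStreamUtils.collect() runs the job and pulls the stream back
        // to the client as an Iterator.
        Iterator<Tuple2<Double, Double>> it = DataStreamUtils.collect(points);
        List<Tuple2<Double, Double>> collected = new ArrayList<>();
        while (it.hasNext()) {
            collected.add(it.next());
        }
        System.out.println(collected);
    }
}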
Hi David,
windows are being processed in order of their end timestamp. So if you
specify an allowed lateness of zero (which will only be possible on Flink
1.1 or by using a custom trigger) you should be able to sort the elements.
The ordering is only valid within one key, though, since windows for
Hi!
Custom Kryo serializers can be shipped either as objects (must be
serializable) or as classes (can be non-serializable, but must have a default
constructor).
For non-serializable serializers, try to use:
ExecutionConfig.registerTypeWithKryoSerializer(Class<?> type, Class<? extends
Serializer> serializerClass)
Stephan
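For illustration (not from the thread), a minimal sketch of both registration variants; MyType and MyTypeSerializer are made-up placeholders:

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.Serializable;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KryoRegistration {

    // Hypothetical custom type that Kryo should handle.
    public static class MyType {
        public String value;
    }

    // Hypothetical serializer. Registered as a class it may be non-serializable
    // (it only needs a default constructor); registered as an instance it must
    // also implement java.io.Serializable, as it does here.
    public static class MyTypeSerializer extends Serializer<MyType> implements Serializable {
        @Override
        public void write(Kryo kryo, Output output, MyType object) {
            output.writeString(object.value);
        }

        @Override
        public MyType read(Kryo kryo, Input input, Class<MyType> type) {
            MyType t = new MyType();
            t.value = input.readString();
            return t;
        }
    }

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Variant 1: register the serializer class (works for non-serializable serializers).
        env.getConfig().registerTypeWithKryoSerializer(MyType.class, MyTypeSerializer.class);

        // Variant 2: register a serializer instance (the instance must be Serializable).
        env.getConfig().registerTypeWithKryoSerializer(MyType.class, new MyTypeSerializer());
    }
}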
Hi!
There is a memory debugging logger; you can activate it like this:
https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#memory-and-performance-debugging
It will print which parts of the memory are growing.
What you can also try is to deactivate checkpointing for one run a
Hi,
to answer this question, it would be helpful if you could provide the
stacktrace of your exception and the code you use to register the serializer.
Best,
Stefan
> On 21.07.2016 at 05:28, Shaosu Liu wrote:
>
>
> Hi,
>
> How do I do Guava Immutable collections serialization in Flink?
>
Hi,
the answer to this question depends on how you are starting the jobs. Do you
have a Java program that submits jobs in a loop by repeatedly calling
StreamExecutionEnvironment.execute(), or a shell script that submits jobs
through the CLI? In both cases, the process should block (either on
Stre
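For the first case, a minimal sketch of what such a driver program could look like (the jobs and inputs are made up); execute() blocks until the submitted job finishes, so the loop runs the jobs one after another:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SequentialJobs {
    public static void main(String[] args) throws Exception {
        // Hypothetical list of inputs, one job per input.
        String[] inputs = {"job-1", "job-2", "job-3"};

        for (String input : inputs) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.fromElements(input).print();
            // execute() blocks until this (finite) job has finished, so the next
            // loop iteration does not start before the current job is done.
            env.execute("job for " + input);
        }
    }
}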