Flink config as an argument (akka.client.timeout)

2016-05-24 Thread Juho Autio
Is there any way to set akka.client.timeout (or any other Flink config value) when calling bin/flink run, instead of editing flink-conf.yaml? I tried adding it as a -yD flag but couldn't get it working. Related: https://issues.apache.org/jira/browse/FLINK-3964
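
A minimal sketch of the kind of invocation in question, assuming a YARN deployment (the -yD flag passes dynamic properties to the YARN session; whether akka.client.timeout is actually picked up this way is exactly what FLINK-3964 tracks, and the value format shown is a guess):

bin/flink run -m yarn-cluster -yD akka.client.timeout=600s path/to/job.jar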

Re: Dynamic partitioning for stream output

2016-05-24 Thread Juho Autio
Related issue: https://issues.apache.org/jira/browse/FLINK-2672 On Wed, May 25, 2016 at 9:21 AM, Juho Autio wrote: > Thanks, indeed the desired behavior is to flush if bucket size exceeds a limit but also if the bucket has been open long enough. Contrary to the current RollingSink we don't want to flush all the time if the bucket changes…

Re: writeAsCSV with partitionBy

2016-05-24 Thread KirstiLaurila
Maybe, I don't know, but that's with streaming. How about batch? Srikanth wrote > Isn't this related to https://issues.apache.org/jira/browse/FLINK-2672 ? > This can be achieved with a RollingSink [1] & custom Bucketer, probably. > [1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/RollingSink.html

Re: writeAsCSV with partitionBy

2016-05-24 Thread Juho Autio
RollingSink is part of the Flink Streaming API. Can it be used in Flink batch jobs, too? As implied in FLINK-2672, RollingSink doesn't support dynamic bucket paths based on the tuple fields. The path must be given when creating the RollingSink instance, i.e. before deploying the job. Yes, a custom Bucketer…
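
For reference, a minimal sketch of a custom Bucketer against the RollingSink API of this era (interface and method names quoted from memory, so treat them as an assumption). Note that neither method receives the tuple, which is exactly why element-based bucket paths are not possible here:

import org.apache.flink.streaming.connectors.fs.Bucketer;
import org.apache.hadoop.fs.Path;

// Sketch: buckets purely by wall-clock hour; it never sees tuple fields.
public class HourlyBucketer implements Bucketer {
    private static final long HOUR_MS = 60 * 60 * 1000L;

    @Override
    public boolean shouldStartNewBucket(Path basePath, Path currentBucketPath) {
        // Roll when the hour encoded in the current bucket path has passed.
        return !currentBucketPath.equals(getNextBucketPath(basePath));
    }

    @Override
    public Path getNextBucketPath(Path basePath) {
        long hourStart = (System.currentTimeMillis() / HOUR_MS) * HOUR_MS;
        return new Path(basePath + "/hour-" + hourStart);
    }
}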

Re: Dynamic partitioning for stream output

2016-05-24 Thread Juho Autio
Thanks, indeed the desired behavior is to flush if the bucket size exceeds a limit but also if the bucket has been open long enough. Contrary to the current RollingSink, we don't want to flush all the time if the bucket changes, but to have multiple buckets "open" as needed. In our case the date to use for…

Re: Submit Flink Jobs to YARN running on AWS

2016-05-24 Thread Bajaj, Abhinav
Hi, Has anyone tried to submit a Flink job remotely to YARN running in AWS? The case I am stuck with is where the Flink client is on my laptop and YARN is running on AWS. @Robert, did you get a chance to try this out? Regards, Abhi From: "Bajaj, Abhinav" <abhinav.ba...@here.com> Date: …

subtasks and objects

2016-05-24 Thread Stavros Kontopoulos
Hey, Is it true that since a TaskManager (a JVM) may have multiple subtasks implementing the same operator, and thus the same logic and loading the same classes, no separate classloading is done? So if I use a Scala object or static code as in Java within that logic, then that is shared among the subtasks…
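
A minimal sketch of the sharing in question (hypothetical class, for illustration only): if several parallel subtasks of this map are scheduled into the same TaskManager, they all see the one static counter, exactly as a Scala object's fields would be shared:

import java.util.concurrent.atomic.AtomicLong;
import org.apache.flink.api.common.functions.MapFunction;

public class SharedStaticExample {
    // One instance per JVM, i.e. shared by every subtask in this TaskManager.
    static final AtomicLong SEEN_IN_JVM = new AtomicLong();

    public static class CountingMap implements MapFunction<String, String> {
        @Override
        public String map(String value) {
            // All co-located subtasks increment the same counter.
            return value + " (jvm-count=" + SEEN_IN_JVM.incrementAndGet() + ")";
        }
    }
}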

Re: Flink's WordCount at scale of 1BLN of unique words

2016-05-24 Thread Xtra Coder
Mentioning 100TB "in my context" is more like "saving the current state" at some point in time to "backup" or "direct access" storage and continuing with the next 100TB/hours/days of streamed data. So no, it is not about a finite data set. On Mon, May 23, 2016 at 11:13 AM, Matthias J. Sax wrote: > Are you…

Data ingestion using a Flink TCP Server

2016-05-24 Thread omaralvarez
I have been trying to send data from several processes to my Flink application. I want to use a single port that will receive data from multiple clients. I have implemented my own SourceFunction, but I have two doubts. My TCP server has multiple threads receiving data from multiple clients; is ca…
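
A minimal sketch of one way such a source could look (names hypothetical; the key point is that SourceContext.collect() is not thread-safe, so the handler threads must emit under the checkpoint lock):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Sketch: one ServerSocket, one handler thread per client, all funneling
// records through the same SourceContext under its checkpoint lock.
public class TcpServerSource implements SourceFunction<String> {
    private final int port;
    private volatile boolean running = true;

    public TcpServerSource(int port) { this.port = port; }

    @Override
    public void run(final SourceContext<String> ctx) throws Exception {
        try (ServerSocket server = new ServerSocket(port)) {
            while (running) {
                final Socket client = server.accept();
                new Thread(() -> {
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(client.getInputStream()))) {
                        String line;
                        while (running && (line = in.readLine()) != null) {
                            synchronized (ctx.getCheckpointLock()) {
                                ctx.collect(line); // serialize concurrent emits
                            }
                        }
                    } catch (Exception ignored) {
                        // client disconnected; the sketch ignores errors
                    }
                }).start();
            }
        }
    }

    @Override
    public void cancel() { running = false; }
}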

Incremental updates

2016-05-24 Thread Malgorzata Kudelska
Hi, I have the following question. Does Flink support incremental updates? In particular, I have a custom StateValue object and during the checkpoints I would like to save only the fields that changed since the previous checkpoint. Is that possible? Regards, Gosia

Non blocking operation in Apache Flink

2016-05-24 Thread Maatary Okouya
I'm looking for a way to avoid thread starvation in my tasks by returning a future, but I don't see how that is possible. Hence I would like to know how Flink handles the case where your job has to perform network calls (I use akka-http or spray) or any other IO operation and use the result of it.
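
Flink had no built-in async operator at this point, so one common workaround is to overlap a bounded number of blocking calls inside an ordinary rich function; a minimal sketch under that assumption (callService is a placeholder for the real network call, and note that a partially filled batch is only drained once more elements arrive):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Sketch: keep up to MAX_IN_FLIGHT calls outstanding, then drain them in
// submission order, so call latencies overlap instead of serializing.
public class PipelinedLookup extends RichFlatMapFunction<String, String> {
    private static final int MAX_IN_FLIGHT = 32;
    private transient ExecutorService pool;
    private transient List<Future<String>> inFlight;

    @Override
    public void open(Configuration parameters) {
        pool = Executors.newFixedThreadPool(8);
        inFlight = new ArrayList<>();
    }

    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        inFlight.add(pool.submit(() -> callService(value)));
        if (inFlight.size() >= MAX_IN_FLIGHT) {
            for (Future<String> f : inFlight) {
                out.collect(f.get()); // blocks, but the whole batch ran concurrently
            }
            inFlight.clear();
        }
    }

    @Override
    public void close() { if (pool != null) pool.shutdown(); }

    private static String callService(String value) {
        return value; // placeholder for the real HTTP/IO call
    }
}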

Re: Connect 2 datastreams and iterate

2016-05-24 Thread Biplob Biswas
Hi Ufuk, I get the error before running it; somehow the syntax is also not right. I am trying to do the following: ConnectedStreams connectedStream = br.connect(mainInput); IterativeStream.ConnectedIterativeStreams iteration = …
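
For comparison, a sketch of the variant I'd expect to compile (String/Long are placeholder types; the iteration is created from the main stream via iterate().withFeedbackType(...), not from the ConnectedStreams):

// The feedback type (Long here) differs from the input type, hence withFeedbackType.
IterativeStream.ConnectedIterativeStreams<String, Long> iteration =
        mainInput.iterate().withFeedbackType(Long.class);

DataStream<Long> feedback = iteration
        .flatMap(new MyCoFlatMap()); // a user-defined CoFlatMapFunction<String, Long, Long>

iteration.closeWith(feedback);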

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-24 Thread Flavio Pompermaier
Do you have any suggestion about how to reproduce the error on a subset of the data? I'm trying to change the following but I can't find a configuration causing the error :( private static ExecutionEnvironment getLocalExecutionEnv() { org.apache.flink.configuration.Configuration c = new org.apache.flink.configuration.Configuration(); …
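
A sketch of the knobs one could shrink to provoke spilling in a local environment (config key names as of Flink 1.x, quoted from memory; the values are guesses):

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

private static ExecutionEnvironment getLocalExecutionEnv() {
    Configuration c = new Configuration();
    c.setInteger("taskmanager.memory.size", 16);              // MB of managed memory
    c.setFloat("taskmanager.memory.fraction", 0.1f);          // shrink sort/join memory
    c.setInteger("taskmanager.network.numberOfBuffers", 512);
    return ExecutionEnvironment.createLocalEnvironment(c);
}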

Re: stream keyBy without repartition

2016-05-24 Thread Kostas Kloudas
Hi Bart, From what I understand, you want to do a partial (per-node) aggregation before shipping the result for the final one at the end. In addition, the keys do not seem to change between aggregations, right? If this is the case, this is the functionality of the Combiner in batch. In batch…

Re: problem of sharing TCP connection when transferring data

2016-05-24 Thread wangzhijiang999
Hi Ufuk, I am willing to do some work on this issue and have a basic solution for it, and I wish to get professional suggestions from you. What is the next step for it? Looking forward to your reply! Zhijiang Wang ------ From: Ufuk…

stream keyBy without repartition

2016-05-24 Thread Bart Wyatt
(migrated from IRC) Hello all, My situation is this: I have a large amount of data partitioned in Kafka by "session" (natural partitioning). After I read the data, I would like to do as much as possible before incurring re-serialization or network traffic, due to the size of the data. I am…
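
Absent a streaming combiner, one common workaround (sketched here under the assumption of (key, count) tuples; names hypothetical) is to pre-aggregate per parallel instance in a flatMap before the keyBy, so the shuffle only carries partial aggregates:

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Runs before any keyBy, so it only sees this instance's (partition-local)
// data and emits partial sums, shrinking what crosses the network.
public class LocalPreAggregator
        extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {
    private static final int FLUSH_EVERY = 10_000;
    private final Map<String, Long> partials = new HashMap<>();
    private int seen = 0;

    @Override
    public void flatMap(Tuple2<String, Long> in, Collector<Tuple2<String, Long>> out) {
        partials.merge(in.f0, in.f1, Long::sum);
        if (++seen >= FLUSH_EVERY) {
            for (Map.Entry<String, Long> e : partials.entrySet()) {
                out.collect(Tuple2.of(e.getKey(), e.getValue()));
            }
            partials.clear();
            seen = 0;
        }
    }
}

// Usage: stream.flatMap(new LocalPreAggregator()).keyBy(0).sum(1)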

Re: writeAsCSV with partitionBy

2016-05-24 Thread Srikanth
Isn't this related to https://issues.apache.org/jira/browse/FLINK-2672 ? This can be achieved with a RollingSink [1] & custom Bucketer, probably. [1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/RollingSink.html Srikanth On Tue, May…

Re: Dynamic partitioning for stream output

2016-05-24 Thread Kostas Kloudas
Hi Juho, If I understand correctly, you want a custom RollingSink that caches some buckets, one for each topic/date key, and whenever the volume of data buffered exceeds a limit, it flushes to disk, right? If this is the case, then you are right that this is not currently supported out-of-the-box…
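
A minimal sketch of the shape such a sink could take (names hypothetical; it ignores fault tolerance and the actual S3/HDFS writing that RollingSink otherwise handles):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Sketch: one in-memory buffer per topic/date bucket; a bucket is flushed
// when it grows too large or has been open too long.
public class MultiBucketSink extends RichSinkFunction<String> {
    private static final int MAX_BUCKET_SIZE = 10_000;
    private static final long MAX_BUCKET_AGE_MS = 60_000L;

    private final Map<String, List<String>> buckets = new HashMap<>();
    private final Map<String, Long> openedAt = new HashMap<>();

    @Override
    public void invoke(String record) throws Exception {
        String key = bucketKeyOf(record); // e.g. topic + "/" + date
        buckets.computeIfAbsent(key, k -> {
            openedAt.put(k, System.currentTimeMillis());
            return new ArrayList<>();
        }).add(record);

        List<String> bucket = buckets.get(key);
        boolean tooBig = bucket.size() >= MAX_BUCKET_SIZE;
        boolean tooOld =
                System.currentTimeMillis() - openedAt.get(key) >= MAX_BUCKET_AGE_MS;
        if (tooBig || tooOld) {
            writeToStorage(key, bucket); // placeholder for the real S3 write
            buckets.remove(key);
            openedAt.remove(key);
        }
    }

    private static String bucketKeyOf(String record) { return "default"; }
    private static void writeToStorage(String key, List<String> bucket) { }
}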

Re: Combining streams with static data and using REST API as a sink

2016-05-24 Thread Aljoscha Krettek
Hi Josh, for the first part of your question you might be interested in our ongoing work on adding side inputs to Flink. I started this design doc: https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit?usp=sharing It's still somewhat rough around the edges but could…

Re: Combining streams with static data and using REST API as a sink

2016-05-24 Thread Maximilian Michels
Hi Josh, You can trigger an occasional refresh, e.g. on every 100 elements received. Or you could start a thread that does it every 100 seconds (with a lock involved to prevent processing in the meantime). Cheers, Max On Mon, May 23, 2016 at 7:36 PM, Josh wrote: > Hi Max, > Thanks…
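
A minimal sketch of the count-based variant (loadStaticData is a placeholder for the real lookup):

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Sketch: re-fetch the reference data every REFRESH_EVERY elements.
public class EnrichingMap extends RichMapFunction<String, String> {
    private static final int REFRESH_EVERY = 100;
    private transient Map<String, String> staticData;
    private long seen = 0;

    @Override
    public void open(Configuration parameters) {
        staticData = loadStaticData();
    }

    @Override
    public String map(String value) {
        if (++seen % REFRESH_EVERY == 0) {
            staticData = loadStaticData(); // the occasional refresh
        }
        String extra = staticData.get(value);
        return extra != null ? value + "," + extra : value;
    }

    private static Map<String, String> loadStaticData() {
        return new HashMap<>(); // placeholder for the real REST/DB call
    }
}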

Dynamic partitioning for stream output

2016-05-24 Thread Juho Autio
Could you suggest how to dynamically partition data with Flink streaming? We've looked at RollingSink, which takes care of writing batches to S3, but it doesn't allow defining the partition dynamically based on the tuple fields. Our data is coming from Kafka and essentially has the Kafka topic and date…

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-24 Thread Till Rohrmann
The error looks really strange. Flavio, could you compile a test program with example data and configuration to reproduce the problem? Given that, we could try to debug it. Cheers, Till

Re: HDFS namenode and Flink

2016-05-24 Thread Till Rohrmann
Hi Thomas, if you want to run multiple Flink clusters in HA mode, you should configure a specific recovery.zookeeper.path.root for every cluster in your configuration. This will define the root path in ZooKeeper under which the meta checkpoint state handles and the job handles are stored. If you do…
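
For example (the paths are arbitrary, as long as each cluster gets its own):

# flink-conf.yaml of cluster one
recovery.zookeeper.path.root: /flink/cluster-one

# flink-conf.yaml of cluster two
recovery.zookeeper.path.root: /flink/cluster-two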

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-24 Thread Stefano Bortoli
Till mentioned that "spilling on disk" was managed through an exception catch. The last serialization error was related to bad management of the Kryo buffer, which was not cleaned after spilling during exception handling. Is it possible we are dealing with an issue similar to this but caused by another…