Re: Questions about checkpoints/savepoints

2017-10-25 Thread vipul singh
As a followup to above, is there a way to get the last checkpoint metadata location inside *notifyCheckpointComplete* method? I tried poking around, but didnt see a way to achieve this. Or incase there is any other way to save the actual checkpoint metadata location information into a datastore(dy

Tasks, slots, and partitioned joins

2017-10-25 Thread David Dreyfus
Hello - I have a large number of pairs of files. For purpose of discussion: /source1/{1..1} and /source2/{1..1}. I want to join the files pair-wise: /source1/1 joined to /source2/1, /source1/2 joined to /source2/2, and so on. I then want to union the results of the pair-wise joins and per

Re: StreamTransformation object

2017-10-25 Thread Tony Wei
Hi Andrea, How about return `SingleOutputStreamOperator` when you called `HTM.learn()`, instead of create a new method in the external library. Since I guessed it called the API of Flink inner that function and the transformation in Flink, such as map, is actually return `SingleOutputStreamOperato

Re: Local combiner on each mapper in Flink

2017-10-25 Thread Kurt Young
Do you mean you want to keep the origin window as well as doing some combine operations inside window in the same time? What kind of data do you expect the following operator will receive? Best, Kurt On Thu, Oct 26, 2017 at 5:29 AM, Le Xu wrote: > Thank Kurt I'm trying out WindowedStream aggreg

Re: Local combiner on each mapper in Flink

2017-10-25 Thread Le Xu
Thank Kurt I'm trying out WindowedStream aggregate right now. Just wondering, is there any way for me to preserve the window after aggregation. More specifically, originally i have something like: WindowedStream, Tuple, TimeWindow> windowStream = dataStream .keyBy(0) //id

Re: Delta iteration not spilling to disk

2017-10-25 Thread Joshua Griffith
Hi Fabian, Switching the solution set join to a co-group indeed fixed the issue. Thank you! Joshua On Oct 25, 2017, at 11:00 AM, Fabian Hueske mailto:fhue...@gmail.com>> wrote: Hi Joshua, with the unmanaged solution set, the records are not serialized but they need to be copied to avoid them

StreamTransformation object

2017-10-25 Thread AndreaKinn
Hi, I'm using an external library with Flink I'm trying to implement slotSharingGroup(String) method on it. To do it I looked at SingleOutputStreamOperator Flink's class to see how the method slotSharingGroup(String) is implemented. An abstract: /public class SingleOutputStreamOperator extends Da

Re: Case Class TypeInformation

2017-10-25 Thread Fabian Hueske
Yes, that JIRA was actually motivated by your question. Thanks for the feedback :-) 2017-10-25 17:14 GMT+02:00 Joshua Griffith : > Hello Fabian, > > Thank you for the suggestion. I see that an issue has been created to > support adding custom type information to case classes: > https://issues.apa

Re: Delta iteration not spilling to disk

2017-10-25 Thread Fabian Hueske
Hi Joshua, with the unmanaged solution set, the records are not serialized but they need to be copied to avoid them from being mutated by the user-code JoinFunction. The stacktrace hints that the NPE is caused by copying a null record. This would happen if the solution set would not contain the ke

Re: Case Class TypeInformation

2017-10-25 Thread Joshua Griffith
Hello Fabian, Thank you for the suggestion. I see that an issue has been created to support adding custom type information to case classes: https://issues.apache.org/jira/browse/FLINK-7859 Joshua On Oct 17, 2017, at 3:01 AM, Fabian Hueske mailto:fhue...@gmail.com>> wrote: Hi Joshua, that's

Re: Delta iteration not spilling to disk

2017-10-25 Thread Joshua Griffith
Hello Fabian, Thank you for your response. I tried setting the solution set to unmanaged and got a different error: 2017-10-24 20:46:11.473 [Join (join solution trees) (1/8)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: Join (join solution trees) (1/8) java.lang.N

Re: Problems with taskmanagers in Mesos Cluster

2017-10-25 Thread Manuel Montesino
Sorry, forget about the api methods comment, that is for flink jobs. For flink session, we do a deploy directly to marathon and is marathon that manage the job... that's the reason that restart the jobmanager and not the taskmanagers, because the taskmanagers are created by flink connecting to

Re: HBase config settings go missing within Yarn.

2017-10-25 Thread Till Rohrmann
Hi Niels, good to see that you solved your problem. I’m not entirely sure how Pig does it, but I assume that there must be some kind of HBase support where the HBase specific files are explicitly send to the cluster or that it copies the environment variables. For Flink supporting this kind of be

Re: Problems with taskmanagers in Mesos Cluster

2017-10-25 Thread Manuel Montesino
Hi Eron, Thanks for your response. Maybe I'm not explaining well. The thing is that when we redepoy a flink session, not kill or stop the active taskmanagers and create/start new ones (those with new configuration), that's what we want (a full redeploy) so there are not recovered TM, still t

Re: Checkpoint was declined (tasks not ready)

2017-10-25 Thread Till Rohrmann
Hi Bartek, I think your explanation of the problem is correct. Thanks a lot for your investigation. What we could do to solve the problem is the following: Either) We start the emitter thread before we restore the elements in the open method. That way the open method won't block forever but only

Re: SLF4j logging system gets clobbered?

2017-10-25 Thread Till Rohrmann
Hi Jared, this problem looks strange to me. Logback should not change its configuration if not explicitly being tinkered around with it. Could you quickly explain me how your mesos setup works? Are you submitting the job via the Web UI? I'm just wondering because I see client side as well as clus

State snapshotting when source is finite

2017-10-25 Thread Flavio Pompermaier
Hi to all, in my current use case I'd like to improve one step of our batch pipeline. There's one specific job that ingest a tabular dataset (of Rows) and explode it into a set of RDF statements (as Tuples). The objects we output are a containers of those Tuples (grouped by a field). Flink statefu

Re: Use a round-robin kafka partitioner

2017-10-25 Thread Chesnay Schepler
So you want to use the kafka partitioner directly? How about an adapter? public class KafkaPartitionerWrapper extends KafkaPartitioner implements Serializable { private final kafka.producer.Partitionerpartitioner; public KafkaPartitionerWrapper(kafka.producer.Partitioner partitioner) {

Re: Delta iteration not spilling to disk

2017-10-25 Thread Fabian Hueske
Hi Joshua, that is correct. Delta iterations cannot spill to disk. The solution set is managed in an in-memory hash table. Spilling that hash table to disk would have a significant impact on the performance. By default the hash table is organized in Flink's managed memory. You can try to increase

Re: Use a round-robin kafka partitioner

2017-10-25 Thread kla
Exactly, I did like this, the only thing is that I am using 1.2.0 version of Flink and in this version the class name is KafkaPartitioner. But the problem is that I would not like to "fork" the Kafka's source code. (Please check my first comment) Thanks, Konstantin -- Sent from: http://apache-

Re: Use a round-robin kafka partitioner

2017-10-25 Thread Chesnay Schepler
Hi! you will have to modify your partitioner to implement the FlinkKafkaPartitioner interface instead. You