Manual checkpoint

2017-10-10 Thread nragon
Can I trigger a checkpoint based on a specific event? Meaning, if a given event arrives (containing EOF in this case), it would be broadcast to all downstream operators and trigger a savepoint afterwards. Thanks, Nuno

Re: Sink buffering

2017-10-04 Thread nragon
Got it :) Thanks

Re: Sink buffering

2017-10-04 Thread nragon
checkpointing interval ~= transactions are being committed on each Flink checkpoint. So, if I set my checkpoint interval to 1ms, every 1ms there will be a commit, right? If I understood correctly, TwoPhaseCommitSinkFunction stores transactions in its state, as for GenericWriteAheadSink it…
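
A minimal sketch of the relationship being discussed (assuming a standard job setup): with exactly-once checkpointing, a TwoPhaseCommitSinkFunction-style sink pre-commits on snapshot and commits when the checkpoint completes, so the checkpoint interval bounds the commit frequency.

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Checkpoint every 60 seconds; a transactional sink therefore commits
    // roughly once per 60 seconds, when each checkpoint completes.
    env.enableCheckpointing(60000);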

Re: Sink buffering

2017-10-04 Thread nragon
Thanks for your opinion on this. TwoPhaseCommitSinkFunction would probably be the best solution overall. Using this with something like Phoenix or Tephra would probably work. This always depends on the checkpointing interval, right?

Re: Sink buffering

2017-10-03 Thread nragon
Anyone? :)

Sink buffering

2017-09-29 Thread nragon
Hi, Just like mentioned in the Pravega talk at Berlin FF17, can we somehow simulate sink buffering (Pravega transactions) and coordinate it with checkpoints? My intention is to buffer records before sending them to HBase. Any opinions or tips? Thanks
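
A hedged sketch of one way to buffer records and flush in step with checkpoints, using CheckpointedFunction (at-least-once semantics; the HBase flush helper is a placeholder, not a real API):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    public class BufferingHBaseSink implements SinkFunction<String>, CheckpointedFunction {
        private final List<String> buffer = new ArrayList<>();
        private transient ListState<String> checkpointedBuffer;

        @Override
        public void invoke(String value) throws Exception {
            buffer.add(value); // buffer instead of writing immediately
        }

        @Override
        public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
            // Persist the buffer so nothing is lost if we fail before flushing.
            checkpointedBuffer.clear();
            for (String v : buffer) {
                checkpointedBuffer.add(v);
            }
            flushToHBase(buffer); // hypothetical helper wrapping the HBase client
            buffer.clear();
        }

        @Override
        public void initializeState(FunctionInitializationContext ctx) throws Exception {
            checkpointedBuffer = ctx.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("buffered-records", String.class));
            if (ctx.isRestored()) {
                for (String v : checkpointedBuffer.get()) {
                    buffer.add(v);
                }
            }
        }

        private void flushToHBase(List<String> records) {
            // placeholder: batch Puts into HBase here
        }
    }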

Re: Custom Serializers

2017-09-28 Thread nragon
Got it :) I've redesigned the object which I use across jobs. Ended up with 4 serializers. My object Element holds 2 fields, an array of Parameter and a Metadata. Metadata holds an array of ParameterInfo, and each Parameter holds its ParameterInfo (kinda duplicates Metadata but needed for legac…

Re: Custom Serializers

2017-09-27 Thread nragon
Should I use TypeSerializerSingleton if it is independent of the object it's serializing?
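
A hedged sketch of a stateless serializer as a TypeSerializerSingleton (the Metadata type and its fields are assumptions; the exact set of abstract methods varies slightly across Flink versions, and compatibility-related methods are elided):

    import java.io.IOException;
    import org.apache.flink.api.common.typeutils.base.TypeSerializerSingleton;
    import org.apache.flink.core.memory.DataInputView;
    import org.apache.flink.core.memory.DataOutputView;

    public final class MetadataSerializer extends TypeSerializerSingleton<Metadata> {
        public static final MetadataSerializer INSTANCE = new MetadataSerializer();

        @Override public boolean isImmutableType() { return true; }
        @Override public Metadata createInstance() { return new Metadata(""); }
        @Override public Metadata copy(Metadata from) { return from; } // safe if immutable
        @Override public Metadata copy(Metadata from, Metadata reuse) { return from; }
        @Override public int getLength() { return -1; } // variable length

        @Override
        public void serialize(Metadata record, DataOutputView target) throws IOException {
            target.writeUTF(record.getName()); // illustrative single field
        }

        @Override
        public Metadata deserialize(DataInputView source) throws IOException {
            return new Metadata(source.readUTF());
        }

        @Override
        public Metadata deserialize(Metadata reuse, DataInputView source) throws IOException {
            return deserialize(source);
        }

        @Override
        public void copy(DataInputView source, DataOutputView target) throws IOException {
            target.writeUTF(source.readUTF());
        }
    }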

Re: Custom Serializers

2017-09-19 Thread nragon
createInstance(Object[] fields) in TupleSerializerBase seems not to be part of the TypeSerializer API. Will I be losing any functionality? In what cases do you use this instead of createInstance()?

    // We use this in the Aggregate and Distinct Operators to create instances
    // of immutable Tuples (i.e. …

Re: Custom Serializers

2017-09-18 Thread nragon
One other thing :). Can I set a tuple's generic types dynamically? Meaning, build a tuple of arity N and build a TupleSerializer based on those types. This is because I'll only know these types based on user input.
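
A hedged sketch of building tuple type information for an arity known only at runtime (the concrete field types here are illustrative stand-ins for whatever the user configures):

    import org.apache.flink.api.common.ExecutionConfig;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeutils.TypeSerializer;
    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.api.java.typeutils.TupleTypeInfo;

    // Field types resolved at runtime from user configuration (example values).
    TypeInformation<?>[] fieldTypes = {
        BasicTypeInfo.STRING_TYPE_INFO,
        BasicTypeInfo.LONG_TYPE_INFO,
        BasicTypeInfo.DOUBLE_TYPE_INFO
    };

    // Derives a Tuple3<String, Long, Double> type info from the arity,
    // and a matching TupleSerializer from it.
    TupleTypeInfo<Tuple> tupleInfo = new TupleTypeInfo<>(fieldTypes);
    TypeSerializer<Tuple> serializer = tupleInfo.createSerializer(new ExecutionConfig());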

Re: Custom Serializers

2017-09-18 Thread nragon
Ok, got it. Thanks

Re: Custom Serializers

2017-09-18 Thread nragon
So, no need for typeinfo, comparator or factory? https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/types_serialization.html#defining-type-information-using-a-factory
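
A hedged sketch of the factory approach from the linked docs (ElementTypeInfo is a hypothetical custom TypeInformation that would return the custom serializer):

    import java.lang.reflect.Type;
    import java.util.Map;
    import org.apache.flink.api.common.typeinfo.TypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInfoFactory;
    import org.apache.flink.api.common.typeinfo.TypeInformation;

    @TypeInfo(ElementTypeInfoFactory.class)
    public class Element { /* fields elided */ }

    public class ElementTypeInfoFactory extends TypeInfoFactory<Element> {
        @Override
        public TypeInformation<Element> createTypeInfo(
                Type t, Map<String, TypeInformation<?>> genericParameters) {
            return new ElementTypeInfo(); // hypothetical custom TypeInformation
        }
    }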

Re: Custom Serializers

2017-09-18 Thread nragon
Sorry for bringing this up, any tips on this?

Re: Custom Serializers

2017-09-15 Thread nragon
Eventually I'll have a class named Element which holds an array of Parameter. Do I need a typeinfo, comparator, factory, and serializer for both of them? Thanks

Re: Custom Serializers

2017-09-15 Thread nragon
Sorry, I was discussing this with Stephan before posting it here. Basically, the main wrapper holds an array of a custom object, and because its size can change throughout the stream and users can customize their sources dynamically, it is difficult to create a generic POJO or use a tuple for this p…

Custom Serializers

2017-09-15 Thread nragon
Hi, First of all, great #FF17, really enjoyed it. After attending some of the dataArtisans folks' talks, I realized that serialization should be optimized if there is no way to use supported objects. In my case, users can configure their source online in our application, which gives them the freedom to dy…

Re: Sink - Cassandra

2017-08-28 Thread nragon
Nick, can you send some of your examples using Phoenix? Thanks

Re: SQL API with Table API

2017-08-01 Thread nragon
"No, those are two different queries. " This is enough. The second part does not applies since i'm calculating EventTime from table source. Thanks -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-API-with-Table-API-tp14599p14605.html Sen

SQL API with Table API

2017-08-01 Thread nragon
Hi, Can I expect the output from this:

    Table revenue = tableEnv.sql(
        "SELECT TUMBLE_START(EventTime, INTERVAL '30' MINUTE) as tStart, " +
        "TUMBLE_END(EventTime, INTERVAL '30' MINUTE) as tEnd, " +
        "cID, " +
        "cName, " + …
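
The preview cuts off mid-query; a hedged, complete example of the same shape (the aggregate column, source table name, and GROUP BY are assumptions added to make the query well-formed):

    import org.apache.flink.table.api.Table;

    Table revenue = tableEnv.sql(
        "SELECT TUMBLE_START(EventTime, INTERVAL '30' MINUTE) AS tStart, " +
        "  TUMBLE_END(EventTime, INTERVAL '30' MINUTE) AS tEnd, " +
        "  cID, cName, SUM(amount) AS total " +  // aggregate column is an assumption
        "FROM Orders " +                          // table name is an assumption
        "GROUP BY TUMBLE(EventTime, INTERVAL '30' MINUTE), cID, cName");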

Re: Logback user class

2017-07-26 Thread nragon
I've changed that line and compiled it into lib/. The error remains. I'm running a local cluster with start-local.sh.

Logback user class

2017-07-26 Thread nragon
Hi, I'm executing the following snippet in two different environments:

    StreamExecutionEnvironment streamEnv =
        StreamExecutionEnvironment.createRemoteEnvironment("x", 6123);
    streamEnv.setParallelism(10);
    StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(…
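
A hedged, complete version of the same snippet (host name and jar path are placeholders; createRemoteEnvironment also accepts the user-code jars to ship to the cluster):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.createRemoteEnvironment(
        "jobmanager-host", 6123, "path/to/user-code.jar");
    streamEnv.setParallelism(10);
    StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(streamEnv);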

Re: Flink shaded table API

2017-07-26 Thread nragon
Hi Fabian, It's a dependency problem between our current libraries and Flink's. Thanks

Re: Flink shaded table API

2017-07-25 Thread nragon
Well, it might be Scala conflicts on my client side, since no job is sent to the stream environment. When I remove the print schema or explain, the job is sent and executed properly on the Flink side. Thanks

Re: Flink shaded table API

2017-07-25 Thread nragon
Let's see if I can sample this :P. First I'm reading from Kafka:

    FlinkKafkaConsumer010 consumer = KafkaSource.consumer(this.zookeeper, this.sourceName, 5);
    consumer.assignTimestampsAndWatermarks(KafkaTimestampExtractor.extractor());

Then, converting my object (Data…

Flink shaded table API

2017-07-25 Thread nragon
I'm getting the following error when using the Table API:

    Caused by: java.lang.NoClassDefFoundError: org/apache/flink/shaded/calcite/com/fasterxml/jackson/databind/ObjectMapper
        at org.apache.flink.table.explain.PlanJsonParser.getSqlExecutionPlan(PlanJsonParser.java:32)
        ... 18 common fr…

Re: Detached execution API

2017-07-20 Thread nragon
Yes, something like that. Although, this would be an MVP for this purpose, with minimal impact on the next releases. Moreover, I think clients are too specific to be included in the API, because each user will have their own way to implement a stop, start, or whatever (of course a base client is welcom…

Detached execution API

2017-07-20 Thread nragon
It would be nice to let users deploy detached jobs through the API. For instance, in StreamExecutionEnvironment:

    public JobExecutionResult execute() throws Exception {
        return execute(DEFAULT_JOB_NAME, false);
    }

which keeps backward compatibility, plus

    public abstract JobExecutionResult execute(String jobName, boolean detached) throws Exception;

Re: [POLL] Who still uses Java 7 with Flink ?

2017-07-13 Thread nragon
+1 dropping Java 7

StreamTableSource

2017-07-12 Thread nragon
Hi, I have two streams coming from Kafka which I want to map into the table environment. Because they are not POJOs or tuples, I will have to map them using, for instance, Types.ROW_NAMED. Can I use StreamTableSource and call registerTableSource, or should I use the same code inside getDataStream but call…
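
A hedged sketch of the second approach mentioned: mapping the raw stream to Rows yourself and registering it with the table environment (field names, accessors, and the kafkaStream variable are assumptions):

    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.typeutils.RowTypeInfo;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.types.Row;

    RowTypeInfo rowType = new RowTypeInfo(
        new TypeInformation<?>[] { BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.LONG_TYPE_INFO },
        new String[] { "name", "value" }); // field names are assumptions

    DataStream<Row> rows = kafkaStream
        .map(obj -> {
            Row row = new Row(2);
            row.setField(0, obj.getName());  // hypothetical accessors
            row.setField(1, obj.getValue());
            return row;
        })
        .returns(rowType); // needed because the lambda erases the Row type

    tableEnv.registerDataStream("MyTable", rows);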

Re: Latest spark yahoo benchmark

2017-06-26 Thread nragon
Yes, indeed. That's why we chose Flink instead of all the others. This post was just pure curiosity, to see Spark trying to migrate into a pure streaming engine.

Re: Related datastream

2017-06-22 Thread nragon
I believe I could try a micro-batch approach in order to release some memory. Meaning, if I have to generate 1M records, splitting into 100m each iteration.

Re: Related datastream

2017-06-21 Thread nragon
The reason I'm doing it on a stream is because I can have many records in memory, and I want to execute this on an ordinary laptop. With streaming I can achieve this. So I set my links between a and c to 0..4, meaning each record from a can have between 0 and 4 records; same for b. But for now let's co…

Related datastream

2017-06-21 Thread nragon
Hi, I have two datastreams, dataStreamA and dataStreamB. Is there any chance to generate a dataStreamC with fields from dataStreamA and dataStreamB? P.S.: I'm trying to simulate a relational database model and generate data; dataStreamC has foreign keys from dataStreamA and dataStreamB. Thanks
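
A hedged sketch of one way to combine fields from two streams using connect() (element types and the pairing logic are placeholders; with no key to join on, the CoFlatMapFunction just pairs against whatever state it keeps):

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
    import org.apache.flink.util.Collector;

    DataStream<Tuple2<String, String>> dataStreamC = dataStreamA
        .connect(dataStreamB)
        .flatMap(new CoFlatMapFunction<String, String, Tuple2<String, String>>() {
            private String lastA; // last element seen from A (simplistic, non-keyed state)

            @Override
            public void flatMap1(String a, Collector<Tuple2<String, String>> out) {
                lastA = a;
            }

            @Override
            public void flatMap2(String b, Collector<Tuple2<String, String>> out) {
                if (lastA != null) {
                    out.collect(Tuple2.of(lastA, b)); // "foreign key" pairing is illustrative
                }
            }
        });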

Re: Kafka and Flink integration

2017-06-21 Thread nragon
So, serialization between producer application -> Kafka -> Flink Kafka consumer will use Avro, Thrift, or Kryo, right? From there, the remaining pipeline can just use standard POJO serialization, which would be better?

RE: Kafka and Flink integration

2017-06-20 Thread nragon
Just one more question :). Considering I'm producing into Kafka with an application other than Flink, which serializer should I use in order to get POJO types when consuming those same messages (now in Flink)?
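
A hedged sketch of consuming externally produced messages as POJOs by supplying a DeserializationSchema to the Flink Kafka consumer (the payload format, MyPojo, and its fromBytes helper are assumptions):

    import java.io.IOException;
    import org.apache.flink.streaming.util.serialization.AbstractDeserializationSchema;

    public class MyPojoDeserializationSchema extends AbstractDeserializationSchema<MyPojo> {
        @Override
        public MyPojo deserialize(byte[] message) throws IOException {
            // decode however the producer encoded it (Avro, JSON, Kryo, ...)
            return MyPojo.fromBytes(message); // hypothetical helper
        }
    }

    // usage:
    // new FlinkKafkaConsumer010<>("topic", new MyPojoDeserializationSchema(), properties);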

Re: Kafka watermarks

2017-06-20 Thread nragon
I guess using BoundedOutOfOrdernessTimestampExtractor inside the consumer will work. Thanks

RE: Kafka and Flink integration

2017-06-20 Thread nragon
Thanks, I'll try to refactor into POJOs.

RE: Kafka and Flink integration

2017-06-20 Thread nragon
Can I have a POJO as a composition of other POJOs? My custom object has many dependencies, and in order to refactor it I must change another 5 classes as well.

Re: Kafka watermarks

2017-06-20 Thread nragon
So, in order to work with event time I have two options: inside the Kafka consumer, or after it. For the first I can use:

    FlinkKafkaConsumer09 consumer = …;
    consumer.assignTimestampsAndWatermarks(…);

The other option:

    FlinkKafkaConsumer09 consumer = …;
    DataStream dataStream = env.addSource(consume…
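
A hedged side-by-side of the two options (the event type MyEvent, its timestamp accessor, and the out-of-orderness bound are placeholders):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    BoundedOutOfOrdernessTimestampExtractor<MyEvent> extractor =
        new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
            @Override
            public long extractTimestamp(MyEvent event) {
                return event.getTimestamp(); // hypothetical accessor
            }
        };

    // Option 1: assign inside the consumer; watermarks are tracked per Kafka
    // partition, which copes better when partitions progress at different speeds.
    consumer.assignTimestampsAndWatermarks(extractor);
    DataStream<MyEvent> withTimestamps = env.addSource(consumer);

    // Option 2: assign after the source, on the already-merged stream.
    DataStream<MyEvent> withTimestamps2 =
        env.addSource(consumer).assignTimestampsAndWatermarks(extractor);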

Kafka watermarks

2017-06-20 Thread nragon
When consuming from Kafka, should we use BoundedOutOfOrdernessTimestampExtractor inside the consumer or after it? Thanks

Re: Kafka and Flink integration

2017-06-20 Thread nragon
Attaching jfr flight_recording_10228245112.jfr

Re: Kafka and Flink integration

2017-06-20 Thread nragon
Do I need to use registerTypeWithKryoSerializer() in my execution environment? My serialization into Kafka is done with the following snippet:

    try (ByteArrayOutputStream byteArrayOutStream = new ByteArrayOutputStream();
         Output output = new Output(byteArrayOutStream)) {
        Kryo kryo = new Kryo();
        …
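
If the same classes also flow through Flink's own serialization stack as generic types, a hedged sketch of registering them with the environment (MyType and MyCustomSerializer are placeholders):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // MyCustomSerializer extends com.esotericsoftware.kryo.Serializer<MyType>
    env.getConfig().registerTypeWithKryoSerializer(MyType.class, MyCustomSerializer.class);
    // or simply register the class to get a stable, compact registration id:
    env.getConfig().registerKryoType(MyType.class);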

Latest spark yahoo benchmark

2017-06-18 Thread nragon
databricks.com/blog/2017/06/06/simple-super-fast-streaming-engine-apache-spark.html Should Flink users be worried about this huge difference? End of micro-batch with the 65M benchmark.

Re: Kafka and Flink integration

2017-06-16 Thread nragon
My custom object is used across the whole job, so it'll be part of checkpoints. Can you point me to some references with some examples?

Kafka and Flink integration

2017-06-16 Thread nragon
I have to produce custom objects into Kafka and read them with Flink. Any tuning advice for using Kryo, such as class registration or something like that? Any examples? Thanks

Re: ReduceFunction mechanism

2017-06-14 Thread nragon
Just an FYI: currently I'm doing map -> keyBy -> reduce, which in fact could be just keyBy -> reduce, since the reduce can contain the map logic.
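
A hedged sketch of folding the map logic into the reduce (events, the key field, and the enrich/combine helpers are placeholders; note that reduce only fires from the second element of a key onward, so per-element transforms often still belong in a map):

    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;

    DataStream<MyEvent> result = events
        .keyBy("key") // field name is an assumption
        .reduce(new ReduceFunction<MyEvent>() {
            @Override
            public MyEvent reduce(MyEvent acc, MyEvent next) {
                MyEvent enriched = enrich(next); // formerly the map step (hypothetical)
                return combine(acc, enriched);   // the original aggregation (hypothetical)
            }
        });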

Re: Java parallel streams vs IterativeStream

2017-06-14 Thread nragon
I'll test with Java 8 streams. Thanks, Nuno

Re: Java parallel streams vs IterativeStream

2017-06-14 Thread nragon
I mentioned Java 8 streams because they avoid leaving the map, thus decreasing network I/O if operators are not chained, and take advantage of multiple CPUs. Guess I will have to test it.

Re: ReduceFunction mechanism

2017-06-14 Thread nragon
My goal is to touch each element before the aggregation, but I could do it in the reduce function and avoid adding another function, which would create more overhead. The reduce method receives the reduced value and a new element, which I would change before applying my aggregation. I'm doing keyBy -> reduce. Using…

Re: Java parallel streams vs IterativeStream

2017-06-13 Thread nragon
That would work, but after the FlatMap<…, T> I would have to downstream all elements into one.

Re: ReduceFunction mechanism

2017-06-13 Thread nragon
So, if my reduce function applies some transformation, I must migrate that transformation to a map before the reduce to ensure it transforms even if there is only one element? I can chain them together and it will be "almost" as if they were in the same function (ensuring same-thread processing)?

Re: Java parallel streams vs IterativeStream

2017-06-13 Thread nragon
Iterate until all elements are changed, perhaps. But I just wanted to know if there are implementations out there using Java 8 streams, in cases where you want to parallelize a map function even if it is function-scoped. So, in my case, if the computation for each list element is too heavy, how can one…

Java parallel streams vs IterativeStream

2017-06-12 Thread nragon
In my map functions I have an object containing a list which must be changed by executing some logic. So, considering Java 8 parallel streams, would it be worth using them, or does IterativeStream offer better performance without the Java 8 parallel-stream overhead? Thanks

ReduceFunction mechanism

2017-06-12 Thread nragon
Hi, Regarding ReduceFunction: is reduce() called when there is only one record for a given key? Thanks

Re: Sink - Cassandra

2017-05-15 Thread nragon
Hi Nick, I'm trying to integrate HBase with streaming. Did you achieve good results (writes/s) with Phoenix? Can you share your code? Thanks, Nuno

Re: Shared DataStream

2017-04-07 Thread nragon
Yeah, I was hoping you'd say that :) It would be nice to share them, just because of the network overhead between Flink and Kafka. Thanks

Shared DataStream

2017-04-06 Thread nragon
Hi, Can we share the endpoint of one job (a DataStream) with another job? The architecture I'm working on should provide abstraction and dynamism between processing jobs, which transform and load data into HBase or another sink, and CEP jobs, which will be used to detect anomalies. But because the da…

Re: Should I decrease the taskmanager.memory.fraction ?

2017-04-06 Thread nragon
Ok, got it :) But since memory segments are good for avoiding GC problems, will streaming jobs suffer from GC when keeping window or any other state?

Re: Should I decrease the taskmanager.memory.fraction ?

2017-04-06 Thread nragon
Like Stephan said, we don't need to worry about memory management with streaming? Meaning, if I have 3 task managers with 2 GB each, memory segments won't be pre-allocated and I will have 2 GB of heap for streaming? Thanks

Re: Should I decrease the taskmanager.memory.fraction ?

2017-04-06 Thread nragon
Hi, Regarding this topic: does Flink use memory segments for window state? Is memory management only "used at its full capacity" for batch? Thanks

Re: Signal Trigger

2017-04-04 Thread nragon
Hi, I believe this "timestamp + this.delay" is the signal event timestamp + the allowed lateness, which in this case I am configuring as EventTimeSessionSignalTrigger.of(this.lateness.toMilliseconds()). So, if the allowed lateness is 10 seconds and the event arrived at 15:10, the event timer would…
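
A hedged sketch of what the custom trigger's timer registration might look like (the EventTimeSessionSignalTrigger class is the poster's own; MyEvent, its isSignal() marker, and the session-gap handling here are assumptions reconstructed from the thread):

    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

    public class EventTimeSessionSignalTrigger extends Trigger<MyEvent, TimeWindow> {
        private final long delay; // allowed lateness in ms

        private EventTimeSessionSignalTrigger(long delay) { this.delay = delay; }

        public static EventTimeSessionSignalTrigger of(long delayMs) {
            return new EventTimeSessionSignalTrigger(delayMs);
        }

        @Override
        public TriggerResult onElement(MyEvent element, long timestamp,
                TimeWindow window, TriggerContext ctx) {
            if (element.isSignal()) { // hypothetical marker on the event
                // fire after the signal's timestamp plus the allowed lateness
                ctx.registerEventTimeTimer(timestamp + delay);
            } else {
                // otherwise close with the session gap, like a session window
                ctx.registerEventTimeTimer(window.maxTimestamp());
            }
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.FIRE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public void clear(TimeWindow window, TriggerContext ctx) {
            ctx.deleteEventTimeTimer(window.maxTimestamp());
        }
    }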

Re: In-Memory data grid

2017-04-04 Thread nragon
My concern and my requirement are that the cache must be shared with other jobs or any other application. If a user or job changes a value in Ignite, it must be updated in every Flink job that uses that cache. Thanks

Re: In-Memory data grid

2017-04-03 Thread nragon
Just to add a scenario. My current architecture is the following: I've deployed 4 Ignite nodes in YARN and 5 task managers with 2 GB and 2 slots each. As a cache on Ignite I have one record in key/value form (String, Object[]). My throughput without Ignite is 30k/sec; when I add the lookup I get 3k/sec. My ques…

Signal Trigger

2017-04-03 Thread nragon
Hi Stephan, My use case is the following: every time an event is processed with a signal event, the window must be fired after the allowed lateness. If no signal event arrives, the window must close after the gap, like in a session window. I'm registering a timer for signal + allowed lateness. Hope yo…

In-Memory data grid

2017-03-31 Thread nragon
I would like to know if there is anyone who has used an IMDG with Flink. Trying to see whether Apache Ignite, Apache Geode, or Hazelcast might be a good choice. Thanks, Nuno