AWS S3 and Redshift

2016-10-04 Thread sunny patel
Hi Guys, We are in the process of creating a proof of concept. I am looking for a sample project - Flink Scala or Java which can load data from database to database or CSV to a relational database (any): CSV --> S3 --> AWS Redshift. Could someone please advise me on that.. C

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-10-04 Thread Tzu-Li (Gordon) Tai
Hi Steffen, Turns out that FLINK-4514 just missed Flink 1.1.2 and wasn’t included in the release (I’ll update the resolve version in JIRA to 1.1.3, thanks for noticing this!). The Flink community is going to release 1.1.3 asap, which will include the fix. If you don’t want to wait for the releas

Re: Presented Flink use case in Japan

2016-10-04 Thread Hironori Ogibayashi
Thank you for the response. Regarding adding to the page, I will check with our PR department. Regards, Hironori On 2016/10/04 at 21:12, Fabian Hueske wrote: > Thanks Hironori for sharing this excellent news! > Do you think it would be possible to add your use case to Flink's Powered-By > wiki page

Re: Flink job fails to restore RocksDB state after upgrading to 1.2-SNAPSHOT

2016-10-04 Thread Stephan Ewen
I think Josh found a "WIP" bug. The code is very much in flux because of the new feature that allows changing the parallelism with which savepoints are resumed. The "user code class loader" is not yet properly used in the operator state backend when reloading snapshot state. This will be integrat

Re: Flink Checkpoint runs slow for low load stream

2016-10-04 Thread Chakravarthy varaga
Thanks for your prompt response Stephan. I'd wait for Flink 1.1.3 !!! Best Regards Varaga On Tue, Oct 4, 2016 at 5:36 PM, Stephan Ewen wrote: > The plan is to release 1.1.3 asap ;-) > > Waiting for the last backported patches to get in, then release testing and > release. > > If you want to te

Re: Exception while running Flink jobs (1.0.0)

2016-10-04 Thread Stephan Ewen
It would be great to know if this only occurs in setups where Netty is involved (more than one TaskManager, and at least one shuffle/rebalance) or also in one-taskmanager setups (which have local channels only). Stephan On Tue, Oct 4, 2016 at 11:49 AM, Till Rohrmann wrote: > Hi Tarandeep, >

Re: Regarding Late Elements

2016-10-04 Thread vinay patil
Yes Kostas, thank you for the explanation, I will take a look. Regards, Vinay Patil On Tue, Oct 4, 2016 at 11:23 AM, Kostas Kloudas [via Apache Flink User Mailing List archive.] wrote: > Hi Vinay, > > These methods are useful when using your trigger with SessionWindows. When > using session win

Re: Flink Checkpoint runs slow for low load stream

2016-10-04 Thread Stephan Ewen
The plan is to release 1.1.3 asap ;-) Waiting for the last backported patches to get in, then release testing and release. If you want to test it today, you would need to manually build the release-1.1 branch. Best, Stephan On Tue, Oct 4, 2016 at 5:46 PM, Chakravarthy varaga < chakravarth...@gmail

Re: Handling decompression exceptions

2016-10-04 Thread Stephan Ewen
How about just overriding the "readLine()" method to call "super.readLine()" and catching EOF exceptions? On Tue, Oct 4, 2016 at 5:56 PM, Fabian Hueske wrote: > Hi Yassine, > > AFAIK, there is no built-in way to ignore corrupted compressed files. > You could try to implement a FileInputFormat th
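The idea Stephan sketches above can be demonstrated outside of Flink with plain java.io/java.util.zip: a reader that catches the EOFException a truncated GZip stream raises in readLine() and keeps whatever lines were decoded before the corruption. The class and method names below are invented for illustration; in a real job this logic would live inside the FileInputFormat wrapper Fabian describes, not in a standalone helper.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TolerantGzipLineReader {

    /** Reads lines from a GZip stream, keeping what was decoded before a truncation. */
    public static List<String> readLines(InputStream rawGzip) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(rawGzip), "UTF-8"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (EOFException e) {
            // Corrupted/truncated archive: swallow the error and return what we have.
        }
        return lines;
    }

    /** Demo helper: gzip-compress a string in memory. */
    public static byte[] gzipBytes(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes("UTF-8"));
        }
        return bos.toByteArray();
    }
}
```

Cutting bytes off the 8-byte GZip trailer makes GZIPInputStream throw EOFException only after the deflate payload has been inflated, so the already-decoded lines survive; a file cut mid-payload would simply yield fewer lines.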

Re: Regarding Late Elements

2016-10-04 Thread Kostas Kloudas
Hi Vinay, These methods are useful when using your trigger with SessionWindows. When using session windows, the state of a window and that of the corresponding trigger has to be merged with that of other windows. These methods do exactly that: the canMerge() says if the trigger can be used w

Re: Regarding Late Elements

2016-10-04 Thread vinay patil
Hi Kostas, Yes you are right, I am always doing FIRE_AND_PURGE. If we don't do this and only use FIRE, the window function will get the elements in incremental fashion (1, 2, 3... and so on); I had observed this while testing. Can you please explain to me the importance of the canMerge and onMerge functions

Re: Handling decompression exceptions

2016-10-04 Thread Fabian Hueske
Hi Yassine, AFAIK, there is no built-in way to ignore corrupted compressed files. You could try to implement a FileInputFormat that wraps the CsvInputFormat and forwards all calls to the wrapped CsvIF. The wrapper would also catch and ignore the EOFException. If you do that, you would not be able

Re: Regarding Late Elements

2016-10-04 Thread Kostas Kloudas
Hi Vinay, By setting the allowed_lateness to LongMax you are ok. Sorry, I forgot that this was the default value. Just a note (although you have it right in your code): in this case you should always FIRE_AND_PURGE and not just FIRE. In the other case your state will keep growing as it is never ga

Re: Flink Checkpoint runs slow for low load stream

2016-10-04 Thread Chakravarthy varaga
Hi Gordon, Do I need to clone and build the release-1.1 branch to test this? I currently use the Flink 1.1.2 runtime. When is 1.1.3 planned for release? Best Regards Varaga On Tue, Oct 4, 2016 at 9:25 AM, Tzu-Li (Gordon) Tai wrote: > Hi, > > Helping out here: this is the PR for async

Handling decompression exceptions

2016-10-04 Thread Yassine MARZOUGUI
Hi all, I am reading a large number of GZip compressed csv files, nested in a HDFS directory: Configuration parameters = new Configuration(); parameters.setBoolean("recursive.file.enumeration", true); DataSet> hist = env.readCsvFile("hdfs:///shared/logs/") .ignoreFirstLine()

Re: Regarding Late Elements

2016-10-04 Thread vinay patil
Hi Kostas, The late elements are immediately getting triggered with the code I have sent. I have tested it with a test case as follows (I am doing the outer-join operation by doing the union of stream1 and stream2): 1. Push 5 records to Kafka Topic 1 -> sourceStream1 2. Wait for a few minutes -

Re: Regarding Late Elements

2016-10-04 Thread Kostas Kloudas
Hi Vinay, From what I understand from your code, the only difference of your trigger compared to the one shipping with Flink is that for the late elements, instead of firing and keeping the element, you fire and purge, i.e. clean the window state. This does not solve the problem of dropping t

Re: Blobstorage Locally and on HDFS

2016-10-04 Thread Konstantin Knauf
Hi Ufuk, any ideas? Any configuration that could be wrong? Cheers, Konstantin On 30.09.2016 13:13, Konstantin Knauf wrote: > Hi Ufuk, > > thanks for your quick answer. > > Setup: 2 Servers, each running a JM as well as TM > > 1) Removing all existing blobstores locally (/tmp) as well as on H

Re: Regarding Late Elements

2016-10-04 Thread vinay patil
Hi Kostas, Thank you for your reply, yes that will be a good functionality to have, but for now the Custom Trigger as close to 1.0.3 works for me. public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception { if(window.maxTimestamp() <= ct

Re: TimeWindow Trigger which only fires when the values have changed

2016-10-04 Thread Kostas Kloudas
Hi Dominik, To only fire when new elements have arrived, you should modify your EventTimeTriggerWithEarlyAndLateFiring to detect that more elements have arrived since the last firing. To do so, you should add some extra state, e.g. a ValueStateDescriptor, that you set to true in the onElement(
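The flag logic Kostas describes can be isolated from the Flink APIs. The sketch below (all names invented for illustration) uses a plain boolean where a real Trigger would use a ValueState<Boolean> registered via a ValueStateDescriptor, since trigger state must be managed by Flink to survive failures and rescaling:

```java
// Stand-in for per-window trigger state. In a real Trigger this boolean would be a
// ValueState<Boolean> obtained through TriggerContext.getPartitionedState(...).
public class FireOnNewDataTrigger {
    private boolean sawNewElement = false;

    /** Called for every element added to the window: mark that fresh data arrived. */
    public void onElement() {
        sawNewElement = true;
    }

    /** Called when an early/late firing timer goes off: fire only on fresh data. */
    public boolean onTimer() {
        boolean shouldFire = sawNewElement;
        sawNewElement = false; // reset, so the next timer stays silent unless new data comes
        return shouldFire;
    }
}
```

Resetting the flag inside onTimer() is what prevents the repeated identical firings Dominik observed: a timer that fires between two batches of elements sees the flag already cleared and stays quiet.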

Re: TimeWindow Trigger which only fires when the values have changed

2016-10-04 Thread Till Rohrmann
Hi Dominik, you could extend the EventTimeTriggerWithEarlyAndLateFiring trigger to store for each key whether you’ve seen a new element since the last firing or not. When firing you can set the state back to alreadyFired. For that you can use the TriggerContext.getPartitionedState. The community

TimeWindow Trigger which only fires when the values have changed

2016-10-04 Thread Dominik Bruhn
Hi, I'm heavily relying on TimeWindows for my real time processing. Roughly my job consumes from an AMQP queue, computes some time buckets and saves the time-buckets to Cassandra. I found the EventTimeTriggerWithEarlyAndLateFiring [1] class which already helped me a lot: Even with long time-w

Re: Flink scala or Java - Dataload from CSV to database..

2016-10-04 Thread Flavio Pompermaier
I think you can start from this (using the Flink table-api), I hope it could be helpful. PS: maybe someone could write a blog post on how to do this with Scala, since it's a frequent question on the mailing list... :) public static void main(String[] args) throws Exception { String path

Re: Presented Flink use case in Japan

2016-10-04 Thread Tzu-Li (Gordon) Tai
Really great to hear this! Cheers, Gordon On October 4, 2016 at 8:13:27 PM, Till Rohrmann (trohrm...@apache.org) wrote: It's always great to hear Flink success stories :-) Thanks for sharing it with the community.  I hope Flink helps you to solve even more problems. And don't hesitate to reac

Flink scala or Java - Dataload from CSV to database..

2016-10-04 Thread ram kumar
Hi Guys, We are in the process of creating a POC. I am looking for a sample project - Flink Scala or Java which can load data from database to database or CSV to a relational database (any): CSV --> SQLSERVER --> AWS Redshift. Could someone please help me with that.. Cheers Ram
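Not a full Flink pipeline, but the sink side of such a POC usually boils down to a parameterized JDBC batch insert (Redshift speaks the PostgreSQL wire protocol, and Flink's JDBC output format or the table-api example Flavio posted produces exactly this shape). A dependency-free sketch of deriving the statement from a CSV header line; the table and column names are made up:

```java
public class CsvToSql {

    /** Builds e.g. "INSERT INTO t (a, b) VALUES (?, ?)" from a CSV header line. */
    public static String insertStatement(String table, String headerLine) {
        String[] cols = headerLine.trim().split(",");
        StringBuilder names = new StringBuilder();
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < cols.length; i++) {
            if (i > 0) {
                names.append(", ");
                marks.append(", ");
            }
            names.append(cols[i].trim()); // column name taken verbatim from the header
            marks.append('?');            // one placeholder per column
        }
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")";
    }
}
```

With a real connection the statement would feed PreparedStatement.addBatch()/executeBatch(); for bulk loads into Redshift specifically, staging the CSV on S3 and issuing a COPY command is generally much faster than row-wise JDBC inserts.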

Re: Presented Flink use case in Japan

2016-10-04 Thread Till Rohrmann
It's always great to hear Flink success stories :-) Thanks for sharing it with the community. I hope Flink helps you to solve even more problems. And don't hesitate to reach out to the community whenever you stumble across some Flink problems. Cheers, Till On Tue, Oct 4, 2016 at 2:04 PM, Hironor

Re: Presented Flink use case in Japan

2016-10-04 Thread Fabian Hueske
Thanks Hironori for sharing this excellent news! Do you think it would be possible to add your use case to Flink's Powered-By wiki page [1] ? Thanks, Fabian [1] https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink 2016-10-04 14:04 GMT+02:00 Hironori Ogibayashi : > Hello, > > Just

Presented Flink use case in Japan

2016-10-04 Thread Hironori Ogibayashi
Hello, Just for information: last week I presented our Flink use case at my company's conference. (http://developers.linecorp.com/blog/?p=3992) Here is the slide: http://www.slideshare.net/linecorp/b-6-new-stream-processing-platformwith-apache-flink I think the video with English subtitle

Re: Enriching a tuple mapped from a datastream with data coming from a JDBC source

2016-10-04 Thread Philipp Bussche
Awesome, thanks Fabian ! I will give this a try. Fabian Hueske-2 wrote > Hi Philipp, > > If I got your requirements right you would like to: > 1) load an initial hashmap via JDBC > 2) update the hashmap from a stream > 3) use the hashmap to enrich another stream. > > You can use a CoFlatMap to

Re: Exception while running Flink jobs (1.0.0)

2016-10-04 Thread Till Rohrmann
Hi Tarandeep, it would be great if you could compile a small example data set with which you're able to reproduce your problem. We could then try to debug it. It would also be interesting to know whether Flavio's bug solves your problem or not. Cheers, Till On Mon, Oct 3, 2016 at 10:26 PM, Flavi

Re: Side Inputs vs. Connected Streams

2016-10-04 Thread Till Rohrmann
Hi Sameer, the semantics of side inputs are not fully fleshed out yet, as far as I know. The design doc says that one first waits for some data on the side input before starting to process the main input. Therefore, I would assume that you have some temporal ordering of the side input and main in

Re: Flink job fails to restore RocksDB state after upgrading to 1.2-SNAPSHOT

2016-10-04 Thread Till Rohrmann
Hi Josh, the internal state representation of Kafka sources has been changed recently so that it is now possible to rescale the Kafka sources. That is the reason why the old savepoint which contains the Kafka state in the old representation is not able to be read by the updated Kafka sources. The

Re: Iterations vs. combo source/sink

2016-10-04 Thread Till Rohrmann
Hi Ken, you can let a class implement both the SourceFunction and the SinkFunction. However, when running a job, the source and the sink will be distinct instances. Thus, there is no way for them to share instance variables. What you could do is write the updated and newly discovered URLs to a me
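Till's suggestion — feeding the sink's output back to the source through an external queue rather than shared instance variables — can be sketched with an in-memory Deque standing in for Kafka/AMQP. All names below are invented for illustration; a real job would use actual SourceFunction/SinkFunction implementations against a real queue:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class FeedbackLoop {

    /**
     * Drains the queue the way a source would; "processing" each URL may discover
     * new URLs, which the sink side writes back to the same queue. A visited set
     * terminates the loop on cyclic link graphs.
     */
    public static Set<String> crawl(Deque<String> queue,
                                    Function<String, List<String>> discover) {
        Set<String> visited = new LinkedHashSet<>();
        while (!queue.isEmpty()) {
            String url = queue.poll();          // source side: read the next URL
            if (!visited.add(url)) {
                continue;                        // already processed, skip
            }
            queue.addAll(discover.apply(url));  // sink side: feed discoveries back
        }
        return visited;
    }
}
```

The external queue decouples the two ends: source and sink instances may live on different TaskManagers, and the queue (not shared JVM state) carries the feedback.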

Re: Enriching a tuple mapped from a datastream with data coming from a JDBC source

2016-10-04 Thread Fabian Hueske
Hi Philipp, If I got your requirements right you would like to: 1) load an initial hashmap via JDBC 2) update the hashmap from a stream 3) use the hashmap to enrich another stream. You can use a CoFlatMap to do this: stream1.connect(stream2).flatMap(new YourCoFlatMapFunction). YourCoFlatMapFunc
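The core of steps 2) and 3) above — one operator seeing both streams and sharing a map between its two callbacks — can be isolated from the Flink classes. The class below only mimics the shape of a CoFlatMapFunction (in a real job it would implement Flink's CoFlatMapFunction interface and emit via a Collector, and the map should live in managed state if it must survive failures):

```java
import java.util.HashMap;
import java.util.Map;

public class EnrichingCoFlatMap {
    private final Map<String, String> lookup = new HashMap<>();

    /** flatMap1: the update stream (initially fed from the JDBC snapshot). */
    public void flatMap1(String key, String value) {
        lookup.put(key, value);
    }

    /** flatMap2: the main stream; each record is enriched against the current map. */
    public String flatMap2(String key) {
        return key + "," + lookup.getOrDefault(key, "UNKNOWN");
    }
}
```

One caveat worth noting: there is no ordering guarantee between the two inputs of a connected stream, so main-stream records can arrive before the initial JDBC load has been fully consumed — hence the explicit fallback value above.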

Re: Regarding Late Elements

2016-10-04 Thread Kostas Kloudas
Hello LF and Vinay, With the introduction of “allowed lateness”, elements and windows are kept around until the watermark passes window.maxTimestamp + allowed_lateness, and then they are cleaned up (garbage collected). Every element that comes in and belongs to a window that is garbage collec

Re: Flink Checkpoint runs slow for low load stream

2016-10-04 Thread Tzu-Li (Gordon) Tai
Hi, Helping out here: this is the PR for async Kafka offset committing - https://github.com/apache/flink/pull/2574. It has already been merged into the master and release-1.1 branches, so you can try out the changes now if you’d like. The change should also be included in the 1.1.3 release, whic

Re: Using Flink

2016-10-04 Thread Tzu-Li (Gordon) Tai
Hi Govindarajan, Regarding the stagnant Kafka offsets, it’ll be helpful if you can supply more information on the following to help us identify the cause: 1. What is your checkpointing interval set to? 2. Did you happen to have set the “max.partition.fetch.bytes” property in the properties give

Re: Using Flink

2016-10-04 Thread Till Rohrmann
Hi Govindarajan, you can broadcast the stream with the debug logger information by calling `stream.broadcast`. Then every stream record should be sent to all sub-tasks of the downstream operator. Cheers, Till On Mon, Oct 3, 2016 at 5:13 PM, Govindarajan Srinivasaraghavan < govindragh...@gmail.com> w

Re: Using Flink and Cassandra with Scala

2016-10-04 Thread Fabian Hueske
FYI: FLINK-4497 [1] requests Scala tuple and case class support for the Cassandra sink and was opened about a month ago. [1] https://issues.apache.org/jira/browse/FLINK-4497 2016-09-30 23:14 GMT+02:00 Stephan Ewen : > How hard would it be to add case class support? > > Internally, tuples and cas

Available for freelancing

2016-10-04 Thread Deepak Sharma
Hi, I am a Spark developer with good exposure to Spark streaming/core/SQL. I have just started on Flink and am keen to take any part-time role as a developer, in case anyone needs one. Please do write back if anyone has such an opportunity. -- Thanks Deepak

Re: Controlling savepoints from inside an application

2016-10-04 Thread Ufuk Celebi
On Fri, Sep 30, 2016 at 4:19 PM, Astrac wrote: > What I mean by "manually reading the savepoint" is that rather than > providing the savepoint path via "the --fromSavepoint > hdfs://some-path/to/savepoint" option I'd like to provide it in the code > that initialises the StreamExecutionEnvironment.