Re: Error running on Hadoop 2.7

2018-03-22 Thread Ken Krugler
Hi Ashish, Are you using Flink 1.4? If so, what does the “hadoop classpath” command return from the command line where you’re trying to start the job? Asking because I’d run into issues with https://issues.apache.org/jira/browse/FLINK-7477 , wh
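For context, on a Hadoop-free Flink 1.4 distribution the usual approach is to put the output of `hadoop classpath` onto Flink's classpath before submitting. A sketch of the commands involved (paths and the example jar are assumptions, not taken from this thread):

```
# See what the Hadoop client would put on the classpath
hadoop classpath

# Export it so the Hadoop-free Flink distribution picks it up
export HADOOP_CLASSPATH=$(hadoop classpath)

# Then submit as usual, e.g.:
# ./bin/flink run -m yarn-cluster ./examples/streaming/WordCount.jar
```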

Re: Restart hook and checkpoint

2018-03-22 Thread Ashish Pokharel
Fabian, that sounds good. Should I recap some bullets in an email and start a new thread then? Thanks, Ashish > On Mar 22, 2018, at 5:16 AM, Fabian Hueske wrote: > > Hi Ashish, > > Agreed! > I think the right approach would be to gather the requirements and start a > discussion on the dev m

Re: Error running on Hadoop 2.7

2018-03-22 Thread Ashish Pokharel
Hi All, Looks like we are out of the woods for now (so we think) - we went with the Hadoop-free version and relied on client libraries on the edge node. However, I am still not very confident, as I started digging into that stack as well and realized what Till pointed out (traces lead to a class that

Re: Example PopularPlacesFromKafka fails to run

2018-03-22 Thread James Yu
Just figured out the data format in nycTaxiRides.gz doesn't match the way TaxiRide.java interprets the lines fed into it. Then I checked the exercise training GitHub and found the TaxiRide.java ( https://github.com/dataArtisans/flink-training-exercises/tree/master/src/main/java/com/dataartisans/

Re: Rowtime

2018-03-22 Thread Fabian Hueske
Hi Gregory, Event-time timestamps are handled a bit differently in Flink's SQL compared to the DataStream API. In the DataStream API, timestamps are hidden from the user and implicitly used for time-based operations such as windows. In SQL, the query semantics cannot depend on hidden fields. There

Example PopularPlacesFromKafka fails to run

2018-03-22 Thread James Yu
Hi, I fail to run the PopularPlacesFromKafka example with the following exception, and I wonder what might cause this "Invalid record" error when running within IntelliJ IDEA --> 07:52:23.960 [Source: Custom Source -> Map (7/8)] INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom S

Rowtime

2018-03-22 Thread Gregory Fee
Hello! I have found that even though I am processing in event time and provide an event timestamp for all my events, the events produced in a RetractStream I create from a Table do not have timestamps. That is to say, I put a ProcessFunction on the RetractStream and ctx.timestamp() always r

Re: Lucene SPI class loading fails with shaded flink-connector-elasticsearch

2018-03-22 Thread Till Rohrmann
Hi Manuel, thanks for reporting this issue. It sounds to me like a bug we should fix. I've pulled Gordon into the conversation since he will most likely know more about the ElasticSearch connector shading. Cheers, Till On Thu, Mar 22, 2018 at 5:09 PM, Haddadi Manuel wrote: > Hello, > > When up

Lucene SPI class loading fails with shaded flink-connector-elasticsearch

2018-03-22 Thread Haddadi Manuel
Hello, When upgrading from flink-1.3.2 to flink-1.4.2, I faced this error on runtime of a Flink job : java.util.ServiceConfigurationError: An SPI class of type org.apache.lucene.codecs.PostingsFormat with classname org.apache.lucene.search.suggest.document.Completion50PostingsFormat does not

Re: Error running on Hadoop 2.7

2018-03-22 Thread Kien Truong
Hi Ashish, Yeah, we also had this problem before. It can be solved by recompiling Flink with the HDP version of Hadoop according to the instructions here: https://ci.apache.org/projects/flink/flink-docs-release-1.4/start/building.html#vendor-specific-versions Regards, Kien On 3/22/2018 12:25 AM,
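The vendor-specific build that the linked page describes boils down to a Maven build with the vendor repositories profile enabled. A sketch of the command (the exact HDP version string is an illustrative assumption):

```
# Build Flink against a vendor (HDP) Hadoop release, per the linked docs
mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.7.3.2.6.4.0-91
```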

Re: [ANNOUNCE] Weekly community update #12

2018-03-22 Thread Till Rohrmann
Eron pointed out to me that the Flink improvement proposal 6 (short Flip-6) deserves some more comments since not everyone will be aware of what it actually means. I totally agree and will try to give a bit more context for everyone interested. Flip-6 is intended to solve some of Flink's short

Re: Kafka Consumers Partition Discovery doesn't work

2018-03-22 Thread Tzu-Li (Gordon) Tai
Hi, I think you’ve made a good point: there are currently no logs that tell anything about discovering a new partition. We should probably add this. And yes, it would be great if you can report back on this using either the latest master, release-1.5 or release-1.4 branches. On 22 March 2018 at

Re: Migration to Flip6 Kubernetes

2018-03-22 Thread Till Rohrmann
Hi Edward and Eron, you're right that there is currently no JobClusterEntrypoint implementation for Kubernetes. How this entrypoint looks like mostly depends on how the job is stored and retrieved. There are multiple ways conceivable: - The entrypoint connects to an external system from which it

Re: Kafka Consumers Partition Discovery doesn't work

2018-03-22 Thread Juho Autio
Thanks, that sounds promising. I don't know how to check if it's consuming all partitions? For example I couldn't find any logs about discovering a new partition. However, did I understand correctly that this is also fixed in Flink dev? If yes, I could rebuild my 1.5-SNAPSHOT and try again. On Thu

Re: Kafka Consumers Partition Discovery doesn't work

2018-03-22 Thread Tzu-Li (Gordon) Tai
Hi Juho, Can you confirm that the new partition is consumed, but only that Flink’s reported metrics do not include them? If yes, then I think your observations can be explained by this issue:  https://issues.apache.org/jira/browse/FLINK-8419 This issue should have been fixed in the recently rele

Kafka Consumers Partition Discovery doesn't work

2018-03-22 Thread Juho Autio
According to the docs*, flink.partition-discovery.interval-millis can be set to enable automatic partition discovery. I'm testing this, but apparently it doesn't work. I'm using Flink Version: 1.5-SNAPSHOT Commit: 8395508 and FlinkKafkaConsumer010. I had my flink stream running, consuming an existin
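For reference, the property in question is passed via the consumer's `Properties`. A minimal sketch (the broker address, group id, and interval are illustrative assumptions; the `FlinkKafkaConsumer010` construction is left as a comment since it needs the Flink connector on the classpath):

```java
import java.util.Properties;

public class PartitionDiscoveryConfig {
    public static Properties consumerProperties() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumption
        props.setProperty("group.id", "my-group");                // assumption
        // Check for newly created partitions every 30 seconds
        props.setProperty("flink.partition-discovery.interval-millis", "30000");
        return props;
    }

    public static void main(String[] args) {
        // Would be used like:
        // new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), consumerProperties());
        System.out.println(consumerProperties()
                .getProperty("flink.partition-discovery.interval-millis"));
    }
}
```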

Re: Let BucketingSink roll file on each checkpoint

2018-03-22 Thread XilangYan
Ok, then maybe I have a misunderstanding about checkpoints. I thought Flink uses checkpoints to store offsets, but when the Kafka connector makes a checkpoint it doesn't know whether data is in an in-progress file or a pending file, so a whole offset is saved in the checkpoint. I used to guess, the data in in-p

Re: Connect more than one stream!!

2018-03-22 Thread miki haiat
You can join streams http://training.data-artisans.com/exercises/eventTimeJoin.html On Thu, 22 Mar 2018, 11:36 Puneet Kinra, wrote: > Hi > > Is there any way of connecting multiple streams(more than one) in flink > > -- > *Cheers * > > *Puneet Kinra* > > *Mobile:+918800167808 | Skype : puneet.

Re: how does SQL mode work with PopularPlaces example?

2018-03-22 Thread Fabian Hueske
Hi James, the exercise does not require to filter on pickup events. It says: "This is done by counting every five minutes the number of taxi rides that started and ended in the same area within the last 15 minutes. Arrival and departure locations should be separately counted." That is achieved b

Connect more than one stream!!

2018-03-22 Thread Puneet Kinra
Hi, Is there any way of connecting multiple streams (more than one) in Flink? -- *Cheers * *Puneet Kinra* *Mobile:+918800167808 | Skype : puneet.ki...@customercentria.com * *e-mail :puneet.ki...@customercentria.com *

Re: ListCheckpointed function - what happens prior to restoreState() being called?

2018-03-22 Thread Fabian Hueske
Hi Ken, ListCheckpointed.restoreState() is part of the operator state initialization, but you are right that it is not explicitly mentioned in the docs. That documentation page is rather about the internals and does not directly relate to the public API. I think you can also initialize the variable in

Re: Let BucketingSink roll file on each checkpoint

2018-03-22 Thread Fabian Hueske
Hi, Flink maintains its own Kafka offsets in its checkpoints and does not rely on Kafka's offset management. That way Flink guarantees that read offsets and checkpointed operator state are always aligned. The BucketingSink is designed to not lose any data and the mode of operation is described in

Re: Flink remote debug not working

2018-03-22 Thread Ankit Chaudhary
Thanks a lot Fabian :). I found the information on the configuration page this morning :))) https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html Cheers, Ankit On Thu, Mar 22, 2018 at 9:56 AM, Fabian Hueske wrote: > Hi Ankit, > > The env.java.opts parameter is used for al

Re: Restart hook and checkpoint

2018-03-22 Thread Fabian Hueske
Hi Ashish, Agreed! I think the right approach would be to gather the requirements and start a discussion on the dev mailing list. Contributors and committers who are more familiar with the checkpointing and recovery internals should discuss a solution that can be integrated and doesn't break with

Re: CsvSink

2018-03-22 Thread Fabian Hueske
Great, thanks for reporting back! Best, Fabian 2018-03-20 22:40 GMT+01:00 karim amer : > Never mind I found the error and has nothing to do with flink. > Sorry > > On Tue, Mar 20, 2018 at 12:12 PM, karim amer > wrote: > >> here is the output after fixing the scala issues >> >> https://gist.gith

Re: Flink CEP window for 1 working day

2018-03-22 Thread Fabian Hueske
Hi, I don't think the CEP library is that flexible, but I loop in Kostas (CC) who knows more about it. I'm not exactly sure what you mean by "manipulate" event-time, but I don't think that's necessary. You can implement rules also with state and timers in the ProcessFunction. The function ingests

Re: Flink remote debug not working

2018-03-22 Thread Fabian Hueske
Hi Ankit, The env.java.opts parameter is used for all JVMs started by Flink, i.e., JM and TM. Since the JM process is started before the TM, the port is already in use when you start the TM. You can use env.java.opts.taskmanager to pass parameters only for TM JVMs. Best, Fabian 2018-03-20 14
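The fix Fabian describes is a one-line change in flink-conf.yaml. A sketch, where the debug-agent string is the standard JDWP example and the port is an assumption, not something taken from this thread:

```yaml
# Applies only to TaskManager JVMs, avoiding the port clash with the JobManager
env.java.opts.taskmanager: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
```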

Re: Error running on Hadoop 2.7

2018-03-22 Thread Till Rohrmann
Hi Ashish, the class `RequestHedgingRMFailoverProxyProvider` was only introduced with Hadoop 2.9.0. My suspicion is thus that you start the client with some Hadoop 2.9.0 dependencies on the class path. Could you please check the logs of the client what's on its class path? Maybe you could also sha

Query regarding the ContinuousFileMonitoring operator

2018-03-22 Thread Puneet Kinra
If I set parallelism to 1, it still creates multiple splits while processing. -- *Cheers * *Puneet Kinra*