Re: Flink not reading from Kafka

2017-02-17 Thread Mohit Anchlia
Interestingly enough, the same job runs OK on Linux but not on Windows. On Fri, Feb 17, 2017 at 4:54 PM, Mohit Anchlia wrote: > I have this code trying to read from a topic however the flink process > comes up and waits forever even though there is data in the topic. Not sure > why? Has anyone else se

Flink not reading from Kafka

2017-02-17 Thread Mohit Anchlia
I have this code trying to read from a topic; however, the Flink process comes up and waits forever even though there is data in the topic. Not sure why? Has anyone else seen this problem? StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(); Properties propertie
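For readers hitting the same symptom, here is a minimal sketch of consumer properties that commonly explain this behavior. The broker address, group id, and topic name below are assumptions for a typical local setup, not taken from the original code.

```java
import java.util.Properties;

// Sketch of Kafka consumer properties for a local Flink job; all values are
// illustrative assumptions, not from the original mail.
public class KafkaProps {
    public static Properties consumerProperties() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");
        p.setProperty("group.id", "flink-test");
        // A consumer group with no committed offsets starts from the *latest*
        // offsets by default, so a job can appear to "wait forever" even though
        // the topic already contains data. "earliest" reads existing records.
        p.setProperty("auto.offset.reset", "earliest");
        return p;
    }

    public static void main(String[] args) {
        // With Flink on the classpath the consumer would be wired in roughly as:
        //   StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
        //   env.addSource(new FlinkKafkaConsumer09<>("topic", new SimpleStringSchema(), consumerProperties())).print();
        //   env.execute("kafka-read");  // nothing runs until execute() is called
        System.out.println(consumerProperties());
    }
}
```

Another thing worth double-checking in this situation is that `env.execute()` is actually reached, since without it the topology is never started.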

Re: Aggregation problem.

2017-02-17 Thread Fabian Hueske
Hi, this looks like a bug to me. Can you open a JIRA and maybe add a small test case to reproduce the issue? Thank you, Fabian 2017-02-18 1:06 GMT+01:00 Kürşat Kurt : > Hi; > > I have a Dataset like this: > > (0,Auto,0.4,1,5.8317538999854194E-5) > (0,Computer,0.2,1,4.8828125E-5) > >

Aggregation problem.

2017-02-17 Thread Kürşat Kurt
Hi; I have a Dataset like this: (0,Auto,0.4,1,5.8317538999854194E-5) (0,Computer,0.2,1,4.8828125E-5) (0,Sports,0.4,2,1.7495261699956258E-4) (1,Auto,0.4,1,1.7495261699956258E-4) (1,Computer,0.2,1,4.8828125E-5) (1,Sports,0.4,1,5.8317538999854194E-5) This code: ds.groupBy(0).max(4).pr
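A plain-Java illustration (no Flink) of the semantics that likely underlie this report: `max(field)` only guarantees the aggregated field per group, while `maxBy(field)` keeps the entire winning row, which is usually what is wanted here. The record and method names are mine.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Demonstrates maxBy-style grouping on the dataset from the mail.
public class MaxByDemo {
    // One row of the dataset: (label, category, weight, count, score)
    public record Row(int label, String category, double weight, int count, double score) {}

    // Equivalent of ds.groupBy(0).maxBy(4): per label, the full row with the largest score.
    public static List<Row> maxByScore(List<Row> ds) {
        return ds.stream()
                .collect(Collectors.toMap(Row::label, r -> r,
                        (a, b) -> a.score() >= b.score() ? a : b))
                .values().stream()
                .sorted(Comparator.comparingInt(Row::label))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> ds = List.of(
                new Row(0, "Auto", 0.4, 1, 5.8317538999854194e-5),
                new Row(0, "Computer", 0.2, 1, 4.8828125e-5),
                new Row(0, "Sports", 0.4, 2, 1.7495261699956258e-4),
                new Row(1, "Auto", 0.4, 1, 1.7495261699956258e-4),
                new Row(1, "Computer", 0.2, 1, 4.8828125e-5),
                new Row(1, "Sports", 0.4, 1, 5.8317538999854194e-5));
        maxByScore(ds).forEach(System.out::println);
    }
}
```

With `max(4)`, by contrast, the non-aggregated fields of the returned tuple are not guaranteed to come from the same input row as the maximum.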

Re: Performance tuning

2017-02-17 Thread Dmitry Golubets
Hi Daniel, I've implemented a macro that generates MessagePack serializers in our codebase. The resulting code is basically a series of writes/reads, as in hand-written structured serialization. E.g. given case class Data1(str: String, subdata: Data2) case class Data2(num: Int) serialization code
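A hand-written sketch (in Java rather than the Scala of the mail) of what such generated serialization code boils down to: one write/read per field, with nested types inlined. The actual macro output is not shown in the thread, so this is only an approximation of the shape it describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Structured field-by-field serialization, the kind of code a macro would emit.
public class StructuredSerDemo {
    public record Data2(int num) {}
    public record Data1(String str, Data2 subdata) {}

    public static byte[] write(Data1 d) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(d.str());           // field 1: the string
        out.writeInt(d.subdata().num()); // nested case class, inlined as its fields
        out.flush();
        return buf.toByteArray();
    }

    public static Data1 read(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return new Data1(in.readUTF(), new Data2(in.readInt()));
    }

    public static void main(String[] args) throws IOException {
        Data1 original = new Data1("hello", new Data2(42));
        System.out.println(read(write(original)).equals(original)); // true
    }
}
```

The appeal of this approach over reflective serializers is that there is no per-record type inspection at runtime; the field order is fixed at code-generation time.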

Re: How important is 'registerType'?

2017-02-17 Thread Dmitry Golubets
Hi Till, It happened during deserialization of a savepoint. Best regards, Dmitry On Fri, Feb 17, 2017 at 2:48 PM, Till Rohrmann wrote: > Hi Dmitry, > > curious to know when exactly you observed the IllegalStateException. Did > it happen after resuming from a savepoint or did it already happen

blob store defaults to /tmp and files get deleted

2017-02-17 Thread Shannon Carey
A few of my jobs recently failed and showed this exception: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load user class: org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09 ClassLoader info: URL ClassLoader: file: '/tmp/blobStore-5f023409-6af5-4de6-8ed0
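One mitigation for this class of failure is to move the blob store off /tmp (where the OS or a cleanup daemon may delete files under a long-running process) via flink-conf.yaml. The key below exists in Flink; the path value is only an example, not from the original report.

```yaml
# flink-conf.yaml -- point blob storage at a durable directory instead of /tmp.
# The path below is an illustrative example.
blob.storage.directory: /var/flink/blobs
```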

Re: Performance tuning

2017-02-17 Thread Daniel Santos
Hello Dmitry, Could you please elaborate on your tuning of environment.addDefaultKryoSerializer(..)? I'm interested in knowing what you have done there for a boost of about 50%. Some small or simple example would be very nice. Thank you very much in advance. Kind Regards, Daniel Sa

Re: Performance tuning

2017-02-17 Thread Shannon Carey
One network setting is mentioned here: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html#controlling-latency From: Dmitry Golubets Date: Friday, February 17, 2017 at 6:43 AM To: user@flink.apache.org Subject: Performance tun

Re: Cartesian product over windows

2017-02-17 Thread Sonex
Hi Till, when you say parallel windows, what do you mean? Do you mean the use of timeWindowAll which has all the elements of a window in a single task? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Cartesian-product-over-windows-tp11676p11

Re: How important is 'registerType'?

2017-02-17 Thread Till Rohrmann
Hi Dmitry, curious to know when exactly you observed the IllegalStateException. Did it happen after resuming from a savepoint or did it already happen during the first run of the program? If the latter is the case, then this might indicate a bug where we don’t use the correct ExecutionConfig to in

Re: Flink batch processing fault tolerance

2017-02-17 Thread Aljoscha Krettek
@Anton, these are the ideas I was mentioning (in the FLIP) and I'm afraid I have nothing more to add. On Fri, 17 Feb 2017 at 06:26 wangzhijiang999 wrote: > yes, it is really a critical problem for large batch jobs because > unexpected failure is a common case. > And we are already focusing on

Re: Clarification: use of AllWindowedStream.apply() function

2017-02-17 Thread Aljoscha Krettek
Yes, you're correct. :-) On Thu, 16 Feb 2017 at 14:24 nsengupta wrote: > Thanks, Aljoscha for the clarification. > > I understand that instead of using a flatMap() in the way I am using, I am > better off using : > * a fold (init, fold_func, window_func) first and then > * map to a different typ

Re: Reliable Distributed FS support (HCFS)

2017-02-17 Thread Aljoscha Krettek
Hi, I think atomic rename is not part of the requirements. I'll add +Stephan who recently wrote this document in case he has any additional input. Cheers, Aljoscha On Thu, 16 Feb 2017 at 23:28 Vijay Srinivasaraghavan wrote: > Following up on my question regarding backed Filesystem (HCFS) > req

Re: "Job submission to the JobManager timed out" on EMR YARN cluster with multiple jobs

2017-02-17 Thread Geoffrey Mon
Hi Gordon, I was using a Flink session that lasted as long as the plan jar was still running (which I believe would be a "per job yarn cluster"), by submitting a command to EMR that looked like: flink run -m yarn-cluster -yn 5 [jar] [jar arguments] Cheers, Geoffrey On Fri, Feb 17, 2017 at 12:09

Re: Checkpointing with RocksDB as statebackend

2017-02-17 Thread Vinay Patil
Hi Guys, There seems to be some issue with RocksDB memory utilization. Within a few minutes of the job running, the physical memory usage increases by 4-5 GB and keeps on increasing. I have tried different options for Max Buffer Size (30MB, 64MB, 128MB, 512MB) and Min Buffer to Merge as 2, but the physic

Performance tuning

2017-02-17 Thread Dmitry Golubets
Hi, My streaming job cannot benefit much from parallelization, unfortunately. So I'm looking for things I can tune in Flink to make it process a sequential stream faster. So far in our current engine based on Akka Streams (non-distributed, ofc) we have 20k msg/sec. Ported to Flink, I'm getting 14k so

Re: How important is 'registerType'?

2017-02-17 Thread Dmitry Golubets
Hi, I was using cs.knownDirectSubclasses recursively to find and register subclasses, which may have resulted in an ordering mess. Later I changed that to cs.knownDirectSubclasses.toList.sortBy(_.fullName), which should have fixed the order. But either it didn't, or there was another problem,

Re: Streaming data from MongoDB using Flink

2017-02-17 Thread Pedro Monteiro
Dear Gordon, Till, Thank you so much for your helpful answers. I managed to solve my problem with your guidelines. Much appreciated, keep up the good work! Cheers, Pedro Lima Monteiro On 17 February 2017 at 10:10, Tzu-Li (Gordon) Tai wrote: > Sorry, I just realized I didn’t no

Re: Akka 2.4

2017-02-17 Thread Till Rohrmann
Hi Dmitry, not sure if everything works out of the box when bumping the Akka version. Please keep us updated if you should try it out. Cheers, Till On Thu, Feb 16, 2017 at 6:20 PM, Ted Yu wrote: > Please see FLINK-3662 > > On Thu, Feb 16, 2017 at 9:01 AM, Dmitry Golubets > wrote: > >> Hi, >>

Re: Reading compressed XML data

2017-02-17 Thread Till Rohrmann
Hi Sebastian, file input splits basically define the region of a file which a subtask will read. Thus, your file input format would have to break up the bzip2 file exactly at the border of compressed blocks when generating the input file splits. Otherwise a subtask won't be able to decompress it.
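A hypothetical sketch of the split generation described above: each input split must start exactly at a compressed-block boundary, or a subtask cannot decompress its region. The block offsets would come from scanning the bzip2 file for block headers; here they are simply passed in, and all names are mine.

```java
import java.util.ArrayList;
import java.util.List;

// Turns a list of compressed-block start offsets into block-aligned splits.
public class BlockAlignedSplits {
    public record Split(long start, long length) {}

    public static List<Split> splitsOnBlockBoundaries(List<Long> blockOffsets, long fileLength) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < blockOffsets.size(); i++) {
            long start = blockOffsets.get(i);
            // A split ends where the next block starts (or at end of file),
            // so no split ever cuts through the middle of a compressed block.
            long end = (i + 1 < blockOffsets.size()) ? blockOffsets.get(i + 1) : fileLength;
            splits.add(new Split(start, end - start));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(splitsOnBlockBoundaries(List.of(0L, 100L, 250L), 400L));
    }
}
```

In a real input format this logic would live in the split-generation phase, with one or more whole blocks assigned per subtask.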

Re: Streaming data from MongoDB using Flink

2017-02-17 Thread Tzu-Li (Gordon) Tai
Sorry, I just realized I didn’t notice the second part question of your last email when replying. Thanks Till for answering it! On February 17, 2017 at 6:05:58 PM, Till Rohrmann (trohrm...@apache.org) wrote: Dear Gordon, Thanks for your help, I think I am on the right track as of now. On the

Re: Streaming data from MongoDB using Flink

2017-02-17 Thread Till Rohrmann
Hi Pedro, in order to add new sources you have to first stop the job (maybe taking a savepoint if you want to resume later on) and then restart the job with the changed topology. Cheers, Till On Thu, Feb 16, 2017 at 4:06 PM, Tzu-Li (Gordon) Tai wrote: > Good to know! > > > On February 16, 2017

Re: Cartesian product over windows

2017-02-17 Thread Till Rohrmann
Hi Ioannis, with a flatMap operation which replicates elements and assigning them a proper key followed by a keyBy operation you can practically generate all different kinds of partitionings. So if you first collect the data in parallel windows, you can then replicate half of the data of each win
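A plain-Java sketch of the suggested pattern (all names are mine): a flatMap-like step replicates each element into every key group, so a keyBy on the generated key brings elements together, and each group then forms pairs, as a window apply would in Flink. This naive version broadcasts everything to every group; the optimization hinted at in the mail replicates only part of the data.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Replicate-then-keyBy construction of a Cartesian product across key groups.
public class CartesianDemo {
    // flatMap-like step: emit each element once per key group.
    public static <T> List<SimpleEntry<Integer, T>> replicate(List<T> data, int groups) {
        List<SimpleEntry<Integer, T>> out = new ArrayList<>();
        for (T x : data)
            for (int k = 0; k < groups; k++)
                out.add(new SimpleEntry<>(k, x));
        return out;
    }

    // keyBy + window-apply analogue: group by key, pair everything within a group.
    public static <T> Map<Integer, List<List<T>>> productPerGroup(List<SimpleEntry<Integer, T>> keyed) {
        Map<Integer, List<T>> byKey = new HashMap<>();
        for (SimpleEntry<Integer, T> e : keyed)
            byKey.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        Map<Integer, List<List<T>>> pairs = new HashMap<>();
        byKey.forEach((k, vs) -> {
            List<List<T>> p = new ArrayList<>();
            for (T a : vs) for (T b : vs) p.add(List.of(a, b));
            pairs.put(k, p);
        });
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(productPerGroup(replicate(List.of(1, 2), 2)).get(0));
    }
}
```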

Re: CSV sink partitioning and bucketing

2017-02-17 Thread Fabian Hueske
Hi Flavio, Flink does not come with an OutputFormat that creates buckets. It should not be too hard to implement this in Flink, though. However, if you want a solution fast, I would try the following approach: 1) Search for a Hadoop OutputFormat that buckets Strings based on a key. 2) Implement
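A local sketch of the bucketing idea (no Flink or Hadoop involved, and all names are mine): write each CSV row into a subfolder named after the bucketing attribute. A custom OutputFormat would do essentially the same per parallel subtask.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Writes CSV rows into one subfolder per value of the bucketing attribute.
public class BucketedCsvWriter {
    public record Row(String bucket, String csvLine) {}

    public static void writeBucketed(List<Row> rows, Path baseDir) throws IOException {
        Map<String, StringBuilder> buckets = new LinkedHashMap<>();
        for (Row r : rows)
            buckets.computeIfAbsent(r.bucket(), b -> new StringBuilder())
                   .append(r.csvLine()).append('\n');
        for (Map.Entry<String, StringBuilder> e : buckets.entrySet()) {
            Path dir = baseDir.resolve(e.getKey());   // one subfolder per attribute value
            Files.createDirectories(dir);
            Files.write(dir.resolve("part-0.csv"),
                        e.getValue().toString().getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("buckets");
        writeBucketed(List.of(new Row("a", "1,x"), new Row("b", "2,y"), new Row("a", "3,z")), tmp);
        System.out.print(Files.readString(tmp.resolve("a").resolve("part-0.csv")));
    }
}
```

In a parallel setting each subtask would write its own `part-N` file per bucket, so file names would carry the subtask index.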

Re: Can't run flink on yarn on version 1.2.0

2017-02-17 Thread Bruno Aranda
Hi Howard, We run Flink 1.2 on YARN without issues. Sorry I don't have a specific solution, but are you sure you don't have some sort of Flink version mix? In your logs I can see: The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please

Can't run flink on yarn on version 1.2.0

2017-02-17 Thread Howard,Li(vip.com)
Hi, I’m trying to run Flink on YARN using the command: bin/flink run -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar But I got the following error: 2017-02-17 15:52:40,746 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar

CSV sink partitioning and bucketing

2017-02-17 Thread Flavio Pompermaier
Hi to all, in my use case I need to output my Row objects as CSV into an output folder on HDFS, creating/overwriting subfolders based on an attribute (for example, creating a subfolder for each value of a specified column). Then, it could be interesting to bucket the data inside those fold