Re: Filter columns of a csv file with Flink

2018-07-06 Thread Hequn Cheng
Hi francois, > I see that CsvTableSource allows to define csv fields. Then, will it check if columns actually exists in the file and throw Exception if not ? Currently, CsvTableSource doesn't support Avro. CsvTableSource uses fieldDelim and rowDelim to parse data. But there is a workaround: read e

Re: Slide Window Compute Optimization

2018-07-06 Thread Rong Rong
+1. Yes your use case would probably fit best in the OVER aggregate use case. I actually created for myself a complimentary note for some of the complex aggregate components on top of Flink SQL/Table

Re: Description of Flink event time processing

2018-07-06 Thread Elias Levy
Apologies. Comments are now enabled. On Thu, Jul 5, 2018 at 6:09 PM Rong Rong wrote: > Hi Elias, > > Thanks for putting together the document. This is actually a very good, > well-rounded document. > I think you did not to enable access for comments for the link. Would you > mind enabling comme

StateMigrationException when switching from TypeInformation.of to createTypeInformation

2018-07-06 Thread Elias Levy
During some refactoring we changed a job using managed state from: ListStateDescriptor("config", TypeInformation.of(new TypeHint[ConfigState]() {})) to ListStateDescriptor("config", createTypeInformation[ConfigState]) After this change, Flink refused to start the new job from a savepoint or che

Re: flink1.5 web UI

2018-07-06 Thread Vishal Santoshi
It seems it is the UI refresh that forces the loop on the job server. From flink cli it does it once.. So this might be a false alarm. On Fri, Jul 6, 2018 at 4:55 PM, Vishal Santoshi wrote: > The UI shows the following and the JM goes into a convulsions trying to > retrieve a jobiid as above.

Re: flink1.5 web UI

2018-07-06 Thread Vishal Santoshi
The UI shows the following and the JM goes into a convulsions trying to retrieve a jobiid as above. org.apache.flink.client.program.ProgramInvocationException: The main method caused an error. On Fri, Jul 6, 2018 at 4:53 PM, Vishal Santoshi wrote: > If we submit a job through CLI and it has an

flink1.5 web UI

2018-07-06 Thread Vishal Santoshi
If we submit a job through CLI and it has an error ( missing args and so on ) , the JM goes into convulsions. It seems it submits a job without fist validating and then goes into a loop trying to figure out the job Jul 06 16:51:26 flink-9edd15d7.bf2.tumblr.net docker[31171]: at scala.concurrent.fo

Re: Filter columns of a csv file with Flink

2018-07-06 Thread françois lacombe
Hi Hequn, The Table-API is really great. I will use and certainly love it to solve the issues I mentioned before One subsequent question regarding Table-API : I've got my csv files and avro schemas that describe them. As my users can send erroneous files, inconsistent with schemas, I want to chec

Re: [BucketingSink] incorrect indexing of part files, when part suffix is specified

2018-07-06 Thread Rinat
Hi Mingey ! I’ve implemented the group of tests, that shows that problem exists only when part suffix is specified and file in pending state exists here is an exception testThatPartIndexIsIncrementedWhenPartSuffixIsSpecifiedAndPreviousPartFileInProgressState(org.apache.flink.streaming.connector

Re: Filter columns of a csv file with Flink

2018-07-06 Thread Hequn Cheng
Hi francois, If I understand correctly, you can use sql or table-api to solve you problem. As you want to project part of columns from source, a columnar storage like parquet/orc would be efficient. Currently, ORC table source is supported in flink, you can find more details here[1]. Also, there a

Re: Limiting in flight data

2018-07-06 Thread Vishal Santoshi
Further if there is are metrics that allows us to chart delays per pipe on n/w buffers, that would be immensely helpful. On Fri, Jul 6, 2018 at 10:02 AM, Vishal Santoshi wrote: > Awesome, thank you for pointing that out. We have seen stability on pipes > where previously throttling the source (

Re: Limiting in flight data

2018-07-06 Thread Vishal Santoshi
Awesome, thank you for pointing that out. We have seen stability on pipes where previously throttling the source ( rateLimiter ) was the only way out. https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java#L291 This though seems

flink rocksdb where to configure mount point

2018-07-06 Thread Siew Wai Yow
Hi, We configure rocksdb as statebackend and checkpoint dir persists to hdfs. When the job is run, rocksdb automatically mount to tmpfs /tmp, which consume memory. RocksDBStateBackend rocksdb = new RocksDBStateBackend(new FsStateBackend(hdfs://), true); env.setStateBac

Filter columns of a csv file with Flink

2018-07-06 Thread françois lacombe
Hi all, I'm a new user to Flink community. This tool sounds great to achieve some data loading of millions-rows files into a pgsql db for a new project. As I read docs and examples, a proper use case of csv loading into pgsql can't be found. The file I want to load isn't following the same struct

Re: is there a config to ask taskmanager to keep retrying connect to jobmanager after Disassociated?

2018-07-06 Thread Vishal Santoshi
Yep, pwrfect, that we do. Can you confirm though that jobs will restart in the case of a failover ? That is what we see and that is fine.. On Fri, Jul 6, 2018, 8:24 AM Chesnay Schepler wrote: > If i remember correctly the masters file is only used by the > [start|stop]-cluster.sh scripts to det

Re: is there a config to ask taskmanager to keep retrying connect to jobmanager after Disassociated?

2018-07-06 Thread Chesnay Schepler
If i remember correctly the masters file is only used by the [start|stop]-cluster.sh scripts to determine how many JobManagers should be started / stopped and which port they should use. it's not necessarily /required/, but without it you have to manually start/stop all jobmanagers. On 06.07

Re: is there a config to ask taskmanager to keep retrying connect to jobmanager after Disassociated?

2018-07-06 Thread Vishal Santoshi
Even though I must admit that the jobs restart but they do restart successfully with the new JM. On Fri, Jul 6, 2018, 8:08 AM Vishal Santoshi wrote: > Hello Chesnay, I have used an HA setup without the masters file and have > seen failover happen based on alerts from a leader election routi

Re: is there a config to ask taskmanager to keep retrying connect to jobmanager after Disassociated?

2018-07-06 Thread Vishal Santoshi
Hello Chesnay, I have used an HA setup without the masters file and have seen failover happen based on alerts from a leader election routine Is it actually required that there be a masters file when there is a central arbiterer ZK that has the alive JMs and a call back to force TMs to switch t

Re: A use-case for Flink and reactive systems

2018-07-06 Thread Fabian Hueske
Hi Yersinia, let me reply to some of your questions. I think these answers should also address most of Mich's questions as well. > What you are saying is I can either use Flink and forget database layer, or make a java microservice with a database. Mixing Flink with a Database doesn't make any se

Re: Slide Window Compute Optimization

2018-07-06 Thread Fabian Hueske
Hi Yennie, You might want to have a look at the OVER windows of Flink's Table API or SQL [1]. An OVER window computes an aggregate (such as a count) for each incoming record over a range of previous events. For example the query: SELECT ip, successful, COUNT(*) OVER (PARTITION BY ip, successful