RE: Could not build the program from JAR file.

2018-08-07 Thread Florian Simond
Hi Gary, It's not working with the latest version, 1.5.2. Sorry about that 😊 Florian From: Gary Yao Sent: Tuesday, 7 August 2018 08:28 To: Florian Simond Cc: user@flink.apache.org Subject: Re: Could not build the program from JAR file. Hi Florian, You write tha

Re: Could not build the program from JAR file.

2018-08-07 Thread vino yang
Hi Florian, The error message is caused by a FileNotFoundException, see here[1]. Is there any more information about the exception? Are you sure the jar exists? [1]: https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L209 Than

Re: connection failed when running flink in a cluster

2018-08-07 Thread Felipe Gutierrez
Worked! This was exactly the problem. I have to set the IP, otherwise it does not accept the jobs that I submit. Even if I map the IP to localhost in the /etc/hosts file and the command "ping localhost" returns my IP, it does not work. It is mandatory to use --hostname. Thanks Gary. Best Regards

RE: Could not build the program from JAR file.

2018-08-07 Thread Florian Simond
Yes, the jar exists. Both the WordCount example and my custom jar. I also just redownloaded and reinstalled Flink 1.5.2 as I was thinking maybe a dependency or something was missing... But still the same error... From: vino yang Sent: Tuesday, 7 August 2018 09

RE: Could not build the program from JAR file.

2018-08-07 Thread Florian Simond
In the log, I can see that the first exception is a warning; not sure if it is important. The second one seems to be the culprit: it tries to find a file named "-yn"??? 2018-08-07 09:16:04,776 WARN org.apache.flink.client.cli.CliFrontend - Could not load CLI class org.apache.flink.

Re: Accessing source table data from hive/Presto

2018-08-07 Thread Fabian Hueske
Hi Mugunthan, this depends on the type of your job. Is it a batch or a streaming job? Some queries could be ported to Flink's SQL API as suggested by the link that Hequn posted. In that case, the query would be executed in Flink. Other options are to use a JDBC InputFormat or persisting the resul

Increase parallelism according to the number of tasks

2018-08-07 Thread Mich Talebzadeh
Hi, In my test environment I have two task managers. I would like to increase the parallelism to 2 from the default of 1. Can it be done through properties, i.e. properties.setProperty("parallelism", "2")? That does not change anything, though. Thanks, Dr Mich Talebzadeh

Re: Increase parallelism according to the number of tasks

2018-08-07 Thread Mich Talebzadeh
This worked: streamExecEnv.setParallelism(2). Dr Mich Talebzadeh
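A minimal sketch of the working approach, for reference. The Properties-based attempt presumably had no effect because Flink does not read a "parallelism" key from a plain Properties object; the setter on the execution environment is the documented route:

    import org.apache.flink.streaming.api.scala._

    // Set the default parallelism on the execution environment itself.
    val streamExecEnv = StreamExecutionEnvironment.getExecutionEnvironment
    streamExecEnv.setParallelism(2)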

Re: Access to Kafka Event Time

2018-08-07 Thread Aljoscha Krettek
Hi Vishal, to answer the original question: it should not be assumed that mutations of the element will be reflected downstream. For your situation this means that you have to use a ProcessFunction to put the timestamp of a record into the record itself. Also, Flink 1.6 will come with the next
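A minimal sketch of what Aljoscha suggests, assuming a hypothetical Event case class (the record type and field names are made up, not from the thread); ctx.timestamp() exposes the element's assigned event-time timestamp, which the function copies into the record:

    import org.apache.flink.streaming.api.functions.ProcessFunction
    import org.apache.flink.util.Collector

    case class Event(payload: String, ts: Long) // hypothetical record type

    class StampTimestamp extends ProcessFunction[Event, Event] {
      override def processElement(value: Event,
                                  ctx: ProcessFunction[Event, Event]#Context,
                                  out: Collector[Event]): Unit = {
        // ctx.timestamp() is the record's assigned event-time timestamp;
        // it may be null if no timestamps were assigned upstream.
        val ts = ctx.timestamp()
        out.collect(if (ts != null) value.copy(ts = ts.longValue()) else value)
      }
    }

Applied with stream.process(new StampTimestamp) on the Kafka-sourced stream.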

Re: Could not build the program from JAR file.

2018-08-07 Thread vino yang
Hi Florian, The log you gave before prints the phrase "Use the help option (-h or --help) to get help on the command." and shows "-yn" being treated as the jar file path. There is a problem with the parameter specification. Can you share the full client log and your submit command?

Getting max directory error !

2018-08-07 Thread Puneet Kinra
Hi, Max directory error. -- Cheers, Puneet Kinra

RE: Could not build the program from JAR file.

2018-08-07 Thread Florian Simond
Hi Vino, Here is the full log: https://pastebin.com/A9KQimSL The command used is the following: > ./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 > ./examples/batch/WordCount.jar It is the demo command described here:

Re: Could not build the program from JAR file.

2018-08-07 Thread Gary Yao
Hi Florian, Can you run export HADOOP_CLASSPATH=`hadoop classpath` before submitting the job [1]? Moreover, you should not use the -yn parameter. Beginning with Flink 1.5, the number of TaskManagers is not fixed anymore. Best, Gary [1] https://ci.apache.org/projects/flink/flink-docs-rele
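Combining both suggestions with the command quoted earlier in the thread, the submission would look roughly like this (memory sizes are the thread's own example values; -yn is simply dropped):

    export HADOOP_CLASSPATH=`hadoop classpath`
    ./bin/flink run -m yarn-cluster -yjm 1024 -ytm 4096 ./examples/batch/WordCount.jar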

Re: Getting max directory error !

2018-08-07 Thread Chesnay Schepler
We're gonna need more information to help you. On 07.08.2018 11:44, Puneet Kinra wrote: > Hi, Max directory error. -- Cheers, Puneet Kinra

RE: Could not build the program from JAR file.

2018-08-07 Thread Florian Simond
Indeed, that's the solution. It was done automatically before with 1.4.2, that's why I missed that part... Do you have any pointers about the dynamic number of TaskManagers? I'm curious to know how it works. Is it possible to still fix it? Thank you, Florian

Re: Flink job is not reading from certain kafka topic partitions

2018-08-07 Thread Tzu-Li (Gordon) Tai
Hi, The case you described looks a lot like this issue with the Flink Kafka Consumer in 1.3.0 / 1.3.1: https://issues.apache.org/jira/browse/FLINK-7143 If this is the case, you would have to upgrade to 1.3.2 or above to overcome this. The issue ticket description contains some things to keep in m

Re: Could not build the program from JAR file.

2018-08-07 Thread Gary Yao
You can find more information about the re-worked deployment model here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077 TaskManagers are started and shut down according to the slot requirements of the jobs. It is possible to return to the old behavior by setting m
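For readers hitting the truncation above: the setting being referred to is presumably the deployment-mode switch in flink-conf.yaml. A sketch, with the exact key treated as an assumption to verify against the linked wiki page:

    # flink-conf.yaml -- assumed key for restoring the pre-FLIP-6 behavior
    # (fixed number of TaskManagers, as in Flink 1.4)
    mode: legacy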

RE: Could not build the program from JAR file.

2018-08-07 Thread Florian Simond
Thank you! So is it now normal that it takes around 5 minutes to start processing? The job is reading from Kafka and writing back into another Kafka topic. When I start the job, it takes roughly 5 minutes before I get something in the output topic. I see a lot of 2018-08-07 12:20:34,672 IN

Re: Using a custom DeserializationSchema with Kafka and Python

2018-08-07 Thread Chesnay Schepler
This doesn't work since the FlinkKafkaConsumer010 isn't aware that the given deserializer is a Jython class. Jython classes have to be serialized in a specific way (as seen in AbstractPythonUDF). For this to work you'll need to create a (Java!) wrapper around the FlinkKafkaConsumer010 that ser

Re: Could not build the program from JAR file.

2018-08-07 Thread Gary Yao
Hi Florian, 5 minutes sounds too slow. Are you starting multiple "per-job clusters" at the same time? How many slots do you configure per TM? After you submit the job, how many resources do you have left in your YARN cluster? It might be that you are affected by FLINK-9455 [1]: Flink requests unn

RE: Could not build the program from JAR file.

2018-08-07 Thread Florian Simond
Hi Gary, No, I am not starting multiple "per-job clusters". I didn't configure anything regarding the number of slots per TM, so I guess it uses the default value (1). But on the YARN UI I see that the number of "running containers" varies a lot (13, then 1, then 8, then 2, then 27, then 6, etc...)

RE: FlinkCEP and scientific papers ?

2018-08-07 Thread Esa Heikkinen
There was one good example of a pattern query in the attached paper, written in the SASE+ language. Could you explain how to do that with FlinkCEP in Scala? Or is it possible? That SQL and CEP integration would also be very interesting, but when will it be ready to use? BR Esa From: vino yang Sent: Monday, Ju

Re: FlinkCEP and scientific papers ?

2018-08-07 Thread Timo Walther
Hi Esa, the SQL/CEP integration might be part of Flink 1.7. The discussion has just been started again [1]. Regards, Timo [1] https://issues.apache.org/jira/browse/FLINK-6935 On 07.08.18 at 15:36, Esa Heikkinen wrote: There was one good example of pattern query in the paper made by SASE+

[ANNOUNCE] Weekly community update #32

2018-08-07 Thread Till Rohrmann
Dear community, this is the weekly community update thread #32. Please post any news and updates you want to share with the community to this thread. # Flink 1.6.0 voting period about to end The voting period for the 4th RC of Flink 1.6.0 will end Wednesday 6:30pm CET [1]. So far there seem to b

Passing the individual table column values to the local variables

2018-08-07 Thread Mich Talebzadeh
Hi, The following works fine tableEnv.registerDataStream("priceTable", splitStream, 'key, 'ticker, 'timeissued, 'price) val result = tableEnv.scan("priceTable").filter('ticker === "VOD" && 'price > 99.0).select('key, 'ticker, 'timeissued, 'price) val r = result.toDataStream[Row] r.
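For readers following the thread, a self-contained sketch of the pattern above, assuming the Scala Table API on Flink 1.5+ (where the table-to-stream conversion is toAppendStream rather than the older toDataStream; the sample tuples are made up):

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.table.api.TableEnvironment
    import org.apache.flink.table.api.scala._
    import org.apache.flink.types.Row

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = TableEnvironment.getTableEnvironment(env)

    // Hypothetical sample data standing in for splitStream from the thread.
    val splitStream = env.fromElements(
      ("k1", "VOD", "10:05:01", 104.5),
      ("k2", "IBM", "10:05:02", 98.2))

    tableEnv.registerDataStream("priceTable", splitStream,
      'key, 'ticker, 'timeissued, 'price)
    val result = tableEnv.scan("priceTable")
      .filter('ticker === "VOD" && 'price > 99.0)
      .select('key, 'ticker, 'timeissued, 'price)

    // Convert the filtered table back to a stream of Rows.
    val rows: DataStream[Row] = result.toAppendStream[Row]
    rows.print()
    env.execute("filter prices") // required, as pointed out later in the thread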

Re: Could not build the program from JAR file.

2018-08-07 Thread Gary Yao
Hi Florian, Thank you for the logs. They look indeed strange but I cannot reproduce this behavior. From the logs I can see that the ResourceManager is requesting containers with different resource profiles (2048mb or 1024mb memory): Requesting new TaskExecutor container with resources . Numbe

Re: Accessing source table data from hive/Presto

2018-08-07 Thread srimugunthan dhandapani
Thanks for the reply. I was mainly thinking of the use case of a streaming job. In the approach of porting to Flink's SQL API, is it possible to read Parquet data from S3 and register a table in Flink? On Tue, Aug 7, 2018 at 1:05 PM, Fabian Hueske wrote: > Hi Mugunthan, > > this depends on the type of

RE: Could not build the program from JAR file.

2018-08-07 Thread Florian Simond
Hi Gary, Good intuition... yarn.scheduler.minimum-allocation-mb is set to 2048 :) I specified -ytm 2048 and -yjm 2048 and the job started right away; I will also try again later to see if it's not luck. Thanks a lot! Regarding the version, it is still 0.1, and about that I have no clue. I dow

Re: Passing the individual table column values to the local variables

2018-08-07 Thread Mich Talebzadeh
I need this operation to store filtered rows in an HBase table. I can access an existing HBase table through the Flink API; my challenge is to put rows into the HBase table. Something like below, and I don't seem to be able to extract individual column values from priceTable: val k
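A hedged sketch of one way to do the writes at the DataStream level, using a RichSinkFunction and the plain HBase client API. The table name, column family, and tuple type are assumptions matching the thread's schema, not code from the thread:

    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Put, Table}
    import org.apache.hadoop.hbase.util.Bytes

    // Sketch: write each (key, ticker, timeissued, price) row as one HBase Put.
    class HBasePriceSink extends RichSinkFunction[(String, String, String, Double)] {
      @transient private var connection: Connection = _
      @transient private var table: Table = _

      override def open(parameters: Configuration): Unit = {
        connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        table = connection.getTable(TableName.valueOf("priceTable")) // assumed table name
      }

      override def invoke(value: (String, String, String, Double)): Unit = {
        val (key, ticker, timeissued, price) = value
        val put = new Put(Bytes.toBytes(key))
        // "cf" is an assumed column family.
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("ticker"), Bytes.toBytes(ticker))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("timeissued"), Bytes.toBytes(timeissued))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("price"), Bytes.toBytes(price))
        table.put(put)
      }

      override def close(): Unit = {
        if (table != null) table.close()
        if (connection != null) connection.close()
      }
    }

Attaching it is then just rows.addSink(new HBasePriceSink) on the filtered stream; a TableSink, as suggested later in the thread, is the more idiomatic Table API route.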

Filter-Join Ordering Issue

2018-08-07 Thread Dylan Adams
I'm trying to use the Flink DataSet API to validate some records and have run into an issue. My program uses joins to validate inputs against reference data. One of the attributes I'm validating is optional, and only needs to be validated when non-NULL. So I added a filter to prevent the null-keyed

JDBCInputFormat and SplitDataProperties

2018-08-07 Thread Alexis Sarda
Hi everyone, I have the following scenario: I have a database table with 3 columns: a host (string), a timestamp, and an integer ID. Conceptually, what I'd like to do is: group by host and timestamp -> based on all the IDs in each group, create a mapping to n new tuples -> for each unique tuple,
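As a starting point for this question, a hedged sketch of a parallel JDBC read with flink-jdbc; the driver, URL, query, and per-host splits are placeholders, and whether the downstream grouping can exploit the splits via SplitDataProperties is exactly what the question asks:

    import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
    import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
    import org.apache.flink.api.java.io.jdbc.split.GenericParameterValuesProvider
    import org.apache.flink.api.java.typeutils.RowTypeInfo
    import org.apache.flink.api.scala._
    import org.apache.flink.types.Row

    val env = ExecutionEnvironment.getExecutionEnvironment

    // One parameter array per split -- here, a hypothetical split per host.
    val splits: Array[Array[java.io.Serializable]] =
      Array(Array("host-a"), Array("host-b"))

    val rowType = new RowTypeInfo(
      BasicTypeInfo.STRING_TYPE_INFO, // host
      BasicTypeInfo.LONG_TYPE_INFO,   // timestamp
      BasicTypeInfo.INT_TYPE_INFO)    // id

    val inputFormat = JDBCInputFormat.buildJDBCInputFormat()
      .setDrivername("org.postgresql.Driver")        // placeholder driver
      .setDBUrl("jdbc:postgresql://db:5432/metrics") // placeholder URL
      .setQuery("SELECT host, ts, id FROM events WHERE host = ?")
      .setParametersProvider(new GenericParameterValuesProvider(splits))
      .setRowTypeInfo(rowType)
      .finish()

    implicit val rowInfo: TypeInformation[Row] = rowType
    val rows = env.createInput(inputFormat)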

VerifyError when running Python streaming job

2018-08-07 Thread Joe Malt
Hi, I'm running into errors when trying to run a Flink streaming program. Running the WordCount example from the docs fails with this error: java.lang.VerifyError: (class: site$py, method: _

Re: Access to Kafka Event Time

2018-08-07 Thread Vishal Santoshi
Thanks a lot! Awesome that 1.6 will have the ts of the element. On Tue, Aug 7, 2018, 4:19 AM Aljoscha Krettek wrote: > Hi Vishal, > > to answer the original question: it should not be assumed that mutations of > the element will be reflected downstream. For your situation this means > that you h

Re: Working out through individual messages in Flink

2018-08-07 Thread Mich Talebzadeh
Hi Fabian, Reading your notes above, I have converted the table back to a DataStream. val tableEnv = TableEnvironment.getTableEnvironment(streamExecEnv) tableEnv.registerDataStream("priceTable", splitStream, 'key, 'ticker, 'timeissued, 'price) val key = tableEnv.scan("priceTable"

Re: Working out through individual messages in Flink

2018-08-07 Thread Jörn Franke
Hi Mich, Would it be possible to share the full source code? I am missing a call to streamExecEnvironment.execute Best regards > On 8. Aug 2018, at 00:02, Mich Talebzadeh wrote: > > Hi Fabian, > > Reading your notes above I have converted the table back to DataStream. > > val tableEnv

Re: Working out through individual messages in Flink

2018-08-07 Thread Jörn Franke
(At the end of your code) > On 8. Aug 2018, at 00:29, Jörn Franke wrote: > > Hi Mich, > > Would it be possible to share the full source code? > I am missing a call to streamExecEnvironment.execute > > Best regards > >> On 8. Aug 2018, at 00:02, Mich Talebzadeh wrote: >> >> Hi Fabian, >>

Re: Working out through individual messages in Flink

2018-08-07 Thread Mich Talebzadeh
Hi Jörn, Thanks, I uploaded the Scala code to my GitHub --> md_streaming.scala https://github.com/michTalebzadeh/Flink Regards, Dr Mich Talebzadeh

Re: Running SQL to print to Std Out

2018-08-07 Thread 네이버
> On 6 Aug 2018, at 23:34, Mich Talebzadeh wrote: > > Awesome, thanks both. > > I can see it now on task manager stdout

unsubcscribe

2018-08-07 Thread 네이버
> On 7 Aug 2018, at 06:51, Timo Walther wrote: > > Hi Esa, > > the SQL/CEP integration might be part of Flink 1.7. The discussion has just > been started again [1]. > > Regards, > Timo

checkpoint recovery behavior when kafka source is set to start from timestamp

2018-08-07 Thread Yan Zhou [FDS Science]
Hi Experts, In my application, the Kafka source is set to start from a specified timestamp, by calling the method FlinkKafkaConsumer010#setStartFromTimestamp(long startupOffsetsTimestamp). If the application has run a while and then recovers from a checkpoint because of a failure, what's the offse

Re: Passing the individual table column values to the local variables

2018-08-07 Thread Hequn Cheng
Hi Mich, We can't convert a DataStream to a value. There are some options: 1. Use a TableSink to write data[1] into Hbase. 2. Use a UDF[2]. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sourceSinks.html#define-a-tablesink [2] https://ci.apache.org/projects/flin
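For option 2, a minimal scalar UDF sketch in the Scala Table API; the function name and logic are illustrative, not from the thread:

    import org.apache.flink.table.functions.ScalarFunction

    // Illustrative scalar function: tag a price as expensive or not.
    class IsExpensive extends ScalarFunction {
      def eval(price: Double): Boolean = price > 99.0
    }

    // Registration (tableEnv as elsewhere in the thread):
    // tableEnv.registerFunction("isExpensive", new IsExpensive)
    // then usable in Table API filters or in SQL queries.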

Re: Passing the individual table column values to the local variables

2018-08-07 Thread vino yang
Hi Mich, Here you need to understand that the print call does not print the value of a field; it is actually an output to a STDOUT sink. So, what you get here is not the value of a variable; please refer to Hequn's recommendation. Thanks, vino. Hequn Cheng wrote on Wed, Aug 8, 2018 at 9:11 AM: > Hi

Re: checkpoint recovery behavior when kafka source is set to start from timestamp

2018-08-07 Thread vino yang
Hi Yan Zhou: I think the Javadoc of the setStartFromTimestamp method explains this very clearly; posted here: /** Specify the consumer to start reading partitions from a specified timestamp. The specified timestamp must be before the current timestamp. This lets the consumer ig
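For completeness, a small usage sketch; the topic and broker address are placeholders. Note that, as the Flink docs describe, the timestamp only governs a fresh start; on recovery from a checkpoint or savepoint the offsets stored in the snapshot take precedence, which answers the question above:

    import java.util.Properties
    import org.apache.flink.api.common.serialization.SimpleStringSchema
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // placeholder broker

    val consumer = new FlinkKafkaConsumer010[String]("events", new SimpleStringSchema, props)
    // Applies only when the job starts fresh; when restoring from a
    // checkpoint or savepoint, the offsets in the snapshot win.
    consumer.setStartFromTimestamp(1533600000000L) // ms since epoch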

Re: checkpoint recovery behavior when kafka source is set to start from timestamp

2018-08-07 Thread Yan Zhou [FDS Science]
Thank you Vino. It is very helpful. From: vino yang Sent: Tuesday, August 7, 2018 7:22:50 PM To: Yan Zhou [FDS Science] Cc: user Subject: Re: checkpoint recovery behavior when kafka source is set to start from timestamp Hi Yan Zhou: I think the java doc of the

Re: Filter-Join Ordering Issue

2018-08-07 Thread vino yang
Hi Dylan, I took a rough look at your job program and the DAG of the job. It seems that the optimizer chose the wrong execution plan. cc Till. Thanks, vino. Dylan Adams wrote on Wed, Aug 8, 2018 at 2:26 AM: > I'm trying to use the Flink DataSet API to validate some records and have > run into an i

Re: Small-files source - partitioning based on prefix of file

2018-08-07 Thread Averell
Thank you Fabian. "In either case, some record will be read twice but if reading position can be reset, you can still have exactly-once state consistency because the state is reset as well." I do not quite understand this statement. If I have read 30 lines from the checkpoint and sent those 30 r

Need help regarding Flink Batch Application

2018-08-07 Thread Ravi Bhushan Ratnakar
Hi Everybody, Currently I am working on a project where I need to write a Flink batch application which has to process around 400GB of compressed sequence files of hourly data. After processing, it has to write the result in compressed Parquet format to S3. I have managed to write the application in Flink and a

unsubscribtion

2018-08-07 Thread 네이버
> On 7 Aug 2018, at 19:42, Yan Zhou [FDS Science] wrote: > > Thank you Vino. It is very helpful.

Re: unsubscribtion

2018-08-07 Thread Timo Walther
Hi, see https://flink.apache.org/community.html#mailing-lists for unsubscribing. Use: user-unsubscr...@flink.apache.org Regards, Timo On 08.08.18 at 08:18, 네이버 wrote: On 7 Aug 2018, at 19:42, Yan Zhou [FDS Science] > wrote: Thank you Vino. It is very helpful