Hi Vino,
Another use case would be that I want to build a DAG of batch sources, sinks
and transforms, and I want to schedule the jobs periodically. One could say it
is similar to Airflow, but a Flink API would be a lot better!
Sent from my iPhone
> On Jan 10, 2020, at 6:42 PM, vino yang wrote:
>
>
> Hi kant,
Hi Vino,
I am new to Flink. I was thinking more of a DAG builder API where I can build
a DAG of sources, sinks and transforms, and hopefully Flink takes care of the
entire life cycle of the DAG.
An example would be the CDAP pipeline API.
Sent from my iPhone
> On Jan 10, 2020, at 6:42 PM, vino yang
Hi kant,
Can you provide more context about your question? What do you mean by
"pipeline API"?
IMO, you can build an ETL pipeline by composing several Flink transform
APIs. Which transform APIs to choose depends on your business
logic.
Here is the list of the generic APIs. [1]
Best,
Vino
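Vino's suggestion above, composing sources, transforms, and sinks into a pipeline, can be sketched with a toy builder. This is not a Flink or CDAP API, just a minimal stdlib illustration of the fluent builder pattern being asked about; all class and method names here are made up.

```python
# Toy pipeline builder: NOT a Flink API, just an illustration of
# chaining a source, transforms, and a sink into a small linear DAG.

class Pipeline:
    def __init__(self, source):
        self._source = source          # callable producing an iterable
        self._transforms = []          # transform functions, applied in order

    def transform(self, fn):
        self._transforms.append(fn)
        return self                    # return self to allow fluent chaining

    def sink(self, collector):
        # Run the whole pipeline eagerly and push results into the sink.
        for record in self._source():
            for fn in self._transforms:
                record = fn(record)
            collector.append(record)
        return collector

out = []
Pipeline(lambda: range(5)) \
    .transform(lambda x: x * 2) \
    .transform(lambda x: x + 1) \
    .sink(out)
print(out)  # [1, 3, 5, 7, 9]
```

A real engine would of course build a lazy graph and hand it to a scheduler instead of executing eagerly; the sketch only shows the shape of the API.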
Hi All,
I am wondering if there are pipeline APIs for ETL?
Thanks!
Thanks Till, I will do some tests on this. Will this be a public
feature in the next release version or later?
Till Rohrmann wrote on Fri, Jan 10, 2020 at 6:15 AM:
> Hi,
>
> you would need to set the co-location constraint in order to ensure that
> the sub-tasks of operators are deployed to the same machine
Thanks Zhijiang, it looks like serialization will always be there in a keyed
stream.
Zhijiang wrote on Fri, Jan 10, 2020 at 12:08 AM:
> Only chained operators can avoid record serialization cost, but the
> chaining mode can not support keyed stream.
> If you want to deploy downstream with upstream in the same task manager
Thank you both for the suggestions.
I did a bit more analysis using the UI and identified at least one
problem that's occurring with the job right now. Going to fix it first and
then take it from there.
*Problem that I identified:*
I'm running with a parallelism of 26. For the checkpoints that are expiring,
one o
The error we get is the following:
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy
Yarn session cluster
at
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:423)
at
org.apache.flink.client.c
Hi Kostas,
I didn’t see a follow-up to this, and have also run into this same issue of
winding up with a bunch of .inprogress files when a bounded input stream ends
and the job terminates.
When StreamingFileSink.close() is called, shouldn't all buckets get
auto-rolled, so that the .inprogress
Hi:
I have a few questions about how state is shared among processors in Flink.
1. If I have a processor instantiated in the Flink app, and I use it
multiple times in the job - (a) if the tasks are in the same slot, do
they share the same processor instance on the taskmanager?
(b) if the tasks
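A toy sketch of the usual model in a parallel dataflow engine (hedged: this illustrates the general pattern, not Flink's exact internals): each parallel subtask works on its own copy of the operator, typically produced by serializing the operator at deployment time, so instance fields are not shared across subtasks.

```python
import copy

class Processor:
    def __init__(self):
        self.count = 0               # per-instance state
    def process(self, record):
        self.count += 1
        return record

prototype = Processor()
# Each parallel subtask gets its own copy of the operator
# (a real engine would ship a serialized copy to each task).
subtasks = [copy.deepcopy(prototype) for _ in range(3)]

subtasks[0].process("a")
subtasks[0].process("b")
subtasks[1].process("c")

print([t.count for t in subtasks])  # [2, 1, 0] -- state is not shared
```

Shared state across subtasks therefore has to go through an explicit mechanism (in Flink's case, keyed or operator state), never through plain instance fields.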
The error you got is due to an older ASM version, which is fixed for 1.10
in https://issues.apache.org/jira/browse/FLINK-13467 .
On 10/01/2020 15:58, KristoffSC wrote:
Hi,
Yangze Guo, Chesnay Schepler thank you very much for your answers.
I have actually a funny setup.
So I have a Flink Job mod
Hi,
Yangze Guo, Chesnay Schepler thank you very much for your answers.
I have actually a funny setup.
So I have a Flink Job module, generated from Flink's maven archetype.
This module has all operators and Flink environment config and execution.
This module is compiled by maven with "maven.compil
Hi Yangze,
Thanks for your reply. Those are the docs I have read and followed. (I was
also able to set up a standalone Flink cluster with secure HDFS, ZooKeeper
and Kafka.)
Could you please let me know what I am missing? Thanks
Best,
Ethan
> On Jan 10, 2020, at 6:28 AM, Yangze Guo wrote:
>
Hi,
Interesting! What problem are you seeing when you don't unset that
environment variable? From reading UserGroupInformation.java our code
should almost work when that environment variable is set.
Best,
Aljoscha
On 10.01.20 15:23, Juan Gentile wrote:
Hello Aljoscha!
The way we send the
Hello Aljoscha!
The way we send the DTs to Spark is by setting an env variable
(HADOOP_TOKEN_FILE_LOCATION), which interestingly we have to unset when we
run Flink, because even if we do kinit, that variable somehow affects Flink
and it doesn't work.
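A small sketch of the unset step Juan describes (the token path below is hypothetical, and the child process stands in for the Flink client): removing HADOOP_TOKEN_FILE_LOCATION from the environment before launching the process hides it from that process.

```python
import os
import subprocess
import sys

env = dict(os.environ)
env["HADOOP_TOKEN_FILE_LOCATION"] = "/tmp/tokens"  # hypothetical token file

# Equivalent of `unset HADOOP_TOKEN_FILE_LOCATION` before running Flink:
env.pop("HADOOP_TOKEN_FILE_LOCATION", None)

# The child process (here just Python, standing in for the Flink client)
# no longer sees the variable.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print('HADOOP_TOKEN_FILE_LOCATION' in os.environ)"],
    env=env, capture_output=True, text=True,
)
print(child.stdout.strip())  # False
```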
I’m not an expert but what you describe (We wou
Hi,
you would need to set the co-location constraint in order to ensure that
the sub-tasks of operators are deployed to the same machine. It effectively
means that subtasks a_i, b_i of operator a and b will be deployed to the
same slot. This feature is not super well exposed but you can take a look
Hi,
it seems I hit send too early; my mail was missing a small part. This is
the full mail again:
to summarize and clarify various emails: currently, you can only use
Kerberos authentication via tickets (i.e. kinit) or keytab. The relevant
bit of code is in the Hadoop security module [1]. Her
Hi Juan,
to summarize and clarify various emails: currently, you can only use
Kerberos authentication via tickets (i.e. kinit) or keytab. The relevant
bit of code is in the Hadoop security module: [1]. Here you can see that
we either use keytab.
I think we should be able to extend this to al
Hi
For an expired checkpoint, you can find something like "Checkpoint xxx of job
xx expired before completing" in jobmanager.log; then you can go to the
checkpoint UI to find which tasks did not ack, and go to these tasks to see
what happened.
If the checkpoint was declined, you can find something
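A minimal stdlib sketch of scanning a jobmanager log for the expiry message mentioned above. The log line format is taken from the quoted message; real logs also carry timestamps and logger names, and the sample lines here are invented.

```python
import re

# Sample lines as they might appear in jobmanager.log (invented examples;
# the "expired before completing" phrasing is from the message above).
log_text = """\
INFO  Triggering checkpoint 41 of job 7a9b
INFO  Checkpoint 41 of job 7a9b expired before completing
INFO  Completed checkpoint 42 of job 7a9b
"""

expired = [
    line for line in log_text.splitlines()
    if re.search(r"Checkpoint \d+ of job \S+ expired before completing", line)
]
print(expired)  # the single expired-checkpoint line
```

In practice you would read the real log file instead of the inline sample, then take the checkpoint IDs found there to the checkpoint UI to see which tasks never acknowledged.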
Hi, Ethan
You could first check your cluster following this guide [1] and check if
all the related config [2] is set correctly.
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/security-kerberos.html
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#secu
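For reference, the Kerberos-related entries in flink-conf.yaml typically look like the fragment below. The keytab path and principal are placeholders; check the config reference in [2] for the authoritative key list and defaults.

```yaml
# flink-conf.yaml -- Kerberos settings (placeholder values)
security.kerberos.login.use-ticket-cache: true
security.kerberos.login.keytab: /path/to/user.keytab
security.kerberos.login.principal: user@EXAMPLE.COM
# components that should use the Kerberos credentials
security.kerberos.login.contexts: Client,KafkaClient
```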
Hi KristoffSC
As Zhu said, Flink enables slot sharing [1] by default. This feature has
nothing to do with the resources of your cluster. The benefit of this
feature is described in [1] as well. I mean, it will not detect how many
slots are in your cluster and adjust its behavior to this number. If
you
In regards to what we test:
We run our tests against Java 8 *and* Java 11, with the compilation and
testing being done with the same JDK.
In other words, we don't check whether Flink compiled with JDK 8 runs on
JDK 11, but we currently have no reason to believe that there is a
problem (and som
Hi Krzysztof
All the tests run with Java 11 after FLINK-13457 [1]. Its fix version
is set to 1.10, so I fear 1.9.1 is not guaranteed to run on
Java 11. I suggest you wait for the release of 1.10.
[1]https://issues.apache.org/jira/browse/FLINK-13457
Best,
Yangze Guo
On Fri, Jan 10, 2020 a
Hi Eva
If the checkpoint failed, please check the web UI or the jobmanager log to
see why it failed; it might have been declined by some specific task.
If the checkpoint expired, you can also check the web UI to see which tasks
did not respond in time; some hot task might not be able to respond in time.
Gen
Hi Zhu Zhu,
well, in my last test I did not change the job config, so I did not change
the parallelism level of any operator and I did not change the policy
regarding slot sharing (it stays as the default one). Operator chaining is
set to true without any extra actions like "start new chain, disable chain e
Hi,
Thank you for your answer. Btw, it seems that you sent the reply only to my
address and not to the mailing list :)
I'm looking forward to trying out 1.10-rc then.
Regarding the second thing you wrote, that
*"on Java 11, all the tests(including end to end tests) would be run with
Java 11 profile now.
Hi sunfulin,
Looks like the error happened in the sink instead of the source.
Caused by: java.lang.NullPointerException: Null result cannot be used for
atomic types.
at DataStreamSinkConversion$5.map(Unknown Source)
So the point is how you wrote to the sink. Can you share that code?
Best,
Jin
I don't see a particular reason why you see this behavior. Chesnay's
explanation is the only plausible way that this behavior can happen.
I fear that without a specific log, we cannot help further.
On Fri, Jan 10, 2020 at 5:06 AM Jayant Ameta wrote:
> Also, the ES version I'm using is 5.6.7
>
>
Only chained operators can avoid the record serialization cost, but chaining
can not support keyed streams.
If you deploy the downstream operator in the same task manager as the
upstream one, you can avoid the network shuffle cost, which still brings a
performance benefit.
As I know @Till Rohrmann has impleme
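The distinction Zhijiang draws above can be sketched with a toy example (hedged: Python's pickle stands in for Flink's serializers, and the records are invented; only the presence or absence of a ser/de step matters, not the specific calls). Records handed between chained operators are passed by reference in the same thread, while records crossing a task boundary must be serialized and deserialized.

```python
import pickle

record = {"key": "user-42", "value": list(range(100))}

def upstream(r):
    return r

def downstream(r):
    return len(r["value"])

# Chained: downstream receives the upstream output object directly,
# no serialization involved.
chained_result = downstream(upstream(record))

# Across a task boundary: the record is serialized, shipped over the
# network, and deserialized before the downstream operator sees it.
wire_bytes = pickle.loads  # noqa: placeholder to keep names obvious
wire = pickle.dumps(upstream(record))
shuffled_result = downstream(pickle.loads(wire))

print(chained_result, shuffled_result)  # 100 100 -- same result, extra cost
```

Same answer either way; the shuffled path just pays for the ser/de round trip, which is why keyed streams (which force a shuffle) cannot skip serialization even when the tasks land in the same task manager.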