Re: Attach remote debugger to task executor

2021-03-15 Thread Reggie Quimosing
Thanks Julian, that worked! I totally missed this in the documentation. On Mon, Mar 15, 2021 at 4:06 PM Jaffe, Julian wrote: > You can use `env.java.opts.taskmanager` to specify java options for the > task managers specifically. Be aware you may want to set `suspend=n` or be > sure to attach yo

Re: Can I use PyFlink together with PyTorch/Tensorflow/PyTorch

2021-03-15 Thread Xingbo Huang
Hi Yik San, Thanks for the investigation of PyFlink together with all these ML libs. IMO, you could refer to the flink-ai-extended project that supports the Tensorflow on Flink, PyTorch on Flink etc, whose repository url is https://github.com/alibaba/flink-ai-extended. Flink AI Extended is a proje

Find many strange measurements in metrics database of influxdb

2021-03-15 Thread Tim yu
Hi all, I run many flink jobs that contains sql, they reports the metrics to infuxdb. I find many strange measurements in metrics database of influxdb, e.g. "from table1". Does sql produce those measurements ? What is the meanings of those measurements ? -- tim yu

Is it possible to mount node local disk for task managers in a k8s application cluster?

2021-03-15 Thread Chen-Che Huang
Hi, We use the per-job deployment mode to deploy our Flink services on Kubernetes. We're considering to move from the per-job mode to the application mode in view of the advantages of the application mode. However, it seems that `bin/flink run-application --target kubernetes-application` does n

Re: Application cluster - Best Practice

2021-03-15 Thread Yang Wang
I have created a ticket FLINK-21807[1] to track this requirement. [1]. https://issues.apache.org/jira/browse/FLINK-21807 Best, Yang Tamir Sagi 于2021年3月16日周二 上午1:11写道: > Hey Yang, > > The operator gave me a good lead by "revealing" that Application Deployer > does exist and there is a way to do

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-15 Thread Alexey Trenikhun
No, I believe original exception was from 1.12.1 to 1.12.1 Thanks, Alexey From: Yun Tang Sent: Monday, March 15, 2021 8:07:07 PM To: Alexey Trenikhun ; Tzu-Li (Gordon) Tai ; user@flink.apache.org Subject: Re: EOFException on attempt to scale up job with RocksDB

Re: Flink + Hive + Compaction + Parquet?

2021-03-15 Thread Kurt Young
Hi Theo, Regarding your first 2 questions, the answer is yes Flink supports streaming write to Hive. And Flink also supports automatically compacting small files during streaming write [1]. (Hive and Filesystem shared the same mechanism to do compaction, we forgot to add a dedicated document for h

Re: Evenly Spreading Out Source Tasks

2021-03-15 Thread Xintong Song
If all the tasks have the same parallelism 36, your job should only allocate 36 slots. The evenly-spread-out-slots option should help in your case. Is it possible for you to share the complete jobmanager logs? Thank you~ Xintong Song On Tue, Mar 16, 2021 at 12:46 AM Aeden Jameson wrote: >

Re: Time Temporal Join

2021-03-15 Thread Satyam Shekhar
Hello folks, I would love to hear back your feedback on this. Regards, Satyam On Wed, Mar 10, 2021 at 6:53 PM Satyam Shekhar wrote: > Hello folks, > > I am looking to enrich rows from an unbounded streaming table by > joining it with a bounded static table while preserving rowtime for the > st

Re: Prefix Seek RocksDB

2021-03-15 Thread Yun Tang
Hi Rex, You could configure prefix seek via RocksDB's column family options [1]. Be careful to use correct prefix extractor. [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#passing-options-factory-to-rocksdb Best Fro

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-15 Thread Yun Tang
Hi, Could you describe what you observed in details? Which states you compare with the session window state "merging-window-set", the "newKeysInState" or "existingKeysInState"? BTW, since we use list state as main state for window operator and we use RocksDB's merge operation for window state

Unit Testing for Custom Metrics in Flink

2021-03-15 Thread Rion Williams
Hi all, Recently, I was working on adding some custom metrics to a Flink job that required the use of dynamic labels (i.e. capturing various counters that were "slicable" by things like tenant / source, etc.). I ended up handling it in a very naive fashion that would just keep a dictionary of met

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-15 Thread Yun Tang
Hi, Can you scale the job at the same version from 1.12.1 to 1.12.1? Best Yun Tang From: Alexey Trenikhun Sent: Tuesday, March 16, 2021 4:46 To: Tzu-Li (Gordon) Tai ; user@flink.apache.org Subject: Re: EOFException on attempt to scale up job with RocksDB state

Re: Checkpoint fail due to timeout

2021-03-15 Thread Alexey Trenikhun
Hi Roman, I took thread dump: "Source: digital-itx-eastus2 -> Filter (6/6)#0" Id=200 BLOCKED on java.lang.Object@5366a0e2 owned by "Legacy Source Thread - Source: digital-itx-eastus2 -> Filter (6/6)#0" Id=202 at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedSt

Re: Why does Flink FileSystem sink splits into multiple files

2021-03-15 Thread Yik San Chan
Thank you, it works. Best, Yik San Chan On Mon, Mar 15, 2021 at 5:30 PM David Anderson wrote: > The first time you ran it without having specified the parallelism, and so > you got the default parallelism -- which is greater than 1 (probably 4 or > 8, depending on how many cores your computer h

Re: Got “pyflink.util.exceptions.TableException: findAndCreateTableSource failed.” when running PyFlink example

2021-03-15 Thread Yik San Chan
Thanks for your help, it works. Best, Yik San Chan On Tue, Mar 16, 2021 at 10:03 AM Xingbo Huang wrote: > Hi, > > The problem is that the legacy DataSet you are using does not support the > FileSystem connector you declared. You can use blink Planner to achieve > your needs. > > >>> > t_env

Re: Got “pyflink.util.exceptions.TableException: findAndCreateTableSource failed.” when running PyFlink example

2021-03-15 Thread Xingbo Huang
Hi, The problem is that the legacy DataSet you are using does not support the FileSystem connector you declared. You can use blink Planner to achieve your needs. >>> t_env = BatchTableEnvironment.create( environment_settings=EnvironmentSettings.new_instance() .in_batch_mode().

Re: Python StreamExecutionEnvironment from_collection Kafka example

2021-03-15 Thread Xingbo Huang
Hi, >From the error message, I think the problem is no python interpreter on your TaskManager machine. You need to install a python 3.5+ interpreter on the TM machine, and this python environment needs to install pyflink (pip install apache-flink). For details, you can refer to the document[1]. [

Prefix Seek RocksDB

2021-03-15 Thread Rex Fenley
Hello! I'm wondering if Flink RocksDB state backend is pre-configured to have Prefix Seeks enabled, such as for Joins and Aggs on the TableAPI [1]? If not, what's the easiest way to configure this? I'd imagine this would be beneficial. Thanks! [1] https://github.com/facebook/rocksdb/wiki/Prefix-

Re: Attach remote debugger to task executor

2021-03-15 Thread Jaffe, Julian
You can use `env.java.opts.taskmanager` to specify java options for the task managers specifically. Be aware you may want to set `suspend=n` or be sure to attach your debugger promptly, otherwise the task manager may time out attempting to connect to the job manager (since it’s waiting for you t

Attach remote debugger to task executor

2021-03-15 Thread Reggie Quimosing
I'm running flink locally via ./start-cluster.sh, and submitting my job via ./flink run . I can attach a debugger to the job client process using either: export JVM_ARGS=-agentlib:jdwp=transport=dt_socket,server

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-15 Thread Alexey Trenikhun
Savepoint was taken with 1.12.1, I've tried to scale up using same version and 1.12.2 From: Tzu-Li (Gordon) Tai Sent: Monday, March 15, 2021 12:06 AM To: user@flink.apache.org Subject: Re: EOFException on attempt to scale up job with RocksDB state backend Hi,

parquet protobuf output and aws athena support

2021-03-15 Thread Jin Yi
using ParquetProtoWriters , does anyone have this working with aws athena ingestion via aws glue crawls? the parquet files being generated by our flink job looks fine

Re: Python StreamExecutionEnvironment from_collection Kafka example

2021-03-15 Thread Robert Cullen
Okay, I added the jars and fixed that exception. However I have a new exception that is harder to decipher: 2021-03-15 14:46:20 org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFail

Re: Working with DataStreams of Java objects in Pyflink?

2021-03-15 Thread Kevin Lam
Hi Shuiqiang Chen, Thanks for the quick response. Oh I see, that's too bad POJO is not currently supported. I'd like to check if I understand your suggestion about RowType. You're suggesting something like: 1/ Define subclasses of RowType in Java/Scala to hold our java objects we want to manipul

Re: Application cluster - Best Practice

2021-03-15 Thread Tamir Sagi
Hey Yang, The operator gave me a good lead by "revealing" that Application Deployer does exist and there is a way to do what I was looking for programmatically. Just a quick note, the operator uses the same Flink methods internally to deploy the application cluster but also provide a way to sub

Re: Evenly Spreading Out Source Tasks

2021-03-15 Thread Aeden Jameson
Hi Xintong, Thanks for replying. Yes, you understood my scenario. Every task has the same parallelism since we're using FlinkSql unless there is a way to change the parallelism of the source task that I have missed. Your explanation of the setting makes sense and is what I ended up concluding

Re: Working with DataStreams of Java objects in Pyflink?

2021-03-15 Thread Shuiqiang Chen
Hi Kevin, Currently, POJO type is not supported in Python DataStream API because it is hard to deal with the conversion between Python Objects and Java Objects. Maybe you can use a RowType to represent the POJO class such as Types.ROW_NAME([id, created_at, updated_at], [Types.LONG(), Types.LONG(),

Re: Python StreamExecutionEnvironment from_collection Kafka example

2021-03-15 Thread Robert Metzger
Hey, are you sure the class is in the lib/ folder of all machines / instances, and you've restarted Flink after adding the files to lib/ ? On Mon, Mar 15, 2021 at 3:42 PM Robert Cullen wrote: > Shuiqiang, > > I added the flink-connector-kafka_2-12 jar to the /opt/flink/lib directory > > When sub

Re: Python StreamExecutionEnvironment from_collection Kafka example

2021-03-15 Thread Robert Cullen
Shuiqiang, I added the flink-connector-kafka_2-12 jar to the /opt/flink/lib directory When submitting this job to my flink cluster I’m getting this stack trace at runtime: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy at org.apache.flink.runti

Working with DataStreams of Java objects in Pyflink?

2021-03-15 Thread Kevin Lam
Hi all, Looking to use Pyflink to work with some scala-defined objects being emitted from a custom source. When trying to manipulate the objects in a pyflink defined MapFunction

Re: Netty LocalTransportException: Sending the partition request to 'null' failed

2021-03-15 Thread Robert Metzger
Hey Matthias, are you sure you can connect to 127.0.1.1, since everything between 127.0.0.1 and 127.255.255.255 is bound to the loopback device?: https://serverfault.com/a/363098 On Mon, Mar 15, 2021 at 11:13 AM Matthias Seiler < matthias.sei...@campus.tu-berlin.de> wrote: > Hi Arvid, > > I l

Re: flink参数问题

2021-03-15 Thread Yik San Chan
Hi lxk, Can you please translate the question to English, and provide more info so that people can help? Thanks. On Mon, Mar 15, 2021 at 2:20 PM lxk7...@163.com wrote: > > 大佬们,我现在flink的版本是flink 1.10,但是我通过-ynm 指定yarn上的任务名称不起作用,一直显示的是Flink per-job > cluster > > -- > l

Re: Questions with State Processor Api

2021-03-15 Thread Roman Khachatryan
Hi Yuri, I think you can achieve this by using "normal" flink operators and sinks. One thing that immediately comes to my mind are timers [1]. It should be simpler to implement and setup rather than with the State Processor API (though it seems doable via this API too). [1] https://ci.apache.org/

Re: Flink Read S3 Intellij IDEA Error

2021-03-15 Thread Robert Metzger
Since this error is happening in your IDE, I would recommend using the IntelliJ debugger to follow the filesystem initialization process and see where it fails to pick up the credentials. On Fri, Mar 12, 2021 at 11:11 PM sri hari kali charan Tummala < kali.tumm...@gmail.com> wrote: > Same error.

Re: Netty LocalTransportException: Sending the partition request to 'null' failed

2021-03-15 Thread Matthias Seiler
Hi Arvid, I listened to ports with netcat and connected via telnet and each node can connect to the other and itself. The `/etc/hosts` file looks like this ``` 127.0.0.1   localhost 127.0.1.1   node-2.example.com   node-2    node-1 ``` Is the second line the reason it fails? I also replaced all

Re: Why does Flink FileSystem sink splits into multiple files

2021-03-15 Thread David Anderson
The first time you ran it without having specified the parallelism, and so you got the default parallelism -- which is greater than 1 (probably 4 or 8, depending on how many cores your computer has). Flink is designed to be scalable, and to achieve that, parallel instances of an operator, such as

Re: Application cluster - Best Practice

2021-03-15 Thread Yang Wang
Hi Tamir, Thanks for sharing the information. I think you are right. If you dive into the code implementation of flink-native-k8s-operator[1], they are just same. @Till Rohrmann Do you think it is reasonable to make "ApplicationDeployer" interface as public? Then I believe it will be great easi

Re: Checkpoint fail due to timeout

2021-03-15 Thread Roman Khachatryan
Hello Alexey, Thanks for the details. It looks like backpressure is indeed the cause of the issue. You can check that by looking at the (succeeded) checkpoint start delay in the tasks following the suspected source (digital-itx-eastus2?). To be sure, you can take a thread dump (or profile) those

Re: Flink + Hive + Compaction + Parquet?

2021-03-15 Thread Flavio Pompermaier
What about using Apache Hudi o Apache Iceberg? On Thu, Mar 4, 2021 at 10:15 AM Dawid Wysakowicz wrote: > Hi, > > I know Jingsong worked on Flink/Hive filesystem integration in the > Table/SQL API. Maybe he can shed some light on your questions. > > Best, > > Dawid > On 02/03/2021 21:03, Theo Die

Handle late message with flink SQL

2021-03-15 Thread Yi Tang
We can get a stream from a DataStream api by SideOutput. But it's hard to do the same thing with Flink SQL. I have an idea about how to get the late records while using Flink SQL. Assuming we have a source table for the late records, then we can query late records on it. Obviously, it's not a rea

Why does Flink FileSystem sink splits into multiple files

2021-03-15 Thread Yik San Chan
The question is cross-posted on StackOverflow https://stackoverflow.com/questions/66634813/why-does-flink-filesystem-sink-splits-into-multiple-files . I want to use Flink to read from an input file, do some aggregation, and write the result to an output file. The job is in batch mode. See `wordcou

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-15 Thread Yang Wang
Feel free to share the terminated JobManager logs if you could reproduce this issue again. Maybe "kubectl logs {pod_name} --previous" could help. Best, Yang Alexey Trenikhun 于2021年3月15日周一 下午2:28写道: > With 1.12.1 it happened quite often, with 1.12.2 not that match, I think I > saw it once or tw

RE: uniqueness of name when constructing a StateDescriptor

2021-03-15 Thread Colletta, Edward
Thank you. -Original Message- From: Tzu-Li (Gordon) Tai Sent: Monday, March 15, 2021 3:05 AM To: user@flink.apache.org Subject: Re: uniqueness of name when constructing a StateDescriptor NOTICE: This email is from an external sender - do not click on links or attachments unless you

Re: Relation between Two Phase Commit and Kafka's transaction aware producer

2021-03-15 Thread Tzu-Li (Gordon) Tai
+ user@f.a.o (adding the conversation back to the user mailing list) On Fri, Mar 12, 2021 at 6:06 AM Kevin Kwon wrote: > Thanks Tzu-Li > > Interesting algorithm. Is consumer offset also committed to Kafka at the > last COMMIT stage after the checkpoint has completed? > Flink does commit the of

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-15 Thread Tzu-Li (Gordon) Tai
Hi, Could you provide info on the Flink version used? Cheers, Gordon -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: uniqueness of name when constructing a StateDescriptor

2021-03-15 Thread Tzu-Li (Gordon) Tai
Hi, The scope is per individual operator, i.e. a single KeyedProcessFunction instance cannot have multiple registered state with the same name. Cheers, Gordon -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: [Statefun] Interaction Protocol for Statefun

2021-03-15 Thread Tzu-Li (Gordon) Tai
Hi, Interesting idea! Just some initial thoughts and questions, maybe others can chime in as well. In general I think the idea of supporting more high-level protocols on top of the existing StateFun messaging primitives is good. For example, what probably could be categorized under this effort i