Re: Reading files from multiple subdirectories

2020-06-11 Thread Yun Gao
Hi Lorenzo, Judging from a previous thread [1] and the source code, I think you can set inputFormat.setNestedFileEnumeration(true) to also scan the nested files. Best, Yun [1] https://lists.apache.org/thread.html/86a23b4c44d92c3adeb9ff4a708365fe4099796fb32deb6319e0e17f%40%3Cuser.flink.apache
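
To make the effect of nested enumeration concrete, here is a minimal sketch that uses only the JDK, not Flink. It contrasts a flat listing of a base path with a recursive one, which is roughly the difference the reply attributes to inputFormat.setNestedFileEnumeration(true). The class name, the helper methods, and the date-bucketed layout are all hypothetical and chosen for illustration; Flink's own enumeration logic may differ in detail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: this is NOT Flink code. It mimics, with the JDK alone,
// the difference between a flat listing and a nested (recursive) enumeration.
final class NestedEnumerationSketch {

    // Returns the regular files under basePath, relative to it and sorted.
    // nested=false looks only at direct children; nested=true descends into subdirectories.
    static List<String> enumerate(Path basePath, boolean nested) {
        int depth = nested ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(basePath, depth)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> basePath.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a hypothetical date-bucketed layout in a temp directory for the demo.
    static Path createDemoTree() {
        try {
            Path base = Files.createTempDirectory("nested-enum-demo");
            Files.createDirectories(base.resolve("2020/06/10"));
            Files.createDirectories(base.resolve("2020/06/11"));
            Files.write(base.resolve("2020/06/10/part-0.json"), new byte[0]);
            Files.write(base.resolve("2020/06/11/part-0.json"), new byte[0]);
            Files.write(base.resolve("top-level.json"), new byte[0]);
            return base;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path base = createDemoTree();
        System.out.println("flat:   " + enumerate(base, false));
        System.out.println("nested: " + enumerate(base, true));
    }
}
```

With the flat listing only the file directly under the base path is found; the files inside the date buckets show up only when nested enumeration is on.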

Re: Insufficient number of network buffers- what does Total mean on the Flink Dashboard

2020-06-11 Thread Xintong Song
Hi Vijay, The memory configurations in Flink 1.9 and previous versions are indeed complicated and confusing. That is why we made significant changes to it in Flink 1.10. If possible, I would suggest upgrading to Flink 1.10, or the upcoming Flink 1.11 which is very likely to be released in this mon

Re: The network memory min (64 mb) and max (1 gb) mismatch

2020-06-11 Thread Xintong Song
Hi Clay, Could you verify that the "taskmanager.sh" used is the same script shipped with Flink-1.10.1? Or is a custom script used? Also, does the jar file "bash-java-utils.jar" exist in your Flink bin directory? In Flink 1.10, the memory configuration for a TaskManager works as follows. - "taskman
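
As general background (not taken from the truncated message above), the following sketch shows how a network memory size can be derived from a fraction of the total Flink memory and then capped into a [min, max] range. The 64 mb and 1 gb bounds are the ones quoted in the error message of this thread; the 0.1 fraction, the class name, and the method name are assumptions made for illustration, and this is not Flink's actual implementation.

```java
// Illustration only: a hypothetical helper, not part of Flink.
final class NetworkMemorySketch {

    static final long MB = 1024L * 1024L;

    // Derives the network memory as fraction * totalFlinkBytes,
    // then caps the result into the range [minBytes, maxBytes].
    static long deriveNetworkBytes(long totalFlinkBytes, double fraction,
                                   long minBytes, long maxBytes) {
        if (minBytes > maxBytes) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        long byFraction = (long) (totalFlinkBytes * fraction);
        return Math.max(minBytes, Math.min(maxBytes, byFraction));
    }

    public static void main(String[] args) {
        long min = 64 * MB;    // bound quoted in the error message
        long max = 1024 * MB;  // bound quoted in the error message
        double fraction = 0.1; // assumed fraction, for illustration
        for (long totalMb : new long[] {256, 4096, 20480}) {
            long network = deriveNetworkBytes(totalMb * MB, fraction, min, max);
            System.out.println(totalMb + " mb total -> " + (network / MB) + " mb network");
        }
    }
}
```

Small totals are lifted to the minimum and large totals are cut at the maximum; only in between does the fraction decide the result.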

Re: Does anyone have an example of Bazel working with Flink?

2020-06-11 Thread Austin Cawley-Edwards
Hey all, Adding to Aaron's response, we use Bazel to build our Flink apps. We've open-sourced some of our setup here[1], though it is a bit outdated. There are definitely rough edges, and it probably needs a good deal of work to fit other setups. We have written a wrapper around the `java_library` and `java_bin

Re: Does anyone have an example of Bazel working with Flink?

2020-06-11 Thread Aaron Levin
Hi Dan, We use Bazel to compile our Flink applications. We're using "rules_scala" ( https://github.com/bazelbuild/rules_scala) to manage the dependencies and produce jars. We haven't had any issues. However, I have found that sometimes it's difficult to figure out exactly what Flink target or depe

Does anyone have an example of Bazel working with Flink?

2020-06-11 Thread Dan Hill
I took the Flink playground and I'm trying to swap out Maven for Bazel. I got to the point where I'm hitting the following error. I want to diff my code with an existing, working setup. Thanks! - Dan client_1| org.apache.flink.client.program.ProgramInvocationException: Neither

Re: Shipping Filesystem Plugins with YarnClusterDescriptor

2020-06-11 Thread Kostas Kloudas
Hi John, I think that using different plugins is not going to be an issue, assuming that the schemes of your FSs do not collide. This is already the case for S3 within Flink, where we have 2 implementations, one based on Presto and one based on Hadoop. For the first you can use the scheme s3p whil

The network memory min (64 mb) and max (1 gb) mismatch

2020-06-11 Thread Clay Teeter
Hi flink fans, I'm hoping for an easy solution. I'm trying to upgrade my 1.9.3 cluster to Flink 1.10.1, but I'm running into memory configuration errors, such as: *Caused by: org.apache.flink.configuration.IllegalConfigurationException: The network memory min (64 mb) and max (1 gb) mismatch, the net

Error incompatible types for field cpuCores when doing Flink Upgrade

2020-06-11 Thread Claude Murad
Hello, I upgraded Flink from 1.7 to 1.10 in Kubernetes. When the job manager is launched, the following exception occurs. If I do some cleanup in ZooKeeper and restart, it works. Any ideas about this error and what needs to be done to avoid the cleanup in ZooKeeper? ERROR org.apac

Re: Understading Flink statefun deployment

2020-06-11 Thread Seth Wiesman
Hi Francesco, No, that architecture is not possible. I'm not sure if you've used Flink's DataStream API but embedded functions under the hood are very much like lightweight process functions. If you have a single DataStream application with two process functions you cannot scale their workers inde

Re: Shipping Filesystem Plugins with YarnClusterDescriptor

2020-06-11 Thread John Mathews
So I think that will work, but it has some limitations. Namely, when launching clusters through a service (which is our use case), it can be the case that multiple different clients want clusters with different plugins or different versions of a given plugin, but because the FlinkClusterDescriptor

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2020-06-11 Thread Matt Magsombol
I'm not the original poster, but I'm running into this same issue. What you just described is exactly what I want. I presume you guys are using some variant of this helm https://github.com/docker-flink/examples/tree/master/helm/flink to configure your k8s cluster? I'm also assuming that this cl

Reading files from multiple subdirectories

2020-06-11 Thread Lorenzo Nicora
Hi, related to the same case I am discussing in another thread, but not related to AVRO this time :) I need to ingest files that an S3 Sink Kafka Connector periodically adds to an S3 bucket. Files are bucketed by date and time, as often happens. Is there any way, using Flink only, to monitor a base-path

Re: Reading from AVRO files

2020-06-11 Thread Lorenzo Nicora
Hi Arvid, I followed your instructions for the breakpoint in SpecificDatumReader.readField *with AVRO 1.8.2*. Conversion conversion = ((SpecificRecordBase) r).getConversion(f.pos()); returns null for all timestamp-millis fields (I have many), so.
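
The following JDK-only sketch illustrates why a missing conversion can surface as a ClassCastException like the one reported in this thread (java.lang.Long cannot be cast to a date-time type). It is not Avro code: the class and method names are hypothetical stand-ins, and java.time.Instant is used in place of org.joda.time.DateTime so that the example has no external dependencies.

```java
import java.time.Instant;
import java.util.function.Function;

// Illustration only: hypothetical stand-ins, not the Avro API.
final class LogicalTypeConversionSketch {

    // Stand-in for a reader: applies the conversion when one is registered,
    // otherwise passes the raw decoded value (a Long) through unchanged.
    static Object readField(Object rawValue, Function<Long, Instant> conversion) {
        if (conversion == null) {
            return rawValue;
        }
        return conversion.apply((Long) rawValue);
    }

    // Stand-in for a generated setter that casts to the logical type.
    static Instant putTimestamp(Object value) {
        return (Instant) value; // fails when value is still a raw Long
    }

    // Reports whether reading and storing a timestamp fails with a ClassCastException.
    static boolean failsWithClassCast(Function<Long, Instant> conversion) {
        try {
            putTimestamp(readField(1591876979314L, conversion));
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("no conversion:   fails=" + failsWithClassCast(null));
        System.out.println("with conversion: fails=" + failsWithClassCast(Instant::ofEpochMilli));
    }
}
```

When the conversion lookup returns null, the raw long reaches the typed setter and the cast fails; with a conversion in place the value arrives already converted.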

Re: [EXTERNAL] Re: Inconsistent checkpoint durations vs state size

2020-06-11 Thread Slotterback, Chris
Interestingly, it appears to have been related to the stream application design that was causing incremental checkpointing issues. Once the checkpoints started failing, they would cause a positive feedback loop of failure as more and more data built up to write, and other exceptions would pop up

streaming restored state after restart

2020-06-11 Thread Adam Atrea
Hi, I'm new to Flink. After reading the documentation, what would be the best approach to stream data from a restored state following a job restart? Say I have a MapState that gets populated during streaming with various computed results within a stateful operator. Upon job restart or on

Re: Reading from AVRO files

2020-06-11 Thread Arvid Heise
Sorry, forget my last mail; that was half-finished. Here is the real one: Hi Lorenzo, if you still have time to investigate. Your stack trace shows that all expected code paths have been taken. Conversions are there; they look different than here, but that can be attributed to the avro

Re: Reading from AVRO files

2020-06-11 Thread Arvid Heise
Hi Lorenzo, if you still have time to investigate. Your stack trace shows that all expected code paths have been taken. Conversions are there; they look different than here, but that can be attributed to the avro upgrade. @Override protected void readField(Object r, Schema.Field f, Objec

Re: Timer metric in Flink

2020-06-11 Thread Chesnay Schepler
There are no immediate plans, mostly because timers are fairly expensive and represent an easy trap for trashing performance or invalidating benchmark results. On 11/06/2020 13:11, Vinay Patil wrote: Ohh Okay, basically implement the Gauge and add timer functionality to it for now. Is there a

Re: Reading from AVRO files

2020-06-11 Thread Lorenzo Nicora
Thanks Guowei, setting format.setReuseAvroValue(false) with 1.8.2-generated records does not solve the problem. 12:02:59,314 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Discarding checkpoint 1 of job 46ea458aff2a496c4617a6b57e4de937. java.lang.ClassCastException: java.la

Re: Timer metric in Flink

2020-06-11 Thread Vinay Patil
Ohh Okay, basically implement the Gauge and add timer functionality to it for now. Is there a plan or JIRA ticket to add a Timer metric in a future release? I think it would be good to have. Regards, Vinay Patil On Wed, Jun 10, 2020 at 5:55 PM Chesnay Schepler wrote: > You cannot add custom metric types
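
A minimal sketch of the workaround described here, a gauge that carries timer functionality, could look as follows. It uses only the JDK: the nested Gauge interface is a local stand-in that mirrors the single-method shape of Flink's Gauge, and the class name, the injectable clock, and the choice to report only the most recent duration are assumptions made for illustration.

```java
import java.util.function.LongSupplier;

// Illustration only: a JDK-only sketch, not wired to Flink's metric system.
final class TimerGaugeSketch {

    // Local stand-in mirroring the shape of a gauge: one method returning a value.
    interface Gauge<T> {
        T getValue();
    }

    // A gauge that reports the duration (in nanoseconds) of the last timed action.
    static final class LastDurationGauge implements Gauge<Long> {
        private final LongSupplier nanoClock;
        private volatile long lastDurationNanos;

        // The clock is injectable so the gauge can be tested deterministically.
        LastDurationGauge(LongSupplier nanoClock) {
            this.nanoClock = nanoClock;
        }

        // Runs the action and records how long it took, even if it throws.
        void time(Runnable action) {
            long start = nanoClock.getAsLong();
            try {
                action.run();
            } finally {
                lastDurationNanos = nanoClock.getAsLong() - start;
            }
        }

        @Override
        public Long getValue() {
            return lastDurationNanos;
        }
    }

    public static void main(String[] args) {
        LastDurationGauge gauge = new LastDurationGauge(System::nanoTime);
        gauge.time(() -> Math.sqrt(42.0));
        System.out.println("last duration (ns): " + gauge.getValue());
    }
}
```

Recording only the last duration keeps the per-call cost to two clock reads, which matters given the concern about timers being expensive; histograms or rates would need more bookkeeping.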

Restore from savepoint through Java API

2020-06-11 Thread Abhishek Rai
Hello, I'm writing a test for my custom sink function. The function is stateful and relies on checkpoint restores for maintaining consistency with the external system that it's writing to. For integration testing of the sink function, I have a MiniCluster based environment inside a single JVM th

Re: Reading from AVRO files

2020-06-11 Thread Guowei Ma
Hi, for the 1.8.2 (case 1) you could try the format.setReuseAvroValue(false); Best, Guowei Lorenzo Nicora wrote on Thu, Jun 11, 2020 at 5:02 PM: > Hi Arvid, > > thanks for the point about catching records. Gotcha! > > Sorry I cannot share the full schema or generated code. It's a 3rd party > IP and we signed

Re: Reading from AVRO files

2020-06-11 Thread Lorenzo Nicora
Hi Arvid, thanks for the point about catching records. Gotcha! Sorry I cannot share the full schema or generated code. It's a 3rd party IP and we signed a meter-thick NDA... I think I can post snippets. The schema is heavily nested, including arrays of other record types. Types are primitives, or

Re: Reading from AVRO files

2020-06-11 Thread Guowei Ma
Hi, I wrote a test for case 1 but it does not throw any exception. I use the org.apache.flink.formats.avro.generated.JodaTimeRecord for the test. Could you check whether AccountEntries.class has the following code: private static final org.apache.avro.Conversion[] conversions = new org.apache.avr

Re: Reading from AVRO files

2020-06-11 Thread Guowei Ma
Hi, I wrote a test for case 1 but it does not throw any exception. I use the org.apache.flink.formats.avro.generated.JodaTimeRecord for the test. Best, Guowei Arvid Heise wrote on Thu, Jun 11, 2020 at 3:58 PM: > Hi Lorenzo, > > I'm glad that it worked out somehow, but I'd still like to understand what > w

Re: Reading from AVRO files

2020-06-11 Thread Arvid Heise
Hi Lorenzo, I'm glad that it worked out somehow, but I'd still like to understand what went wrong, so it will work more smoothly for future users. I double checked and we even test AvroSerializer with logical types, so I'm a bit puzzled. Could you attach GlHeader or at least show us how GlHeader#

Re: How to use Hbase Connector Sink

2020-06-11 Thread Caizhi Weng
Hi, The stack trace indicates that your query schema does not match your sink schema. It seems that `active_ratio*25 score` in your query is a double value, not the `ROW` you declared in your sink. op <520075...@qq.com> wrote on Thu, Jun 11, 2020 at 3:31 PM: > hi > flink1.10, when I want to sink data to hbase

Re: How to use Hbase Connector Sink

2020-06-11 Thread godfrey he
Hi, you should make sure the types of the selected fields and the types of the sink table are the same, otherwise you will get the above exception. You can change `active_ratio*25 score` to a row type, like this: insert into circle_weight select rowkey, ROW(info) from ( select concat_ws('_',circleName,

How to use Hbase Connector Sink

2020-06-11 Thread op
hi, flink1.10, when I want to sink data to an hbase table like this: bstEnv.sqlUpdate("""CREATE TABLE circle_weight ( rowkey String, info ROW

Re: Reading from AVRO files

2020-06-11 Thread Lorenzo Nicora
Hi Arvid, answering your other questions. Here is the stack trace of case (1), when I try to read using specific records generated by the AVRO 1.8.2 plugin: java.lang.ClassCastException: java.lang.Long cannot be cast to org.joda.time.DateTime at com.tenx.client.generalledger.event.GlHeader.

Re: Flink Async IO operator tuning / micro-benchmarks

2020-06-11 Thread Arti Pande
Hi Arvid, Thanks for a quick reply. The second reference link ( http://codespeed.dak8s.net:8000/timeline/?ben=asyncWait.ORDERED&env=2) from your answer is not accessible though. Could you share some more numbers from it? Are these benchmarks published somewhere? Without actual IO call, Async IO