JSON to Parquet

2020-08-20 Thread Averell
JSON -> AVRO (GenericRecord) -> Parquet. Is that an option? I would want to be able to quickly/dynamically (as less code change as possible) change the JSON schema. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-08-27 Thread Averell
Hello, I have a Flink 1.10 job which runs in AWS EMR, checkpointing to S3a as well as writing output to S3a using StreamingFileSink. The job runs well until I add the Java Hadoop properties: /-Dfs.s3a.acl.default= BucketOwnerFullControl/. Since after that, the checkpoint process fails to complete

Re: JSON to Parquet

2020-08-27 Thread Averell
stion now. Thanks for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-08-31 Thread Averell
. However, intermittenly my checkpoints still fail (about 10%). And whenever it fails, there are non-completed files left in S3 (attached screenshot below). I'm not sure whether those uncompleted files are expected, or is that a bug? Thanks and regards, Averell <http://apache-flink-user-mail

Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-09-12 Thread Averell
nd regards Averell <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/FlinkFileSink.png> -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

HA on AWS EMR

2020-10-18 Thread Averell
ashed. Do I miss something here? Thanks for your help. Regards, Averell [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/yarn_setup.html -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: HA on AWS EMR

2020-10-19 Thread Averell
27;s dataDir, which is by default stored in the local storage of the EMR's master node. I'm trying to move this one to an EFS, in hope that it would help. Not sure whether this is a right approach. Thanks for your help. Regards, Averell [1] http://apache-flink-user-mailing-list-a

Re: HA on AWS EMR

2020-10-21 Thread Averell
some error messages attached below. Not sure that's a bug or expected behaviour. Thanks and best regards, Averell /07:39:33.906 [main-EventThread] ERROR org.apache.flink.shaded.curator4.org.apache.curator.ConnectionState - Authentication failed 07:40:11.585 [flink-akka.actor.defau

KryoException UnsupportedOperationException when writing Avro GenericRecords to Parquet

2020-10-21 Thread Averell
ing. I tried with both Flink 1.10.0 and 1.11.0, and currently stuck at this. Could you please help? Thanks and regards, Averell /com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: reserved (org.apache.avro.Schema$Field) fieldMap (org.apa

Re: KryoException UnsupportedOperationException when writing Avro GenericRecords to Parquet

2020-10-26 Thread Averell
Hello Till, Adding GenericRecordAvroTypeInfo(schema) does help. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: HA on AWS EMR

2020-10-27 Thread Averell
Hello Robert, Thanks for the info. That makes sense. I will save and cancel my jobs with 1.10, upgrade to 1.11, and restore the jobs from the savepoints. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Job is still in ACTIVE state after /jobs/:jobid/stop

2020-11-12 Thread Averell
, "now": 1605182656731, "timestamps": { "CANCELLING": 0, "FAILING": 0, "CANCELED": 0, "FINISHED": 0, "RUNNING": 1604016319495, "FAILED": 0, "RESTARTING": 0, "CREATED":

Re: Job is still in ACTIVE state after /jobs/:jobid/stop

2020-11-12 Thread Averell
I have some updates. Some weird behaviours were found. Please refer to the attached photo. All requests were sent via REST API The status of the savepoint triggered by that stop request (ID 11018) is "COMPLETED [Savepoint]", however, no checkpoint data has been persisted (in S3). The folder /`sav

Change to StreamingFileSink in Flink 1.10

2020-04-20 Thread Averell
tBuilder[_]] .withRollingPolicy(...) returns a RowFormatBuilder[_] .withBucketAssigner(...) returns Any/ I'm using Maven 3.6.0, Java 1.8.0_242, and Scala 2.11.12. Tried with/without IntelliJ, no difference. Not sure/understand what's wrong Thanks! Averell -- Sent from: http://apache-flink

Re: Change to StreamingFileSink in Flink 1.10

2020-04-20 Thread Averell
nd regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Change to StreamingFileSink in Flink 1.10

2020-04-20 Thread Averell
igner)*.asInstanceOf[RowFormatBuilder[IN, String, _]]* .build()/ Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Change to StreamingFileSink in Flink 1.10

2020-04-21 Thread Averell
Hello Leonard, Sivaprasanna, But my code was working fine with Flink v1.8. I also tried with a simple String DataStream, and got the same error. /StreamingFileSink .forRowFormat(new Path(path), new SimpleStringEncoder[String]()) .withRollingPolicy(DefaultRollingPolicy.b

Re: Change to StreamingFileSink in Flink 1.10

2020-04-22 Thread Averell
Thanks @Seth Wiesman and all. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-23 Thread Averell
but it didn't help (is it already too late? should that be there before the JM is started?) Thanks for your help. Averell / Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create checkpoint storage at checkpoint coordinator side. at org.apache.flink.runtime.ch

Re: Question about Scala Case Class and List in Flink

2020-04-24 Thread Averell
lds, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance./" I imported /org.apache.flink.streaming.api.scala._/ << is this enough to tell that I am using Scala API? Thanks and regar

Re: K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-24 Thread Averell
Thank you Yun Tang. Building my own docker image as suggested solved my problem. However, I don't understand why I need that while I already had that s3-hadoop jar included in my uber jar? Thanks. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.23360

Re: K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-26 Thread Averell
Hi David, Yang, Thanks. But I just tried to submit the same job on a YARN cluster using that same uberjar, and it was successful. I don't have flink-s3-fs-hadoop.jar anywhere in the lib or plugin folder. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-ar

Flink Metrics in kubernetes

2020-05-12 Thread Averell
gs that I need to care for when running in Kubernetes? Thanks a lot. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink Metrics in kubernetes

2020-05-12 Thread Averell
ve.2336050.n4.nabble.com/file/t1586/k8xDump.txt> . Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink Metrics in kubernetes

2020-05-13 Thread Averell
Hi Gary, Sorry for the false alarm. It's caused by a bug in my deployment - no metrics were added into the registry. Sorry for wasting your time. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Automatically resuming failed jobs in K8s

2020-06-10 Thread Averell
toring the job, I need to provide the full path of the last checkpoint (/s3:chk-2345//). Is there any option to just provide the base_path? 3. Store the info to restore the jobs in the K8s deployment config Thanks a lot. Regards, Averell -- Sent from: http://apache-flink-user-mailing-li

Re: Automatically resuming failed jobs in K8s

2020-06-12 Thread Averell
Thank you very much, Yang. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Preserving (best effort) messages order between operators

2019-10-30 Thread Averell
uld throttle the 1st operator when back-pressure is high, then I could mitigate the mentioned problem. But I could not find any guide on doing that. Could you please help? Thanks. Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: How to stream intermediate data that is stored in external storage?

2019-10-31 Thread Averell
Hi Kant, I wonder why you need to "source" your intermediate state from files? Why not "source" it from the previous operator? I.e. instead of (A join B) -> State -> files -> (C), why not do (A join B) -> State -> (files + C)? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.

Re: How to stream intermediate data that is stored in external storage?

2019-10-31 Thread Averell
ready-to-use options. Writing a custom sink function to write to your own SQL server is also a not-so-difficult solution. Best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Preserving (best effort) messages order between operators

2019-11-01 Thread Averell
hanks. Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

S3 file source - continuous monitoring - many files missed

2018-07-23 Thread Averell
Good day everyone, I have a Flink job that has an S3 folder as a source, and we keep putting thousands of small (around 1KB each) gzip files into that folder, with the rate of about 5000 files per minute. Here is how I created that source in Scala: / val my_input_format = new TextInputFormat(n

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell
Hi Jörn, Thanks. I had missed that EMRFS strong consistency configuration. Will try that now. We also had a backup solution - using Kinesis instead of S3 (I don't see Kinesis in your suggestion, but hope that it would be alright). "/The small size and high rate is not suitable for S3 or HDFS/" <<<

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell
Just some update: I tried to enable "EMRFS Consistent View" option, but it didn't help. Not sure whether that's what you recommended, or something else. Thanks! -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell
Could you please help explain more details on "/try read after write consistency (assuming the files are not modified) /"? I guess that the problem I got comes from the inconsistency in S3 files listing. Otherwise, I would have got exceptions on file not found. My use case is to read output files

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell
Hello Jörn. Thanks for your help. "/Probably the system is putting them to the folder and Flink is triggered before they are consistent./" <<< yes, I also guess so. However, if Flink is triggered before they are consistent, either (a) there should be some error messages, or (b) Flink should be abl

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell
Thank you Fabian. I tried to implement a quick test basing on what you suggested: having an offset from system time, and I did get improvement: with offset = 500ms - the problem has completely gone. With offset = 50ms, I still got around 3-5 files missed out of 10,000. This number might come from

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell
Hello Fabian, I created the JIRA bug https://issues.apache.org/jira/browse/FLINK-9940 BTW, I have one more question: Is it worth to checkpoint that list of processed files? Does the current implementation of file-source guarantee exactly-once? Thanks for your support. -- Sent from: http://ap

Re: S3 file source - continuous monitoring - many files missed

2018-07-25 Thread Averell
Thank you Fabian for the guide to implement the fix. I'm not quite clear about the best practice of creating JIRA ticket. I modified its priority to Major as you said that it is important. What would happen next with that issue then? Someone (anyone) will pick it and create a fix, then include tha

Re: S3 file source - continuous monitoring - many files missed

2018-07-30 Thread Averell
Here is my https://github.com/lvhuyen/flink implementation of the change. 3 files were updated: StreamExecutionEnvironment.java, StreamExecutionEnvironment.scala, and ContinuousFileMonitoringFunction.java. All the thanks to Fabian. -- Sent from: http://apac

Small-files source - partitioning based on prefix of file

2018-07-30 Thread Averell
und-robin. However, I could not find the call of "rebalancing" method, but "transform" is called. Not much information about that "transform" method though. Would it possible for me to ask for some guideline on this? Thanks for your help. Averell -- Sent from: http

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread Averell
Thank you Vino. Yes, I went thru that official guide before posting this question. The problem was that I could not see any call to one of those mentioned partitioning methods (partitionCustom, shuffle, rebalance, rescale, or broadcast) in the original readFile function. I'm still trying to look i

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread Averell
Thanks Vino. Yes, I can do that after the source function. But that means data would be shuffled - sending from every source to the right partition. I think that by doing the partition from within the file source would help to save that shuffling. Thanks. Averell. -- Sent from: http://apache

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread Averell
Oh, Thank you Vino. I was not aware of that reshuffling after every custom partitioning. Why would that needed though? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread Averell
Hi Vino, I'm a little bit confused. If I can do the partitioning from within the source function, using the same hash function on the key to identify the partition, would that be sufficient to avoid shuffling in the next byKey call? Thanks. Averell -- Sent from: http://apache-flink

Re: Detect late data in processing time

2018-07-30 Thread Averell
Hi Soheil, Why don't you just use the processing time as event time? Simply overriding extractTimestamp to return your processing time. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-07-31 Thread Averell
doing that yet. Thank you very much for your support. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-07 Thread Averell
Thank you Fabian. "/In either case, some record will be read twice but if reading position can be reset, you can still have exactly-once state consistency because the state is reset as well./" I do not quite understand this statement. If I have read 30 lines from the checkpoint and sent those 30 r

Re: Small-files source - partitioning based on prefix of file

2018-08-09 Thread Averell
Thank you Vino and Fabien for your help in answering my questions. As my files are small, I think there would not be much benefit in checkpointing file offset state. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-09 Thread Averell
. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-10 Thread Averell
stable/dev/stream/state/state.html> . However, I have not been able to find the information I am looking for. Please help point me to the right place. Thanks and best regards, Averell. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-12 Thread Averell
Thank you Fabian. It is clear to me now. Thanks a lot for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

CoFlatMapFunction with more than two input streams

2018-08-14 Thread Averell
licating the CoFlatMapFunction (*). With option 2, there's additional cost coming from unioning, splitting/selecting, and type-casting at the final streams. Is there any better option for me? Thank you very much for your support. Regards, Averell (*) I am using Scala, and I tried to c

Re: CoFlatMapFunction with more than two input streams

2018-08-15 Thread Averell
Regarding using Either wrapper, as my understanding, I would need to use that both in my sources (stream_A and B) and in the CoProcess/CoFlatMapFunction. Then using a super class Animal would be more convenient, wouldn't it? Thanks and regards, Averell -- Sent from: http://apache-flink-use

Re: CoFlatMapFunction with more than two input streams

2018-08-15 Thread Averell
quot;).map(_ match {    case Right(d) => d    case _ => NON_EXIST_DOG     }) val transformed_cat = transformed.select("cat").map(_ match {    case Left(c) => c    case _ => NON_EXIST_CAT     }) Thanks! Avere

Initial value of ValueStates of primitive types

2018-08-17 Thread Averell
Hi, In Flink's documents, I couldn't find any example that uses primitive type when working with States. What would be the initial value of a ValueState of type Int/Boolean/...? The same question apply for MapValueState like [String, Int] Thanks and regards, Averell -- Sent

Re: Initial value of ValueStates of primitive types

2018-08-17 Thread Averell
licit conversion: val y:Integer = getRuntimeContext.getState(new ValueStateDescriptor[Int]("Triggered", classOf[Int])).value() >> y = {Integer@5795} "0" Thanks for your time. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Raising a bug in Flink's unit test scripts

2018-08-24 Thread Averell
FLINK-9940 bug fix, or I have to raise a separated bug? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread Averell
of: (1) flink application killed (2) cluster crashed (3) one of the servers in the cluster crashed (4) unhandled exception raised when abnormal data received ... Could you please help explain? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-26 Thread Averell
Thank you Vino. I sometimes got the error message like the one below. It looks like my executors got overloaded. Here I have another question: is there any existing solution that allows me to have the job restored automatically? Thanks and best regards, Averell -- Sent from: http://apache

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread Averell
tored automatically. I am trying to test my scenarios now. Found some issues, and I think it would be better to ask in a separate thread. Thanks and regards, Averell = org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 457d8f370ef8a50bb462946e1f12b80e)

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread Averell
Hi Vino, Could you please tell where I should find the JM and TM logs? I'm running on an AWS EMR using yarn. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-27 Thread Averell
this is the expected behaviour, as my current checkpoint config is to checkpoint every 10s, and it took only a second or two for the listing of those 20K files. Am I correct here? And do we have a solution for this? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-lis

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread Averell
Thank you, Vino. I found it, http://:8088/ Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-28 Thread Averell
files per minute) * I set the checkpointing interval = 1 minute. In this example, /will the 1st barrier be injected into the stream of file-splits 50 seconds after the 10,000th split, or after the 2,000th one?/ Sorry for being confusing. Thanks and best regards, Averell -- Sent from:

Re: Small-files source - partitioning based on prefix of file

2018-08-28 Thread Averell
Thank you Fabian. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

ElasticSearch 6 - error with UpdateRequest

2018-08-30 Thread Averell
Indexer: RequestIndexer): Unit = { requestIndexer.add(upsertRequest(element)) } What could be the issue here? Thanks for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: ElasticSearch 6 - error with UpdateRequest

2018-08-31 Thread Averell
t vs in my project? If you have some spare time, please help explain. I also would like to know the other way to fix that issue (that you implemented in your branch). Thanks a lot for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Promethus - custom metrics at job level

2018-09-02 Thread Averell
-flink-user-mailing-list-archive.2336050.n4.nabble.com/Service-discovery-for-Prometheus-on-YARN-td21988.html, which looks like a "no" answer to my question. However, I still hope for some easy to use solution. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-ma

Re: Promethus - custom metrics at job level

2018-09-03 Thread Averell
Thank you Reza. I will try your repo first :) Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Logging metrics from within Elasticsearch ActionRequestFailureHandler

2018-09-12 Thread Averell
had some better solution. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Logging metrics from within Elasticsearch ActionRequestFailureHandler

2018-09-13 Thread Averell
hanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Utilising EMR's master node

2018-09-16 Thread Averell
or the JM only? If that is not possible, should I have different hardware configurations between the master node and core nodes (smaller server for the master)? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Utilising EMR's master node

2018-09-17 Thread Averell
that be an expected behaviour? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Utilising EMR's master node

2018-09-19 Thread Averell
Hi Gary, Thanks for your help. Regarding TM configurations, in term of performance, when my 2 servers have 16 vcores each, should I have 2 TMs with 16GB mem, 16 task slots each, or 8 TMs with 4GB mem and 4 task slots each? Thanks and regards, Averell -- Sent from: http://apache-flink-user

Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell
monitor generates 0 records during that 15-20 minutes period) Could someone please help with this? Thank you very much for your time. Best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell
the global level would be too heavy. Thanks and regards, Averell == import java.util.Date import org.apache.flink.api.common.io.FilePathFilter import org.apache.flink.core.fs.Path import org.slf4j.LoggerFactory object SdcFilePathFilter { private val TIME_FORMAT = new

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell
Please refer to this version: === import java.util.Date import org.apache.flink.api.common.io.FilePathFilter import org.apache.flink.core.fs.Path import org.slf4j.LoggerFactory object SdcFilePathFilter { private val TIME_FORMAT = new java.text.SimpleDateFormat("MMdd hhmm

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-23 Thread Averell
help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-24 Thread Averell
, checkpointing process is not triggered though. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Scheduling sources

2018-09-24 Thread Averell
help give some help? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-24 Thread Averell
to ongoing checkpoint. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Averell
this 30s timer? Thanks and regards, Averell 2018-09-25 12:01:13.222 [Canceler/Interrupts for Source: Custom File Source (1/1) (a5f5434070044510eafc9103bc24af43).] WARN org.apache.flink.runtime.taskmanager.Task - Task 'Source: Custom File Source (1/1)' did not react to cancelling sig

Re: Scheduling sources

2018-09-25 Thread Averell
) gets enriched properly, I want to have (2a) loaded properly into memory before starting to process (1). Is there any walkaround solution for me in this case? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Utilising EMR's master node

2018-09-26 Thread Averell
configurations then still 4 TMs created, but the occupied ones are always on two different servers. I'm not sure whether that's EMR's issue, or YARN's or Flink's. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Scheduling sources

2018-09-26 Thread Averell
Hi Kostas, So that means my 2a will be broadcasted to all TMs? Is that possible to partition that? As I'm using CoProcessFunction to join 1 with 2. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Scheduling sources

2018-09-26 Thread Averell
Hi Tison, "/setting a timer on schedule start and trigger (1) after a fixed delay/" would be quite sufficient for me. Looking forward to the change of that Jira ticket's status. Thanks for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-arc

Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-26 Thread Averell
ogram.png> I'm using Prometheus with Grafana. Is that possible to do what I mentioned? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-27 Thread Averell
tp://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/Histo2.png> Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-28 Thread Averell
oring one) will be separated? Is that possible to disable checkpointing for that 2nd branch? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Difference between BucketingSink and StreamingFileSink

2018-10-04 Thread Averell
to parquet? Is that possible to partition my output basing on event-time? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Difference between BucketingSink and StreamingFileSink

2018-10-04 Thread Averell
Hi, https://issues.apache.org/jira/browse/FLINK-9749 <<< as per this ticket, StreamingFileSink is a newer option, which is better than BucketingSink for Parquet. Would love to see some example one using that. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mai

Re: Difference between BucketingSink and StreamingFileSink

2018-10-04 Thread Averell
rmat(new java.util.Date(in.getTimestamp))}" } override def getSerializer = SimpleVersionedStringSerializer.INSTANCE } Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Identifying missing events in keyed streams

2018-10-04 Thread Averell
not. Haven't found a guide for defining patterns of missing events. Could you please give some advices? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Streaming to Parquet Files in HDFS

2018-10-04 Thread Averell
ects/flink/flink-docs-master/dev/connectors/streamfile_sink.html> , buck-enconding can only combined with OnCheckpointRollingPolicy, which rolls on every checkpoint. So setting that CheckInterval makes no difference. So why should we expose that withBucketCheckInterval method? Thanks a

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Averell
ady-to-use solution that supports copying/moving file from HDFS to S3 (something like a trigger from Flink after it has finished writing to HDFS). Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Averell
What a great news. Thanks for that, Kostas. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Averell
HADOOP configurations? Thanks and best regards, Averell java.lang.Exception: unable to establish the security context at org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:73) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1118) Caused by

"unable to establish the security context" with shaded Hadoop S3

2018-10-05 Thread Averell
stacktrace below. The shading of hadoop jars started from this ticket FLINK-10366 <https://issues.apache.org/jira/browse/FLINK-10366> . Googling the error didn't help. Could someone please help me? Thanks and best regards, Averell /Setting HADOOP_CONF_DIR=/etc/hadoop/conf be

Re: Utilising EMR's master node

2018-10-06 Thread Averell
Hi Gary, Thanks for the information. I didn't know that -yn is obsolete :( I am using Flink 1.6. Not sure whether that's a bug when I tried to set -yn explicitly, but I started only 1 cluster. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.

  1   2   >