Re: JDBC table source

2017-09-26 Thread Mohit Anchlia
Long story short: implementing a JDBC TableSource for batch query should be fairly easy. A true streaming solution that hooks into the changelog stream of a table is not possible at the moment. Cheers, Fabian. On 2017-09-26 15:04 GMT-04:00, Mohit Anchlia wrote: …
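
A minimal sketch of the batch approach Fabian describes, using the flink-jdbc JDBCInputFormat of that era; the driver, URL, query, and schema are placeholders, not from the thread:

    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
    import org.apache.flink.api.java.typeutils.RowTypeInfo;
    import org.apache.flink.types.Row;

    public class JdbcBatchRead {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Driver, URL, query, and column types below are placeholders.
            JDBCInputFormat jdbcInput = JDBCInputFormat.buildJDBCInputFormat()
                .setDrivername("org.postgresql.Driver")
                .setDBUrl("jdbc:postgresql://localhost:5432/mydb")
                .setQuery("SELECT id, name FROM users")
                .setRowTypeInfo(new RowTypeInfo(
                    BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO))
                .finish();

            // A bounded, batch-style read; there is no changelog/streaming hook here.
            DataSet<Row> rows = env.createInput(jdbcInput);
            rows.print();
        }
    }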

JDBC table source

2017-09-26 Thread Mohit Anchlia
We are looking to stream data from the database. Is there already a JDBC table source available for streaming?

Re: Deleting files in continuous processing

2017-08-21 Thread Mohit Anchlia
Just checking to see if there is a way to purge files after they are processed. On Tue, Aug 15, 2017 at 5:11 PM, Mohit Anchlia wrote: Is there a way to delete a file once it has been processed? streamEnv.readFile(format, args[0], FileProcessingMode.PROCESS_CONTINUOUSLY, 2000)

Deleting files in continuous processing

2017-08-15 Thread Mohit Anchlia
Is there a way to delete a file once it has been processed? streamEnv.readFile(format, args[0], FileProcessingMode.PROCESS_CONTINUOUSLY, 2000)
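
For context, a minimal sketch of the continuous-monitoring setup being discussed; note that the readFile API only reads files, it has no hook to delete or move them after processing (which is the gap this thread is asking about):

    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

    public class ContinuousRead {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            TextInputFormat format = new TextInputFormat(new Path(args[0]));

            // PROCESS_CONTINUOUSLY re-scans the directory every 2000 ms;
            // already-processed files stay in place.
            env.readFile(format, args[0], FileProcessingMode.PROCESS_CONTINUOUSLY, 2000)
               .print();
            env.execute("continuous-read");
        }
    }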

Avoiding duplicates in joined stream

2017-08-15 Thread Mohit Anchlia
What's the best way to avoid duplicates in a joined stream? In the code below I get duplicates of "A" because I have multiple occurrences of "A" in fileInput3. SingleOutputStreamOperator fileInput3 = streamEnv.fromElements("A", "A").assignTimestampsAndWatermarks(timestampAndWatermarkAssigner1); fileInput1.join(f…
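 
One way to do this, sketched under the assumption that fileInput3 is a DataStream of String: key the stream and keep a "seen" flag in keyed state so only the first occurrence per key survives to the join. (The state below is never cleared, so it grows with the number of distinct keys.)

    // Imports assumed: RichFilterFunction, ValueState, ValueStateDescriptor,
    // Configuration from the usual org.apache.flink packages.
    DataStream<String> deduped = fileInput3
        .keyBy(value -> value)
        .filter(new RichFilterFunction<String>() {
            private transient ValueState<Boolean> seen;

            @Override
            public void open(Configuration parameters) {
                seen = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("seen", Boolean.class));
            }

            @Override
            public boolean filter(String value) throws Exception {
                if (seen.value() != null) {
                    return false; // duplicate within this key, drop it
                }
                seen.update(true);
                return true; // first occurrence passes through to the join
            }
        });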

Re: Odd flink behaviour

2017-08-02 Thread Mohit Anchlia
…does not know about the reached variable that you added in your class. So there is no way it could reset it to false. An alternative implementation without overriding open() could be to change the reachedEnd method to check if the stream is still at offset 0. On 2017-08-01…

Re: Using FileInputFormat - org.apache.flink.streaming.api.functions.source.TimestampedFileInputSplit

2017-08-01 Thread Mohit Anchlia
This was a user-induced problem - me. I wasn't calling streamEnv.execute() :( On Tue, Aug 1, 2017 at 1:29 PM, Mohit Anchlia wrote: This doesn't work even with TextInputFormat. Not sure what's wrong. On Tue, Aug 1, 2017 at 9:53 AM, Mohit Anchlia wrote: …
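
The fix, in outline: Flink programs are built lazily, and readFile()/print() only assemble the job graph; nothing runs until execute() is called. A sketch (format and args as earlier in this thread):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.readFile(format, args[0], FileProcessingMode.PROCESS_CONTINUOUSLY, 2000)
       .print();
    env.execute("file-streaming-job"); // the missing line: without it, no output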

Re: Using FileInputFormat - org.apache.flink.streaming.api.functions.source.TimestampedFileInputSplit

2017-08-01 Thread Mohit Anchlia
This doesn't work even with TextInputFormat. Not sure what's wrong. On Tue, Aug 1, 2017 at 9:53 AM, Mohit Anchlia wrote: I don't see the print output. On Tue, Aug 1, 2017 at 2:08 AM, Fabian Hueske wrote: Hi Mohit, these are jus…

Re: Odd flink behaviour

2017-08-01 Thread Mohit Anchlia
…super.open(); reached = false; } Cheers, Fabian. On 2017-08-01 8:08 GMT+02:00, Mohit Anchlia wrote: I didn't override open. I am using open that got inherited from FileInputFormat. Am I supposed to specifically override open? …
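
Putting the pieces of this thread together, a sketch of the pattern Fabian describes (class name taken from the logs elsewhere in this digest; the record-reading logic mirrors the user's own snippet):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.flink.api.common.io.FileInputFormat;
    import org.apache.flink.core.fs.FileInputSplit;

    // One whole file per record; "reached" must be reset in open() because the
    // same format instance is re-opened for every split it processes.
    public class PDFInputFormat extends FileInputFormat<String> {
        private boolean reached = false;

        @Override
        public void open(FileInputSplit split) throws IOException {
            super.open(split);
            reached = false; // without this, only the first split yields a record
        }

        @Override
        public boolean reachedEnd() {
            return reached;
        }

        @Override
        public String nextRecord(String reuse) throws IOException {
            byte[] bytes = Files.readAllBytes(
                Paths.get(currentSplit.getPath().getPath()));
            reached = true; // exactly one record per split
            return new String(bytes);
        }
    }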

Re: Using FileInputFormat - org.apache.flink.streaming.api.functions.source.TimestampedFileInputSplit

2017-08-01 Thread Mohit Anchlia
…On 2017-08-01 0:32 GMT+02:00, Mohit Anchlia wrote: I even tried an existing format but still the same error: FileInputFormat fileInputFormat = new TextInputFormat(new Path(args[0])); fileInputFormat.setNestedFileEnumeration(true);

Re: Odd flink behaviour

2017-07-31 Thread Mohit Anchlia
I didn't override open. I am using open that got inherited from FileInputFormat. Am I supposed to specifically override open? On Mon, Jul 31, 2017 at 9:49 PM, Fabian Hueske wrote: Do you set reached to false in open()? On 01.08.2017 at 2:44 AM, "Mohit Anch…" wrote:

Re: Odd flink behaviour

2017-07-31 Thread Mohit Anchlia
PDF"); String content = new String( Files.readAllBytes(Paths.get(this.currentSplit.getPath().getPath(; logger.info("Content " + content); reached = true; return content; } } On Mon, Jul 31, 2017 at 5:09 PM, Mohit Anchlia wrote: > I have a very simple program tha

Odd flink behaviour

2017-07-31 Thread Mohit Anchlia
I have a very simple program that just reads all the files in the path. However, Flink is not working as expected. Every time I execute this job I only see Flink reading 2 files, even though there are more in that directory. On closer look it appears that it might be related to: [flink-akka.actor…

Re: Using FileInputFormat - org.apache.flink.streaming.api.functions.source.TimestampedFileInputSplit

2017-07-31 Thread Mohit Anchlia
…] INFO org.apache.flink.api.java.typeutils.TypeExtractor - class org.apache.flink.streaming.api.functions.source.TimestampedFileInputSplit does not contain a setter for field modificationTime [main] INFO org.apache.flink.api.java.typeutils.TypeExtractor - c… On Mon, Jul 31, 2017 at 1:07 PM, Mohit…

Using FileInputFormat - org.apache.flink.streaming.api.functions.source.TimestampedFileInputSplit

2017-07-31 Thread Mohit Anchlia
In trying to use this code I get the following error. Is it asking me to implement an additional interface? streamEnv.readFile(format, args[0], FileProcessingMode.PROCESS_CONTINUOUSLY, 2000).print(); [main] INFO com.s.flink.example.PDFInputFormat - Start streaming [main] INFO org.apache.flink.a…

Re: Customer inputformat

2017-07-31 Thread Mohit Anchlia
…flink/api/common/io/DelimitedInputFormat.java: public abstract class DelimitedInputFormat extends FileInputFormat implements Checkpoi… flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/ContinuousFileProcessingRescalingTest.java: …

Re: Invalid path exception

2017-07-31 Thread Mohit Anchlia
…on Windows, you need to use "file:/c:/proj/..." with just one slash after the scheme. On Mon, Jul 31, 2017 at 1:24 AM, Mohit Anchlia wrote: This is what I tried and it doesn't work. Is this a bug? format.setFilePath("fi…

Re: Invalid path exception

2017-07-30 Thread Mohit Anchlia
This is what I tried and it doesn't work. Is this a bug? format.setFilePath("file:///c:/proj/test/a.txt.txt"); On Sun, Jul 30, 2017 at 2:10 PM, Chesnay Schepler wrote: Did the path by chance start with file://C:/... ? If so, please try file:///C: ...
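
The three URI forms being compared in this exchange, as an illustrative fragment (file name is a placeholder):

    format.setFilePath("file://C:/proj/test/a.txt");  // wrong: "C:" is parsed as the authority
    format.setFilePath("file:///C:/proj/test/a.txt"); // empty authority, then the path
    format.setFilePath("file:/C:/proj/test/a.txt");   // one slash after the scheme also works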

Invalid path exception

2017-07-30 Thread Mohit Anchlia
I am using Flink 1.3.1 and getting this exception. Is there a workaround? Caused by: java.nio.file.InvalidPathException: Illegal char <:> at index 2: /C:/Users/m/default/flink-example/pom.xml at sun.nio.fs.WindowsPathParser.normalize(Unknown Source) at sun.nio.fs.WindowsPathParser.parse(Unknow…

Re: Customer inputformat

2017-07-30 Thread Mohit Anchlia
…your PDFFileInputFormat on FileInputFormat and set unsplittable to true. FileInputFormat comes with lots of built-in functionality such as InputSplit generation. Cheers, Fabian. On 2017-07-30 3:41 GMT+02:00, Mohit Anchlia wrote: Hi, I created a cu…
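
A small fragment of what Fabian's suggestion would look like (constructor name follows the thread; assumed, not quoted from it):

    // Mark the format unsplittable so each binary file becomes exactly one split.
    public PDFFileInputFormat() {
        unsplittable = true; // protected field inherited from FileInputFormat
    }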

Customer inputformat

2017-07-29 Thread Mohit Anchlia
Hi, I created a custom input format. The idea behind this is to read all binary files from a directory and use each file as its own split. Each split is read as one whole record. When I run it in Flink I don't get any error but I am not seeing any output from .print. Am I missing something? …

Re: Reading static data

2017-07-14 Thread Mohit Anchlia
…one side of the stream that could be updated from time to time and will always be propagated (using broadcast()) to all workers that do filtering, augmentation etc. [1] http://training.data-artisans.com/dataStream/1-intro.html I hope this helps. Timo
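
A minimal sketch of the pattern Timo outlines, assuming String elements and socket sources as stand-ins for the real inputs: the lookup side is broadcast so every parallel instance holds the full map.

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
    import org.apache.flink.util.Collector;

    public class BroadcastLookup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<String> events = env.socketTextStream("localhost", 9000);
            DataStream<String> lookupUpdates = env.socketTextStream("localhost", 9001);

            events.connect(lookupUpdates.broadcast())
                .flatMap(new CoFlatMapFunction<String, String, String>() {
                    private final Map<String, Boolean> lookup = new HashMap<>();

                    @Override
                    public void flatMap1(String event, Collector<String> out) {
                        if (lookup.containsKey(event)) {
                            out.collect(event); // filter/augment against the lookup map
                        }
                    }

                    @Override
                    public void flatMap2(String update, Collector<String> out) {
                        lookup.put(update, true); // broadcast: every instance sees it
                    }
                })
                .print();

            env.execute("broadcast-lookup");
        }
    }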

Reading static data

2017-07-12 Thread Mohit Anchlia
What is the best way to read a map of lookup data? This lookup data is a small, short-lived dataset that should be available in transformations to do things like filtering, additional augmentation of data, etc.

Re: Connecting workflows in batch

2017-03-02 Thread Mohit Anchlia
…to schedule the second job, then it should be ok to combine both jobs in one program and execute the second job after the first one has completed. Cheers, Till. On Thu, Mar 2, 2017 at 2:33 AM, Mohit Anchlia wrote: It looks li…
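
A sketch of Till's suggestion with placeholder paths and trivial transformations: in attached mode each execute() call blocks until that job finishes, so the second job only starts after the first succeeds.

    import org.apache.flink.api.java.ExecutionEnvironment;

    public class TwoJobs {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            env.readTextFile("file:///data/in")
               .map(String::toUpperCase)
               .writeAsText("file:///data/stage");
            env.execute("job-1"); // blocks until the first job has finished

            env.readTextFile("file:///data/stage")
               .filter(line -> !line.isEmpty())
               .writeAsText("file:///data/out");
            env.execute("job-2"); // runs only if job-1 completed without throwing
        }
    }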

Re: Connecting workflows in batch

2017-03-01 Thread Mohit Anchlia
…_api.html You would have to know the ID of your job and then you can poll the status of your running jobs. On Mon, 27 Feb 2017 at 18:15, Mohit Anchlia wrote: What's the best way to track the progress of the job? On Mon, Feb 27, 2017 at 7:56 AM, Aljos…

Thread safety

2017-02-27 Thread Mohit Anchlia
Trying to understand what parts of Flink have thread safety built into them. The key question is: are objects created in Flink shared between threads (slots)? E.g., if I create a sink function and open a file, is that shared between threads?
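
For reference, a sketch reflecting Flink's documented execution model: each parallel subtask deserializes its own copy of a user function, so per-instance fields such as an open file handle are not shared across threads. File path below is a placeholder.

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class PerSubtaskFileSink extends RichSinkFunction<String> {
        private transient BufferedWriter writer; // one per parallel subtask

        @Override
        public void open(Configuration parameters) throws Exception {
            int subtask = getRuntimeContext().getIndexOfThisSubtask();
            writer = new BufferedWriter(new FileWriter("/tmp/out-" + subtask));
        }

        @Override
        public void invoke(String value) throws Exception {
            writer.write(value); // called only by this subtask's thread
            writer.newLine();
        }

        @Override
        public void close() throws Exception {
            if (writer != null) {
                writer.close();
            }
        }
    }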

Re: Connecting workflows in batch

2017-02-27 Thread Mohit Anchlia
…execution of the next one. Best, Aljoscha. On Fri, 24 Feb 2017 at 19:16, Mohit Anchlia wrote: Is there a way to connect 2 workflows such that one triggers the other if a certain condition is met? However, the workaround may be to insert a…

Re: Serialization schema

2017-02-26 Thread Mohit Anchlia
…tputStream.java:1548) com.sy.flink.test.Tuple2Serializerr$1: this states that an anonymous inner class in `Tuple2Serializerr` is not serializable. Could you check if that's the case? On February 24, 2017 at 3:10:58 PM…
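
A sketch of the likely fix, assuming the class targets the Kafka producer's KeyedSerializationSchema of that era (element type and CSV format taken from the serialize/deserialize snippet later in this thread): a top-level class instead of an anonymous inner one, so everything shipped to the cluster is serializable.

    import java.nio.charset.StandardCharsets;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;

    public class Tuple2Serializer
            implements KeyedSerializationSchema<Tuple2<String, String>> {

        @Override
        public byte[] serializeKey(Tuple2<String, String> element) {
            return element.f0.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public byte[] serializeValue(Tuple2<String, String> element) {
            return (element.f0 + "," + element.f1).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String getTargetTopic(Tuple2<String, String> element) {
            return null; // fall back to the producer's default topic
        }
    }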

Re: Java 7 -> 8 - Association failed error

2017-02-24 Thread Mohit Anchlia
…se/YARN-4714 Cheers. On Fri, Feb 24, 2017 at 2:41 PM, Mohit Anchlia wrote: Figured out. It looks like there is a virtual memory limit check enforced in YARN which just surfaced with Java 8. On Fri, Feb 24, 2017 at 2:09 PM, Mohit Anchlia…

Re: Java 7 -> 8 - Association failed error

2017-02-24 Thread Mohit Anchlia
Figured it out. It looks like there is a virtual memory limit check enforced in YARN which just surfaced with Java 8. On Fri, Feb 24, 2017 at 2:09 PM, Mohit Anchlia wrote: I recently upgraded the cluster from Java 7 to Java 8. Now when I run Flink on a YARN cluster I see errors. Even…
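
A commonly cited workaround for the check described here, given as an assumption rather than something stated in the thread: disable the NodeManager's virtual-memory check in yarn-site.xml.

    <!-- yarn-site.xml: turns off the vmem limit check; a Hadoop setting,
         not a Flink one, and it relaxes a resource safeguard. -->
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>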

Java 7 -> 8 - Association failed error

2017-02-24 Thread Mohit Anchlia
I recently upgraded the cluster from Java 7 to Java 8. Now when I run Flink on a YARN cluster I see errors; eventually the application gives up and terminates. Any suggestions? Association with remote system [akka.tcp://flink@slave:35543] has failed, address is now gated for [5000] ms. Reason: [Disass…

Connecting workflows in batch

2017-02-24 Thread Mohit Anchlia
Is there a way to connect 2 workflows such that one triggers the other if a certain condition is met? However, the workaround may be to insert a notification in a topic to trigger another workflow. The problem is that addSink ends the flow, so if we need to add a trigger after addSink there doesn'…

Re: Serialization schema

2017-02-23 Thread Mohit Anchlia
On February 24, 2017 at 2:43:38 PM, Mohit Anchlia (mohitanch...@gmail.com) wrote: This is at a high level what I am doing: Serialize: String s = tuple.getPos(0) + "," + tuple.getPos(1); return s.getBytes() Deserialize: String…

Re: Serialization schema

2017-02-23 Thread Mohit Anchlia
…whole code of Tuple2Serializerr. I guess the reason is that some fields of Tuple2Serializerr do not implement Serializable. On 2017-02-24 9:07 GMT+08:00, Mohit Anchlia wrote: I wrote a key serialization class to write to Kafka however I am getting this error. No…

Re: Serialization schema

2017-02-23 Thread Mohit Anchlia
I am using String inside to convert into bytes. On Thu, Feb 23, 2017 at 6:50 PM, 刘彪 wrote: Hi Mohit, as you did not give the whole code of Tuple2Serializerr, I guess the reason is that some fields of Tuple2Serializerr do not implement Serializable. On 2017-02-24 9:07 GMT+08:0…

Serialization schema

2017-02-23 Thread Mohit Anchlia
I wrote a key serialization class to write to Kafka; however, I am getting this error. Not sure why, as I've already implemented the interfaces. Caused by: java.io.NotSerializableException: com.sy.flink.test.Tuple2Serializerr$1 at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.jav…

Re: Writing Tuple2 to a sink

2017-02-23 Thread Mohit Anchlia
…Schema, maybe named Tuple2SerializationSchema. On 2017-02-22 7:17 GMT+08:00, Mohit Anchlia wrote: What's the best way to retrieve both the values in Tuple2 inside a custom sink given that the type is not known inside the sink function?

Writing Tuple2 to a sink

2017-02-21 Thread Mohit Anchlia
What's the best way to retrieve both the values in Tuple2 inside a custom sink given that the type is not known inside the sink function?
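
A minimal sketch of the usual answer, assuming an existing env: give the sink the concrete Tuple2 type so f0 and f1 are directly accessible without casts.

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2))
       .addSink(new SinkFunction<Tuple2<String, Integer>>() {
           @Override
           public void invoke(Tuple2<String, Integer> value) {
               System.out.println(value.f0 + " -> " + value.f1); // both fields, typed
           }
       });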

Re: Flink not reading from Kafka

2017-02-17 Thread Mohit Anchlia
Interestingly enough, the same job runs OK on Linux but not on Windows. On Fri, Feb 17, 2017 at 4:54 PM, Mohit Anchlia wrote: I have this code trying to read from a topic; however, the Flink process comes up and waits forever even though there is data in the topic. Not sure why. Has an…

Flink not reading from Kafka

2017-02-17 Thread Mohit Anchlia
I have this code trying to read from a topic; however, the Flink process comes up and waits forever even though there is data in the topic. Not sure why. Has anyone else seen this problem? StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(); Properties propertie…
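
For comparison, a minimal consumer sketch for the connector of that era (FlinkKafkaConsumer09 for Kafka 0.9; broker address, topic, and group id are placeholders):

    import java.util.Properties;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    public class KafkaRead {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "test-group");

            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment();
            env.addSource(new FlinkKafkaConsumer09<>(
                    "my-topic", new SimpleStringSchema(), props))
               .print();
            env.execute("kafka-read"); // the job only starts once execute() runs
        }
    }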

Re: Further aggregation possible after sink?

2017-02-13 Thread Mohit Anchlia
…adding stages, but then your sink is no more a sink - it would have transformed into a map or a flatmap! On Mon, Feb 13, 2017 at 12:34 PM, Mohit Anchlia wrote: Is it possible to further add aggregation after the sink task executes? Or is the sink the last…

Further aggregation possible after sink?

2017-02-13 Thread Mohit Anchlia
Is it possible to add further aggregation after the sink task executes? Or is the sink the last stage of the workflow? Is this flow possible? start stream -> transform -> load (sink) -> mark final state as loaded in a table after all the load was successful in the previous stage (sink)
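
A fragment of the workaround the reply points at, assuming records is a DataStream of String; saveToStore() and MarkLoadedSink are hypothetical placeholders, not APIs from the thread:

    records
        .map(value -> { saveToStore(value); return value; }) // the "load" as a pass-through map
        .addSink(new MarkLoadedSink()); // the true terminal stage: mark state as loaded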

Hadoop 2.7.3

2017-02-10 Thread Mohit Anchlia
Does Flink support Hadoop 2.7.3? I installed Flink for Hadoop 2.7.0 but am seeing this error: 2017-02-10 18:59:52,661 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster

Dealing with latency in Sink

2017-02-06 Thread Mohit Anchlia
What is the best way to dynamically adapt and tune down the number of tasks created to write/read to a sink when the sink slows down or the latency to the sink increases? I am looking at the sink interface but don't see a way to influence Flink to reduce the number of tasks or throttle the volume down to the s…

Re: Clarification on state backend parameters

2017-02-04 Thread Mohit Anchlia
…a distributed file system like HDFS that is also accessible from each node, so that operators can be recovered on different machines in case of machine failures. On 03.02.2017 at 20:55, Mohit Anchlia wrote: I thought RocksDB is used as a store backend. If that is…

Re: Parallelism and Partitioning

2017-02-03 Thread Mohit Anchlia
Any information on this would be helpful. On Thu, Feb 2, 2017 at 5:09 PM, Mohit Anchlia wrote: What is the granularity of parallelism in Flink? E.g., if I am reading from Kafka -> map -> sink and I set parallelism 2, does it mean it creates 2 consumer threads and allocates i…

Re: Clarification on state backend parameters

2017-02-03 Thread Mohit Anchlia
…the RocksDB instance data and has in fact nothing to do with checkpoints. Best, Stefan. On 03.02.2017 at 01:45, Mohit Anchlia wrote: Trying to understand these 3 parameters: state.backend, state.backend.fs.checkpointdir, state.backend.rocksd…

Parallelism and Partitioning

2017-02-02 Thread Mohit Anchlia
What is the granularity of parallelism in Flink? E.g., if I am reading from Kafka -> map -> sink and I set parallelism 2, does it mean it creates 2 consumer threads and allocates them on 2 separate task managers? Also, it would be good to understand the difference between parallelism and partitioning.
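
A fragment showing per-operator parallelism, reusing the Kafka source sketched earlier in this digest: each operator runs as that many parallel subtasks, and whether subtasks land on separate TaskManagers depends on slot scheduling, not on the parallelism value itself.

    env.addSource(new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props))
       .setParallelism(2) // two consumer subtasks share the topic's partitions
       .map(String::trim)
       .setParallelism(2)
       .print()
       .setParallelism(1);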

Clarification on state backend parameters

2017-02-02 Thread Mohit Anchlia
Trying to understand these parameters: state.backend, state.backend.fs.checkpointdir, state.backend.rocksdb.checkpointdir, state.checkpoints.dir. As I understand it, the stream of data and the state of operators are 2 different concepts, and both need to be checkpointed. I am a bit confused about the pur…
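
A flink-conf.yaml sketch of the settings in question, with placeholder values; the comments restate Stefan's explanation from the reply above.

    state.backend: rocksdb
    # where checkpoint data is written; must be reachable from all nodes (e.g. HDFS)
    state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
    # local working directory for the running RocksDB instance, not the checkpoints
    state.backend.rocksdb.checkpointdir: /tmp/flink/rocksdb
    # target directory for externalized checkpoint metadata
    state.checkpoints.dir: hdfs:///flink/ext-checkpoints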