long story short: implementing a JDBC TableSource for batch
> query should be fairly easy. A true streaming solution that hooks into the
> changelog stream of a table is not possible at the moment.
>
> Cheers, Fabian
>
> 2017-09-26 15:04 GMT-04:00 Mohit Anchlia :
>
>>
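For reference, a plain batch read from JDBC (not the TableSource Fabian mentions, just the input format one would likely wrap in it) can be sketched with the flink-jdbc JDBCInputFormat roughly like this; the driver, URL, query and field types are placeholders for illustration:

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class JdbcBatchReadSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // builder-style configuration of the JDBC input format (flink-jdbc module)
        JDBCInputFormat jdbcInput = JDBCInputFormat.buildJDBCInputFormat()
                .setDrivername("org.postgresql.Driver")             // placeholder driver
                .setDBUrl("jdbc:postgresql://localhost:5432/mydb")  // placeholder URL
                .setQuery("SELECT id, name FROM my_table")          // placeholder query
                .setRowTypeInfo(new RowTypeInfo(
                        BasicTypeInfo.INT_TYPE_INFO,
                        BasicTypeInfo.STRING_TYPE_INFO))
                .finish();

        // this is a one-shot batch read of the table, not a changelog stream
        DataSet<Row> rows = env.createInput(jdbcInput);
        rows.print();
    }
}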
We are looking to stream data from the database. Is there already a jdbc
table source available for streaming?
Just checking to see if there is a way to purge files after they are processed.
On Tue, Aug 15, 2017 at 5:11 PM, Mohit Anchlia
wrote:
> Is there a way to delete a file once it has been processed?
>
> streamEnv
>
> .readFile(format, args[0], FileProcessingMode.*PROCESS_CONTINUOUSLY*,
> 2000)
>
Is there a way to delete a file once it has been processed?
streamEnv
.readFile(format, args[0], FileProcessingMode.PROCESS_CONTINUOUSLY, 2000)
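There is no built-in purge option on readFile(); one possible workaround (only a sketch, not an official API) is a custom input format that deletes the backing file once its split has been fully read. It assumes each file is read as exactly one split; with PROCESS_CONTINUOUSLY the monitoring source then simply never sees the file again:

import java.io.IOException;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;

public class PurgingTextInputFormat extends TextInputFormat {

    public PurgingTextInputFormat(Path filePath) {
        super(filePath);
    }

    @Override
    public void close() throws IOException {
        // remember which file this split belonged to before the parent closes the stream
        Path processed = (this.currentSplit != null) ? this.currentSplit.getPath() : null;
        super.close();
        if (processed != null) {
            // delete through Flink's FileSystem abstraction (local FS, HDFS, ...)
            processed.getFileSystem().delete(processed, false);
        }
    }
}

Note that if a file were split into several input splits this would delete it after the first split finishes, so it really only fits the one-file-per-split case.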
What's the best way to avoid duplicates in joined stream. In below code I
get duplicates of "A" because I have multiple of "A" in fileInput3.
SingleOutputStreamOperator fileInput3 = streamEnv.fromElements("A",
"A")
.assignTimestampsAndWatermarks(timestampAndWatermarkAssigner1);
fileInput1.join(f
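One way to avoid the duplicate join results described here is to de-duplicate fileInput3 before the join; a rough sketch using keyed state (the name fileInput3 follows the snippet above, everything else is illustrative):

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.util.Collector;

// forward each distinct element of fileInput3 only once
DataStream<String> dedupedInput3 = fileInput3
        .keyBy(value -> value)
        .flatMap(new RichFlatMapFunction<String, String>() {
            private transient ValueState<Boolean> seen;

            @Override
            public void open(Configuration parameters) {
                seen = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("seen", Boolean.class));
            }

            @Override
            public void flatMap(String value, Collector<String> out) throws Exception {
                if (seen.value() == null) {   // first time this key is seen
                    seen.update(true);
                    out.collect(value);
                }
            }
        });

// then join fileInput1 with dedupedInput3 instead of fileInput3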
ot know about the reached variable that you added in
> your class. So there is no way it could reset it to false.
> An alternative implementation without overriding open() could be to change
> the reachedEnd method to check if the stream is still at offset 0.
>
> 2017-08-01 2
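If I read that suggestion correctly, the alternative (no open() override) would look roughly like this inside the custom format, relying on FileInputFormat's protected stream and splitStart fields; a sketch, not tested:

@Override
public boolean reachedEnd() throws IOException {
    // before nextRecord() has run, the stream is still at the start of the split;
    // once the whole file has been returned as a single record, the position has advanced
    return this.stream.getPos() > this.splitStart;
}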
This was a user-induced problem - me. I wasn't calling streamEnv.execute() :(
On Tue, Aug 1, 2017 at 1:29 PM, Mohit Anchlia
wrote:
> This doesn't work even with TextInputFormat. Not sure what's wrong.
>
> On Tue, Aug 1, 2017 at 9:53 AM, Mohit Anchlia
> wrote:
>
This doesn't work even with TextInputFormat. Not sure what's wrong.
On Tue, Aug 1, 2017 at 9:53 AM, Mohit Anchlia
wrote:
> I don't see the print output.
>
> On Tue, Aug 1, 2017 at 2:08 AM, Fabian Hueske wrote:
>
>> Hi Mohit,
>>
>> these are jus
> super.open();
> reached = false;
> }
>
> Cheers, Fabian
>
>
> 2017-08-01 8:08 GMT+02:00 Mohit Anchlia :
>
>> I didn't override open. I am using open that got inherited from
>> FileInputFormat . Am I supposed to specifically override open?
>>
> 2017-08-01 0:32 GMT+02:00 Mohit Anchlia :
>
>> I even tried existing format but still same error:
>>
>> FileInputFormat fileInputFormat = *new* TextInputFormat(*new*
>> Path(args[0]));
>>
>> fileInputFormat.setNestedFileEnumeration(*true*);
>
I didn't override open. I am using open that got inherited from
FileInputFormat . Am I supposed to specifically override open?
On Mon, Jul 31, 2017 at 9:49 PM, Fabian Hueske wrote:
> Do you set reached to false in open()?
>
>
> On 01.08.2017 at 2:44 AM, "Mohit Anchlia" wrote:
PDF");
String content = new String(
Files.readAllBytes(Paths.get(this.currentSplit.getPath().getPath())));
logger.info("Content " + content);
reached = true;
return content;
}
}
On Mon, Jul 31, 2017 at 5:09 PM, Mohit Anchlia
wrote:
> I have a very simple program tha
I have a very simple program that just reads all the files in the path.
However, Flink is not working as expected.
Every time I execute this job I only see Flink reading 2 files, even though
there are more in that directory. On closer look it appears that it might
be related to:
[flink-akka.actor.
] INFO org.apache.flink.api.java.typeutils.TypeExtractor - class
org.apache.flink.streaming.api.functions.source.TimestampedFileInputSplit
does not contain a setter for field modificationTime
[main] INFO org.apache.flink.api.java.typeutils.TypeExtractor - c
On Mon, Jul 31, 2017 at 1:07 PM, Mohit
In trying to use this code I get the following error. Is it asking me to
implement an additional interface?
streamEnv.readFile(format, args[0], FileProcessingMode.PROCESS_CONTINUOUSLY, 2000).print();
[main] INFO com.s.flink.example.PDFInputFormat - Start streaming
[main] INFO org.apache.flink.a
flink/api/common/io/DelimitedInputFormat.java:public
>> abstract class DelimitedInputFormat extends FileInputFormat
>> implements Checkpoi
>>
>> flink-streaming-java/src/test/java/org/apache/flink/streamin
>> g/runtime/operators/ContinuousFileProcessingRescalingTest.java:
>>
> On Windows, you need to use "file:/c:/proj/..." with just one
> slash after the scheme.
>
>
>
> On Mon, Jul 31, 2017 at 1:24 AM, Mohit Anchlia
> wrote:
>
>> This is what I tried and it doesn't work. Is this a bug?
>>
>> format.setFilePath("fi
This is what I tried and it doesn't work. Is this a bug?
format.setFilePath("file:///c:/proj/test/a.txt.txt");
On Sun, Jul 30, 2017 at 2:10 PM, Chesnay Schepler
wrote:
> Did the path by chance start with file://C:/... ?
>
> If so, please try file:///C: ...
>
>
I am using flink 1.3.1 and getting this exception. Is there a workaround?
Caused by: *java.nio.file.InvalidPathException*: Illegal char <:> at index
2: /C:/Users/m/default/flink-example/pom.xml
at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
at sun.nio.fs.WindowsPathParser.parse(Unknow
> You could base your PDFFileInputFormat on FileInputFormat
> and set unsplittable to true.
> FileInputFormat comes with lots of built-in functionality such as
> InputSplit generation.
>
> Cheers, Fabian
>
> 2017-07-30 3:41 GMT+02:00 Mohit Anchlia :
>
>> Hi,
>>
>> I created a cu
Hi,
I created a custom input format. Idea behind this is to read all binary
files from a directory and use each file as its own split. Each split is
read as one whole record. When I run it in flink I don't get any error but
I am not seeing any output from .print. Am I missing something?
*p
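Piecing the thread together, a whole-file input format along the lines discussed (FileInputFormat base class, unsplittable set to true, the reached flag reset in open()) might look like the sketch below; the class name and the String record type are illustrative, not the poster's actual PDFInputFormat:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.flink.api.common.io.FileInputFormat;
import org.apache.flink.core.fs.FileInputSplit;

public class WholeFileInputFormat extends FileInputFormat<String> {

    private boolean reached;

    public WholeFileInputFormat() {
        this.unsplittable = true;            // every file becomes exactly one split
        setNestedFileEnumeration(true);      // also pick up files in sub-directories
    }

    @Override
    public void open(FileInputSplit split) throws IOException {
        super.open(split);
        reached = false;                     // reset per split, otherwise only the first file is emitted
    }

    @Override
    public boolean reachedEnd() {
        return reached;
    }

    @Override
    public String nextRecord(String reuse) throws IOException {
        // read the whole file as one record (local paths, as in the snippet earlier in the thread;
        // a distributed setup would read from the inherited 'stream' instead)
        String content = new String(
                Files.readAllBytes(Paths.get(currentSplit.getPath().getPath())));
        reached = true;
        return content;
    }
}

Used with streamEnv.readFile(new WholeFileInputFormat(), args[0], FileProcessingMode.PROCESS_CONTINUOUSLY, 2000).print() and, as noted later in the thread, nothing is printed unless streamEnv.execute() is called at the end.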
> one side of the stream that could be updated from time to time and will
> always be propagated (using a broadcast()) to all workers that do filtering,
> augmentation etc.
>
> [1] http://training.data-artisans.com/dataStream/1-intro.html
>
> I hope this helps.
>
> Timo
>
>
What is the best way to read a map of lookup data? This lookup data is a
small, short-lived dataset that is available in transformations to do things
like filtering, additional augmentation of data, etc.
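A rough sketch of the broadcast-style approach mentioned above: the small lookup stream is broadcast so every parallel instance receives it and keeps a local copy for filtering/enrichment. Stream contents and the allow-list semantics are made up for illustration; note also that there is no ordering guarantee between the two streams.

import java.util.HashSet;
import java.util.Set;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class BroadcastLookupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> mainStream = env.fromElements("A", "B", "C");
        DataStream<String> lookupStream = env.fromElements("A", "C");   // small, short-lived lookup data

        mainStream
                .connect(lookupStream.broadcast())                      // lookup side goes to all workers
                .flatMap(new CoFlatMapFunction<String, String, String>() {
                    private final Set<String> lookup = new HashSet<>();

                    @Override
                    public void flatMap1(String value, Collector<String> out) {
                        if (lookup.contains(value)) {                   // filter / augment using lookup data
                            out.collect(value);
                        }
                    }

                    @Override
                    public void flatMap2(String lookupEntry, Collector<String> out) {
                        lookup.add(lookupEntry);                        // update this instance's local copy
                    }
                })
                .print();

        env.execute("broadcast lookup sketch");
    }
}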
to schedule the second job, then it should be ok to combine both
> jobs in one program and execute the second job after the first one has
> completed.
>
> Cheers,
> Till
>
>
> On Thu, Mar 2, 2017 at 2:33 AM, Mohit Anchlia
> wrote:
>
>> It looks li
_api.html
>
> You would have to know the ID of your job and then you can poll the status
> of your running jobs.
>
> On Mon, 27 Feb 2017 at 18:15 Mohit Anchlia wrote:
>
> What's the best way to track the progress of the job?
>
> On Mon, Feb 27, 2017 at 7:56 AM, Aljos
Trying to understand what parts of Flink have thread safety built into them.
The key question is: are the objects created in Flink shared between threads
(slots)? For example, if I create a sink function and open a file, is that file
shared between threads?
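As far as I understand the function-instance model, each parallel subtask gets its own serialized-and-deserialized copy of a user function, so a file handle opened in open() belongs to that one subtask rather than being shared across slots. A hedged sketch illustrating this (the path pattern is made up):

import java.io.BufferedWriter;
import java.io.FileWriter;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class PerSubtaskFileSink extends RichSinkFunction<String> {

    private transient BufferedWriter writer;

    @Override
    public void open(Configuration parameters) throws Exception {
        // each parallel instance opens its own file, so no two subtasks share a writer
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        writer = new BufferedWriter(new FileWriter("/tmp/output-" + subtask + ".txt"));
    }

    @Override
    public void invoke(String value) throws Exception {
        writer.write(value);
        writer.newLine();
    }

    @Override
    public void close() throws Exception {
        if (writer != null) {
            writer.close();
        }
    }
}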
> execution of the next one.
>
> Best,
> Aljoscha
>
> On Fri, 24 Feb 2017 at 19:16 Mohit Anchlia wrote:
>
>> Is there a way to connect 2 workflows such that one triggers the other if
>> certain condition is met? However, the workaround may be to insert a
>>
tputStream.java:1548)
>>>>>
>>>>
> com.sy.flink.test.Tuple2Serializerr$1: this states that an anonymous
> inner class in `Tuple2Serializerr` is not serializable.
>
> Could you check if that’s the case?
>
>
>
> On February 24, 2017 at 3:10:58 PM
se/YARN-4714
>
> Cheers
>
> On Fri, Feb 24, 2017 at 2:41 PM, Mohit Anchlia
> wrote:
>
>> Figured out. It looks like there is a virtual memory limit check enforced
>> in yarn which just surfaced with java 8
>>
>> On Fri, Feb 24, 2017 at 2:09 PM, Mohit Anchlia
Figured it out. It looks like there is a virtual memory limit check enforced
in YARN which just surfaced with Java 8.
On Fri, Feb 24, 2017 at 2:09 PM, Mohit Anchlia
wrote:
> I recently upgraded the cluster from java 7 to java 8. Now when I run
> flink on a yarn cluster I see errors: Even
I recently upgraded the cluster from Java 7 to Java 8. Now when I run Flink
on a YARN cluster I see errors; eventually the application gives up and
terminates. Any suggestions?
Association with remote system [akka.tcp://flink@slave:35543] has failed,
address is now gated for [5000] ms. Reason: [Disass
Is there a way to connect 2 workflows such that one triggers the other if a
certain condition is met? However, the workaround may be to insert a
notification in a topic to trigger another workflow. The problem is that
the addSink ends the flow so if we need to add a trigger after addSink
there doesn'
> On February 24, 2017 at 2:43:38 PM, Mohit Anchlia (mohitanch...@gmail.com)
> wrote:
>
> This is at high level what I am doing:
>
> Serialize:
>
> String s = tuple.getPos(0) + "," + tuple.getPos(1);
> return s.getBytes()
>
> Deserialize:
>
> String
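Going by the diagnosis above (an anonymous inner class inside Tuple2Serializerr that is not serializable), one fix is to write the schema as a plain top-level class with only serializable fields. A sketch assuming Tuple2<String, String> and the comma layout from the snippet, using the name suggested later in the thread:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.util.serialization.SerializationSchema;

public class Tuple2SerializationSchema implements SerializationSchema<Tuple2<String, String>> {

    @Override
    public byte[] serialize(Tuple2<String, String> element) {
        // same layout as described above: "<f0>,<f1>"
        String s = element.f0 + "," + element.f1;
        return s.getBytes();
    }
}

The matching DeserializationSchema would split on the comma and additionally has to implement isEndOfStream() and getProducedType().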
hole codes of Tuple2Serializerr. I guess the
>> reason is some fields of Tuple2Serializerr do not implement Serializable.
>>
>> 2017-02-24 9:07 GMT+08:00 Mohit Anchlia :
>>
>>> I wrote a key serialization class to write to kafka however I am getting
>>> this error. No
I am using String inside to convert into bytes.
On Thu, Feb 23, 2017 at 6:50 PM, 刘彪 wrote:
> Hi Mohit
> As you did not give the whole code of Tuple2Serializerr, I guess the
> reason is that some fields of Tuple2Serializerr do not implement Serializable.
>
> 2017-02-24 9:07 GMT+08:0
I wrote a key serialization class to write to Kafka; however, I am getting
this error. Not sure why, as I've already implemented the interfaces.
Caused by: java.io.NotSerializableException:
com.sy.flink.test.Tuple2Serializerr$1
at
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.jav
Schema, maybe
> named Tuple2SerializationSchema.
>
> 2017-02-22 7:17 GMT+08:00 Mohit Anchlia :
>
>> What's the best way to retrieve both the values in Tuple2 inside a custom
>> sink given that the type is not known inside the sink function?
>>
>
>
What's the best way to retrieve both values in a Tuple2 inside a custom
sink, given that the type is not known inside the sink function?
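If the sink only knows it receives some Tuple2, the fields can still be read generically via f0/f1 or getField(int); a minimal sketch:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class GenericTuple2Sink implements SinkFunction<Tuple2<?, ?>> {

    @Override
    public void invoke(Tuple2<?, ?> value) throws Exception {
        Object first = value.getField(0);    // equivalent to value.f0
        Object second = value.getField(1);   // equivalent to value.f1
        System.out.println(first + "," + second);
    }
}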
Interestingly enough, the same job runs OK on Linux but not on Windows.
On Fri, Feb 17, 2017 at 4:54 PM, Mohit Anchlia
wrote:
> I have this code trying to read from a topic however the flink process
> comes up and waits forever even though there is data in the topic. Not sure
> why? Has an
I have this code trying to read from a topic; however, the Flink process
comes up and waits forever even though there is data in the topic. Not sure
why. Has anyone else seen this problem?
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
Properties propertie
dding stages, but then your sink is no more a sink - it
> would have transformed into a map or a flatmap !
>
> On Mon, Feb 13, 2017 at 12:34 PM Mohit Anchlia
> wrote:
>
>> Is it possible to further add aggregation after the sink task executes?
>> Or is the sink the last
Is it possible to further add aggregation after the sink task executes? Or
is the sink the last stage of the workflow? Is this flow possible?
start stream -> transform -> load (sink) -> mark the final state as loaded in a
table after the load was successful in the previous stage (sink)
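A sketch of that flow with the "load" step written as a flatMap (as suggested in the reply above), so a later marking stage can still run after it; loadToStore() and markLoaded() are hypothetical helpers, not Flink API:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.util.Collector;

// 'transformed' is the stream after the transform step
DataStream<String> loaded = transformed.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String record, Collector<String> out) throws Exception {
        loadToStore(record);   // hypothetical: write the record to the target system
        out.collect(record);   // forward it so the marking stage runs only after the load
    }
});

loaded.addSink(new SinkFunction<String>() {
    @Override
    public void invoke(String record) throws Exception {
        markLoaded(record);    // hypothetical: mark the record as loaded in a table
    }
});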
Does Flink support Hadoop 2.7.3? I installed Flink for Hadoop 2.7.0 but am
seeing this error:
2017-02-10 18:59:52,661 INFO
org.apache.flink.yarn.YarnClusterDescriptor - Deployment
took more than 60 seconds. Please check if the requested resources are
available in the YARN cluster
What is the best way to dynamically adapt and tune down the number of tasks
created to write/read to a sink when the sink slows down or the latency to the
sink increases? I am looking at the sink interface but don't see a way to
influence Flink to reduce the number of tasks or throttle the volume down
to the s
ibuted file system like HDFS that is
> also accessible from each node, so that operators can be recovered on
> different machines in case of machine failures.
>
> On 03.02.2017 at 20:55, Mohit Anchlia wrote:
>
> I thought rocksdb is used as a store backend. If that is
Any information on this would be helpful.
On Thu, Feb 2, 2017 at 5:09 PM, Mohit Anchlia
wrote:
> What is the granularity of parallelism in flink? For eg: if I am reading
> from Kafka -> map -> Sink and I say parallel 2 does it mean it creates 2
> consumer threads and allocates i
e RocksDB instance data and
> has in fact nothing to do with checkpoints.
>
> Best,
> Stefan
>
> On 03.02.2017 at 01:45, Mohit Anchlia wrote:
>
> Trying to understand these 3 parameters:
>
> state.backend
> state.backend.fs.checkpointdir
> state.backend.rocksd
What is the granularity of parallelism in Flink? For example: if I am reading
from Kafka -> map -> sink and I set parallelism to 2, does it mean it creates 2
consumer threads and allocates them on 2 separate task managers?
Also, it would be good to understand the difference between parallelism and
partitioning.
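For what it's worth, a small example of per-operator parallelism (kafkaConsumer and sink stand for an already-created FlinkKafkaConsumer and SinkFunction): the parallelism decides how many subtasks of each operator exist, while which TaskManagers/slots they end up on is a scheduling concern, not part of the parallelism setting itself.

import org.apache.flink.api.common.functions.MapFunction;

env.addSource(kafkaConsumer).setParallelism(2)        // two parallel Kafka consumer subtasks
   .map(new MapFunction<String, String>() {           // two parallel map subtasks
       @Override
       public String map(String value) {
           return value;
       }
   }).setParallelism(2)
   .addSink(sink).setParallelism(1);                  // a single sink subtask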
Trying to understand these parameters:
state.backend
state.backend.fs.checkpointdir
state.backend.rocksdb.checkpointdir
state.checkpoints.dir
As I understand it, the stream of data and the state of operators are 2 different
concepts, and both need to be checkpointed. I am a bit confused about the
pur
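The way I read these options: state.backend chooses where operator state lives at runtime (e.g. rocksdb), while the checkpointdir options say where consistent snapshots of that state (and of the source positions) are written. Programmatically, roughly the same pair can be expressed like this; the path is a placeholder and the RocksDB backend needs the flink-statebackend-rocksdb dependency:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(60_000);   // snapshot source positions and operator state every 60s

        // working state is kept in local RocksDB instances; checkpoints of that state
        // go to the given (placeholder) distributed path
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));

        // ... build and execute the job here
    }
}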