Hello, everyone. I'm new to Calcite and have some problems with it. Flink
uses Calcite to parse SQL and construct the AST and logical plan. Would
the plan be optimized by Calcite? For example, with a multi-filter
condition:
val ds = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
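As a sketch of that scenario (not the poster's actual program): in the Java Table API of Flink 1.1, which sits on top of Calcite, two chained filters like the ones below are translated into a Calcite logical plan, and Calcite's optimization rules (e.g. filter merging and push-down) can collapse them before execution. The predicates and input data here are made up for illustration:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.table.BatchTableEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.table.Table;
import org.apache.flink.api.table.TableEnvironment;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

DataSet<Tuple3<Integer, Long, String>> input =
    env.fromElements(new Tuple3<>(1, 1L, "hello"));

// expose the data set as a table with fields a, b, c
Table table = tEnv.fromDataSet(input, "a, b, c");

// two chained filters; Calcite's logical optimization can merge
// these into a single predicate when the plan is translated
Table result = table.filter("a > 0").filter("b > 0");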
It likely does not make sense to publish a file ("batch data") into Kafka
unless the file is very small.
An improvised pub-sub mechanism for Kafka could be to (a) write the file
into a persistent store outside of Kafka and (b) publish a message into
Kafka about that write so as to enable processing
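A minimal sketch of step (b) with the plain Kafka producer API; the broker address, topic name, and file location are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
props.put("key.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");

// (a) the file itself goes to a persistent store outside of Kafka
//     (HDFS, S3, ...) -- not shown here
// (b) only a small pointer message is published to Kafka
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("file-notifications",  // placeholder topic
        "hdfs:///data/batch/file-0001.csv"));                 // placeholder location
}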
I am currently working on an architecture for a big data streaming and batch
processing platform. I am planning on using Apache Kafka as a distributed
messaging system to handle data from streaming data sources and then pass it
on to Apache Flink for stream processing. I would also like to use Flink
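A minimal sketch of the Kafka-to-Flink part of such a setup, assuming a Kafka 0.9 broker and the matching Flink connector (flink-connector-kafka-0.9); the broker address, group id, and topic name are placeholders:

import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
props.setProperty("group.id", "flink-consumer");          // placeholder

// consume the raw events from Kafka as strings
DataStream<String> events = env.addSource(
    new FlinkKafkaConsumer09<>("events", new SimpleStringSchema(), props));

events.print(); // replace with the actual stream processing

env.execute("Kafka to Flink");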
Thanks a ton, Till.
That worked. Thank you so much.
-Biplob
Hi Dominique,
your problem sounds like a good use case for session windows [1, 2]. If you
know that there is only a maximum gap between your request and response
messages, then you could create a session window via:
input
  .keyBy("ReqRespID")
  .window(EventTimeSessionWindows.withGap(Time.minutes(5)))
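Filled out into a complete sketch (the Event type, its fields, and the 5-minute gap are assumptions, not Dominique's actual data model), the session window can then emit one latency value per ReqResp-ID:

import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// assumed shape of the events
public class Event {
    public String reqRespId; // correlation ID shared by request and response
    public long timestamp;   // event time in milliseconds
}

DataStream<Tuple2<String, Long>> latencies = input
    .keyBy("reqRespId")
    .window(EventTimeSessionWindows.withGap(Time.minutes(5)))
    .apply(new WindowFunction<Event, Tuple2<String, Long>, Tuple, TimeWindow>() {
        @Override
        public void apply(Tuple key, TimeWindow window, Iterable<Event> events,
                          Collector<Tuple2<String, Long>> out) {
            // with one request and one response per session, the latency is
            // simply the span between the earliest and the latest timestamp
            long first = Long.MAX_VALUE, last = Long.MIN_VALUE;
            String id = null;
            for (Event e : events) {
                id = e.reqRespId;
                first = Math.min(first, e.timestamp);
                last = Math.max(last, e.timestamp);
            }
            if (id != null) {
                out.collect(new Tuple2<>(id, last - first));
            }
        }
    });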
Hello Till,
Yup, I can see the log output in my console, but there is no information
there about whether there is any error in conversion. Just normal WARN and
INFO entries as below:
22:09:16,676 WARN org.apache.flink.streaming.runtime.tasks.StreamTask
- No state backend has been specified, using default state backend (Memory / JobManager)
How about using event-time windows with watermark assignment and bounded
delays? That way you allow more than 5 minutes (bounded delay) for your
requests and responses to arrive. Do you have a way to assign a timestamp to
the responses based on the request timestamp (does the response contain the
request timestamp)?
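A sketch of the watermark part (the extractor class is available from Flink 1.1; the Event type and its timestamp field are the same assumptions as in the session-window sketch above). The bound tells Flink how long to wait for stragglers before event-time windows fire:

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

// allow events to arrive up to 5 minutes late (the bounded delay)
DataStream<Event> withTimestamps = input.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.minutes(5)) {
        @Override
        public long extractTimestamp(Event e) {
            return e.timestamp;
        }
    });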
Hi all,
once again I need a "kick" in the right direction. I have a datastream
with requests and responses identified by a ReqResp-ID. I would like to
calculate the (avg, 95%, 99%) time between the request and response and
also count them. I thought of
.keyBy("ReqRespID").timeWindowAll(Time...
Hello Aljoscha Krettek,
Thank you. As you suggested, I changed my code as below:
*snippet 1:*
DataStream<Centroid> centroids = newCentroidDataStream.map(new
TupleCentroidConverter());
ConnectedIterativeStreams<Point, Centroid> loop =
points.iterate().withFeedbackType(Centroid.class);
DataStream<Centroid> newCentroids = loop.flatMap(new
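For reference, a self-contained sketch of the connected-iteration pattern the snippet uses (Point and Centroid stand for the poster's own classes; the flatMap bodies are placeholders, not the actual K-Means logic):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream.ConnectedIterativeStreams;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

ConnectedIterativeStreams<Point, Centroid> loop =
    points.iterate().withFeedbackType(Centroid.class);

DataStream<Centroid> newCentroids = loop.flatMap(
    new CoFlatMapFunction<Point, Centroid, Centroid>() {
        @Override
        public void flatMap1(Point p, Collector<Centroid> out) {
            // process an incoming data point against the current centroids
        }
        @Override
        public void flatMap2(Centroid c, Collector<Centroid> out) {
            // absorb a centroid arriving on the feedback edge
            out.collect(c);
        }
    });

// the feedback edge must carry the declared feedback type (Centroid)
loop.closeWith(newCentroids.broadcast());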
It depends on whether you have a log4j.properties file on your classpath.
If you see log output on the console, then it should also print errors
there.
Cheers,
Till
On Tue, Jul 19, 2016 at 3:08 PM, subash basnet wrote:
> Hello Till,
>
> Shouldn't it write something in the Eclipse console if there is any error
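For reference, a minimal log4j.properties (placed on the classpath, e.g. in src/main/resources of the Maven project) that makes warnings and errors show up on the console; the pattern layout is just an example:

log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %c - %m%n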
Hello,
I cannot access the web interface from the nodes I am using.
However, I will check the logs for anything suspicious and get back.
Thanks :-)
Regards,
Debaditya
On Tue, Jul 19, 2016 at 4:46 PM, Till Rohrmann wrote:
> Hi Debaditya,
>
> you can see in the web interface how much d
Hi Debaditya,
you can see in the web interface how much data each source has sent to the
downstream tasks and how much data was consumed by the sinks. This should
tell you whether your sources have actually read some data. You can also
check the log files to see whether you find anything suspicious there
Hi Biplob,
if you want to start the web interface from within your IDE, then you have
to create a local execution environment as Ufuk told you:
Configuration config = new Configuration();
config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);
StreamExecutionEnvironment env =
StreamExecutionEnvironment.createLocalEnvironment(parallelism, config);
Thank you, Ufuk!
On Tue, Jul 19, 2016 at 5:51 AM, Ufuk Celebi wrote:
> PS: I forgot to mention that also constant iteration input is cached.
>
> On Mon, Jul 18, 2016 at 11:27 AM, Ufuk Celebi wrote:
> > Hey Saliya,
> >
> > the result of each iteration (super step) that is fed back to the
> > iteration is cached.
Hello Till,
Shouldn't it write something in the Eclipse console if there is any error
or warning? But nothing about an error is printed on the console. And I
checked the Flink project folders, flink-core, flink-streaming and such, but
couldn't find where the log is written when run via Eclipse.
Best Regards
Have you checked whether your logs contain any problems? In general
it is not recommended to collect the streaming result back to your client.
It might also be a problem with `DataStreamUtils.collect`.
Cheers,
Till
On Tue, Jul 19, 2016 at 2:42 PM, subash basnet wrote:
> Hello all,
>
> I t
Hello all,
I tried to check if it works for a tuple, but it's the same problem: the
collection still shows a blank result. I took the id of the centroid tuple
and printed it, but the collection displays empty.
DataStream<Centroid> centroids = newCentroidDataStream.map(new
TupleCentroidConverter());
DataStream<...> centroidId =
Yes, you have to provide the path of your jar. The reason is:
1. When you start in pseudo-cluster mode, the tasks are started in their
own JVMs with their own class loaders.
2. Your client program has access to your custom operator classes, but the
remote JVMs don't. Hence you need to ship the JAR
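A sketch of what that looks like in code; the host, port, and jar path are placeholders (6123 is Flink's default JobManager RPC port):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// the listed jar is uploaded to the cluster so that the task JVMs
// can load the custom operator classes
StreamExecutionEnvironment env = StreamExecutionEnvironment
    .createRemoteEnvironment("localhost", 6123, "target/my-job.jar");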
Thanks Ufuk, for the input. I tried what you suggested as well (as follows):
Configuration config = new Configuration();
config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);
StreamExecutionEnvironment env =
StreamExecutionEnvironment.createLocalEnvironment(parallelism, config);
You can explicitly create a LocalEnvironment and provide a Configuration:
Configuration config = new Configuration();
config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);
ExecutionEnvironment env = new LocalEnvironment(config);
...
On Tue, Jul 19, 2016 at 1:28 PM, Sameer W wrote:
>
Hi Sameer,
Thanks for that quick reply. I was using Flink streaming, so the program
keeps on running until I close it. But anyway, I am ready to try this
getRemoteExecutionEnvironment(); I checked, but it asks me for the jar file,
which is weird because I am running the program directly.
Does it mean I need to provide the path of my jar?
From Eclipse it creates a local environment and runs in the IDE. When the
program finishes, so does the Flink execution instance. I have never tried
accessing the console while the program is running, but once the program is
finished there is nothing to connect to.
If you need to access the dashboard
Hi,
I am running my Flink program using Eclipse and I can't access the dashboard
at http://localhost:8081. Can someone help me with this?
I read that I need to check my flink-conf.yaml, but it's a Maven project and
I don't have a flink-conf.yaml.
Any help would be really appreciated.
Thanks a lot
Biplob
Hi Ufuk,
Thanks for the update, is there any known way to fix this issue? Any
workaround that you know of, which I can try?
Hi,
you have to ensure that you filter the data that you send back on the feedback
edge, i.e. the loop.closeWith(newCentroids.broadcast()); statement needs to
take a stream that only has the centroids that you want to send back. And
you need to make sure to emit centroids with a good timestamp if you want
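A sketch of such a feedback filter (the isConverged flag is a made-up criterion; the point is only that the stream handed to closeWith carries nothing but the centroids that should loop back):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;

DataStream<Centroid> feedback = newCentroids
    .filter(new FilterFunction<Centroid>() {
        @Override
        public boolean filter(Centroid c) {
            return !c.isConverged; // made-up criterion for "send back"
        }
    });

loop.closeWith(feedback.broadcast());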
PS: I forgot to mention that also constant iteration input is cached.
On Mon, Jul 18, 2016 at 11:27 AM, Ufuk Celebi wrote:
> Hey Saliya,
>
> the result of each iteration (super step) that is fed back to the
> iteration is cached. For the iterate operator that is the last partial
> solution and fo
Unfortunately, no. It's expected for streaming iterations to lose
data (a known shortcoming), but I don't see why they never see the
initial input. Maybe Gyula or Paris (they worked on this previously)
can chime in.
– Ufuk
On Tue, Jul 19, 2016 at 10:15 AM, Biplob Biswas
wrote:
> Hi Ufuk,
>
> Did
Hi!
HDFS is mentioned in the docs but not explicitly listed as a requirement:
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/python.html#project-setup
I suppose the Python API could also distribute its libraries through
Flink's BlobServer.
Cheers,
Max
On Tue, Jul 19, 2016 at
Hi Ufuk,
Did you get time to go through my issue? Just wanted to follow up to see
whether I can get a solution or not.
Hello all,
I am trying to convert a datastream to a collection, but it shows a blank
result. There is a stream of data which can be viewed on the console with
print(), but the collection of the same stream shows empty after
conversion. Below is the code:
DataStream<Centroid> centroids = newCentroidDataStream.map(new TupleCentroidConverter());
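For comparison, a sketch of the collect pattern (DataStreamUtils lives in the flink-streaming-contrib module in this Flink generation; the Centroid type is the poster's). collect() launches the job itself, and the iterator has to be drained while the unbounded job keeps producing:

import java.util.Iterator;
import org.apache.flink.contrib.streaming.DataStreamUtils;

Iterator<Centroid> it = DataStreamUtils.collect(centroids);
while (it.hasNext()) {
    System.out.println(it.next());
}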
Hello users,
I am currently doing a project in image processing with the OpenCV library.
Has anyone here faced any issues with parallelizing the library in Flink? I
have written code which runs fine in a local environment; however,
when I try to run it in a distributed environment it writes (it wa
Feel free to do the contribution at any time you like. We can also
always make it part of a bugfix release if it does not make it into
the upcoming 1.1 RC (probably end of this week or beginning of next).
Feel free to ping me if you need any feedback or pointers.
– Ufuk
On Mon, Jul 18, 2016 at
Glad to hear it! The HDFS requirement should most definitely be
documented; I assumed it already was, actually...
On 19.07.2016 03:42, Geoffrey Mon wrote:
Hello Chesnay,
Thank you very much! With your help I've managed to set up a Flink
cluster that can run Python jobs successfully. I solved m