n some key) sorted on time with a
ROW_NUMBER function that assigns increasing numbers to rows.
- do a group by on the row number modulo the window size.
Btw. count windows are supported by the Table API.
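For reference, a minimal sketch of a count-based tumbling window in the Table API (not a verified answer for the pure-SQL case; assumes a Flink 1.3-era Java setup, and the field names and input are hypothetical):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.Tumble;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class CountWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        // hypothetical input stream of (key, value) pairs
        DataStream<Tuple2<String, Long>> input = env.fromElements(
                Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 3L), Tuple2.of("b", 4L));

        // register the stream; a processing-time attribute is appended as 'proctime'
        Table t1 = tEnv.fromDataStream(input, "k, v, proctime.proctime");

        // tumbling count window of 2 rows per key (count windows are defined on processing time)
        Table counts = t1
                .window(Tumble.over("2.rows").on("proctime").as("w"))
                .groupBy("k, w")
                .select("k, v.count as cnt");

        tEnv.toAppendStream(counts, Row.class).print();
        env.execute();
    }
}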
Best, Fabian
2017-10-17 17:16 GMT+02:00 Stefano Bortoli <stefano.bort...@huawei.com>:
Hi all,
Is there a way to use a tumble window group by with row range in streamSQL?
I mean, something like this:
// "SELECT COUNT(*) " +
// "FROM T1 " +
//"GROUP BY TUMBLE(rowtime, INTERVAL '2' ROWS PRECEDING )"
However, even looking at tests and looking at the "row int
In fact, the old problem was with the KryoSerializer's missed initialization on
the exception that would trigger the spilling to disk. This would leave a dirty
serialization buffer that would eventually break the program. Till worked on it
by debugging the source code generating the error. Perhaps som
, Billy [mailto:billy.newp...@gs.com]
Sent: Thursday, April 20, 2017 4:46 PM
To: Stefano Bortoli ; 'Fabian Hueske'
Cc: 'user@flink.apache.org'
Subject: RE: Flink memory usage
Your reuse idea kind of implies that it’s a GC generation rate issue, i.e. it’s
not collecting fast en
Hi Billy,
The only suggestion I can give is to check your code very carefully for useless
variable allocations, and to foster reuse as much as possible. Don't create a new
collection on every map execution; rather, clear and reuse the collected output
of the flatMap, and so on. In the past we ran lon
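A minimal sketch of what this reuse can look like in a flatMap, assuming a hypothetical tokenizing function (field names and splitting logic are illustrative only):

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class TokenizingFlatMap implements FlatMapFunction<String, Tuple2<String, Integer>> {

    // allocated once per function instance and reused for every record
    private final Tuple2<String, Integer> reuse = new Tuple2<>();
    private final List<String> buffer = new ArrayList<>();

    @Override
    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
        buffer.clear();                          // clear and reuse instead of allocating a new list
        for (String token : line.split("\\W+")) {
            if (!token.isEmpty()) {
                buffer.add(token);
            }
        }
        for (String token : buffer) {
            reuse.f0 = token;                    // mutate the reused tuple
            reuse.f1 = 1;
            out.collect(reuse);
        }
    }
}

This is the kind of allocation trimming suggested above: the function instance keeps one tuple and one buffer for its whole lifetime instead of creating new ones per record.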
As Chesnay said, it is not necessary to use a pool, as the connection is reused
across splits. However, if you had to customize it for some reason, you can
do it starting from the JDBC Input and Output formats.
cheers!
2016-07-05 13:27 GMT+02:00 Harikrishnan S :
> Awesome ! Thanks a lot ! I should pr
The connection will be managed by the splitManager, so there is no need to use a
pool. However, if you had to, you should probably look into the
establishConnection() method of the JDBCInputFormat.
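For context, a sketch of plain usage without any pool, roughly following the Derby example from the docs (driver, URL and query are placeholders; assumes the 1.0-era flink-jdbc API that emits tuples):

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;

public class JdbcReadSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // the format manages its own connection, reused across splits -- no pool needed
        DataSet<Tuple2<Integer, String>> rows = env.createInput(
                JDBCInputFormat.buildJDBCInputFormat()
                        .setDrivername("org.apache.derby.jdbc.EmbeddedDriver") // placeholder driver
                        .setDBUrl("jdbc:derby:memory:exampledb")               // placeholder URL
                        .setQuery("SELECT id, name FROM books")                // placeholder query
                        .finish(),
                new TupleTypeInfo<Tuple2<Integer, String>>(
                        BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO));

        rows.print();
    }
}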
2016-07-05 10:52 GMT+02:00 Flavio Pompermaier :
> why do you need a connection pool?
> On 5 Jul 2016 11:41, "Ha
Till mentioned that 'spilling to disk' was managed through an exception
catch. The last serialization error was related to bad management of the
Kryo buffer, which was not cleaned after spilling during exception handling.
Is it possible we are dealing with an issue similar to this but caused by
anoth
Hi Flavio, Till,
do you think this could possibly be related to the serialization problem
caused by 'the management' of the Kryo serializer buffer when spilling to disk?
We are definitely going beyond what is managed in memory with this task.
saluti,
Stefano
2016-05-16 9:44 GMT+02:00 Flavio Pompermaie
fers:16384
>>>>>>
>>>>>> The job just reads a window of max 100k elements and then writes a
>>>>>> Tuple5 into a CSV on the jobmanager fs with parallelism 1 (in order to
>>>>>> produce a single file). The job dies after 40 mi
is not clear why it should refuse a connection to itself after
40 min of running. We'll try to figure out possible environment issues. It's a
fresh installation, therefore we may have left out some configuration.
saluti,
Stefano
2016-04-28 9:22 GMT+02:00 Stefano Bortoli :
> I had this type of e
I had this type of exception when trying to build and test Flink on a
"small machine". I worked around it by increasing the Akka timeout for the test.
https://github.com/stefanobortoli/flink/blob/FLINK-1827/flink-tests/src/test/java/org/apache/flink/test/checkpointing/EventTimeAllWindowCheckpointingITCa
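For reference, the kind of change involved: raising the Akka ask timeout in the configuration handed to the test's local cluster (the value is an example; the same key can also be set in flink-conf.yaml):

import org.apache.flink.configuration.Configuration;

public final class TestTimeouts {
    // configuration for the test's local cluster; "akka.ask.timeout" defaults to "10 s"
    public static Configuration withLongAkkaTimeout() {
        Configuration config = new Configuration();
        config.setString("akka.ask.timeout", "300 s");  // example value for a slow machine
        return config;
    }
}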
to 16 I expect to
>>> see 16 calls to the connection pool (i.e. ' CREATING
>>> NEW CONNECTION!') while I see only 8 (up to my maximum number of cores).
>>> The number of created task instead is correct (16).
>>>
>>>
Hi to all,
we've just upgraded to Flink 1.0.0 and we had some problems with Joda
DateTime serialization.
The problem was caused by FLINK-3305, which removed the JavaKaffee dependency.
We had to re-add that dependency in our application and then register the
DateTime serializer in the environment:
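A minimal sketch of that registration, assuming the JavaKaffee kryo-serializers artifact (de.javakaffee:kryo-serializers) is back on the classpath:

import org.joda.time.DateTime;

import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer;

import org.apache.flink.api.java.ExecutionEnvironment;

public class JodaRegistration {
    public static void main(String[] args) {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // let Kryo use the JavaKaffee serializer for Joda DateTime
        env.getConfig().addDefaultKryoSerializer(DateTime.class, JodaDateTimeSerializer.class);
        // ... build and execute the job as usual
    }
}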
Ufuk Celebi :
> Do you have the code somewhere online? Maybe someone can have a quick
> look over it later. I'm pretty sure that is indeed a problem with the
> custom input format.
>
> – Ufuk
>
> On Tue, Mar 29, 2016 at 3:50 PM, Stefano Bortoli
> wrote:
> > Per
e, because there would be
millions of queries before the statistics are collected.
Perhaps we are doing something wrong; we still have to figure out what. :-/
thanks a lot for your help.
saluti,
Stefano
2016-03-29 13:30 GMT+02:00 Stefano Bortoli :
> That is exactly my point. I should have 32 threads
nning.
>
>
> On Tue, Mar 29, 2016 at 12:29 PM, Stefano Bortoli
> wrote:
>
>> In fact, I don't use it. I just had to crawl back the runtime
>> implementation to get to the point where parallelism was switching from 32
>> to 8.
>>
>> saluti,
>>
; something which you shouldn’t be concerned with since it is only used
> internally by the runtime.
>
> Cheers,
> Till
>
>
> On Tue, Mar 29, 2016 at 12:09 PM, Stefano Bortoli
> wrote:
>
>> Well, in theory yes. Each task has a thread, but only a numb
nContext = ExecutionContext.fromExecutor(new
> > ForkJoinPool())
> >
> > and thus the number of concurrently running threads is limited to the
> number
> > of cores (using the default constructor of the ForkJoinPool).
> > What do you think?
> >
> >
> >
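A small standalone illustration of the ForkJoinPool point above (plain JDK code, not Flink internals):

import java.util.concurrent.ForkJoinPool;

public class PoolParallelism {
    public static void main(String[] args) {
        // default constructor: parallelism equals the number of available cores
        ForkJoinPool defaultPool = new ForkJoinPool();
        System.out.println("default parallelism: " + defaultPool.getParallelism());

        // explicit parallelism: allows more concurrent tasks than cores
        ForkJoinPool widePool = new ForkJoinPool(32);
        System.out.println("explicit parallelism: " + widePool.getParallelism());
    }
}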
Hi guys,
I am trying to test a job that should run a number of tasks to read from an
RDBMS using an improved JDBC connector. The connection and the reading run
smoothly, but I cannot seem to move above the limit of 8
concurrent threads running. 8 is, of course, the number of cores of my
ma
ripts using
Flink. We are testing it on an Oracle table of 11 billion records, but we
did not get through a complete run. We are just at first prototype level,
so there is surely some work to do. :-)
saluti,
Stefano
2016-03-23 10:38 GMT+01:00 Chesnay Schepler :
> On 23.03.2016 10:04, Stefan
ormats.
>
> Is the INT returned as a double as well?
>
> Note: The (runtime) output type is in no way connected to the TypeInfo you
> pass when constructing the format.
>
>
> On 21.03.2016 14:16, Stefano Bortoli wrote:
>
>> Hi squirrels,
>>
>> I working o
Hi squirrels,
I am working on a Flink job connecting to an Oracle DB. I started from the JDBC
example for Derby, and used the TupleTypeInfo to configure the fields of
the tuple as it is read.
The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR. Apparently,
using Oracle, all the numbers are rea
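For the record described above, the TupleTypeInfo would look roughly like this (a sketch; column order and names are hypothetical, to be passed to env.createInput(...) together with the JDBCInputFormat):

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.tuple.Tuple5;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;

public class OracleTypeInfo {
    // type info for a record with 2 INT, 1 FLOAT and 2 VARCHAR columns
    public static TupleTypeInfo<Tuple5<Integer, Integer, Float, String, String>> rowType() {
        return new TupleTypeInfo<>(
                BasicTypeInfo.INT_TYPE_INFO,
                BasicTypeInfo.INT_TYPE_INFO,
                BasicTypeInfo.FLOAT_TYPE_INFO,
                BasicTypeInfo.STRING_TYPE_INFO,
                BasicTypeInfo.STRING_TYPE_INFO);
    }
}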
What I normally do is:
java -cp MYUBERJAR.jar my.package.mainclass
Does it make sense?
2015-10-23 17:22 GMT+02:00 Flavio Pompermaier :
> could you write me the command please? I'm not in the office right now...
> On 23 Oct 2015 17:10, "Maximilian Michels" wrote:
>
>> Could you try submitting t
efinitely look into it once Flink Forward is over and we've
> finished the current release work. Thanks for reporting the issue.
>
> Cheers,
> Till
>
> On Tue, Oct 6, 2015 at 9:21 AM, Stefano Bortoli wrote:
>
>> Hi guys, I could manage to complete the process crossing
Hi guys, I managed to complete the process by crossing byte arrays that I
deserialize within the group function. However, I think this workaround is
feasible only for relatively simple processes. Any idea/plan about fixing
the serialization problem?
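To make the workaround concrete, a rough sketch under these assumptions: the POJO is java.io.Serializable, Apache Commons Lang 3 is available, and the comparison happens directly in the cross function (the thread mentions a group function; names and the POJO are hypothetical):

import org.apache.commons.lang3.SerializationUtils;
import org.apache.flink.api.common.functions.CrossFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;

public class ByteArrayCrossSketch {

    public static DataSet<Boolean> crossAsBytes(DataSet<MyPojo> left, DataSet<MyPojo> right) {
        // Flink only sees byte[] records, so Kryo never touches the POJOs
        DataSet<byte[]> leftBytes = left.map(new ToBytes());
        DataSet<byte[]> rightBytes = right.map(new ToBytes());

        return leftBytes.cross(rightBytes).with(new CrossFunction<byte[], byte[], Boolean>() {
            @Override
            public Boolean cross(byte[] a, byte[] b) {
                // deserialize inside the user function
                MyPojo p1 = (MyPojo) SerializationUtils.deserialize(a);
                MyPojo p2 = (MyPojo) SerializationUtils.deserialize(b);
                return p1.matches(p2);  // hypothetical comparison
            }
        });
    }

    private static class ToBytes implements MapFunction<MyPojo, byte[]> {
        @Override
        public byte[] map(MyPojo value) {
            return SerializationUtils.serialize(value);  // MyPojo must be Serializable
        }
    }

    // hypothetical POJO
    public static class MyPojo implements java.io.Serializable {
        public String id;
        public boolean matches(MyPojo other) { return id != null && id.equals(other.id); }
    }
}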
saluti,
Stefano
Stefano Bortoli, PhD
*ENS
I had problems running a Flink job with Maven; probably there is some
classloading issue. What worked for me was to run a simple java command with the
uberjar. So I built the jar using Maven, and then ran it this way:
java -Xmx2g -cp target/youruberjar.jar yourclass arg1 arg2
hope it helps,
Stefano
201
02:00 Stefano Bortoli :
> here it is: https://issues.apache.org/jira/browse/FLINK-2800
>
> saluti,
> Stefano
>
> 2015-10-01 18:50 GMT+02:00 Stephan Ewen :
>
>> This looks to me like a bug where type registrations are not properly
>> forwarded to all Serializers.
>
ct 1, 2015 at 6:46 PM, Stefano Bortoli
> wrote:
>
>> Hi guys,
>>
>> I hit a Kryo exception while running a process 'crossing' POJOs datasets.
>> I am using the 0.10-milestone-1.
>> Checking the serializer:
>> org.apache.flink.api.java.typeutils.ru
Hi guys,
I hit a Kryo exception while running a process 'crossing' POJO datasets. I
am using the 0.10-milestone-1.
Checking the serializer:
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:210)
I have noticed that the Kryo instance is reused along s
problems of classpath are eased, so I can live with it.
thanks a lot for your support.
saluti,
Stefano
2015-07-24 11:17 GMT+02:00 Stefano Bortoli :
> HI Stephan,
>
> I think I may have found a possible root of the problem. I do not build
> the fat jar, I simply execute the main with mav
> Configuration cfg = new Configuration();
> TaskConfig taskConfig = new TaskConfig(cfg);
> taskConfig.setStubWrapper(userCode);
> taskConfig.getStubWrapper(ClassLoader.getSystemClassLoader());
>
>
>
> On Fri, Jul 24, 2015 at 10:44 AM, Stefano Bortoli
> wrote:
>
>>
; MongoHadoopOutputFormat?
>
> Can you try and call "SerializationUtils.clone(new MongoHadoopOutputFormat
> )", for example at the beginning of your main method? The
> SerializationUtils are part of Apache Commons and are probably in your
> class path anyways.
>
> S
k trace. I thought it was a matter of
serializable classes, so I made all my classes serializable, but I still
get the same error. Perhaps it is not possible to do these things with
Flink.
Any intuition? Is it doable?
thanks a lot for your support. :-)
saluti,
Stefano Bortoli, PhD
*ENS Technica
Yes it does. :-) I have implemented it with Hadoop1 and Hadoop2.
Essentially, I extended the HadoopOutputFormat, reusing part of the code
of the HadoopOutputFormatBase, and set the MongoOutputCommitter to replace
the FileOutputCommitter.
saluti,
Stefano
Stefano Bortoli, PhD
*ENS Technical
Seems like the HadoopOutputFormat wrapper is pretty much specialized on
> File Output Formats.
>
> Can you open an issue for that? Someone will need to look into this...
>
> On Wed, Jul 22, 2015 at 4:48 PM, Stefano Bortoli
> wrote:
>
>> In fact, on close() of the HadoopOutputFormat
finalizeGlobal to the
outputCommitter to FileOutputCommitter(), or keep it as a default in case
of no specific assignment.
saluti,
Stefano
2015-07-22 16:48 GMT+02:00 Stefano Bortoli :
> In fact, on close() of the HadoopOutputFormat the fileOutputCommitter
> returns false
-07-22 15:53 GMT+02:00 Stefano Bortoli :
> Debugging, it seem the commitTask method of the MongoOutputCommitter is
> never called. Is it possible that this 'bulk' approach of mongo-hadoop 1.4
> does not fit the task execution method of Flink?
>
> any idea? thanks a l
Debugging, it seems the commitTask method of the MongoOutputCommitter is
never called. Is it possible that this 'bulk' approach of mongo-hadoop 1.4
does not fit the task execution method of Flink?
any idea? thanks a lot in advance.
saluti,
Stefano
Stefano Bortoli, PhD
*ENS Technica
Hi,
I am trying to analyze and update a MongoDB collection with Apache Flink
0.9.0, Mongo Hadoop 1.4.0, and Hadoop 2.6.0.
The process is fairly simple, and the MongoInputFormat works smoothly;
however, it does not write back to the collection. The process works,
because the writeAsText works as expe
Hi Robert,
I answer on behalf of Flavio. He told me the driver jar was included.
Smells like a class-loading issue due to 'conflicting' dependencies. Is that
possible?
Saluti,
Stefano
2015-06-05 16:24 GMT+02:00 Robert Metzger :
> Hi,
>
> is the MySQL driver part of the Jar file that you've build?
>
What I did was to implement ListValue in a MyListValue object; then you can
do pretty much what you want. :-)
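For illustration, a minimal sketch of such a subclass (assuming the list elements are Flink Value types, here StringValue; the class name mirrors the one mentioned above):

import org.apache.flink.types.ListValue;
import org.apache.flink.types.StringValue;

// ListValue is abstract; a small concrete subclass pins the element type,
// which is what the runtime needs for (de)serialization
public class MyListValue extends ListValue<StringValue> {
    public MyListValue() {
        super();
    }
}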
saluti,
Stefano
2015-01-23 11:29 GMT+01:00 Robert Metzger :
> Hi,
>
> I think you can just use a java collection for the Tuple2's. (Starting
> from Flink 0.8.0)
>
> Robert.
>
> On Fri, J