Testing library for Flink

2020-01-20 Thread Juan Rodríguez Hortalá
Hi all, Recently, my colleagues at Complutense University of Madrid and I developed a testing library for Flink. The library extends ScalaCheck to allow you to specify random generators of streams using temporal logic. You can also write assertions as temporal logic formulas. If you are interes

Re: Best way to compute the difference between 2 datasets

2019-09-16 Thread Juan Rodríguez Hortalá
> groups first, which typically means spilling to disk if the data set has > any significant size. > > — Ken > > PS - I assume that you’ve implemented a valid hashCode()/equals() for the > record. > > > On Jul 22, 2019, at 8:29 AM, Juan Rodríguez Hortalá < > juan.

Re: Execution environments for testing: local vs collection vs mini cluster

2019-08-04 Thread Juan Rodríguez Hortalá
t might > be changed over time without any notification. > > > 1. > https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/testing.html#integration-testing > 2. > https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/local_execution.html > > >

Re: Re: MiniClusterResource class not found using AbstractTestBase

2019-07-23 Thread Juan Rodríguez Hortalá
" % flinkVersion % Test **classifier > "tests"* > *)* > > Best, > Haibo > > At 2019-07-23 17:51:23, "Fabian Hueske" wrote: > > Hi Juan, > > Which Flink version do you use? > > Best, Fabian > > On Tue, 23 July 2019 at 06:49

Re: Execution environments for testing: local vs collection vs mini cluster

2019-07-23 Thread Juan Rodríguez Hortalá
first? > > 1. Do you want to write a unit test (or integration test) case for your > project or for Flink? Or just want to run your job locally? > 2. Which mode do you want to test? DataStream or DataSet? > > > > Juan Rodríguez Hortalá wrote on Tue, Jul 23, 2019 > at 1:12 PM: > >

Execution environments for testing: local vs collection vs mini cluster

2019-07-22 Thread Juan Rodríguez Hortalá
Hi, In https://ci.apache.org/projects/flink/flink-docs-stable/dev/local_execution.html and https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/runtime/minicluster/MiniCluster.html I see there are 3 ways to create an execution environment for testing: - StreamExecut

MiniClusterResource class not found using AbstractTestBase

2019-07-22 Thread Juan Rodríguez Hortalá
Hi, I'm trying to use AbstractTestBase in a test in order to use the mini cluster. I'm using specs2 with Scala, so I cannot extend AbstractTestBase because I also have to extend org.specs2.Specification, so I'm trying to access the mini cluster directly using Specs2 BeforeAll to initialize it as f

Best way to compute the difference between 2 datasets

2019-07-21 Thread Juan Rodríguez Hortalá
Hi, I've been trying to write a function to compute the difference between 2 datasets. By that I mean computing a dataset that has all the elements of a dataset that are not present in another dataset. I first tried using coGroup, but it was very slow in a local execution environment, and ofte

Re: Exceptions when launching counts on a Flink DataSet concurrently

2019-05-02 Thread Juan Rodríguez Hortalá
3Cdev.flink.apache.org%3E > > On Fri, 26 Apr 2019 at 17:03, Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmail.com> wrote: > >> Hi Timo, >> >> Thanks for your answer. I was surprised to have problems calling those >> methods concurrently, becaus

Re: Exceptions when launching counts on a Flink DataSet concurrently

2019-04-26 Thread Juan Rodríguez Hortalá
now more about the internals of the execution? > > Regards, > Timo > > > On 26.04.19 at 03:13, Juan Rodríguez Hortalá wrote: > > Any thoughts on this? > > On Sun, Apr 7, 2019, 6:56 PM Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmail.com> wrote: > >>

Re: Exceptions when launching counts on a Flink DataSet concurrently

2019-04-25 Thread Juan Rodríguez Hortalá
Any thoughts on this? On Sun, Apr 7, 2019, 6:56 PM Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.com> wrote: > Hi, > > I have a very simple program using the local execution environment, that > throws NPE and other exceptions related to concurrent access when launchi

Exceptions when launching counts on a Flink DataSet concurrently

2019-04-07 Thread Juan Rodríguez Hortalá
Hi, I have a very simple program using the local execution environment, that throws NPE and other exceptions related to concurrent access when launching a count for a DataSet from different threads. The program is https://gist.github.com/juanrh/685a89039e866c1067a6efbfc22c753e which is basically t

Re: IterativeStream seems to ignore maxWaitTimeMillis

2016-11-23 Thread Juan Rodríguez Hortalá
> > On Wed, 23 Nov 2016 at 05:08 Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmail.com> wrote: > >> Thanks for your answer Aljoscha, >> >> The source stops, when I comment all the transformed streams and just >> print the input, the program completes. B

Re: IterativeStream seems to ignore maxWaitTimeMillis

2016-11-22 Thread Juan Rodríguez Hortalá
this wasn't needed. Greetings, Juan On Mon, Nov 21, 2016 at 9:17 AM, Aljoscha Krettek wrote: > Might it be that your initial source never stops? A loop will only > terminate if both the original source stops and the loop timeout is reached. > > On Mon, 21 Nov 2016 at 07:58

Re: Early events

2016-11-21 Thread Juan Rodríguez Hortalá
ava) >> >> Anyway, is this the expected behaviour for early events? Is Flink >> buffering early events until their future timestamp arrives? >> >> Thanks, >> >> Juan >> >> >> On Sat, Nov 19, 2016 at 8:31 PM, Juan Rodríguez Hortalá <

IterativeStream seems to ignore maxWaitTimeMillis

2016-11-20 Thread Juan Rodríguez Hortalá
Hi, I wrote a proof of concept for a Java version of mapWithState with time-based state eviction https://github.com/juanrh/flink-state-eviction/blob/a6bb0d4ca0908d2f4350209a4a41e381e99c76c5/src/main/java/com/github/juanrh/streaming/MapWithStateIterPoC.java. The idea is: - Convert an input KeyedS

Re: Failure to download flink-contrib dependency

2016-11-20 Thread Juan Rodríguez Hortalá
weet-inputformat > > note that the complete artifact name needs to be suffixed with a scala > binary version like _2.10 or _2.11 > > Hope that helps! > > Regards > Andrey > > On Sun, Nov 20, 2016 at 9:21 PM, Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmai

Failure to download flink-contrib dependency

2016-11-20 Thread Juan Rodríguez Hortalá
Hi, I'm having problems downloading flink-contrib in my Java Maven project, the relevant part of the pom is: UTF-8 1.1.3 org.apache.flink flink-contrib ${flink.version} I see that in https://repo1.maven.org/maven2/org/apache/flink/flink-contrib/1.1.3/ there are no jar files,

Re: Early events

2016-11-19 Thread Juan Rodríguez Hortalá
/293fe1cf972b2e4bc6fb4e874eb8ba70c78f7894/src/test/java/com/github/juanrh/streaming/source/EventTimeDelayedElementsSourceTest.java ) Anyway, is this the expected behaviour for early events? Is Flink buffering early events until their future timestamp arrives? Thanks, Juan On Sat, Nov 19, 2016 at 8:31 PM, Juan Rodríguez

Early events

2016-11-19 Thread Juan Rodríguez Hortalá
Hi, Maybe this is already in the documentation, sorry if I'm asking something obvious. I was thinking that if you have event time then you can also have early events, which would be events whose extracted timestamp is in the future. This might happen in practice for example in sensors with a skew

An idea for a parallel AllWindowedStream

2016-11-08 Thread Juan Rodríguez Hortalá
Hi, As a self-training exercise I've defined a class extending WindowedStream for implementing a proof of concept for a parallel version of AllWindowedStream /** * Tries to create a parallel version of an AllWindowedStream for a DataStream * by creating a KeyedStream by using as key the hash of the

Re: Testing DataStreams

2016-11-04 Thread Juan Rodríguez Hortalá
s many such tests. > > -Max > > > On Wed, Nov 2, 2016 at 4:58 PM, Juan Rodríguez Hortalá > wrote: > > Hi, > > > > I'm new to Flink, and I'm trying to write my first unit test for a > simple > > DataStreams job. In > > https://c

Testing DataStreams

2016-11-02 Thread Juan Rodríguez Hortalá
Hi, I'm new to Flink, and I'm trying to write my first unit test for a simple DataStreams job. In https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/util/package-summary.html I see several promising classes, but for example I cannot import org.apache.flink.

Re: About stateful transformations

2016-10-27 Thread Juan Rodríguez Hortalá
Cheers, > Aljoscha > > On Tue, 25 Oct 2016 at 06:47 Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmail.com> wrote: > >> Hi Gyula, >> >> Thanks a lot for your response, it was very clear. I understand that >> there is no problem of small files due

Re: About stateful transformations

2016-10-24 Thread Juan Rodríguez Hortalá
ere is no data > fragmentation in the checkpoints. Similar applies to the FsStateBackend but > that keeps the local state strictly in memory. > > I think you should definitely give RocksDB + HDFS a try. It works > extremely well for very large state sizes given some tuning, but should >

About stateful transformations

2016-10-23 Thread Juan Rodríguez Hortalá
Hi all, I don't have much experience with Flink, so please forgive me if I ask some obvious questions. I was taking a look at the documentation on stateful transformations in Flink at https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/state.html. I'm mostly interested in Flink for str

DataStream transformation isolation in Flink Streaming

2016-02-23 Thread Juan Rodríguez Hortalá
Hi, I was thinking about a problem and how to solve it with Flink Streaming. Imagine you have a stream of data where you want to apply several transformations, where some transformations depend on previous transformations and there is a final set of actions. This is modeled in a natural way as a DAG

Re: How to create a stream of data batches

2015-09-07 Thread Juan Rodríguez Hortalá
Hi, I'm just a Flink newbie, but maybe I'd suggest using window operators with a Count policy for that https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/streaming_guide.html#window-operators Hope that helps. Greetings, Juan 2015-09-04 14:14 GMT+02:00 Stephan Ewen : > Interest

Re: Hardware requirements and learning resources

2015-09-02 Thread Juan Rodríguez Hortalá
streaming paradigm, where you get >>> an unbounded stream of records and operators on these streams. >>> >>> Funny that you ask about a video for the DataStream slides. There is a >>> Flink training happening as we speak, and a video is being recorded right >

Re: Hardware requirements and learning resources

2015-09-02 Thread Juan Rodríguez Hortalá
r the DataStream slides. There is a > Flink training happening as we speak, and a video is being recorded right > now :-) Hopefully it will be made available soon. > > Best, > Kostas > > > On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá < > juan.rodriguez.hort.

Re: Hardware requirements and learning resources

2015-09-02 Thread Juan Rodríguez Hortalá
are more lessons at http://dataartisans.github.io/flink-training, for stream processing and the table API for which I haven't found a video. Does anyone have pointers to the missing videos? Greetings, Juan 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá < juan.rodriguez.hort...@g

Hardware requirements and learning resources

2015-09-02 Thread Juan Rodríguez Hortalá
Hi list, I'm new to Flink, and I find this project very interesting. I have experience with Apache Spark, and from what I've seen so far I find that Flink provides an API at a similar abstraction level but based on single record processing instead of batch processing. I've read on Quora that Flink exten