Re: SVM classification problem.

2016-10-02 Thread Simone Robutti
No, you don't get 100% accuracy in this case. You don't even want that; it would be a severe case of overfitting. You would get that only if your dataset were linearly separable, or separable with a finely tuned kernel, but in that case SVM would be overkill and more traditional met

Re: Counting latest state of stateful entities in streaming

2016-09-30 Thread Simone Robutti
e size of this program depends on the number of > unique ids. That might cause problems if the id space grows very fast. > > Please let me know, if you have questions or if that works ;-) > > Cheers, Fabian > > > 2016-09-30 0:32 GMT+02:00 Simone Robutti : > >>

Counting latest state of stateful entities in streaming

2016-09-29 Thread Simone Robutti
Hello, in the last few days I tried to create my first real-time analytics job in Flink. The approach is kappa-architecture-like, so I have my raw data on Kafka, where we receive a message for every change of state of any entity. So the messages are of the form (id, newStatus, timestamp). We want
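Since the thread is about exactly this, a minimal sketch of one way to do it with the Scala DataStream API on a recent Flink version (the StatusEvent type, the state name and the inline test data are hypothetical): key by entity id, remember the last status in a ValueState, emit -1/+1 deltas and sum them per status.

    import org.apache.flink.api.common.functions.RichFlatMapFunction
    import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.util.Collector

    // Hypothetical event type matching the (id, newStatus, timestamp) messages.
    case class StatusEvent(id: String, newStatus: String, timestamp: Long)

    // For every entity, remember its last known status and emit count deltas:
    // -1 for the status it leaves, +1 for the status it enters.
    class StatusDelta extends RichFlatMapFunction[StatusEvent, (String, Long)] {
      @transient private var lastStatus: ValueState[String] = _

      override def open(parameters: Configuration): Unit =
        lastStatus = getRuntimeContext.getState(
          new ValueStateDescriptor[String]("lastStatus", classOf[String]))

      override def flatMap(e: StatusEvent, out: Collector[(String, Long)]): Unit = {
        val previous = lastStatus.value()
        if (previous != null) out.collect((previous, -1L))
        out.collect((e.newStatus, 1L))
        lastStatus.update(e.newStatus)
      }
    }

    object LatestStatusCount {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        val events = env.fromElements(
          StatusEvent("a", "NEW", 1L), StatusEvent("a", "DONE", 2L), StatusEvent("b", "NEW", 3L))
        events
          .keyBy(_.id)
          .flatMap(new StatusDelta)
          .keyBy(_._1)   // key by status
          .sum(1)        // running count of entities currently in each status
          .print()
        env.execute("latest status count")
      }
    }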

Re: Merge the states of different partition in streaming

2016-09-28 Thread Simone Robutti
Solved. Probably there was an error in the way I was testing. Also I simplified the job and it works now. 2016-09-27 16:01 GMT+02:00 Simone Robutti : > Hello, > > I'm dealing with an analytical job in streaming and I don't know how to > write the last part. > > Actu

Merge the states of different partition in streaming

2016-09-27 Thread Simone Robutti
Hello, I'm dealing with an analytical job in streaming and I don't know how to write the last part. Actually I want to count all the elements in a window with a given status, so I keep a state with a Map[Status,Long]. This state is updated from tuples containing the oldStatus and the new
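For the merging part asked about here, a hedged sketch (Scala DataStream API): reduce the per-partition Map[Status, Long] snapshots in a windowAll, accepting that the final merge runs with parallelism 1, which is the usual trade-off for a global aggregate. Names and window size are illustrative.

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    object MergeStatusCounts {
      type Counts = Map[String, Long]

      // Merge two partial maps by summing the counts per status.
      def merge(a: Counts, b: Counts): Counts =
        (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        // Partial per-partition counts; in the real job these come from the upstream keyed operator.
        val partials: DataStream[Counts] = env.fromElements(
          Map("NEW" -> 3L, "DONE" -> 1L),
          Map("NEW" -> 2L, "FAILED" -> 4L))

        partials
          .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(10)))
          .reduce(merge _)   // single global map per window
          .print()

        env.execute("merge partition counts")
      }
    }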

Discard message LeaderSessionMessage(null,ConnectionTimeout)

2016-09-13 Thread Simone Robutti
Hello, while running a job on Flink 1.1.2 on a cluster of 3 nodes using the KafkaProducer010, I encounter this error: WARN org.apache.flink.runtime.client.JobClientActor- Discard message LeaderSessionMessage(null,ConnectionTimeout) because the expected leader session ID 4a1c16fe-

Re: Storing JPMML-Model Object as a Variable Closure?

2016-09-05 Thread Simone Robutti
ch can get quite big). > > > > The „obvious“ optimization would be to initialize and hide the Evaluator > behind a singleton since it > > is thread safe. (Which is what I wanted to avoid in the first place. But > maybe that is the best solution > > at the moment?) >

Re: Storing JPMML-Model Object as a Variable Closure?

2016-09-05 Thread Simone Robutti
control of that part. 2016-09-05 15:24 GMT+02:00 Bauss, Julian : > Hi Simone, > > > > that sounds promising! > > Unfortunately your link leads to a 404 page. > > > > Best Regards, > > > > Julian > > > > *Von:* Simone Robutti [mailto:simone.r

Re: Storing JPMML-Model Object as a Variable Closure?

2016-09-05 Thread Simone Robutti
I think you could make use of this small component I've developed: https://gitlab.com/chobeat/Flink-JPMML It's specifically for using JPMML on Flink. Maybe there's too much stuff for what you need, but you could reuse the code of the Operator for your use case. 2016-09-05 14:11 GMT+02:00 Bauss,
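Independently of flink-jpmml, the plain-Flink pattern discussed in this thread is to build the JPMML Evaluator once per task in open() rather than shipping it inside the closure. A rough sketch, assuming the JPMML-Evaluator API of that period (PMMLUtil.unmarshal, ModelEvaluatorFactory) and a model file readable on every node:

    import java.io.FileInputStream
    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.configuration.Configuration
    import org.dmg.pmml.FieldName
    import org.jpmml.evaluator.{Evaluator, FieldValue, ModelEvaluatorFactory}
    import org.jpmml.model.PMMLUtil

    // Builds the evaluator lazily on each task, so the (potentially big) JPMML
    // object graph never has to travel with the serialized closure.
    class PmmlScorer(modelPath: String)
        extends RichMapFunction[java.util.Map[FieldName, FieldValue], java.util.Map[FieldName, _]] {

      @transient private var evaluator: Evaluator = _

      override def open(parameters: Configuration): Unit = {
        val stream = new FileInputStream(modelPath) // assumption: model file available on every node
        try {
          val pmml = PMMLUtil.unmarshal(stream)
          evaluator = ModelEvaluatorFactory.newInstance().newModelEvaluator(pmml)
        } finally {
          stream.close()
        }
      }

      override def map(arguments: java.util.Map[FieldName, FieldValue]): java.util.Map[FieldName, _] =
        evaluator.evaluate(arguments)
    }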

Re: env.fromElements produces TypeInformation error

2016-06-04 Thread Simone Robutti
I'm not sure if this is the solution and I can't try it right now, but you should move the case class "State" definition outside the abstract class. 2016-06-04 17:34 GMT+02:00 Dan Drewes : > > Hi, > > compiling the code: > > def minimize(f: DF, init: T): T = { > > //create
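A tiny illustration of the suggested change (the Optimizer name is hypothetical, State comes from the thread): a case class nested inside a class drags a reference to the outer instance along, which tends to confuse the createTypeInformation macro, so the definition goes to the top level or to a companion object.

    // Problematic: nested inside a (non-object) class.
    // abstract class Optimizer {
    //   case class State(iteration: Int, value: Double)
    // }

    // Preferred: define it at the top level or in the companion object.
    object Optimizer {
      case class State(iteration: Int, value: Double)
    }

    abstract class Optimizer {
      import Optimizer.State
      def minimize(init: State): State
    }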

Re: sparse matrix

2016-05-30 Thread Simone Robutti
Hello, right now Flink's local matrices are rather raw and for this kind of usage you should rely on Breeze. If you need to perform operations (slicing, in this case), Breeze is a better option if you don't want to reimplement everything. In case you already developed against Flink's matrices, ther
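A short Breeze sketch of what that looks like, assuming the breeze.linalg API: sparse construction goes through CSCMatrix.Builder, while slicing is most comfortable on dense matrices.

    import breeze.linalg.{CSCMatrix, DenseMatrix, DenseVector}

    object BreezeSparseExample {
      def main(args: Array[String]): Unit = {
        // Sparse construction: add the non-zeros, then build.
        val builder = new CSCMatrix.Builder[Double](rows = 1000, cols = 1000)
        builder.add(0, 0, 1.5)
        builder.add(10, 3, 2.0)
        builder.add(999, 999, -4.2)
        val sparse = builder.result()

        // Element access and matrix-vector products work on the sparse form directly.
        println(sparse(10, 3))
        val product = sparse * DenseVector.ones[Double](1000)

        // Slicing is most convenient on dense matrices: rows 0-1, every column.
        val dense = DenseMatrix((1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0))
        println(dense(0 to 1, ::))
      }
    }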

Re: Merging sets with common elements

2016-05-25 Thread Simone Robutti
}), (2, {2,3})} what's the result of > your operation? Is the result { ({1,2}, {1,2,3}) } because the 2 is > contained in both sets? > > Cheers, > Till > > On Wed, May 25, 2016 at 10:22 AM, Simone Robutti < > simone.robu...@radicalbit.io> wrote: > >> Hello, &g

Merging sets with common elements

2016-05-25 Thread Simone Robutti
Hello, I'm implementing MinHash for recommendation on Flink. I'm almost done, but I need an efficient way to merge sets of similar keys together (and later join these sets of keys with more data). The actual data structure is of the form DataSet[(Int,Set[Int])], where the left element of the tuple
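The merge step is essentially connected components over the key sets. As a non-distributed illustration of the logic (before porting it to Flink's delta iterations or to Gelly), a small union-find sketch:

    import scala.collection.mutable

    object MergeSets {
      // Merge all sets that share at least one element, using a simple union-find.
      def merge(sets: Seq[Set[Int]]): Seq[Set[Int]] = {
        val parent = mutable.Map[Int, Int]()

        def find(x: Int): Int = {
          val p = parent.getOrElseUpdate(x, x)
          if (p == x) x else { val r = find(p); parent(x) = r; r }
        }

        def union(a: Int, b: Int): Unit = parent(find(a)) = find(b)

        // Link every element of a set to that set's first element.
        for (s <- sets; head <- s.headOption; elem <- s) union(head, elem)

        sets.flatten.groupBy(find).values.map(_.toSet).toSeq
      }

      def main(args: Array[String]): Unit = {
        // {1,2} and {2,3} share the element 2, so they collapse into {1,2,3}.
        println(merge(Seq(Set(1, 2), Set(2, 3), Set(7, 8))))
        // => List(Set(1, 2, 3), Set(7, 8)) (order may vary)
      }
    }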

Re: Bug while using Table API

2016-05-12 Thread Simone Robutti
mone! I've managed to reproduce the error. I'll try to figure >> out what's wrong and I'll keep you updated. >> >> -Vasia. >> On May 4, 2016 3:25 PM, "Simone Robutti" >> wrote: >> >>> Here is the code: >>> >

Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Simone Robutti
Actually, model portability and persistence are a serious limitation to the practical use of FlinkML in streaming. If you know what you're doing, you can write a blunt serializer for your model, write it to a file and rebuild the model stream-side from the deserialized information. I tried it for an SVM mo
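A rough sketch of that "blunt serializer" approach on the streaming side, assuming the model boils down to a plain weights array (as for a linear SVM), that the file is readable from every task manager, and that scoring is just the sign of the dot product:

    import java.io.{FileInputStream, ObjectInputStream}
    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.configuration.Configuration

    // Scores feature vectors against weights that were serialized to a file
    // by the batch training job (see the saving sketch further down this list).
    class LinearModelScorer(modelFile: String) extends RichMapFunction[Array[Double], Double] {

      @transient private var weights: Array[Double] = _

      override def open(parameters: Configuration): Unit = {
        val in = new ObjectInputStream(new FileInputStream(modelFile))
        try weights = in.readObject().asInstanceOf[Array[Double]]
        finally in.close()
      }

      override def map(features: Array[Double]): Double = {
        // Plain linear scoring: sign of w . x
        val dot = weights.zip(features).map { case (w, x) => w * x }.sum
        if (dot >= 0) 1.0 else -1.0
      }
    }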

Re: Using ML lib SVM with Java

2016-05-09 Thread Simone Robutti
To my knowledge FlinkML does not support a unified API, and most things must be used exclusively with Scala DataSets. 2016-05-09 13:31 GMT+02:00 Malte Schwarzer : > Hi folks, > > I tried to get the FlinkML SVM running - but it didn't really work. The > SVM.fit() method requires a DataSet paramete
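For reference, the kind of Scala-only usage this refers to, as a hedged sketch assuming the flink-ml artifact of that era (SVM(), LabeledVector, DenseVector); the parameter values are illustrative:

    import org.apache.flink.api.scala._
    import org.apache.flink.ml.classification.SVM
    import org.apache.flink.ml.common.LabeledVector
    import org.apache.flink.ml.math.DenseVector

    object SvmExample {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment

        // Toy training set: labels must be +1.0 / -1.0 for the SVM.
        val training: DataSet[LabeledVector] = env.fromElements(
          LabeledVector(1.0, DenseVector(1.0, 2.0)),
          LabeledVector(-1.0, DenseVector(-1.0, -2.0)))

        val svm = SVM()
          .setBlocks(env.getParallelism)
          .setIterations(100)
          .setRegularization(0.01)
          .setStepsize(0.1)

        svm.fit(training)

        // Predict on plain vectors; the result pairs each vector with a label.
        val test = env.fromElements(DenseVector(0.5, 1.0), DenseVector(-0.5, -1.0))
        svm.predict(test).print()
      }
    }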

Re: Creating a custom operator

2016-05-09 Thread Simone Robutti
ata and sortPartition to > locally sort the data in a partition. Please note that MapPartition > operators do not support chaining and come therefore with a certain > serialization overhead. Whenever possible you should use a MapFunction or > FlatMapFunction which are a bit more lightwei

Re: Bug while using Table API

2016-05-04 Thread Simone Robutti
Here is the code: package org.example import org.apache.flink.api.scala._ import org.apache.flink.api.table.TableEnvironment object Job { def main(args: Array[String]) { // set up the execution environment val env = ExecutionEnvironment.getExecutionEnvironment val tEnv = TableEnvir
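The archive truncates the snippet at the TableEnvironment line; purely for readability, a hedged guess at how such a minimal reproduction typically continues, assuming the 1.1-SNAPSHOT Table API (getTableEnvironment plus the toTable conversion); the continuation is illustrative, not the original code:

    package org.example

    import org.apache.flink.api.scala._
    import org.apache.flink.api.scala.table._
    import org.apache.flink.api.table.TableEnvironment

    object Job {
      def main(args: Array[String]): Unit = {
        // set up the execution environment
        val env = ExecutionEnvironment.getExecutionEnvironment
        val tEnv = TableEnvironment.getTableEnvironment(env)

        // hypothetical continuation: turn a tiny DataSet into a Table and query it
        val data = env.fromElements((1, "a"), (2, "b"))
        val table = data.toTable(tEnv, 'id, 'name)
        table.select('id, 'name).toDataSet[(Int, String)].print()
      }
    }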

Bug while using Table API

2016-05-04 Thread Simone Robutti
Hello, while trying my first job using the Table API I got blocked by this error: Exception in thread "main" java.lang.NoSuchFieldError: RULES at org.apache.flink.api.table.plan.rules.FlinkRuleSets$.<init>(FlinkRuleSets.scala:148) at org.apache.flink.api.table.plan.rules.FlinkRuleSets$.<clinit>(FlinkRuleSets.s

Re: Creating a custom operator

2016-05-03 Thread Simone Robutti
ystem and is not straightforward. > The DataStream API has better support for custom operators. > > Can you explain what kind of operator you would like to add? > Maybe the functionality can be achieved with the existing operators. > > Best, Fabian > > 2016-05-03 12:54 GMT+02:00

Re: Creating a custom operator

2016-05-03 Thread Simone Robutti
d the commit IDs in the JIRA issues for > these extensions. > > Cheers, Fabian > > > > > > 2016-04-29 15:32 GMT+02:00 Simone Robutti : > >> Hello, >> >> I'm trying to create a custom operator to explore the internals of Flink. >> Actually th

Creating a custom operator

2016-04-29 Thread Simone Robutti
Hello, I'm trying to create a custom operator to explore the internals of Flink. Actually the one I'm working on is rather similar to Union, and I'm trying to mimic it for now. When I run my job, though, this error arises: Exception in thread "main" java.lang.IllegalArgumentException: Unknown opera

Create a cluster inside Flink

2016-04-28 Thread Simone Robutti
Hello everyone, I'm approaching a rather big and complex integration with an existing piece of software and I would like to hear the opinion of more experienced users on how to tackle a few issues. This software builds a cloud with its own logic. What I need is to keep these nodes as instances inside the

Re: Explanation on limitations of the Flink Table API

2016-04-21 Thread Simone Robutti
ere [1]. > > Feedback is welcome and can be very easily included in this phase of the > development ;-) > > Cheers, Fabian > > [1] > https://docs.google.com/document/d/1sITIShmJMGegzAjGqFuwiN_iw1urwykKsLiacokxSw0 > <https://docs.google.com/document/d/1sITIShmJMGe

Explanation on limitations of the Flink Table API

2016-04-21 Thread Simone Robutti
Hello, I would like to know if it's possible to create a Flink Table from an arbitrary CSV (or any other form of tabular data) without doing type-safe parsing with explicitly typed classes/POJOs. To my knowledge this is not possible, but I would like to know if I'm missing something. My requiremen

How to test serializability of a Flink job

2016-04-05 Thread Simone Robutti
Hello, last week I ran into a problem where my job worked in local mode but could not be serialized on the cluster. I assume that local mode does not really serialize all the operators (the problem was with a custom map function), and I need to enforce this behaviour in local mode or, better, be able to
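One way to catch this without a cluster, as a sketch: Flink's ClosureCleaner can be called from a plain unit test to verify that a user function (and everything it captures) survives serialization. The mapper below is a stand-in for the real one.

    import org.apache.flink.api.common.functions.MapFunction
    import org.apache.flink.api.java.ClosureCleaner

    object SerializabilityCheck {
      // Stand-in for the custom map function used in the real job.
      class MyMapper extends MapFunction[String, Int] {
        override def map(value: String): Int = value.length
      }

      def main(args: Array[String]): Unit = {
        // Throws if the function (or anything it captures) is not serializable --
        // the same condition that bites on cluster submission.
        ClosureCleaner.ensureSerializable(new MyMapper)
        println("MyMapper is serializable")
      }
    }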

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Simone Robutti
To my knowledge there is nothing like that. PMML is not supported in any form and there's no custom saving format yet. If you really need a quick and dirty solution, it's not that hard to serialize the model into a file. 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath : > Flinksters, > > Is there
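A hedged sketch of that quick-and-dirty route, assuming the trained model can be reduced to a serializable weights array (how you extract it from the FlinkML model varies by version, so that step is only indicated in a comment):

    import java.io.{FileOutputStream, ObjectOutputStream}

    object SaveModel {
      // Write any serializable model representation (here: a plain weights array) to a file.
      def save(weights: Array[Double], path: String): Unit = {
        val out = new ObjectOutputStream(new FileOutputStream(path))
        try out.writeObject(weights)
        finally out.close()
      }

      def main(args: Array[String]): Unit = {
        // In a real job these weights would be collected from the fitted FlinkML model
        // after training on a DataSet; hard-coded here to keep the sketch self-contained.
        val weights = Array(0.5, -1.2, 3.0)
        save(weights, "/tmp/svm-weights.bin")
      }
    }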

Connecting to a remote jobmanager - problem with Akka remote

2016-03-22 Thread Simone Robutti
Hello, we are trying to set up our system to do remote debugging through IntelliJ. Flink is running in a YARN long-running session. We are launching Flink's CliFrontend with the following parameters: > run -m **::48252 /Users//Projects/flink/build-target/examples/batch/WordCount.jar The error r

Flink Checkpoint on yarn

2016-03-19 Thread Simone Robutti
Hello, I'm testing the checkpointing functionality with HDFS as a backend. From what I can see, it uses different checkpointing files and resumes the computation from different points, not from the latest available one. This is unexpected behaviour to me. I log every second, for every worker, a c
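For reference, a hedged sketch of the kind of checkpointing setup under test here, assuming the FsStateBackend with an HDFS path; interval, path and the toy pipeline are placeholders:

    import org.apache.flink.runtime.state.filesystem.FsStateBackend
    import org.apache.flink.streaming.api.scala._

    object CheckpointedJob {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // Take a checkpoint every 5 seconds and keep the state snapshots on HDFS.
        env.enableCheckpointing(5000)
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"))

        // Minimal stateful pipeline: a running count per key, which should be
        // restored from the latest completed checkpoint after a failure.
        env.fromElements(("worker-1", 1L), ("worker-2", 1L), ("worker-1", 1L))
          .keyBy(_._1)
          .sum(1)
          .print()

        env.execute("checkpointing test")
      }
    }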

Re: Flink Checkpoint on yarn

2016-03-19 Thread Simone Robutti
ts > > You can share the complete job manager log file as well if you like. > > – Ufuk > > On Wed, Mar 16, 2016 at 2:50 PM, Simone Robutti > wrote: > > Hello, > > > > I'm testing the checkpointing functionality with hdfs as a backend. > > >

Re: Flink Checkpoint on yarn

2016-03-19 Thread Simone Robutti
's something wrong with my configuration because otherwise this really looks like a bug. Thanks in advance, Simone 2016-03-16 18:55 GMT+01:00 Simone Robutti : > Actually the test was intended for a single job. The fact that there are > more jobs is unexpected and it will be the first thin

Re: Flink Checkpoint on yarn

2016-03-19 Thread Simone Robutti
e an idea why they are out of order? Maybe something > is mixed up in the way we gather the logs and we only think that > something is wrong because of this. > > > On Wed, Mar 16, 2016 at 6:11 PM, Simone Robutti > wrote: > > I didn't resubmitted the job. Also the jobs are

Re: Flink Checkpoint on yarn

2016-03-19 Thread Simone Robutti
ll be > rolled back to an earlier consistent state. > > Can you please share the complete job manager logs of your program? > The most helpful thing will be to have a log for each started job > manager container. I don't know if that is easily possible. > > – Ufuk > >

Re: Java Maps and Type Information

2016-03-02 Thread Simone Robutti
able method found for returns(java.lang.Class) method Should I open an issue? 2016-03-01 21:45 GMT+01:00 Simone Robutti : > I tried to simplify it to the bones but I'm actually defining a custom > MapFunction,java.util.Map> that > even with a simple identity function fails at runtime givi

Re: Java Maps and Type Information

2016-03-01 Thread Simone Robutti
42 GMT+01:00 Aljoscha Krettek : > Hi, > what kind of program are you writing? I just wrote a quick example using > the DataStream API where I’m using Map> as > the output type of one of my MapFunctions. > > Cheers, > Aljoscha > > On 01 Mar 2016, at 16:33, Simone Robutti &

Java Maps and Type Information

2016-03-01 Thread Simone Robutti
Hello, to my knowledge it is not possible to use a java.util.Map, for example, in a FlatMapFunction. Is that correct? It gives a type error at runtime and it doesn't work even with explicit TypeInformation hints. Is there any way to make it work? Thanks, Simone

Compilation error while instancing FlinkKafkaConsumer082

2016-02-10 Thread Simone Robutti
Hello, the compiler has been raising an error since I added this line to the code: val testData = streamEnv.addSource(new FlinkKafkaConsumer082[String]("data-input", new SimpleStringSchema(), kafkaProp)) Here is the error: Error:scalac: Class org.apache.flink.streaming.api.checkpoint.CheckpointN