Re: Example - Reading Avro Generic records

2016-04-07 Thread Sourigna Phetsarath
when I was writing the PR. -Gna On Thu, Apr 7, 2016 at 11:08 AM, Sourigna Phetsarath < gna.phetsar...@teamaol.com> wrote: > Tarandeep, > > Thanks for pasting your code! > > I have a PR ready that extends AvroInputFormat and will submit it soon. > > Still waiting fo

Re: Example - Reading Avro Generic records

2016-04-07 Thread Sourigna Phetsarath
arser().parse(new > File("/path/to/schemafile.avsc")); > DataSet dataSet = env.createInput(new > GenericAvroInputFormat(inPath, schema)); > dataSet.map(new MapFunction>() { > @Override > public Tuple2 map(GenericRecord record) { > Long id = (Long

Re: Example - Reading Avro Generic records

2016-04-01 Thread Sourigna Phetsarath
There isn't a way yet, but I am proposing to do one: https://issues.apache.org/jira/browse/FLINK-3691 On Fri, Apr 1, 2016 at 4:04 AM, Tarandeep Singh wrote: > Hi, > > Can someone please point me to an example of creating DataSet using Avro > Generic Records? > > I tried this code - > > final Ex

Re: Example - Reading Avro Generic records

2016-04-01 Thread Sourigna Phetsarath
Tarandeep, There isn't a way yet, but I am proposing to do one: https://issues.apache.org/jira/browse/FLINK-3691 -Gna On Fri, Apr 1, 2016 at 4:04 AM, Tarandeep Singh wrote: > Hi, > > Can someone please point me to an example of creating DataSet using Avro > Generic Records? > > I tried this co

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Sourigna Phetsarath
> > On Tue, Mar 29, 2016 at 10:46 AM, Simone Robutti < > simone.robu...@radicalbit.io> wrote: > >> To my knowledge there is nothing like that. PMML is not supported in any >> form and there's no custom saving format yet. If you really need a quick >

Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-28 Thread Sourigna Phetsarath
Flinksters, Is there an example of saving a Trained Model, loading a Trained Model and then scoring one or more feature vectors using Flink ML? All of the examples I've seen have shown only sequential fit and predict. Thank you. -Gna -- *Gna Phetsarath* System Architect // AOL Platforms // Da

Re: DataSet.randomSplit()

2016-03-28 Thread Sourigna Phetsarath
t part of Flink yet. > > Did you have a look at the sampling methods in DataSetUtils? Maybe they > can be helpful for what you are trying to achieve. > > – Ufuk > > On Wed, Mar 23, 2016 at 5:19 PM, Sourigna Phetsarath < > gna.phetsar...@teamaol.com> wrote: > >>

DataSet.randomSplit()

2016-03-23 Thread Sourigna Phetsarath
All: Does Flink DataSet have a randomSplit(weights: Array[Double], seed: Long): Array[DataSet[T]] function? There is this pull request: https://github.com/apache/flink/pull/921 Does anyone have an update on its progress? Thank you. -- *Gna Phetsarath* System Architect // AOL Platforms
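
For readers unfamiliar with the semantics being asked about, here is a minimal sketch of a weighted, seeded random split over an in-memory list. It is plain Java and does not use Flink; the class name `RandomSplitSketch` and its method are hypothetical and are not the API proposed in the pull request above. It assumes each element is assigned independently to one part, with probability proportional to that part's weight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: not a Flink class and not the API from the pull request.
public class RandomSplitSketch {

    // Assigns every element to exactly one part; part i is chosen with
    // probability weights[i] / sum(weights). The same seed gives the same split.
    static <T> List<List<T>> randomSplit(List<T> input, double[] weights, long seed) {
        if (weights.length == 0) {
            throw new IllegalArgumentException("at least one weight is required");
        }
        double total = 0.0;
        for (double w : weights) {
            if (w < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }

        // Cumulative, normalized upper bounds, e.g. {0.8, 0.2} -> {0.8, 1.0}.
        double[] bounds = new double[weights.length];
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i] / total;
            bounds[i] = running;
        }
        bounds[weights.length - 1] = 1.0; // guard against floating-point rounding

        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            parts.add(new ArrayList<>());
        }

        Random rng = new Random(seed);
        for (T item : input) {
            double r = rng.nextDouble(); // uniform in [0, 1)
            int bucket = 0;
            while (r >= bounds[bucket]) {
                bucket++;
            }
            parts.get(bucket).add(item);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        List<List<Integer>> parts = randomSplit(data, new double[] {0.8, 0.2}, 42L);
        System.out.println("train=" + parts.get(0).size() + " test=" + parts.get(1).size());
    }
}
```

Part sizes are only approximately proportional to the weights, because each element is sampled independently. A distributed version would additionally need a deterministic seed per partition, which this sketch does not address.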

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-23 Thread Sourigna Phetsarath
g/contribute-code.html > http://flink.apache.org/how-to-contribute.html > > On Wed, Mar 23, 2016 at 12:00 AM, Fabian Hueske wrote: > >> Hi Gna, >> >> thanks for sharing the good news and opening the JIRA! >> >> Cheers, Fabian >> >> 2016-03-22 2

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-22 Thread Sourigna Phetsarath
On Mon, Mar 21, 2016 at 10:04 AM, Sourigna Phetsarath < gna.phetsar...@teamaol.com> wrote: > Fabian, > > I'll try extending InputFormat as you suggested and will create a JIRA > issue as well. > > I also have an AvroGenericRecordInput format class that I would like to

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-21 Thread Sourigna Phetsarath
tom > InputFormat based on FileInputFormat by overriding the createInputSplits() > method. > > Best, Fabian > > 2016-03-21 0:11 GMT+01:00 Sourigna Phetsarath > : > >> All, >> >> Do any of the Flink Data Sources support comma separated directories with >> wildcar

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-21 Thread Sourigna Phetsarath
A issue with your suggestions? >> >> Until this is added to Flink, it can be implemented as a custom >> InputFormat based on FileInputFormat by overriding the createInputSplits() >> method. >> >> Best, Fabian >> >> 2016-03-21 0:11 GMT+01:00 Sou

Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-20 Thread Sourigna Phetsarath
All, Do any of the Flink Data Sources support comma separated directories with wildcards? For example: env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/* ") Thanks in advance for any help that you can provide. -- *Gna Phetsarath* System Architect // AOL Platforms //
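
The replies in this thread suggest a custom InputFormat based on FileInputFormat that overrides createInputSplits(). The path-expansion step such a format would need can be sketched in plain Java as below. This is an assumption-laden illustration, not Flink code: the class `GlobPathExpander` and its `expand` method are hypothetical, patterns are taken relative to a root directory, and matching uses the JDK's `glob:` syntax, in which `*` does not cross directory boundaries.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: expands "a/*/*,b/*/*" into the matching regular files under a root.
public class GlobPathExpander {

    static List<Path> expand(Path root, String commaSeparatedPatterns) throws IOException {
        // One matcher per comma-separated pattern; surrounding whitespace is ignored.
        List<PathMatcher> matchers = new ArrayList<>();
        for (String pattern : commaSeparatedPatterns.split(",")) {
            String trimmed = pattern.trim();
            if (!trimmed.isEmpty()) {
                matchers.add(root.getFileSystem().getPathMatcher("glob:" + trimmed));
            }
        }

        // Walk the tree once and keep files whose root-relative path matches any pattern.
        try (Stream<Path> files = Files.walk(root)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(file -> {
                        Path relative = root.relativize(file);
                        return matchers.stream().anyMatch(m -> m.matches(relative));
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GlobPathExpander /data "2016/01/01/*/*,2016/01/02/*/*"
        if (args.length == 2) {
            for (Path p : expand(java.nio.file.Paths.get(args[0]), args[1])) {
                System.out.println(p);
            }
        }
    }
}
```

Inside a real input format, the resulting file list would then be turned into input splits. On a distributed file system such as HDFS or S3, the listing would have to go through that file system's own API rather than `java.nio.file`.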

Setting taskmanager.network.numberOfBuffers does not seem to have an affect - Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
All: Flink Version 0.10.2 The number that I set for *taskmanager.network.numberOfBuffers* doesn't seem to have any effect, even if I set it to a very high number. There might be a race condition here where the upper bound is not enforced or computed correctly. java.io.IOException: Insufficient n

Re: S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
> fs.s3a.connection.timeout > 5 > Socket connection timeout in milliseconds. > > > > -- Ken > > > -- > > *From:* Sourigna Phetsarath > > *Sent:* March 17, 2016 2:05:39pm PDT > > *To:* user@flink.apache.org > > *Su

S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
All: I'm trying to read lots of files from S3 and I am getting timeouts from S3: java.io.IOException: Error opening the Input Split [0,558574890]: Input opening request timed out. Opener was alive. Stack of split open thread: at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.

Re: Setting taskmanager.network.numberOfBuffers does not seem to have an affect - Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
//ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers > > On Fri, Mar 18, 2016 at 9:38 PM, Sourigna Phetsarath < > gna.phetsar...@teamaol.com> wrote: > >> It's batch. >> >> Reading a few thousand Avro files from

Setting taskmanager.network.numberOfBuffers and getting errors...

2016-03-03 Thread Sourigna Phetsarath
All: I'm running a Flink 0.10.2 App by submitting to YARN as an application. I'm using an AWS EMR cluster of 1 Master and 10 d2.8xlarge. When I submit the job using: bin/flink run \ -m yarn-cluster \ -yjm 20480 \ -yn 10 \ -ytm 80960 \ -ys 36 \ -yD taskmanager.network.num
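
As a rough sanity check for the setting in this command, the Flink configuration documentation of that era gives a rule of thumb for the number of network buffers: slots per TaskManager squared, times the number of TaskManagers, times 4. The sketch below applies it to the `-ys 36` and `-yn 10` values from the command above. The class and method names are hypothetical, and the formula is a sizing heuristic rather than a guarantee that a given job will have enough buffers.

```java
// Hypothetical helper applying the documented rule of thumb:
//   buffers ~ slotsPerTaskManager^2 * numberOfTaskManagers * 4
public class NetworkBufferEstimate {

    static long recommendedBuffers(int slotsPerTaskManager, int taskManagers) {
        // Use long arithmetic so large clusters do not overflow int.
        return (long) slotsPerTaskManager * slotsPerTaskManager * taskManagers * 4L;
    }

    public static void main(String[] args) {
        int slots = 36;        // -ys 36 from the command above
        int taskManagers = 10; // -yn 10 from the command above
        // 36 * 36 * 10 * 4 = 51840
        System.out.println(recommendedBuffers(slots, taskManagers));
    }
}
```

Jobs with many shuffles or a very high parallelism may need more than this estimate, since each buffer also consumes TaskManager memory according to the configured buffer size.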

FlinkML 0.10.1 - Using SparseVectors with MLR does not work

2016-02-03 Thread Sourigna Phetsarath
All: I'm trying to use SparseVectors with FlinkML 0.10.1. It does not seem to be working. Here is a UnitTest that I created to recreate the problem: *package* com.aol.ds.arc.ml.poc.flink > *import* org.junit.After > *import* org.junit.Before > *import* org.slf4j.LoggerFactory > *import* org.