Re: Is there any Natural Language Processing samples for flink?

2022-07-27 Thread Suneel Marthi
Yes you can there have been uses of Apache OpenNLP in Flink pipelines and other Machine Translation libraries - happy to chat with you offline. On Wed, Jul 27, 2022 at 1:07 PM John Smith wrote: > But we can use some sort of Java/Scala NLP lib within our Fink Jobs I > guess... > > On Tue, Jul 26,

Re: AWS Client Builder with default credentials

2020-02-24 Thread Suneel Marthi
Not sure if this helps - this is how I invoke a Sagemaker endpoint model from a flink pipeline. See https://github.com/smarthi/NMT-Sagemaker-Inference/blob/master/src/main/java/de/dws/berlin/util/AwsUtil.java On Mon, Feb 24, 2020 at 10:08 AM David Magalhães wrote: > Hi Robert, thanks for your

Re: Read multiline JSON/XML

2019-11-29 Thread Suneel Marthi
For XML, u could look at Mahout's XMLInputFormat (if u r using HadoopInput Format). On Fri, Nov 29, 2019 at 9:01 AM Chesnay Schepler wrote: > Why vino? > > He's specifically asking whether Flink offers something _like_ spark. > > On 29/11/2019 14:39, vino yang wrote: > > Hi Flavio, > > IMO, it w

Re: Do flink have plans to support Deep Learning?

2019-04-18 Thread Suneel Marthi
that's a very open-ended question. There's been enough work done on using Flink for Deep Learning model inference - with TensorFlow (look at Eron Wright's Flink-Tensorflow project), with Amazon Sagemaker (i have code for that) or work from LightBend on Flink Model serving. So yes, there's enuf of

Re: Producing binary Avro to Kafka

2019-01-09 Thread Suneel Marthi
This was presented at Flink Forward Ber;lin 2017 - see the slide deck here https://smarthi.github.io/flink-forward-berlin-2017-moving-beyond-moving-bytes/#/19 You should be able to leverage Confluent/Horton schema registries from flink pipelines. On Wed, Jan 9, 2019 at 4:14 PM Elliot West wrote:

Re: Flink Anomaly Detection

2017-07-20 Thread Suneel Marthi
FWIW, We have a built a similar Log Aggregator internally using Apache Nifi + KFC stack (KFC = Kafka, Flink, Cassandra) Using Apache NiFi for ingesting logs from Openstack via rsyslog and writing them out to Kafka topics -> Flink Streaming + CEP for detecting anomalous patterns -> persist the pat

Re: Integrating Flink CEP with a Rules Engine

2017-06-23 Thread Suneel Marthi
hat is planned to be >> integrated for the SQL effort, and iii) what else is required, and >> consolidate the resources >> available. >> >> This will allow the community to move faster and with a clear roadmap. >> >> Kostas >> >> On Jun 23, 2017

Re: Integrating Flink CEP with a Rules Engine

2017-06-23 Thread Suneel Marthi
FWIW, here's an old Cloudera blog about using Drools with Spark. https://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/ It should be possible to invoke Drools from Flink in a similar way (I have not tried it). It all depends on what the use

Re: Suggestion for top 'k' products

2017-03-13 Thread Suneel Marthi
For an example implementation using Flink, check out https://github.com/bigpetstore/bigpetstore-flink/blob/master/src/main/java/org/apache/bigtop/bigpetstore/flink/java/FlinkStreamingRecommender.java On Mon, Mar 13, 2017 at 1:29 PM, Suneel Marthi wrote: > A simple way is to populate a Prior

Re: Suggestion for top 'k' products

2017-03-13 Thread Suneel Marthi
A simple way is to populate a Priority Queue of max size 'k' and implement a comparator on ur records. That would ensure that u always have Top k records at any instant in time. On Mon, Mar 13, 2017 at 1:25 PM, Meghashyam Sandeep V < vr1meghash...@gmail.com> wrote: > Hi All, > > I'm trying to u

Re: Question tableEnv.toDataSet : Table to DataSet>

2016-08-20 Thread Suneel Marthi
I can confirm that the code u have works in Flink 1.1.0 On Sat, Aug 20, 2016 at 3:37 PM, Camelia Elena Ciolac wrote: > > Good evening, > > > I started working with the beta Flink SQL in BatchTableEnvironment and I > am interested to convert the resulted Table object into a > DataSet>. > > I give

Re: flink1.0 DataStream groupby

2016-07-21 Thread Suneel Marthi
It should be keyBy(0) for DataStream API (since Flink 0.9) Its groupBy() in DataSet API. On Fri, Jul 22, 2016 at 1:27 AM, wrote: > Hi, > today,I use flink to rewrite my spark project,in spark ,data is > rdd,and it have much transformations and actions,but in flink,the > DataStream does not

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-21 Thread Suneel Marthi
I meant to respond to this thread yesterday, but got busy with work and slipped me. This is possible doable using Flink Streaming, others can correct me here. *Assumption:* Both the Batch and Streaming processes are reading from a single Kafka topic and by "Batched data", I am assuming its the sa

Re: Random access to small global state

2016-07-09 Thread Suneel Marthi
U could use ignite too, I believe they have a plugin for flink streaming. Sent from my iPhone > On Jul 9, 2016, at 8:05 AM, Sebastian wrote: > > Hi, > > I'm planning to work on a streaming recommender in Flink, and one problem > that I have is that the algorithm needs random access to a small

Re: Reading whole files (from S3)

2016-06-07 Thread Suneel Marthi
You can use Mahout XMLInputFormat with Flink - HAdoopInputFormat definitions. See http://stackoverflow.com/questions/29429428/xmlinputformat-for-apache-flink http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Read-XML-from-HDFS-td7023.html On Tue, Jun 7, 2016 at 10:11 PM, Jamie Grie

Re: how to convert datastream to collection

2016-05-03 Thread Suneel Marthi
DataStream> *newCentroids = new DataStream<>.()* *Iterator> iter = DataStreamUtils.collect(newCentroids);* *List> list = Lists.newArrayList(iter);* On Tue, May 3, 2016 at 10:26 AM, subash basnet wrote: > Hello all, > > Suppose I have the datastream as: > DataStream> *newCentroids*; > > How

Re: Anyone going to ApacheCon Big Data in Vancouver?

2016-04-28 Thread Suneel Marthi
I'll be there Ken, still waiting for Part 3 of ur blog series from 2013 on text processing. :) On Thu, Apr 28, 2016 at 6:34 PM, Ken Krugler wrote: > Hi all, > > Is anyone else from the community going? > > It would be fun to meet up with other Flink users during the event. > > I’ll be there from

Re: Gelly CommunityDetection in scala example

2016-04-27 Thread Suneel Marthi
Recall facing a similar issue while trying to contribute a gelly-scala example to flink-training. See https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/scala/com/dataartisans/flinktraining/exercises/gelly_scala/PageRankWithEdgeWeights.scala On Wed, Apr 27, 2016 at 11:3

Re: Powered by Flink

2016-04-06 Thread Suneel Marthi
t; >> Ah ok, sorry. I think linking to the wiki is also ok. >>> >> >>> >> >>> >> On 19.10.2015 15:18, Fabian Hueske wrote: >>> >>> >>> >>> @Timo: The proposal

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Suneel Marthi
U may want to use FlinkMLTools.persist() methods which use TypeSerializerFormat and don't enforce IOReadableWritable. On Tue, Mar 29, 2016 at 2:12 PM, Sourigna Phetsarath < gna.phetsar...@teamaol.com> wrote: > Till, > > Thank you for your reply. > > Having this issue though, WeightVector does n

Re: Upserts with Flink-elasticsearch

2016-03-28 Thread Suneel Marthi
Would it be useful to modify the existing Elasticsearch 1x sink to be able to handle Upserts ? On Mon, Mar 28, 2016 at 5:32 PM, Zach Cox wrote: > Hi Madhukar - with the current Elasticsearch sink in Flink 1.0.0 [1], I > don't think an upsert is possible, since IndexRequestBuilder can only > ret

Re: Convert Datastream to Collector or List

2016-03-19 Thread Suneel Marthi
DataStream ds = ... Iterator iter = DataStreamUtils.collect(ds); List list = Lists.newArrayList(iterator); Hope that helps. On Wed, Mar 16, 2016 at 7:37 AM, Ahmed Nader wrote: > Hi, > I want to pass an object of type DataStream ,after applying map function > on it, as a parameter to be u

Re: Availability for the ElasticSearch 2 streaming connector

2016-02-18 Thread Suneel Marthi
k/pull/1479, modified it a bit, > and just included it in my own project that uses the Elasticsearch 2 java > api. Seems to work well. Here are the files so you can do the same: > > https://gist.github.com/zcox/59e486be7aeeca381be0 > > -Zach > > > On Wed, Feb 17, 2016 at 4:06

Re: Availability for the ElasticSearch 2 streaming connector

2016-02-17 Thread Suneel Marthi
Hey I missed this thread, sorry about that. I have a basic connector working with ES 2.0 which I can push out. Its not optimized yet and I don't have the time to look at it, if someone would like to take it over go ahead I can send a PR. On Wed, Feb 17, 2016 at 4:57 PM, Robert Metzger wrote: >

Re: Reading Binary Data (Matrix) with Flink

2016-01-24 Thread Suneel Marthi
;>> >>>> You can use the input format from Hadoop in Flink by using readHadoopFile >>>> method. The method returns a dataset which of type is Tuple2. >>>> Note that MapReduce equivalent transformation in Flink is composed of map, >>>> groupB

Re: Reading Binary Data (Matrix) with Flink

2016-01-19 Thread Suneel Marthi
Guess u r looking for Flink's BinaryInputFormat to be able to read blocks of data from HDFS https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/io/BinaryInputFormat.html On Wed, Jan 20, 2016 at 12:45 AM, Saliya Ekanayake wrote: > Hi, > > I am trying

Re: Using Hadoop Input/Output formats

2015-11-24 Thread Suneel Marthi
Guess, it makes sense to add readHadoopXXX() methods to StreamExecutionEnvironment (for feature parity with what's existing presently in ExecutionEnvironment). Also Flink-2949 addresses the need to add relevant syntactic sugar wrappers in DataSet api for the code snippet in Fabian's previous email

Re: Flink Streaming Core 0.10 in maven repos

2015-11-23 Thread Suneel Marthi
This is what I used for a Flink Streaming talk and demo at a meetup last week, this is with Flink 0.10.0 org.apache.flink flink-core ${flink.version} org.apache.flink flink-streaming-java ${flink.version} org.apache.flink flink-runtime ${flink.version} org.apa

Re: Powered by Flink

2015-10-19 Thread Suneel Marthi
+1 to this. On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske wrote: > Sounds good +1 > > 2015-10-19 14:57 GMT+02:00 Márton Balassi : > > > Thanks for starting and big +1 for making it more prominent. > > > > On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske > wrote: > > > >> Thanks for starting this

Re: setSlotSharing NPE: Starting a stream consumer in a thread

2015-10-03 Thread Suneel Marthi
While on that Marton, would it make sense to have a dataStream.writeAsJson() method? On Sat, Oct 3, 2015 at 11:54 PM, Márton Balassi wrote: > Hi Jay, > > As for the NPE: the file monitoring function throws it when the location > is empty. Try running the datagenerator first! :) This behaviour i

[ANNOUNCE] Apache Mahout 0.10.1 Released

2015-05-31 Thread Suneel Marthi
The Apache Mahout PMC is pleased to announce the release of Mahout 0.10.1. Mahout's goal is to create an environment for quickly creating machine learning applications that scale and run on the highest performance parallel computation engines available. Mahout comprises an interactive environment a