Yes, this code seems very reasonable. :D The way to use this to "modify" a file on HDFS is to read the file, then filter out some elements and write a new modified file that does not contain the filtered out elements. As said before, Flink (or HDFS), does not allow in-place modification of files.
On Fri, Jun 5, 2015 at 4:55 AM, Chiwan Park <chiwanp...@icloud.com> wrote: > Basically Flink uses Data Model in functional programming model. All DataSet > is immutable. This means we cannot modify DataSet but ㅐonly can create new > DataSet with modification. Update, delete query are represented as writing > filtered DataSet. > Following scala sample shows select, insert, update, and remove query in > Flink. (I’m not sure this is best practice.) > > case class MyType(id: Int, value1: String, value2: String) > > // load data (you can use readCsvFile, or something else.) > val data = env.fromElements(MyType(0, “test”, “test2”), MyType(1, “hello”, > “flink”), MyType(2, “flink”, “good”)) > > // selecting > // same as SELECT * FROM data WHERE id = 1 > val selectedData1 = data.filter(_.id == 1) > // same as SELECT value1 FROM data WHERE id = 1 > val selectedData2 = data.filter(_.id == 1).map(_.value1) > > // removing is same as selecting such as following > // same as DELETE FROM data WHERE id = 1, but DataSet data is not changed. > the result is removedData > val removedData = data.filter(_.id != 1) > > // inserting > // same as INSERT INTO data (id, value1, value2) VALUES (3, “new”, “data”) > val newData = env.fromElements(MyType(3, “new”, “data”)) > val insertedData = data.union(newData) > > // updating > // UPDATE data SET value1 = “updated”, value2 = “data” WHERE id = 1, but > DataSet data is not changed. > val updatedData = data.map { x => if (x.id == 1) MyType(x.id, “updated”, > “data”) else x } > > Regards, > Chiwan Park > >> On Jun 5, 2015, at 9:22 AM, hawin <hawin.ji...@gmail.com> wrote: >> >> Hi Chiwan >> >> Thanks for your information. I knew Flink is not DBMS. I want to know what >> is the flink way to select, insert, update and delete data on HDFS. >> >> >> @Till >> Maybe union is a way to insert data. But I think it will cost some >> performance issue. >> >> >> @Stephan >> Thanks for your suggestion. I have checked apache flink roadmap. SQL on >> flink will be released on Q3/Q4 2015. Will it support insertion, deletion >> and update data on HDFS? >> You guys already provided a nice example for selecting data on HDFS. Such >> as: TPCHQuery10 and TPCHQuery3. >> Do you have other examples for inserting, updating and removing data on HDFS >> by Apache flink >> >> Thanks >> >> >> >> >> -- >> View this message in context: >> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Apache-Flink-transactions-tp1457p1491.html >> Sent from the Apache Flink User Mailing List archive. mailing list archive >> at Nabble.com. > > > >