Re: Apache Flink transactions

Aljoscha Krettek Thu, 04 Jun 2015 23:18:52 -0700

Yes, this code seems very reasonable. :D

The way to use this to "modify" a file on HDFS is to read the file,
then filter out some elements and write a new modified file that does
not contain the filtered out elements. As said before, Flink (or
HDFS), does not allow in-place modification of files.


On Fri, Jun 5, 2015 at 4:55 AM, Chiwan Park <chiwanp...@icloud.com> wrote:
> Basically Flink uses Data Model in functional programming model. All DataSet 
> is immutable. This means we cannot modify DataSet but ㅐonly can create new 
> DataSet with modification. Update, delete query are represented as writing 
> filtered DataSet.
> Following scala sample shows select, insert, update, and remove query in 
> Flink. (I’m not sure this is best practice.)
>
> case class MyType(id: Int, value1: String, value2: String)
>
> // load data (you can use readCsvFile, or something else.)
> val data = env.fromElements(MyType(0, “test”, “test2”), MyType(1, “hello”, 
> “flink”), MyType(2, “flink”, “good”))
>
> // selecting
> // same as SELECT * FROM data WHERE id = 1
> val selectedData1 = data.filter(_.id == 1)
> // same as SELECT value1 FROM data WHERE id = 1
> val selectedData2 = data.filter(_.id == 1).map(_.value1)
>
> // removing is same as selecting such as following
> // same as DELETE FROM data WHERE id = 1, but DataSet data is not changed. 
> the result is removedData
> val removedData = data.filter(_.id != 1)
>
> // inserting
> // same as INSERT INTO data (id, value1, value2) VALUES (3, “new”, “data”)
> val newData = env.fromElements(MyType(3, “new”, “data”))
> val insertedData = data.union(newData)
>
> // updating
> // UPDATE data SET value1 = “updated”, value2 = “data” WHERE id = 1, but 
> DataSet data is not changed.
> val updatedData = data.map { x => if (x.id == 1) MyType(x.id, “updated”, 
> “data”) else x }
>
> Regards,
> Chiwan Park
>
>> On Jun 5, 2015, at 9:22 AM, hawin <hawin.ji...@gmail.com> wrote:
>>
>> Hi  Chiwan
>>
>> Thanks for your information.  I knew Flink is not DBMS. I want to know what
>> is the flink way to select, insert, update and delete data on HDFS.
>>
>>
>> @Till
>> Maybe union is a way to insert data. But I think it will cost some
>> performance issue.
>>
>>
>> @Stephan
>> Thanks for your suggestion.  I have checked apache flink roadmap.  SQL on
>> flink will be released on Q3/Q4 2015. Will it support insertion, deletion
>> and update data on HDFS?
>> You guys already provided a nice example for selecting data on HDFS.  Such
>> as: TPCHQuery10 and TPCHQuery3.
>> Do you have other examples for inserting, updating and removing data on HDFS
>> by Apache flink
>>
>> Thanks
>>
>>
>>
>>
>> --
>> View this message in context: 
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Apache-Flink-transactions-tp1457p1491.html
>> Sent from the Apache Flink User Mailing List archive. mailing list archive 
>> at Nabble.com.
>
>
>
>

Re: Apache Flink transactions

Reply via email to