Hi, this looks roughly as below
---- val env = ExecutionEnvironment.getExecutionEnvironment() val ds: DataSet[…] = env .readTextFile(path) .map(yourCsvLineParser) val tableEnv = TableEnvironment.getTableEnvironment(env) tableEnv.registerDataSet("myTable", ds) val result = tableEnv.sqlQuery("SELECT …. FROM myTable ….") ---- Best Fabian 2018-05-09 15:09 GMT+02:00 Esa Heikkinen <esa.heikki...@student.tut.fi>: > Hi > > > > Sorry the stupid question, but how to connect readTextFile (or > readCsvFile), MapFunction and SQL together in Scala code ? > > > > Best, Esa > > > > *From:* Fabian Hueske <fhue...@gmail.com> > *Sent:* Tuesday, May 8, 2018 10:26 PM > > *To:* Esa Heikkinen <esa.heikki...@student.tut.fi> > *Cc:* user@flink.apache.org > *Subject:* Re: Reading csv-files in parallel > > > > Hi, > > the Table API / SQL and the DataSet API can be used together in the same > program. > > So you could read the data with a custom input format or a TextInputFormat > and a custom MapFunction parser and hand it to SQL afterwards. > > The program would be a regular Scala DataSet program with an > ExecutionEnvironment as in the examples or the documenation. > > To read many different files, you can put them all in a single folder and > scan the whole folder. If you run on the master, you can try the new > multi-path feature of FileInputFormats. Alterantively you can add many > sources and use a union operator to union all data sets. > > Best, Fabian > > > > 2018-05-08 15:49 GMT+02:00 Esa Heikkinen <esa.heikki...@student.tut.fi>: > > Hi > > > > Would it better to use DataSet API, Table (Relational) and readCsvFile() , > because it is little but upper level implementation ? > > > > SQL also sounds very good in this (batch processing) case, but is it > possible to use (because many different type of csv-files) ? > > And does it understand timeseries-data ? > > > > By the way, how to the control flow is running in main (Scala) program and > what is the structure of main program ? > > I did mean, if I want to read many csv-files and I have certain > consecutive reading order of them. Is that possible and how ? > > > > Actually I want to implement upper level (state-machine-based) logic for > reading csv-files by certain order. > > > > Esa > > > > *From:* Fabian Hueske <fhue...@gmail.com> > *Sent:* Tuesday, May 8, 2018 2:00 PM > > > *To:* Esa Heikkinen <esa.heikki...@student.tut.fi> > *Cc:* user@flink.apache.org > *Subject:* Re: Reading csv-files in parallel > > > > Hi, > > the easiest approach is to read the CSV files linewise as regular text > files (ExecutionEnvironment.readTextFile()) and apply custom parse logic > in a MapFunction. > > Then you have all freedom to deal with records of different schema. > > Best, Fabian > > > > 2018-05-08 12:35 GMT+02:00 Esa Heikkinen <esa.heikki...@student.tut.fi>: > > Hi > > > > At this moment a batch query is ok. > > > > Do you know any good (Scala) examples how to query batches (different type > of csv-files) in parallel ? > > > > Or do you have example of a custom source function, that read csv-files > parallel ? > > > > Best, Esa > > > > *From:* Fabian Hueske <fhue...@gmail.com> > *Sent:* Monday, May 7, 2018 3:48 PM > *To:* Esa Heikkinen <esa.heikki...@student.tut.fi> > *Cc:* user@flink.apache.org > *Subject:* Re: Reading csv-files in parallel > > > > Hi Esa, > > you can certainly read CSV files in parallel. This works very well in a > batch query. > > For streaming queries, that expect data to be ingested in timestamp order > this is much more challenging, because you need 1) read the files in the > right order and 2) cannot split files (unless you guarantee that splits are > read in the right order). > > The CsvTableSource does not guarantee to read files in timestamp order (it > would have to know the timestamps in each file for that). > > Having files with different schema is another problem. The SQL / Table API > require a fixed schema per table (source). > > > > The only recommendation when reading files in parallel for a streaming use > case is to implement a custom source function and be careful when > generating watermarks. > > Best, Fabian > > > > 2018-05-07 12:44 GMT+02:00 Esa Heikkinen <esa.heikki...@student.tut.fi>: > > Hi > > > > I would want to read many different type csv-files (time series data) > parallel using by CsvTableSource. Is that possible in Flink application ? > If yes, are there exist the examples about that ? > > > > If it is not, do you have any advices how to do that ? > > > > Should I combine all csv-files to one csv-file in pre-processing phase ? > But this has little problem, because there are not same type (columns are > different, except timestamp-column). > > > > Best, Esa > > > > > > > > >