Re: Query big mssql Data Source [Batch]

2018-12-06 Thread Flavio Pompermaier
That input format is a batch one, so there's no state backend. AFAIK you need to output the fetched data somewhere. On Thu, Dec 6, 2018 at 3:49 PM miki haiat wrote: > Hi Flavio, > That is working fine for me and I'm able to pull ~17m rows in 20 seconds. > > I'm a bit confused regarding the state backend,

Re: Query big mssql Data Source [Batch]

2018-12-06 Thread miki haiat
Hi Flavio, That is working fine for me and I'm able to pull ~17m rows in 20 seconds. I'm a bit confused regarding the state backend. I couldn't find a way to configure it, so I'm guessing the data is in memory ... Thanks, Miki On Thu, Dec 6, 2018 at 12:06 PM Flavio Pompermaier wrote: > the construc

Re: Query big mssql Data Source [Batch]

2018-12-06 Thread Flavio Pompermaier
The constructor of NumericBetweenParametersProvider takes 3 params: long fetchSize, long minVal, long maxVal. If you want parallelism you should use a fetchSize such that 1 < fetchSize < maxVal. In your case, if you do new NumericBetweenParametersProvider(50, 3, 300) you will produce 6 parallel tasks: 1. SELECT
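
To illustrate why (50, 3, 300) gives 6 tasks, here is a minimal standalone sketch of the range arithmetic. It does not use Flink, and the class and method names (BetweenRangeSketch, splitRanges) are made up for this example. It assumes each task receives one inclusive [start, end] pair for the two '?' placeholders of a BETWEEN query. Flink's own provider may treat edge cases differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT Flink's implementation.
class BetweenRangeSketch {

    // Split the inclusive interval [minVal, maxVal] into consecutive
    // inclusive sub-ranges that each cover at most fetchSize values.
    public static List<long[]> splitRanges(long fetchSize, long minVal, long maxVal) {
        if (fetchSize < 1) {
            throw new IllegalArgumentException("fetchSize must be >= 1");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = minVal; start <= maxVal; start += fetchSize) {
            // The last range is clipped so it never runs past maxVal.
            long end = Math.min(start + fetchSize - 1, maxVal);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Same numbers as in the message above: fetchSize=50, min=3, max=300.
        List<long[]> ranges = splitRanges(50, 3, 300);
        System.out.println(ranges.size() + " ranges");
        for (long[] r : ranges) {
            System.out.println("... WHERE a BETWEEN " + r[0] + " AND " + r[1]);
        }
    }
}
```

With these inputs the sketch yields the pairs 3-52, 53-102, 103-152, 153-202, 203-252 and 253-300. A fetchSize at least as large as the whole interval collapses everything into one range, which would leave nothing to parallelize.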

Re: Query big mssql Data Source [Batch]

2018-12-06 Thread miki haiat
Hi Flavio, This is the query that I'm trying to parallelize > .setQuery("SELECT a, b, c \n" + > "FROM dbx.dbo.x as tls\n"+ > "WHERE tls.a BETWEEN ? and ?" > > And this is the way I'm trying to parameterize it: ParameterValuesProvider pramProvider = new NumericBetweenParametersProvide

Re: Query big mssql Data Source [Batch]

2018-12-05 Thread Flavio Pompermaier
What's your query? Have you used '?' where the query should be parameterized? Take a look at https://github.com/apache/flink/blob/master/flink-connectors/flink-jdbc/src/test/java/org/apache/flink/api/java/io/jdbc/JDBCFullTest.java

Re: Query big mssql Data Source [Batch]

2018-12-05 Thread miki haiat
I'm using the jTDS driver to query mssql. I used the ParametersProvider as you suggested, but for some reason the job won't run in parallel. [image: flink_in.JPG] Also the sink, a simple print, won't run in parallel. [image: flink_out.JPG] On Tue, Dec 4, 2018 at 10:05 PM Flavio Pompermaier wrot

Re: Query big mssql Data Source [Batch]

2018-12-04 Thread Flavio Pompermaier
You can pass a ParametersProvider to the jdbc input format in order to parallelize the fetch. Of course you don't want to kill the database server with too many requests in parallel, so you'll probably want to put a limit on the parallelism of the input format. On Tue, 4 Dec 2018, 17:31 miki haiat wrote: > Hi, > I

Query big mssql Data Source [Batch]

2018-12-04 Thread miki haiat
Hi, I want to query a SQL table that contains ~80m rows. There are a few ways to do that and I wonder which is best. 1. Using JDBCINPUTFORMAT -> convert to dataset and output it without doing any logic in the dataset, passing the full query in the JDBCINPUTFORM