On Thu, Jun 2, 2016 at 5:03 PM, Patrick Baker <patrickbake...@gmail.com> wrote:
> 2016-06-03 2:10 GMT+12:00 David G. Johnston <david.g.johns...@gmail.com>:
>> On Thu, Jun 2, 2016 at 1:04 AM, Patrick Baker <patrickbake...@gmail.com>
>> wrote:
>>>
>>> It's all working, except the LIMIT... if possible can you please give
>>> me an example of that LIMIT in some of those queries?
>>
>> You should also use ORDER BY when using LIMIT and OFFSET, though
>> depending on the setup it could be omitted. Usually, as long as the second
>> execution cannot select any of the records the first execution touched, you
>> can choose a random quantity. But if you want random then using OFFSET is
>> pointless.
>>
>> SELECT *
>> FROM generate_series(1, 10)
>> ORDER BY 1
>> LIMIT 5
>> OFFSET 3
>>
>> generate_series
>> ----------------------
>> 4
>> 5
>> 6
>> 7
>> 8
>>
>> You are going to have difficulty finding people willing to help when you
>> cannot put together a self-contained and syntax-error-free example (I think
>> the last one is...) of what you want to do. The PostgreSQL parser is very
>> good at reading code and telling you what it doesn't like. I'm not
>> inclined to spend time reading queries that obviously cannot run and point
>> out those same problems. If you get a particular error you don't
>> understand I'll be happy to try and explain what it is trying to tell you.
>>
>> You probably need to reformulate your update to read:
>>
>> UPDATE tbl
>> FROM (
>>   SELECT 50 RECORDS
>> ) src
>> WHERE src = tbl;
>>
>> And ensure that the 50 being selected each time through are a different
>> 50.
>>
>> Writeable CTEs will probably help here.
>>
>> https://www.postgresql.org/docs/current/static/queries-with.html
>>
>> David J.
>>
>
> Hi David.
>
> The SQLs inside the function work...

Really? You seem to have lost your FOR loop for starters, and your RETURN
statement, and a semicolon after END, and I doubt crtRow.file_id works;
should I go on? So, yes, you can run the four individual SQL statements
correctly, but the function itself is bogus.
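[Editor's note: the `UPDATE tbl FROM (SELECT 50 RECORDS) ...` above is deliberate pseudocode. Spelled out as real PostgreSQL syntax it might look like the sketch below; the table and column names (`tbl`, `id`, `migrated`) are hypothetical, and this is untested against any real schema.]

```sql
-- Hedged sketch of the UPDATE ... FROM idiom with a LIMITed driving
-- subquery; names are hypothetical, not from the thread.
UPDATE tbl
SET    migrated = 1
FROM (
    SELECT id
    FROM   tbl
    WHERE  migrated = 0       -- only rows not yet processed
    ORDER  BY id              -- deterministic order, so each batch differs
    LIMIT  50                 -- process 50 rows per execution
    FOR    UPDATE             -- lock the chosen rows against concurrent runs
) src
WHERE tbl.id = src.id;
```

Because the subquery excludes already-migrated rows and the UPDATE flips the flag, each execution naturally picks a different 50, which is the property David asks for.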
> I'm just having a problem limiting the query to the number of rows I
> want, and also teaching the update SQL to only touch the records the other
> SQLs inside the function have touched.
>
> This is the function updated:
>
> CREATE or REPLACE FUNCTION function_data_1(rows integer)
> RETURNS INTEGER AS $$
>
> declare
>   completed integer;
>   offset_num integer;
>   crtRow record;
>
> BEGIN
>   offset_num = 0;
>
>   INSERT INTO table2_y_b (note_id, size, file_id, full_path)
>   (
>     SELECT
>       t1.note_id,
>       t1.size,
>       t1.file_id,
>       t1.full_path
>     FROM
>       table1_n_b t1
>     JOIN
>       table3_n_b t3 ON t3.file_id = t1.file_id
>   );
>
>   UPDATE table2_y_b t2 SET segment_data =
>   (
>     SELECT
>       o1.data
>     FROM
>       original_table1_b o1
>     JOIN
>       table3_n_b t3 ON t3.file_id = o1.file_id
>     WHERE
>       t2.migrated = 0
>     AND
>       t2.file_id = o1.file_id
>   );
>
>   UPDATE table2_y_b SET migrated = 1
>   WHERE file_id = crtRow.file_id AND migrated = 0;
>
>   UPDATE original_table1_b SET data = NULL
>   WHERE file_id = crtRow.file_id;
>
> END
>
> $$ language 'plpgsql';
>
>
> - As you can see, the first *insert* inserts data into a new table from
> another select. This query must be limited by the number of rows I'll
> provide when calling the function; example:
>
>   select function_data_1(5000);
>   select function_data_1(60000);
>   select function_data_1(15000);
>
> - The first *update* copies the BLOBs from the original_table1_b table
> into the new one (as above). Here, I also need the query to know to only
> touch those records that have been touched by the above query.
>
> - The second *update* sets the table2_y_b.migrated column from 0 to 1,
> telling me that that record has been touched by the query.
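[Editor's note: for the narrow question of wiring the function's `rows` argument into the first INSERT and remembering which rows it touched, a minimal sketch could look like the following. It reuses the table and column names from the thread, but the NOT EXISTS guard and the overall shape are this editor's assumptions, untested against Patrick's schema.]

```sql
-- Hedged sketch: limit the copy to the function's "rows" argument and feed
-- the touched file_ids straight into the BLOB-copying UPDATE via a
-- writable CTE. Table/column names are from the thread; logic is assumed.
WITH moved AS (
    INSERT INTO table2_y_b (note_id, size, file_id, full_path)
    SELECT t1.note_id, t1.size, t1.file_id, t1.full_path
    FROM   table1_n_b t1
    JOIN   table3_n_b t3 ON t3.file_id = t1.file_id
    WHERE  NOT EXISTS (            -- skip rows copied by a previous call
               SELECT 1 FROM table2_y_b t2
               WHERE  t2.file_id = t1.file_id)
    ORDER  BY t1.file_id           -- deterministic batch selection
    LIMIT  rows                    -- the function's integer parameter
    RETURNING file_id              -- the rows this call actually touched
)
UPDATE table2_y_b t2
SET    segment_data = o1.data
FROM   moved m
JOIN   original_table1_b o1 ON o1.file_id = m.file_id
WHERE  t2.file_id = m.file_id;
```

The RETURNING clause is what answers "which records did the other SQLs touch": later statements join against `moved` instead of guessing via `crtRow`.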
> So the next call ( select function_data_1(60000); ) will already know that
> it does not need to touch that record; example:
>
>   WHERE
>     t2.migrated = 0
>
> - The third and last *update* deletes (sets the blobs column to null) the
> blobs that have already been touched by the above queries... Still, I
> don't know how to tell postgres to only touch the rows that have been
> touched by the above queries...

Here's a fish - though you will still need to clean it. This is not tested,
and I haven't ever built this exact query for real, but it should work in
theory...

-- assumes that to-be-migrated records have previously had their migrated
-- flag set to 0

CREATE FUNCTION function_name(number_of_rows_to_process integer)
RETURNS SETOF bigint  -- returns the affected ids
LANGUAGE sql          -- this no longer requires procedural logic, so no
                      -- need for plpgsql
AS $$
WITH the_records_I_want_to_affect AS (
    -- pick N records to process
    SELECT id, ...
    FROM source_tbl
    WHERE migrated = 0
    ORDER BY ...
    LIMIT number_of_rows_to_process  -- your function argument goes here
    FOR UPDATE
), migrate_the_data AS (
    -- place a copy of them into the archive table
    INSERT INTO migration_table
    SELECT id, ...
    FROM the_records_I_want_to_affect
    RETURNING *
), mark_as_migrated AS (
    -- mark them as having been archived and nullify the blob data
    UPDATE source_tbl
    SET migrated = 1, data = null
    FROM migrate_the_data recs
    WHERE recs.id = source_tbl.id
    RETURNING source_tbl.id
)
SELECT id FROM mark_as_migrated;
$$;

I am sure a fully working version of this idiom is present in one or more
places on the internet. Feel free to search out fully working examples with
additional commentary.

You can make a FOR loop version of this work, and had to, years ago, before
writable CTEs were implemented.

David J.
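[Editor's note: once a concrete version of that function exists, the batches still have to be driven from somewhere. A hypothetical driver, using David's sketch names (`function_name`, `source_tbl`), could look like this; it is an illustration, not tested code.]

```sql
-- Hedged usage sketch: call the batch function repeatedly until a call
-- reports zero migrated rows. Names follow David's sketch, not a real schema.
DO $$
DECLARE
    n bigint;
BEGIN
    LOOP
        SELECT count(*) INTO n FROM function_name(5000);  -- one batch of 5000
        RAISE NOTICE '% rows migrated in this batch', n;
        EXIT WHEN n = 0;                                  -- nothing left to do
    END LOOP;
END
$$;
```

One design note: a DO block runs in a single transaction, so every batch commits together. If the point of batching is to keep transactions small (and let old BLOB space become reclaimable as you go), issue separate `SELECT * FROM function_name(5000);` calls from the client instead, committing between them.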