[fpc-pascal] TSQLQuery and buffering.
Hi everyone, Firstly, I realise this is an old subject, but I still can't find any easy answer. Really, the question is simple: Is there ANY way of TSQLQuery NOT reading the entire result set into memory? This is really killing me. I've got quite a lot of web code using sqldb and it works really well. The classes for connection, transaction, query etc. are perfect for general use and make it so easy to write database applications. I find it excellent that I only need to change the connection class in my code to use a different database. For web code (mine included), fetching a few dozen, hundred or even a few thousand rows is no real problem. The sqldb classes are *very* fast and very easy to use. The problem comes when I need to write some apps to either analyse or move a lot of data. Initially all I need to do is analyse some data by simply reading through rows, doing some calculations and other analysis. For even 1 million rows this is very fast (10 seconds using MySQL) so no major problems. However, it does use quite a lot of memory. Further use of this simply explodes. I have tables for analysis (and later extracting/moving) that are over 40 million rows. This currently is simply not possible in fpc, at least using the excellent sqldb classes. After blowing 16GB of RAM+swap it takes a while to recover. To be fair I've tried looking through the code and see how it's implemented, but I can't see any easy way of avoiding the memory buffering of the entire result set. Apart from re-implementing a lot of classes or going down to the driver level I can't currently see any easy way round this. I know there is some old discussion on this, but I'm kind of hoping that there may have been a few changes or something recently that I've missed. Any help appreciated. Regards, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Fri, 24 Mar 2017, Gary Doades wrote: > >> Really, the question is simple: Is there ANY way of TSQLQuery NOT >> reading the entire result set into memory? > Set Unidirectional to True, and it will keep only 1 row in memory. > When you are simply scanning through the result set, this is all you need. Hi Michael, Many thanks for your reply. I had read about UniDirectional and I have indeed tried this. It doesn't seem to make any significant difference. Looking through the source code for TBufDataset it looks like UniDirectional just turns off building various indexes/structures etc. and fetches the result set all in one go instead of 10 row chunks. It still buffers everything in memory. I can see this from running a test program. When I call the SQL query .Open method then I can see from Task Manager that it is fetching all rows from the database. My test program reaches about 1.7GB of ram fetching 4.8 million rows. Only after the complete fetching does the next line of code execute and my processing loop start. Clearly this is looping over the now buffered dataset. The memory is finally freed when the Query's Close method is called. It may be that UniDirectional is meant to not buffer all rows, but at the moment it certainly seems to. This is FPC 3.0.2 BTW. I've also tried setting ReadOnly to true on the SQL Query. I Understand about the "normal" use of TSQLQuery and data sets and I also use this for small result set that need the data changing some way and writing back to the DB. What I need now is an equally convenient and powerful way of processing "big data". Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
> > MSEgui has TSQLResult for that purpose without the TDataset overhead. > https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/msesqlresult.pas Thanks. Although it looks interesting I'd rather not get into another set of classes etc. at this point. I may come back to it later though! ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
>> I had read about UniDirectional and I have indeed tried this. It doesn't >> seem to make any significant difference. Looking through the source code for >> TBufDataset it looks like UniDirectional just turns off building various >> indexes/structures etc. and fetches the result set all in one go instead of >> 10 row chunks. >> It still buffers everything in memory. > In that case, it is a bug. The very purpose of UniDirectional is exactly NOT > to buffer anything, just to keep the current record in memory. Ah, OK. I might have another trawl through the sqldb source then to see if it is a simple mistake I can fix or a bigger job. I guess I should file a bug report? Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
> From your original message one could think that you were using > PacketRecords:=-1 which means fetch all records at once, but if you are using > the standard setting, which is 10 and yet it is still fetching everything at > once, it > sounds like a bug to me. As far as I can tell setting UniDirectional to true also sets PacketRecords to -1. Even if PacketRecords were still 10 though it still buffers the entire result set, but it just fetches 10 rows at a time from the DB server until it has them all. Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
> >> It may be that UniDirectional is meant to not buffer all rows, but at the >> moment it certainly seems to. > Strange, looking at source code it seems to me, that buffering should not > happen. > As far as TUniDirectionalBufIndex should be used and his AddRecord method > does not allocate new memory. Indeed, that's why I can't currently see where the problem lies. It shouldn't buffer the rows/records, but it does... or at least something does and I'm pretty sure it's not my program. I've pared my test program down to the bare minimum of just looping over the result set, but all the actual fetching from the DB and buffering occurs on just the Open call on the TSQLQuery. Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
> >> Indeed, that's why I can't currently see where the problem lies. It >> shouldn't buffer the rows/records, but it does... or at least >> something does and I'm pretty sure it's not my program. > The DB-client library maybe? Yup! I was running some other tests and copied the table contents over to both postgres and MS SQL Server. Postgres has the same issue, all rows are (seemingly) buffered on the TSQLQuery.Open and huge amounts of RAM used. However, using MS SQL Server almost no memory is used at all! All program results are identical and all I did in the program was swap out the TSQLConnection with different "drivers". The problem therefore lies in either the Pascal layer on top of the native client libs or in the way the client libs themselves work :( I suspect the latter. I will need to do some more investigation in a different way if I am to work round this one. I apologise for potentially wasting people's time. It might be good for people to know the above issues (limitations?) anyway. Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Mon, 27 Mar 2017, Gary Doades wrote: >> The problem therefore lies in either the Pascal layer on top of the native >> client libs or in the way the client libs themselves work :( I suspect the >> latter. > As far as I know, the DB-Specific pascal layer does not buffer anything, it > just fetches the result. > There is of course little to no control over how the client lib fetches the > result. > If memory serves well, mySQL has 2 separate calls: mysql_use_result and > mysql_store_result where the difference is exactly how it fetches the result > set. > Please report your findings, if any :) Yeah, exactly right. The Pascal code in mysqlconn.inc calls mysql_store_result which does indeed fetch the entire result set into memory. It could be changed (perhaps) to use mysql_use_result instead which does row-at-a-time streaming, but I'm not sure I have the expertise to make that change or what any other implications of that change might be. I may hack it for my own purposes in the short term to see what happens! Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Threading vs Parallelism ?
> I would offer the following definitions: > - Parallelism is a (design) concept for expressing collateral actions in > which the processing order of the actions is unspecified. They may take place > serially or > contemporaneously in real time, or a mixture of the two. > - Threads are an implementation mechanism for realising collateral actions > within a single processing environment. > Neither of the above implies multiple CPUs or processing units. I would agree wholeheartedly with most of that. Parallelism is purely a concept of multiple tasks running at the same time Threads or processes are just implementations of that concept. Threads tend to be used for related tasks in a single process. Separate processes tend to be used for unrelated or independent tasks. Those are not hard and fast rules. However, multiple independent compute units must be required for *true* parallelism. On a single processor any tasks running at the same time is just an illusion, normally created by the OS in time slicing between tasks based on certain criteria (priority, I/O, cpu usage etc.). That applies equally to threads or processes Various languages assist, or purely exist, to make creating multi-tasking easier, but it ultimately all boils down to the same thing. Regards, Gary ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Threading vs Parallelism ?
>> On Mar 31, 2017, at 5:32 PM, Michael Schnell wrote: >> >> Regarding the view of the application (disregarding execution speed) or of >> the application programmer, there is no difference between real ("Hardware") >> and virtual (e.g. threads) parallelism. These dirty basics need to be >> handled by the software and hardware infrastructure. >> >> The use of real (e.g. multi CPU) parallelism that the application allows for >> being divided into multiple parallel "Threads". his fact given Hardware >> parallelism can speed up the execution, while even virtual parallelism >> allows for improving the latency of definable parts the application. > I’m not understanding how parallelism could apply to anything besides > breaking down a task so that it can run on multiple hardware compute units. > Why would you ever break a task into 100 threads when you could just run it > one thread? "Events". One gets into the grey area of threads and "processors". As an example you could divide a program into two threads, one reading and one writing. Immediately after issuing a write request, you could start reading the next item in a separate thread before the write is complete. This works because the I/O subsystem is mostly independent so that the OS can schedule another thread (or process) which is not waiting for the I/O subsystem to reply. Using only a single thread, the whole program has to wait for the I/O write to finish before starting the next read. In this way a single process on a single "processor" (at least CPU) can interleave tasks to speed up the overall performance of the application. This could be extended to a myriad of cases of course. Hence the recent upsurge in "async" routines, which only works if used properly of course. Cheers, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] unexpected termination with no errors
Windows represents exception codes as an unsigned int. The error -1073741819 is actually0xC005. This represents some form of access violation, usually associated with trying to write to freed memory,double freeing memory etc. If the stack is involved/corrupted then a trackback and useful information would be difficult. Whereabouts down the line a disk full translates into an access violation is difficult to say, but it results in a mess to say the least. Regards, Gary. On 15/05/2019 16:43, James Richters wrote: It's a simple single thread console app. I found my problem... years ago I implemented a batch file to run my program in the test environment to help be with debugging... and what it does is redirect the errors to a file so that if flashed by real quick, I would be able to just look at the file. Then I just echo the file to the screen, and if I detect an error, I also pause so I can see it.This a common solution to the windows limitation of not be capable directing STDERR to console AND to a file. Well the file wasn't reporting the error and it wasn't on the screen... it looked like a normal exit, BUT it was actually giving me the proper report... unfortunately the error was the one thing that my batch file could not possibly display EInOutError: Disk Full DOH!!! With the disk full my log file could only show up to but not including the error... because it could not write anymore on a full disk... I still don't have a great solution for this... I see methods of implementing something like a Tee function on windows, but the problem is I don't want ALL the output to go to the log, I only want STDERR to go to the log file and the screen... not my output I am sending with the CRT unit that sends colored text for various purposes. This is such a pain with windows to accomplish this seemingly simple task.Anyway I know what the problem is and can put in something to detect it. Maybe I will just check if the errorlevel is negative and if so write a suggestion to the screen that disk full may have caused this and then pause since I can't necessarily write anything to the file. Does anyone know what the errorlevel for EInOutError: Disk Full is -1073741819, (I'm not sure it's always that number... ) is this on purpose?, or a bug?, or a side effect of the disk being full so it can't generate the correct error code? if it was a normal errorcode I would have got that on my screen but since it's less than zero it got treated like a normal exit I did fix my batch file to treat anything less than zero as an error. so it doesn't really matter, I just didn't know they could be negative, but I'm curious why this strange errorlevel. Jim -Original Message- From: fpc-pascal On Behalf Of Karoly Balogh (Charlie/SGR) Sent: Wednesday, May 15, 2019 9:17 AM To: FPC-Pascal users discussions Subject: Re: [fpc-pascal] unexpected termination with no errors Hi, On Wed, 15 May 2019, James Richters wrote: Has anyone encountered anything like this before or know how I can make sure I always get the maximum amount of debugging info when my program crashes? Is it a subthreaded app? The only case when I noticed something similar (under Linux though), when a certain subthread throws an exception, it just silently disappears without any further handling. It doesn't throw any exception, unless you wrap the entire Execute method in a try-except. (Sidenote: I've been pondering for a while if I should report this as a bug. I think the RTL should put a try-except around there, to show a stacktrace on unhandled exceptions, just like the main thread dying does, but who knows which Delphi de-facto standard behavior would that violate, so meh...) In Linux/Darwin (on x64/ARM at least), only the thread causing the problem dies, no clue what happens under Windows. Maybe this helps. Charlie ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal