[fpc-pascal] TSQLQuery and buffering.

2017-03-25 Thread Gary Doades
Hi everyone,

 

Firstly, I realise this is an old subject, but I still can't find any easy
answer.

 

Really, the question is simple: Is there ANY way of TSQLQuery NOT reading
the entire result set into memory?

 

This is really killing me. I've got quite a lot of web code using sqldb and
it works really well. The classes for connection, transaction, query etc.
are perfect for general use and make it so easy to write database
applications. I find it excellent that I only need to change the connection
class in my code to use a different database.

 

For web code (mine included), fetching a few dozen, hundred or even a few
thousand rows is no real problem. The sqldb classes are *very* fast and very
easy to use.

 

The problem comes when I need to write some apps to either analyse or move a
lot of data. Initially all I need to do is analyse some data by simply
reading through rows, doing some calculations and other analysis. For even 1
million rows this is very fast (10 seconds using MySQL) so no major
problems. However, it does use quite a lot of memory.

 

Further use of this simply explodes. I have tables for analysis (and later
extracting/moving) that are over 40 million rows. This currently is simply
not possible in fpc, at least using the excellent sqldb classes. After
blowing 16GB of RAM+swap it takes a while to recover.

 

To be fair I've tried looking through the code and see how it's implemented,
but I can't see any easy way of avoiding the memory buffering of the entire
result set.

 

Apart from re-implementing a lot of classes or going down to the driver
level I can't currently see any easy way round this. I know there is some
old discussion on this, but I'm kind of hoping that there may have been a
few changes or something recently that I've missed.

 

Any help appreciated.

 

Regards,

Gary.

 

 

___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] TSQLQuery and buffering.

2017-03-25 Thread Gary Doades

On Fri, 24 Mar 2017, Gary Doades wrote:
>
>> Really, the question is simple: Is there ANY way of TSQLQuery NOT 
>> reading the entire result set into memory?

> Set Unidirectional to True, and it will keep only 1 row in memory.
> When you are simply scanning through the result set, this is all you need.

Hi Michael,

Many thanks for your reply. I had read about UniDirectional and I have indeed 
tried this. It doesn't seem to make any significant difference. Looking through 
the source code for TBufDataset it looks like UniDirectional just turns off 
building various indexes/structures etc. and fetches the result set all in one 
go instead of 10 row chunks. It still buffers everything in memory.

I can see this from running a test program. When I call the SQL query .Open 
method then I can see from Task Manager that it is fetching all rows from the 
database. My test program reaches about 1.7GB of ram fetching 4.8 million rows.

Only after the complete fetching does the next line of code execute and my 
processing loop start. Clearly this is looping over the now buffered dataset. 
The memory is finally freed when the Query's Close method is called.

It may be that UniDirectional is meant to not buffer all rows, but at the 
moment it certainly seems to. This is FPC 3.0.2 BTW. I've also tried setting 
ReadOnly to true on the SQL Query.

I Understand about the "normal" use of TSQLQuery and data sets and I also use 
this for small result set that need the data changing some way and writing back 
to the DB. What I need now is an equally convenient and powerful way of 
processing "big data".

Thanks,
Gary.



___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] TSQLQuery and buffering.

2017-03-25 Thread Gary Doades
>
> MSEgui has TSQLResult for that purpose without the TDataset overhead.
> https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/msesqlresult.pas

Thanks.

 Although it looks interesting I'd rather not get into another set of classes 
etc. at this point. I may come back to it later though!

___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] TSQLQuery and buffering.

2017-03-25 Thread Gary Doades
>> I had read about UniDirectional and I have indeed tried this. It doesn't 
>> seem to make any significant difference. Looking through the source code for
>> TBufDataset it looks like UniDirectional just turns off building various 
>> indexes/structures etc. and fetches the result set all in one go instead of 
>> 10 row chunks.
>> It still buffers everything in memory.

> In that case, it is a bug. The very purpose of UniDirectional is exactly NOT 
> to buffer anything, just to keep the current record in memory.

Ah, OK. I might have another trawl through the sqldb source then to see if it 
is a simple mistake I can fix or a bigger job.

I guess I should file a bug report?

Thanks,
Gary.


___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] TSQLQuery and buffering.

2017-03-27 Thread Gary Doades

> From your original message one could think that you were using
> PacketRecords:=-1 which means fetch all records at once, but if you are using 
> the standard setting, which is 10 and yet it is still fetching everything at 
> once, it 
> sounds like a bug to me.

As far as I can tell setting UniDirectional to true also sets PacketRecords to 
-1. Even if PacketRecords were still 10 though it still buffers the entire 
result set, but it just fetches 10 rows at a time from the DB server until it 
has them all.

Thanks,
Gary.

___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] TSQLQuery and buffering.

2017-03-27 Thread Gary Doades

>
>> It may be that UniDirectional is meant to not buffer all rows, but at the 
>> moment it certainly seems to.
> Strange, looking at source code it seems to me, that buffering should not 
> happen.
> As far as TUniDirectionalBufIndex should be used and his AddRecord method 
> does not allocate new memory.

Indeed, that's why I can't currently see where the problem lies. It shouldn't 
buffer the rows/records, but it does... or at least something does and I'm 
pretty sure it's not my program. I've pared my test program down to the bare 
minimum of just looping over the result set, but all the actual fetching from 
the DB and buffering occurs on just the Open call on the TSQLQuery.

Thanks,
Gary.


___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] TSQLQuery and buffering.

2017-03-27 Thread Gary Doades
>
>> Indeed, that's why I can't currently see where the problem lies. It 
>> shouldn't buffer the rows/records, but it does... or at least 
>> something does and I'm pretty sure it's not my program.

> The DB-client library maybe?

Yup!

I was running some other tests and copied the table contents over to both 
postgres and MS SQL Server.

Postgres has the same issue, all rows are (seemingly) buffered on the 
TSQLQuery.Open and huge amounts of RAM used. However, using MS SQL Server 
almost no memory is used at all! All program results are identical and all I 
did in the program was swap out the TSQLConnection with different "drivers".

The problem therefore lies in either the Pascal layer on top of the native 
client libs or in the way the client libs themselves work :( I suspect the 
latter.

I will need to do some more investigation in a different way if I am to work 
round this one.

I apologise for potentially wasting people's time. It might be good for people 
to know the above issues (limitations?) anyway.

Thanks,
Gary.




___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] TSQLQuery and buffering.

2017-03-27 Thread Gary Doades

On Mon, 27 Mar 2017, Gary Doades wrote:

>> The problem therefore lies in either the Pascal layer on top of the native 
>> client libs or in the way the client libs themselves work :( I suspect the 
>> latter.

> As far as I know, the DB-Specific pascal layer does not buffer anything, it 
> just fetches the result.
> There is of course little to no control over how the client lib fetches the 
> result.

> If memory serves well, mySQL has 2 separate calls: mysql_use_result and 
> mysql_store_result where the difference is exactly how it fetches the result 
> set.

> Please report your findings, if any :)

Yeah, exactly right. The Pascal code in mysqlconn.inc calls mysql_store_result 
which does indeed fetch the entire result set into memory. It could be changed 
(perhaps) to use mysql_use_result instead which does row-at-a-time streaming, 
but I'm not sure I have the expertise to make that change or what any other 
implications of that change might be.

I may hack it for my own purposes in the short term to see what happens!

Thanks,
Gary.


___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Threading vs Parallelism ?

2017-03-31 Thread Gary Doades
> I would offer the following definitions:

> - Parallelism is a (design) concept for expressing collateral actions in 
> which the processing order of the actions is unspecified. They may take place 
> serially or 
> contemporaneously in real time, or a mixture of the two.

> - Threads are an implementation mechanism for realising collateral actions 
> within a single processing environment.

> Neither of the above implies multiple CPUs or processing units.

I would agree wholeheartedly with most of that. Parallelism is purely a concept 
of multiple tasks running at the same time

Threads or processes are just implementations of that concept. Threads tend to 
be used for related tasks in a single process. Separate processes tend to be 
used for unrelated or independent tasks. Those are not hard and fast rules.

However, multiple independent compute units must be required for *true* 
parallelism. On a single processor any tasks running at the same time is just 
an illusion, normally created by the OS in time slicing between tasks based on 
certain criteria (priority, I/O, cpu usage etc.). That applies equally to 
threads or processes

Various languages assist, or purely exist, to make creating multi-tasking 
easier, but it ultimately all boils down to the same thing.

Regards,
Gary


___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Threading vs Parallelism ?

2017-04-01 Thread Gary Doades

>> On Mar 31, 2017, at 5:32 PM, Michael Schnell  wrote:
>> 
>> Regarding the view of the application (disregarding execution speed) or of 
>> the application programmer, there is no difference between real ("Hardware") 
>>  and virtual (e.g. threads) parallelism. These dirty basics need to be 
>> handled by the software and hardware infrastructure.
>> 
>> The use of real (e.g. multi CPU) parallelism that the application allows for 
>> being divided into multiple parallel "Threads". his fact given Hardware 
>> parallelism can speed up the execution, while even virtual parallelism 
>> allows for improving the latency of definable parts the application.

> I’m not understanding how parallelism could apply to anything besides 
> breaking down a task so that it can run on multiple hardware compute units.

> Why would you ever break a task into 100 threads when you could just run it 
> one thread?

"Events".

One gets into the grey area of threads and "processors". As an example you 
could divide a program into two threads, one reading and one writing. 
Immediately after issuing a write request, you could start reading the next 
item in a separate thread before the write is complete. This works because the 
I/O subsystem is mostly independent so that the OS can schedule another thread 
(or process) which is not waiting for the I/O subsystem to reply. Using only a 
single thread, the whole program has to wait for the I/O write to finish before 
starting the next read.

In this way a single process on a single "processor" (at least CPU) can 
interleave tasks to speed up the overall performance of the application. This 
could be extended to a myriad of cases of course.

Hence the recent upsurge in "async" routines, which only works if used properly 
of course.

Cheers,
Gary.


___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] unexpected termination with no errors

2019-05-15 Thread Gary Doades

Windows represents exception codes as an unsigned int. The error -1073741819 is 
actually0xC005.

This represents some form of access violation, usually associated with 
trying to write to freed memory,double freeing memory etc. If the stack 
is involved/corrupted then a trackback and useful information would be 
difficult.


Whereabouts down the line a disk full translates into an access 
violation is difficult to say, but it results in a mess to say the least.


Regards,

Gary.

On 15/05/2019 16:43, James Richters wrote:

It's a simple single thread console app.   I found my problem...  years ago I 
implemented a batch file to run my program in the test environment to help be 
with debugging... and what it does is redirect the errors to a file so that if 
flashed by real quick, I would be able to just look at the file.   Then I just 
echo the file to the screen, and if I detect an error, I also pause so I can 
see it.This a common solution to the windows limitation of not be capable 
directing STDERR to console AND to a file.

Well the file wasn't reporting the error and it wasn't on the screen... it 
looked like a normal exit, BUT it was actually giving me the proper report... 
unfortunately the error was the one thing that my batch file could not possibly 
display  EInOutError: Disk Full   DOH!!!  With the disk full my log file 
could only show up to but not including the error... because it could not write 
anymore on a full disk... I still don't have a great solution for this... I see 
methods of implementing something like a Tee function on windows, but the 
problem is I don't want ALL the output to go to the log, I only want STDERR to 
go to the log file and the screen...  not my output I am sending with the 
CRT unit that sends colored text for various purposes.  This is such a pain 
with windows to accomplish this seemingly simple task.Anyway I know what 
the problem is and can put in something to detect it.  Maybe I will just check 
if the errorlevel is negative and if so write a suggestion to the screen that 
disk full may have caused this and then pause since I can't necessarily 
write anything to the file.

Does anyone know what the errorlevel for EInOutError: Disk Full is  
-1073741819,   (I'm not sure it's always that number... ) is this on purpose?, 
or a bug?, or a side effect of the disk being full so it can't generate the 
correct error code?   if it was a normal errorcode I would have got that on my 
screen but since it's less than zero it got treated like a normal exit I 
did fix my batch file to treat anything less than zero as an error. so it 
doesn't really matter, I just didn't know they could be negative, but I'm 
curious why this strange errorlevel.

Jim


-Original Message-
From: fpc-pascal  On Behalf Of Karoly 
Balogh (Charlie/SGR)
Sent: Wednesday, May 15, 2019 9:17 AM
To: FPC-Pascal users discussions 
Subject: Re: [fpc-pascal] unexpected termination with no errors

Hi,

On Wed, 15 May 2019, James Richters wrote:


Has anyone encountered anything like this before or know how I can
make sure I always get the maximum amount of debugging info when my
program crashes?

Is it a subthreaded app?

The only case when I noticed something similar (under Linux though), when a 
certain subthread throws an exception, it just silently disappears without any 
further handling. It doesn't throw any exception, unless you wrap the entire 
Execute method in a try-except.

(Sidenote: I've been pondering for a while if I should report this as a bug. I 
think the RTL should put a try-except around there, to show a stacktrace on 
unhandled exceptions, just like the main thread dying does, but who knows which 
Delphi de-facto standard behavior would that violate, so meh...)

In Linux/Darwin (on x64/ARM at least), only the thread causing the problem 
dies, no clue what happens under Windows. Maybe this helps.

Charlie
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org 
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal