Even,

We just upgraded to GDAL 1.7. I tested gdal_translate and CreateCopy() again, and it still dies under similar conditions.
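For reference, the CreateCopy() path in my test boils down to something like the following (a minimal Python sketch; 'input.ntf' and 'output.ntf' are placeholder names, and the creation options mirror the gdal_translate flags quoted below):

    import gdal

    gdal.SetCacheMax(256 * 1024 * 1024)  # one of the cache sizes we tried

    src = gdal.Open('NITF_IM:0:input.ntf')  # the image subdataset
    drv = gdal.GetDriverByName('NITF')
    # Copy to a tiled NITF; resident memory grows steadily while this runs
    dst = drv.CreateCopy('output.ntf', src,
                         options=['ICORDS=G', 'BLOCKXSIZE=128', 'BLOCKYSIZE=128'],
                         callback=gdal.TermProgress)
    dst = None
    src = None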
Since valgrind did not detect any memory leak related to CreateCopy(), I suspect this problem is caused by poor memory management inside CreateCopy(). It seems to allocate memory continuously as the copy progresses, like an ever-growing linked list or some similar data structure. The memory in that data structure is properly released after the copy finishes (thus valgrind does not see it as a leak), but for a large file the structure can grow beyond the available memory and swap. In my test, gdal_translate on a 40Kx100K 16-bit image (NITF, JPEG2000 compressed) used up all the swap (8 GB) and up to 98.5% of resident memory (8 GB) before the system killed it. When this happened, the progress indicator showed 80% completion.

Ozy

On Wed, Jan 13, 2010 at 4:52 PM, ozy sjahputera <[email protected]> wrote:

> Even,
>
> We use the JP2ECW driver.
>
> I did the valgrind test and did not see any reported leak. Here is some of
> the output from valgrind:
>
> ==11469== Invalid free() / delete / delete[]
> ==11469==    at 0x4C2222E: free (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
> ==11469==    by 0x95D1CDA: (within /lib64/libc-2.9.so)
> ==11469==    by 0x95D1879: (within /lib64/libc-2.9.so)
> ==11469==    by 0x4A1D60C: _vgnU_freeres (in /usr/lib64/valgrind/amd64-linux/vgpreload_core.so)
> ==11469==    by 0x950AB98: exit (in /lib64/libc-2.9.so)
> ==11469==    by 0x94F55EA: (below main) (in /lib64/libc-2.9.so)
> ==11469== Address 0x40366f0 is not stack'd, malloc'd or (recently) free'd
> ==11469==
> ==11469== ERROR SUMMARY: 13177 errors from 14 contexts (suppressed: 0 from 0)
> ==11469== malloc/free: in use at exit: 376 bytes in 9 blocks.
> ==11469== malloc/free: 8,856,910 allocs, 8,856,902 frees, 5,762,693,361 bytes allocated.
> ==11469== For counts of detected errors, rerun with: -v
> ==11469== Use --track-origins=yes to see where uninitialised values come from
> ==11469== searching for pointers to 9 not-freed blocks.
> ==11469== checked 1,934,448 bytes.
> ==11469==
> ==11469== LEAK SUMMARY:
> ==11469==    definitely lost: 0 bytes in 0 blocks.
> ==11469==      possibly lost: 0 bytes in 0 blocks.
> ==11469==    still reachable: 376 bytes in 9 blocks.
> ==11469==         suppressed: 0 bytes in 0 blocks.
> ==11469== Reachable blocks (those to which a pointer was found) are not shown.
>
> I will check GDAL trunk, but we are looking forward to an upgrade to 1.7.
> For now, I will try to find a scanline, uncompressed NITF image and perform
> the same gdal_translate operation on it. If the memory use does not climb
> when operating on the uncompressed image, then we can say with more
> certainty that the problem lies with the JPEG2000 drivers. I'll let you know.
>
> Thanks.
> Ozy
>
> On Wed, Jan 13, 2010 at 1:46 PM, Even Rouault <[email protected]> wrote:
>
>> Ozy,
>>
>> The interesting info is that your input image is JPEG2000 compressed.
>> This explains why you were able to read a scanline-oriented NITF with
>> blockwidth > 9999. My guess would be that the leak is in the JPEG2000
>> driver in question, so this may be more a problem on the reading part
>> than on the writing part. You can check that by running: gdalinfo
>> -checksum NITF_IM:0:input.ntf. If you see the memory increasing again
>> and again, there's definitely a problem.
>> In case you have GDAL configured with several JPEG2000 drivers, you'll
>> have to find which one is used: JP2KAK (Kakadu based), JP2ECW (ECW SDK
>> based), JPEG2000 (Jasper based, but I doubt you're using it with such a
>> big dataset), or JP2MRSID. Normally, they are selected in the order I've
>> described (JP2KAK first, etc.). As you're on Linux, it might be
>> interesting to run valgrind to see if it reports leaks. As it might be
>> very slow on such a big dataset, you could try translating just a
>> smaller window of your input dataset, like:
>>
>> valgrind --leak-check=full gdal_translate NITF_IM:0:input.ntf output.tif -srcwin 0 0 37504 128
>>
>> I've selected TIF as the output format as it shouldn't matter if you
>> confirm that the problem is in the reading part. As far as the window
>> size is concerned, it's difficult to guess which value will show the leak.
>>
>> Filing a ticket with your findings on GDAL Trac might be appropriate.
>>
>> It might be good trying with GDAL trunk first though, in case the leak
>> might have been fixed since 1.6.2. The beta2 source zip is to be found
>> here: http://download.osgeo.org/gdal/gdal-1.7.0b2.tar.gz
>>
>> Best regards,
>>
>> Even
>>
>> ozy sjahputera a écrit :
>> > Hi Even,
>> >
>> > Yes, I tried:
>> > gdal_translate -of "NITF" -co "ICORDS=G" -co "BLOCKXSIZE=128" -co "BLOCKYSIZE=128" NITF_IM:0:input.ntf output.ntf
>> >
>> > I monitored the memory use with top and it was steadily increasing
>> > until it reached 98.4% (I have 8 GB of RAM and 140 GB of local disk for
>> > swap etc.) before the node died (not just the program, but the whole
>> > system just stopped responding).
>> >
>> > My GDAL version is 1.6.2.
>> >
>> > gdalinfo on this image shows a raster size of (37504, 98772) and
>> > Block=37504x1. The image is compressed with the JPEG2000 option and
>> > contains two subdatasets (data and cloud data ~ I used only the data
>> > for the gdal_translate test).
>> >
>> > Band info from gdalinfo:
>> > Band 1 Block=37504x1 Type=UInt16, ColorInterp=Gray
>> >
>> > Ozy
>> >
>> > On Tue, Jan 12, 2010 at 5:38 PM, Even Rouault <[email protected]> wrote:
>> >
>> > Ozy,
>> >
>> > Did you try with gdal_translate -of NITF src.tif output.tif -co BLOCKSIZE=128 ? Does it give similar results ?
>> >
>> > I'm a bit surprised that you even managed to read a 40Kx100K large NITF
>> > file organized as scanlines. There was a limit until very recently that
>> > prevented reading blocks where one dimension was bigger than 9999. This
>> > was fixed recently in trunk (see ticket
>> > http://trac.osgeo.org/gdal/ticket/3263) and branches/1.6, but the fix
>> > has not yet appeared in an officially released version. So which GDAL
>> > version are you using ?
>> >
>> > Does the output of gdalinfo on your scanline-oriented input NITF give
>> > something like:
>> > Band 1 Block=40000x1 Type=Byte, ColorInterp=Gray
>> >
>> > Is your input NITF compressed or uncompressed ?
>> > Anyway, with latest trunk, I've simulated creating a similarly large
>> > NITF image with the following Python snippet:
>> >
>> > import gdal
>> > ds = gdal.GetDriverByName('NITF').Create('scanline.ntf', 40000, 100000)
>> > ds = None
>> >
>> > and then created the tiled NITF:
>> >
>> > gdal_translate -of NITF scanline.ntf tiled.ntf -co BLOCKSIZE=128
>> >
>> > The memory consumption is very reasonable (less than 50 MB: the
>> > default block cache size of 40 MB + temporary buffers), so I'm not
>> > clear why you would have a problem of increasing memory use.
>> >
>> > ozy sjahputera a écrit :
>> > > I was trying to make a copy of a very large NITF image (about 40Kx100K
>> > > pixels) using GDALDriver::CreateCopy(). The new file was set to have a
>> > > different block size (the input was a scanline image, the output is to
>> > > have a 128x128 block size). The program kept getting killed by the
>> > > system (Linux). I monitored the memory use of the program as it was
>> > > executing CreateCopy, and the memory use was steadily increasing as
>> > > the progress indicator from CreateCopy moved forward.
>> > >
>> > > Why does CreateCopy() use so much memory? I have not perused the
>> > > source code of CreateCopy() yet, but I am guessing it employs
>> > > RasterIO() to perform the read/write?
>> > >
>> > > I tried different sizes for the GDAL cache: 64MB, 256MB, 512MB, 1GB,
>> > > and 2GB. The program got killed with all these cache sizes. In fact,
>> > > my Linux box became unresponsive when I set GDALSetCacheMax() to 64MB.
>> > >
>> > > Thank you.
>> > > Ozy
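For what it's worth, here is a rough Python equivalent of the reading-side test Even suggested (a sketch only; it reads the image in strips with no writing, so any steady climb of resident memory in top while the loop runs would point at the reader; 'input.ntf' is the same placeholder subdataset name as above):

    import gdal

    ds = gdal.Open('NITF_IM:0:input.ntf')
    band = ds.GetRasterBand(1)
    # Read 128-row strips across the full width; discard the data
    for y in range(0, ds.RasterYSize, 128):
        nrows = min(128, ds.RasterYSize - y)
        band.ReadRaster(0, y, ds.RasterXSize, nrows)
    ds = None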
_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
