This new part of the series focuses on the use of fast compressors such as Snappy to improve access speed to image data:
https://medium.com/@p.rozas.larraondo/divide-compress-and-conquer-building-an-earth-data-server-in-go-part-2-88670cafc167

IMO fast compressors will play a very important role in the design of high performance systems - as a method to overcome RAM speed limitations on CPUs.

Feedback, comments and experiences from the community are highly appreciated!

Pablo

On Thu, Dec 21, 2017 at 2:04 AM, Michael Jones <michael.jo...@gmail.com> wrote:

> Certainly as you say, individual user patterns are not generally
> predictable. Sometimes aggregate patterns can be. The "sea of tiles" is the
> natural design and works great in normal cases. It seems the way to teach
> it in any case.
>
> Where the filesystem issue comes in would be, for example, the nominal 1
> meter per pixel Google Earth, which in plate carrée or like form with
> 400x400 pixel tiles consists of 253,701,184 tiles at that one ground sample
> distance. That is a lot for "ls" and a lot for most file systems to enjoy
> quickly accessing in a single directory. A pyramidal reduced resolution
> dataset hierarchy will require 4/3rds of this in total, or 338,268,246
> tiles. Finer details, such as 50cm advanced satellite and 10cm aerial/drone
> images scale the portions covered by 4x to 100x. So in the limit one will
> find the edge of what any OS designer expects "sane" developers to expect.
> :-)
>
> On Wed, Dec 20, 2017 at 3:33 PM, Pablo Rozas Larraondo <
> p.rozas.larrao...@gmail.com> wrote:
>
>> Hi Michael,
>>
>> Thanks for your comments, I totally agree with them. File systems will
>> struggle with the explosion of files resulting from the tile operation. As
>> you point out, other formats, such as geoTIFF, HDF5 or NetCDF define the
>> tiling or chunking process internally at the file level.
>>
>> The reason for creating the tiles as individual files in the article was
>> because this is ultimately intended to be stored on the cloud as objects
>> (this will be covered in the 3rd article).
>> As far as I know, cloud object
>> stores (e.g. AWS S3, Google Cloud Storage) do not limit the
>> number of objects stored in a bucket (if someone has more information about
>> this, please share). That is why I proposed to split the tiles as separate
>> files in the article.
>>
>> I also find the caching considerations quite amusing. It is a complex
>> matter and, in my experience, cache optimisations are quite dependent on
>> the user access patterns, which are normally hard to predict.
>>
>> Cheers,
>> Pablo
>>
>> On Wednesday, December 20, 2017 at 2:24:01 AM UTC+1, Michael Jones wrote:
>>>
>>> Thank you, Pablo. Very helpful to have this kind of step by step example
>>> for Go developers.
>>>
>>> I have some familiarity in this area and I'd say the practical issues in
>>> large-scale, high-throughput operation tend to relate to the native
>>> filesystem. Too many small files overwhelm them and can make directory
>>> lookups slow. Too many directory levels lead to slow filesystem traversal.
>>> Sometimes it can help to dice the big image into small independent tiles
>>> and store those tiles as a mosaic in one's own file type. This is the
>>> nature of TILED vs ROW storage in the TIFF format. The next level of tuning
>>> is about leveraging the operating system's cache of data read from disk in a
>>> productive way. You can have your own cache in RAM, of course, but the OS
>>> likely has that same data cached. There are cases where memory mapping the
>>> small tile files does what you would want.
>>>
>>> There are also dynamic considerations. It may well be that a client
>>> accessing tile [i][j] will soon want one of the eight surrounding tiles.
>>> Over time, it may be that a direction of browsing through tile-space can be
>>> established and this can encourage read-ahead, though the benefit is not
>>> always assured; maybe the accesses are structured and maybe they are not.
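
The "eight surrounding tiles" idea above can be sketched in Go as follows. This is only an illustration under stated assumptions: the `Tile` type, the `Neighbors` function and the fixed `rows` x `cols` grid are hypothetical names that do not come from the article, and the sketch does not wrap around in longitude as a global grid might.

```go
package main

import "fmt"

// Tile identifies a tile by row I and column J (hypothetical type,
// not taken from the article).
type Tile struct{ I, J int }

// Neighbors returns the tiles surrounding t in a rows x cols grid.
// Positions that fall outside the grid are skipped, so a corner tile
// has three neighbours and an interior tile has eight.
func Neighbors(t Tile, rows, cols int) []Tile {
	out := make([]Tile, 0, 8)
	for di := -1; di <= 1; di++ {
		for dj := -1; dj <= 1; dj++ {
			if di == 0 && dj == 0 {
				continue // skip the tile itself
			}
			n := Tile{t.I + di, t.J + dj}
			if n.I < 0 || n.I >= rows || n.J < 0 || n.J >= cols {
				continue // outside the grid
			}
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Candidate tiles a server could consider prefetching after a
	// request for tile [5][5] in a 10x10 grid.
	fmt.Println(Neighbors(Tile{5, 5}, 10, 10))
}
```

Whether prefetching these candidates pays off depends on the access pattern, as noted above; the function only enumerates them.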
>>>
>>> Some high-throughput servers in the era of smart web clients (Google
>>> Maps, Leaflet, etc.) refuse to build custom images and only supply tiles
>>> in response to a request, leaving tile assembly to the client.
>>>
>>> Just some thoughts. None of them would help make what you've done any
>>> clearer or more helpful to the reader.
>>>
>>> Best,
>>> Michael
>>>
>>> On Tue, Dec 19, 2017 at 3:37 PM, Pablo Rozas Larraondo <
>>> p.rozas....@gmail.com> wrote:
>>>
>>>> Thank you Thomas for the link to the vips library. I didn't know about
>>>> it and now I want to read more about its design and internals.
>>>>
>>>> The objective of the article was to set a baseline using the Go image
>>>> library and play with several factors to see how they affect performance. In
>>>> this first article, I wasn't really trying to come up with the fastest
>>>> possible image server but to point out a few basic techniques that can improve
>>>> access speed and reduce memory consumption. These techniques should be
>>>> applicable to any image library, so similar relative performance gains can
>>>> be achieved with any language or library.
>>>>
>>>> The next part, which I'm currently writing, proposes Snappy
>>>> compression as a way of improving access speed to the data.
>>>>
>>>> Cheers,
>>>> Pablo
>>>>
>>>> On Tuesday, December 19, 2017 at 10:28:48 AM UTC+1, Thomas Bruyelle
>>>> wrote:
>>>>>
>>>>> Interesting and nice pieces of code. I wonder if the performance can
>>>>> be compared to something like `vips` (https://jcupitt.github.io/libvips).
>>>>>
>>>>> On Monday, December 18, 2017 at 10:51:49 PM UTC+1, Pablo Rozas Larraondo
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> For those interested in serving or using satellite imagery, I've just
>>>>>> published the first of a three-part series on this subject using Go:
>>>>>>
>>>>>> https://medium.com/@p.rozas.larraondo/divide-compress-and-conquer-building-an-earth-data-server-in-go-part-1-d82eee2eceb1
>>>>>>
>>>>>> Any feedback or comments that you might have would be greatly
>>>>>> appreciated!
>>>>>>
>>>>>> Thanks,
>>>>>> Pablo
>>>>>>
>>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "golang-nuts" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to golang-nuts...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>> --
>>> Michael T. Jones
>>> michae...@gmail.com
>>>
> --
> Michael T. Jones
> michael.jo...@gmail.com
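
The 4/3 figure quoted earlier in the thread for a reduced-resolution pyramid follows from a geometric series: if each level has a quarter of the tiles of the level below it, the total is n + n/4 + n/16 + ... = 4n/3. A minimal Go sketch of that arithmetic, assuming exact quartering at every level (real pyramids round tile counts per level, so actual totals can differ slightly); `PyramidUpperBound` is a hypothetical helper name:

```go
package main

import "fmt"

// PyramidUpperBound returns ceil(4*base/3): the limit of the series
// base + base/4 + base/16 + ..., rounded up to a whole tile.
// It assumes base is non-negative.
func PyramidUpperBound(base int64) int64 {
	return (4*base + 2) / 3
}

func main() {
	const base = 253701184 // tile count at 1 m per pixel, from the thread
	fmt.Println(PyramidUpperBound(base)) // prints 338268246
}
```

The result matches the 338,268,246 tiles mentioned in the thread.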