Great story, and congrats on your project.
On Wed, Nov 23, 2016, at 06:24 PM, Jeremy Echols wrote:
> *Project:*
>
> This one's been out a long time, but I wanted to get to a place where
> it felt solid before announcing it to this list. RAIS
> (https://github.com/uoregon-libraries/rais-image-server) is
> CC0-licensed and backs all the dynamic pan/zoom image-serving needs for
> Oregon Historic Newspapers (e.g.,
> http://oregonnews.uoregon.edu/lccn/sn94052322/1888-05-03/ed-1/seq-1/).
>
> The project conforms to the IIIF Image 2.0 spec
> (http://iiif.io/api/image/2.0/), but its main purpose is serving JP2
> images as fast as possible, which it achieves with low-level CGO calls
> into libopenjpeg. JP2 images are incredibly small for their quality,
> but decoding is notoriously slow. Additionally, while there are very
> fast alternatives to JP2, very few are as space-efficient, and none
> are as memory-efficient. For files in the 20-megapixel-plus range, we
> needed a format that doesn't require reading the whole image into
> memory, which tiled JP2 images do very well.
>
> It's a pretty niche service, but I think its history tells a really
> great Go story even if nobody here has need of the service itself.
>
> *Story:*
>
> At the time RAIS was initially created, we had a pretty big problem:
> we had somewhere in the range of 30 terabytes of TIFFs backing our
> pan/zoom viewer, and we knew that number was only going up. While we
> will always preserve the TIFF files (we're a library, it's what we
> do!), keeping them online at all times was far more expensive than,
> say, tape backups or a dark archive. And, of course, reading TIFFs
> into memory meant 20+ megs of RAM **per request** (these are grayscale
> TIFFs for those about to say 20 megapixels should mean 60 megs of
> RAM). During times of high traffic, RAM could become a significant
> bottleneck.
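For anyone checking the 20-versus-60-meg figures quoted above, here is a minimal sketch of the arithmetic. It assumes 8 bits per channel and a fully decoded, uncompressed image in memory; the post doesn't state the bit depth, and `rawBytes` is just an illustrative helper, not anything from RAIS.

```go
package main

import "fmt"

// rawBytes returns the in-memory size of a fully decoded image,
// ASSUMING 8 bits (1 byte) per channel and no compression.
func rawBytes(pixels, channels int) int {
	return pixels * channels
}

func main() {
	const pixels = 20000000 // 20 megapixels, the figure from the post

	// Grayscale has one channel; RGB has three.
	fmt.Printf("grayscale: %d MB\n", rawBytes(pixels, 1)/1000000)
	fmt.Printf("RGB:       %d MB\n", rawBytes(pixels, 3)/1000000)
}
```

Under those assumptions a grayscale page costs about 20 MB per request and an RGB page about 60 MB, which matches the numbers in the quoted message.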
>
> We considered pyramidal TIFFs with embedded JPGs and IIP image server,
> but found that we would "only" save about 80% on disk in order to get
> similar quality to the JP2 files we already had. JP2 images, on the
> other hand, saved closer to 95% disk.
>
> We considered pre-generating the tiles for about half a second. But
> at the time we had about 500,000 individual newspaper pages.
> Pre-generating of tiles would absolutely not work for us. At least, not
> with any kind of disk savings.
>
> We considered using proprietary JP2 libraries, which we knew could
> solve the problem really well. But we wanted the software to be as
> open as possible. One of our biggest contributions to the newspaper
> world was getting the software which runs our site open-sourced to
> begin with (it isn't something we wrote, just something we customized
> heavily, and convinced the authors to open-source). Having done that
> work, we felt like it was a disservice to the community if we had to
> use proprietary software just to get the open-sourced software
> working.
>
> We considered a slow JP2 server with a giant cache. The software
> which runs the site *can* serve JP2 tiles without proprietary
> software... but the initial image can take 10+ seconds to load, and
> heavy traffic can make it almost unusable. Hence, caching! ...but
> can we get a lot more hits than misses? Caching thumbnails turned out
> to be valuable for us, but tiles... not so much. Looking at what was
> requested in the Apache logs, it seemed that the tiles served in any
> given week were mostly (75%) tiles that had *not* been served
> throughout the entire month. Caching would certainly have some
> benefit, but a large number of our users would be hitting the very
> slow cache-miss pages, or else our cache would have been far too big
> to be feasible.
>
> Sometime in late 2013 or early 2014, somebody at the Library of
> Congress showed us this project which he'd called "Brikker". It was
> written months prior in Go as a proof-of-concept to solve similar
> problems to ours. It required pulling a specific commit of openjpeg,
> manually patching it, and compiling it. But it was capable of serving
> JP2 tiles dynamically, and the author claimed it was fairly
> performant. We decided we should at least look into it, even though
> nobody knew Go and I for one was pretty skeptical of this silly "new"
> language.
>
> We realized quickly that something like PHP or Python just wouldn't be
> able to do what Go could, at least not with anywhere near the same
> performance, and this use case was one which demanded performance.
> Calling into C is a bit of a pain in every language we considered, and
> the performance of the rest of the server could be an immediate
> bottleneck. Better to stick with something that already appeared to
> have potential than take that risk.
>
> So we dove into the world of Go, and slowly improved the original
> application until it could be put into production.
>
> Now our TIFFs are somewhere far away from the web server and our RAM
> usage is incredibly low. During peak traffic, I've seen RAIS spike to
> about 400 megs of RAM. With load testing pushing its limits, it can
> even jump as high as a gig before CPU bottlenecks slow the requests
> down too much. But our rather modest server is still running with the
> same specs it's had for at least five years. It has better
> performance than before RAIS, despite the fact that we have increased
> our image count by about 40%, we now support color images as part of
> our "born digital" initiative, and traffic has more than doubled.
>
> Go, with its very low-overhead C bindings, gave us a huge win here.
> We probably could have crafted something in C or C++ with better
> performance, but it would have been such a big undertaking in
> comparison to Go (even though I have some basic C and C++ experience,
> the syntax and gotchas are ... let's just say, "tricky") that the
> project wouldn't have gotten the green light, or else would have been
> scrapped mid-dev. Go's syntax is simple enough that I was able to
> jump right in and get work done quickly. And unlike many languages I
> use, I can jump from Go to other projects and back again without
> losing very much productivity. I don't feel like I *have* to live in
> Go in order to keep it in my head. When I'm in Rails, on the other
> hand... well, if you can't say something nice....
>
> Today the project uses "gb" to avoid vendoring dependencies in the
> repo (I don't know what happens if we license our project as CC0 and
> then include a bunch of others' code, and I don't want to find out),
> and has a very simple docker image available to take it for a test
> drive quickly (or even use it in production, if they have a docker
> environment already). It's no longer tied to funky commits of
> openjpeg since that project has since released a viable version with
> the functionality we needed. And it's (hopefully) a lot more
> idiomatic than when I started.
>
> I have to say, for a first project in a new tech, it's one of very few
> which I don't look at as a disaster. And probably the only one I see
> as a raging success despite my inexperience.
>
> --
> You received this message because you are subscribed to the Google
> Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.