Great story, and congrats on your project!

On Wed, Nov 23, 2016, at 06:24 PM, Jeremy Echols wrote:

> *Project:*

> 

> This one's been out a long time, but I wanted to get to a place where
> it felt solid before announcing it to this list.  RAIS
> (https://github.com/uoregon-libraries/rais-image-server) is CC0-
> licensed and backs all the dynamic pan/zoom image-serving needs for
> Oregon Historic Newspapers (e.g.,
> http://oregonnews.uoregon.edu/lccn/sn94052322/1888-05-03/ed-1/seq-1/).
> 

> The project conforms to the IIIF Image 2.0 spec
> (http://iiif.io/api/image/2.0/), but its main purpose is serving JP2
> images as fast as possible, which it achieves with low-level CGO calls
> into libopenjpeg.  JP2 images are incredibly small for their quality,
> but decoding is notoriously slow.  Additionally, while there are very
> fast alternatives to JP2, very few are as space-efficient, and none
> are as memory-efficient.  For files in the 20-megapixel-plus range, we
> needed a format that doesn't require reading the whole image into
> memory, which tiled JP2 images do very well.
> 

> It's a pretty niche service, but I think its history tells a really
> great Go story even if nobody here has need of the service itself.
> 

> *Story:*

> 

> At the time RAIS was initially created, we had a pretty big problem:
> we had somewhere in the range of 30 terabytes of TIFFs backing our
> pan/zoom viewer, and we knew that number was only going up.  While we
> will always preserve the TIFF files (we're a library, it's what we
> do!), keeping them online at all times was far more expensive than,
> say, tape backups or a dark archive.  And, of course, reading TIFFs
> into memory meant 20+ megs of RAM **per request** (these are grayscale
> TIFFs for those about to say 20 megapixels should mean 60 megs of
> RAM).  During times of high traffic, RAM could become a significant
> bottleneck.
> 

> We considered pyramidal TIFFs with embedded JPGs and IIP image server,
> but found that we would "only" save about 80% on disk in order to get
> similar quality to the JP2 files we already had.  JP2 images, on the
> other hand, saved closer to 95% of the disk space.
> 

> We considered pre-generating the tiles for about half a second.  But
> at the time we had about 500,000 individual newspaper pages.
> Pre-generating tiles would absolutely not work for us.  At least, not
> with any kind of disk savings.
> 

> We considered using proprietary JP2 libraries, which we knew could
> solve the problem really well.  But we wanted the software to be as
> open as possible.  One of our biggest contributions to the newspaper
> world was getting the software which runs our site open-sourced to
> begin with (it isn't something we wrote, just something we customized
> heavily, and convinced the authors to open-source).  Having done that
> work, we felt it would be a disservice to the community if we had to
> use proprietary software just to get the open-sourced software
> working.
> 

> We considered a slow JP2 server with a giant cache.  The software
> which runs the site *can* serve JP2 tiles without proprietary
> software... but the initial image can take 10+ seconds to load, and
> heavy traffic can make it almost unusable.  Hence, caching!  ...but
> can we get a lot more hits than misses?  Caching thumbnails turned out
> to be valuable for us, but tiles... not so much.  Looking at what was
> requested in the Apache logs, it seemed that the tiles served in any
> given week were mostly (75%) tiles that had *not* been served
> throughout the entire month.  Caching would certainly have some
> benefit, but a large number of our users would be hitting the very
> slow cache-miss pages, or else our cache would have been far too big
> to be feasible.
> 

> Sometime in late 2013 or early 2014, somebody at the Library of
> Congress showed us this project which he'd called "Brikker".  It was
> written months prior in Go as a proof-of-concept to solve similar
> problems to ours.  It required pulling a specific commit of openjpeg,
> manually patching it, and compiling it.  But it was capable of serving
> JP2 tiles dynamically, and the author claimed it was fairly
> performant.  We decided we should at least look into it, even though
> nobody knew Go and I for one was pretty skeptical of this silly "new"
> language.
> 

> We realized quickly that something like PHP or Python just wouldn't be
> able to do what Go could, at least not with anywhere near the same
> performance, and this use case was one which demanded performance.
> Calling into C is a bit of a pain in every language we considered, and
> the performance of the rest of the server could be an immediate
> bottleneck.  Better to stick with something that already appeared to
> have potential than take that risk.
> 

> So we dove into the world of Go, and slowly improved the original
> application until it could be put into production.
> 

> Now our TIFFs are somewhere far away from the web server and our RAM
> usage is incredibly low.  During peak traffic, I've seen RAIS spike to
> about 400 megs of RAM.  With load testing pushing its limits, it can
> even jump as high as a gig before CPU bottlenecks slow the requests
> down too much.  But our rather modest server is still running with the
> same specs it's had for at least five years.  It has better
> performance than before RAIS, despite the fact that we have increased
> our image count by about 40%, we now support color images as part of
> our "born digital" initiative, and traffic has more than doubled.
> 

> Go, with its very low-overhead C bindings, gave us a huge win here.
> We probably could have crafted something in C or C++ with better
> performance, but it would have been such a big undertaking in
> comparison to Go (even though I have some basic C and C++ experience,
> the syntax and gotchas are ... let's just say, "tricky") that the
> project wouldn't have gotten the green light, or else would have been
> scrapped mid-dev.  Go's syntax is simple enough that I was able to
> jump right in and get work done quickly.  And unlike many languages I
> use, I can jump from Go to other projects and back again without
> losing very much productivity.  I don't feel like I *have* to live in
> Go in order to keep it in my head.  When I'm in Rails, on the other
> hand... well, if you can't say something nice....
> 

> Today the project uses "gb" to avoid vendoring dependencies in the
> repo (I don't know what happens if we license our project as CC0 and
> then include a bunch of others' code, and I don't want to find out),
> and has a very simple Docker image available for taking it for a quick
> test drive (or even using it in production, if you already have a Docker
> environment).  It's no longer tied to funky commits of
> openjpeg, because that project has since released a viable version with
> the functionality we needed.  And it's (hopefully) a lot more
> idiomatic than when I started.
> 

> I have to say, for a first project in a new tech, it's one of very few
> which I don't look at as a disaster.  And probably the only one I see
> as a raging success despite my inexperience.
> 



> --

>  You received this message because you are subscribed to the Google
>  Groups "golang-nuts" group.
>  To unsubscribe from this group and stop receiving emails from it,
>  send an email to golang-nuts+unsubscr...@googlegroups.com.
>  For more options, visit https://groups.google.com/d/optout.
