>> Anyway, thanks so much for your help.  This discussion has been very useful, 
>> and I think I will proceed at first, exactly how you suggested, by queuing 
>> every validation job (using celery).  Then I will explore whether or not I 
>> can apply the "on timeout" strategy in a small patch.
>> Incidentally, during our Wednesday meeting this week, we actually opened our 
>> public instance to the world for the first time, in preparation for the 
>> upcoming publication.  This discussion is about the data submission 
>> interface, but that interface is actually disabled on the public-facing 
>> instance.  The other part of the codebase that I was primarily responsible 
>> for was the advanced search.  Everything else was primarily by other team 
>> members.  If you would like to check it out, let me know what you think: 
>> http://tracebase.princeton.edu
> 
> I would have to hit the books again to understand all of what is going on 
> here.

It's a mass spec tracing database.  Animals are infused with radiolabeled 
compounds, and mass spec is used to see what the animal's biochemistry turns 
those compounds into.  (My undergrad was biochem, so I've been resurrecting my 
biochem knowledge as needed for this project.  I've been mostly doing RNA and 
DNA sequence analysis since undergrad, and most of that was prokaryotic.)

> One quibble with the Download tab, there is no indication of the size of the 
> datasets. I generally like to know what I am getting into before I start a 
> download. Also, is there explicit throttling going on? I am seeing 
> 10.2kb/sec, whereas from here 
> https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page I downloaded a 
> 47.65M file at 41.9MB/s

Thank you!  Not knowing the download size is exactly a complaint I had.  That 
download actually uses my advanced search interface (in browse mode).  There is 
the same issue with the download buttons on the advanced search.  With the 
streaming, we're not dealing with temp files, which is nice, at least for the 
advanced search, but we can't know the download size that way.  So I had wanted 
a progress bar to at least show progress (current record per total).  I could 
even estimate the size (an option I explored for a few days).  Eventually, I 
proposed a celery solution for that and I was overruled.
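
For illustration, here is a minimal sketch of the "current record per total" 
and size-estimate ideas, assuming CSV output and a record count known up front 
(for example from a count query).  The function names are hypothetical and 
this is not the TraceBase code; it only shows that both numbers can be had 
without writing a temp file.

```python
import csv
import io


def row_to_csv_bytes(row):
    """Serialize one record to CSV-encoded bytes (no temp file involved)."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().encode("utf-8")


def estimate_total_bytes(sample_rows, total_count):
    """Extrapolate the full download size from the average sampled row size."""
    if not sample_rows:
        return 0
    sample_bytes = sum(len(row_to_csv_bytes(r)) for r in sample_rows)
    return round(sample_bytes / len(sample_rows) * total_count)


def stream_with_progress(rows, total_count):
    """Yield (chunk, records_done, total_count) one record at a time."""
    for done, row in enumerate(rows, start=1):
        yield row_to_csv_bytes(row), done, total_count
```

The estimate is only as good as the sample: if row sizes vary a lot across the 
result set, sampling just the first few records could be well off.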

As for the download in the nav bar, we have an issue to change that to a 
listing of actual files broken down by study (3 files per study).  There's not 
much actual utility from a user perspective for downloading everything anyway.  
We've just been focussed on other things.  In fact, we have a request from a 
user for that specific feature, done in a way that's compatible with curl/scp.  
We just have to figure out how to not have to CAS authenticate each command, 
something I don't have experience with.
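
One approach sometimes used for curl-friendly downloads (not something we have 
decided on, and not a CAS feature) is an expiring signed URL: the user 
authenticates once in the browser, is handed a URL carrying an HMAC signature, 
and the server checks the signature instead of the session.  A minimal sketch, 
assuming a server-side secret and hypothetical helper names and paths:

```python
import hashlib
import hmac
import time

# Assumption: a secret known only to the server, kept out of source control.
SECRET_KEY = b"replace-with-a-server-side-secret"


def sign_path(path, expires_at, key=SECRET_KEY):
    """Return a hex HMAC-SHA256 signature over the path and expiry time."""
    msg = f"{path}:{expires_at}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def make_download_url(path, ttl_seconds=3600, now=None, key=SECRET_KEY):
    """Build a URL that is valid for ttl_seconds from now."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    return f"{path}?expires={expires_at}&sig={sign_path(path, expires_at, key)}"


def is_valid(path, expires_at, sig, now=None, key=SECRET_KEY):
    """Check that the link has not expired and the signature matches."""
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sig, sign_path(path, expires_at, key))
```

A user could then pass the generated URL straight to curl without logging in 
again until it expires.  This would not help with scp, which authenticates 
over SSH rather than HTTP.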
