On 2/7/25 10:20, Robert Leach wrote:
Anyway, thanks so much for your help. This discussion has been very
useful, and I think I will proceed at first exactly as you suggested,
by queuing every validation job (using Celery). Then I will explore
whether or not I can apply the "on timeout" strategy in a small patch.
Incidentally, during our Wednesday meeting this week, we actually opened
our public instance to the world for the first time, in preparation for
the upcoming publication. This discussion is about the data submission
interface, but that interface is actually disabled on the public-facing
instance. The other part of the codebase that I was primarily
responsible for was the advanced search. Everything else was primarily
by other team members. If you would like to check it out, let me know
what you think: http://tracebase.princeton.edu
I would have to hit the books again to understand all of what is going
on here. One quibble with the Download tab: there is no indication of
the size of the datasets. I generally like to know what I am getting
into before I start a download. Also, is there explicit throttling going
on? I am seeing 10.2kb/sec, whereas from here
https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page I
downloaded a 47.65M file at 41.9MB/s.
Cheers,
Rob
Robert William Leach
Research Software Engineer
133 Carl C. Icahn Lab
Lewis-Sigler Institute for Integrative Genomics
Princeton University
Princeton, NJ 08544
--
Adrian Klaver
adrian.kla...@aklaver.com