Hey everyone,
We just had our regular backend catch-up. It was not recorded but I did
take some stream-of-consciousness notes on what we discussed:
* train 74 was cut on time!
  * but the deploy doc was empty!
  * a db migration was not mentioned
  * some config changes weren't mentioned
  * we should all try to remember to keep the deploy doc up to date as we go
* oauth token purging
  * on the back burner at the moment, last worked on a few weeks ago
  * currently up to November 2015, which contains twice as many tokens for some reason
  * were 500s returned on requests from the profile server?
* verification reminders disabled
  * no smoking gun, except replication couldn't keep up
  * replication thread spending most time in a system-locked state
  * there was a transaction that had millions of rows locked
  * stored procedures make it difficult to see exactly what the problem is
  * once that transaction unwinds, everything returns to a good state
  * it's recommended not to use temporary tables with replication; is that the issue?
  * or could it be something to do with SELECT ... FOR UPDATE?
  * do we need a consensus protocol for competing processes?
  * we don't actually need verification reminders running on all instances anyway
  * only needs to be 1 query every 30 seconds
  * with replication we're making a query every second
  * no need to compete
  * what are the release mechanics if we limit it to 1 instance?
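The "1 query every 30 seconds, no need to compete" idea above doesn't actually require a full consensus protocol: every instance can race to claim the current 30-second slot with one atomic insert, and only the winner runs the reminder pass. A minimal sketch using SQLite as a stand-in for the real database (the table and function names here are made up for illustration; with MySQL the same trick could be done with INSERT IGNORE or GET_LOCK):

```python
import sqlite3
import time

def claim_slot(conn, interval=30, now=None):
    """Try to claim the current reminder slot.

    Each instance computes the same slot number from wall-clock time and
    races to insert it; the primary key makes the insert atomic, so at
    most one instance "wins" a given slot and runs the reminder query.
    Returns True for the winner, False for everyone else.
    """
    slot = int(now if now is not None else time.time()) // interval
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reminder_slots (slot INTEGER PRIMARY KEY)"
    )
    cur = conn.execute(
        "INSERT OR IGNORE INTO reminder_slots (slot) VALUES (?)", (slot,)
    )
    conn.commit()
    # rowcount is 1 only if our INSERT actually landed a row.
    return cur.rowcount == 1
```

One nice property for the release-mechanics question: every instance ships identical code and the database decides who runs, so no single box needs special configuration.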
* teamcity permissions
  * deployed a box to try out teamcity 10 with github integration
  * it wants write permissions to the github org, including private repos
  * means the teamcity box would need serious hardening
  * what if there's a zero-day exploit in teamcity?
  * vlad's going to email teamcity
* sentry
  * deployed with train 73 content server
  * where are all the errors?
  * only two errors reported: one from git clone, one from git fetch
  * "<path>/experiments already exists and is not empty"
  * if things go wrong it should show up there
* profile server
  * docker stack is slow, seeing 4-10 times higher latency than non-docker
  * seconds of latency
  * cpu is only a little bit higher
  * also docker stack never returns a 304 for /v1/profile
  * usually there are around 4% or 5% 304s
  * absolute dead zero inside docker
  * how the heck can docker affect this?
  * etag-related?
  * maybe a significant clue of where the slowness comes from
  * try disabling all logging so there is no disk i/o; will that fix the latency?
  * log data is handled differently in docker
  * different centos version, different log flow, uses systemd and journald
  * one of the devs to try running in docker locally to debug the 304 issue
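For context on the missing 304s: a 304 only happens when the server's ETag for the response matches the client's If-None-Match header, so anything in the docker stack that changes how the ETag is computed, or that strips the header on the way in (a proxy layer, say), would drop the 304 rate to exactly zero. A minimal sketch of the conditional-GET handshake (not the actual profile-server code):

```python
import hashlib
from typing import Optional

def respond(body: bytes, if_none_match: Optional[str]):
    """Handle a conditional GET.

    The ETag here is a strong validator derived from the response body;
    if it matches the client's If-None-Match header we return 304 Not
    Modified with an empty body, otherwise 200 with the full body.
    """
    etag = '"%s"' % hashlib.sha1(body).hexdigest()
    if if_none_match == etag:
        return 304, b"", etag
    return 200, body, etag
```

Comparing what the docker and non-docker stacks emit for `etag` on identical bodies, and what If-None-Match actually reaches the app, should narrow it down quickly.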
* profile server avatars
  * only one avatar per user
  * drop some tables
  * content server should have no more code for the get-avatars api
  * if the endpoint is not being used elsewhere (it shouldn't be), we should get rid of it
And that was it.
Cheerio,
Pb
_______________________________________________
Dev-fxacct mailing list
[email protected]
https://mail.mozilla.org/listinfo/dev-fxacct