Re: Guix Data Service client module

2022-02-04 Thread Christopher Baines

Ludovic Courtès  writes:

> Hello Guix!
>
> Here’s a client module for the Guix Data Service, allowing you to access
> a subset of the Guix Data Service interfaces from the comfort of your
> REPL.
>
> I had it sitting in my source tree for a while and Chris sent me an
> impressive shell one-liner that made me want to try from Scheme:
>
>   wget "https://data.guix-patches.cbaines.net/revision/47f85c53d954f857b45cebefee27ec512d917484/lint-warnings.json?locale=en_US.UTF-8&linter=input-labels&field=linter&field=message&field=location" -O - \
>     | jq -r '.lint_warnings | .[] | .package.name' | sort | uniq | wc -l
>
> Turns out we can do the same in two long lines of Scheme!
>
> scheme@(guix data-service)> (define s (open-data-service "https://data.guix-patches.cbaines.net"))
> scheme@(guix data-service)> (length (delete-duplicates (map lint-warning-package (revision-lint-warnings s "47f85c53d954f857b45cebefee27ec512d917484" "input-labels"))))
> $6 = 3560
>
>
> (That counts the number of packages at that revision that have one or
> more warnings from the new ‘input-labels’ lint checker.)
>
> We can do other things, such as browsing package versions:
>
> scheme@(guix data-service)> (define s (open-data-service "https://data.guix.gnu.org"))
> scheme@(guix data-service)> (package-version-branches (car (package-versions (lookup-package s "emacs"))))
> $9 = (#< name: "master" repository-id: 1>)
> scheme@(guix data-service)> (package-version-history s (car $9) "emacs")
> $10 = (#< version: "27.2" first-revision: #< 
> commit: "cc33f50d0e2a7835e99913226cb4c4b0e9e961ae" date: # second: 54 minute: 30 hour: 20 day: 25 month: 3 year: 2021 zone-offset: 0>> 
> last-revision: #< commit: 
> "364b56124b88398c199aacbfd4fdfc9a1583e634" date: # 31 minute: 16 hour: 13 day: 27 month: 6 year: 2021 zone-offset: 0>>> 
> #< version: "27.1" first-revision: #< 
> commit: "36a09d185343375a5cba370431870f9c4435d623" date: # second: 52 minute: 16 hour: 4 day: 28 month: 8 year: 2020 zone-offset: 0>> 
> last-revision: #< commit: 
> "ac29d37e2ffd7a85adfcac9be4d5bce018289bec" date: # 2 minute: 36 hour: 17 day: 25 month: 3 year: 2021 zone-offset: 0>>> 
> #< version: "26.3" first-revision: #< 
> commit: "43412ab967ee00789fe933f916d804aed9961c57" date: # second: 29 minute: 36 hour: 3 day: 30 month: 8 year: 2019 zone-offset: 0>> 
> last-revision: #< commit: 
> "bf19d5e4b26a2e38fe93a45f9341e14476ea5f82" date: # 19 minute: 50 hour: 21 day: 27 month: 8 year: 2020 zone-offset: 0>>> 
> #< version: "26.2" first-revision: #< 
> commit: "5069baedb8a902c3b1ea9656c11471658a1de56b" date: # second: 8 minute: 46 hour: 22 day: 12 month: 4 year: 2019 zone-offset: 0>> 
> last-revision: #< commit: 
> "02c61278f1327d403f072f42e6b92a1dc62fc93a" date: # 35 minute: 44 hour: 0 day: 30 month: 8 year: 2019 zone-offset: 0>>> 
> #< version: "26.1" first-revision: #< 
> commit: "897f303d2fa61497a931cf5fcb43349eb5f44c14" date: # second: 47 minute: 31 hour: 7 day: 1 month: 1 year: 2019 zone-offset: 0>> 
> last-revision: #< commit: 
> "ee6c4b62b88640f3828cf73a30377124e16cb95f" date: # 51 minute: 8 hour: 20 day: 12 month: 4 year: 2019 zone-offset: 0>>>)
>
> Now all we need to do is plug it into the right tools and enjoy!
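
For readers without jq at hand, here is a rough Python sketch of what the shell pipeline quoted above computes. The JSON shape is inferred only from the jq filter (`.lint_warnings | .[] | .package.name`); the sample document and the function name `count_packages_with_warnings` are made up for illustration and are not part of the Data Service or its client module.

```python
import json

def count_packages_with_warnings(document):
    """Count distinct package names, like `jq ... | sort | uniq | wc -l`."""
    warnings = json.loads(document)["lint_warnings"]
    return len({warning["package"]["name"] for warning in warnings})

# Hypothetical sample; real responses carry more fields per warning.
sample = json.dumps({
    "lint_warnings": [
        {"package": {"name": "emacs"}},
        {"package": {"name": "emacs"}},
        {"package": {"name": "hello"}},
    ]
})

print(count_packages_with_warnings(sample))  # prints 2
```

Two warnings for the same package count once, which is the job `sort | uniq` does in the shell version and `delete-duplicates` does in the Scheme one.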

Thanks for writing this, Ludo; sorry it's taken so long for me to have a
look.

I've had a little play around with it locally, and it seems to work
well.

I added some exports (included below) so that I could more easily use
the module.

Maybe open-data-service could have the url default to
"https://data.guix.gnu.org".

The only thing I can see that's required before merging though is the
exports. I'm now thinking about this kind of thing (getting data out of
the data service) in the context of patch/branch review.

Thanks,

Chris



  #:export (repository?
            repository-id
            repository-label
            repository-url
            repository-branches

            branch?
            branch-name
            branch-repository-id

            package-version?
            package-version-string
            package-version-branches

            package?
            package-name
            package-versions

            revision?
            revision-commit
            revision-date

            build?
            build-server-id
            build-id
            build-time

            channel-instance?
            channel-instance-system
            channel-instance-derivation
            channel-instance-builds

            lint-warning?
            lint-warning-package
            lint-warning-package-version
            lint-warning-message
            lint-warning-location

            open-data-service

            lookup-package
            lookup-repository
            package-version-history
            revision-channel-instances
            revision-lint-warnings))




nar-herder design retrospective

2022-02-04 Thread Christopher Baines

Hi!

I rushed the nar-herder [1] into existence back in December, to address
the buildup of nars on bayfront by moving the nars to another machine
with more space to store them.

1: https://git.cbaines.net/guix/nar-herder/about/

This was something I had been planning for a while, though: I sent an
email to guix-devel outlining some points of the design and aims around
a year ago [2].

2: https://lists.gnu.org/archive/html/guix-devel/2021-02/msg00104.html

The nar-herder is currently in a stable state. It's packaged for guix
and there's a service to use it. There are probably some bugs, but I
think I've fixed the important initial ones.

I covered some information about the deployment of the nar-herder in
this email back in December [3] and that led to some really good
replies; I'll copy/paste some of the important bits below.

3: https://lists.gnu.org/archive/html/guix-devel/2021-12/msg00140.html

https://lists.gnu.org/archive/html/guix-devel/2021-12/msg00196.html:

> Regarding nar-herder, I think it’d be nice to have a solution to
> mirroring in Guix proper, developed similarly to other components,
> because it could be a fairly central tool.

> Usually I’m the one asking for blog posts :-), but I’d really like us as
> a project to collectively engage on the topic before we publicize this
> specific approach.

https://lists.gnu.org/archive/html/guix-devel/2021-12/msg00201.html:

> Why not extend “guix archive”?
> 
> However, without all the details so my remark is totally naive, I miss
> what “nar-herder” is doing that “guix archive”+rsync+“guix publish” is
> not doing – other said I miss why another SQL database is required to
> serve stuff from one place to another.  I have read README but I did not
> get the point.

https://lists.gnu.org/archive/html/guix-devel/2021-12/msg00204.html:

> I'm quite interested in learning more and potentially trying out the
> nar-herder! Some thoughts that I'd like to add to the design space:
> 
> I think it would be great if one of the pastures to which we herd the
> nars would be a free and open source software mirror site. In my
> experience, these are usually some static web hosting in front of a
> large disk with a place to run scripts to sync the content. A database
> server may not be available. I'd like to support this use case because I
> think it is a great way to build bridges to the communities who run or
> gather around these mirrors.
> 
> I'd also like the ability to fetch nars directly from the local-to-me
> mirror rather than having them be proxied through a far away server.
> 
> One of the things that I really like and find empowering about Guix is
> that the developer/system administration tools are as available, easy
> to use, and convenient as the every day tooling. To the extent
> possible, I think that we should strive to make our syncing/mirroring
> solution practical to run for local, small setups, and not require
> project-scale infrastructure or coordination between many programs
> that are not captured in a Guix service.

So, currently the nar-herder can be used to move nars between machines,
which is what I wanted the initial implementation to be capable of. This
functionality should be sufficient for operating mirrors, although I'm
not aware of any being set up yet. I'd also like to be able to get
metrics about nar requests, but this isn't supported yet.

I'm all for having a solution to mirroring in Guix itself, although I
don't have a plan for this. Maybe the nar-herder could just be moved
into Guix, maybe with a different name? Any other ideas?

I think moving the nar-herder repository onto Savannah is probably a
good thing to do regardless.

Thanks,

Chris




Re: extend ’guix archive’?

2022-02-04 Thread Christopher Baines

zimoun  writes:

> On Mon, 20 Dec 2021 at 23:07, Ludovic Courtès  wrote:
>
>> Regarding nar-herder, I think it’d be nice to have a solution to
>> mirroring in Guix proper, developed similarly to other components,
>> because it could be a fairly central tool.
>>
>> ‘guix publish’ is probably not extensible enough to support that, but we
>> could make it a new ‘guix mirror’ or ‘guix sync’ or whatever command.
>
> Why not extend “guix archive”?
>
> However, without all the details so my remark is totally naive, I miss
> what “nar-herder” is doing that “guix archive”+rsync+“guix publish” is
> not doing – other said I miss why another SQL database is required to
> serve stuff from one place to another.  I have read README but I did not
> get the point.

Apologies, I missed replying to this at the time.

Using an SQL database (sqlite) isn't required in my opinion, but I do
like that aspect of the design.

You could, for example, keep the narinfos as files on the disk, and then
use rsync, say, to copy them between machines to set up mirrors. I think
this would perform worse compared to storing the narinfos in a database
when initially setting up a mirror. With a database, you only have to
download one large file, whereas with individual narinfo files, you have
to download lots of individual small files. Storing all the narinfo
files individually also generally takes up more space than storing them
in a database, since there's an overhead associated with each file on
the filesystem.
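
The per-file overhead can be illustrated with some rough arithmetic. This sketch assumes a filesystem that allocates space in 4096-byte blocks and uses made-up narinfo sizes; neither number comes from measurements of a real substitute server, and the packed figure ignores SQLite's own page and index overhead.

```python
import math

BLOCK_SIZE = 4096  # assumed filesystem block size in bytes

def allocated_bytes(file_size, block_size=BLOCK_SIZE):
    """Space a file occupies when rounded up to whole blocks."""
    return math.ceil(file_size / block_size) * block_size

# Hypothetical narinfo sizes in bytes.
narinfo_sizes = [900, 1200, 1500, 700]

as_files = sum(allocated_bytes(size) for size in narinfo_sizes)
packed = allocated_bytes(sum(narinfo_sizes))

print(as_files)  # 16384: four files, one block each
print(packed)    # 8192: 4300 bytes packed together need two blocks
```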

Additionally, the database is used to do extra things on top of just
storing the narinfos. The references from the narinfos are stored
separately to facilitate GC-like removal of the nars. There's also a
way to tag nars, something which I was thinking of using to allow
selectively removing some nars which only need to be stored for a
short time.

https://git.cbaines.net/guix/nar-herder/tree/nar-herder/database.scm#n74
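
To make the idea of separately stored references and tags more concrete, here is a minimal sketch using Python's sqlite3 module. The table names, column names, tag values and store paths are all invented for illustration; this is not the nar-herder's actual schema, which is in the database.scm file linked above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE narinfos (id INTEGER PRIMARY KEY, store_path TEXT UNIQUE);
    -- One row per edge: narinfo_id refers to referenced_id.
    CREATE TABLE narinfo_references (narinfo_id INTEGER, referenced_id INTEGER);
    CREATE TABLE tags (narinfo_id INTEGER, tag_key TEXT, tag_value TEXT);
""")

# Hypothetical store paths (real ones carry a hash prefix).
db.executemany("INSERT INTO narinfos VALUES (?, ?)",
               [(1, "/gnu/store/aaa-glibc"),
                (2, "/gnu/store/bbb-hello"),
                (3, "/gnu/store/ccc-test-output")])
db.execute("INSERT INTO narinfo_references VALUES (2, 1)")  # hello -> glibc
db.execute("INSERT INTO tags VALUES (3, 'retention', 'short')")

def removable(connection):
    """Store paths tagged short-lived that no other narinfo references."""
    rows = connection.execute("""
        SELECT n.store_path FROM narinfos n
        JOIN tags t ON t.narinfo_id = n.id
        WHERE t.tag_key = 'retention' AND t.tag_value = 'short'
          AND n.id NOT IN (SELECT referenced_id FROM narinfo_references)
    """)
    return [path for (path,) in rows]

print(removable(db))  # ['/gnu/store/ccc-test-output']
```

With references held as rows, "is anything still depending on this nar?" becomes a single query, which is awkward to answer when narinfos only exist as individual files.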

I hope that makes some sense,

Chris




Re: extend ’guix archive’?

2022-02-04 Thread Christopher Baines

Jack Hill  writes:

> On Mon, 20 Dec 2021, zimoun wrote:
>
>> Hi,
>>
>> On Mon, 20 Dec 2021 at 23:07, Ludovic Courtès  wrote:
>>
>>> Regarding nar-herder, I think it’d be nice to have a solution to
>>> mirroring in Guix proper, developed similarly to other components,
>>> because it could be a fairly central tool.
>>>
>>> ‘guix publish’ is probably not extensible enough to support that, but we
>>> could make it a new ‘guix mirror’ or ‘guix sync’ or whatever command.
>>
>> Why not extend “guix archive”?
>
> I'm quite interested in learning more and potentially trying out the
> nar-herder! Some thoughts that I'd like to add to the design space:

Apologies for the slow reply, it's great that you're interested!

> I think it would be great if one of the pastures to which we herd the
> nars would be a free and open source software mirror site. In my
> experience, these are usually some static web hosting in front of a
> large disk with a place to run scripts to sync the content. A database
> server may not be available. I'd like to support this use case because
> I think it is a great way to build bridges to the communities who run
> or gather around these mirrors.

I think there's a general discrepancy between how Guix works and how
mirror sites generally work, but there are probably ways of bridging
that gap. Maybe all the nars for the latest release could be mirrored
for example, and the nar-herder could probably help with that.

> I'd also like the ability to fetch nars directly from the local-to-me
> mirror rather than having them be proxied through a far away server.

I think setting up some mirrors closer to the people that use Guix is
now easier to do with the help of the nar-herder.

> One of the things that I really like and find empowering about Guix is
> that the developer/system administration tools are as available, easy
> to use, and convenient as the every day tooling. To the extent
> possible, I think that we should strive to make our syncing/mirroring
> solution practical to run for local, small setups, and not require
> project-scale infrastructure or coordination between many programs
> that are not captured in a Guix service.

Indeed, and this is something to strive for in the design.




Re: missing patch for texlive-bin (e77412362f)

2022-02-04 Thread Timothy Sample

Hi,

zimoun  writes:

> On Thu, 03 Feb 2022 at 10:46, Timothy Sample  wrote:
>
>> The bad news is that 0.75 is not there.  At first I was going to
>> apologize for the shortcomings of the sampling approach... until I
>> realized you are trying to trick me!  ;)  Unless I’m misreading the Git
>> history, that patch appeared and disappeared on core-updates and was
>> never part of master.
>
> Because of the good news, the same could be applied for these patches,
> no? [...]  [I]t “only” misses to disassemble this data and add an entry
> to the database, no?

Yes.  I could add that commit to the database, evaluate it, and load all
the sources.  I’m inclined not to, but I’m open to being convinced.  (I
really like how simple the current system is conceptually.)

> I miss what you mean by «was never part of master».  After the merge,
> what was core-updates and what was master is somehow indistinguishable,
> no?  Or are you walking only to first-parent after merge commit?  Well,
> Git history and sorting leads to headache; as git-log doc shows. :-)
>
> I think it is fine to simplify “complex” history with a sampling
> considering only first-parent walk.

That’s about it.  To my mind, “The History of the Guix Package Database”
*is* the first parent walk that you describe.  Of course, that’s just my
feeling.  There’s lots of room for disagreement there.  Basically, if
you can’t reach a commit by starting at 1.0.0 and running ‘guix pull’
without arguments, it doesn’t exist!
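
As a small illustration of the first-parent walk (what `git log --first-parent` follows), here is a sketch over a toy commit graph. The commit names and the `parents` mapping are hypothetical, and this is not the code PoG uses.

```python
def first_parent_walk(parents, head):
    """Return commits reachable from `head` by following first parents only."""
    history = []
    commit = head
    while commit is not None:
        history.append(commit)
        commit_parents = parents.get(commit, [])
        commit = commit_parents[0] if commit_parents else None
    return history

# Hypothetical history: "merge" joins master (first parent "m2") with a
# core-updates branch ("c2").  "c1" and "c2" exist only on the branch.
parents = {
    "merge": ["m2", "c2"],
    "m2": ["m1"],
    "c2": ["c1"],
    "c1": ["m1"],
    "m1": [],
}

print(first_parent_walk(parents, "merge"))  # ['merge', 'm2', 'm1']
```

A patch that was added in "c1" and removed again in "c2" never shows up in this walk, which matches the texlive-bin case discussed above.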

>> That being said, coverage is not perfect.  The most obvious problem (to
>> me) is the sampling approach.  Surely there are sources that are missed
>> by only examining one commit per week.  This can be checked and fixed by
>> using data from the Guix Data Service, which has data from essentially
>> every Guix commit.
>
> No, the Data Service and even Cuirass are using a sampling approach too;
> they do not process all the commits.
>
> Cuirass uses a «every 5 minutes» approach; please CI savvy people
> correct me if I mistake.  The Data Service uses a «batch guix-commits»
> approach; more details in this thread [1].

Thanks for letting me know about this.  Maybe I’m too optimistic!
Either way, the Data Service data is likely much more accurate than PoG,
and could still help build confidence.

> Well, the coverage is twofold, IMHO.
>
>  1. preserve what is currently entering in Guix
>  2. archive what was available in Guix
>
> About #1, the main mechanism are sources.json, “guix lint”, and update
> disarchive-db (now done by CI).  What is missed should be fixed by #2.
>
> About #2, it is hard to fix all the issues at once.  One commit per week
> already provides a good view to spot some problems.  Somehow, process
> all the commits just means burn more CPU; it seems “easy” once the
> infrastructure is in-place, no?

More or less.  Burning CPU is definitely the main thing holding back
processing all the commits, but it would likely take a bit of effort to
get code that works for around one hundred commits to work for
thousands.  The second thing is diminishing returns.  Burning *way* more
CPU to track down a couple sources feels a little wasteful to me.
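
To show what a one-commit-per-week sample can skip, here is a minimal sketch. It assumes the sample keeps the first commit seen in each ISO week; the commit identifiers and dates are made up, and the real selection rule in PoG may differ.

```python
from datetime import date

def one_commit_per_week(commits):
    """Keep the first commit seen in each ISO week; input sorted by date."""
    sampled = {}
    for commit_id, day in commits:
        year, week, _ = day.isocalendar()
        sampled.setdefault((year, week), commit_id)
    return list(sampled.values())

commits = [
    ("a1", date(2022, 1, 3)),   # Monday, ISO week 1
    ("b2", date(2022, 1, 5)),   # same week, skipped
    ("c3", date(2022, 1, 10)),  # Monday, ISO week 2
    ("d4", date(2022, 1, 16)),  # Sunday, still week 2, skipped
]

print(one_commit_per_week(commits))  # ['a1', 'c3']
```

A source that is introduced in "b2" and replaced before "c3" would be missed by this sample, which is the coverage gap being weighed against the CPU cost of processing every commit.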

For me, the scope of PoG is perfect the way it is.  It’s big enough to
be useful, but not so big as to be overwhelming.  There are lots of
serious problems to be addressed, too.

That being said, I’m willing to change things.  A lot of this is just my
gut feeling.  :)  If everyone else is clamouring to have more commits
or to track core-updates or whatever, I’m all ears!


-- Tim