Using multi-level feeds with passthrough is failing

johnsen_7555 · 24 Jun 2024, 10:56

Hi,
We may have hit another snag with using a trial-licensed copy of ProGet (see also Enforcing Licence Policies/Blocking?):

We're in the process of evaluating ProGet as part of a larger solution for software supply chain control that relies heavily on automated, API-based interaction with the package manager.

Part of this workflow is a triage process for unapproved packages with the option of an explicit approval. To that end I've configured two linked Python feeds in ProGet: python-accessible and passthrough-test-python, with the former being configured for explicit whitelisting and the latter being fully open. The python-accessible feed permits anonymous pulls and is configured with a connector to pull packages from the open feed, and the underlying idea is that packages that are blocked in the restricted feed can be seen in the open feed and run through a triage process with the option of granting security approval for their use.
Unfortunately, this configuration seems to be failing at the first hurdle. Upstream packages in the open feed are visible in the restricted feed in the UI:

However, installing packages does not seem to be possible and fails with a status 400:

Permissions do not appear to matter, since the same thing happens when using the Admin account.

The deployment is running 2024.9 Build 1, and is deployed to a Kubernetes cluster.

Many thanks!

stevedennis · 24 Jun 2024, 16:34

Hi @johnsen_7555 ,

Sounds like you're building a sort of Python Package Approval Workflow, which is great to see.

If the user doesn't have permission to download the file, I would expect a 401 (if anonymous) or a 403 (if authenticated).

A 400 error is a bad request. It could be coming from ProGet, as ProGet will occasionally throw that message when there is unexpected input. But it could also be coming from an intermediate server that's processing the request before forwarding to ProGet.

In this case, I believe pip is simply performing a GET on the URL in the error message:

.../download/numpy/2.0.0/numpy-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=6d7696c615765091cc5093f76fd1fa069870304beaccfd58b5dcc69e55ef49c1

I'm not 100% sure that's what pip is doing, but why not run curl -v against that URL and see if you also get a 400?

If so, then you should get an error in the message body from curl. ProGet will write this out to the response stream.

If not, then you'll need to capture the traffic and see what the difference is. Maybe it's a header that's different? I'm not sure what would cause ProGet to yield a 400.
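
For repeated checks, the curl test above can also be scripted. This is a minimal Python sketch that assumes nothing about ProGet beyond a plain HTTP GET. The function name `probe` and its `opener` parameter are made up for illustration, and the URL to pass in is whatever appears in pip's error message.

```python
import urllib.error
import urllib.request


def probe(url, headers=None, opener=urllib.request.urlopen):
    """GET the URL and return (status, start_of_body), also for 4xx/5xx replies.

    `opener` is injectable only so the function can be exercised without a
    network; by default the standard library's urlopen is used.
    """
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener(request, timeout=30) as response:
            # Read only the first 2000 bytes: a successful reply is a wheel file.
            return response.status, response.read(2000).decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the error object still carries the
        # response body, which is where any server-side message would be.
        return err.code, err.read(2000).decode("utf-8", "replace")
```

Calling `probe(url)` once with default headers and once with the headers pip sends (passed as the `headers` dict) is one way to see whether a header difference changes the outcome.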

Let us know what you find,
Steve

johnsen_7555 · 24 Jun 2024, 16:49

Hi Steve,
Thanks for the quick response. Good shout on using curl; it looks like this is indeed either another trial licence issue or it's a problem with the licence policy failing to detect a permissible licence (which it successfully does if accessing the open feed directly):

stevedennis · 24 Jun 2024, 20:52

Thanks for clarifying @johnsen_7555.

I'm struggling a bit to see what kind of configuration might cause or reproduce this issue. Is your python-accessible feed connected directly to PyPi.org?

If you go to re-analyze the package, you should get a really long set of debug logs (no need to send them). But after you do that, can you try the download again?

johnsen_7555 · 25 Jun 2024, 06:07

Hi Steve

Thanks!
No, the python-accessible feed is not directly connected to PyPI. Instead, it uses passthrough-test-python as its upstream source, which in turn is connected to third-party feeds, including PyPI.
Only the python-accessible feed has blocking policies configured, while passthrough-test-python is configured to pass through non-compliant packages.

If I try to list packages in the python-accessible feed I get:

However, I can successfully retrieve packages from the internal upstream feed when entering the exact package name, as instructed:

Digging into the package, and clicking on the latest version I arrive at:

After downloading the package, by clicking on "Pull to ProGet", I get:

which is wrong, but the licence is then correctly displayed as the (compliant) 0BSD licence:

in the overview.

stevedennis · 25 Jun 2024, 15:48

Hi @johnsen_7555 ,

Ah ha, thanks for clarifying that!

This is the expected behavior, and the reason is a bit complex.

Unlike most package repositories, the PyPI Repository API (which a ProGet feed implements) does not provide any licensing information about packages. It's just a very basic listing of names and versions, which means that there is no license information (or description, author, etc). All of that is embedded in the package files.

However, pypi.org has a special API that ProGet queries to provide more information about a package hosted on pypi.org. This way, description and license information can be displayed on remote packages. But this API is only for pypi.org, and the pip client doesn't use it.

When you connect to another feed in ProGet, the regular API is used. And since the PyPi Repository API doesn't provide package metadata, this information isn't available. It's on our long-term roadmap to use a special API / method for ProGet->ProGet connections, but that's a ways off and requires a lot of internal refactoring.
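
To illustrate the difference between the two APIs, here is a minimal Python sketch. It assumes that the "special API" is pypi.org's public JSON API at `/pypi/<project>/json` (the thread does not name it), and the helper names are made up for illustration. The repository ("simple") API only lists files, while the JSON document carries an `info` object with fields such as `license`.

```python
import json
import urllib.request


def simple_index_url(project, base="https://pypi.org"):
    # Repository ("simple") API: a per-project listing of files, nothing more.
    return f"{base}/simple/{project}/"


def json_metadata_url(project, base="https://pypi.org"):
    # pypi.org JSON API: summary, author, license and similar metadata.
    return f"{base}/pypi/{project}/json"


def license_from_metadata(document):
    """Return the declared license from a JSON API document, or None if absent."""
    info = document.get("info") or {}
    # An empty string is treated the same as a missing license.
    return info.get("license") or None


def fetch_metadata(project):
    # Network call against pypi.org; not exercised by the examples below.
    with urllib.request.urlopen(json_metadata_url(project), timeout=30) as response:
        return json.load(response)
```

For example, `license_from_metadata({"info": {"license": "0BSD"}})` returns `"0BSD"`, while a document without an `info.license` value yields `None`, which is the situation a feed is in when only the repository API is available.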

That said, the workflow we support to accomplish what you want is as follows:
https://blog.inedo.com/python/pypi-approval-workflow/

Thanks,
Steve

johnsen_7555 · 25 Jun 2024, 17:04

Many thanks for the explanation and the link, Steve!

One key objective of this exercise is to arrive at a mostly automated workflow. One way I can still see this being achievable despite the lack of the special metadata API in the ProGet feed is to set up a second quarantine feed that only contains packages blocked by a first, main feed that connects directly to pypi.org but with very restrictive compliance settings.

Could the following webhook provide those events to drive this quarantine workflow:

stevedennis · 25 Jun 2024, 20:15

Hi @johnsen_7555,

I'm not really sure I totally understand the automated workflow you want to create; you mentioned earlier having an approval process?

Are there any gaps with the workflow I mentioned? Basically two feeds (approved, unapproved), where you then use package promotion as the approval action.

We don't recommend using webhooks to automate ProGet itself. This can create some loops that will cause headaches.

Thanks,
Steve

johnsen_7555 · 26 Jun 2024, 06:08

Thanks for the reply, Steve.
The main problem with the conventional approvals workflow you linked to is the lack of sufficient approver capacity. Hence, the plan is to run a setup that allows packages through unchallenged if their licences are acceptable and there are no recorded vulnerabilities. I'm aware that this does not address risks such as typosquatting, malicious insiders, etc.

At any rate, the idea above was that if a package fails that first filter, it can be automatically quarantined and flagged for review. Upon review, if approved, that same package can then be inserted into the general feed via the "manual" promotion mechanic that you reference, and found and consumed by its users via their standard index URL. To achieve this, ProGet would need to notify a service that orchestrates this process, which in turn inserts the package into the quarantine feed and notifies the approvers.

stevedennis · 26 Jun 2024, 14:56

Thanks @johnsen_7555! If I can offer some advice....

The workflow you're creating is a bit complicated, and adding in the automation component is "yet another product/process" to own/maintain. On our end, we get support inquiries from confused new administrators who notice "undocumented" behavior (i.e. not on docs.inedo.com) in ProGet.

If you're not "worried" about malicious packages, then the main risks you are mitigating are:

- legal liability with using wrong licenses
- vulnerabilities in your own software via OSS packages
- developer time in fixing the aforementioned problems

Both licenses and vulnerabilities are only a problem if they go to production, and keep in mind that vulnerabilities need to be monitored after a package is in use, since they are often discovered long after the package is used in your production software.

How about something like this:

- block noncompliant packages
- set A/GPL licenses to be noncompliant
- auto assess severe vulnerabilities, set those to be noncompliant
- set unknown licenses and unassessed vulnerabilities to be warn
- use pgutil in your CI/CD pipeline to prevent unaddressed warn from going to production
- routinely monitor warn packages as you have time
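
The rules in this list can be written down as a small decision function. The following Python sketch is only an illustration of the proposed policy, not how ProGet or pgutil implement it: the names `Status`, `classify` and `production_gate`, the license prefix check, and the vulnerability labels `"severe"` and `"unassessed"` are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    COMPLIANT = "compliant"
    WARN = "warn"
    NONCOMPLIANT = "noncompliant"


# Hypothetical matching rule: SPDX-style identifiers starting with GPL or AGPL.
BLOCKED_LICENSE_PREFIXES = ("GPL", "AGPL")


def classify(license_id, vulnerabilities):
    """Classify one package.

    license_id: SPDX-style string, or None when the license is unknown.
    vulnerabilities: list of assessment labels, e.g. "severe" or "unassessed".
    """
    if license_id and license_id.upper().startswith(BLOCKED_LICENSE_PREFIXES):
        return Status.NONCOMPLIANT  # A/GPL licenses are noncompliant
    if "severe" in vulnerabilities:
        return Status.NONCOMPLIANT  # severe vulnerabilities are noncompliant
    if license_id is None or "unassessed" in vulnerabilities:
        return Status.WARN          # unknown license or unassessed vulnerability
    return Status.COMPLIANT


def production_gate(statuses):
    # CI/CD check: anything other than compliant stops a production release.
    return all(status is Status.COMPLIANT for status in statuses)
```

With this split, a warn package can still be downloaded and used during development, while `production_gate` returns `False` for a build that includes it until someone has reviewed the warning.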

Note that, in a future version of ProGet, we intend to add more intelligence to package analysis for OSS packages. For example, we would like to say "this nunnpy package has 1 version, is recently published, has no GitHub repo, etc., and therefore is noncompliant".

johnsen_7555 · 26 Jun 2024, 15:06

Many thanks for that proposal, Steve! The warning label in conjunction with dumping of the package cache sounds like it is worth exploring. I shall discuss it with my security and governance collaborators :-)