
Using multi-level feeds with passthrough is failing



  • Hi,
    We may have hit another snag using a trial-licensed copy of ProGet (see also Enforcing Licence Policies/Blocking?):

    We're in the process of evaluating ProGet as part of a larger solution for software supply chain control that relies heavily on automated, API-based interaction with the package manager.

    Part of this workflow is a triage process for unapproved packages, with the option of explicit approval. To that end, I've configured two linked Python feeds in ProGet: python-accessible and passthrough-test-python. The former is configured for explicit whitelisting and the latter is fully open. The python-accessible feed permits anonymous pulls and has a connector that pulls packages from the open feed. The underlying idea is that packages blocked in the restricted feed can still be seen in the open feed, run through a triage process, and, if appropriate, granted security approval for their use.
    Unfortunately, this configuration seems to fail at the first hurdle. Upstream packages from the open feed are visible in the restricted feed's UI:
    [screenshot: upstream packages listed in the python-accessible feed]
    but installing them fails with HTTP status 400:
    [screenshot: pip install failing with status 400]

    Permissions do not appear to matter, since the same thing happens when using the Admin account.

    The deployment is running 2024.9 Build 1, and is deployed to a Kubernetes cluster.

    Many thanks!


  • inedo-engineer

    Hi @johnsen_7555 ,

    Sounds like you're building a sort of Python Package Approval Workflow, which is great to see.

    If the user doesn't have permission to download the file, I would expect a 401 (if anonymous) or a 403 (if authenticated).

    A 400 error is a bad request. It could be coming from ProGet, as ProGet will occasionally throw that message when there is unexpected input. But it could also be coming from an intermediate server that's processing the request before forwarding to ProGet.

    In this case, I believe pip is simply performing a GET on the URL in the error message:

    .../download/numpy/2.0.0/numpy-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=6d7696c615765091cc5093f76fd1fa069870304beaccfd58b5dcc69e55ef49c1

    I'm not 100% sure that's what pip is doing, but why don't you try running curl -v against that URL and see if you also get a 400?

    If so, curl should show an error message in the response body, since ProGet writes it to the response stream.
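For a scripted version of the same check, here is a minimal Python sketch using only the standard library. It assumes `url` is the full wheel URL from pip's error message (the host and path prefix are elided above, so substitute your own). It drops the `#sha256=...` fragment, which is never sent to the server (per PEP 503, pip uses it to verify the downloaded file), prints the status and any error body, and checks the hash on success. `probe_download` and `STATUS_HINTS` are hypothetical names; the hints only restate the reasoning above.

```python
import hashlib
import urllib.error
import urllib.parse
import urllib.request

# The status codes discussed above and what they usually indicate for a download.
STATUS_HINTS = {
    400: "bad request: read the response body for the server's error message",
    401: "not permitted (anonymous request)",
    403: "not permitted (authenticated request)",
}


def probe_download(url, timeout=30.0):
    """GET a package file URL and return (status, body), like `curl -v` would show."""
    # Split off '#sha256=...': fragments are never sent to the server.
    base, fragment = urllib.parse.urldefrag(url)
    try:
        with urllib.request.urlopen(base, timeout=timeout) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a body (ProGet writes its error message there).
        status, body = err.code, err.read()

    print(f"HTTP {status} {STATUS_HINTS.get(status, '')}".rstrip())
    if status != 200:
        print(body.decode("utf-8", errors="replace"))
    elif fragment.startswith("sha256="):
        # pip compares the downloaded file against this digest.
        expected = fragment.split("=", 1)[1]
        actual = hashlib.sha256(body).hexdigest()
        print("sha256 OK" if actual == expected else f"sha256 MISMATCH: got {actual}")
    return status, body
```

Calling `probe_download("<full URL from pip's error>")` should report the same status and body that `curl -v` shows.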

    If not, then you'll need to capture the traffic and see what the difference is. Maybe it's a header that's different? I'm not sure what would cause ProGet to yield a 400.

    Let us know what you find,
    Steve



  • Hi Steve,
    Thanks for the quick response. Good shout on using curl; it looks like this is either another trial licence issue or a problem with the licence policy failing to detect a permissible licence (which it does successfully when accessing the open feed directly):
    [screenshot: curl output]


  • inedo-engineer

    Thanks for clarifying @johnsen_7555.

    I'm struggling a bit to see what configuration might cause this issue, and I haven't been able to reproduce it. Is your python-accessible feed connected directly to pypi.org?

    If you re-analyze the package, you should get a really long set of debug logs (no need to send them). After you do that, can you try the download again?



  • Hi Steve

    Thanks!
    No, the python-accessible feed is not directly connected to PyPI; instead, it uses passthrough-test-python as its upstream source, which in turn is connected to third-party feeds, including PyPI.
    Only python-accessible has blocking policies configured, while passthrough-test-python is configured to pass through non-compliant packages.

    If I try to list packages in the python-accessible feed, I get:
    [screenshot: package listing in python-accessible]
    However, I can successfully retrieve packages from the internal upstream feed when entering the exact package name, as instructed:
    [screenshot: package search by exact name]

    Digging into the package and clicking on the latest version, I arrive at:
    [screenshot: latest version page]

    After downloading the package by clicking "Pull to ProGet", I get:
    [screenshot: result after Pull to ProGet]
    which is wrong, but the licence is then correctly displayed in the overview as the (compliant) 0BSD licence:
    [screenshot: package overview showing 0BSD]


  • inedo-engineer

    Hi @johnsen_7555 ,

    Ah ha, thanks for clarifying that!

    This is the expected behavior, and the reason is a bit complex.

    Unlike most package repositories, the PyPI Repository API (which a ProGet feed implements) does not provide any licensing information about packages. It's just a very basic listing of names and versions, which means that there is no license information (or description, author, etc). All of that is embedded in the package files.
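To make "embedded in the package files" concrete: a wheel is a zip archive whose `<name>-<version>.dist-info/METADATA` file holds the core metadata, including `License:`, `License-Expression:` (newer metadata versions) and `Classifier: License :: ...` lines, in an email-header format. The sketch below reads those fields with the standard library. It only illustrates that the licence becomes available once the file itself is downloaded; it is not a description of how ProGet analyzes packages, and `wheel_license_info` is a hypothetical helper.

```python
import email.parser
import zipfile


def wheel_license_info(wheel):
    """Return licence-related fields from a wheel's core metadata.

    `wheel` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as whl:
        # A wheel has exactly one top-level *.dist-info/METADATA file.
        meta_name = next(
            n for n in whl.namelist()
            if n.count("/") == 1 and n.endswith(".dist-info/METADATA")
        )
        raw = whl.read(meta_name).decode("utf-8")

    # Core metadata uses RFC 822-style headers; headersonly skips the long description.
    msg = email.parser.Parser().parsestr(raw, headersonly=True)
    return {
        "License": msg.get_all("License") or [],
        "License-Expression": msg.get_all("License-Expression") or [],
        "Classifiers": [c for c in (msg.get_all("Classifier") or [])
                        if c.startswith("License ::")],
    }
```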

    However, pypi.org has a special API that ProGet queries to provide more information about a package hosted on pypi.org. This way, description and license information can be displayed on remote packages. But this API is only for pypi.org, and the pip client doesn't use it.
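The post doesn't name that API. Assuming it is, or resembles, pypi.org's public JSON API (`https://pypi.org/pypi/<project>/json`, or `https://pypi.org/pypi/<project>/<version>/json` for a specific release), a minimal sketch of pulling the licence fields from it follows. The helper names are hypothetical, and `fetch_pypi_license` targets pypi.org only, matching the point above that this API is specific to pypi.org.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# pypi.org's JSON API; pip itself does not use it.
PYPI_JSON_URL = "https://pypi.org/pypi/{path}/json"


def license_from_pypi_json(doc: dict) -> dict:
    """Pick licence-related fields out of a PyPI JSON API response."""
    info = doc.get("info") or {}
    return {
        "license": info.get("license"),
        "classifiers": [c for c in info.get("classifiers") or []
                        if c.startswith("License ::")],
    }


def fetch_pypi_license(project: str, version: Optional[str] = None) -> dict:
    """Query pypi.org for a project (latest release, or one specific version)."""
    path = urllib.parse.quote(project)
    if version is not None:
        path += "/" + urllib.parse.quote(version)
    with urllib.request.urlopen(PYPI_JSON_URL.format(path=path), timeout=30) as resp:
        return license_from_pypi_json(json.load(resp))
```

Keeping the parsing separate from the network call means the field extraction can be checked against a saved response without contacting pypi.org.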

    When you connect to another feed in ProGet, the regular API is used. And since the PyPI Repository API doesn't provide package metadata, this information isn't available. It's on our long-term roadmap to use a special API / method for ProGet->ProGet connections, but that's a ways off and requires a lot of internal refactoring.
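By contrast, assuming the "regular API" here is the Simple Repository API from PEP 503, a project page is just an HTML list of file links. This sketch (hypothetical names, standard library only) collects everything such a page offers per file:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class SimpleIndexLinks(HTMLParser):
    """Collect the file links from a PEP 503 'simple' project page."""

    def __init__(self):
        super().__init__()
        self.files = []      # one dict per <a> element
        self._attrs = None   # attributes of the <a> currently open, if any
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._attrs = dict(attrs)
            self._text = ""

    def handle_data(self, data):
        if self._attrs is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._attrs is not None:
            url, fragment = urldefrag(self._attrs.get("href") or "")
            self.files.append({
                "filename": self._text.strip(),
                "url": url,
                "hash": fragment,  # e.g. "sha256=<hex digest>", or "" if absent
                "requires_python": self._attrs.get("data-requires-python") or "",
            })
            self._attrs = None


def list_files(html_text):
    """Return every file link on a simple project page."""
    parser = SimpleIndexLinks()
    parser.feed(html_text)
    parser.close()
    return parser.files
```

Each entry carries a filename, URL, optional hash and `Requires-Python` value; there is no licence, description or author field, which matches the limitation described above.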

    That said, the workflow we support to accomplish what you want is as follows:
    https://blog.inedo.com/python/pypi-approval-workflow/

    Thanks,
    Steve



  • Many thanks for the explanation and the link, Steve!

    One key objective of this exercise is to arrive at a mostly automated workflow. One way I can still see this being achievable despite the lack of the special metadata API in the ProGet feed is to set up a second quarantine feed that only contains packages blocked by a first, main feed that connects directly to pypi.org but with very restrictive compliance settings.

    Could the following webhook provide the events needed to drive this quarantine workflow?
    [screenshot: webhook configuration]


  • inedo-engineer

    Hi @johnsen_7555,

    I'm not sure I fully understand the automated workflow you want to create; you mentioned an approval process earlier?

    Are there any gaps in the workflow I mentioned? Basically, it uses two feeds (approved and unapproved), with package promotion as the approval action.

    We don't recommend using webhooks to automate ProGet itself. This can create some loops that will cause headaches.

    Thanks,
    Steve



  • Thanks for the reply, Steve.
    The main problem with the conventional approval workflow you linked to is a lack of approver capacity. Hence, the plan is to run a setup that lets packages through unchallenged if their licences are acceptable and they have no recorded vulnerabilities. I'm aware that this does not address risks such as typosquatting or malicious insiders.

    At any rate, the idea above was that if a package fails that first filter, it would be automatically quarantined and flagged for review. If approved, the package could then be inserted into the general feed via the "manual" promotion mechanism you referenced, where its users would find and consume it through their standard index URL. To achieve this, ProGet would need to notify a service that orchestrates the process, which would in turn insert the package into the quarantine feed and notify the approvers.
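The flow described above can be modelled as a small state machine. This is a hypothetical sketch of the orchestrating service's decision logic only: `notify` and `promote` are placeholder callbacks standing in for whatever copies the package into the quarantine feed, alerts approvers and performs the promotion. Nothing here calls a ProGet API or handles webhooks.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class State(Enum):
    ALLOWED = "allowed"          # passed the automatic filter
    QUARANTINED = "quarantined"  # failed the filter, waiting for an approver
    APPROVED = "approved"        # reviewed and promoted into the general feed
    REJECTED = "rejected"        # reviewed and refused


@dataclass
class Package:
    name: str
    version: str
    license: str          # "" when the licence could not be determined
    vulnerabilities: int  # number of recorded vulnerabilities


def triage(pkg: Package, acceptable: set, notify: Callable[[Package], None]) -> State:
    """First filter: allow only acceptable licences with no recorded vulnerabilities."""
    if pkg.license in acceptable and pkg.vulnerabilities == 0:
        return State.ALLOWED
    notify(pkg)  # hypothetical hook: quarantine the package and alert approvers
    return State.QUARANTINED


def review(state: State, pkg: Package, approved: bool,
           promote: Callable[[Package], None]) -> State:
    """Approver decision; only quarantined packages can be reviewed."""
    if state is not State.QUARANTINED:
        raise ValueError(f"{pkg.name} {pkg.version} is {state.value}, not quarantined")
    if approved:
        promote(pkg)  # hypothetical hook: e.g. trigger package promotion
        return State.APPROVED
    return State.REJECTED
```

Keeping the rules separate from the side effects makes them testable on their own, whatever ends up triggering them.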


  • inedo-engineer

    Thanks @johnsen_7555! If I can offer some advice....

    The workflow you're creating is a bit complicated, and adding in the automation component is "yet another product/process" to own/maintain. On our end, we get support inquiries from confused new administrators who notice "undocumented" behavior (i.e. not on docs.inedo.com) in ProGet.

    If you're not "worried" about malicious packages, then the main risks you are mitigating are:

    • legal liability with using wrong licenses
    • vulnerabilities in your own software via OSS packages
    • developer time in fixing the aforementioned problems

    Both licenses and vulnerabilities are only a problem if they reach production. Keep in mind that vulnerabilities need to be monitored after a package is in use, since they are often discovered long after the package has made it into your production software.

    How about something like this:

    • block noncompliant packages
    • set A/GPL licenses to be noncompliant
    • auto-assess severe vulnerabilities and set those to be noncompliant
    • set unknown licenses and unassessed vulnerabilities to warn
    • use pgutil in your CI/CD pipeline to prevent packages with unaddressed warnings from going to production
    • routinely monitor warn packages as you have time
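The policy above boils down to a three-way classification plus a release gate. Here is a hypothetical Python sketch of that logic, useful for reasoning about which packages end up where. It is not ProGet's analyzer and does not call `pgutil`, whose commands aren't shown in this thread. Treating "A/GPL" as SPDX identifiers starting with `GPL-` or `AGPL-`, and "severe" as critical or high, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumptions: SPDX-style licence IDs, and what counts as "severe".
COPYLEFT_PREFIXES = ("GPL-", "AGPL-")
SEVERE = {"critical", "high"}


@dataclass
class Vulnerability:
    severity: str
    assessed: bool


@dataclass
class Package:
    name: str
    license: Optional[str]  # None when the licence is unknown
    vulnerabilities: list = field(default_factory=list)


def compliance(pkg: Package) -> str:
    """Return 'noncompliant', 'warn' or 'compliant' following the rules above."""
    if pkg.license and pkg.license.startswith(COPYLEFT_PREFIXES):
        return "noncompliant"
    # Severe vulnerabilities are treated as noncompliant whether assessed or not.
    if any(v.severity in SEVERE for v in pkg.vulnerabilities):
        return "noncompliant"
    if pkg.license is None or any(not v.assessed for v in pkg.vulnerabilities):
        return "warn"
    return "compliant"


def release_gate(packages: list) -> list:
    """CI step: names of packages that should block a production release."""
    return [p.name for p in packages if compliance(p) != "compliant"]
```

Noncompliant packages would already be blocked at download time; the gate's job is to catch the `warn` ones before release.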

    Note that, in a future version of ProGet, we intend to add more intelligence to package analysis for OSS packages. For example, we would like to say "this nunnpy package has 1 version, is recently published, has no GitHub repo, etc., and therefore is noncompliant".



  • Many thanks for that proposal, Steve! The warning label in conjunction with dumping of the package cache sounds like it is worth exploring. I shall discuss it with my security and governance collaborators :-)

