
ProGet SCA 2024 Preview Feedback - Package detection still hit or miss



  • Package detection is still rather hit or miss, as in the previous version.

    Here are two packages pulled in via the nuget.org ProGet proxy: one is detected perfectly fine, the other raised an issue as "Package not in feed". When navigating to either of them, they are found and shown with a green label (no vulnerabilities or license violations).

    35af4143-fff9-4795-981e-a73b95b59622-image.png

    What is also a bit confusing is the "Inconclusive" label. To me this would mean the state of the package could not be determined, but on the Issues page the package gets a proper "Package not in feed" explanation. I would expect to see that state here too.


  • inedo-engineer

    Hi @jw ,

    First, the reason you're getting "Package not in feed" (which would also have surfaced as an Issue in the ProGet 2023 feature) is that the Sqlite package has not been cached or pulled into ProGet. However, if you just click Download (and thus cache) the package, it will be in the feed and this will go away.

    When you browse a remote package in the UI, ProGet is querying nuget.org and displaying whatever their API shows. This query/data is not cached or retained otherwise - which is why it's missing when doing an analysis.

    In ProGet 2024, "missing packages" wont be issues per se. Instead, an analysis will be "Inconclusive" -- and this means that there's not enough information to complete the analysis. If your policies don't check license rules (or there's an exception for license checking of Microsoft.* packages), then we wouldn't need the local package to analyze it - and this would be considered compliant.

    However, this functionality isn't implemented yet; that's just how it will work.

    Alex



  • Hi @apxltd

    Thanks for the response. I will try not to reiterate the cache issue too much, as I have raised it here before
    https://forums.inedo.com/topic/3953/proget-sca-missing-package-because-of-nuget-proxy-cache-miss/6?_=1710244266850 and also in my mail to you about ProGet SCA feedback.

    We were really hoping this would be improved as part of the SCA 2024 changes. Both current solutions, manually downloading missing packages or maintaining exclusion lists, seem like workarounds for something that could be fully automated.

    I expect this to be one of the first questions my colleagues will ask once I introduce them to the SCA features. ProGet should do its absolute best to create the most complete analysis possible, even if that takes a few extra seconds. Whatever manual effort remains should be focused on solving the actual issues, like assessing and fixing vulnerabilities.


  • inedo-engineer

    Hi @jw ,

    Although the released version will be able to check for vulnerabilities without needing the package metadata, reading server properties (deprecation/unlisting), checking whether it's the latest patch version, doing license detection, etc. require having the package metadata.

    However, the package metadata should already be in ProGet by the time you upload the SBOM. When doing package restores from ProGet, the packages are cached automatically. If that's not happening for you, make sure to clear your local NuGet package caches (e.g. with dotnet nuget locals all --clear).

    Ultimately, the SCA feature is designed to be used in conjunction with ProGet as a proxy to the public repositories. It's not a "stand-alone" tool, so it won't work well if packages aren't in ProGet.

    The reason is, if the package metadata isn't in ProGet, it has to be searched for on a remote server. In your sample (one build, two packages), you're right: it's just a few seconds to search for that data on nuget.org. But in production, users have thousands of active builds, each with thousands of packages... and that *currently* takes about an hour to run an analysis.

    Adding hundreds of thousands of network requests to connectors to constantly query nuget.org/npmjs.org for server metadata would add hours to that time, trigger API rate limits, and cause lots of performance headaches. Plus, it "leaks" a lot of data about package usage, which is an added security concern. This is a major issue with tools like DependencyTrack - they're basically impossible to scale like ProGet.

    Thanks,
    Alex



  • However, the package metadata should already be in ProGet by the time you upload the SBOM. When doing package restores from ProGet, the packages are cached automatically. If that's not happening for you, make sure to clear your local NuGet package caches (e.g. with dotnet nuget locals all --clear).

    Not all packages will always be acquired via the remote restore mechanism, as I alluded to here.
    They should still be analyzed by SCA and not show up as "Inconclusive".

    For example, I just cannot get the System.Security.Cryptography.Primitives package populated in the ProGet cache via a regular dotnet restore. The package is always taken from the local dotnet SDK installation folder.

    Besides these framework packages there are also other cases. We have a number of people working via VPN who, for performance reasons, access nuget.org directly via Microsoft's CDN, which gives them much better performance. Their restore operations will not trigger cache population in ProGet.

    Bottom line: there are plenty of scenarios in which a package might not be readily available in the ProGet package cache.

    Ultimately, the SCA feature is designed to be used in conjunction with ProGet as a proxy to the public repositories. It's not a "stand-alone" tool, so it won't work well if packages aren't in ProGet.

    The reason is, if the package metadata isn't in ProGet, it has to be searched for on a remote server. In your sample (one build, two packages), you're right: it's just a few seconds to search for that data on nuget.org. But in production, users have thousands of active builds, each with thousands of packages... and that *currently* takes about an hour to run an analysis.

    That is how every other SCA system without a built-in package server works.

    Right now we are using DepTrack behind ProGet as a caching proxy. When DepTrack wants to analyze a package, it just pulls the information from ProGet: if the package is already cached, great; if not, ProGet downloads it and it stays cached from there on out, with zero manual intervention required.

    This very same scenario will not work in ProGet without manual intervention, either adding packages to some exclusion list or downloading them manually to populate the cache.

    This limitation will always put ProGet SCA at a disadvantage when compared to other systems.

    Adding hundreds of thousands of network requests to connectors to constantly query nuget.org/npmjs.org for server metadata would add hours to that time, trigger API rate limits, and cause lots of performance headaches. Plus, it "leaks" a lot of data about package usage, which is an added security concern. This is a major issue with tools like DependencyTrack - they're basically impossible to scale like ProGet.

    I agree, this is not how things should be set up. If someone decides to run a setup like this, they are not doing a good job.

    ProGet should always be in the middle as a caching proxy, and the cache download should only be there to fill the gaps for packages that, for whatever reason, are not yet available.


  • inedo-engineer

    @jw thanks for additional insight!

    Unfortunately we simply won't have the opportunity to explore this until well past ProGet 2024, and only after we've gotten sufficient feedback from other early adopters on other gaps. There are other important things we need to consider as well, and handling this is much more complicated than it may seem, especially at scale and with how ProGet is configured in the field.

    There are also other built-in mechanisms like policy exceptions that could easily handle System.* and runtime.* packages, as I suspect the only thing you'd worry about with those is vulnerabilities.

    As an alternative, I wonder if you could just write a tool/script to:

    1. query for inconclusive builds
    2. download the inconclusive/missing packages through a feed
    3. trigger a reanalysis of the build

    That's not optimal, but it is one thousand times easier than getting something like this working in ProGet.



  • I can understand that this is a lot of effort and really appreciate that this request is not dismissed right away.

    The workaround you proposed is something that I have already looked into myself.

    To make this work smoothly, a webhook for SCA events would really be immensely helpful. Is something like that already on the 2024 SCA roadmap?



  • Another somewhat related question:

    When an SBOM scan is uploaded, no issues are created initially, even though the UI suggests that the analysis has already been done. One has to run the analysis a second time with the issue checkbox set for issues to be populated.

    Is this intentional or what is the idea behind that?

    I was expecting to get a full analysis after uploading either via API or UI.


  • inedo-engineer

    To make this work smoothly, a webhook for SCA events would really be immensely helpful. Is something like that already on the 2024 SCA roadmap?

    We do have a webhook notifier for "non-compliant packages found in build" planned, so perhaps this would be on the list!

    When an SBOM scan is uploaded, no issues are created initially, even though the UI suggests that the analysis has already been done. One has to run the analysis a second time with the issue checkbox set for issues to be populated.

    I just published some preview documentation, but the concept/model has changed slightly here:

    When builds in certain stages are analyzed, an "Issue" will be created for each noncompliant or inconclusive package. These are intended to allow a human to review and override noncompliant packages.

    Basically, the idea is that nearly every build will be created through a CI process and ignored until it needs to be tested later, which happens further down the release pipeline, after the build is promoted to a testing stage.

    Our new guidance will be to run pgutil builds create (basically the new name for pgscan inspect) at build time, exactly like it's done now. Then later, when you deploy to a testing environment or are otherwise ready for testing, run pgutil builds promote. At that point, the issues are created.

    We were thinking of having "Unresolved Issues" present on the project overview page, and it'd be really messy if those were mostly just CI builds.

    Hope that helps explain the thought process.



  • I managed to implement the workaround for the uncached packages.

    Right now I am doing the following (a code sketch follows the list):

    1. API call to $"api/sca/builds?project={projectName}&version={version}"
      • Parse the build Id out of the ViewBuildUrl property
    2. API call to native API $"api/json/Projects_GetBuildInfo?ProjectBuild_Id={projectBuildId}"
      • Cross reference ProjectBuildPackagesFeeds with ProjectBuildPackagesExtended to find out which packages could not be mapped to feeds
    3. Call to the download button URL $"nuget/{feedName}/package/{packageName}/{packageVersion}" for each package that could not be mapped to any feed
    4. API call to $"api/sca/analyze-build?project={projectName}&version={version}" to update the build
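
    For reference, here is a rough sketch of the whole script in C#. The endpoints are the ones listed above; the X-ApiKey header, the JSON property casing, and the FindUnmappedPackages helper are assumptions on my part:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Text.Json;

    // Placeholder values for this sketch.
    string apiKey = "<api-key>", projectName = "MyProject", version = "1.0.0", feedName = "MyNugetFeed";

    using var http = new HttpClient { BaseAddress = new Uri("https://proget.example.com/") };
    http.DefaultRequestHeaders.Add("X-ApiKey", apiKey);

    // 1. Look up the build; the numeric id is parsed out of ViewBuildUrl
    //    (assumes a single build object and a buildId query parameter in that URL).
    using var build = JsonDocument.Parse(await http.GetStringAsync(
        $"api/sca/builds?project={projectName}&version={version}"));
    var buildId = build.RootElement.GetProperty("ViewBuildUrl").GetString()!.Split("buildId=").Last();

    // 2. Native API call for the package->feed mapping of this build.
    using var info = JsonDocument.Parse(await http.GetStringAsync(
        $"api/json/Projects_GetBuildInfo?ProjectBuild_Id={buildId}"));

    // 3. Request every unmapped package through the UI download URL so ProGet
    //    pulls it from the connector and caches it.
    foreach (var (name, pkgVersion) in FindUnmappedPackages(info))
    {
        using var response = await http.GetAsync($"nuget/{feedName}/package/{name}/{pkgVersion}");
    }

    // 4. Re-run the analysis so the newly cached packages are picked up.
    using var analyze = await http.PostAsync(
        $"api/sca/analyze-build?project={projectName}&version={version}", content: null);

    // Hypothetical helper: cross-references ProjectBuildPackagesFeeds with
    // ProjectBuildPackagesExtended and yields packages with no feed mapping;
    // the exact response shape is omitted here.
    static IEnumerable<(string Name, string Version)> FindUnmappedPackages(JsonDocument info)
    {
        yield break; // details depend on the Projects_GetBuildInfo response
    }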

    While it does work, I'm not fully happy with the implementation and would like to ask for some improvements to the regular API.

    Compliance information in PackageInfo

    Right now the PackageInfo object does not contain any information about compliance violations. Would it be possible to extend it with the warnings that are shown in the compliance column of the /projects2/packages page?

    Ideally, it would be some sort of enum that has atomic values for all the known violations and can be filtered easily. This would help to avoid the call to the native API in step 2, which, as far as I understand, you don't recommend using anyway.

    {
      "purl": "pkg:myGroup/myPackage@1.2.3",
      "vulnerabilities": [],
      "licenses": ["MIT", "Apache-2.0"],
      "compilanceWarnings": [
         "PackageNotFound",
         "NoLicenseDetected",
         "Deprecated"
      ]
    }
    

    What would also be nice to have are atomic values for the package name and version, so one doesn't have to parse them out of the purl.

    Download Package API behavior
    Right now the /api/packages/MyNugetFeed/download?purl=pkg:nuget/MyNugetPackage@1.0.0 API returns 404 when trying to download a package that is not cached yet.

    As a workaround I am using the URL of the download button used in the UI, but I would prefer to use a proper API endpoint that has more chances to be stable in the future.

    I think it would be good if the download API could be changed to also trigger package download and caching from connectors - basically the same behavior as the endpoint behind the download button.


  • inedo-engineer

    Hi @jw ,

    Thanks for the update, that sounds like a decent work-around for the time being. It will likely be a while before we can develop something more generalized.

    I'm curious if you looked at any of the audit endpoints/commands in pgutil yet? That's kind of the direction we're thinking it will make sense to go - basically pgutil packages audit --package=myGroup/myPackage --version=1.2.3

    I don't know what the HTTP endpoint is offhand, but it does make sense to add something to PackageInfo, since we already have it in the database. We could display a complianceStatus (Compliant, Warn, Noncompliant, Inconclusive, Error) and a complianceDetail string - that's what we have in the database. I think properties are easier to work with than objects... what do you think?
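
    For illustration, that might look something like this on PackageInfo (property names and values are not final):

    {
        "purl": "pkg:nuget/MyNugetPackage@1.0.0",
        "complianceStatus": "Warn",
        "complianceDetail": "No license detected"
    }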

    As for the Download Package behavior -- we do intend to get the Common Packages API to work with connectors. That involves a lot of refactoring that just didn't make it into ProGet 2024 (only PyPi and Apk were refactored).

    Cheers,
    Alana



  • I'm curious if you looked at any of the audit endpoints/commands in pgutil yet? That's kind of the direction we're thinking it will make sense to go - basically pgutil packages audit --package=myGroup/myPackage --version=1.2.3

    I haven't yet, but thanks for the pointer, I will have a look.

    I don't know what the HTTP endpoint is offhand, but it does make sense to add something to PackageInfo, since we already have it in the database. We could display a complianceStatus (Compliant, Warn, Noncompliant, Inconclusive, Error) and a complianceDetail string - that's what we have in the database. I think properties are easier to work with than objects... what do you think?

    From what I understood from the docs, PackageInfo is part of ReleaseInfo (now probably called BuildInfo after the release=>build rename?) and used in the /api/sca/builds?project= endpoint.

    Adding complianceStatus (Compliant, Warn, Noncompliant, Inconclusive, Error) to PackageInfo already makes a lot of sense, though it needs to be very precisely defined what each status means, especially "Inconclusive" and "Error".

    Right now, I specifically care about the state where a package could not be fully scanned because it was not found in the cache; I'm not sure if "Inconclusive" means exactly that or if there are other triggers for that state.

    As for the detail string, you are probably referring to what is currently shown in the tooltip when hovering over a warning on the /projects2/packages?buildId=5 page?

    5328590d-1f59-44c7-a1f4-eca4803a96e2-image.png

    Generally, I believe the type of compliance violation is an important piece of information and should be stored as atomic values. At the moment it appears to be a concatenated string of violations?

    On the API side, I would like to see something like an array of enum strings (like my code example above); a dedicated object within PackageInfo, something like ComplianceViolationInfo with boolean properties for each violation type, would also be fine.
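
    For example, something like this (property names hypothetical):

    "complianceViolations": {
        "packageNotFound": true,
        "noLicenseDetected": false,
        "deprecated": false,
        "unlisted": false
    }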

    In the UI it could look something like this:

    5a98634b-401a-4012-b6fe-a3294c957c54-image.png

    This would be much more user-friendly than having to hover over each warning and read a long string, or having to sift through all the generated issues.

    As for the Download Package behavior -- we do intend to get the Common Packages API to work with connectors. That involves a lot of refactoring that just didn't make it into ProGet 2024 (only PyPi and Apk were refactored).

    Glad to hear that this is already on the roadmap.

    Cheers


  • inedo-engineer

    Hi @jw ,

    We added a compliance property via PG-2658 in the next maintenance release.

    It basically shows what's in the database (which is also what the page in the UI does):

    writer.WritePropertyName("compliance");
    writer.WriteStartObject();
    writer.WriteString("result", Domains.PackageAnalysisResults.GetName(package.Result_Code));
    if (package.Detail_Text is not null)
        writer.WriteString("detail", package.Detail_Text);
    if (package.Analysis_Date.HasValue)
        writer.WriteString("date", package.Analysis_Date.Value);
    writer.WriteEndObject();
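
    For reference, that serializes to a fragment like this (values illustrative):

    "compliance": {
        "result": "Warn",
        "detail": "No license detected",
        "date": "2024-05-13T09:55:04.467Z"
    }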
    

    I think you can rely on result=Inconclusive meaning the package isn't in ProGet. That's all we use the status for now, but in the future it might be used for something else. A result=Error means that our code crashed and you shouldn't ever see that.

    We'll definitely consider doing something other than a single result string down the line, but for now this was the easiest :)

    Thanks,
    Steve



  • Thanks for adding the properties to 2024.3.

    During testing I encountered unexpected behavior regarding the values:

    To trigger the "Inconclusive" state I used the "Delete Cached Package" menu option on the package site and then re-ran the analysis on a build.

    My expectation was that the package would be reported as inconclusive, since it is no longer available in cache and thus can't be fully scanned.

    Instead I get the following results:

    {
        "purl": "pkg:nuget/AutoMapper@10.1.1",
        "licenses": [
            "MIT"
        ],
        "compliance": {
            "result": "Warn",
            "detail": " because of Package Status is Unlisted, Package Status is Deprecated, No license detected.",
            "date": "2024-05-13T09:55:04.467Z"
        },
        "vulnerabilities": []
    },
    

    The warnings in the detail string do not make sense: the package is neither unlisted nor deprecated, and the license is also correctly detected as MIT.


  • inedo-engineer

    Hi @jw ,

    I haven't investigated this yet, but I assume that the results are the same in the UI? That's all just pulling data from the database, so I would presume so.

    Could you find the relevant parts of the analysis logs? That would help us debug much more easily.

    Thanks,
    Steve



  • I haven't investigated this yet, but I assume that the results are the same in the UI? That's all just pulling data from the database, so I would presume so.

    Yes, the UI shows the same.

    Could you find the relevant parts of the analysis logs? That would help us debug much more easily.

    It was actually not that easy to find the AutoMapper package in the logs, since the name does not appear anywhere. I made a custom SBOM with just the AutoMapper package, and this is what the log looks like:

    Analyzing compliance...
    Beginning license rule analysis...
    Default rules: undectableLicense=Warn, unspecifiedLicense=Compliant
    The package is not cached or local to any feed; without package metadata, license detection is limited.
    No licenses detected on package; applying undectableLicense rule (Warn)
    License rule analysis complete.
    The package is not cached or local to any feed; cannot determine if Deprecated.
    No policies define a deprecation rule, so default Warn will be used.
    The package is not cached or local to any feed; cannot determine if Unlisted.
    No policies define an unlisted rule, so default Warn will be used.
    Package is Warn because of Package Status is Unlisted, Package Status is Deprecated, No license detected.
    

  • inedo-engineer

    @jw thanks for the detailed research - this definitely is wrong. We should log the package name (I think that was there at one point), and this should also be inconclusive.

    We'll get this fixed via PG-2676 in an upcoming maintenance release, hopefully this Friday's or the following :)

