
ProGet SCA 2024 Preview Feedback - Package detection still hit or miss



  • Package detection is still rather hit or miss, as in the previous version.

    Here are two packages pulled in via the nuget.org ProGet proxy: one is detected perfectly fine, the other raised an issue as "Package not in feed". When navigating to either of them, they are found and shown with a green label (no vulnerabilities or license violations).

    35af4143-fff9-4795-981e-a73b95b59622-image.png

    What is also a bit confusing is the "Inconclusive" label. To me this would mean the state of the package could not be determined, but on the Issues page the package gets a proper "Package not in feed" explanation. I would expect to see that state here too.


  • inedo-engineer

    Hi @jw ,

    First, the reason you're getting "Package not in feed" (which would also have surfaced as an Issue in the ProGet 2023 feature) is that the Sqlite package has not been cached or pulled into ProGet. However, if you just click Download (and thus cache) the package, it will be in the feed and this will go away.

    When you browse a remote package in the UI, ProGet is querying nuget.org and displaying whatever their API shows. This query/data is not cached or retained otherwise - which is why it's missing when doing an analysis.

    In ProGet 2024, "missing packages" wont be issues per se. Instead, an analysis will be "Inconclusive" -- and this means that there's not enough information to complete the analysis. If your policies don't check license rules (or there's an exception for license checking of Microsoft.* packages), then we wouldn't need the local package to analyze it - and this would be considered compliant.

    However, this functionality isn't implemented yet; that's just how it will work.

    Alex



  • Hi @apxltd

    Thanks for the response. I will try not to reiterate the cache issue too much, as I have raised it here before
    https://forums.inedo.com/topic/3953/proget-sca-missing-package-because-of-nuget-proxy-cache-miss/6?_=1710244266850 and also in my mail to you about ProGet SCA feedback.

    We were really hoping this would be improved as part of the SCA 2024 changes. Both current solutions, manually downloading missing packages or maintaining exclusion lists, seem like workarounds for something that could be fully automated.

    I expect this to be one of the first questions my colleagues will ask once I introduce them to the SCA features. ProGet should do its absolute best to create the most complete analysis possible, even if that takes a few extra seconds. Whatever manual effort remains should be focused on solving the actual issues, like assessing and fixing vulnerabilities.


  • inedo-engineer

    Hi @jw ,

    Although the released version will be able to check for vulnerabilities without needing the package metadata, reading server properties (deprecation/unlisting), checking whether it's the latest patch version, doing license detection, etc. require having the package metadata.

    However, the package metadata should already be in ProGet by the time you upload the SBOM. When doing package restores from ProGet, the packages are cached automatically. If that's not happening for you, make sure to clear your local NuGet package caches (e.g. with dotnet nuget locals all --clear).

    Ultimately, the SCA feature is designed to be used in conjunction with ProGet as a proxy to the public repositories. It's not a "stand-alone" tool, so it won't work well if packages aren't in ProGet.

    The reason is, if the package metadata isn't in ProGet, it has to be searched for on a remote server. In your sample (one build, two packages), you're right: it's just a few seconds to search for that data on nuget.org. But in production, users have thousands of active builds, each with thousands of packages... and that *currently* takes about an hour to run an analysis.

    Adding hundreds of thousands of network requests to connectors to constantly query nuget.org/npmjs.org for server metadata would add hours to that time, trigger API rate limits, and cause lots of performance headaches. Plus, it "leaks" a lot of data about package usage, which is an added security concern. This is a major issue with tools like DependencyTrack - they're basically impossible to scale like ProGet.

    Thanks,
    Alex



  • However, the package metadata should already be in ProGet by the time you upload the SBOM. When doing package restores from ProGet, the packages are cached automatically. If that's not happening for you, make sure to clear your local NuGet package caches (e.g. with dotnet nuget locals all --clear).

    Not all packages will always be acquired via the remote restore mechanism, as I alluded to here.
    They should still be analyzed by SCA and not show up as "Inconclusive".

    For example, I just cannot get the System.Security.Cryptography.Primitives package populated in the ProGet cache via a regular dotnet restore. The package is always taken from the local dotnet SDK installation folder.

    Besides these framework packages there are also other cases. We have a number of people working via VPN who, for performance reasons, access nuget.org directly via Microsoft's CDN, which gives them much better performance. Their restore operations will not trigger cache population in ProGet.

    Bottom line: there are plenty of scenarios in which a package might not be readily available in the ProGet package cache.

    Ultimately, the SCA feature is designed to be used in conjunction with ProGet as a proxy to the public repositories. It's not a "stand-alone" tool, so it won't work well if packages aren't in ProGet.

    The reason is, if the package metadata isn't in ProGet, it has to be searched for on a remote server. In your sample (one build, two packages), you're right: it's just a few seconds to search for that data on nuget.org. But in production, users have thousands of active builds, each with thousands of packages... and that *currently* takes about an hour to run an analysis.

    That is how every other SCA system without a built-in package server works.

    Right now we are using DepTrack behind ProGet as a caching proxy. When DepTrack wants to analyze a package, it just pulls the information from ProGet: if the package is already cached, great; if not, ProGet downloads it and it stays cached from there on out, with zero manual intervention required.

    This very same scenario will not work in ProGet without manual intervention, either adding packages to some exclusion list or downloading them manually to populate the cache.

    This limitation will always put ProGet SCA at a disadvantage when compared to other systems.

    Adding hundreds of thousands of network requests to connectors to constantly query nuget.org/npmjs.org for server metadata would add hours to that time, trigger API rate limits, and cause lots of performance headaches. Plus, it "leaks" a lot of data about package usage, which is an added security concern. This is a major issue with tools like DependencyTrack - they're basically impossible to scale like ProGet.

    I agree, this is not how things should be set up. If someone decides to run a setup like this, they are not doing a good job.

    ProGet should always be in the middle as a caching proxy, and the cache download should only be there to fill the gaps for packages that, for whatever reason, are not yet available.


  • inedo-engineer

    @jw thanks for additional insight!

    Unfortunately we simply won't have the opportunity to explore this until well past ProGet 2024, and only after we've gotten sufficient feedback from other early adopters on other gaps. There are other important things we need to consider as well, and handling this is much more complicated than it may seem, especially at scale and with how ProGet is configured in the field.

    There are also other built-in mechanisms like policy exceptions that could easily handle System.* and runtime.* packages, as I suspect the only thing you'd worry about with those is vulnerabilities.

    As an alternative, I wonder if you could just write a tool/script to:

    1. query for inconclusive builds
    2. download the inconclusive/missing packages through a feed
    3. trigger a reanalysis of the build

    That's not optimal, but it is one thousand times easier than getting something like this working in ProGet.



  • I can understand that this is a lot of effort and really appreciate that this request is not dismissed right away.

    The workaround you proposed is something that I have already looked into myself.

    To make this work smoothly, a webhook for SCA events would really be immensely helpful. Is something like that already on the 2024 SCA roadmap?



  • Another somewhat related question:

    When an SBOM scan is uploaded, no issues are created initially, even though the UI suggests that the analysis has already been done. One has to run the analysis a second time with the issue checkbox set for issues to be populated.

    Is this intentional or what is the idea behind that?

    I was expecting to get a full analysis after uploading either via API or UI.


  • inedo-engineer

    To make this work smoothly, a webhook for SCA events would really be immensely helpful. Is something like that already on the 2024 SCA roadmap?

    We do have a webhook notifier for "non-compliant packages found in build" planned, so perhaps this would be on the list!

    When an SBOM scan is uploaded, no issues are created initially, even though the UI suggests that the analysis has already been done. One has to run the analysis a second time with the issue checkbox set for issues to be populated.

    I just published some preview documentation, but the concept/model has changed slightly here:

    When builds in certain stages are analyzed, an "Issue" will be created for each noncompliant or inconclusive package. These are intended to allow a human to review and override noncompliant packages.

    Basically, the idea is that nearly every build will be created through a CI process and ignored until it needs to be tested later, which happens further down the release pipeline, after the build is promoted to a testing stage.

    Our new guidance will be to run pgutil builds create (basically the new name for pgscan inspect) at build time, exactly like it's done now. Then later, when you deploy to a testing environment or are otherwise ready for testing, run pgutil builds promote. At that point, the issues are created.

    We were thinking of having "Unresolved Issues" present on the project overview page, and it'd be really messy if those were mostly just CI builds.

    Hope that helps explain the thought process.



  • I managed to implement the workaround for the uncached packages.

    Right now I am doing the following (a code sketch follows the list):

    1. API call to $"api/sca/builds?project={projectName}&version={version}"
      • Parse the build Id out of the ViewBuildUrl property
    2. API call to native API $"api/json/Projects_GetBuildInfo?ProjectBuild_Id={projectBuildId}"
      • Cross reference ProjectBuildPackagesFeeds with ProjectBuildPackagesExtended to find out which packages could not be mapped to feeds
    3. Call to the download button URL $"nuget/{feedName}/package/{packageName}/{packageVersion}" for each package that could not be mapped to any feed
    4. API call to $"api/sca/analyze-build?project={projectName}&version={version}" to update the build
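
    For reference, here is a rough sketch of the whole script in C#. The endpoints are the ones listed above; the X-ApiKey header, the JSON property casing, and the FindUnmappedPackages helper are assumptions on my part:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Text.Json;

    // Placeholder values for this sketch.
    string apiKey = "<api-key>", projectName = "MyProject", version = "1.0.0", feedName = "MyNugetFeed";

    using var http = new HttpClient { BaseAddress = new Uri("https://proget.example.com/") };
    http.DefaultRequestHeaders.Add("X-ApiKey", apiKey);

    // 1. Look up the build; the numeric id is parsed out of ViewBuildUrl
    //    (assumes a single build object and a buildId query parameter in that URL).
    using var build = JsonDocument.Parse(await http.GetStringAsync(
        $"api/sca/builds?project={projectName}&version={version}"));
    var buildId = build.RootElement.GetProperty("ViewBuildUrl").GetString()!.Split("buildId=").Last();

    // 2. Native API call for the package->feed mapping of this build.
    using var info = JsonDocument.Parse(await http.GetStringAsync(
        $"api/json/Projects_GetBuildInfo?ProjectBuild_Id={buildId}"));

    // 3. Request every unmapped package through the UI download URL so ProGet
    //    pulls it from the connector and caches it.
    foreach (var (name, pkgVersion) in FindUnmappedPackages(info))
    {
        using var response = await http.GetAsync($"nuget/{feedName}/package/{name}/{pkgVersion}");
    }

    // 4. Re-run the analysis so the newly cached packages are picked up.
    using var analyze = await http.PostAsync(
        $"api/sca/analyze-build?project={projectName}&version={version}", content: null);

    // Hypothetical helper: cross-references ProjectBuildPackagesFeeds with
    // ProjectBuildPackagesExtended and yields packages with no feed mapping;
    // the exact response shape is omitted here.
    static IEnumerable<(string Name, string Version)> FindUnmappedPackages(JsonDocument info)
    {
        yield break; // details depend on the Projects_GetBuildInfo response
    }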

    While it does work, I'm not fully happy with the implementation and would like to ask for some improvements to the regular API.

    Compliance information in PackageInfo

    Right now the PackageInfo object does not contain any information about compliance violations. Would it be possible to extend it with the warnings that are shown in the compliance column of the /projects2/packages page?

    Ideally, it would be some sort of enum that has atomic values for all the known violations and can be filtered easily. This would help to avoid the call to the native API in step 2, which, as far as I understand, you don't recommend using anyway.

    {
      "purl": "pkg:myGroup/myPackage@1.2.3",
      "vulnerabilities": [],
      "licenses": ["MIT", "Apache-2.0"],
      "compilanceWarnings": [
         "PackageNotFound",
         "NoLicenseDetected",
         "Deprecated"
      ]
    }
    

    What would also be nice to have are atomic values for the package name and version, so one doesn't have to parse them out of the purl.

    Download Package API behavior
    Right now the /api/packages/MyNugetFeed/download?purl=pkg:nuget/MyNugetPackage@1.0.0 API returns 404 when trying to download a package that is not cached yet.

    As a workaround I am using the URL of the download button used in the UI, but I would prefer to use a proper API endpoint that has more chances to be stable in the future.

    I think it would be good if the download API could be changed to also trigger package download and caching from connectors - basically the same behavior as the endpoint behind the download button.


  • inedo-engineer

    Hi @jw ,

    Thanks for the update, that sounds like a decent work-around for the time being. It will likely be a while before we can develop something more generalized.

    I'm curious if you looked at any of the audit endpoints/commands in pgutil yet? That's kind of the direction we're thinking it will make sense to go - basically pgutil packages audit --package=myGroup/myPackage --version=1.2.3

    I don't know what the HTTP endpoint is offhand, but it does make sense to add something to PackageInfo, since we already have it in the database. We could display a complianceStatus (Compliant, Warn, Noncompliant, Inconclusive, Error) and a complianceDetail string - that's what we have in the database. I think properties are easier to work with than objects... what do you think?
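
    For illustration, that might look something like this on PackageInfo (property names and values are not final):

    {
        "purl": "pkg:nuget/MyNugetPackage@1.0.0",
        "complianceStatus": "Warn",
        "complianceDetail": "No license detected"
    }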

    As for the Download Package behavior -- we do intend to get the Common Packages API to work with connectors. That involves a lot of refactoring that just didn't make it into ProGet 2024 (only PyPi and Apk were refactored).

    Cheers,
    Alana



  • I'm curious if you looked at any of the audit endpoints/commands in pgutil yet? That's kind of the direction we're thinking it will make sense to go - basically pgutil packages audit --package=myGroup/myPackage --version=1.2.3

    I haven't yet, but thanks for the pointer, I will have a look.

    I don't know what the HTTP endpoint is offhand, but it does make sense to add something to PackageInfo, since we already have it in the database. We could display a complianceStatus (Compliant, Warn, Noncompliant, Inconclusive, Error) and a complianceDetail string - that's what we have in the database. I think properties are easier to work with than objects... what do you think?

    From what I understood from the docs, PackageInfo is part of ReleaseInfo (now probably called BuildInfo after the release=>build rename?) and used in the /api/sca/builds?project= endpoint.

    Adding complianceStatus (Compliant, Warn, Noncompliant, Inconclusive, Error) to PackageInfo already makes a lot of sense, though it needs to be very precisely defined what each status means, especially "Inconclusive" and "Error".

    Right now, I specifically care about the state where a package could not be fully scanned because it was not found in the cache; I'm not sure if "Inconclusive" means exactly that or if there are other triggers for that state.

    As for the detail string, you are probably referring to what is currently shown in the tooltip when hovering over a warning on the /projects2/packages?buildId=5 page?

    5328590d-1f59-44c7-a1f4-eca4803a96e2-image.png

    Generally, I believe the type of compliance violation is an important piece of information and should be stored as atomic values. At the moment it appears to be a concatenated string of violations?

    On the API side, I would like to see something like an array of enum strings (like my code example above); a dedicated object within PackageInfo, something like ComplianceViolationInfo with boolean properties for each violation type, would also be fine.
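
    For example, something like this (property names hypothetical):

    "complianceViolations": {
        "packageNotFound": true,
        "noLicenseDetected": false,
        "deprecated": false,
        "unlisted": false
    }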

    In the UI it could look something like this:

    5a98634b-401a-4012-b6fe-a3294c957c54-image.png

    This would be much more user-friendly than having to hover over each warning and read a long string, or having to sift through all the generated issues.

    As for the Download Package behavior -- we do intend to get the Common Packages API to work with connectors. That involves a lot of refactoring that just didn't make it into ProGet 2024 (only PyPi and Apk were refactored).

    Glad to hear that this is already on the roadmap.

    Cheers


  • inedo-engineer

    Hi @jw ,

    We added a compliance property via PG-2658 in the next maintenance release.

    It basically shows what's in the database (which is also what the page in the UI does):

    writer.WritePropertyName("compliance");
    writer.WriteStartObject();
    writer.WriteString("result", Domains.PackageAnalysisResults.GetName(package.Result_Code));
    if (package.Detail_Text is not null)
        writer.WriteString("detail", package.Detail_Text);
    if (package.Analysis_Date.HasValue)
        writer.WriteString("date", package.Analysis_Date.Value);
    writer.WriteEndObject();
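
    For reference, that serializes to a fragment like this (values illustrative):

    "compliance": {
        "result": "Warn",
        "detail": "No license detected",
        "date": "2024-05-13T09:55:04.467Z"
    }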
    

    I think you can rely on result=Inconclusive meaning the package isn't in ProGet. That's all we use the status for now, but in the future it might be used for something else. A result=Error means that our code crashed and you shouldn't ever see that.

    We'll definitely consider doing something other than a single result string down the line, but for now this was the easiest :)

    Thanks,
    Steve



  • Thanks for adding the properties to 2024.3.

    During testing I encountered unexpected behavior regarding the values:

    To trigger the "Inconclusive" state I used the "Delete Cached Package" menu option on the package site and then re-ran the analysis on a build.

    My expectation was that the package would be reported as inconclusive, since it is no longer available in cache and thus can't be fully scanned.

    Instead I get the following results:

    {
        "purl": "pkg:nuget/AutoMapper@10.1.1",
        "licenses": [
            "MIT"
        ],
        "compliance": {
            "result": "Warn",
            "detail": " because of Package Status is Unlisted, Package Status is Deprecated, No license detected.",
            "date": "2024-05-13T09:55:04.467Z"
        },
        "vulnerabilities": []
    },
    

    The warnings in the detail string do not make sense: the package is neither unlisted nor deprecated, and the license is also correctly detected as MIT.


  • inedo-engineer

    Hi @jw ,

    I haven't investigated this yet, but I assume that the results are the same in the UI? That's all just pulling data from the database, so I would presume so.

    Could you find the relevant parts of the analysis logs? That would help us debug much more easily.

    Thanks,
    Steve



  • I haven't investigated this yet, but I assume that the results are the same in the UI? That's all just pulling data from the database, so I would presume so.

    Yes, the UI shows the same.

    Could you find the relevant parts of the analysis logs? That would help us debug much more easily.

    It was actually not that easy to find the AutoMapper package in the logs, since the name does not appear anywhere. I made a custom SBOM with just the AutoMapper package, and this is what the log looks like:

    Analyzing compliance...
    Beginning license rule analysis...
    Default rules: undectableLicense=Warn, unspecifiedLicense=Compliant
    The package is not cached or local to any feed; without package metadata, license detection is limited.
    No licenses detected on package; applying undectableLicense rule (Warn)
    License rule analysis complete.
    The package is not cached or local to any feed; cannot determine if Deprecated.
    No policies define a deprecation rule, so default Warn will be used.
    The package is not cached or local to any feed; cannot determine if Unlisted.
    No policies define an unlisted rule, so default Warn will be used.
    Package is Warn because of Package Status is Unlisted, Package Status is Deprecated, No license detected.
    

  • inedo-engineer

    @jw thanks for the detailed research - this definitely is wrong. We should log the package name (I think that was there at one point), and this should also be inconclusive.

    We'll get this fixed via PG-2676 in an upcoming maintenance release, hopefully this Friday's or the following :)

