Hi @pmsensi,
we are facing a similar challenge. Basically, what I am interested in is a list of packages that we are currently using in our products that don't have a known license assigned to them (or: Proget was unable to identify the license, e.g. because the license info just states "see licenense.txt" or something like that). I think that you are looking for something similar and are not asking to analyze all packages that exist on Nuget or npm, because that would probably would not be feasible (as @atripp already mentioned in her reply).
We are using the new Reporting & SCA feature extensively, and we are getting some very nice license statistics out of it. However, the one thing that is currently missing is the number or list of packages with an unknown license. I hope that this will be added in a future release. In the mean time, looking at the database can be a workaround to at least get a list of the package name.
I think starting with 2022, ProGet started storing license infos of known packages in the database. I am assuming that that info is stored for packages that have been downloaded via ProGet (i.e. "cached") or maybe also for packages that have been reported to ProGet via pgscan (@atripp please feel free to add some info to this).
One simple approach to get the license info of all "known" packages, would be this:
select
pi.Package_Name, l.External_Id, l.Title_Text
from
PackageIds pi
left join
PackageLicenses pl on pl.Package_Id = pi.Package_Id
left join
Licenses l on pl.License_Id = l.License_Id
group by pi.Package_Name, l.External_Id, l.Title_Text
order by 1;
Note that this select groups by package names and ignores versions. You could now simply look for packages without a known license like this:
select
pi.Package_Name, l.External_Id, l.Title_Text
from
PackageIds pi
left join
PackageLicenses pl on pl.Package_Id = pi.Package_Id
left join
Licenses l on pl.License_Id = l.License_Id
where l.External_Id is null
group by pi.Package_Name, l.External_Id, l.Title_Text
order by 1;
However, if you look at the result of first query, you might notice that there might be multiple entries for some packages. That might be because a package has changed its license or just its license info (maybe switching from a "see license.txt" to an actual SPDX tag), or maybe because there was a bug in previous versions of ProGet that has been fixed in 2202.18 (https://inedo.myjetbrains.com/youtrack/issue/PG-2263). So you might get some false positives with this approach.
To get a list of packages without any entry of a known license, we have to eliminate the ones with multiple entries. There might be a better and more readable way to get this done, but the query that worked for us is this one:
select
distinct pi.Package_Name
from
PackageIds pi
left join
PackageLicenses pl on pl.Package_Id = pi.Package_Id
left join
Licenses l on pl.License_Id = l.License_Id
where l.External_Id is null
and pi.Package_Name in
(select Package_Name
from
(
select
a.Package_Name
from
(
select
pi.Package_Name, l.External_Id, l.Title_Text
from
PackageIds pi
left join
PackageLicenses pl on pl.Package_Id = pi.Package_Id
left join
Licenses l on pl.License_Id = l.License_Id
group by pi.Package_Name, l.External_Id, l.Title_Text
) a
group by a.Package_Name
having count(*) = 1
) b);
BTW: in case you are curious: that select gives us a list of 1222 packages.