pgscan: Different results for npm dependencies



  • Hi,

    I noticed that the list of npm dependencies differs depending on which type is given.
    If the input is a package-lock.json file with type "npm", only the dependencies of this file are processed.
    If the input is a .sln with no type, pgscan scans for NuGet and npm dependencies. But for npm dependencies, all package-lock.json files are processed, including the ones under "node_modules". This results in different npm dependencies.
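
To illustrate the difference described above, here is a minimal Python sketch of a recursive search for package-lock.json files that can optionally skip node_modules. It is not pgscan's actual implementation; the function name `find_lock_files` and the `skip_node_modules` parameter are made up for this example.

```python
import os
from typing import Iterator


def find_lock_files(root: str, skip_node_modules: bool = True) -> Iterator[str]:
    """Yield the path of every package-lock.json found below root.

    Hypothetical helper for illustration only, not part of pgscan.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if skip_node_modules:
            # Editing dirnames in place stops os.walk from descending
            # into any node_modules directory.
            dirnames[:] = [d for d in dirnames if d != "node_modules"]
        if "package-lock.json" in filenames:
            yield os.path.join(dirpath, "package-lock.json")
```

With `skip_node_modules=False` the search also returns lock files shipped inside installed packages, which is the behavior that leads to the extra npm dependencies.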

    I don't think that package-lock.json files under node_modules should be processed, since all necessary dependencies are part of my project's package-lock.json file.
    Further, I would like to call pgscan on a .sln because then all project dependencies (NuGet and npm) are listed in one SBOM file.

    Anyone else have issues with this procedure? Or is there a valid reason why package-lock.json files of node_modules should be processed as well? I just think both pgscan calls should result in the same npm dependencies.

    Thank you,
    Caterina


  • inedo-engineer

    Hi @caterina,

    Just for some background: in pgscan, if the type is not specified or is set to auto and .NET is detected, it will scan for both .NET dependencies and npm package dependencies and include them in the SBOM. When a type other than auto is specified, pgscan will only scan for dependencies of that type. If you run two or more scans with pgscan, each scan appends its packages to the SCA project in ProGet, allowing you to add different dependency types as needed.

    I know we discussed this with your team on issue #27 in the GitHub repository and determined there were no actual differences. Are you able to provide an example case where there are differences?

    Just so other users can see a snippet of the conversation:

    That is a fair point to make. My thought was that including the node_modules folder in the recursive search would allow us to include the child dependencies used by installed packages that were not marked as dependencies in the npm package. But in my research and testing, I have found that the package-lock.json at the root of the node_modules folder includes a subset of the data in the main package-lock.json, so no extra information was added. Do your package-lock.json files under the node_modules folder have additional information the parent doesn't? Also, do your packages in that folder have package-lock.json files outside of the root of that folder?

    Looking at the hidden lock file documentation, the information in that file should be redundant, as it is only used to improve performance. If something other than npm manually changes the node_modules tree, the lock file is ignored (and should probably be removed anyway). I'm inclined to just exclude files from the node_modules folder as you suggest.

    I can confirm your observation. There is no extra information in our package-lock.json files under the node_modules folder. Further, we do not have additional package-lock.json files outside of the root folder.

    We have created a low-priority issue #30 to remove the node_modules scan in the future, but it has not been prioritized based on the details in issue #27. If this truly is causing an issue we can prioritize it, but I would be interested to understand why your node_modules folder detected more dependencies.

    Thanks,
    Rich



  • Hi Rich,

    thank you, I was looking for this conversation!

    So our current problem is that ProGet determines a critical vulnerability in one of our projects:
    [screenshot]

    Our affected project team contacted me and told me that the project does not reference json-schema in any version. This dependency is also not listed in the package-lock.json of the project.
    But I noticed that our project references "minipass-sized" and "npm-normalize-package-bin" as developer dependencies. Both of them have "json-schema" as a dependency in their package-lock.json files.

    And I guess I just fixed my own problem here... Dev dependencies should not be part of my production output and thus not part of my node_modules folder. I guess I will have to take another look at the build process of the product.
    I will keep you updated.

    Thanks
    Caterina


  • inedo-engineer

    Hi Caterina,

    No problem! This is a good catch! Please let me know what you do to resolve this. I'm thinking the node_modules scan may be more helpful in situations like this. If that package is being released (even if by accident), it makes sense that it is reflected in the SCA project. Let me know your thoughts on that as well.

    Thanks,
    Rich



  • Hi Rich,

    I had to dive deeper into this topic to get a better understanding of it.
    So basically what we are doing is calling "npm ci" followed by "ng build --configuration production".
    The package-lock.json file contains all dependencies, production and dev dependencies. "npm ci" installs them all which means that the node_modules folder contains dev dependencies as well. But "ng build --configuration production" creates our production output which has no dev references.
    I tried to call "npm ci --omit=dev". In this case only production dependencies are installed and part of node_modules. But unfortunately, "@angular/cli" is a dev dependency which is needed to call "ng build".

    Therefore, I would say that the node_modules scan should still be removed since this folder could contain dev dependencies which are not part of the final product.
    Further, I guess we should add a filter for the dev dependencies while parsing the package-lock.json in pgscan. Right now dev dependencies are part of the generated sbom file. But packages like "@angular/cli" for example are never being shipped with our product.
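
A minimal Python sketch of such a filter is shown below. It assumes a package-lock.json with lockfileVersion 2 or 3, where each entry under "packages" carries "dev": true if it is only needed as a dev dependency. The function name `production_packages` is hypothetical, and this is not how pgscan itself parses the file.

```python
import json
from typing import Dict


def production_packages(lock_text: str) -> Dict[str, str]:
    """Return {install path: version} for entries not flagged as dev-only.

    Assumes the "packages" map of lockfileVersion 2 or 3.
    Hypothetical helper for illustration only.
    """
    lock = json.loads(lock_text)
    result: Dict[str, str] = {}
    for path, meta in lock.get("packages", {}).items():
        if path == "":
            # The empty key describes the root project itself.
            continue
        if meta.get("dev", False):
            # Dev-only entries (for example a build tool) are skipped.
            continue
        result[path] = meta.get("version", "")
    return result
```

Entries marked only as "optional" or "peer" are kept in this sketch; whether those should be filtered as well is a separate decision.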

    I hope it is clear what I am trying to say.

    Thanks
    Caterina


  • inedo-engineer

    Hi @caterina,

    Thank you for that explanation. It makes clear how and what is being included. I did some other research on this topic as well, and it looks like whether dev dependencies should be included in the SBOM varies from environment to environment. From my research, there is no definitive best practice for this. Furthermore, the CycloneDX implementation of the dependency scan has options for what to scan:

    1. package-lock-only: Whether to only use the lock file, ignoring "node_modules".
      1. This means the output will be based only on the few details in the tree described by the "npm-shrinkwrap.json" or "package-lock.json", rather than the contents of "node_modules" directory.
      2. default: false
    2. omit: Dependency types to omit from the installation tree.
      1. can be set multiple times
      2. choices: "dev", "optional", "peer", default: "dev" if the NODE_ENV environment variable is set to "production", otherwise empty
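
As a rough Python sketch of the "omit" default described in the list above (a paraphrase of the option description, not CycloneDX's code; the function name `effective_omit` is made up for this example):

```python
import os
from typing import List, Mapping, Optional


def effective_omit(
    cli_omit: Optional[List[str]] = None,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the dependency types to omit from the scan.

    Explicit values win; otherwise "dev" is omitted only when
    NODE_ENV is set to "production". Hypothetical helper.
    """
    if cli_omit:
        return list(cli_omit)
    return ["dev"] if env.get("NODE_ENV") == "production" else []
```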

    So as a summary, their defaults are to scan the node_modules folders but omit the dev packages when building a production package. I'm inclined to make that the default for pgscan. The pgscan library has been geared to be a lightweight alternative and when more complex scans are needed, it is suggested to use a tool like CycloneDX to generate an SBOM and upload that file to ProGet.

    What are your thoughts on those defaults for pgscan? I will also discuss this internally with the team and post back what our thoughts are.

    Thanks,
    Rich



  • Hi Rich,

    we talked about these options as well, and we think that a switch to exclude dev dependencies could be helpful.
    If pgscan is given an argument such as --exclude-dev, node_modules folders are ignored and only dependencies with "dev: false" in the package-lock.json file are written into the SBOM file.
    Otherwise, all dependencies in package-lock.json are listed and node_modules folders are included.

    I can't imagine a scenario where I want "package-lock-only" without "omit:dev", because dev dependencies would be listed in the SBOM but node_modules would not be scanned, which I guess would lead to an incomplete output.

    Let me know what you and your team think about it.

    Thanks
    Caterina


  • inedo-engineer

    Hi @caterina,

    I was able to chat with the team and here was our consensus:

    • When using the auto type and scanning for NuGet and npm dependencies:
      • The default configuration should be to omit dev dependencies and scan the node_modules directory
    • When using the npm type and a package-lock.json file is specified
      • The default is to only scan the specified package-lock.json file and omit dev dependencies
    • When using the npm type and a package-lock.json file is not specified
      • The default configuration should be to omit dev dependencies and scan the node_modules directory
    • Each of these options would have an optional parameter to include the dev dependencies (--include-dev)

    The thought is that this lines up with the other SBOM scanners' defaults as well as handles any hidden dependencies in the node_modules folder. This also handles the case of scanning only package-lock.json since you can explicitly specify it.

    How does this sound to you?

    Thanks,
    Rich



  • Hi Rich,

    to get back to my "initial problem".

    If I used pgscan with the auto type, I would run into the same problem. The dev dependencies within the package-lock.json would be omitted, but the node_modules directory also contains dev dependencies, and their package-lock.json files would be read as well, leading to my initial problem (having dev dependencies in the SBOM file). I think we are not able to distinguish between a dev dependency and a "real" dependency within the node_modules folder.

    Of course I could explicitly specify only to scan the package-lock.json file with the npm type but I would have to make a second pgscan call for nuget packages and would end up with two sbom files. It is a lot more comfortable to have all dependencies in one sbom file.

    Further, pgscan with auto type and pgscan with npm type would by default list different npm dependencies.

    Or did I understand something wrong?

    Thanks
    Caterina


  • inedo-engineer

    Hi @caterina,

    I see the problem now: the package-lock.json of the dev dependency contains non-dev dependencies, which would cause the extra dependencies. I may have a solution for this, but I will need to run a couple of tests.
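
The effect can be reproduced with a small Python sketch. All data below is invented for illustration (the package names "runtime-lib" and "build-tool" and every version number are placeholders), and the lock file structure is heavily simplified.

```python
from typing import Dict, Set

# Simplified "packages" map of the project's own lock file (made-up data).
ROOT_LOCK: Dict[str, dict] = {
    "node_modules/runtime-lib": {"version": "1.0.0"},
    "node_modules/build-tool": {"version": "1.0.0", "dev": True},
}

# Lock files found inside installed packages, keyed by the package's path.
# Inside build-tool's own lock file, json-schema is a regular dependency.
NESTED_LOCKS: Dict[str, Dict[str, dict]] = {
    "node_modules/build-tool": {
        "node_modules/json-schema": {"version": "1.0.0"},
    },
}


def names(lock: Dict[str, dict]) -> Set[str]:
    """Package names of all entries in one lock file not flagged as dev."""
    return {
        path.rsplit("node_modules/", 1)[-1]
        for path, meta in lock.items()
        if not meta.get("dev", False)
    }


def scan(include_nested: bool) -> Set[str]:
    """Collect non-dev package names, optionally reading nested lock files."""
    found = names(ROOT_LOCK)
    if include_nested:
        for nested in NESTED_LOCKS.values():
            found |= names(nested)
    return found
```

Here `scan(False)` reports only runtime-lib, while `scan(True)` also reports json-schema, even though it is only reachable through a dev dependency.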

    I still think the two scans in this case would be best. When you run pgscan those two times (one for npm and one for NuGet), configure the scan to push the results of each scan to the same SCA project in ProGet. This will append the new dependencies to the project. This way, when you export the SBOM from ProGet, only one SBOM will be generated and exported including all the related dependencies (npm and NuGet).

    Thanks,
    Rich



  • Hi Rich,

    please take your time.

    We used to have two different pgscan calls for nuget and npm and we ended up with two files on ProGet:
    [screenshot]

    But did I get it right that if I export the sbom those two files are being merged into one? In this case we would have to think about separating those pgscan calls again.

    Thank you
    Caterina


  • inedo-engineer

    Hi @caterina,

    That is correct, those two files will be merged. The page you are looking at is just a history of each SBOM that has been uploaded to the project. When you export the SBOM for that project, ProGet generates an SBOM based on all the packages included in that project release and combines them in one file. Also, if you remove a package dependency on the packages tab (like an npm dev dependency), it will not be included in the generated SBOM.
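
Conceptually, the export behaves like the following Python sketch. It only models the idea of combining several scans and leaving out removed packages; it says nothing about how ProGet stores or generates SBOMs internally, and `merge_scans` is a made-up name.

```python
from typing import Iterable, List, Set, Tuple

# (ecosystem, name, version); a simplification chosen for this example.
Package = Tuple[str, str, str]


def merge_scans(
    *scans: Iterable[Package],
    removed: Iterable[Package] = (),
) -> List[Package]:
    """Combine the packages of several scans into one de-duplicated list,
    leaving out any package that was removed from the project."""
    excluded = set(removed)
    merged: Set[Package] = set()
    for scan in scans:
        merged.update(scan)
    return sorted(merged - excluded)
```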

    Thanks,
    Rich


  • inedo-engineer

    Hi @caterina,

    Here is the final solution:

    • When using the auto type and scanning for NuGet and npm dependencies:
      • The default configuration should be to omit dev dependencies and scan the node_modules directory
    • When using the npm type and a package-lock.json file is specified
      • The default is to only scan the specified package-lock.json file and omit dev dependencies
    • When using the npm type and a package-lock.json file is not specified
      • The default configuration should be to omit dev dependencies and scan the node_modules directory
    • Each of these options would have an optional parameter to include the dev dependencies (--include-dev)
    • Each of these options would have an optional parameter to ignore package-lock.json files found under node_modules (--package-lock-only)
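
The defaults listed above can be summarized as a small decision function. This is a Python sketch of the rules as stated in this post, not pgscan's source code; `plan_npm_scan` and `NpmScanPlan` are hypothetical names, and treating --package-lock-only as a no-op when a lock file is given explicitly is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NpmScanPlan:
    scan_node_modules: bool  # read lock files found under node_modules?
    omit_dev: bool           # leave dev dependencies out of the SBOM?


def plan_npm_scan(
    scan_type: str = "auto",
    lock_file: Optional[str] = None,
    include_dev: bool = False,
    package_lock_only: bool = False,
) -> NpmScanPlan:
    """Derive the npm scan behavior from the type and the two switches."""
    if scan_type not in ("auto", "npm"):
        raise ValueError(f"unsupported scan type: {scan_type}")
    # An explicitly specified package-lock.json is scanned on its own.
    explicit_lock = scan_type == "npm" and lock_file is not None
    scan_node_modules = not explicit_lock and not package_lock_only
    return NpmScanPlan(scan_node_modules=scan_node_modules,
                       omit_dev=not include_dev)
```

For the scenario discussed in this thread, `plan_npm_scan("auto", package_lock_only=True)` yields a plan that skips node_modules and omits dev dependencies.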

    This has been implemented in pgscan 1.5.6 which I will be pushing shortly, and these options will be added to BuildMaster 2023.2.

    Thanks,
    Rich

