Connection reset while downloading npm packages
-
Hi,
So this is a strange error we are having, and it may not be related to ProGet itself. However, maybe you can help us narrow it down further. Here is the context.
A while ago, a set of our compilers started getting errors while downloading npm packages, specifically when downloading TypeScript. The package is downloaded by the compiler along with all the other dependencies, but at some point it consistently hits an error like this one:
npm WARN tarball tarball data for typescript@3.7.4 (sha1-F0Ol7F/vah+p8+RwjjPIHHOHbBk=) seems to be corrupted. Trying one more time.
However, downloading the dependency with a manual install (using "npm install typescript@3.7.4") does not produce the error. It succeeds, and subsequent builds succeed because the freshly downloaded package is in the cache. But once the package gets cleared from the npm cache, either forcefully or for some other reason, the error comes back.
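For context on that warning: the `sha1-...` value is a base64-encoded SHA-1 digest of the tarball, so a transfer that gets cut off mid-stream shows up as "corrupted" rather than as a network error. A minimal sketch of that kind of check, using Node's built-in `crypto` module (the function names here are our own illustration, not npm's internals):

```typescript
import { createHash } from "crypto";

// Compute the base64 SHA-1 digest of a downloaded tarball buffer,
// matching the "sha1-<base64>" integrity format in the npm warning.
function sha1Integrity(data: Buffer): string {
  return "sha1-" + createHash("sha1").update(data).digest("base64");
}

// Returns true when the downloaded bytes match the expected integrity
// value; a truncated or reset download hashes to a different digest,
// which is what produces the "seems to be corrupted" warning.
function verifyTarball(data: Buffer, expected: string): boolean {
  return sha1Integrity(data) === expected;
}
```

So a connection reset partway through the download would explain why the client reports corruption instead of a network failure.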
We should also point out that developers do not have this problem: even after forcing a cache clear, the project builds without error on their machines.
So this is an environment-specific problem. I have analyzed the network traffic between one compiler and the ProGet server, and a few things stand out:
- We get the error during the package download, not at the beginning
- At the moment we get the error, we receive an RST/ACK packet, terminating the connection
- There are a lot of TCP keep-alive packets sent from our compiler to the server near the end of the process. npm seems to download many packages in parallel, and I guess some connections are kept alive for quite a while.
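Incidentally, the "Trying one more time" in the npm warning suggests the client simply retries the whole download. A rough sketch of that pattern (our own illustration, not npm's actual code), retrying once when a transfer fails mid-stream:

```typescript
// Retry a download a fixed number of times; a connection reset
// (ECONNRESET from an RST mid-transfer) or a failed integrity
// check rejects the promise and triggers one more attempt.
async function downloadWithRetry<T>(
  fetchOnce: () => Promise<T>,
  attempts = 2
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fetchOnce();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

If the reset happens reliably under load, a single retry may just hit the same condition again, which would match the consistent failures we see.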
We are currently using ProGet version 5.3.17.
My question is: what is the behaviour of ProGet when downloading npm packages over a possibly slow/unreliable network, or with many concurrent connections? Is there a known issue regarding the download of npm packages?
-
@mathieu-belanger_6065 thanks for all of the diagnostic and additional information. I think you're right, it's environment / network specific, and not related to ProGet.
I would check the ProGet Diagnostic Center, under Admin as well.
Otherwise, ProGet doesn't operate at the TCP level; it uses ASP.NET's network stack. There's really nothing special about how npm packages are handled compared with other package types, and we haven't heard of any other reports of this issue.
For reference, here's the code showing how a package file is transmitted. Note that if you're using connectors and the package isn't cached on ProGet, then each connector must be queried, which can yield quite a lot of network traffic.
```csharp
if (metadata.IsLocal)
{
    using (var stream = await feed.OpenPackageAsync(packageName, metadata.Version, OpenPackageOptions.DoNotUseConnectors))
    {
        await context.Response.TransmitStreamAsync(stream, "package.tgz", MediaTypeNames.Application.Octet);
    }
}
else
{
    var nameText = packageName.ToString();
    var validConnectors = feed
        .Connectors
        .Where(c => c.IsPackageIncluded(nameText));

    foreach (var connector in validConnectors)
    {
        var remoteMetadata = await connector.GetRemotePackageMetadataAsync(packageName.Scope, packageName.Name, metadata.Version.ToString());
        if (remoteMetadata != null)
        {
            var tarballUrl = GetTarballUrl(remoteMetadata);
            if (!string.IsNullOrEmpty(tarballUrl))
            {
                var request = await connector.CreateWebRequestInternalAsync(tarballUrl);
                request.AutomaticDecompression = DecompressionMethods.None;

                using (var response = (HttpWebResponse)await request.GetResponseAsync())
                using (var responseStream = response.GetResponseStream())
                {
                    context.Response.BufferOutput = false;
                    context.Response.ContentType = MediaTypeNames.Application.Octet;
                    context.Response.AppendHeader("Content-Length", response.ContentLength.ToString());

                    if (feed.CacheConnectors)
                    {
                        using (var tempStream = TemporaryStream.Create(response.ContentLength))
                        {
                            await responseStream.CopyToAsync(tempStream);
                            tempStream.Position = 0;
                            try
                            {
                                await feed.CachePackageAsync(tempStream);
                            }
                            catch
                            {
                            }
                            tempStream.Position = 0;
                            await tempStream.CopyToAsync(context.Response.OutputStream);
                        }
                    }
                    else
                    {
                        await responseStream.CopyToAsync(context.Response.OutputStream);
                    }

                    return true;
                }
            }
        }
    }
}
```
-
Thanks @atripp,
I am curious, would there be an impact on performance when "piping" connectors together? For example, internal feed A has a connector to internal feed B, which has a connector to internal feed C, which has a connector to npmjs.org?
Reading the code here makes me think that each connector layer would allocate some resources for what would be the same package at every step of the way. Am I wrong?
-
@mathieu-belanger_6065 said in Connection reset while downloading npm packages:
I am curious, would there be an impact on performance when "piping" connectors together? For example, internal feed A has a connector to internal feed B, which has a connector to internal feed C, which has a connector to npmjs.org?
Connectors are accessed over HTTP. So assuming you have a "chain" like A --> B --> C --> npmjs.org (i.e. 3 different feeds and 3 different connectors), each request may yield 3 additional requests.
So when your browser asks feed A for package `typescript@3.7.4`, the following will happen:
- If the package is cached or local, the file is streamed to the browser
- Otherwise, each connector (just B, in this case) is queried over HTTP for `typescript@3.7.4`
- The first connector that returns a response has its response body streamed to the browser

Each connector follows the same logic. When ProGet (via a request to feed A) asks feed B for that package, the same logic is followed:
- If the package is cached or local, the file is streamed to the browser
- Otherwise, each connector (just C, in this case) is queried over HTTP for `typescript@3.7.4`
- The first connector that returns a response has its response body streamed to the browser

Continuing the pipe, when ProGet (via a request to feed B, via a request to feed A) asks feed C for that package, the same logic is followed:
- If the package is cached or local, the file is streamed to the browser
- Otherwise, each connector (just npmjs.org, in this case) is queried over HTTP for `typescript@3.7.4`
- The first connector that returns a response has its response body streamed to the browser
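The fan-out above can be sketched as a toy model (hypothetical names and classes, just counting requests; real connectors speak HTTP):

```typescript
// Toy model of chained feeds: each feed serves from its cache, or
// asks its connectors in order and caches the first hit on the way back.
class Feed {
  private cache = new Map<string, string>();
  requests = 0; // how many times this feed was asked for a package

  constructor(readonly name: string, private connectors: Feed[] = []) {}

  resolve(pkg: string): string | undefined {
    this.requests++;
    const cached = this.cache.get(pkg);
    if (cached !== undefined) return cached;
    for (const connector of this.connectors) {
      const found = connector.resolve(pkg); // one extra request per hop
      if (found !== undefined) {
        this.cache.set(pkg, found); // connector caching
        return found;
      }
    }
    return undefined;
  }

  publish(pkg: string, tarball: string): void {
    this.cache.set(pkg, tarball);
  }
}

// Chain A -> B -> C -> registry: the first request walks the whole
// chain; with caching enabled, later requests are served directly by A.
const registry = new Feed("npmjs.org");
registry.publish("typescript@3.7.4", "tarball-bytes");
const c = new Feed("C", [registry]);
const b = new Feed("B", [c]);
const a = new Feed("A", [b]);
```

In this model the first `a.resolve(...)` touches all four feeds, while the second is answered from A's cache without walking the chain again, which is the effect the caching behavior described above is meant to achieve.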
This is why caching is important, but also why chaining may not be a good solution for high-traffic npm developer libraries like `typescript`. The `npm` client basically does a DoS by requesting hundreds of packages at once; the same is true of `nuget.exe` as well.
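As an aside, the npm client's parallelism can be tuned down on the build machines. npm supports a `maxsockets` setting and retry settings in `.npmrc`, which may reduce the burst of simultaneous connections against ProGet (values below are examples, not recommendations):

```
# .npmrc on the build machine: fewer parallel sockets and more
# retries can help on an unreliable network
maxsockets=5
fetch-retries=5
```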