Constant timeouts doing NuGet restore
-
Hello,
I am running ProGet 5.3.21 in Docker against an Azure SQL Database. During a NuGet package restore, I continually see timeouts, to the point that I'm not able to successfully complete a restore. The majority of the requests to ProGet are for packages which are not held locally but are instead provided by nuget.org (although they are requested from ProGet regardless).
I've increased the max pool size to 400, but I never see the connection count go over around 120 connections.
The database is not shared, and there is minimal load on any of the systems.
Anyone have any ideas?
Thanks
-
Unfortunately, this behavior is to be expected with "dotnet restore"; it's massively parallel and will issue as many simultaneous requests as the build agent allows. Typically, that's more than the server can handle; the end result is that the "network stack" gets overloaded, which is why the server becomes unreachable.
The reason is that each request to a ProGet feed can potentially open another request to each configured connector (maybe NuGet.org, maybe others), so if you are doing a lot of requests at once, you'll get a lot of network activity queuing up. SQL Server is also running on the network, so those requests just get added to the queue, and eventually you run out of connections.
One way to solve this is by reducing network traffic (removing connectors to nuget.org, restricting the build agent if possible, etc.), but the best bet is to move to load-balancing with ProGet Enterprise - see this article to learn more: https://blog.inedo.com/proget-free-to-proget-enterprise
Another option is to make sure you're not passing --no-cache to the dotnet restore command. NuGet normally creates a local cache of downloaded packages, which helps alleviate some of the load on the ProGet server; --no-cache bypasses that local cache and causes every package to be pulled from the server.
Another thing that might help is the --disable-parallel option of dotnet restore. That prevents restoring multiple projects in parallel, which also reduces the load on ProGet.
Fortunately and unfortunately, NuGet does a lot of parallel operations that can saturate a NuGet server. When you are restoring a lot of simultaneous builds and solutions with a large number of projects, it can really affect the performance of a server.
This is ultimately where load balancing will come in.
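For reference, here is a minimal sketch of the restore variations mentioned above (the solution name is just a placeholder):

```sh
# Default restore: uses the local NuGet package cache and restores projects in parallel
dotnet restore MySolution.sln

# Bypasses the local cache, so every package is pulled from the server (avoid if possible)
dotnet restore MySolution.sln --no-cache

# Restores projects one at a time, reducing concurrent load on the feed
dotnet restore MySolution.sln --disable-parallel
```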
-
I'm sorry, but moving to Enterprise is an unacceptable solution. If the standard version can't handle restoring a single project in isolation, why does it exist?
Any reasonable server should be able to handle 100 simultaneous requests in under the timeout period, which is currently over 1 minute. Even if every request required a database query, which is already a suboptimal solution, even the slowest database should be able to handle 100 queries in under a minute. The database I'm using is not even close to saturated, so the problem lies elsewhere.
I have no connectors to nuget.org. nuget restore will query all available sources for all packages, so I can't prevent it from hitting ProGet for all packages.
Whether or not I pass --no-cache won't matter, because this is part of a CI build process that builds from a clean environment, so there is no cache to use.
I'm only restoring a single project, so --disable-parallel also won't matter.
-
@brett-polivka the standard version of ProGet can indeed handle a single project, and in fact many projects from many users at the same time -- it's the most performant private server on the market, by far, in fact.
But a single server/computer/container can only handle so many network connections, and what you're experiencing is the network stack being overloaded.
It could be related to the virtualization or container settings, etc. It could be authentication across networks, which shouldn't happen in Azure, but it might. But it's not really related to the database or database performance at all; the database is just the first place you'll see the overload, since database connections have short timeouts compared to connectors like nuget.org.
There is an optimization in 5.3.22 (PG-1891) that might help in scenarios like yours (Linux connected to a slow-ish database), so I would try upgrading.
Otherwise, I would investigate why the network is being overloaded.
-
I just upgraded. Made no difference.
My CI builder and ProGet are running in the same Kubernetes cluster in Azure, so there's no chance that network saturation is an issue.
The database is also in Azure and isn't near capacity, and there's no way that Microsoft's network between my cluster and the DB is being saturated by 100 queries.
As I said before, 100 simultaneous queries should be easy for any server to handle. That's nowhere near enough to saturate a network stack.
Is there any way to turn on more detailed debugging logs to see what the server is actually doing?
-
If you've upgraded and it made no difference, then the problem is most certainly in your Azure/cluster configuration. Maybe it's some default configuration setting or something in your cluster that's causing strange network/routing problems... that's a total guess, because I don't know how to advise on Kubernetes cluster diagnostics. But that's where to look.
As you said, 100 simultaneous queries are really easy to handle -- the micro-servers (like 512MB RAM, including SQL Server) we use in our testing labs can do that no problem.
I'm afraid there's nothing in ProGet that can really help diagnose this... it would just tell you what you already know: there are a lot of incoming/waiting connections, and the database connections are timing out. All of that is happening at the operating-system level, not at the application level in our code.
-
As a test I reverted to the 5.3.19 Mono version, and it is extremely fast and completes the entire restore within seconds.
I then flipped back to the 5.3.19 .NET Core version, and it is so slow that the restore will not complete.
Nothing else changed between these two runs other than Mono vs. .NET Core.
-
Thanks for the update @brett-polivka
The ProGet code (application layer) is effectively identical between those images; the main differences are the underlying container image and .NET 5 vs. Mono. We had some initial bugs with .NET 5, but this is the first we've heard, out of a lot of users, of a Mono vs. .NET Core problem of this nature.
Unfortunately, we really don't have any means to diagnose this further at this point. We're not even sure what code we could put in place to do network/operating-system-level debugging. And worse, we can't reproduce this in our containers, even on an Azure host.
We're open to ideas here, but I think your best bet would be to go to as basic an installation as possible (just a simple container with a simple database) and then add the pieces back until you figure out which one is causing problems.
-
While I wouldn't think it would be enough to cause the type of performance issues I'm seeing, I've noticed that the DotNetCore version logs significantly more data to the console than the Mono version.
Is there a way to configure the console log level?
-
More info...
I was able to attach a debugger to the DotNetCore version, and there are approximately 140 threads, all waiting on the same ManualResetEvent. Here's the stack trace:
System.Private.CoreLib.dll!System.Threading.ManualResetEventSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) C#
System.Private.CoreLib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) C#
System.Private.CoreLib.dll!System.Threading.Tasks.Task.InternalRunSynchronously(System.Threading.Tasks.TaskScheduler scheduler, bool waitForCompletion) C#
System.Linq.Parallel.dll!System.Linq.Parallel.SpoolingTask.SpoolStopAndGo<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, int>(System.Linq.Parallel.QueryTaskGroupState groupState, System.Linq.Parallel.PartitionedStream<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, int> partitions, System.Linq.Parallel.SynchronousChannel<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>[] channels, System.Threading.Tasks.TaskScheduler taskScheduler) C#
System.Linq.Parallel.dll!System.Linq.Parallel.DefaultMergeHelper<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, int>.System.Linq.Parallel.IMergeHelper<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.Execute() C#
System.Linq.Parallel.dll!System.Linq.Parallel.MergeExecutor<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.Execute() C#
System.Linq.Parallel.dll!System.Linq.Parallel.MergeExecutor<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.Execute<int>(System.Linq.Parallel.PartitionedStream<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, int> partitions, bool ignoreOutput, System.Linq.ParallelMergeOptions options, System.Threading.Tasks.TaskScheduler taskScheduler, bool isOrdered, System.Linq.Parallel.CancellationState cancellationState, int queryId) C#
System.Linq.Parallel.dll!System.Linq.Parallel.PartitionedStreamMerger<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.Receive<int>(System.Linq.Parallel.PartitionedStream<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, int> partitionedStream) C#
System.Linq.Parallel.dll!System.Linq.Parallel.FirstQueryOperator<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.WrapHelper<int>(System.Linq.Parallel.PartitionedStream<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, int> inputStream, System.Linq.Parallel.IPartitionedStreamRecipient<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo> recipient, System.Linq.Parallel.QuerySettings settings) C#
System.Linq.Parallel.dll!System.Linq.Parallel.FirstQueryOperator<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.WrapPartitionedStream<int>(System.Linq.Parallel.PartitionedStream<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, int> inputStream, System.Linq.Parallel.IPartitionedStreamRecipient<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo> recipient, bool preferStriping, System.Linq.Parallel.QuerySettings settings) C#
System.Linq.Parallel.dll!System.Linq.Parallel.UnaryQueryOperator<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.UnaryQueryOperatorResults.ChildResultsRecipient.Receive<int>(System.Linq.Parallel.PartitionedStream<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, int> inputStream) C#
System.Linq.Parallel.dll!System.Linq.Parallel.ListQueryResults<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.GivePartitionedStream(System.Linq.Parallel.IPartitionedStreamRecipient<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo> recipient) C#
System.Linq.Parallel.dll!System.Linq.Parallel.UnaryQueryOperator<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.UnaryQueryOperatorResults.GivePartitionedStream(System.Linq.Parallel.IPartitionedStreamRecipient<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo> recipient) C#
System.Linq.Parallel.dll!System.Linq.Parallel.QueryOperator<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.GetOpenedEnumerator(System.Linq.ParallelMergeOptions? mergeOptions, bool suppressOrder, bool forEffect, System.Linq.Parallel.QuerySettings querySettings) C#
System.Linq.Parallel.dll!System.Linq.Parallel.QueryOpeningEnumerator<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.OpenQuery() C#
System.Linq.Parallel.dll!System.Linq.Parallel.QueryOpeningEnumerator<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>.MoveNext() C#
System.Linq.Parallel.dll!System.Linq.ParallelEnumerable.GetOneWithPossibleDefault<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>(System.Linq.Parallel.QueryOperator<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo> queryOp, bool throwIfTwo, bool defaultIfEmpty) C#
System.Linq.Parallel.dll!System.Linq.ParallelEnumerable.FirstOrDefault<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo>(System.Linq.ParallelQuery<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo> source, System.Func<Inedo.Web.PageFree.PageFreeHandler.RegexHandlerInfo, bool> predicate) C#
InedoLib.dll!Inedo.Web.PageFree.PageFreeHandler.GetHttpHandler(Inedo.Web.Handlers.GetHandlerParams args) C#
InedoLib.dll!Inedo.Web.Handlers.DynamicHttpHandling.GetHandler(System.Web.HttpContext context, string requestType, string url, string pathTranslated) C#
InedoLib.dll!Inedo.Web.AhWebMiddleware.InvokeAsync(Microsoft.AspNetCore.Http.HttpContext context) C#
Microsoft.AspNetCore.HostFiltering.dll!Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware.Invoke(Microsoft.AspNetCore.Http.HttpContext context) C#
Microsoft.AspNetCore.Hosting.dll!Microsoft.AspNetCore.Hosting.HostingApplication.ProcessRequestAsync(Microsoft.AspNetCore.Hosting.HostingApplication.Context context) C#
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests<Microsoft.AspNetCore.Hosting.HostingApplication.Context>(Microsoft.AspNetCore.Hosting.Server.IHttpApplication<Microsoft.AspNetCore.Hosting.HostingApplication.Context> application) C#
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync<Microsoft.AspNetCore.Hosting.HostingApplication.Context>(Microsoft.AspNetCore.Hosting.Server.IHttpApplication<Microsoft.AspNetCore.Hosting.HostingApplication.Context> application) C#
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync<Microsoft.AspNetCore.Hosting.HostingApplication.Context>(Microsoft.AspNetCore.Hosting.Server.IHttpApplication<Microsoft.AspNetCore.Hosting.HostingApplication.Context> httpApplication) C#
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnectionMiddleware<Microsoft.AspNetCore.Hosting.HostingApplication.Context>.OnConnectionAsync(Microsoft.AspNetCore.Connections.ConnectionContext connectionContext) C#
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.TransportManager.BindAsync.AnonymousMethod__0(Microsoft.AspNetCore.Connections.ConnectionContext c) C#
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection<Microsoft.AspNetCore.Connections.ConnectionContext>.ExecuteAsync() C#
Microsoft.AspNetCore.Server.Kestrel.Core.dll!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection<System.__Canon>.System.Threading.IThreadPoolWorkItem.Execute() C#
System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() C#
[Native to Managed Transition]
-
@brett-polivka fantastic, thanks for the additional research. This will make it so much easier to debug from here.
Do most of the stack traces look like this, i.e. coming from Inedo.Web.PageFree.PageFreeHandler.GetHttpHandler?
If so, the issue is happening in a totally unexpected place (basically routing a URL). This is just a relatively simple query using Linq's ParallelEnumerable library methods. Under the hood (as the stack trace shows), that seems to leverage operating-system-level mutex support... which is likely why it's problematic only on your specific hardware/environment.
We saw a ton of Mono-related bugs early on with this (some would even SEGFAULT on certain hosts), largely because we tend to find the "corner cases" with optimizations, like working around Activator.CreateInstance slowness by using System.Reflection.Emit directly.
Anyway, I'm going to have to involve our engineering team for ideas, because at this point we have to guess how to work around this.
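To give a sense of the shape of the query involved, here is a simplified, hypothetical sketch. The RegexHandlerInfo name comes from the stack trace, but the type definition, handler table, and URL patterns below are invented stand-ins rather than the actual source:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Invented stand-in for the handler registration type named in the stack trace.
record RegexHandlerInfo(Regex Pattern, string HandlerName);

class RoutingSketch
{
    static readonly RegexHandlerInfo[] Handlers =
    {
        new(new Regex(@"^/nuget/[^/]+/v3/index\.json$"), "ServiceIndexHandler"),
        new(new Regex(@"^/nuget/[^/]+/package/.+$"), "PackageDownloadHandler"),
    };

    // Shape suggested by the trace: ParallelEnumerable.FirstOrDefault with a predicate.
    // Each call fans partition tasks out to the thread pool and blocks the calling
    // (request) thread until those partitions complete.
    static string ResolveParallel(string path) =>
        Handlers.AsParallel().FirstOrDefault(h => h.Pattern.IsMatch(path))?.HandlerName;

    // Equivalent sequential query: same result, no extra thread-pool work, no blocking wait.
    static string ResolveSequential(string path) =>
        Handlers.FirstOrDefault(h => h.Pattern.IsMatch(path))?.HandlerName;

    static void Main()
    {
        Console.WriteLine(ResolveParallel("/nuget/internal/v3/index.json"));
        Console.WriteLine(ResolveSequential("/nuget/internal/v3/index.json"));
    }
}
```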
-
Glad to help.
Yes, almost every thread shows that exact same stack trace.
If there's any additional testing or information I can provide, please let me know.
-
One more piece of info... I was able to run the same test on Windows (I extracted the apps from the Docker image and copied them to a Windows machine), and it exhibits exactly the same behavior.
-
If I can venture a guess as to what's happening, I think that the ThreadPool is being completely saturated.
Kestrel will use the ThreadPool to service a request. That request then runs a parallel query, which also uses the ThreadPool. At some point, every thread in the pool is blocked waiting for a task to complete, but those tasks can't complete because they need to be scheduled onto the ThreadPool, whose threads are all blocked.
I'd be curious to see what happens if the ParallelEnumerable query is replaced with a standard Enumerable query.
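Here's a minimal standalone sketch of what I suspect is happening (not ProGet code; the request count and workload are arbitrary). Each simulated request runs a blocking PLINQ query from a thread-pool thread, so its partition tasks also need pool threads, and once the pool is exhausted everything crawls along at the pool's thread-injection rate:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class StarvationDemo
{
    const int SimulatedRequests = 64;

    // Queues the given "request body" onto the thread pool many times at once
    // (roughly what Kestrel does with incoming requests) and measures how long
    // it takes for all of them to finish.
    static TimeSpan Measure(Action requestBody)
    {
        var sw = Stopwatch.StartNew();
        var requests = Enumerable.Range(0, SimulatedRequests)
            .Select(_ => Task.Run(requestBody))
            .ToArray();
        Task.WaitAll(requests);
        return sw.Elapsed;
    }

    static void Main()
    {
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        Console.WriteLine($"Min worker threads: {minWorkers}");

        // Each request blocks in FirstOrDefault while its PLINQ partition tasks wait
        // for free pool threads; with all pool threads blocked the same way, progress
        // is limited by the pool's slow thread-injection rate. This run can take a
        // long time on a machine with few cores.
        var parallel = Measure(() =>
            Enumerable.Range(0, 1_000).AsParallel().FirstOrDefault(x => x == 999));

        // The same lookup without PLINQ: no extra pool work, no blocking wait.
        var sequential = Measure(() =>
            Enumerable.Range(0, 1_000).FirstOrDefault(x => x == 999));

        Console.WriteLine($"PLINQ on pool threads: {parallel}");
        Console.WriteLine($"Sequential LINQ:       {sequential}");
    }
}
```

On a box with only a few cores, the PLINQ variant typically takes far longer than the sequential one, which would be consistent with requests (and the database calls queued behind them) hitting their timeouts.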
-
@brett-polivka thanks again for all the help and insights!
I talked this over with the engineering team, and we just decided to remove the ParallelEnumerable query.
Would you be able to try a pre-release container? inedo/proget:5.3.23-ci.4
-
I'm still seeing the same problem and the same stack traces. Can you confirm that 5.3.23-ci.4 is the right version? It still contains the parallel query.
-
Hi @brett-polivka,
It looks like there was a mix-up in one of our branches. I just pushed an updated image with the ParallelEnumerable query removed. Can you please give the proget.inedo.com/productimages/inedo/proget:5.3.23-ci.5 image a try?
Thanks,
Rich
-
Hi Rich,
This seems to have done the trick! I've been running constant restores against it for an hour with no timeouts.
Thanks!!
-
Thanks to @atripp as well. Appreciate your patience dealing with this (and me).
-
Whoops, my bad!
@brett-polivka thanks so much for the help, this was a baffling .NET 5 issue, but so glad we could identify and fix it. This will definitely go in the maintenance release as well, of course.
-
@atripp This is probably related to the same issue I kept mentioning, which you were saying was possibly due to an overloaded network stack or other configuration issues. I still have the problem even after setting up a completely new server, so this also shows that the timeouts are related to the ProGet application code. Looking forward to trying out this fix. This apparently was happening to other users as well. :)
-
@nuno-guerreiro-rosa_9280 very glad we could fix it! It's definitely a .NET 5 bug, and in a totally unexpected place.
-
I also confirm I didn't get any more timeouts on the latest version. Very good!