Even after .NET Core upgrade a single client doing a .NET restore causes timeouts too easily



  • If for some reason the client has no packages locally (e.g., within a Docker build, where there is no local package cache), it queries ProGet to obtain them. I am rarely able to complete a full dotnet restore without seeing a timeout error in ProGet. The errors seem to come from timeouts querying the DB, so even with the Enterprise version and load balancing this problem would probably be worse.

    The server where it is hosted is more than enough to handle the load of a single client: SSD, 4 cores, 32 GB RAM, and SQL Server Enterprise, which has very few connections open when these errors occur.

    I see many errors like the one below. (This package doesn't exist in ProGet; however, since the feed is listed in NuGet.Config, the client will still query ProGet for it.)

    Retrying 'FindPackagesByIdAsyncCore' for source 'https://proget.xxx.io/nuget/xxx-nuget/FindPackagesById()?id='System.Diagnostics.Process'&semVerLevel=2.0.0'.
    Response status code does not indicate success: 504 (Gateway Timeout).

    In the ProGet web UI I see many errors like:

    An error occurred processing a GET request to http://proget.xxx.io/nuget/xxx-nuget/v3/flatcontainer/system.text.json/index.json: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

    stack trace:
    System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
    ---> System.ComponentModel.Win32Exception (258): Unknown error 258
    at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
    at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
    at System.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
    at System.Data.SqlClient.SqlCommand.EndExecuteNonQuery(IAsyncResult asyncResult)
    at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
    --- End of stack trace from previous location where exception was thrown ---
    at Inedo.Data.SqlServerDatabaseContext.CreateConnectionAsync()
    at Inedo.Data.DatabaseContext.ExecuteInternalAsync(String storedProcName, GenericDbParameter[] parameters)
    at Inedo.Data.SqlServerDatabaseContext.ExecuteInternalAsync(String storedProcName, GenericDbParameter[] parameters)
    at Inedo.Data.DatabaseContext.ExecuteTableAsync[TRow](String storedProcName, GenericDbParameter[] parameters)
    at Inedo.ProGet.Feeds.NuGet.V2.LocalPackageSource.FindPackagesByIdAsync(NuGetQueryOptions options, String id) in C:\InedoAgent\BuildMasterTemp\192.168.44.60\Temp_E75101\Src\ProGetCoreEx\Feeds\NuGet\V2\LocalPackageSource.cs:line 161
    at Inedo.ProGet.Feeds.NuGet.NuGetFeed.FindPackagesByIdAsync(NuGetQueryOptions options, String id) in C:\InedoAgent\BuildMasterTemp\192.168.44.60\Temp_E75101\Src\ProGetCoreEx\Feeds\NuGet\NuGetFeed.cs:line 157
    at Inedo.ProGet.WebApplication.FeedEndpoints.NuGet.V3.PackageBaseAddressHandler.ProcessRequestAsync(NuGetFeed feed, HttpContext context, String relativeUrl) in C:\InedoAgent\BuildMasterTemp\192.168.44.60\Temp_E75101\Src\ProGet.WebApplication\FeedEndpoints\NuGet\V3\PackageBaseAddressHandler.cs:line 39
    at Inedo.ProGet.WebApplication.FeedEndpoints.FeedEndpointHandler.FeedRequestHandler.ProcessRequestAsync(HttpContext context) in C:\InedoAgent\BuildMasterTemp\192.168.44.60\Temp_E75101\Src\ProGet.WebApplication\FeedEndpoints\FeedEndpointHandler.cs:line 192
    ClientConnectionId:631e8659-835c-43aa-8815-79f9c5633cf5
    Error Number:-2,State:0,Class:11

    Some other errors in the container logs:
    An error occurred processing a GET request to http://proget.xxx.io/nuget/xxx-nuget/v3/flatcontainer/system.memory/index.json: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

    An error occurred processing a GET request to http://proget.xxx.io/nuget/xxx-nuget/v3/flatcontainer/microsoft.extensions.primitives/index.json: ExecuteNonQuery requires an open and available Connection. The connection's current state: Broken.

    Can you please investigate what we can do about it? I would rather the restore take longer than get all these timeout errors.

    Note that this is with ProGet Core. Also, maybe there are more efficient ways to determine whether a package exists before querying the DB. Cache options?
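
    To make the cache idea concrete, here is a minimal sketch of a negative cache: it remembers package IDs that were recently found to be missing, so repeated lookups for them can be answered without a database query. This is only an illustration of the concept. The class and method names are hypothetical, and nothing here reflects how ProGet actually works internally.

```python
import time


class NegativeCache:
    """Remembers package IDs recently found to be missing (hypothetical sketch)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._misses = {}            # lower-cased package id -> expiry time

    def is_known_missing(self, package_id):
        key = package_id.lower()
        expiry = self._misses.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Entry is stale: drop it so the next lookup goes to the DB again.
            del self._misses[key]
            return False
        return True

    def record_miss(self, package_id):
        self._misses[package_id.lower()] = self._clock() + self._ttl

    def forget(self, package_id):
        # Would be called when a package with this ID is published.
        self._misses.pop(package_id.lower(), None)


cache = NegativeCache(ttl_seconds=60)
cache.record_miss("System.Diagnostics.Process")
print(cache.is_known_missing("system.diagnostics.process"))
```

    The expiry time and the explicit forget step are there so that a newly published package does not stay hidden behind a stale "missing" entry.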

    PS: While this is happening, the ProGet web UI becomes unresponsive and keeps loading until the requests are completed.

    Thank you


  • inedo-engineer

    Hi @nuno-guerreiro-rosa_9280,

    Do you have any other applications connecting to your SQL Server Enterprise instance? Do you see any timeouts in those applications? Would you be able to run docker stats while these timeouts are occurring? Could you also increase the timeout in ProGet's SQL Server connection string?

    Also, what version of SQL Server are you running?

    Thanks,
    Rich
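
    As an illustration of the connection-string change suggested above, the sketch below sets the Connect Timeout keyword (a standard SQL Server connection-string keyword, in seconds) on an existing connection string. The helper name, server, database, and credentials are placeholders, and where the connection string lives in a given ProGet installation is not covered here.

```python
def set_connection_option(conn_str, key, value):
    """Return conn_str with key set to value, replacing any existing entry.

    Keys are compared case-insensitively. This naive split on ';' does not
    handle quoted values that themselves contain semicolons.
    """
    parts = [p.strip() for p in conn_str.split(";") if p.strip()]
    kept = [p for p in parts if p.split("=", 1)[0].strip().lower() != key.lower()]
    kept.append(f"{key}={value}")
    return ";".join(kept)


# Placeholder connection string; every value in it is made up.
base = "Data Source=sql-host;Initial Catalog=ProGet;User ID=proget;Password=example"
print(set_connection_option(base, "Connect Timeout", "60"))
```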



  • @rhessinger I do not have any other applications connected to the SQL instance. Only ProGet.
    SQL Server is running from the "mcr.microsoft.com/mssql/server:2019-latest" image.

    As you can see below there is no load at all on the server.

    4233a975-74eb-43c3-867a-8b306cfa7e3a-image.png

    If I look at SQL Server connections, there is almost no load in terms of the number of connections.
    Increasing the connection timeout made no real difference. I have also profiled SQL Server, and all queries are relatively fast, so there are no slow queries either.

    b6f2e706-4461-43a7-94ac-471c10c79743-image.png

    This clearly only happens during a full dotnet restore, because of the many parallel requests. If I throttle to 1 request at a time, there are no issues in ProGet. However, it doesn't make sense that a single user running a single dotnet restore triggers these timeouts and also makes the UI unresponsive.

    ca4ce030-d4e6-462f-b42b-ca988f9cc08a-image.png


  • inedo-engineer

    Hello; this is definitely quite strange.

    We haven't had any problems in our test labs, using significantly less-powerful hosts and significantly more traffic. Other users aren't reporting this problem on any platform, so I'm inclined to say it's configuration-related, but what configuration?

    How is ProGet configured? If it's just a single feed and a single connector to that feed, then it's not your ProGet configuration.

    In any case, I'm certain it's not the database itself; it's related to the network stack. SQL Server uses network connections, and the "connection timeout" happens when the network stack gets overwhelmed. This can happen when a ton of connections are opened but not closed.

    One thing we've seen is that certain network-based reporting tools (monitoring/logging) end up causing problems. They try to send an error over the network saying that the stack is overloaded, which then gets queued up and continues to overload the stack. Eventually it calms down.

    We've also seen bad hardware cause this. One time, it was even a bad wireless access point. No idea how that happened, but it had something to do with routing and packets.

    So perhaps try a new server, such as a totally fresh one at LightSail. If you can find a way to reproduce it, that would be good, because then we can at least investigate it further.



  • @atripp It's only a single connector with one feed. Try a single restore that pulls 1 GB of NuGet packages, of which only 5 MB come from ProGet. NuGet will still hit ProGet for all of those requests, and 90% of the responses will say that the package was not found. When I have some free time I will create a completely different setup on another Linux server to verify whether it's something with the Traefik proxy. I would like you to attempt this type of restore on your test server, with the NuGet cache cleared, and tell me your results: restore time in seconds, whether the ProGet UI loads during the restore, etc.


  • inedo-engineer

    As I mentioned before, we haven't seen these sorts of issues, but NuGet is a very "chatty" protocol, so a lot of requests are to be expected.

    If you can provide some guidance on how to reproduce this, we can certainly consider trying to reproduce it. But right off the bat, unless "1GB of NuGet packages" in a restore was a typo, that's kind of a red flag to me.

    The "chatty" protocol was never designed for that sort of traffic (tens of megabytes in a restore, maybe), so you'll need to do some network tweaks (QoS?) so that the network stack doesn't get overloaded (which is what it sounds like is happening).



  • @atripp I did not mean 1 GB of packages from ProGet; it was just an example. I get them from public NuGet without a problem, since it can handle the load. In a docker build there is no package cache, only the Docker layer cache, so every time a csproj is modified a full dotnet restore is performed and I might download 100+ packages. Of those 100+ packages, around 5 are actually in ProGet. 100 requests (I think NuGet performs 16 concurrent requests at a time) is far too few to overload any network stack. All I was asking was for you to test a restore of 100+ packages (NuGet.Config with a public NuGet feed and an authenticated ProGet feed) where only 5-10 of those packages are in ProGet. The other 90+ requests will return "package not found" or something similar. Maybe the not-found requests are the ones causing the problem?
    If you have tested similar scenarios and had no issues, then I might have a problem with my configuration. I will set up a new server, this time with NGINX. Is it possible to use PostgreSQL with ProGet instead of SQL Server, or is it no longer supported? Thank you for your help.
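
    The scenario described above could be approximated without a full dotnet restore by a small probe like the following sketch. It requests the flat-container index for a list of package IDs with up to 16 requests in flight (the figure guessed at above) and tallies the status codes. The URL shape is copied from the error logs earlier in this thread; the function names and the example feed URL are hypothetical, feed authentication is not handled, and it is an assumption that missing packages come back as 404.

```python
import concurrent.futures
import urllib.error
import urllib.request
from collections import Counter


def flatcontainer_url(feed_base, package_id):
    """Build the v3 flat-container index URL, following the URL shape in the logs."""
    return f"{feed_base.rstrip('/')}/v3/flatcontainer/{package_id.lower()}/index.json"


def http_status(url, timeout=30.0):
    """Return the HTTP status of a GET, or 0 if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code          # e.g. 404 or 504
    except OSError:
        return 0                 # connection failure or client-side timeout


def probe(feed_base, package_ids, fetch=http_status, max_workers=16):
    """Fetch every package index concurrently and count responses per status."""
    urls = [flatcontainer_url(feed_base, pid) for pid in package_ids]
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return Counter(pool.map(fetch, urls))


# Offline demonstration with a stand-in fetch function (no network traffic).
def fake_fetch(url):
    return 200 if "/my.internal.lib/" in url else 404


ids = ["My.Internal.Lib", "System.Memory", "System.Text.Json"]
print(probe("https://proget.example/nuget/my-feed", ids, fetch=fake_fetch))
```

    Calling probe without the fetch argument sends real requests, so a run against an actual feed with about 100 IDs, most of them absent from the feed, would show whether 504 responses appear at this level of concurrency.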


  • inedo-engineer

    @nuno-guerreiro-rosa_9280 we have definitely tested similar configurations, and of course our customers have such usage all the time; there haven't been problems like this, and moving to SQL Server has significantly improved performance across the user base (I'm afraid Postgres is not supported).

    Do note that when you have connectors configured in ProGet, almost every request to ProGet will yield further network requests to those connectors. When NuGet builds a dependency tree with 100+ packages, it makes a tremendous number of requests, often asking "what's the latest version of this package" and the like.

    But anyway, it still should be okay. At this point, I'd recommend just setting up a basic virtual machine at, say, AWS LightSail, and seeing what you can reproduce.

