ProGet 2024.3 SQL Timeouts
-
We occasionally experience SQL timeouts with ProGet, which is set up in HA (3 nodes). These timeouts occur randomly, sometimes under heavy load, but also during lighter loads. Surprisingly, the CPU usage of the SQL server remains low, so it doesn't seem to be struggling. What is worrying is that when we profiled our SQL instance, we observed that ProGet is sending thousands of 'Attention' events per minute. Is this expected?
We experienced the same problem with version 2023.31 and recently updated to 2024.3, hoping it would solve the issue, but unfortunately, it still occurs.
Example Npm exception:
```
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): The wait operation timed out.
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ThrowExceptionAndWarning(Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadByteArray(Span`1 buff, Int32 len, Int32& totalRead)
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadPlpBytes(Byte[]& buff, Int32 offset, Int32 len, Int32& totalBytesRead)
   at Microsoft.Data.SqlClient.SqlDataReader.TryGetBytesInternal(Int32 i, Int64 dataIndex, Byte[] buffer, Int32 bufferIndex, Int32 length, Int64& remaining)
   at Microsoft.Data.SqlClient.SqlDataReader.GetBytes(Int32 i, Int64 dataIndex, Byte[] buffer, Int32 bufferIndex, Int32 length)
   at Inedo.Data.StrongDataReader.ReadBytes(DbDataReader reader, Int32 ordinal, String propertyName, BufferData bufferData)
   at lambda_method55(Closure, DbDataReader, BufferData)
   at Inedo.Data.StrongDataReader.<Read>g__readRow|11_0[TRow](<>c__DisplayClass11_0`1&)
   at Inedo.Data.StrongDataReader.Read[TRow](IDbDataResult dbResult)+MoveNext()
   at Inedo.Data.StrongDataReader.Read[TRow](Func`1 getReader, Boolean disposeReader)+MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Inedo.ProGet.WebApplication.FeedEndpoints.Npm.NpmPackageMetadataHandler.TryProcessRequestAsync(AhHttpContext context, WebApiContext apiContext, NpmFeed feed, String relativeUrl)
   at Inedo.ProGet.WebApplication.FeedEndpoints.Npm.NpmHandler.ProcessRequestAsync(AhHttpContext context, WebApiContext apiContext, NpmFeed feed, String relativeUrl)
   at Inedo.ProGet.WebApplication.FeedEndpoints.FeedEndpointHandler.FeedRequestHandler.ProcessRequestAsync(AhHttpContext context)
ClientConnectionId:8e300222-5b4c-40cb-8100-1334cf7d17db
Error Number:-2,State:0,Class:11
```
Example NuGet exception:
```
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): The wait operation timed out.
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ThrowExceptionAndWarning(Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadByteArray(Span`1 buff, Int32 len, Int32& totalRead)
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadPlpBytes(Byte[]& buff, Int32 offset, Int32 len, Int32& totalBytesRead)
   at Microsoft.Data.SqlClient.SqlDataReader.TryGetBytesInternal(Int32 i, Int64 dataIndex, Byte[] buffer, Int32 bufferIndex, Int32 length, Int64& remaining)
   at Microsoft.Data.SqlClient.SqlDataReader.GetBytes(Int32 i, Int64 dataIndex, Byte[] buffer, Int32 bufferIndex, Int32 length)
   at Inedo.Data.StrongDataReader.ReadBytes(DbDataReader reader, Int32 ordinal, String propertyName, BufferData bufferData)
   at lambda_method39(Closure, DbDataReader, BufferData)
   at Inedo.Data.StrongDataReader.ReadAllAsync[TRow](IDbDataResult dbResult)
   at Inedo.Data.DatabaseContext.ExecuteTableAsync[TRow](String storedProcName, GenericDbParameter[] parameters)
   at Inedo.Data.DatabaseContext.ExecuteTableAsync[TRow](String storedProcName, GenericDbParameter[] parameters)
   at Inedo.ProGet.Feeds.NuGet.NuGetFeed.GetPackageV3Async(String packageId, PackageVersion`1 nuGetVersion)
   at Inedo.ProGet.WebApplication.FeedEndpoints.NuGet.V3.PackageMetadataHandler.ProcessRequestAsync(NuGetFeed feed, AhHttpContext context, WebApiContext apiContext, String relativeUrl)
   at Inedo.ProGet.WebApplication.FeedEndpoints.NuGet.NuGetFeedHandler.ProcessRequestAsync(AhHttpContext context, WebApiContext apiContext, NuGetFeed feed, String relativeUrl)
   at Inedo.ProGet.WebApplication.FeedEndpoints.FeedEndpointHandler.FeedRequestHandler.ProcessRequestAsync(AhHttpContext context)
ClientConnectionId:bf5213a4-e81f-4eb1-80ff-5cd43685e3f0
Error Number:-2,State:0,Class:11
```
Regards,
Pawel
-
Unfortunately this is going to be tricky to troubleshoot, but in general a timeout will occur because SQL Server is taking too long to respond to a query. This is typically related to query performance, but it can also be caused by network issues.
Looking at the stack traces you shared, these are very basic, highly performant queries that should return a single row almost instantly. There is nothing that should be slowing those queries down, which means we can probably rule out query performance.
The fact that the timeouts occur randomly, during both heavy and lighter loads, is also an indicator that it's probably not performance related, and points toward network issues.
And finally, the fact that you profiled and didn't see any problems is a good sign this is not performance related. That's my current thinking at least.
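One thing that may help frame it: the timeout in those stack traces is enforced by the client driver, not by SQL Server itself. SqlClient gives up after its CommandTimeout (30 seconds by default), cancels the in-flight request, and reports error number -2, so low CPU on the SQL Server doesn't contradict a timeout at all. Here's a minimal sketch that reproduces the same error signature against a test server (the connection string is made up, and this is not ProGet's actual code):

```csharp
using System;
using Microsoft.Data.SqlClient;

// Made-up connection string -- point this at a test instance, not production.
const string connectionString =
    "Server=tcp:sql.example.local;Database=TestDb;Integrated Security=true;TrustServerCertificate=true";

using var connection = new SqlConnection(connectionString);
connection.Open();

// A query that deliberately takes longer than the command timeout.
using var command = new SqlCommand("WAITFOR DELAY '00:00:10';", connection);
command.CommandTimeout = 5; // seconds; SqlClient defaults to 30

try
{
    command.ExecuteNonQuery();
}
catch (SqlException ex) when (ex.Number == -2)
{
    // Same signature as the errors above: the client stopped waiting, cancelled the
    // in-flight request (the "attention"), and surfaced "Execution Timeout Expired".
    Console.WriteLine(ex.Message);
}
```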
Otherwise, I'm not familiar with "Attention" events. That must be something done internally by the driver? The SQL Server docs say this:
The Attention event class indicates that an attention event, such as cancel, client-interrupt requests, or broken client connections, has occurred. Cancel operations can also be seen as part of implementing data access driver time-outs.
I'm not sure what that means to be honest, but I found this article that might be of use:
https://www.red-gate.com/simple-talk/blogs/identifying-client-timeouts/

I think it would be worth getting Microsoft's SQL Server team involved, because we really don't know where to go from here, and they would have more experience troubleshooting this (especially if it's network-related) than we do.
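If it's useful, those attention events can also be watched with a lightweight Extended Events session instead of a full Profiler trace, which should at least show which statements are being cancelled and from which client. This is just a rough sketch under assumptions (made-up connection string and session name; the login needs ALTER ANY EVENT SESSION and VIEW SERVER STATE), not anything ProGet-specific:

```csharp
using System;
using Microsoft.Data.SqlClient;

// Made-up connection string; needs ALTER ANY EVENT SESSION + VIEW SERVER STATE.
const string connectionString =
    "Server=tcp:sql.example.local;Database=master;Integrated Security=true;TrustServerCertificate=true";

// Recreate a session that captures attention events (client cancels/timeouts)
// along with the client app name, host, and the statement that was cancelled.
const string dropSession = @"
IF EXISTS (SELECT 1 FROM sys.server_event_sessions WHERE name = N'client_attentions')
    DROP EVENT SESSION [client_attentions] ON SERVER;";

const string createSession = @"
CREATE EVENT SESSION [client_attentions] ON SERVER
ADD EVENT sqlserver.attention(
    ACTION(sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.sql_text))
ADD TARGET package0.ring_buffer
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);";

const string startSession =
    "ALTER EVENT SESSION [client_attentions] ON SERVER STATE = START;";

// Read whatever the ring buffer has captured so far, as raw XML.
const string readRingBuffer = @"
SELECT CAST(t.target_data AS xml)
FROM sys.dm_xe_sessions s
JOIN sys.dm_xe_session_targets t ON t.event_session_address = s.address
WHERE s.name = N'client_attentions' AND t.target_name = N'ring_buffer';";

using var connection = new SqlConnection(connectionString);
connection.Open();

foreach (var sql in new[] { dropSession, createSession, startSession })
{
    using var command = new SqlCommand(sql, connection);
    command.ExecuteNonQuery();
}

// ...let the session run while the timeouts are happening, then inspect the events.
using var read = new SqlCommand(readRingBuffer, connection);
Console.WriteLine(read.ExecuteScalar() as string ?? "(no attention events captured yet)");
```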
Thanks,
Alana
-
Hi!
I just wanted to say that we have the same issue, running anything from 2023.31 to 2024.4, in Docker on Azure Container Apps (Kubernetes). We run 4 instances with 3 cores and 6 GB memory as a minimum, scaling to a maximum of 8. We are running Azure SQL Database with a private endpoint (PE) in the same vNet as the Container Apps environment, so it is not the connection between the container and the SQL server.
I have a case open for this issue, but I was told that only one other user had run into it and that it was fixed with a downgrade; now there are three of us with the same problem.
Hopefully a solution can be found.
-
Hi @mortenf_3736 ,
I also mentioned this in your ticket, but the issue you're experiencing is a bit different. In Pawel's case, the error message is different (though both are database-related), and it was happening in both ProGet 2023 and ProGet 2024 (yours happened only after upgrading). In addition, his error was happening randomly (high and low traffic), whereas yours seems to happen only under high traffic.
You're also running on ACA and use auto-scaling, and seem to have a very high occurrence of container stop/starts. Anyway, we will continue to troubleshoot your issue in that ticket.
Thanks,
Alana
-
Just wanted to give a brief update on the issue from @mortenf_3736 that we discussed via the support ticket.
We were never able to get to the bottom of it, but it was entirely related to running on ACA.
| Container Setup | VM Setup |
| --- | --- |
| Azure Container Apps - running 4 minimum replicas, max 8 replicas, each with 3 cores and 6 GB memory | Two D4 (4 cores, 16 GB memory) VMs with an internal load balancer in front of them |
| Mapped storage from Azure Storage Account v2 Fileshare | Managed Premium Shared Disk, connected to a Fail-over Cluster and shared between both VMs |
| Azure SQL Database with 4 dedicated cores | Azure SQL Database serverless, scaling between 0.5 and 4 cores |

This doesn't seem to be related to Linux/Docker/Kubernetes, as we have several high-traffic users on Kubernetes clusters without issues like this. However, we have seen a handful of Azure-related problems over the years that manifested in ProGet:
- a bad hard drive that was constantly corrupting packages
- another had really slow disk I/O (one server out of a handful)
- we've seen a buggy storage driver cause some big impacts with a certain storage configuration
So we believe the issue is the Azure platform itself, similar to the hardware/software glitches above that we've uncovered in the past.
We're doing our best to research and identify issues, and Inedo/ProGet users aren't the only ones experiencing pain like this. Consider this report from an Azure "big data" user:
I have suffered from chronic socket exceptions in multiple Azure platforms - just as everyone else is describing. The main pattern I've noticed is that they happen within Microsoft's proprietary VNET components (private endpoints). They are particularly severe in multi-tenant environments where several customers can be hosted on the same hardware (or even in containers within the same VM).
The problems are related to bugs in Microsoft's software-defined-networking components (SDN proxies like "private endpoints" or "managed private endpoints"). I will typically experience these SDN bugs in a large "wave" that impacts me for an hour and then goes away. The problems have been trending for the worse over the past year, and I've opened many tickets (Power BI, ADF, and Synapse Spark).

Other Azure users (who are much more technical than we are) have confirmed that there are indeed severe issues with their SDN infrastructure. Microsoft does appear to be aware of these endemic issues with their platform, and for the time being we simply cannot recommend using Azure's container services for anything that will have any kind of load.
Hope that gives some insight in case anyone stumbles across this thread.
Alana