Threading/Periodic Worker Processes
Thank you for all the details.
For clarification from my side, the two PeriodicWorker.Process threads that you are seeing look precisely as expected.
I am somewhat surprised by the threads near the bottom of your picture (Managed Ids 15, 16, 17), as they seem to have something to do with Visual Studio design-time mode - I have not seen them before. But I suppose you know what you are doing/what is happening there.
I think it would definitely be worth testing .NET Framework 4.8.4110.0 or 4.8.3928.0 (those from the good machines) on the bad machine.
I also suggest making absolutely sure that the problem is not actually related to the instrumentation (which is partly in place to troubleshoot the problem, but was probably there before as well). I have seen a situation where a lock somewhere inside the .NET tracing infrastructure caused mysterious problems. Note that our code contains some tracing calls as well - although the traces normally do not end up anywhere - so there can be contention. Try removing all existing external-facing instrumentation except for some simple test for the occurrence of the problem, and check whether the problem persists.
The ideas above are just that - ideas, wild guesses. But I think they are quite important to test before going any further. Plus, it is not clear to me how to proceed afterwards. If I had a reproducible scenario here, I could perhaps turn on/off parts of the code that gets executed as a consequence of "new EasyUAClient()", and search for the part that triggers the issue. That would be very tedious, but in principle doable. I cannot imagine doing it remotely, though. In addition, it is possible that the outcome of such a troubleshooting process would be that the problem is triggered by a call into a lower-level library - a call that we need to make, but whose internals we cannot influence.
Unfortunately, Windows and .NET do not provide real-time execution guarantees, so what you are observing still falls under "normal behavior" from that perspective. Normally this is handled by making the necessary intermediate queues long enough to cope with the situations encountered in the wild; I am not sure whether that is a possibility in your case.
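If lengthening the hardware queue itself is not an option, one possibility might be to keep the interrupt-driven handler minimal and push each pulse into a bounded in-process buffer that a dedicated thread drains. This is only a sketch of that pattern - the type and member names (PulseRecord, OnSyncPulse, the buffer capacity) are hypothetical, not your actual code:

using System;
using System.Collections.Concurrent;
using System.Threading;

sealed class PulseRecord
{
    public long Counter;
    public DateTime IrigTime;
}

static class PulseBuffer
{
    // Bounded, so a stalled consumer shows up as a counted overflow instead of unbounded growth.
    static readonly BlockingCollection<PulseRecord> _queue =
        new BlockingCollection<PulseRecord>(boundedCapacity: 4096);

    static long _overflows;

    // Called from the interrupt-driven event handler; must return quickly and never block.
    public static void OnSyncPulse(long counter, DateTime irigTime)
    {
        if (!_queue.TryAdd(new PulseRecord { Counter = counter, IrigTime = irigTime }))
            Interlocked.Increment(ref _overflows);
    }

    // Drains the buffer on a dedicated thread; the slow work (file logging etc.) happens here.
    public static void StartConsumer()
    {
        var thread = new Thread(() =>
        {
            foreach (PulseRecord record in _queue.GetConsumingEnumerable())
            {
                // Process/log the record here, off the event handler thread.
            }
        });
        thread.IsBackground = true;
        thread.Start();
    }
}

This does not remove the stall itself, of course - it only gives the application more room to ride it out.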
If it is not related to garbage collection, and is not something we can reproduce locally, it will be difficult to diagnose. So I will have some more questions and perhaps some suggestions, but they are just meant to obtain more clarity and perhaps trigger an idea that could lead us in the right direction.
1. Why precisely is the issue titled "Threading/Periodic Worker Processes"? In the original report, there is only one sentence that appears related to this, saying "Ultimately, we are looking to get information on the possible threading/periodic worker processes that could be having an effect on this.". We have a class (which runs a thread in its instance) called PeriodicWorker. Are you referring to this, i.e. do you see some (or many) threads named "OpcLabs.PeriodicWorker.Process"? If so, how does it look in a "normal" situation, and how does it look when the problem occurs (if that can be determined at that time)?
2. Is the general CPU usage on the computer, and CPU usage by the application, "good" (low)?
3. The architecture ("block diagram") of the whole system is not fully clear to me. But I assume that whatever dequeues the entries from the "event status queue" in the .NET application must be running on some thread. How is this thread created? Is it created with "new Thread()"? Or is it a thread from the .NET thread pool, perhaps a Task? Or something else?
4. Can you obtain the actual full .NET Framework version being used by the application on the "good" and "bad" machines? My suggestion is to read and capture System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription from inside the app (see the sketch after this list).
5. Have you tried to break into the debugger when the problem occurs and look for suspicious symptoms? Mainly, do the number and "structure" of managed threads differ significantly when the problem occurs? (This more or less repeats a part of question #1, but is more general.)
6. Does the issue appear multiple times during a single run of the application, if you let it run long enough - or just once? (If it happens only once, it may indicate an issue with .NET JIT compilation.)
7. In the original report, you wrote "When they removed the call to client = new EasyUAClient(), the problems stopped occurring.". Does this mean that adding "client = new EasyUAClient()", *without any further operation on that 'client' object*, is enough to introduce the problem? Or did you actually mean that besides "client = new EasyUAClient()", a whole set of new functionality was consequently also enabled inside the app?
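For question #4, a minimal sketch of capturing this from inside the app - where the values get written is an assumption on my part, console output is just a placeholder:

using System;
using System.Runtime.InteropServices;

static class RuntimeInfoDump
{
    // Reports the runtime that is actually executing the process, not the target framework.
    public static void Log()
    {
        Console.WriteLine("Runtime: " + RuntimeInformation.FrameworkDescription);
        Console.WriteLine("CLR:     " + Environment.Version);
        Console.WriteLine("OS:      " + RuntimeInformation.OSDescription);
        Console.WriteLine("Process: " + (Environment.Is64BitProcess ? "64-bit" : "32-bit"));
    }
}

Capturing this on both a "good" and a "bad" machine, from the running application itself, would answer question #4 exactly.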
Regards
I also monitored Garbage Collection using Perfmon and see no issues with it.
Do you have any other suggestions?
There can be various reasons, but the main suspect is the .NET Garbage Collector (GC). In its default behavior, it kicks in at unpredictable times, stops all other threads in the running process, performs memory cleanup, and then resumes the normal execution of the program. Pulling in our library (well, any library, or adding any code) can change the memory usage patterns and therefore the GC invocations - which is nobody's fault, but can still have a negative effect.
Please determine which .NET or .NET Framework version they are running the program under (because the GC behavior depends on that).
It is important to know the version that is truly running, and not just the version they are targeting. Example: they may be targeting .NET Framework 4.7.2, but running under .NET Framework 4.8. Or, they may be targeting .NET 6, but running under .NET 7.
In order to determine whether the above suspicion is correct, please ask the customer to correlate the occurrences of the problem with the times when the .NET GC runs. Tools such as PerfMon (and others) can help with that; Microsoft has all the documentation. Based on the outcome, we can decide on further steps. If the culprit is truly the GC, there are some (limited) ways to influence its behavior (and they depend greatly on the .NET runtime version).
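As one way to do the correlation from inside the process (just a sketch; the sampling period and the console output are placeholders - the real application would write next to its existing pulse log):

using System;
using System.Threading;

static class GcSampler
{
    static Timer _timer;   // held in a field so the timer is not collected

    // Periodically samples the GC collection counts; a jump (especially in gen2) around the
    // time of an overflow would point towards the GC suspicion above.
    public static void Start(TimeSpan period)
    {
        _timer = new Timer(_ =>
            Console.WriteLine("{0:O} gen0={1} gen1={2} gen2={3}",
                DateTime.UtcNow,
                GC.CollectionCount(0),
                GC.CollectionCount(1),
                GC.CollectionCount(2)),
            null, TimeSpan.Zero, period);
    }
}

If the GC does turn out to be the culprit, the "limited ways" I mean are things like GCSettings.LatencyMode (e.g. GCLatencyMode.SustainedLowLatency) or switching between workstation and server GC - which of them are available and advisable depends on the exact runtime version.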
Best regards
I am currently travelling; please allow some time before I can answer.
Best regards
Ultimately, we are looking to get information on the possible threading/periodic worker processes that could be having an effect on this.
A PCIe-C1553 card from Avionics Interface Technologies (AIT) is installed. This PCIe card is connected to the customer’s Time Server to read IRIG time and to receive Sync Pulses at 100 Hz. The PCIe card receives a pulse every 10 ms and communicates with the application via an interrupt-driven event handler. The event handler reads the IRIG time and updates a pulse counter used for application synchronization. If the events are not handled in a timely manner, the event status queue will overflow. The event status queue has a depth of 256.
The version is OPC Data Client v2022.1, though the user is in the process of testing with the latest release. The user has been successfully running their application on 5 different PCs, using Windows 10 Enterprise version 1607 (OS Builds 14393.1198 and 14393.447) and Enterprise version 1703 (OS Builds 15063.726 and 15063.502).
They ran into problems when they tried to run the application on a new PC running Windows 10 Enterprise version 22H2 (OS Build 19045.3996). After a variable amount of time, the event status queue for the incoming Sync Pulses overflows. After approximately 2.5 seconds, the event handler resumes receiving Sync Pulse events. They were able to determine this because the Sync Pulse counter and the IRIG time are sent to a queue in the event handler to be logged to a file in another task. They also log when the status queue overflows.
To determine the root cause of the Sync Pulse interrupt problem, the user has tried to isolate the various components of the application code. Having only the AIT IRIG and Sync Pulse portion of the code running, along with some basic logging, does not present any problems.
When they removed the call to client = new EasyUAClient(), the problems stopped occurring. They were able to confirm that when using ClientAce in place of OPC Data Client, the application does not exhibit any problems either.
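For reference, the isolation test can be reduced to a stand-alone sketch along these lines (hypothetical code, not the customer's actual application; it assumes the OpcLabs.EasyOpc.UA namespace for EasyUAClient): a 10 ms loop stands in for the 100 Hz pulses, and the longest observed gap is logged, with and without the client instantiation.

using System;
using System.Diagnostics;
using System.Threading;
using OpcLabs.EasyOpc.UA;

class StallProbe
{
    static void Main()
    {
        // Toggle to compare runs with and without the client object present.
        bool createClient = true;
        EasyUAClient client = createClient ? new EasyUAClient() : null;

        var stopwatch = Stopwatch.StartNew();
        long previousMs = 0, worstGapMs = 0;

        while (stopwatch.Elapsed < TimeSpan.FromMinutes(30))
        {
            Thread.Sleep(10);   // stands in for the 100 Hz sync-pulse event
            long nowMs = stopwatch.ElapsedMilliseconds;
            long gapMs = nowMs - previousMs;
            previousMs = nowMs;

            if (gapMs > worstGapMs)
            {
                worstGapMs = gapMs;
                Console.WriteLine("New worst gap: {0} ms at {1:O}", gapMs, DateTime.UtcNow);
            }
        }

        GC.KeepAlive(client);
    }
}

A gap in the region of 2.5 seconds showing up only in runs with createClient = true, and only on the problematic machine, would match the behavior described above.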