Wednesday, March 27, 2013

OSB synchronous services not returning any error/hanging due to stuck thread issue


In one of our OSB process flows we make a service callout to a JBoss-hosted service. The request hits the load balancer and is then routed to one of several channels. Each channel has a front-end Apache server with a JBoss application server behind it. During testing, some requests were serviced properly while others hung without any reply. The synchronous requests therefore failed intermittently, and OSB did not throw an error back to the caller. This was puzzling, because there was no trace of a request once it had called the underlying layer.
I could not see why the error was not being propagated back to the caller. After about 10 minutes, however, the log files filled with stuck thread errors, logged once a thread had been busy for longer than the StuckThreadMaxTime of 600 seconds configured in WebLogic.


Error stack trace from log file

HTTP/1.1
]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
Thread-1043 "[STUCK] ExecuteThread: '28' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, waiting, priority=1, DAEMON> {
    -- Waiting for notification on: java.lang.Object@2a1d4ae8[fat lock]
    java.lang.Object.wait(Object.java:485)
    com.bea.wli.sb.pipeline.PipelineContextImpl$SynchronousListener.waitForResponse(PipelineContextImpl.java:1563)
    com.bea.wli.sb.pipeline.PipelineContextImpl.dispatchSync(PipelineContextImpl.java:525)
    stages.transform.runtime.WsCalloutRuntimeStep$WsCalloutDispatcher.dispatch(WsCalloutRuntimeStep.java:1385)
    stages.transform.runtime.WsCalloutRuntimeS
Mar 21, 2013 12:23:33 PM PDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "631" seconds working on the request "weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@29b37d16", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
Thread-189 "[STUCK] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {
    jrockit.net.SocketNativeIO.readBytesPinned(SocketNativeIO.java:???)
    jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:24)
    java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)
    java.net.SocketInputStream.read(SocketInputStream.java:107)
    weblogic.utils.io.ChunkedInputStream.read(ChunkedInputStream.java:149)
    java.io.InputStream.read(InputStream.java:82)
    com.certicom.tls.record.ReadHandler.readFragment(Unknown Source)
    com.certicom.tls.record.ReadHandler.readRecord




Analysis
Following the Oracle documentation, we rechecked the Read Timeout and Connection Timeout values on the business service.
Parameters for Configuring HTTP Transport for Business Service

Parameter: Read Timeout
Description: Enter the read timeout interval in seconds. A zero (0) value indicates no timeout.

Parameter: Connection Timeout
Description: Enter the connection timeout interval in seconds. If the timeout expires before the connection can be established, Oracle Service Bus raises a connection error. A zero (0) value indicates no timeout.
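To see why a missing read timeout leaves the caller hanging, here is a minimal sketch in plain Java. It is not OSB code: the class name ReadTimeoutDemo and the silent local "backend" are assumptions made up for illustration, and it uses the standard HttpURLConnection client rather than the OSB HTTP transport. The backend accepts the TCP connection but never sends a response, which is roughly what the caller saw in our case.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class ReadTimeoutDemo {

    /**
     * Calls a local "backend" that accepts the connection but never replies.
     * Returns true if the read timed out, false if a response arrived.
     * Timeouts are in milliseconds here; a value of 0 would mean "wait forever".
     */
    static boolean callTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        // The listening socket lets the OS complete the TCP handshake,
        // but nothing ever reads the request or writes a response.
        try (ServerSocket silentBackend = new ServerSocket(0)) {
            URI uri = URI.create("http://127.0.0.1:" + silentBackend.getLocalPort() + "/service");
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            try {
                conn.getResponseCode(); // blocks until a response arrives or the read times out
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the caller gets an error it can handle
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + callTimesOut(1000, 500));
    }
}
```

With a non-zero read timeout the call should fail fast with a SocketTimeoutException. With a read timeout of 0 the same call would block indefinitely, which is the behaviour that eventually shows up in the server log as a stuck thread.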



Current configuration




Conclusion
If service callouts are used and the service is called over HTTP or HTTPS, always set the business service's read and connection timeouts to appropriate values. They should be less than the default stuck thread time of 600 seconds, so that the proxy can handle the timeout error itself rather than leaving the thread blocked until WebLogic reports it as stuck.
Set these values for each business service according to that service's requirements.
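As a rough way to review the settings across services, the rule above can be written as a small check. This is only a sketch: the class TimeoutBudget and the rule that connection timeout plus read timeout must stay below the stuck thread time are my own assumptions, not anything provided by OSB or WebLogic.

```java
public class TimeoutBudget {

    /** StuckThreadMaxTime in seconds, as reported in the log above. */
    static final int STUCK_THREAD_MAX_TIME_SECONDS = 600;

    /**
     * Returns true if both timeouts are set (non-zero) and together
     * stay below the stuck thread time. Values are in seconds,
     * matching the business service HTTP transport settings.
     */
    static boolean isSafe(int connectionTimeoutSeconds, int readTimeoutSeconds) {
        // Zero means "no timeout", so the thread could block indefinitely.
        if (connectionTimeoutSeconds <= 0 || readTimeoutSeconds <= 0) {
            return false;
        }
        return (long) connectionTimeoutSeconds + readTimeoutSeconds
                < STUCK_THREAD_MAX_TIME_SECONDS;
    }

    public static void main(String[] args) {
        System.out.println(isSafe(0, 0));    // false: no timeouts configured
        System.out.println(isSafe(5, 30));   // true: fails well before 600 seconds
        System.out.println(isSafe(10, 600)); // false: read timeout alone reaches the limit
    }
}
```

The actual values still depend on how long each backend can legitimately take to respond, so the check only catches settings that can never fire before the thread is reported as stuck.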


