In one of our OSB process flows we make a service callout to a JBoss-hosted service. Requests hit the load balancer and are then routed to one of several channels, each consisting of a front-end Apache server with a JBoss application server behind it. During testing, some requests were serviced properly while others hung and never received a reply from the service. As a result, the synchronous requests failed intermittently, and OSB did not throw any error back. This was puzzling, because there was no trace of a request once it had been handed to the underlying layer.
I could not work out why the error was not being propagated back to the caller. After 10 minutes, however, the log files filled up with stuck-thread errors, raised once the requests had run longer than the StuckThreadMaxTime configured in WebLogic.
Error stack trace from log file
HTTP/1.1
]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
Thread-1043 "[STUCK] ExecuteThread: '28' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, waiting, priority=1, DAEMON> {
-- Waiting for notification on: java.lang.Object@2a1d4ae8[fat lock]
java.lang.Object.wait(Object.java:485)
com.bea.wli.sb.pipeline.PipelineContextImpl$SynchronousListener.waitForResponse(PipelineContextImpl.java:1563)
com.bea.wli.sb.pipeline.PipelineContextImpl.dispatchSync(PipelineContextImpl.java:525)
stages.transform.runtime.WsCalloutRuntimeStep$WsCalloutDispatcher.dispatch(WsCalloutRuntimeStep.java:1385)
stages.transform.runtime.WsCalloutRuntimeS
<Mar 21, 2013 12:23:33 PM PDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "631" seconds working on the request "weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@29b37d16", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
Thread-189 "[STUCK] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {
jrockit.net.SocketNativeIO.readBytesPinned(SocketNativeIO.java:???)
jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:24)
java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)
java.net.SocketInputStream.read(SocketInputStream.java:107)
weblogic.utils.io.ChunkedInputStream.read(ChunkedInputStream.java:149)
java.io.InputStream.read(InputStream.java:82)
com.certicom.tls.record.ReadHandler.readFragment(Unknown Source)
com.certicom.tls.record.ReadHandler.readRecord
Analysis
Following the Oracle documentation, we rechecked the Read Timeout and Connection Timeout values of the business service.
Parameters for Configuring HTTP Transport for Business Service
Parameter | Description
Read Timeout | Enter the read timeout interval in seconds. A zero (0) value indicates no timeout.
Connection Timeout | Enter the connection timeout interval in seconds. If the timeout expires before the connection can be established, Oracle Service Bus raises a connection error. A zero (0) value indicates no timeout.
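To see why a zero read timeout matters, the behaviour can be reproduced outside OSB with plain JDK sockets. The sketch below is only an illustration and not OSB code: the class name ReadTimeoutDemo and the method readTimesOut are made up for this example, and the "silent" listener stands in for a backend that accepts the connection but never answers. With a non-zero read timeout the caller gets a SocketTimeoutException it can handle; with 0 the read would block indefinitely, which is where the socketRead0 frame in the stack trace above was waiting.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Connects to the target and waits for the first byte of a reply.
     * Returns true if the read timed out, false if data or end-of-stream arrived.
     * As with java.net.Socket in general, a timeout of 0 ms means "wait forever".
     */
    static boolean readTimesOut(InetSocketAddress target, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(target, connectTimeoutMs); // connection timeout
            socket.setSoTimeout(readTimeoutMs);       // read timeout
            try {
                socket.getInputStream().read();       // blocks until data, EOF, or timeout
                return false;
            } catch (SocketTimeoutException e) {
                return true;                          // the caller regains control here
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a hung backend: the OS completes the TCP
        // handshake, but nothing is ever written back to the client.
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            InetSocketAddress target =
                    new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort());
            boolean timedOut = readTimesOut(target, 1000, 500);
            System.out.println("read timed out: " + timedOut);
        }
    }
}
```

Passing 0 as the read timeout in the same call would leave the calling thread blocked for as long as the listener stays open, which is the situation the stuck-thread log entries describe.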
Current configuration
Conclusion
If service callouts are used and the service is reached over HTTP or HTTPS, always set the service timeouts to an appropriate non-zero value. The value should be less than the default stuck-thread timeout of 600 seconds, so that the proxy can handle the problem itself rather than leaving the request to the server's stuck-thread handling. This needs to be done for each business service, based on the requirements of that service.
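The rule above can be written down as a small check, for example to review timeout values before deployment. This is a sketch under stated assumptions and not an OSB or WebLogic API: the class CalloutTimeoutCheck and its method are hypothetical, the 600 seconds is the StuckThreadMaxTime seen in the log above, and adding the connection and read timeouts together is my own conservative estimate of the worst-case wait for a single attempt.

```java
public class CalloutTimeoutCheck {

    // StuckThreadMaxTime as reported in the log entries above (seconds).
    static final int STUCK_THREAD_MAX_TIME_SECONDS = 600;

    /**
     * Returns true when both timeouts are set (non-zero) and their sum stays
     * below the stuck-thread threshold. Zero means "no timeout" and is rejected.
     */
    static boolean isSafe(int connectionTimeoutSeconds, int readTimeoutSeconds,
                          int stuckThreadMaxTimeSeconds) {
        if (connectionTimeoutSeconds <= 0 || readTimeoutSeconds <= 0) {
            return false; // 0 disables the timeout, so the thread could wait forever
        }
        // Assumption: worst case for one attempt is connect wait plus read wait.
        return connectionTimeoutSeconds + readTimeoutSeconds < stuckThreadMaxTimeSeconds;
    }

    public static void main(String[] args) {
        // Example values only; choose them per business service.
        System.out.println(isSafe(30, 60, STUCK_THREAD_MAX_TIME_SECONDS));  // true
        System.out.println(isSafe(30, 0, STUCK_THREAD_MAX_TIME_SECONDS));   // false
        System.out.println(isSafe(300, 300, STUCK_THREAD_MAX_TIME_SECONDS)); // false
    }
}
```

If retries are configured on the business service, the total wait can be longer than one attempt, so the values would need to be chosen with that in mind.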