Showing posts with label synchronous service. Show all posts

Friday, August 9, 2013

Modelling OSB error handling for synchronous services


Recently I was asked how errors should be handled in OSB. The answer depends on the communication pattern: whether the call is synchronous or asynchronous.
If the services are modelled as synchronous, it is useful to always return a consistent error structure containing, at minimum, the service in which the error occurred, a unique message ID for tracking, a unique error code, and a summary and details of the error. This structure needs to be defined and agreed before the project starts. If the services are mostly asynchronous, it makes sense to have a common error logging/handling/auditing framework that receives the errors and, based on business rules, persists them or sends notifications. In both approaches it is mandatory to define the error message structure. The asynchronous structure can carry more information, whereas the synchronous one can be a subset of it, since the caller or consumer need not be given unnecessary error detail. The synchronous response should be formatted to be human-readable and should not contain a stack trace.
  In OSB there can be multiple proxies, and you need to control and track the response message flow. Instead of replying from many error catch blocks, it is better for each catch block to raise an error with the error payload populated into the body. The global catch block then does a Reply with Success to the calling proxy, with the error payload in the body.
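The flow described above can be sketched outside OSB in plain Python. This is only an illustration of the pattern (stage-level catch blocks raise, one global handler replies), not OSB code; the names `StageError`, `validate_stage` and `handle`, and the `"0"` success status, are assumptions made for the example.

```python
class StageError(Exception):
    """Hypothetical exception that carries the error payload out of a stage."""

    def __init__(self, payload):
        super().__init__(payload.get("Summary", "stage error"))
        self.payload = payload


def validate_stage(request):
    # Stage-level catch block: turn the low-level failure into an
    # error payload and raise it instead of replying from here.
    try:
        if "signature" not in request:
            raise ValueError("missing request signature")
        return request
    except ValueError as exc:
        raise StageError({
            "ResponseStatus": "-1",
            "Code": "VALIDATION_ERROR-ERR103",
            "Summary": "Validation Failure",
            "Details": str(exc),
        }) from exc


def handle(request):
    # Global catch block: the single reply point for the whole flow.
    try:
        validate_stage(request)
        return {"ResponseStatus": "0"}  # assumed success status
    except StageError as err:
        # The caller gets a normal reply whose body is the error payload.
        return err.payload
```

Keeping a single reply point means every error response leaves through the same place, which makes the response flow easier to track across proxies.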


OSB Implementation
The different stages in OSB should be segregated and defined properly, and each stage should have its own error handler (catch block). In each catch block, raise an error with the error payload populated into the body.

Catch the error globally and Reply with success
An error should contain, at minimum, the information listed below, so that the error can be identified and tracked.
ResponseStatus: "-1"
ErrorSource: "Atomic service"
MessageDate: "2013-08-08T15:48:13.013-07:00"
ServiceName: "Validation Service"
OperationName: "validateRequest"
Code: "VALIDATION_ERROR-ERR103"
Summary: "Validation Failure"
Details: "BEA-382515: oops!!! Duplicate Request Signature"
SeverityLevel: "1"
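As a rough sketch, the structure above could be modelled as a small Python type so that every service builds its error response the same way. The class name `ServiceError`, the method `to_response` and the defaults are assumptions for illustration; only the field names in the returned dictionary come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ServiceError:
    """Hypothetical holder for the minimum error information."""
    error_source: str
    service_name: str
    operation_name: str
    code: str
    summary: str
    details: str
    severity_level: str = "1"
    response_status: str = "-1"
    message_date: str = ""

    def __post_init__(self):
        # Stamp the error with the current time if none was supplied.
        if not self.message_date:
            now = datetime.now(timezone.utc)
            self.message_date = now.isoformat(timespec="milliseconds")

    def to_response(self):
        # Emit the fields under the names used in the error structure.
        return {
            "ResponseStatus": self.response_status,
            "ErrorSource": self.error_source,
            "MessageDate": self.message_date,
            "ServiceName": self.service_name,
            "OperationName": self.operation_name,
            "Code": self.code,
            "Summary": self.summary,
            "Details": self.details,
            "SeverityLevel": self.severity_level,
        }
```

A fixed type like this makes it harder for one service to drop a field or leak a stack trace into the response by accident.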

If we have a gateway architecture, the gateway can evaluate the error response, inspect the error code, and apply the appropriate error handling mechanism.
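A minimal sketch of such a gateway decision is shown below. It assumes the error code has the form `CATEGORY-ERRnnn`, as in the sample above; the function name `classify_error`, the `TIMEOUT_ERROR` category and the action names are hypothetical.

```python
def classify_error(response):
    """Pick a handling action from the error code (illustrative only)."""
    if response.get("ResponseStatus") != "-1":
        return "pass_through"  # not an error response
    # "VALIDATION_ERROR-ERR103" -> "VALIDATION_ERROR"
    category = response.get("Code", "").split("-", 1)[0]
    actions = {
        "VALIDATION_ERROR": "reject_to_caller",
        "TIMEOUT_ERROR": "retry",  # hypothetical category
    }
    # Anything the gateway does not recognise is escalated.
    return actions.get(category, "escalate")
```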

Wednesday, March 27, 2013

OSB synchronous services not returning any error/hanging due to stuck thread issue


In one of our OSB process flows we make a service callout to a JBoss-hosted service. The request hits a load balancer and is then routed to one of several channels, each with a front-end Apache and a JBoss application server behind it. During testing, some requests were serviced properly while others hung with no reply coming back. The synchronous requests therefore failed intermittently, and OSB did not throw any error back to the caller. This was puzzling, because there was no trace of a request once it reached the underlying layer.
                 I could not see why the error was not being propagated back to the caller. After about 10 minutes, however, the log files filled with stuck thread errors, which WebLogic reports once a thread has been busy for longer than the configured StuckThreadMaxTime.


Error stack trace from log file

HTTP/1.1
]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
Thread-1043 "[STUCK] ExecuteThread: '28' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, waiting, priority=1, DAEMON> {
    -- Waiting for notification on: java.lang.Object@2a1d4ae8[fat lock]
    java.lang.Object.wait(Object.java:485)
    com.bea.wli.sb.pipeline.PipelineContextImpl$SynchronousListener.waitForResponse(PipelineContextImpl.java:1563)
    com.bea.wli.sb.pipeline.PipelineContextImpl.dispatchSync(PipelineContextImpl.java:525)
    stages.transform.runtime.WsCalloutRuntimeStep$WsCalloutDispatcher.dispatch(WsCalloutRuntimeStep.java:1385)
    stages.transform.runtime.WsCalloutRuntimeS
Mar 21, 2013 12:23:33 PM PDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "631" seconds working on the request "weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@29b37d16", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
Thread-189 "[STUCK] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {
    jrockit.net.SocketNativeIO.readBytesPinned(SocketNativeIO.java:???)
    jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:24)
    java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)
    java.net.SocketInputStream.read(SocketInputStream.java:107)
    weblogic.utils.io.ChunkedInputStream.read(ChunkedInputStream.java:149)
    java.io.InputStream.read(InputStream.java:82)
    com.certicom.tls.record.ReadHandler.readFragment(Unknown Source)
    com.certicom.tls.record.ReadHandler.readRecord
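
When scanning a large log for entries like these, a small parser can help. The sketch below assumes the BEA-000337 message layout seen in the trace above; the names `STUCK_RE` and `parse_stuck_thread` are made up for the example.

```python
import re

# Matches the BEA-000337 stuck thread message as it appears in the log above.
STUCK_RE = re.compile(
    r"ExecuteThread: '(?P<thread>\d+)' for queue: '(?P<queue>[^']+)' "
    r'has been busy for "(?P<busy>\d+)" seconds'
    r'.*?\(StuckThreadMaxTime\) of "(?P<limit>\d+)" seconds'
)


def parse_stuck_thread(line):
    """Return thread details from a stuck thread log line, or None."""
    match = STUCK_RE.search(line)
    if match is None:
        return None
    return {
        "thread": int(match["thread"]),
        "queue": match["queue"],
        "busy_seconds": int(match["busy"]),
        "limit_seconds": int(match["limit"]),
    }
```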




Analysis
As per the Oracle documentation, we rechecked the Read Timeout and Connection Timeout values.

Parameters for Configuring HTTP Transport for Business Service

Parameter: Read Timeout
Description: Enter the read timeout interval in seconds. A zero (0) value indicates no timeout.

Parameter: Connection Timeout
Description: Enter the connection timeout interval in seconds. If the timeout expires before the connection can be established, Oracle Service Bus raises a connection error. A zero (0) value indicates no timeout.



Current configuration




Conclusion
              If service callouts are used and the service is invoked over HTTP or HTTPS, always set the business service timeouts to an appropriate non-zero value. The value should be less than the default stuck thread timeout of 600 seconds, so that the proxy's error handler can deal with the timeout instead of the request hanging until WebLogic reports a stuck thread.
This needs to be done for each business service, based on that service's requirements.
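
The rule can be written down as a simple check to run over the timeout values of each business service. This is a sketch under assumptions: the function name `check_timeouts` is hypothetical, the values are taken to be in seconds, and 600 is the StuckThreadMaxTime seen in the log above.

```python
STUCK_THREAD_MAX_TIME = 600  # seconds, as reported in the stuck thread log


def check_timeouts(read_timeout, connection_timeout,
                   stuck_thread_max=STUCK_THREAD_MAX_TIME):
    """Return a list of problems with the given timeout values."""
    problems = []
    settings = (
        ("Read Timeout", read_timeout),
        ("Connection Timeout", connection_timeout),
    )
    for name, value in settings:
        if value == 0:
            # Zero means "no timeout", so the call can hang indefinitely.
            problems.append(f"{name} is 0, which means no timeout")
        elif value >= stuck_thread_max:
            problems.append(
                f"{name} of {value}s is not below "
                f"StuckThreadMaxTime ({stuck_thread_max}s)"
            )
    return problems
```

For example, `check_timeouts(0, 0)` reports both settings, while `check_timeouts(30, 5)` returns an empty list.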