Random Cerebrations on SOA: synchronous service

Friday, August 9, 2013

Modelling OSB error handling for synchronous services

Recently I was asked how errors should be handled in OSB. The answer depends on the communication pattern: whether the call is synchronous or asynchronous.

If the services are modeled as synchronous, it is useful to always return a consistent error structure containing, at a minimum, the name of the service where the error occurred, a unique message ID for tracking, a unique error code, and a summary and details of the error. This structure needs to be defined and agreed before the project starts. If the services are more asynchronous in nature, it makes sense to have a common error logging/handling/auditing framework that receives the errors and, based on business rules, persists them or sends notifications. In both approaches it is mandatory to define the error message structure. The asynchronous structure can carry more information, whereas the synchronous one can be a subset of it, since the caller or consumer does not need a lot of unnecessary detail about the error. The synchronous response should be human-readable and should not contain a stack trace.
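The subset idea can be sketched in a few lines of Python. This is only an illustration, not OSB configuration: the field names come from the error listing further down in this post, while `MessageId`, `StackTrace`, `INTERNAL_ONLY` and `to_sync_response` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, asdict


@dataclass
class ErrorRecord:
    # Fields taken from the error structure listed later in this post.
    ResponseStatus: str
    ErrorSource: str
    MessageDate: str
    ServiceName: str
    OperationName: str
    Code: str
    Summary: str
    Details: str
    SeverityLevel: str
    # Hypothetical extras: a tracking id, and a stack trace that only the
    # asynchronous logging/auditing framework should ever see.
    MessageId: str = ""
    StackTrace: str = ""


# Assumption: these fields are internal and never returned to a sync caller.
INTERNAL_ONLY = {"StackTrace"}


def to_sync_response(record: ErrorRecord) -> dict:
    """Return the caller-facing subset of the full error record."""
    return {k: v for k, v in asdict(record).items() if k not in INTERNAL_ONLY}


record = ErrorRecord(
    ResponseStatus="-1",
    ErrorSource="Atomic service",
    MessageDate="2013-08-08T15:48:13.013-07:00",
    ServiceName="Validation Service",
    OperationName="validateRequest",
    Code="VALIDATION_ERROR-ERR103",
    Summary="Validation Failure",
    Details="BEA-382515: oops!!! Duplicate Request Signature",
    SeverityLevel="1",
    MessageId="msg-0001",                # hypothetical tracking id
    StackTrace="(internal stack trace)",  # placeholder text
)

sync_view = to_sync_response(record)   # what the synchronous caller receives
async_view = asdict(record)            # what the logging framework receives
```

Keeping one full record and deriving the synchronous view from it means the two structures cannot drift apart.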

In OSB there can be multiple proxies, and you need to control and track the response message flow. Instead of replying from multiple error catch blocks, it is better to have each catch block raise an error with the error payload populated into the body. The global catch block then does a Reply with Success to the calling proxy with that body.
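The control flow, with stage-level catch blocks that only raise and a single global handler that replies, can be sketched with ordinary Python exceptions. This is an analogy and not OSB configuration: `StageError`, `validate_stage` and `proxy` are hypothetical names, and the `transport` key merely stands in for a Reply with Success.

```python
class StageError(Exception):
    """Raised by a stage-level catch block; carries the error payload."""

    def __init__(self, payload: dict):
        super().__init__(payload.get("Summary", ""))
        self.payload = payload


def validate_stage(request: dict) -> dict:
    try:
        if "id" not in request:
            raise ValueError("missing id")
        return request
    except ValueError as exc:
        # Stage-level catch: do NOT reply from here. Populate the error
        # payload and raise, so there is only one reply point.
        raise StageError({
            "ResponseStatus": "-1",
            "ServiceName": "Validation Service",
            "Code": "VALIDATION_ERROR-ERR103",
            "Summary": "Validation Failure",
            "Details": str(exc),
        }) from exc


def proxy(request: dict) -> dict:
    try:
        validated = validate_stage(request)
        body = {"ResponseStatus": "0", "Payload": validated}
    except StageError as err:
        # Global catch: the single place where the error body is returned.
        body = err.payload
    # The transport-level reply is a success either way; the caller reads
    # ResponseStatus in the body to tell whether the call failed.
    return {"transport": "success", "body": body}


ok_reply = proxy({"id": 42})
error_reply = proxy({})
```

With one reply point, every error leaves the proxy in the same shape, whichever stage produced it.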

OSB Implementation

The different stages in OSB should be segregated and clearly defined, and each stage should have its own error handler (catch block). In each catch block, populate the error payload into the body and raise an error.

Catch the error globally and Reply with success

An error should contain at least the information listed below so that it can be identified and tracked.

ResponseStatus: "-1"

ErrorSource: "Atomic service"

MessageDate: "2013-08-08T15:48:13.013-07:00"

ServiceName: "Validation Service"

OperationName: "validateRequest"

Code: "VALIDATION_ERROR-ERR103"

Summary: "Validation Failure"

Details: "BEA-382515: oops!!! Duplicate Request Signature"

SeverityLevel: "1"

With a gateway architecture, the error response can be evaluated at the gateway: inspect the error code and, based on it, apply the appropriate error handling mechanism.
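A minimal Python sketch of such a gateway decision follows. It assumes that the error code has the form CATEGORY-ERRnnn, which is inferred only from the single example "VALIDATION_ERROR-ERR103". The action names, the `TIMEOUT_ERROR` category and the `gateway_action` function are hypothetical.

```python
# Hypothetical mapping from error-code category to a gateway action.
ACTIONS = {
    "VALIDATION_ERROR": "reject_request",
    "TIMEOUT_ERROR": "retry_later",   # assumed category, not from the post
}
DEFAULT_ACTION = "log_and_return"


def gateway_action(response: dict) -> str:
    """Decide what the gateway does with a reply from an underlying service."""
    if response.get("ResponseStatus") != "-1":
        return "pass_through"          # not an error response
    # Assumption: the category is everything before the first hyphen.
    category = response.get("Code", "").split("-", 1)[0]
    return ACTIONS.get(category, DEFAULT_ACTION)


action = gateway_action({
    "ResponseStatus": "-1",
    "Code": "VALIDATION_ERROR-ERR103",
})
```

Because every service returns the same structure, the gateway needs only this one lookup instead of per-service parsing.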

Wednesday, March 27, 2013

OSB synchronous services not returning any error/hanging due to stuck thread issue

In one of our OSB process flows we do a service callout to a JBoss-hosted service. The request hits a load balancer and is then routed to one of several channels, each with a front-end Apache and a JBoss application server behind it. During testing, some requests were serviced properly while others hung without any reply. The synchronous requests therefore failed intermittently, and OSB did not throw an error back. It was puzzling, as there was no trace of the request once it had called the underlying layer.

I could not see why the error was not being propagated back to the caller. After about 10 minutes, however, the log files filled with stuck thread errors, raised once the threads had been busy for longer than the configured WebLogic StuckThreadMaxTime of 600 seconds.

Error stack trace from log file

HTTP/1.1

]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
Thread-1043 "[STUCK] ExecuteThread: '28' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, waiting, priority=1, DAEMON> {
    -- Waiting for notification on: java.lang.Object@2a1d4ae8[fat lock]
    java.lang.Object.wait(Object.java:485)
    com.bea.wli.sb.pipeline.PipelineContextImpl$SynchronousListener.waitForResponse(PipelineContextImpl.java:1563)
    com.bea.wli.sb.pipeline.PipelineContextImpl.dispatchSync(PipelineContextImpl.java:525)
    stages.transform.runtime.WsCalloutRuntimeStep$WsCalloutDispatcher.dispatch(WsCalloutRuntimeStep.java:1385)
    stages.transform.runtime.WsCalloutRuntimeS

<Mar 21, 2013 12:23:33 PM PDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "631" seconds working on the request "weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@29b37d16", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
Thread-189 "[STUCK] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {
    jrockit.net.SocketNativeIO.readBytesPinned(SocketNativeIO.java:???)
    jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:24)
    java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)
    java.net.SocketInputStream.read(SocketInputStream.java:107)
    weblogic.utils.io.ChunkedInputStream.read(ChunkedInputStream.java:149)
    java.io.InputStream.read(InputStream.java:82)
    com.certicom.tls.record.ReadHandler.readFragment(Unknown Source)
    com.certicom.tls.record.ReadHandler.readRecord

Analysis

We rechecked the Read Timeout and Connection Timeout values of the business service against the Oracle documentation.

Parameters for Configuring HTTP Transport for Business Service

Parameter	Description
Read Timeout	Enter the read timeout interval in seconds. A zero (0) value indicates no timeout.
Connection Timeout	Enter the connection timeout interval in seconds. If the timeout expires before the connection can be established, Oracle Service Bus raises a connection error. A zero (0) value indicates no timeout.

Current configuration

Conclusion

If service callouts are used and the service is reached via HTTP or HTTPS, always set the timeouts of the business service to appropriate values. They should be less than the default stuck thread time of 600 seconds, so that the proxy's error handler can deal with the problem instead of the thread staying blocked until the server reports it as stuck.
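The effect of a bounded wait can be illustrated in plain Python. This is a simulation under stated assumptions and not OSB code: `call_backend` is a hypothetical backend that answers after a delay, and the `TIMEOUT_ERROR-ERR001` code is made up for the example. With a read timeout, a silent backend turns into an error response the caller can handle. Without one, the `get` call would block indefinitely, much like the `waitForResponse` frame in the stack trace above.

```python
import queue
import threading


def call_backend(reply_queue: queue.Queue, delay_seconds: float) -> threading.Timer:
    """Hypothetical backend: puts a reply on the queue after delay_seconds."""
    timer = threading.Timer(delay_seconds, reply_queue.put, args=("OK",))
    timer.daemon = True
    timer.start()
    return timer


def callout(delay_seconds: float, read_timeout: float) -> dict:
    """Synchronous callout that waits at most read_timeout seconds."""
    reply_queue: queue.Queue = queue.Queue()
    timer = call_backend(reply_queue, delay_seconds)
    try:
        body = reply_queue.get(timeout=read_timeout)
        return {"ResponseStatus": "0", "Details": body}
    except queue.Empty:
        # The bounded wait expired: report an error instead of hanging.
        return {
            "ResponseStatus": "-1",
            "Code": "TIMEOUT_ERROR-ERR001",   # hypothetical error code
            "Details": f"no reply within {read_timeout} seconds",
        }
    finally:
        timer.cancel()   # harmless if the timer has already fired


fast = callout(delay_seconds=0.01, read_timeout=1.0)
slow = callout(delay_seconds=5.0, read_timeout=0.1)
```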

This needs to be done for each business service, based on that service's requirements.
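A per-service review of these settings could be sketched as below. It assumes the timeout values have already been collected by hand into a dictionary, since nothing here reads them from OSB. The service names and the `check_timeouts` helper are hypothetical. The 600-second limit is the StuckThreadMaxTime from the log, and a value of 0 means no timeout, as stated in the parameter table.

```python
STUCK_THREAD_MAX_TIME = 600  # seconds, as reported in the WebLogic log


def check_timeouts(name: str, read_timeout: int, connection_timeout: int,
                   stuck_thread_max: int = STUCK_THREAD_MAX_TIME) -> list:
    """Return a list of problems found in one business service's timeouts."""
    problems = []
    settings = (("Read Timeout", read_timeout),
                ("Connection Timeout", connection_timeout))
    for label, value in settings:
        if value == 0:
            # Per the parameter table, 0 means the call never times out.
            problems.append(f"{name}: {label} is 0 (no timeout)")
        elif value >= stuck_thread_max:
            problems.append(
                f"{name}: {label} of {value}s is not below {stuck_thread_max}s")
    return problems


# Hypothetical services and values, entered by hand for the review.
services = {
    "CustomerLookupBS": {"read_timeout": 30, "connection_timeout": 5},
    "LegacyOrderBS": {"read_timeout": 0, "connection_timeout": 0},
}

report = {name: check_timeouts(name, **cfg) for name, cfg in services.items()}
```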