Mind the Timeout configurations in EI Endpoints

Jan 26, 2021 · 7 minute read · Comments
WSO2 EI

As the WSO2 Micro Integrator acts as a middle layer in a distributed system architecture connecting different systems , one of the main aspects that it needs to deal with is failures. One of the things that needs to be considered in this communication is Timeout. Usually this is a thing that is not considered while developing the integrations but can generate lots of problems in a production environment.

In this post we will show how the timeout settings can impact the behavior of an integration in WSO2 Micro Integrator.

Happy Path Scenario

To illustrate the scenario, we will have typical MI use case like below:

Scenario 1

It is a typical Passthrough Scenario where:

Client sent a Service/API request to MI;
MI sends the request to the Backend Service;
MI sends back the response to the Client;

For this very simple scenario we will use the ProxyService below:

<?xml version="1.0" encoding="UTF-8"?>
<proxy name="EpNoTimeoutProxy" startOnLoad="true" transports="http https" xmlns="http://ws.apache.org/ns/synapse">
    <target>
        <endpoint name="endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed">
            <address uri="http://localhost:3100/api/users">
                <suspendOnFailure>
                    <initialDuration>-1</initialDuration>
                    <progressionFactor>1</progressionFactor>
                </suspendOnFailure>
                <markForSuspension>
                    <retriesBeforeSuspension>0</retriesBeforeSuspension>
                </markForSuspension>
            </address>
        </endpoint>
        <inSequence/>
        <outSequence>
            <payloadFactory media-type="xml">
                <format>
                    <response>
                        $1
                    </response>
                </format>
                <args>
                    <arg evaluator="xml" expression="$body/*"/>
                </args>
            </payloadFactory>
            <send/>
        </outSequence>
        <faultSequence>
        	<log>
        		<property name="EpNoTimeoutProxy" value="faultSequence"/>
        		<property name="ERROR_CODE" expression="$ctx:ERROR_CODE"/>
        		<property name="ERROR_MESSAGE" expression="$ctx:ERROR_MESSAGE"/>
        	</log>
        	<respond />
        </faultSequence>
    </target>
</proxy>

As we could see, it is a very simple proxy that directs to a Rest Endpoint and then uses a payloadFactory in the response to generate an XML response. In the normal days it would not cause any problems.

And we would have responses like below:

Scenario 1 Response

Then the service starts receiving more requests..

Everything goes ok and then it started receiving more requests and the backend service starts to become slow:

Scenario 2

As in the picture above, we can see the backend started to respond very slow like 30s, 60s, 70s.., but the client keeps receiving the responses:

Scenario 2

Then backend services start crashing…

Then something unexpected happened.. the backend services start crashing and the response times became super slow:

Scenario 3

As the response times starts increasing we start seeing errors in the client and in MI Logs:

[2021-01-24 23:15:23,010]  WARN {TimeoutHandler} - Expiring message ID : urn:uuid:1e44b5b2-7030-4686-ad61-25df35972f3b; dropping message after GLOBAL_TIMEOUT of : 120 seconds for Endpoint [endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed], URI : http://localhost:3100/api/users, Received through Proxy service : EpNoTimeoutProxy

Client Impact with the backend failure Scenario 3

As we could see in the above error message, EI is dropping the message after the GLOBAL_TIMEOUT that is 120 seconds. The first question that comes is: why did it use that global timeout?

The answer is because in the above endpoint definition we do not specify the timeout settings, so by default it will use the value specified in the property synapse.global_timeout_interval set in the deployment.toml.

A good practice when dealing with remote communication is to always specify how long the socket will be waiting for a response. This way, an obvious solution is to specify the timeout settings in the Endpoint. In our example, we consider that the Proxy will wait for 30 seconds. The endpoint will look like below:

<endpoint name="endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed">
    <address uri="http://localhost:3100/api/users">
        <timeout>
            <duration>30000</duration>
        </timeout>
        <suspendOnFailure>
            <initialDuration>-1</initialDuration>
            <progressionFactor>1</progressionFactor>
        </suspendOnFailure>
        <markForSuspension>
            <retriesBeforeSuspension>0</retriesBeforeSuspension>
        </markForSuspension>
    </address>
</endpoint>

If we try the same endpoint again, now we get the following error:

[2021-01-25 20:46:24,931]  WARN {TimeoutHandler} - Expiring message ID : urn:uuid:d8039eb2-9970-41b1-959e-fb55287e905b; dropping message after ENDPOINT_TIMEOUT of : 30 seconds for Endpoint [endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed], URI : http://localhost:3100/api/users, Received through Proxy service : EpNoTimeoutProxy

But the client did not receive any response again. And now we come to the next parameter that is the responseAction, that specifies which action will be taken by the synapse engine if no response is received in the timeout value defined in the endpoint. The possible values for that are:

never: This is the default value. In this case, the message will be discarded and no fault handlers will be engaged, but the endpoint will never goes to the timeout state;
discard: Likewise the above mentioned the fault handlers will not be engaged but in case of timeout, the endpoint will go to the suspended state;
fault: In this case, the faultHandler will be engaged in case of a timeout and the endpoint will be put in a suspension mode;

Let us see the behavior when changing the response action parameter value.

Using response action discard

Let us change our definition a bit to add the responseAction and set it to discard.

<endpoint name="endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed">
    <address uri="http://localhost:3100/api/users">
        <timeout>
            <duration>30000</duration>
            <responseAction>discard</responseAction>
        </timeout>
        <suspendOnFailure>
            <initialDuration>-1</initialDuration>
            <progressionFactor>1</progressionFactor>
        </suspendOnFailure>
        <markForSuspension>
            <retriesBeforeSuspension>0</retriesBeforeSuspension>
        </markForSuspension>
    </address>
</endpoint>

Like the never it does not trigger the fault handling process but we see different log messages saying the endpoint now became suspended

[2021-01-26 20:10:50,946]  INFO {EndpointContext} - Endpoint : endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed with address http://localhost:3100/api/users has been marked for SUSPENSION, but no further retries remain. Thus it will be SUSPENDED.
[2021-01-26 20:10:50,947]  WARN {EndpointContext} - Suspending endpoint : endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed with address http://localhost:3100/api/users - current suspend duration is : 30000ms - Next retry after : Tue Jan 26 20:11:20 BRT 2021
[2021-01-26 20:10:50,948]  INFO {AddressEndpoint} - Ignoring fault handlers since the timeout action is set to DISCARD
[2021-01-26 20:10:50,948]  WARN {TimeoutHandler} - Expiring message ID : urn:uuid:756df2b1-f788-4d08-bf0d-0655224f9bfd; dropping message after ENDPOINT_TIMEOUT of : 30 seconds for Endpoint [endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed], URI : http://localhost:3100/api/users, Received through Proxy service : EpNoTimeoutProxy

We do not see the error logs specified in the faultSequence and no response was sent to the client. But, if we try a next request in the next 30s while the endpoint is suspended, the client receives the response and we will see the following in the logs:

[2021-01-26 20:20:39,820]  INFO {LogMediator} - {proxy:EpNoTimeoutProxy} To: /services/EpNoTimeoutProxy.EpNoTimeoutProxyHttpSoap11Endpoint, WSAction: urn:mediate, SOAPAction: urn:mediate, MessageID: urn:uuid:7666720b-08b1-4b2e-af1a-e2b6192b2fa7, Direction: request, EpNoTimeoutProxy = faultSequence, ERROR_CODE = 303001, ERROR_MESSAGE = Currently , Address endpoint : [ Name : endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed ] [ State : SUSPENDED ]

So, differently from the default behaviour after a timeout, the endpoint goes to the suspended mode, while in this state it will not forward messages to the backend and it will also direct the flow to the faultSequence as we can see the error logs we defined. We can change the number of retries before suspension, the maximum time of suspension and even if the endpoint will be suspended by tuning the endpoint settings.

Using response action fault

Now let us try our endpoint with the response action fault.

<endpoint name="endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed">
    <address uri="http://localhost:3100/api/users">
        <timeout>
            <duration>30000</duration>
            <responseAction>fault</responseAction>
        </timeout>
        <suspendOnFailure>
            <initialDuration>-1</initialDuration>
            <progressionFactor>1</progressionFactor>
        </suspendOnFailure>
        <markForSuspension>
            <retriesBeforeSuspension>0</retriesBeforeSuspension>
        </markForSuspension>
    </address>
</endpoint>

Now when trying the endpoint after the timeout it will make the endpoint suspended and engages the faultSequence:

[2021-01-26 20:57:21,367]  INFO {EndpointContext} - Endpoint : endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed with address http://localhost:3100/api/users has been marked for SUSPENSION, but no further retries remain. Thus it will be SUSPENDED.
[2021-01-26 20:57:21,367]  WARN {EndpointContext} - Suspending endpoint : endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed with address http://localhost:3100/api/users - last suspend duration was : 30000ms and current suspend duration is : 30000ms - Next retry after : Tue Jan 26 20:57:51 BRT 2021
[2021-01-26 20:57:21,368]  INFO {LogMediator} - {proxy:EpNoTimeoutProxy} To: /services/EpNoTimeoutProxy.EpNoTimeoutProxyHttpSoap11Endpoint, WSAction: urn:mediate, SOAPAction: urn:mediate, MessageID: urn:uuid:926e8ae9-6839-4c2b-8778-9de693236827, Direction: request, EpNoTimeoutProxy = faultSequence, ERROR_CODE = 101504, ERROR_MESSAGE = Send timeout
[2021-01-26 20:57:21,369]  WARN {TimeoutHandler} - Expiring message ID : urn:uuid:fdcc2b55-d673-4188-bc7b-f5b457e9d4f8; dropping message after ENDPOINT_TIMEOUT of : 30 seconds for Endpoint [endpoint_urn_uuid_5f516685-86fa-4ba7-ab78-340598a190ed], URI : http://localhost:3100/api/users, Received through Proxy service : EpNoTimeoutProxy

And in the subsequent messages will go to the fault sequence while the endpoint is suspended, it will display the same error messages displayed above.

Conclusion

As we could see, we need to carefully choose what the timeout value we will use and the action that will be taken in case of a timeout, once these can lead to unexpected behaviours depending on the configurations applied.

We can find a very good explanation of the different timeouts involved in a Request/Response flow in EI in the MI Best Practices Page.

I hope this helps! See you in the next post.

Xico Junior