Why Is The First Call To Amazon Lex Slow Sometimes?

Amazon Lex is a chatbot-building service from Amazon Web Services (AWS) that lets developers create a complete chatbot experience out of intuitive visual building blocks. We can use it for anything from simple dialog flows to complex multi-bot applications. One of the main considerations when choosing a chatbot platform is performance. To that end, in this article we explore why the first call to Amazon Lex is sometimes slow.

Amazon Lex As a Managed Service

Amazon offers Lex as a fully managed service for building and serving chatbots, making it more like a SaaS (Software as a Service) offering. This also means that we, as consumers of the service, have limited visibility into and control over its inner workings. That has the upside of eliminating the effort required to maintain the underlying infrastructure, but it leaves us less room to customize how the service runs for our specific use case.

Amazon Lex Cold Starts

You might have guessed by now that Lex runs on serverless technology. This deployment choice from Amazon makes sense for several reasons:

  • PAYG (Pay as you go): Amazon Lex is priced by usage, where each interaction with the service is counted towards usage charges.
  • Variable demand: Amazon cannot know beforehand how much demand the service will face. A serverless approach addresses this dynamically by spinning up exactly the resources needed, only when they are needed. This is also known as the JIT (just-in-time) model.

The downside to Lex being serverless is, of course, that it suffers from all the classic serverless drawbacks. In particular, it suffers from cold starts [1]. If you have worked with Lex in a low-traffic environment, you may have noticed that it can take several seconds for Lex to respond to your first query, especially after a period of inactivity (e.g., in a dev/test environment).

A surge in demand can also mean that new compute resources must be provisioned, so some requests are met with cold starts. This can happen at every “step” of scaling up. In the next section, we will explore this in a bit more detail.
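
A quick way to observe the phenomenon is simply to time consecutive calls to a bot. The following is a minimal sketch, assuming a Lex V2 bot (the recognize_text call is covered later in this article; the bot identifiers are placeholders you would replace with your own):

import time
import boto3

# Placeholder identifiers; replace with your own bot's values.
BOT_ID = 'YOUR_BOT_ID'
BOT_ALIAS_ID = 'YOUR_BOT_ALIAS_ID'
LOCALE_ID = 'en_US'

lex_client = boto3.client('lexv2-runtime')

def timed_call(session_id):
    # Measure the wall-clock latency of a single text request.
    start = time.perf_counter()
    lex_client.recognize_text(
        botId=BOT_ID,
        botAliasId=BOT_ALIAS_ID,
        localeId=LOCALE_ID,
        sessionId=session_id,
        text='Hello'
    )
    return time.perf_counter() - start

# After a period of inactivity, the first call is typically the slowest.
print(f'First call:  {timed_call("cold-test-1"):.2f}s')
print(f'Second call: {timed_call("cold-test-2"):.2f}s')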

Lex Cold Starts Deep Dive

Let us now take a look at how Amazon Lex cold starts work. We will walk through an example, starting with a few requests and then ramping up to multiple simultaneous requests.

The following diagram shows an initial request from users after a period of inactivity.

Figure 1: Amazon Lex Cold Start after no activity

From Figure 1 above:

  1. A request from the users is made to the Amazon Lex service to use the chatbot.
  2. Amazon Lex spins up a new serverless Lex compute resource. This provisioning can take a few seconds to complete, blocking the request for the duration. Subsequent requests reuse the same Lex compute resource.

Now, what happens when Amazon Lex receives multiple requests simultaneously? Figure 2 describes this scenario:

Figure 2: Amazon Lex Cold Start due to simultaneous requests

From Figure 2 above:

  1. Multiple requests arrive from distinct users simultaneously to the Lex service.
  2. One of the requests is served by the existing Lex compute resource. For the others, Amazon Lex provisions new compute resources.

The Lex compute resources remain available for a short while after their last request before being released.

What This Means For Your Chatbot

In most cases, this will not have any noticeable impact on your use case. However, if every request in your use case is latency-sensitive (e.g., due to regulatory requirements on response times for call center messages in Amazon Connect), cold starts can become an issue, especially for low-traffic applications with this constraint. If your application suffers from this phenomenon and you are interested in possible solutions, we will explore them in the next section.

Possible Solutions

The obvious solution would be to define a set amount of compute resources that your application requires and to configure Lex to maintain a pool of that size. However, at the time of writing, Lex offers no such “reservation” option.

The alternative is to keep the Lex service “warm” by sending it periodic requests. The following diagram shows one possible way to do this [2]:

Figure 3: Keeping Lex warm using EventBridge and Lambda

From Figure 3 above:

  1. The Amazon EventBridge service is configured to invoke a Lambda function every X minutes. The exact interval depends on your configuration, but in most cases would be around 5 minutes (see the wiring sketch after this list).
  2. The Lambda function calls the Amazon Lex API endpoint one or more times. Multiple requests are sent from separate threads to warm more than one compute resource concurrently.
  3. Amazon Lex provisions the requested compute resources to serve the request(s). These compute resources are kept in the pool of resources for a short period, ready to serve real traffic.
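
The scheduling side can be wired up with a few boto3 calls. The following is a minimal sketch under stated assumptions: the rule name, function name, ARN, and bot values are all placeholders, and the static Input payload is shaped to match the detail fields read by the handler examples in the next sections.

import json
import boto3

events_client = boto3.client('events')
lambda_client = boto3.client('lambda')

# Placeholder names and ARNs; replace with your own.
RULE_NAME = 'lex-warmer-schedule'
FUNCTION_NAME = 'lex-warmer'
FUNCTION_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:lex-warmer'

# 1. A rule that fires every 5 minutes.
rule_arn = events_client.put_rule(
    Name=RULE_NAME,
    ScheduleExpression='rate(5 minutes)',
    State='ENABLED'
)['RuleArn']

# 2. Allow EventBridge to invoke the warmer function.
lambda_client.add_permission(
    FunctionName=FUNCTION_NAME,
    StatementId='allow-eventbridge-warmer',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule_arn
)

# 3. Target the function. A static Input replaces the default scheduled
#    event, so event['detail'] carries the fields the warmer reads.
events_client.put_targets(
    Rule=RULE_NAME,
    Targets=[{
        'Id': 'lex-warmer-target',
        'Arn': FUNCTION_ARN,
        'Input': json.dumps({
            'detail': {
                'bot_name': 'MyBot',
                'bot_alias': 'prod',
                'user_id': 'warmup-user',
                'num_requests': 3
            }
        })
    }]
)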

Example Lambda Function Implementation (Lex V1)

Here’s an example of Python code for a Lambda function that receives an Amazon EventBridge event and invokes a Lex bot (V1) with a test request:

import boto3

def lambda_handler(event, context):
    # Extracting necessary details from the EventBridge event
    event_detail = event['detail']
    bot_name = event_detail['bot_name']
    bot_alias = event_detail['bot_alias']
    user_id = event_detail['user_id']
    
    # Initializing the Lex client
    lex_client = boto3.client('lex-runtime')
    
    # Building the request parameters
    lex_request = {
        'botName': bot_name,
        'botAlias': bot_alias,
        'userId': user_id,
        'inputText': 'Hello'  # Test request input for the Lex bot
    }
    
    # Invoking the Lex bot with the test request
    lex_response = lex_client.post_text(**lex_request)
    
    # Do something with the Lex bot response
    # For example, you can log it or process it further
    
    # Returning a response (optional)
    return {
        'statusCode': 200,
        'body': 'Lex bot warmed up successfully'
    }

In this example, the Lambda function receives an EventBridge event as the event parameter and the function context as the context parameter. It extracts the necessary details from the event, such as the Lex bot name, alias, and user ID.

It then initializes the Lex client using the boto3 library. After that, it builds the request parameters for the Lex bot, including the bot name, bot alias, user ID, and the test input text (e.g., ‘Hello’).

The function invokes the Lex bot using the post_text method of the Lex client and passes the request parameters. The response from the Lex bot is stored in the lex_response variable. You can perform further processing or logging based on the response as per your requirements.

Finally, the function can return a response (status code and message) to indicate the successful warming up of the Lex bot.

Note: Make sure to configure the appropriate IAM permissions for your Lambda function to interact with Lex and EventBridge services.
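
As a rough illustration of that note, the warmer's execution role needs permission to call the Lex runtime. The following sketch attaches a minimal inline policy; the role and policy names are placeholders, and in practice you should narrow the resource to your bot's ARN:

import json
import boto3

iam_client = boto3.client('iam')

# Minimal inline policy for the warmer's execution role.
lex_invoke_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        # post_text maps to lex:PostText (V1); recognize_text maps to
        # lex:RecognizeText (V2).
        'Action': ['lex:PostText', 'lex:RecognizeText'],
        'Resource': '*'  # Narrow this to your bot alias ARN in production.
    }]
}

iam_client.put_role_policy(
    RoleName='lex-warmer-role',      # Placeholder role name
    PolicyName='lex-warmer-invoke',  # Placeholder policy name
    PolicyDocument=json.dumps(lex_invoke_policy)
)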

Using Threading

Here’s an updated version of the Lambda function code that uses threading to send multiple requests simultaneously to the Lex V1 bot. The number of requests/threads is controlled by a parameter:

import boto3
import threading

def send_lex_request(lex_client, lex_request):
    lex_response = lex_client.post_text(**lex_request)
    # Do something with the Lex bot response
    # For example, you can log it or process it further

def lambda_handler(event, context):
    # Extracting necessary details from the EventBridge event
    event_detail = event['detail']
    bot_name = event_detail['bot_name']
    bot_alias = event_detail['bot_alias']
    user_id = event_detail['user_id']
    
    # Number of requests/threads (controlled by a parameter)
    num_requests = event_detail.get('num_requests', 1)
    
    # Initializing the Lex client
    lex_client = boto3.client('lex-runtime')
    
    # Creating and starting multiple threads
    threads = []
    for i in range(num_requests):
        lex_request = {
            'botName': bot_name,
            'botAlias': bot_alias,
            'userId': f'{user_id}_{i}',
            'inputText': 'Hello'  # Test request input for the Lex bot
        }
        
        thread = threading.Thread(target=send_lex_request, args=(lex_client, lex_request))
        thread.start()
        threads.append(thread)
    
    # Waiting for all threads to complete
    for thread in threads:
        thread.join()
    
    # Returning a response (optional)
    return {
        'statusCode': 200,
        'body': f'{num_requests} Lex bot requests sent successfully'
    }

Compared to the previous version, this updated code creates and starts multiple threads to send simultaneous requests to the Lex V1 bot. The number of requests/threads is controlled by the num_requests parameter extracted from the EventBridge event.

The send_lex_request function is defined to handle the Lex request and response. It takes the Lex client and Lex request as arguments and invokes the Lex bot using the post_text method. You can customize the function to perform specific processing or logging based on the Lex bot response.

Within the lambda_handler function, a loop is used to create the Lex request for each thread, and a thread is created and started for each request. The threads are stored in a list for later joining.

After all threads are started, the code waits for all of them to complete using the join method. This ensures that the Lambda function doesn’t terminate prematurely before all Lex requests are processed.

Finally, an optional response is returned to indicate the successful completion of the Lex requests, including the number of requests that were sent.

Note: Threading in AWS Lambda has some limitations, such as a cap on the number of execution processes/threads per execution environment. Make sure to review the AWS Lambda documentation to ensure your implementation adheres to the relevant limits and best practices.
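
As a design note, the same fan-out can be written more compactly with the standard library's concurrent.futures module, which handles thread creation and joining for you. A minimal sketch, independent of the event handling above (boto3 clients are generally thread-safe, so sharing one across workers is fine):

from concurrent.futures import ThreadPoolExecutor

def warm_concurrently(lex_client, lex_requests):
    # Fan the warm-up calls out across a thread pool and block until all
    # of them complete, mirroring the start/join pattern above.
    with ThreadPoolExecutor(max_workers=max(1, len(lex_requests))) as pool:
        # list() forces iteration so worker exceptions are raised here.
        list(pool.map(lambda req: lex_client.post_text(**req), lex_requests))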

Example Lambda Function Implementation (Lex V2)

Here’s an example of Python code for a Lambda function that receives an Amazon EventBridge event and invokes a Lex bot (V2) with a test request:

import boto3

def lambda_handler(event, context):
    # Extracting necessary details from the EventBridge event
    event_detail = event['detail']
    bot_id = event_detail['bot_id']
    bot_alias_id = event_detail['bot_alias_id']
    locale_id = event_detail['locale_id']
    user_id = event_detail['user_id']
    
    # Initializing the Lex V2 client
    lex_client = boto3.client('lexv2-runtime')
    
    # Building the request parameters
    lex_request = {
        'botId': bot_id,
        'botAliasId': bot_alias_id,
        'localeId': locale_id,
        'sessionId': user_id,
        'text': 'Hello'  # Test request input for the Lex V2 bot
    }
    
    # Invoking the Lex V2 bot with the test request
    lex_response = lex_client.recognize_text(**lex_request)
    
    # Do something with the Lex V2 bot response
    # For example, you can log it or process it further
    
    # Returning a response (optional)
    return {
        'statusCode': 200,
        'body': 'Lex V2 bot warmed up successfully'
    }

This version of the code uses the lexv2-runtime client from the boto3 library to interact with Lex V2. The request parameters are slightly different, as Lex V2 uses a session-based approach: it requires the botId, botAliasId, localeId, sessionId, and text parameters.

Note that recognize_text takes the input utterance as a plain string in the text parameter; the messages field belongs to the response, where it carries the bot's reply.

The function invokes the Lex V2 bot using the recognize_text method of the Lex V2 client and passes the request parameters. The response from the Lex V2 bot is stored in the lex_response variable.

You can perform further processing or logging based on the Lex V2 bot response as per your requirements.

Note: Ensure that you have the appropriate IAM permissions for your Lambda function to interact with Lex V2 and EventBridge services. Additionally, make sure you have the latest version of the boto3 library installed in your Lambda environment to support Lex V2.
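
To test the warmer without waiting for the schedule, you can invoke it directly with a payload shaped like the EventBridge event it expects. A short sketch; the function name and bot values are placeholders:

import json
import boto3

lambda_client = boto3.client('lambda')

# Synchronously invoke the warmer with a hand-built test event.
response = lambda_client.invoke(
    FunctionName='lex-warmer',
    InvocationType='RequestResponse',
    Payload=json.dumps({
        'detail': {
            'bot_id': 'YOUR_BOT_ID',
            'bot_alias_id': 'YOUR_BOT_ALIAS_ID',
            'locale_id': 'en_US',
            'user_id': 'warmup-user'
        }
    })
)
print(json.loads(response['Payload'].read()))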

Using Threading

Here’s an updated version of the Lambda function code that uses threading to send multiple requests simultaneously to the Lex V2 bot. The number of requests/threads is controlled by a parameter [3]:

import boto3
import threading

def send_lex_request(lex_client, lex_request):
    lex_response = lex_client.recognize_text(**lex_request)
    # Do something with the Lex bot response
    # For example, you can log it or process it further

def lambda_handler(event, context):
    # Extracting necessary details from the EventBridge event
    event_detail = event['detail']
    bot_id = event_detail['bot_id']
    bot_alias_id = event_detail['bot_alias_id']
    locale_id = event_detail['locale_id']
    user_id = event_detail['user_id']
    
    # Number of requests/threads (controlled by a parameter)
    num_requests = event_detail.get('num_requests', 1)
    
    # Initializing the Lex V2 client
    lex_client = boto3.client('lexv2-runtime')
    
    # Creating and starting multiple threads
    threads = []
    for i in range(num_requests):
        lex_request = {
            'botId': bot_id,
            'botAliasId': bot_alias_id,
            'localeId': locale_id,
            'sessionId': f'{user_id}_{i}',
            'text': 'Hello'  # Test request input for the Lex V2 bot
        }
        
        thread = threading.Thread(target=send_lex_request, args=(lex_client, lex_request))
        thread.start()
        threads.append(thread)
    
    # Waiting for all threads to complete
    for thread in threads:
        thread.join()
    
    # Returning a response (optional)
    return {
        'statusCode': 200,
        'body': f'{num_requests} Lex V2 bot requests sent successfully'
    }

In this updated version, the structure mirrors the V1 threading example above: the number of threads is controlled by the num_requests parameter extracted from the EventBridge event, send_lex_request performs each call (here via the recognize_text method), and the handler starts one thread per request and joins them all so the function does not return before every warm-up request completes. An optional summary response is returned at the end, including the number of requests that were sent.

Note: As with the V1 version, threading in AWS Lambda has limitations, so review the AWS Lambda documentation to ensure your implementation adheres to the relevant limits and best practices.

Conclusion

In this article, we explored the Amazon Lex cold start problem. We looked at the nature of the Amazon Lex compute service and how it provisions resources, and we explored a workaround that keeps the service warm. While needing to worry about this cold start is quite an edge case, some applications do require eliminating it. Perhaps in the future, the Lex service will offer an option to define reserved compute resources, similar to AWS Lambda provisioned concurrency.

References

  1. Obey, J. (2023, February 16). Understand serverless function performance with Cold Start Tracing. Datadog. https://www.datadoghq.com/blog/serverless-cold-start-traces/
  2. Sendler, C. (2022, January 7). Prevent AWS Lambda Cold Starts with Scheduled Event Rules. Medium. https://aws.plainenglish.io/preventing-aws-lambda-cold-starts-using-scheduled-event-rules-d468c1681f8
  3. Beshah, E. (2021, December 7). Warming up applications that use AWS Lambda for load testing. Medium. https://medium.com/omnicell-engineering/warming-up-applications-that-use-aws-lambda-for-load-testing-5e04d7301dbf