Python Webserver Framework Performance Benchmark

To find out which Python webserver frameworks perform best, I’m going to benchmark several of them, measuring metrics such as requests per second and latency under a variety of workloads.

For updates and roadmap see the bottom of this page.

Benchmarking Methodology

Hardware setup

| Component | Specification |
| --- | --- |
| CPU | Intel Core i7 12700H |
| Memory Max Frequency | 3200.0 MHz |

Test-specific resource limits (e.g. CPU/memory) are reported for each test. Most tests give the application server 100% of a single vCPU, which ensures there are plenty of resources available to it and keeps the comparison fair, since all servers receive the same resources in each test. Do note, however, that your real-world results will likely be much higher, since most deployments are not limited to a single vCPU. The goal of these benchmarks is to get an idea of each framework’s performance level and to compare frameworks on a level playing field.

Restricting the server’s resources also caps the throughput the framework can achieve. The upside is that the load-testing tool doesn’t need massive amounts of resources itself; it is left unrestricted and free to use whatever it needs so that it never becomes the bottleneck. All requests are issued from the same machine that hosts the server.

Testing Environment and Tools

All tests are performed inside docker to ensure consistency. You can check the docker environment used by each test by viewing its source code.

Metrics are measured using the wrk tool. The configuration used for each test can be found in the test’s docker-compose file in the source code (coming soon); any test-specific configuration is also noted alongside the test.
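The exact container and wrk parameters aren’t published yet, so purely as an illustration (the image name, port, and thread/connection counts below are assumptions, not the benchmark’s actual configuration), a single-vCPU, 512 MB run might look like this:

```shell
# Limit the server container to one vCPU and 512 MB of memory
# (hypothetical image name and port).
docker run --cpus="1.0" --memory="512m" -p 8080:8080 benchmark/falcon-bjoern

# Drive load from the same machine with wrk for 20 seconds.
# Thread and connection counts here are illustrative guesses.
wrk --threads 2 --connections 100 --duration 20s http://localhost:8080/
```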

Tests

Simple “Hello World” / Echo application

This simple test runs the application with as little work as possible by simply returning a “Hello World” string with a content type “text/plain” where possible.
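For reference, the shape of such an application is tiny. This is not the benchmark’s actual code (each framework uses its own idioms); it is a minimal plain-WSGI equivalent using only the standard library:

```python
# Minimal sketch of a "Hello World" WSGI app of the kind being benchmarked.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # Do as little work as possible: fixed status, fixed headers, fixed body.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]

# To serve it locally: make_server("", 8080, app).serve_forever()
```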

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

| Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
| --- | --- | --- | --- | --- | --- |
| falcon | 116.11 | 572.14 | 6,900.00 | 4,817 | 96,434 |
| falcon-gunicorn | 3.04 | 0.43 | 12.77 | 6,551 | 131,154 |
| falcon-uwsgi | 1.33 | 0.33 | 9.23 | 13,711 | 274,340 |
| falcon-bjoern | 0.33 | 0.22 | 8.55 | 62,733 | 1,254,669 |
| falcon-bjoern-nuitka | 0.32 | 0.17 | 7.07 | 62,813 | 1,256,254 |
| falcon-bjoern-pypy | 9.30 | 25.78 | 153.86 | 33,665 | 673,989 |
| falcon-uvicorn | 3.58 | 0.30 | 12.46 | 5,583 | 111,766 |
| falcon-uvicorn-uvloop | 2.48 | 0.33 | 10.48 | 8,066 | 161,511 |
| cherrypy | 12.21 | 14.32 | 213.46 | 2,537 | 50,799 |
| cherrypy-uwsgi | 6.61 | 0.50 | 16.99 | 2,997 | 60,023 |
| cherrypy-tornado | 9.19 | 0.80 | 19.39 | 2,174 | 43,555 |
| cherrypy-twisted | 9.99 | 2.09 | 54.90 | 2,003 | 40,112 |
| flask | 7.50 | 0.54 | 23.59 | 2,658 | 53,220 |
| flask-fastwsgi | 1.95 | 0.29 | 11.62 | 10,301 | 206,186 |
| flask-gunicorn-eventlet | 28.09 | 45.32 | 398.87 | 4,552 | 91,154 |
| flask-gunicorn-gevent | 27.99 | 42.84 | 301.40 | 4,502 | 90,124 |
| flask-gunicorn-gthread | 5.55 | 0.58 | 15.80 | 3,600 | 72,072 |
| flask-gunicorn-meinheld | 1.99 | 0.46 | 9.73 | 10,062 | 201,365 |
| flask-gunicorn-tornado | 6.32 | 0.68 | 16.84 | 3,164 | 63,347 |
| flask-gunicorn | 4.66 | 0.50 | 16.73 | 4,279 | 85,654 |
| flask-meinheld | 2.17 | 0.24 | 11.12 | 9,142 | 182,964 |
| flask-bjoern | 1.63 | 0.27 | 9.85 | 12,312 | 246,376 |
| flask-bjoern-nuitka | 1.31 | 0.27 | 9.23 | 15,275 | 305,634 |
| fastwsgi | 0.11 | 0.20 | 8.99 | 214,112 | 4,303,606 |
| bottle | 64.17 | 332.82 | 3,500.00 | 5,319 | 106,470 |
| bjoern | 0.12 | 0.20 | 7.68 | 178,353 | 3,567,317 |
| bjoern-pypy | 894.35 | 1,800.00 | 7,400.00 | 14,181 | 283,954 |
| aiohttp | 2.18 | 0.21 | 10.07 | 9,172 | 183,588 |
| aiohttp-uvloop | 1.29 | 0.24 | 8.99 | 15,419 | 309,930 |
| aiohttp-gunicorn | 1.80 | 0.23 | 10.40 | 11,131 | 222,769 |
| aiohttp-gunicorn-uvloop | 0.93 | 0.23 | 10.89 | 21,519 | 430,447 |
| hug | 369.86 | 1,500.00 | 14,000.00 | 4,246 | 85,017 |
| meinheld | 0.46 | 0.62 | 38.95 | 44,706 | 895,005 |
| muffin-uvicorn | 4.88 | 0.47 | 16.47 | 4,100 | 82,078 |
| netius | 2.08 | 0.20 | 7.88 | 9,598 | 192,114 |
| pycnic-gunicorn | 3.57 | 0.43 | 14.91 | 5,580 | 111,709 |
| tornado | 3.96 | 0.49 | 15.04 | 5,055 | 101,201 |

It’s no surprise that almost all servers perform well in this test, given that it’s simply a max-throughput test without any real application work. Most hover in the 3-6k requests per second range. The ridiculously fast servers such as bjoern, meinheld, and fastwsgi (the latter pulling in 214k requests per second) are, of course, the servers written in C; compiled code is what lets them reach these numbers. When these servers are used with a framework such as flask, the framework, which is written in Python, becomes the bottleneck. Having said that, 62k requests per second from falcon on bjoern is still very high.

The ASGI framework aiohttp performed really well compared to the WSGI-based falcon and flask when the latter two are paired with gunicorn’s default sync worker (9.1k, 6.5k, and 4.2k requests per second respectively), although the WSGI frameworks pull ahead when paired with optimized servers such as bjoern.

The slowest framework in this test was cherrypy, which handled less than 3k requests per second regardless of the server used.

JSON serialization

This test returns a 5.5KB JSON string with the content type “application/json”.

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

| Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
| --- | --- | --- | --- | --- | --- |
| falcon | 313.75 | 1,400.00 | 13,800.00 | 3,899 | 78,077 |
| falcon-gunicorn | 4.16 | 0.60 | 19.91 | 4,799 | 96,091 |
| falcon-uwsgi | 3.02 | 0.31 | 12.32 | 6,528 | 130,908 |
| falcon-bjoern | 1.27 | 0.21 | 8.20 | 15,756 | 315,492 |
| falcon-bjoern-nuitka | 1.28 | 0.22 | 9.09 | 15,618 | 312,426 |
| falcon-bjoern-pypy | 2.07 | 3.74 | 40.97 | 16,466 | 329,427 |
| falcon-uvicorn | 4.69 | 0.33 | 9.04 | 4,263 | 85,360 |
| falcon-uvicorn-uvloop | 3.55 | 0.47 | 17.48 | 5,639 | 112,984 |
| cherrypy | 15.25 | 15.17 | 122.74 | 1,604 | 32,134 |
| cherrypy-uwsgi | 11.52 | 0.80 | 22.17 | 1,728 | 34,609 |
| cherrypy-tornado | 13.58 | 0.93 | 23.05 | 1,471 | 29,476 |
| cherrypy-twisted | 14.52 | 2.64 | 74.54 | 1,377 | 27,595 |
| flask | 9.49 | 0.68 | 22.27 | 2,102 | 42,082 |
| flask-fastwsgi | 3.39 | 0.43 | 12.06 | 5,902 | 118,178 |
| flask-gunicorn-eventlet | 17.50 | 24.33 | 188.12 | 3,100 | 62,061 |
| flask-gunicorn-gevent | 18.85 | 25.62 | 178.69 | 3,125 | 62,578 |
| flask-gunicorn-gthread | 7.50 | 0.63 | 14.66 | 2,663 | 53,347 |
| flask-gunicorn-meinheld | 3.43 | 0.81 | 15.26 | 5,834 | 116,763 |
| flask-gunicorn-tornado | 8.00 | 0.74 | 17.59 | 2,499 | 50,070 |
| flask-gunicorn | 6.48 | 0.53 | 17.17 | 3,074 | 61,556 |
| flask-meinheld | 3.73 | 0.35 | 15.14 | 5,335 | 106,797 |
| flask-bjoern | 3.08 | 0.38 | 21.98 | 6,498 | 130,064 |
| flask-bjoern-nuitka | 2.86 | 0.35 | 20.94 | 6,997 | 140,071 |
| fastwsgi | 1.41 | 0.33 | 10.57 | 14,251 | 285,268 |
| bottle | 307.21 | 1,500.00 | 14,000.00 | 4,085 | 81,826 |
| bjoern | 1.13 | 0.22 | 8.93 | 17,803 | 356,312 |
| bjoern-pypy | 1.29 | 1.38 | 31.92 | 18,514 | 370,420 |
| aiohttp | 3.36 | 0.40 | 14.77 | 5,964 | 119,389 |
| aiohttp-uvloop | 2.39 | 0.35 | 10.88 | 8,389 | 167,978 |
| aiohttp-gunicorn | 2.94 | 0.25 | 10.07 | 6,790 | 135,875 |
| aiohttp-gunicorn-uvloop | 2.01 | 0.24 | 8.00 | 9,952 | 199,244 |
| hug | 323.22 | 1,500.00 | 14,100.00 | 3,193 | 63,935 |
| meinheld | 1.58 | 0.20 | 10.55 | 12,537 | 250,936 |
| muffin-uvicorn | 5.76 | 0.56 | 10.13 | 3,473 | 69,504 |
| netius | 5.23 | 0.46 | 10.65 | 3,802 | 76,134 |
| pycnic-gunicorn | 4.76 | 0.51 | 15.72 | 4,188 | 83,847 |
| tornado | 5.06 | 0.42 | 13.90 | 3,954 | 79,164 |

Like in the previous simple test, it is no surprise that most servers perform quite well in this JSON serialization test, with most handling a respectable 2-6k requests per second. The JSON is small and all operations happen in memory, since the payload is preloaded from a file on server start-up. Serializing the JSON object is a CPU-intensive operation that happens in Python code, so the gains the C-based servers (bjoern, fastwsgi, and meinheld) showed in the previous test become less pronounced.
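The structure of the test is straightforward. As a self-contained sketch (the real benchmark preloads a ~5.5KB document from a file; a small in-memory stand-in is used here instead), the per-request json.dumps call is where the Python-side CPU cost lives:

```python
import json

# The benchmark preloads its JSON payload at start-up; this in-memory
# stand-in keeps the sketch self-contained (not the actual payload).
PAYLOAD = {"users": [{"id": i, "name": f"user-{i}"} for i in range(100)]}

def app(environ, start_response):
    # json.dumps runs on every request; this Python-side serialization is
    # the CPU cost that the C-based servers cannot optimize away.
    body = json.dumps(PAYLOAD).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```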

The winner, although not by much, is the bjoern server running with pypy, whose JIT optimizes the Python code to run more efficiently; it handled 18.5k requests per second. The slowest-performing framework in this test was cherrypy, which handled less than 2k requests per second regardless of the server used.

Simulated CPU Bound

This test returns a “Hello World” string with a content type “text/plain” where possible, but it also runs a loop that hogs the CPU for around 90-100ms before returning.
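With one vCPU and roughly 100ms of CPU work per request, throughput is capped near 10 requests per second no matter how fast the server is, which is exactly what the table below shows. A minimal sketch of this kind of artificial CPU hog (not the benchmark’s actual loop):

```python
import time

def burn_cpu(ms=100):
    """Busy-loop for roughly `ms` milliseconds to simulate CPU-bound work."""
    deadline = time.perf_counter() + ms / 1000.0
    n = 0
    while time.perf_counter() < deadline:
        n += 1  # pointless work that keeps the CPU pegged
    return n

def app(environ, start_response):
    burn_cpu(100)  # ~100ms of CPU time spent on every request
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]
```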

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

| Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
| --- | --- | --- | --- | --- | --- |
| falcon | 897.93 | 1,600.00 | 14,500.00 | 11 | 220 |
| falcon-gunicorn | 1,600.00 | 285.83 | 1,800.00 | 11 | 226 |
| falcon-uwsgi | 1,600.00 | 269.28 | 1,700.00 | 12 | 233 |
| falcon-bjoern | 1,700.00 | 867.70 | 7,000.00 | 11 | 223 |
| falcon-bjoern-nuitka | 1,700.00 | 868.65 | 7,000.00 | 11 | 223 |
| falcon-bjoern-pypy | 50.62 | 5.19 | 222.49 | 395 | 7,924 |
| falcon-uvicorn | 1,600.00 | 273.51 | 1,800.00 | 11 | 228 |
| falcon-uvicorn-uvloop | 1,700.00 | 253.09 | 2,900.00 | 11 | 218 |
| cherrypy | 1,800.00 | 268.35 | 2,100.00 | 10 | 209 |
| cherrypy-uwsgi | 1,600.00 | 274.33 | 1,700.00 | 11 | 230 |
| cherrypy-tornado | 1,800.00 | 51.31 | 1,900.00 | 11 | 220 |
| cherrypy-twisted | 1,900.00 | 409.84 | 3,200.00 | 10 | 198 |
| flask | 1,700.00 | 250.09 | 2,000.00 | 11 | 222 |
| flask-fastwsgi | 1,600.00 | 445.82 | 3,300.00 | 11 | 228 |
| flask-gunicorn-eventlet | 1,500.00 | 1,100.00 | 4,700.00 | 11 | 227 |
| flask-gunicorn-gevent | 1,600.00 | 1,000.00 | 4,200.00 | 11 | 227 |
| flask-gunicorn-gthread | 1,600.00 | 281.04 | 1,700.00 | 11 | 227 |
| flask-gunicorn-meinheld | 1,500.00 | 618.52 | 3,100.00 | 11 | 227 |
| flask-gunicorn-tornado | 1,700.00 | 16.39 | 1,700.00 | 11 | 220 |
| flask-gunicorn | 1,700.00 | 295.80 | 1,900.00 | 11 | 220 |
| flask-meinheld | 1,600.00 | 274.71 | 1,700.00 | 11 | 228 |
| flask-bjoern | 1,800.00 | 912.25 | 7,400.00 | 11 | 217 |
| flask-bjoern-nuitka | 1,000.00 | 481.17 | 4,800.00 | 20 | 395 |
| fastwsgi | 1,600.00 | 429.57 | 3,300.00 | 11 | 229 |
| bottle | 865.07 | 1,600.00 | 14,500.00 | 12 | 233 |
| bjoern | 1,800.00 | 882.96 | 7,100.00 | 11 | 223 |
| bjoern-pypy | 19.33 | 1.97 | 106.06 | 1,035 | 20,744 |
| aiohttp | 1,700.00 | 292.94 | 1,800.00 | 11 | 220 |
| aiohttp-uvloop | 1,700.00 | 161.27 | 1,800.00 | 11 | 222 |
| aiohttp-gunicorn | 1,700.00 | 292.77 | 1,800.00 | 11 | 222 |
| aiohttp-gunicorn-uvloop | 1,700.00 | 242.12 | 2,200.00 | 11 | 220 |
| hug | 1,200.00 | 2,000.00 | 14,100.00 | 8 | 156 |
| meinheld | 1,800.00 | 317.47 | 1,900.00 | 10 | 208 |
| muffin-uvicorn | 1,700.00 | 290.47 | 1,900.00 | 11 | 220 |
| netius | 1,700.00 | 156.77 | 1,900.00 | 11 | 222 |
| pycnic-gunicorn | 1,600.00 | 278.18 | 1,800.00 | 11 | 230 |
| tornado | 1,600.00 | 276.69 | 1,800.00 | 11 | 230 |

For this test, perhaps unsurprisingly, most servers performed almost identically, since the majority of CPU time is spent in the artificial 90-100ms loop for each request. This leaves most servers hovering around the 10-11 requests per second mark.

The exception is when using pypy to run the server. Falcon running with bjoern and pypy ran around 36 times faster than most servers, and bjoern with pypy on its own ran around 94 times faster, topping 1,000 requests per second.

In the real world, your mileage with pypy may vary depending on your application, but I wanted to highlight that when a request is CPU-bound, most framework and server combinations will produce very similar results regardless of the framework (no magic here, I’m afraid).

Simulated IO Bound

This test returns a “Hello World” string with a content type “text/plain” where possible but also sleeps the thread for 100ms before returning (note that the sleep method is not guaranteed to wake the thread up after exactly 100ms).
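In a synchronous WSGI handler this simulated wait looks like the sketch below (not the benchmark’s actual code): the worker thread is blocked for the full 100ms, so a single worker tops out around 10 requests per second.

```python
import time

def app(environ, start_response):
    # Simulate waiting ~100ms on an external resource. In a WSGI app this
    # blocks the worker thread for the whole duration, which is why the
    # synchronous servers collapse to ~10 requests/second in this test.
    time.sleep(0.1)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]
```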

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

| Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
| --- | --- | --- | --- | --- | --- |
| falcon | 1,100.00 | 1,800.00 | 13,900.00 | 7 | 135 |
| falcon-gunicorn | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 196 |
| falcon-uwsgi | 7,600.00 | 3,300.00 | 10,300.00 | 10 | 197 |
| falcon-bjoern | 4,200.00 | 2,300.00 | 19,000.00 | 9 | 183 |
| falcon-bjoern-nuitka | 4,300.00 | 2,400.00 | 19,200.00 | 9 | 185 |
| falcon-bjoern-pypy | 4,300.00 | 2,400.00 | 19,300.00 | 9 | 185 |
| falcon-bjoern-fork | 222.41 | 255.88 | 2,600.00 | 1,791 | 35,992 |
| falcon-uvicorn | 114.16 | 5.32 | 134.19 | 2,603 | 52,267 |
| falcon-uvicorn-uvloop | 113.81 | 10.04 | 269.77 | 2,611 | 52,463 |
| cherrypy | 2,100.00 | 1,700.00 | 16,200.00 | 95 | 1,917 |
| cherrypy-uwsgi | 7,700.00 | 3,400.00 | 10,500.00 | 10 | 194 |
| cherrypy-tornado | 18,500.00 | 0.84 | 18,500.00 | 1 | 30 |
| cherrypy-twisted | 1,400.00 | 292.01 | 2,000.00 | 192 | 3,848 |
| flask | 129.03 | 18.06 | 363.61 | 1,448 | 29,030 |
| flask-fastwsgi | 8,300.00 | 6,200.00 | 19,800.00 | 10 | 196 |
| flask-gunicorn-eventlet | 101.94 | 2.23 | 143.24 | 2,919 | 58,587 |
| flask-gunicorn-gevent | 101.68 | 1.90 | 136.13 | 2,921 | 58,674 |
| flask-gunicorn-gthread | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 195 |
| flask-gunicorn-meinheld | 5,400.00 | 4,800.00 | 17,900.00 | 10 | 197 |
| flask-gunicorn-tornado | N/A | N/A | N/A | N/A | N/A |
| flask-gunicorn | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 195 |
| flask-meinheld | 10,000.00 | 5,700.00 | 19,900.00 | 10 | 197 |
| flask-bjoern | 4,000.00 | 2,000.00 | 18,200.00 | 9 | 178 |
| flask-bjoern-nuitka | 4,300.00 | 2,500.00 | 19,400.00 | 9 | 185 |
| fastwsgi | 9,200.00 | 6,300.00 | 19,800.00 | 10 | 198 |
| bottle | 815.78 | 1,100.00 | 14,900.00 | 10 | 196 |
| bjoern | 4,200.00 | 2,200.00 | 18,500.00 | 9 | 184 |
| bjoern-pypy | 3,800.00 | 2,300.00 | 19,700.00 | 9 | 177 |
| aiohttp | 127.75 | 8.99 | 151.92 | 2,316 | 46,508 |
| aiohttp-uvloop | 122.31 | 9.85 | 255.23 | 2,434 | 48,920 |
| aiohttp-gunicorn | 120.12 | 8.67 | 141.70 | 2,476 | 49,752 |
| aiohttp-gunicorn-uvloop | 111.93 | 6.03 | 135.04 | 2,656 | 53,320 |
| hug | 932.48 | 1,400.00 | 14,800.00 | 10 | 194 |
| meinheld | 9,900.00 | 5,700.00 | 19,800.00 | 10 | 198 |
| muffin-uvicorn | 112.23 | 5.09 | 140.91 | 2,647 | 53,190 |
| netius | 5,100.00 | 2,000.00 | 8,100.00 | 4 | 81 |
| pycnic-gunicorn | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 197 |
| tornado | 9,700.00 | 5,900.00 | 19,900.00 | 10 | 197 |

This test is designed to simulate a server request that does not do any local processing on the server itself but makes requests to other servers or resources instead as a middleman. In real-world scenarios, this intermediate server will usually run some lightweight logic to do things such as figure out where to fetch data from, aggregate data once fetched, or any other “stitching” logic required.

This test category is where ASGI frameworks shine, since requests can be handled independently of one another, whereas WSGI applications will typically block the request thread while waiting for “external” resources to respond (in this test, the main thread is blocked for the entire 100ms of the mock resource fetch).
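The effect is easy to demonstrate outside of a web server. In the sketch below, 50 concurrent 100ms waits overlap under asyncio and finish in far less than the 5 seconds a single blocking worker would need; this is the same mechanism that lets an ASGI server keep accepting new requests while earlier ones are still waiting on IO:

```python
import asyncio
import time

async def fetch(i):
    # Stand-in for one ~100ms "external" call (e.g. a backend request).
    await asyncio.sleep(0.1)
    return i

async def main(n=50):
    # All n waits run concurrently, so total wall time stays well under
    # the n * 0.1s a blocking loop would take.
    return await asyncio.gather(*(fetch(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
```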

The winner in this test was flask with gunicorn and either gevent or eventlet, at 2.9k requests per second. falcon with uvicorn (with or without uvloop), handling 2.6k requests per second, is a close runner-up. aiohttp, whether standalone with uvloop or under gunicorn (with and without uvloop), handled 2.3-2.6k requests per second, and muffin with uvicorn handled 2.6k.

Using bjoern’s forking mode spawns more “workers” to handle requests simultaneously, but it does not appear to be as optimized as the ASGI servers.

The slowest servers were the WSGI servers, as expected, since they block for the full wait while handling each request.

Todo

  • Add more frameworks and servers, and combinations of frameworks and servers.
  • Publish source code
  • Explore more parameters to play with such as nuitka compilation options and pypy
  • Publish charts for visual comparison

Changelog

  • 11/10/2023 – Initial post.