To find out which Python web server frameworks perform best, I’m going to benchmark several of them, measuring metrics such as requests per second and latency under a range of conditions.
For updates and the roadmap, see the bottom of this page.
Benchmarking Methodology
Hardware setup
Component | Specification |
CPU | Intel Core i7 12700H |
Memory Max Frequency | 3200.0 MHz |
Test-specific resource limits (e.g. CPU/memory) are reported with each test. Most tests use 100% of a single vCPU, which ensures there are plenty of resources available to the application server and keeps the comparison fair, since all servers are given the same resources during each test. Do note, however, that your results may differ considerably: most real deployments are not limited to a single vCPU, so absolute numbers will usually be much higher. The goal of these benchmarks is to get an idea of each framework’s performance level and to compare frameworks on a level playing field.
Restricting the server’s resources also caps the throughput each framework can achieve. The upside is that the load-testing tool then needs relatively few resources itself; it is left unrestricted and free to use whatever it wants, so it is never the bottleneck. All requests are issued from the same machine that hosts the server.
Testing Environment and Tools
All tests are performed inside Docker to ensure consistency. You can check the Docker environment used by each test by viewing its source code.
Metrics are measured using the wrk tool. The exact configuration used for each test can be found in the test’s docker-compose file in the source code (coming soon); any test-specific configuration is also noted alongside the test itself.
Tests
Simple “Hello World” / Echo application
This simple test exercises the application with as little work as possible: the handler just returns a “Hello World” string with the content type “text/plain” where possible.
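Since the benchmark source code isn’t published yet, here is a minimal sketch of what such a handler might look like, using Flask as an example (the route, port, and module layout are assumptions):

```python
# Hypothetical sketch of the "Hello World" test application (Flask shown;
# the actual benchmark apps may differ until the source code is published).
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # Return the smallest possible response with an explicit text/plain type.
    return "Hello World", 200, {"Content-Type": "text/plain"}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```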
Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.
Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
falcon | 116.11 | 572.14 | 6,900.00 | 4,817 | 96,434 |
falcon-gunicorn | 3.04 | 0.43 | 12.77 | 6,551 | 131,154 |
falcon-uwsgi | 1.33 | 0.33 | 9.23 | 13,711 | 274,340 |
falcon-bjoern | 0.33 | 0.22 | 8.55 | 62,733 | 1,254,669 |
falcon-bjoern-nuitka | 0.32 | 0.17 | 7.07 | 62,813 | 1,256,254 |
falcon-bjoern-pypy | 9.30 | 25.78 | 153.86 | 33,665 | 673,989 |
falcon-uvicorn | 3.58 | 0.30 | 12.46 | 5,583 | 111,766 |
falcon-uvicorn-uvloop | 2.48 | 0.33 | 10.48 | 8,066 | 161,511 |
cherrypy | 12.21 | 14.32 | 213.46 | 2,537 | 50,799 |
cherrypy-uwsgi | 6.61 | 0.50 | 16.99 | 2,997 | 60,023 |
cherrypy-tornado | 9.19 | 0.80 | 19.39 | 2,174 | 43,555 |
cherrypy-twisted | 9.99 | 2.09 | 54.90 | 2,003 | 40,112 |
flask | 7.50 | 0.54 | 23.59 | 2,658 | 53,220 |
flask-fastwsgi | 1.95 | 0.29 | 11.62 | 10,301 | 206,186 |
flask-gunicorn-eventlet | 28.09 | 45.32 | 398.87 | 4,552 | 91,154 |
flask-gunicorn-gevent | 27.99 | 42.84 | 301.40 | 4,502 | 90,124 |
flask-gunicorn-gthread | 5.55 | 0.58 | 15.80 | 3,600 | 72,072 |
flask-gunicorn-meinheld | 1.99 | 0.46 | 9.73 | 10,062 | 201,365 |
flask-gunicorn-tornado | 6.32 | 0.68 | 16.84 | 3,164 | 63,347 |
flask-gunicorn | 4.66 | 0.50 | 16.73 | 4,279 | 85,654 |
flask-meinheld | 2.17 | 0.24 | 11.12 | 9,142 | 182,964 |
flask-bjoern | 1.63 | 0.27 | 9.85 | 12,312 | 246,376 |
flask-bjoern-nuitka | 1.31 | 0.27 | 9.23 | 15,275 | 305,634 |
fastwsgi | 0.11 | 0.20 | 8.99 | 214,112 | 4,303,606 |
bottle | 64.17 | 332.82 | 3,500.00 | 5,319 | 106,470 |
bjoern | 0.12 | 0.20 | 7.68 | 178,353 | 3,567,317 |
bjoern-pypy | 894.35 | 1,800.00 | 7,400.00 | 14,181 | 283,954 |
aiohttp | 2.18 | 0.21 | 10.07 | 9,172 | 183,588 |
aiohttp-uvloop | 1.29 | 0.24 | 8.99 | 15,419 | 309,930 |
aiohttp-gunicorn | 1.80 | 0.23 | 10.40 | 11,131 | 222,769 |
aiohttp-gunicorn-uvloop | 0.93 | 0.23 | 10.89 | 21,519 | 430,447 |
hug | 369.86 | 1,500.00 | 14,000.00 | 4,246 | 85,017 |
meinheld | 0.46 | 0.62 | 38.95 | 44,706 | 895,005 |
muffin-uvicorn | 4.88 | 0.47 | 16.47 | 4,100 | 82,078 |
netius | 2.08 | 0.20 | 7.88 | 9,598 | 192,114 |
pycnic-gunicorn | 3.57 | 0.43 | 14.91 | 5,580 | 111,709 |
tornado | 3.96 | 0.49 | 15.04 | 5,055 | 101,201 |
There’s no surprise that almost all servers perform well in this test, given that it is simply a maximum-throughput test with no real application work. Most hover in the 3-6k requests per second range. The ridiculously fast servers such as bjoern, meinheld, and fastwsgi (the latter pulling in 214k requests per second) are, of course, the servers written in C, which is how they reach these numbers. When these servers are used with a framework such as flask, the framework – which is written in Python – becomes the bottleneck. Having said that, 62k requests per second from falcon on bjoern is still very high.
The async framework aiohttp performed really well (9.1k requests per second) compared to the WSGI-based falcon and flask paired with gunicorn’s default sync worker (6.5k and 4.2k respectively), although the WSGI frameworks pull ahead when paired with optimized servers such as bjoern.
The slowest framework in this test was cherrypy, which handled less than 3k requests per second regardless of the server used.
JSON serialization
This test returns a 5.5KB JSON string with the content type “application/json”.
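As a rough illustration (the file name and framework here are assumptions, since the source isn’t published), the handler presumably reads the JSON document from disk once at start-up and re-serializes it on every request:

```python
# Hypothetical sketch of the JSON serialization test (Flask shown).
import json
from flask import Flask

app = Flask(__name__)

# Load the ~5.5KB JSON payload once at start-up so each request only pays the
# cost of serialization, not disk I/O.
with open("payload.json") as f:
    PAYLOAD = json.load(f)

@app.route("/")
def serialize():
    return json.dumps(PAYLOAD), 200, {"Content-Type": "application/json"}
```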
Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.
Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
falcon | 313.75 | 1,400.00 | 13,800.00 | 3,899 | 78,077 |
falcon-gunicorn | 4.16 | 0.60 | 19.91 | 4,799 | 96,091 |
falcon-uwsgi | 3.02 | 0.31 | 12.32 | 6,528 | 130,908 |
falcon-bjoern | 1.27 | 0.21 | 8.20 | 15,756 | 315,492 |
falcon-bjoern-nuitka | 1.28 | 0.22 | 9.09 | 15,618 | 312,426 |
falcon-bjoern-pypy | 2.07 | 3.74 | 40.97 | 16,466 | 329,427 |
falcon-uvicorn | 4.69 | 0.33 | 9.04 | 4,263 | 85,360 |
falcon-uvicorn-uvloop | 3.55 | 0.47 | 17.48 | 5,639 | 112,984 |
cherrypy | 15.25 | 15.17 | 122.74 | 1,604 | 32,134 |
cherrypy-uwsgi | 11.52 | 0.80 | 22.17 | 1,728 | 34,609 |
cherrypy-tornado | 13.58 | 0.93 | 23.05 | 1,471 | 29,476 |
cherrypy-twisted | 14.52 | 2.64 | 74.54 | 1,377 | 27,595 |
flask | 9.49 | 0.68 | 22.27 | 2,102 | 42,082 |
flask-fastwsgi | 3.39 | 0.43 | 12.06 | 5,902 | 118,178 |
flask-gunicorn-eventlet | 17.50 | 24.33 | 188.12 | 3,100 | 62,061 |
flask-gunicorn-gevent | 18.85 | 25.62 | 178.69 | 3,125 | 62,578 |
flask-gunicorn-gthread | 7.50 | 0.63 | 14.66 | 2,663 | 53,347 |
flask-gunicorn-meinheld | 3.43 | 0.81 | 15.26 | 5,834 | 116,763 |
flask-gunicorn-tornado | 8.00 | 0.74 | 17.59 | 2,499 | 50,070 |
flask-gunicorn | 6.48 | 0.53 | 17.17 | 3,074 | 61,556 |
flask-meinheld | 3.73 | 0.35 | 15.14 | 5,335 | 106,797 |
flask-bjoern | 3.08 | 0.38 | 21.98 | 6,498 | 130,064 |
flask-bjoern-nuitka | 2.86 | 0.35 | 20.94 | 6,997 | 140,071 |
fastwsgi | 1.41 | 0.33 | 10.57 | 14,251 | 285,268 |
bottle | 307.21 | 1,500.00 | 14,000.00 | 4,085 | 81,826 |
bjoern | 1.13 | 0.22 | 8.93 | 17,803 | 356,312 |
bjoern-pypy | 1.29 | 1.38 | 31.92 | 18,514 | 370,420 |
aiohttp | 3.36 | 0.40 | 14.77 | 5,964 | 119,389 |
aiohttp-uvloop | 2.39 | 0.35 | 10.88 | 8,389 | 167,978 |
aiohttp-gunicorn | 2.94 | 0.25 | 10.07 | 6,790 | 135,875 |
aiohttp-gunicorn-uvloop | 2.01 | 0.24 | 8.00 | 9,952 | 199,244 |
hug | 323.22 | 1,500.00 | 14,100.00 | 3,193 | 63,935 |
meinheld | 1.58 | 0.20 | 10.55 | 12,537 | 250,936 |
muffin-uvicorn | 5.76 | 0.56 | 10.13 | 3,473 | 69,504 |
netius | 5.23 | 0.46 | 10.65 | 3,802 | 76,134 |
pycnic-gunicorn | 4.76 | 0.51 | 15.72 | 4,188 | 83,847 |
tornado | 5.06 | 0.42 | 13.90 | 3,954 | 79,164 |
As in the previous simple test, it is no surprise that most servers perform quite well at JSON serialization, with most handling a respectable 2-6k requests per second. Although the JSON payload is small, all operations happen in memory, since the JSON is preloaded from a file at server start-up. Serializing the JSON object is a CPU-intensive operation that happens in Python code, so the gains seen in the previous test from the servers written in C (bjoern, fastwsgi, and meinheld) become less pronounced.
The winner, although not by much, is the bjoern server running on pypy, whose JIT optimizes the Python code and allowed it to handle 18.5k requests per second. The slowest-performing framework in this test was cherrypy, which handled less than 2k requests per second regardless of the server used.
Simulated CPU Bound
This test returns a “Hello World” string with a content type “text/plain” where possible but also runs a loop that hogs the CPU for around 90-100ms before returning.
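A sketch of what the CPU-hogging handler might look like (the busy-loop mechanism and exact timing below are assumptions, not the published source):

```python
# Hypothetical sketch of the simulated CPU-bound handler (Flask shown).
import time
from flask import Flask

app = Flask(__name__)

def burn_cpu(seconds=0.095):
    # Busy-loop on CPU time (not wall time) for roughly 90-100ms.
    deadline = time.process_time() + seconds
    while time.process_time() < deadline:
        pass

@app.route("/")
def cpu_bound():
    burn_cpu()
    return "Hello World", 200, {"Content-Type": "text/plain"}
```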
Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.
Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
falcon | 897.93 | 1,600.00 | 14,500.00 | 11 | 220 |
falcon-gunicorn | 1,600.00 | 285.83 | 1,800.00 | 11 | 226 |
falcon-uwsgi | 1,600.00 | 269.28 | 1,700.00 | 12 | 233 |
falcon-bjoern | 1,700.00 | 867.70 | 7,000.00 | 11 | 223 |
falcon-bjoern-nuitka | 1,700.00 | 868.65 | 7,000.00 | 11 | 223 |
falcon-bjoern-pypy | 50.62 | 5.19 | 222.49 | 395 | 7,924 |
falcon-uvicorn | 1,600.00 | 273.51 | 1,800.00 | 11 | 228 |
falcon-uvicorn-uvloop | 1,700.00 | 253.09 | 2,900.00 | 11 | 218 |
cherrypy | 1,800.00 | 268.35 | 2,100.00 | 10 | 209 |
cherrypy-uwsgi | 1,600.00 | 274.33 | 1,700.00 | 11 | 230 |
cherrypy-tornado | 1,800.00 | 51.31 | 1,900.00 | 11 | 220 |
cherrypy-twisted | 1,900.00 | 409.84 | 3,200.00 | 10 | 198 |
flask | 1,700.00 | 250.09 | 2,000.00 | 11 | 222 |
flask-fastwsgi | 1,600.00 | 445.82 | 3,300.00 | 11 | 228 |
flask-gunicorn-eventlet | 1,500.00 | 1,100.00 | 4,700.00 | 11 | 227 |
flask-gunicorn-gevent | 1,600.00 | 1,000.00 | 4,200.00 | 11 | 227 |
flask-gunicorn-gthread | 1,600.00 | 281.04 | 1,700.00 | 11 | 227 |
flask-gunicorn-meinheld | 1,500.00 | 618.52 | 3,100.00 | 11 | 227 |
flask-gunicorn-tornado | 1,700.00 | 16.39 | 1,700.00 | 11 | 220 |
flask-gunicorn | 1,700.00 | 295.80 | 1,900.00 | 11 | 220 |
flask-meinheld | 1,600.00 | 274.71 | 1,700.00 | 11 | 228 |
flask-bjoern | 1,800.00 | 912.25 | 7,400.00 | 11 | 217 |
flask-bjoern-nuitka | 1,000.00 | 481.17 | 4,800.00 | 20 | 395 |
fastwsgi | 1,600.00 | 429.57 | 3,300.00 | 11 | 229 |
bottle | 865.07 | 1,600.00 | 14,500.00 | 12 | 233 |
bjoern | 1,800.00 | 882.96 | 7,100.00 | 11 | 223 |
bjoern-pypy | 19.33 | 1.97 | 106.06 | 1,035 | 20,744 |
aiohttp | 1,700.00 | 292.94 | 1,800.00 | 11 | 220 |
aiohttp-uvloop | 1,700.00 | 161.27 | 1,800.00 | 11 | 222 |
aiohttp-gunicorn | 1,700.00 | 292.77 | 1,800.00 | 11 | 222 |
aiohttp-gunicorn-uvloop | 1,700.00 | 242.12 | 2,200.00 | 11 | 220 |
hug | 1,200.00 | 2,000.00 | 14,100.00 | 8 | 156 |
meinheld | 1,800.00 | 317.47 | 1,900.00 | 10 | 208 |
muffin-uvicorn | 1,700.00 | 290.47 | 1,900.00 | 11 | 220 |
netius | 1,700.00 | 156.77 | 1,900.00 | 11 | 222 |
pycnic-gunicorn | 1,600.00 | 278.18 | 1,800.00 | 11 | 230 |
tornado | 1,600.00 | 276.69 | 1,800.00 | 11 | 230 |
For this test, perhaps unsurprisingly, most servers performed almost identically, since the majority of the CPU time for each request is spent in the artificial 90-100ms loop. This leaves most servers hovering around the 10-11 requests per second mark.
The exception is when the server runs on pypy. falcon with bjoern on pypy ran around 36 times faster than most of the field, and bare bjoern on pypy around 94 times faster, topping 1,000 requests per second.
In the real world, your mileage with pypy will vary depending on your application, but the point to highlight is that for a CPU-bound request, most framework and server combinations will produce very similar results no matter the framework (no magic here, I’m afraid).
Simulated IO Bound
This test returns a “Hello World” string with a content type “text/plain” where possible but also sleeps the thread for 100ms before returning (note that the sleep method is not guaranteed to wake the thread up after exactly 100ms).
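For WSGI frameworks, the handler presumably blocks its worker thread outright, roughly like this (a sketch, not the published source):

```python
# Hypothetical sketch of the simulated I/O-bound handler for a WSGI framework.
import time
from flask import Flask

app = Flask(__name__)

@app.route("/")
def io_bound():
    # Blocks the worker thread for ~100ms, standing in for a slow upstream call.
    time.sleep(0.1)
    return "Hello World", 200, {"Content-Type": "text/plain"}
```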
Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.
Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
falcon | 1,100.00 | 1,800.00 | 13,900.00 | 7 | 135 |
falcon-gunicorn | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 196 |
falcon-uwsgi | 7,600.00 | 3,300.00 | 10,300.00 | 10 | 197 |
falcon-bjoern | 4,200.00 | 2,300.00 | 19,000.00 | 9 | 183 |
falcon-bjoern-nuitka | 4,300.00 | 2,400.00 | 19,200.00 | 9 | 185 |
falcon-bjoern-pypy | 4,300.00 | 2,400.00 | 19,300.00 | 9 | 185 |
falcon-bjoern-fork | 222.41 | 255.88 | 2,600.00 | 1,791 | 35,992 |
falcon-uvicorn | 114.16 | 5.32 | 134.19 | 2,603 | 52,267 |
falcon-uvicorn-uvloop | 113.81 | 10.04 | 269.77 | 2,611 | 52,463 |
cherrypy | 2,100.00 | 1,700.00 | 16,200.00 | 95 | 1,917 |
cherrypy-uwsgi | 7,700.00 | 3,400.00 | 10,500.00 | 10 | 194 |
cherrypy-tornado | 18,500.00 | 0.84 | 18,500.00 | 1 | 30 |
cherrypy-twisted | 1,400.00 | 292.01 | 2,000.00 | 192 | 3,848 |
flask | 129.03 | 18.06 | 363.61 | 1,448 | 29,030 |
flask-fastwsgi | 8,300.00 | 6,200.00 | 19,800.00 | 10 | 196 |
flask-gunicorn-eventlet | 101.94 | 2.23 | 143.24 | 2,919 | 58,587 |
flask-gunicorn-gevent | 101.68 | 1.90 | 136.13 | 2,921 | 58,674 |
flask-gunicorn-gthread | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 195 |
flask-gunicorn-meinheld | 5,400.00 | 4,800.00 | 17,900.00 | 10 | 197 |
flask-gunicorn-tornado | N/A | N/A | N/A | N/A | N/A |
flask-gunicorn | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 195 |
flask-meinheld | 10,000.00 | 5,700.00 | 19,900.00 | 10 | 197 |
flask-bjoern | 4,000.00 | 2,000.00 | 18,200.00 | 9 | 178 |
flask-bjoern-nuitka | 4,300.00 | 2,500.00 | 19,400.00 | 9 | 185 |
fastwsgi | 9,200.00 | 6,300.00 | 19,800.00 | 10 | 198 |
bottle | 815.78 | 1,100.00 | 14,900.00 | 10 | 196 |
bjoern | 4,200.00 | 2,200.00 | 18,500.00 | 9 | 184 |
bjoern-pypy | 3,800.00 | 2,300.00 | 19,700.00 | 9 | 177 |
aiohttp | 127.75 | 8.99 | 151.92 | 2,316 | 46,508 |
aiohttp-uvloop | 122.31 | 9.85 | 255.23 | 2,434 | 48,920 |
aiohttp-gunicorn | 120.12 | 8.67 | 141.70 | 2,476 | 49,752 |
aiohttp-gunicorn-uvloop | 111.93 | 6.03 | 135.04 | 2,656 | 53,320 |
hug | 932.48 | 1,400.00 | 14,800.00 | 10 | 194 |
meinheld | 9,900.00 | 5,700.00 | 19,800.00 | 10 | 198 |
muffin-uvicorn | 112.23 | 5.09 | 140.91 | 2,647 | 53,190 |
netius | 5,100.00 | 2,000.00 | 8,100.00 | 4 | 81 |
pycnic-gunicorn | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 197 |
tornado | 9,700.00 | 5,900.00 | 19,900.00 | 10 | 197 |
This test is designed to simulate a request that does little processing on the server itself and instead acts as a middleman, making requests to other servers or resources. In real-world scenarios, such an intermediate server usually runs some lightweight logic, for example figuring out where to fetch data from, aggregating the data once fetched, or whatever other “stitching” logic is required.
This test category is where async (ASGI) frameworks shine, since requests can be handled independently of one another, whereas WSGI applications typically block the request thread while waiting for “external” resources to respond (in this test, the handling thread is blocked for the entire 100ms of the mock resource fetch).
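For contrast, here is a hedged sketch of how an async framework such as aiohttp can handle the same 100ms wait without blocking: awaiting asyncio.sleep yields control back to the event loop, so other requests keep being served during the wait.

```python
# Sketch of the non-blocking equivalent with aiohttp (route and port assumed).
import asyncio
from aiohttp import web

async def io_bound(request):
    # Yields the event loop instead of blocking a thread for the 100ms wait.
    await asyncio.sleep(0.1)
    return web.Response(text="Hello World", content_type="text/plain")

app = web.Application()
app.add_routes([web.get("/", io_bound)])

if __name__ == "__main__":
    web.run_app(app, port=8080)
```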
The winner in this test was flask with gunicorn and either gevent or eventlet workers, at 2.9k requests per second. falcon with uvicorn (with or without uvloop), handling 2.6k requests per second, is a close runner-up. aiohttp in its various configurations (standalone, uvloop, gunicorn, or both) handled 2.3-2.6k requests per second, and muffin with uvicorn handled 2.6k requests per second.
Using bjoern’s forking support spawns additional “workers” to handle more requests simultaneously, but this approach does not appear as optimized as the async servers.
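The forking setup referred to here presumably follows bjoern’s documented pre-fork pattern, roughly like the sketch below (the worker count and app module are assumptions): the parent binds the listening socket once, then forks so several processes accept connections from it.

```python
# Rough sketch of a bjoern pre-fork setup (hypothetical app import and worker count).
import os
import bjoern
from app import app  # the WSGI application under test (assumed module name)

NUM_WORKERS = 4  # assumed value; not stated in the post

bjoern.listen(app, "0.0.0.0", 8080)  # bind the socket once in the parent
for _ in range(NUM_WORKERS - 1):
    if os.fork() == 0:
        break  # child: stop forking and fall through to serve
bjoern.run()  # parent and each child run the accept loop on the shared socket
```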
The slowest performers were the plain WSGI servers, as expected, since each worker is blocked for the full duration of the simulated I/O and cannot serve other requests in the meantime.
Todo
- Add more frameworks and servers, and combinations of frameworks and servers.
- Publish source code
- Explore more parameters to play with such as nuitka compilation options and pypy
- Publish charts for visual comparison
Changelog
- 11/10/2023 – Initial post.