Python Webserver Framework Performance Benchmark

To find out which Python webserver frameworks perform best, I’m going to benchmark several of them, measuring metrics such as requests per second and latency under a variety of workloads.

For updates and roadmap see the bottom of this page.

Benchmarking Methodology

Hardware setup

| Component | Specification |
| --- | --- |
| CPU | Intel Core i7 12700H |
| Memory Max Frequency | 3200.0 MHz |

Test-specific resource limits (e.g. CPU/memory) are reported for each test. Most tests give the application server 100% of a single vCPU, which ensures there are plenty of resources available to it and keeps the comparison fair, since all servers receive the same resources in each test. Do note, however, that your real-world results will likely be much higher, since most deployments are not limited to a single vCPU. The goal of these benchmarks is to get an idea of each framework’s performance level and to compare frameworks on a level playing field.

Restricting the server’s resources also caps the throughput the framework can achieve. The upside is that the load-testing tool doesn’t need massive amounts of resources itself; it is left unrestricted and free to use whatever it needs so that it never becomes the bottleneck. All requests are issued from the same machine that hosts the server.

Testing Environment and Tools

All tests are performed inside docker to ensure consistency. You can check the docker environment used by each test by viewing its source code.

Metrics are measured using the wrk tool. The configuration used for each test can be found in the test’s docker-compose file in the source code (coming soon); any test-specific configuration is also noted alongside the test.
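The exact container and wrk parameters aren’t published yet, so purely as an illustration (the image name, port, and thread/connection counts below are assumptions, not the benchmark’s actual configuration), a single-vCPU, 512 MB run might look like this:

```shell
# Limit the server container to one vCPU and 512 MB of memory
# (hypothetical image name and port).
docker run --cpus="1.0" --memory="512m" -p 8080:8080 benchmark/falcon-bjoern

# Drive load from the same machine with wrk for 20 seconds.
# Thread and connection counts here are illustrative guesses.
wrk --threads 2 --connections 100 --duration 20s http://localhost:8080/
```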

Tests

Simple “Hello World” / Echo application

This simple test runs the application with as little work as possible by simply returning a “Hello World” string with a content type “text/plain” where possible.
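For reference, the shape of such an application is tiny. This is not the benchmark’s actual code (each framework uses its own idioms); it is a minimal plain-WSGI equivalent using only the standard library:

```python
# Minimal sketch of a "Hello World" WSGI app of the kind being benchmarked.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # Do as little work as possible: fixed status, fixed headers, fixed body.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]

# To serve it locally: make_server("", 8080, app).serve_forever()
```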

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

| Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
| --- | --- | --- | --- | --- | --- |
| falcon | 116.11 | 572.14 | 6,900.00 | 4,817 | 96,434 |
| falcon-gunicorn | 3.04 | 0.43 | 12.77 | 6,551 | 131,154 |
| falcon-uwsgi | 1.33 | 0.33 | 9.23 | 13,711 | 274,340 |
| falcon-bjoern | 0.33 | 0.22 | 8.55 | 62,733 | 1,254,669 |
| falcon-bjoern-nuitka | 0.32 | 0.17 | 7.07 | 62,813 | 1,256,254 |
| falcon-bjoern-pypy | 9.30 | 25.78 | 153.86 | 33,665 | 673,989 |
| falcon-uvicorn | 3.58 | 0.30 | 12.46 | 5,583 | 111,766 |
| falcon-uvicorn-uvloop | 2.48 | 0.33 | 10.48 | 8,066 | 161,511 |
| cherrypy | 12.21 | 14.32 | 213.46 | 2,537 | 50,799 |
| cherrypy-uwsgi | 6.61 | 0.50 | 16.99 | 2,997 | 60,023 |
| cherrypy-tornado | 9.19 | 0.80 | 19.39 | 2,174 | 43,555 |
| cherrypy-twisted | 9.99 | 2.09 | 54.90 | 2,003 | 40,112 |
| flask | 7.50 | 0.54 | 23.59 | 2,658 | 53,220 |
| flask-fastwsgi | 1.95 | 0.29 | 11.62 | 10,301 | 206,186 |
| flask-gunicorn-eventlet | 28.09 | 45.32 | 398.87 | 4,552 | 91,154 |
| flask-gunicorn-gevent | 27.99 | 42.84 | 301.40 | 4,502 | 90,124 |
| flask-gunicorn-gthread | 5.55 | 0.58 | 15.80 | 3,600 | 72,072 |
| flask-gunicorn-meinheld | 1.99 | 0.46 | 9.73 | 10,062 | 201,365 |
| flask-gunicorn-tornado | 6.32 | 0.68 | 16.84 | 3,164 | 63,347 |
| flask-gunicorn | 4.66 | 0.50 | 16.73 | 4,279 | 85,654 |
| flask-meinheld | 2.17 | 0.24 | 11.12 | 9,142 | 182,964 |
| flask-bjoern | 1.63 | 0.27 | 9.85 | 12,312 | 246,376 |
| flask-bjoern-nuitka | 1.31 | 0.27 | 9.23 | 15,275 | 305,634 |
| fastwsgi | 0.11 | 0.20 | 8.99 | 214,112 | 4,303,606 |
| bottle | 64.17 | 332.82 | 3,500.00 | 5,319 | 106,470 |
| bjoern | 0.12 | 0.20 | 7.68 | 178,353 | 3,567,317 |
| bjoern-pypy | 894.35 | 1,800.00 | 7,400.00 | 14,181 | 283,954 |
| aiohttp | 2.18 | 0.21 | 10.07 | 9,172 | 183,588 |
| aiohttp-uvloop | 1.29 | 0.24 | 8.99 | 15,419 | 309,930 |
| aiohttp-gunicorn | 1.80 | 0.23 | 10.40 | 11,131 | 222,769 |
| aiohttp-gunicorn-uvloop | 0.93 | 0.23 | 10.89 | 21,519 | 430,447 |
| hug | 369.86 | 1,500.00 | 14,000.00 | 4,246 | 85,017 |
| meinheld | 0.46 | 0.62 | 38.95 | 44,706 | 895,005 |
| muffin-uvicorn | 4.88 | 0.47 | 16.47 | 4,100 | 82,078 |
| netius | 2.08 | 0.20 | 7.88 | 9,598 | 192,114 |
| pycnic-gunicorn | 3.57 | 0.43 | 14.91 | 5,580 | 111,709 |
| tornado | 3.96 | 0.49 | 15.04 | 5,055 | 101,201 |

It’s no surprise that almost all servers perform well in this test, given that it’s simply a max-throughput test without any real application work. Most hover in the 3-6k requests per second range. The ridiculously fast servers such as bjoern, meinheld, and fastwsgi (the latter pulling in 214k requests per second) are, of course, the servers written in C; compiled code is what lets them reach these numbers. When these servers are used with a framework such as flask, the framework, which is written in Python, becomes the bottleneck. Having said that, 62k requests per second from falcon on bjoern is still very high.

The ASGI framework aiohttp performed really well compared to the WSGI-based falcon and flask when the latter two are paired with gunicorn’s default sync worker (9.1k, 6.5k, and 4.2k requests per second respectively), although the WSGI frameworks pull ahead when paired with optimized servers such as bjoern.

The slowest framework in this test was cherrypy, which handled less than 3k requests per second regardless of the server used.

JSON serialization

This test returns a 5.5KB JSON string with the content type “application/json”.

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

| Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
| --- | --- | --- | --- | --- | --- |
| falcon | 313.75 | 1,400.00 | 13,800.00 | 3,899 | 78,077 |
| falcon-gunicorn | 4.16 | 0.60 | 19.91 | 4,799 | 96,091 |
| falcon-uwsgi | 3.02 | 0.31 | 12.32 | 6,528 | 130,908 |
| falcon-bjoern | 1.27 | 0.21 | 8.20 | 15,756 | 315,492 |
| falcon-bjoern-nuitka | 1.28 | 0.22 | 9.09 | 15,618 | 312,426 |
| falcon-bjoern-pypy | 2.07 | 3.74 | 40.97 | 16,466 | 329,427 |
| falcon-uvicorn | 4.69 | 0.33 | 9.04 | 4,263 | 85,360 |
| falcon-uvicorn-uvloop | 3.55 | 0.47 | 17.48 | 5,639 | 112,984 |
| cherrypy | 15.25 | 15.17 | 122.74 | 1,604 | 32,134 |
| cherrypy-uwsgi | 11.52 | 0.80 | 22.17 | 1,728 | 34,609 |
| cherrypy-tornado | 13.58 | 0.93 | 23.05 | 1,471 | 29,476 |
| cherrypy-twisted | 14.52 | 2.64 | 74.54 | 1,377 | 27,595 |
| flask | 9.49 | 0.68 | 22.27 | 2,102 | 42,082 |
| flask-fastwsgi | 3.39 | 0.43 | 12.06 | 5,902 | 118,178 |
| flask-gunicorn-eventlet | 17.50 | 24.33 | 188.12 | 3,100 | 62,061 |
| flask-gunicorn-gevent | 18.85 | 25.62 | 178.69 | 3,125 | 62,578 |
| flask-gunicorn-gthread | 7.50 | 0.63 | 14.66 | 2,663 | 53,347 |
| flask-gunicorn-meinheld | 3.43 | 0.81 | 15.26 | 5,834 | 116,763 |
| flask-gunicorn-tornado | 8.00 | 0.74 | 17.59 | 2,499 | 50,070 |
| flask-gunicorn | 6.48 | 0.53 | 17.17 | 3,074 | 61,556 |
| flask-meinheld | 3.73 | 0.35 | 15.14 | 5,335 | 106,797 |
| flask-bjoern | 3.08 | 0.38 | 21.98 | 6,498 | 130,064 |
| flask-bjoern-nuitka | 2.86 | 0.35 | 20.94 | 6,997 | 140,071 |
| fastwsgi | 1.41 | 0.33 | 10.57 | 14,251 | 285,268 |
| bottle | 307.21 | 1,500.00 | 14,000.00 | 4,085 | 81,826 |
| bjoern | 1.13 | 0.22 | 8.93 | 17,803 | 356,312 |
| bjoern-pypy | 1.29 | 1.38 | 31.92 | 18,514 | 370,420 |
| aiohttp | 3.36 | 0.40 | 14.77 | 5,964 | 119,389 |
| aiohttp-uvloop | 2.39 | 0.35 | 10.88 | 8,389 | 167,978 |
| aiohttp-gunicorn | 2.94 | 0.25 | 10.07 | 6,790 | 135,875 |
| aiohttp-gunicorn-uvloop | 2.01 | 0.24 | 8.00 | 9,952 | 199,244 |
| hug | 323.22 | 1,500.00 | 14,100.00 | 3,193 | 63,935 |
| meinheld | 1.58 | 0.20 | 10.55 | 12,537 | 250,936 |
| muffin-uvicorn | 5.76 | 0.56 | 10.13 | 3,473 | 69,504 |
| netius | 5.23 | 0.46 | 10.65 | 3,802 | 76,134 |
| pycnic-gunicorn | 4.76 | 0.51 | 15.72 | 4,188 | 83,847 |
| tornado | 5.06 | 0.42 | 13.90 | 3,954 | 79,164 |

Like in the previous simple test, it is no surprise that most servers perform quite well in this JSON serialization test, with most handling a respectable 2-6k requests per second. The JSON is small and all operations happen in memory, since the payload is preloaded from a file on server start-up. Serializing the JSON object is a CPU-intensive operation that happens in Python code, so the gains the C-based servers (bjoern, fastwsgi, and meinheld) showed in the previous test become less pronounced.
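The structure of the test is straightforward. As a self-contained sketch (the real benchmark preloads a ~5.5KB document from a file; a small in-memory stand-in is used here instead), the per-request json.dumps call is where the Python-side CPU cost lives:

```python
import json

# The benchmark preloads its JSON payload at start-up; this in-memory
# stand-in keeps the sketch self-contained (not the actual payload).
PAYLOAD = {"users": [{"id": i, "name": f"user-{i}"} for i in range(100)]}

def app(environ, start_response):
    # json.dumps runs on every request; this Python-side serialization is
    # the CPU cost that the C-based servers cannot optimize away.
    body = json.dumps(PAYLOAD).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```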

The winner, although not by much, is the bjoern server running with pypy, whose JIT optimizes the Python code to run more efficiently; it handled 18.5k requests per second. The slowest-performing framework in this test was cherrypy, which handled less than 2k requests per second regardless of the server used.

Simulated CPU Bound

This test returns a “Hello World” string with a content type “text/plain” where possible, but it also runs a loop that hogs the CPU for around 90-100ms before returning.
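With one vCPU and roughly 100ms of CPU work per request, throughput is capped near 10 requests per second no matter how fast the server is, which is exactly what the table below shows. A minimal sketch of this kind of artificial CPU hog (not the benchmark’s actual loop):

```python
import time

def burn_cpu(ms=100):
    """Busy-loop for roughly `ms` milliseconds to simulate CPU-bound work."""
    deadline = time.perf_counter() + ms / 1000.0
    n = 0
    while time.perf_counter() < deadline:
        n += 1  # pointless work that keeps the CPU pegged
    return n

def app(environ, start_response):
    burn_cpu(100)  # ~100ms of CPU time spent on every request
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]
```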

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

| Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
| --- | --- | --- | --- | --- | --- |
| falcon | 897.93 | 1,600.00 | 14,500.00 | 11 | 220 |
| falcon-gunicorn | 1,600.00 | 285.83 | 1,800.00 | 11 | 226 |
| falcon-uwsgi | 1,600.00 | 269.28 | 1,700.00 | 12 | 233 |
| falcon-bjoern | 1,700.00 | 867.70 | 7,000.00 | 11 | 223 |
| falcon-bjoern-nuitka | 1,700.00 | 868.65 | 7,000.00 | 11 | 223 |
| falcon-bjoern-pypy | 50.62 | 5.19 | 222.49 | 395 | 7,924 |
| falcon-uvicorn | 1,600.00 | 273.51 | 1,800.00 | 11 | 228 |
| falcon-uvicorn-uvloop | 1,700.00 | 253.09 | 2,900.00 | 11 | 218 |
| cherrypy | 1,800.00 | 268.35 | 2,100.00 | 10 | 209 |
| cherrypy-uwsgi | 1,600.00 | 274.33 | 1,700.00 | 11 | 230 |
| cherrypy-tornado | 1,800.00 | 51.31 | 1,900.00 | 11 | 220 |
| cherrypy-twisted | 1,900.00 | 409.84 | 3,200.00 | 10 | 198 |
| flask | 1,700.00 | 250.09 | 2,000.00 | 11 | 222 |
| flask-fastwsgi | 1,600.00 | 445.82 | 3,300.00 | 11 | 228 |
| flask-gunicorn-eventlet | 1,500.00 | 1,100.00 | 4,700.00 | 11 | 227 |
| flask-gunicorn-gevent | 1,600.00 | 1,000.00 | 4,200.00 | 11 | 227 |
| flask-gunicorn-gthread | 1,600.00 | 281.04 | 1,700.00 | 11 | 227 |
| flask-gunicorn-meinheld | 1,500.00 | 618.52 | 3,100.00 | 11 | 227 |
| flask-gunicorn-tornado | 1,700.00 | 16.39 | 1,700.00 | 11 | 220 |
| flask-gunicorn | 1,700.00 | 295.80 | 1,900.00 | 11 | 220 |
| flask-meinheld | 1,600.00 | 274.71 | 1,700.00 | 11 | 228 |
| flask-bjoern | 1,800.00 | 912.25 | 7,400.00 | 11 | 217 |
| flask-bjoern-nuitka | 1,000.00 | 481.17 | 4,800.00 | 20 | 395 |
| fastwsgi | 1,600.00 | 429.57 | 3,300.00 | 11 | 229 |
| bottle | 865.07 | 1,600.00 | 14,500.00 | 12 | 233 |
| bjoern | 1,800.00 | 882.96 | 7,100.00 | 11 | 223 |
| bjoern-pypy | 19.33 | 1.97 | 106.06 | 1,035 | 20,744 |
| aiohttp | 1,700.00 | 292.94 | 1,800.00 | 11 | 220 |
| aiohttp-uvloop | 1,700.00 | 161.27 | 1,800.00 | 11 | 222 |
| aiohttp-gunicorn | 1,700.00 | 292.77 | 1,800.00 | 11 | 222 |
| aiohttp-gunicorn-uvloop | 1,700.00 | 242.12 | 2,200.00 | 11 | 220 |
| hug | 1,200.00 | 2,000.00 | 14,100.00 | 8 | 156 |
| meinheld | 1,800.00 | 317.47 | 1,900.00 | 10 | 208 |
| muffin-uvicorn | 1,700.00 | 290.47 | 1,900.00 | 11 | 220 |
| netius | 1,700.00 | 156.77 | 1,900.00 | 11 | 222 |
| pycnic-gunicorn | 1,600.00 | 278.18 | 1,800.00 | 11 | 230 |
| tornado | 1,600.00 | 276.69 | 1,800.00 | 11 | 230 |

For this test, perhaps unsurprisingly, most servers performed almost identically, since the majority of CPU time is spent in the artificial 90-100ms loop for each request. This leaves most servers hovering around the 10-11 requests per second mark.

The exception is when using pypy to run the server. Falcon running with bjoern and pypy ran around 36 times faster than most servers, and bjoern with pypy on its own ran around 94 times faster, topping 1,000 requests per second.

In the real world, your mileage with pypy may vary depending on your application, but I wanted to highlight that when a request is CPU-bound, most framework and server combinations will produce very similar results regardless of the framework (no magic here, I’m afraid).

Simulated IO Bound

This test returns a “Hello World” string with a content type “text/plain” where possible but also sleeps the thread for 100ms before returning (note that the sleep method is not guaranteed to wake the thread up after exactly 100ms).
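In a synchronous WSGI handler this simulated wait looks like the sketch below (not the benchmark’s actual code): the worker thread is blocked for the full 100ms, so a single worker tops out around 10 requests per second.

```python
import time

def app(environ, start_response):
    # Simulate waiting ~100ms on an external resource. In a WSGI app this
    # blocks the worker thread for the whole duration, which is why the
    # synchronous servers collapse to ~10 requests/second in this test.
    time.sleep(0.1)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]
```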

Each test is run using 100% of a vCPU and 512MB memory allocated to the test container. Tests are run sequentially and each test is run for 20 seconds.

| Server | Avg ms | Stdev ms | Max ms | Req/Sec | Total Requests |
| --- | --- | --- | --- | --- | --- |
| falcon | 1,100.00 | 1,800.00 | 13,900.00 | 7 | 135 |
| falcon-gunicorn | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 196 |
| falcon-uwsgi | 7,600.00 | 3,300.00 | 10,300.00 | 10 | 197 |
| falcon-bjoern | 4,200.00 | 2,300.00 | 19,000.00 | 9 | 183 |
| falcon-bjoern-nuitka | 4,300.00 | 2,400.00 | 19,200.00 | 9 | 185 |
| falcon-bjoern-pypy | 4,300.00 | 2,400.00 | 19,300.00 | 9 | 185 |
| falcon-bjoern-fork | 222.41 | 255.88 | 2,600.00 | 1,791 | 35,992 |
| falcon-uvicorn | 114.16 | 5.32 | 134.19 | 2,603 | 52,267 |
| falcon-uvicorn-uvloop | 113.81 | 10.04 | 269.77 | 2,611 | 52,463 |
| cherrypy | 2,100.00 | 1,700.00 | 16,200.00 | 95 | 1,917 |
| cherrypy-uwsgi | 7,700.00 | 3,400.00 | 10,500.00 | 10 | 194 |
| cherrypy-tornado | 18,500.00 | 0.84 | 18,500.00 | 1 | 30 |
| cherrypy-twisted | 1,400.00 | 292.01 | 2,000.00 | 192 | 3,848 |
| flask | 129.03 | 18.06 | 363.61 | 1,448 | 29,030 |
| flask-fastwsgi | 8,300.00 | 6,200.00 | 19,800.00 | 10 | 196 |
| flask-gunicorn-eventlet | 101.94 | 2.23 | 143.24 | 2,919 | 58,587 |
| flask-gunicorn-gevent | 101.68 | 1.90 | 136.13 | 2,921 | 58,674 |
| flask-gunicorn-gthread | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 195 |
| flask-gunicorn-meinheld | 5,400.00 | 4,800.00 | 17,900.00 | 10 | 197 |
| flask-gunicorn-tornado | N/A | N/A | N/A | N/A | N/A |
| flask-gunicorn | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 195 |
| flask-meinheld | 10,000.00 | 5,700.00 | 19,900.00 | 10 | 197 |
| flask-bjoern | 4,000.00 | 2,000.00 | 18,200.00 | 9 | 178 |
| flask-bjoern-nuitka | 4,300.00 | 2,500.00 | 19,400.00 | 9 | 185 |
| fastwsgi | 9,200.00 | 6,300.00 | 19,800.00 | 10 | 198 |
| bottle | 815.78 | 1,100.00 | 14,900.00 | 10 | 196 |
| bjoern | 4,200.00 | 2,200.00 | 18,500.00 | 9 | 184 |
| bjoern-pypy | 3,800.00 | 2,300.00 | 19,700.00 | 9 | 177 |
| aiohttp | 127.75 | 8.99 | 151.92 | 2,316 | 46,508 |
| aiohttp-uvloop | 122.31 | 9.85 | 255.23 | 2,434 | 48,920 |
| aiohttp-gunicorn | 120.12 | 8.67 | 141.70 | 2,476 | 49,752 |
| aiohttp-gunicorn-uvloop | 111.93 | 6.03 | 135.04 | 2,656 | 53,320 |
| hug | 932.48 | 1,400.00 | 14,800.00 | 10 | 194 |
| meinheld | 9,900.00 | 5,700.00 | 19,800.00 | 10 | 198 |
| muffin-uvicorn | 112.23 | 5.09 | 140.91 | 2,647 | 53,190 |
| netius | 5,100.00 | 2,000.00 | 8,100.00 | 4 | 81 |
| pycnic-gunicorn | 9,900.00 | 5,700.00 | 19,900.00 | 10 | 197 |
| tornado | 9,700.00 | 5,900.00 | 19,900.00 | 10 | 197 |

This test is designed to simulate a server request that does not do any local processing on the server itself but makes requests to other servers or resources instead as a middleman. In real-world scenarios, this intermediate server will usually run some lightweight logic to do things such as figure out where to fetch data from, aggregate data once fetched, or any other “stitching” logic required.

This test category is where ASGI frameworks shine, since requests can be handled independently of one another, whereas WSGI applications will typically block the request thread while waiting for “external” resources to respond (in this test, the main thread is blocked for the entire 100ms of the mock resource fetch).
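The effect is easy to demonstrate outside of a web server. In the sketch below, 50 concurrent 100ms waits overlap under asyncio and finish in far less than the 5 seconds a single blocking worker would need; this is the same mechanism that lets an ASGI server keep accepting new requests while earlier ones are still waiting on IO:

```python
import asyncio
import time

async def fetch(i):
    # Stand-in for one ~100ms "external" call (e.g. a backend request).
    await asyncio.sleep(0.1)
    return i

async def main(n=50):
    # All n waits run concurrently, so total wall time stays well under
    # the n * 0.1s a blocking loop would take.
    return await asyncio.gather(*(fetch(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
```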

The winner in this test was flask with gunicorn and either gevent or eventlet, at 2.9k requests per second. falcon with uvicorn (with or without uvloop), handling 2.6k requests per second, is a close runner-up. aiohttp, whether standalone with uvloop or under gunicorn (with and without uvloop), handled 2.3-2.6k requests per second, and muffin with uvicorn handled 2.6k.

Using bjoern’s forking mode spawns more “workers” to handle requests simultaneously, but it does not appear to be as optimized as the ASGI servers.

The slowest servers were the WSGI servers, as expected, since they block for the full wait while handling each request.

Todo

  • Add more frameworks and servers, and combinations of frameworks and servers.
  • Publish source code
  • Explore more parameters to play with such as nuitka compilation options and pypy
  • Publish charts for visual comparison

Changelog

  • 11/10/2023 – Initial post.