How to Follow Redirects in Python: A Deep Dive

Let’s dive into something super important in web stuff – redirects. They’re like signposts on the internet highway, guiding users to new destinations. Whether it’s for logging in, fixing expired links, or just making sure folks have a smooth online ride, redirects are key.

Now, if you’re into web scraping, APIs, or sending and receiving data through the web, you’ve gotta know how to handle redirects like a pro. So, in this blog post, we’re going to explore how to follow redirects using Python. We’ll chat about different libraries and techniques to make sure you’re a redirect master.

The Requests Library

The requests library is a popular choice for handling HTTP requests in Python. It provides a simple and intuitive way to follow redirects automatically. By default, requests follows redirects for every method except HEAD, up to a limit of 30 hops; going past that limit raises requests.exceptions.TooManyRedirects. Here’s an example of how to use requests to follow redirects:

import requests

response = requests.get('https://deploymastery.com')
print(response.url)  # Final URL after following redirects
print(response.status_code)  # Status code of the final response
print(response.history)  # List of intermediate responses (if any)

The response.url attribute holds the final URL after following all the redirects, while response.history contains a list of intermediate responses encountered during the redirection process.
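To see the whole journey rather than just the destination, you can walk response.history and tack the final response on the end. Here is a minimal sketch of that idea. The helper name redirect_chain is made up for this post (it is not part of requests), and it only relies on the status_code, url, and history attributes described above:

```python
def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, final response last.

    `response` is expected to look like a requests Response: it needs
    `status_code`, `url`, and `history` (a list of earlier responses).
    """
    # history is ordered from the first redirect to the most recent one.
    hops = [(r.status_code, r.url) for r in response.history]
    # The response we were handed is the end of the chain.
    hops.append((response.status_code, response.url))
    return hops
```

Calling redirect_chain(requests.get('https://deploymastery.com')) would give you one tuple per hop, which is handy for logging or for spotting an unexpected detour.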

Disabling Redirects in the Requests Library

In the requests library, you can disable automatic redirect following by setting the allow_redirects parameter to False in your request. Here’s an example:

import requests

response = requests.get('https://deploymastery.com', allow_redirects=False)
status_code = response.status_code  # Status code of the initial response
headers = response.headers  # Headers of the initial response

By setting allow_redirects to False, the response object will hold the initial response without following any redirects. You can access the status code, headers, and content of the initial response as needed.
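With automatic following switched off, working out the next hop is up to you. The target lives in the Location header, and it may be a relative path, so it has to be resolved against the URL you just requested. The sketch below shows one way to do that with the standard library. The names next_hop and REDIRECT_CODES are invented for illustration, and the set of codes is an assumption about which statuses you want to treat as redirects:

```python
from urllib.parse import urljoin

# Status codes that normally carry a Location header (an assumption for this sketch).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_hop(current_url, status_code, headers):
    """Return the absolute URL a response redirects to, or None if it does not."""
    if status_code not in REDIRECT_CODES:
        return None
    location = headers.get('Location')
    if not location:
        return None
    # Location may be relative ("/login"), so resolve it against the current URL.
    return urljoin(current_url, location)
```

With a requests response from the example above, you could call next_hop(response.url, response.status_code, response.headers). Requests also exposes response.is_redirect and response.next for the same purpose, so treat this helper as a way to see what happens under the hood.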

The urllib Library

The urllib library is part of Python’s standard library and provides several modules for working with URLs. The urllib.request module allows us to handle HTTP requests and follow redirects. Here’s an example of using urllib.request:

import urllib.request

response = urllib.request.urlopen('https://deploymastery.com')
final_url = response.geturl()  # Final URL after following redirects
status_code = response.getcode()  # Status code of the final response
print(final_url)
print(status_code)

The geturl() method returns the final URL after all the redirects, and getcode() retrieves the status code of the final response.
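Unlike requests, the urllib response does not carry a history of the hops it went through. If you want one, a possible approach is to subclass urllib.request.HTTPRedirectHandler and take notes each time it is asked to follow a redirect. The class and function names below (RecordingRedirectHandler, open_with_hops) are hypothetical, and this is a sketch rather than the only way to do it:

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember every hop along the way."""

    def __init__(self):
        self.hops = []  # (status code, target URL) for each redirect seen

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((code, newurl))
        # Defer to the stock implementation so the normal redirect rules still apply.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def open_with_hops(url, timeout=10):
    """Return (final_url, final_status, hops) for a GET request to `url`."""
    recorder = RecordingRedirectHandler()
    opener = urllib.request.build_opener(recorder)
    with opener.open(url, timeout=timeout) as response:
        return response.geturl(), response.getcode(), recorder.hops
```

Calling open_with_hops('https://deploymastery.com') would return the same final URL and status code as the example above, plus a list of the redirects that led there.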

Disabling Redirects in the urllib Library

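In the urllib library, there is no simple on/off switch for redirects: neither urlopen nor Request accepts a redirect parameter. Instead, you can build an opener with a custom HTTPRedirectHandler whose redirect_request method returns None, which tells urllib not to follow the redirect. Here’s an example:

import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # Returning None tells urllib not to follow the redirect

try:
    response = urllib.request.build_opener(NoRedirect).open('https://deploymastery.com')
except urllib.error.HTTPError as err:
    response = err  # The unfollowed 3xx arrives as an HTTPError, which also acts as a response
status_code = response.getcode()  # Status code of the initial response
headers = response.info()  # Headers of the initial response

Because the redirect is not followed, urllib raises an HTTPError for the 3xx response. That exception doubles as a response object, so after catching it you can access the status code, headers, and other information of the initial response as required. If the URL does not redirect at all, open simply returns the normal response.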

The httplib2 Library

The httplib2 library is another excellent option for handling redirects in Python. It offers advanced features like HTTP caching and authentication. Here’s an example of following redirects using httplib2:

import httplib2

http = httplib2.Http()
response, content = http.request('https://deploymastery.com', method='GET')
final_url = response.get('content-location')  # Final URL after following redirects (None if not set)
status_code = response.status  # Status code of the final response
print(final_url)
print(status_code)

In this example, we create an Http instance and use the request method to send an HTTP GET request. The response object behaves like a dictionary keyed by lowercase header names. For successful GET requests, httplib2 stores the final URL under 'content-location' when the server does not send that header itself, so we read it with get() to avoid a KeyError on other responses. The status attribute holds the status code of the final response.
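httplib2 also keeps track of the hops it followed: each response has a previous attribute that points at the response before it, or None for the first one. The sketch below walks that linked list back to the start. The helper name httplib2_chain is made up for this post, and it assumes each response in the chain carries a 'content-location' entry, falling back to None when it does not:

```python
def httplib2_chain(response):
    """Return [(status, url), ...] from the first hop to the final response.

    `response` is expected to look like an httplib2 Response: a dict of
    headers with a `status` attribute and an optional `previous` link.
    """
    chain = []
    while response is not None:
        chain.append((response.status, response.get('content-location')))
        # Step back to the response that redirected us here, if any.
        response = getattr(response, 'previous', None)
    chain.reverse()  # We collected newest-first; flip to oldest-first.
    return chain
```

Passing the response from the example above to httplib2_chain would give you the redirect trail in the order it happened.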

Disabling Redirects in the httplib2 Library

In the httplib2 library, redirects are followed automatically by default. To turn this off, set the follow_redirects attribute to False on the Http instance after creating it; it is an attribute, not a constructor argument. Here’s an example:

import httplib2

http = httplib2.Http()
http.follow_redirects = False  # Set on the instance, not passed to Http()
response, content = http.request('https://deploymastery.com', method='GET')
status_code = response.status  # Status code of the initial response
headers = dict(response)  # Headers of the initial response (the Response is itself a dict)

With follow_redirects set to False on the Http instance, subsequent requests made through it will not follow redirects. You can then read the status code from response.status, look up headers by indexing the response like a dictionary (for example, response.get('location')), and find the body in content.

The Treq Library

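The treq library offers a requests-like API, but it is built on top of Twisted rather than on requests itself, so every call is asynchronous and returns a Deferred. It follows redirects by default, and the allow_redirects parameter controls that behavior. Here’s an example of using treq to follow redirects:

import treq
from twisted.internet import defer, task

@defer.inlineCallbacks
def main(reactor):
    response = yield treq.get('https://deploymastery.com', allow_redirects=True)
    final_url = response.request.absoluteURI.decode()  # Final URL after following redirects
    status_code = response.code  # Status code of the final response
    print(final_url)
    print(status_code)

task.react(main)

The request runs inside Twisted’s reactor, which task.react starts and stops for us. With allow_redirects set to True (the default), treq follows redirects. The final URL is available as response.request.absoluteURI, which is a bytes value, and the status code is in the code attribute of the response object.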

Disabling Redirects in the Treq Library

In the treq library, which mirrors the requests API on top of Twisted, you can prevent automatic redirect following by setting the allow_redirects parameter to False in the request. Here’s an example:

import treq
from twisted.internet import defer, task

@defer.inlineCallbacks
def main(reactor):
    response = yield treq.get('https://deploymastery.com', allow_redirects=False)
    status_code = response.code  # Status code of the initial response
    headers = response.headers  # Headers of the initial response (a Twisted Headers object)
    content = yield response.content()  # Body of the initial response, as bytes

task.react(main)

By setting allow_redirects to False in the get function, treq will not automatically follow redirects. The response object will contain the initial response, and you can access the status code through response.code, the headers through response.headers, and the body by waiting on response.content().