note
This article was last updated on July 3, 2023, 2 years ago. The content may be out of date.
This series of posts details how to proxy YouTube videos. Unlike tools that download YouTube videos, a proxy streams them through, so there is no need for a large storage system. This can be used to avoid ads and tracking, and for other media streaming purposes.
First we’ll learn how to extract YouTube video URLs and write a simple dynamic reverse proxy.
Extracting YouTube Video URLs
To proxy YouTube assets, we first need to know the URLs of the video assets. We can use yt-dlp to retrieve them:
yt-dlp ${YouTube url} --print urls
which will usually print two URLs, one for video and one for audio.
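One way to consume this output from Python is to run the command as a subprocess and split its stdout into lines. The sketch below assumes a `yt-dlp` executable is on `PATH`; the function names `parse_urls` and `extract_urls` are made up for illustration.

```python
import subprocess


def parse_urls(stdout: str) -> list[str]:
    """Return the non-empty lines of yt-dlp's output, one URL per line."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


def extract_urls(video_url: str) -> list[str]:
    """Run `yt-dlp <url> --print urls` and return the printed URLs."""
    result = subprocess.run(
        ["yt-dlp", video_url, "--print", "urls"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if yt-dlp exits non-zero
    )
    return parse_urls(result.stdout)
```

Spawning a process per request is simple but pays the interpreter start-up cost every time, which is one reason to consider a long-running service instead.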
Embedding yt-dlp as a Service
We can create a Python subprocess and parse its stdout, or we can embed yt-dlp in a Python service so that we can call it over a network connection. Below is an example yt-dlp server that uses asyncio and Unix sockets:
[Code listing not preserved: yt-dlp server using asyncio and a Unix socket]
It’s mostly adapted from the official Python documentation and the yt-dlp embedding example, with added exception handling and a Unix socket instead of TCP. A few highlights are worth mentioning:
tip
- The first highlight shows how to use yt-dlp as a zipimport library.
- The second highlight gives an example of turning a synchronous call into an asynchronous one.
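As a rough illustration of such a server, here is a minimal sketch, not the original listing. It assumes the `yt_dlp` package is importable (for example after adding the yt-dlp zip release to `sys.path`), and it uses a made-up wire format: one JSON object per line as the request, one JSON line as the reply. The names `extract_info`, `process_request`, `handle`, and `serve` are hypothetical.

```python
import asyncio
import json
from functools import partial


def extract_info(args: dict) -> dict:
    """Blocking yt-dlp call; assumes the yt_dlp package is importable."""
    import yt_dlp  # imported lazily so the rest of the sketch loads without it

    args = dict(args)  # copy so the caller's dictionary is not modified
    url = args.pop("url")
    with yt_dlp.YoutubeDL(args) as ydl:
        info = ydl.extract_info(url, download=False)
        return ydl.sanitize_info(info)  # make the result JSON-serializable


async def process_request(line: bytes, extractor=extract_info) -> bytes:
    """Decode one JSON request line and return one JSON reply line."""
    try:
        args = json.loads(line)
        loop = asyncio.get_running_loop()
        # Run the synchronous extractor in the default thread pool so the
        # event loop stays free to serve other connections.
        result = await loop.run_in_executor(None, extractor, args)
        reply = {"ok": True, "result": result}
    except Exception as exc:  # report any failure to the client
        reply = {"ok": False, "error": str(exc)}
    return json.dumps(reply).encode() + b"\n"


async def handle(reader, writer, extractor=extract_info):
    """Serve a single request per connection, then close it."""
    line = await reader.readline()
    writer.write(await process_request(line, extractor))
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def serve(socket_path: str, extractor=extract_info):
    """Listen on a Unix socket until cancelled."""
    server = await asyncio.start_unix_server(
        partial(handle, extractor=extractor), socket_path
    )
    async with server:
        await server.serve_forever()
```

Running `asyncio.run(serve("/tmp/yt-dlp.sock"))` would start the service on a socket path chosen here only as an example. Passing the extractor as a parameter keeps the socket handling testable without network access.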
Calling yt-dlp
Here is another example, using Python to call the yt-dlp service:
[Code listing not preserved: Python client calling the yt-dlp service]
We build the arguments as a Python dictionary in the highlighted area. At minimum, the dictionary must contain the URL of the video. The list of available options is here.
The result is usually JSON, which we can parse to get the video URLs we want.
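A client along these lines might look like the sketch below. It assumes a hypothetical wire format in which the service accepts one JSON object per line and answers with `{"ok": ..., "result": ...}` before closing the connection. It also assumes the result is a yt-dlp info dictionary, where the selected formats are typically listed under `requested_formats` and single-format results carry a top-level `url`. The names `asset_urls` and `call_service` are illustrative.

```python
import asyncio
import json


def asset_urls(info: dict) -> list[str]:
    """Collect direct media URLs from a yt-dlp info dictionary."""
    # Fall back to the top-level entry when no separate formats are listed.
    formats = info.get("requested_formats") or [info]
    return [f["url"] for f in formats if "url" in f]


async def call_service(socket_path: str, args: dict) -> dict:
    """Send one JSON request over a Unix socket and return the result."""
    reader, writer = await asyncio.open_unix_connection(socket_path)
    writer.write(json.dumps(args).encode() + b"\n")
    await writer.drain()
    # Read until the server closes the connection; replies can be larger
    # than the default line limit of StreamReader.readline().
    data = await reader.read()
    writer.close()
    await writer.wait_closed()
    reply = json.loads(data)
    if not reply.get("ok"):
        raise RuntimeError(reply.get("error", "unknown error"))
    return reply["result"]
```

With a service listening on a socket, `asset_urls(asyncio.run(call_service(path, {"url": video_url})))` would yield the list of URLs to proxy.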
Dynamic Reverse Proxy
Now that we know how to extract video asset URLs, we need to save them. Because these asset URLs expire after 6 hours, we store them in Redis, where each entry can be given a matching expiry. Just create a random string as the key and save the corresponding URL as its value.
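The storage step could be sketched as follows. `MemoryStore` is a hypothetical in-memory stand-in that mimics the `set(key, value, ex=seconds)` and `get(key)` calls of a redis-py client, so the example runs without a Redis server. The key is prefixed with `/` on the assumption that it will later be looked up by request path.

```python
import secrets
import time

SIX_HOURS = 6 * 60 * 60  # asset URLs stop working after roughly this long


class MemoryStore:
    """In-memory stand-in for the small part of a Redis client used here."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ex=None):
        # ex is a time-to-live in seconds, as in redis-py's set().
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expires_at)
        return True

    def get(self, key):
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # drop the expired entry
            return None
        return value


def save_url(store, url: str, ttl: int = SIX_HOURS) -> str:
    """Store url under a fresh random key and return that key."""
    key = "/" + secrets.token_urlsafe(16)
    store.set(key, url, ex=ttl)
    return key
```

With a real server, `store` could be a `redis.Redis(decode_responses=True)` instance instead; without `decode_responses`, `get` returns bytes rather than a string.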
Next we implement a dynamic reverse proxy using this random string key to find its asset url.
// proxy looks up the upstream asset URL stored under the request path and
// relays the request to it. rdb (a Redis client) and copyHeader (a helper
// that copies header values) are not shown here.
func proxy(writer http.ResponseWriter, request *http.Request) {
	// The lookup key is the full request path, including the leading "/".
	val, err := rdb.Get(request.URL.Path).Result()
	if err != nil {
		http.Error(writer, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
		return
	}
	req, err := http.NewRequestWithContext(request.Context(), request.Method, val, request.Body)
	if err != nil {
		http.Error(writer, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
		return
	}
	copyHeader(req.Header, request.Header)
	// http.DefaultTransport is a RoundTripper, so it exposes RoundTrip, not Do.
	resp, err := http.DefaultTransport.RoundTrip(req)
	if err != nil {
		http.Error(writer, http.StatusText(http.StatusBadGateway), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	copyHeader(writer.Header(), resp.Header)
	writer.WriteHeader(resp.StatusCode)
	// Stream the upstream body to the client without buffering it all.
	_, _ = io.Copy(writer, resp.Body)
}
This proxy assumes the random key is used in the request path.
To be continued
This post only shows how to embed yt-dlp as a service and implement a dynamic reverse proxy. The next part will deal with the video streaming format YouTube uses.