Note: This article was last updated on July 3, 2023. The content may be out of date.

This series of posts details how to proxy YouTube videos. Unlike downloading videos with a tool, proxying streams them through our server on demand, so there is no need for a large storage system. This can be used to avoid ads and tracking, or for other media streaming purposes.

In this first part, we’ll learn how to extract YouTube video URLs and write a simple dynamic reverse proxy.

Extracting YouTube Video URLs

To proxy YouTube assets, we first need to know the URLs of the video assets. We can use yt-dlp to retrieve them:

yt-dlp ${YouTube url} --print urls

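This usually prints two URLs, one for the video stream and one for the audio stream, because yt-dlp’s default format selection picks the best video and the best audio as separate streams.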
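For a quick script, the simplest route is to run that same command from Python and split its output into lines. This is a minimal sketch assuming `yt-dlp` is on `PATH`; `parse_printed_urls` and `get_asset_urls` are illustrative names, not part of yt-dlp.

```python
import subprocess


def parse_printed_urls(stdout: str) -> list[str]:
    """Split the output of `yt-dlp <url> --print urls` into one URL per line."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


def get_asset_urls(video_url: str) -> list[str]:
    """Run yt-dlp (assumed to be on PATH) and return the asset URLs it prints."""
    result = subprocess.run(
        ["yt-dlp", video_url, "--print", "urls"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if yt-dlp exits with an error
    )
    return parse_printed_urls(result.stdout)
```

Calling `get_asset_urls(video_url)` returns a list with one entry per printed URL. Starting a new process per request is slow, which is one reason to embed yt-dlp as a long-running service instead.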

Embedding yt-dlp as a Service

We could run yt-dlp as a Python subprocess and parse its stdout, or we can embed yt-dlp in a Python service so that other programs can call it over a network connection. Below is an example yt-dlp server that uses asyncio and Unix sockets:

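import sys

# Highlight 1: the platform-independent yt-dlp release binary is a zip archive,
# so putting it on sys.path lets Python import yt_dlp from it via zipimport.
sys.path.append('/path/to/yt-dlp')

from yt_dlp import YoutubeDL
import json, asyncio, traceback


async def handle_ydl(reader, writer):
    try:
        # One request per connection: a single JSON line holding the video
        # url plus any YoutubeDL options.
        data = json.loads(await reader.readline())
        url = data.pop('url')

        with YoutubeDL(data) as yt:
            # Highlight 2: extract_info() blocks, so run it in a worker thread
            # to keep the event loop free to serve other connections.
            info = await asyncio.to_thread(yt.extract_info, url, download=False)
            reply = json.dumps(yt.sanitize_info(info))
        writer.write(reply.encode('utf-8'))
        await writer.drain()
    except Exception:
        # Report the error to the client instead of crashing the server.
        writer.write(traceback.format_exc().encode('utf-8'))
        await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()


async def main():
    server = await asyncio.start_unix_server(
        handle_ydl, '/path/to/yt-dlp.socket')

    async with server:
        await server.serve_forever()


asyncio.run(main())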

It is mostly adapted from the asyncio streams examples in the official Python documentation and the embedding example in the yt-dlp README, with exception handling added and a Unix socket used instead of TCP. Two parts are worth highlighting:

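  1. The `sys.path.append('/path/to/yt-dlp')` line shows how to use yt-dlp as a zipimport library: the platform-independent yt-dlp release binary is a zip archive that Python can import from.
  2. `await asyncio.to_thread(yt.extract_info, url, download=False)` shows how to turn a synchronous call into an asynchronous one by running it in a worker thread.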
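To see why the second point matters, here is a self-contained sketch in which `time.sleep` stands in for a blocking `extract_info` call (`blocking_extract` and `extract_all` are hypothetical names). Because each call runs in the default thread pool via `asyncio.to_thread`, three half-second calls finish in roughly half a second rather than one and a half.

```python
import asyncio
import time


def blocking_extract(url: str) -> dict:
    """Hypothetical stand-in for YoutubeDL.extract_info: blocks its thread."""
    time.sleep(0.5)
    return {"webpage_url": url}


async def extract_all(urls: list[str]) -> list[dict]:
    # Each blocking call runs in a worker thread, so the event loop can
    # start all of them at once and wait for them concurrently.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_extract, url) for url in urls)
    )


start = time.perf_counter()
results = asyncio.run(extract_all(["a", "b", "c"]))
elapsed = time.perf_counter() - start
print(results, f"took {elapsed:.2f}s")
```

Calling `blocking_extract` directly inside a coroutine would instead stall the whole event loop, so the server could handle only one extraction at a time.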

Calling yt-dlp

Here is an example client, also written in Python, that calls the yt-dlp service:

import sys
import asyncio
import json


async def client(path, args):
    reader, writer = await asyncio.open_unix_connection(path)

    # Send the request as a single JSON line; the server reads one line.
    data = json.dumps(args)
    writer.write(data.encode('utf-8') + b'\n')
    await writer.drain()

    # The server closes the connection after replying, so read until EOF.
    reply = await reader.read()
    print(reply.decode('utf-8'))
    writer.close()
    await writer.wait_closed()


if __name__ == '__main__':
    # Command line: <video url> [<option> <value>]...
    args = {'url': sys.argv[1]}
    keys = sys.argv[2::2]
    values = sys.argv[3::2]
    args.update(zip(keys, values))

    path = '/path/to/yt-dlp.socket'
    asyncio.run(client(path, args))

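The `__main__` block builds the arguments as a Python dictionary: the first command-line argument is the video URL, and the remaining ones are read as option/value pairs. The URL is the only required argument. The available options are documented in the docstring of the `YoutubeDL` class in yt-dlp’s `YoutubeDL.py`.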
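One catch with this client: every option value arrives as a string, and a non-empty string such as `"false"` is truthy in Python, so passing `quiet false` would most likely leave `quiet` enabled. A small helper (hypothetical, not part of the original client) can decode each value as JSON where possible and keep it as a plain string otherwise:

```python
import json


def parse_option_pairs(pairs: list[str]) -> dict:
    """Turn ['name1', 'value1', 'name2', 'value2', ...] into an options dict.

    Values are decoded as JSON when possible, so 'false' becomes False,
    '3' becomes 3 and '["en"]' becomes a list; anything that is not valid
    JSON (e.g. 'bestaudio') is kept as a plain string.
    """
    options = {}
    for key, raw in zip(pairs[0::2], pairs[1::2]):
        try:
            options[key] = json.loads(raw)
        except json.JSONDecodeError:
            options[key] = raw
    return options
```

In the client, `args.update(parse_option_pairs(sys.argv[2:]))` could then replace the `keys`/`values` lines.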

On success, the reply is the info dictionary serialized as JSON, which we can parse to get the video URLs we want; on failure, the server sends a Python traceback instead.
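A sketch of that parsing step, assuming the reply is the `sanitize_info` output produced by the server above. It mirrors how yt-dlp appears to assemble the `urls` field used by `--print urls`: one URL per entry in `requested_formats` when video and audio are separate streams, otherwise the single top-level `url`. The function name `asset_urls` is illustrative.

```python
import json


def asset_urls(reply: str) -> list[str]:
    """Return the asset URLs from the yt-dlp service's JSON reply.

    Raises ValueError (json.JSONDecodeError) if the reply is not JSON,
    e.g. when the server sent back a traceback.
    """
    info = json.loads(reply)
    # Separate video/audio streams are listed in requested_formats;
    # a single combined format has its URL at the top level.
    formats = info.get("requested_formats") or [info]
    return [fmt["url"] for fmt in formats if fmt.get("url")]
```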

Dynamic Reverse Proxy

Now that we know how to extract the asset URLs, we need to store them. Because these URLs expire after 6 hours, we store them in Redis: create a random string as the key, save the corresponding URL as its value, and let the key expire along with the URL.
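A minimal sketch of the saving step, assuming, as is typical for googlevideo URLs, that each asset URL carries an `expire` Unix timestamp in its query string; when it does not, the sketch falls back to the 6 hours mentioned above. `ttl_for` and `save_asset_url` are hypothetical helpers, and `client` is anything offering redis-py’s `set(name, value, ex=seconds)`, such as a `redis.Redis()` instance. One detail to keep consistent: `request.URL.Path` in the Go proxy starts with `/`, so the stored key and the key the proxy looks up must agree on whether that slash is included.

```python
import secrets
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse

DEFAULT_TTL = 6 * 60 * 60  # fallback: the 6-hour lifetime of asset URLs


def ttl_for(asset_url: str, now: Optional[float] = None) -> int:
    """Seconds until the URL's `expire` timestamp, or DEFAULT_TTL if absent."""
    now = time.time() if now is None else now
    expire = parse_qs(urlparse(asset_url).query).get("expire")
    if expire and expire[0].isdigit():
        # At least 1 second: Redis rejects a zero or negative expiry.
        return max(1, int(expire[0]) - int(now))
    return DEFAULT_TTL


def save_asset_url(client, asset_url: str) -> str:
    """Store asset_url under a new random key and return the key.

    `client` is assumed to offer redis-py's set(name, value, ex=seconds).
    """
    key = secrets.token_urlsafe(16)  # URL-safe, so it can sit in a path
    client.set(key, asset_url, ex=ttl_for(asset_url))
    return key
```

Tying the TTL to the URL’s own expiry means a key disappears from Redis at about the time its URL stops working, so the proxy never forwards to a dead URL.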

Next we implement a dynamic reverse proxy that uses this random key to look up the asset URL and forwards the request to it:

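package main

import (
	"io"
	"net/http"
	"strings"

	"github.com/redis/go-redis/v9"
)

var rdb = redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // placeholder address

// copyHeader adds every value from src to dst (assumed helper; not shown originally).
func copyHeader(dst, src http.Header) {
	for k, vv := range src {
		for _, v := range vv {
			dst.Add(k, v)
		}
	}
}

func proxy(writer http.ResponseWriter, request *http.Request) {
	key := strings.TrimPrefix(request.URL.Path, "/") // random key from the path
	val, err := rdb.Get(request.Context(), key).Result()
	if err == redis.Nil { // unknown or expired key
		http.NotFound(writer, request)
		return
	} else if err != nil {
		http.Error(writer, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
		return
	}

	req, err := http.NewRequestWithContext(request.Context(), request.Method, val, request.Body)
	if err != nil {
		http.Error(writer, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
		return
	}
	copyHeader(req.Header, request.Header) // forwards Range, so seeking works
	resp, err := http.DefaultTransport.RoundTrip(req) // a RoundTripper has RoundTrip, not Do
	if err != nil {
		http.Error(writer, http.StatusText(http.StatusBadGateway), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	copyHeader(writer.Header(), resp.Header)
	writer.WriteHeader(resp.StatusCode)
	_, _ = io.Copy(writer, resp.Body) // stream the asset to the client
}

func main() {
	http.HandleFunc("/", proxy)
	_ = http.ListenAndServe(":8080", nil) // placeholder listen address
}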

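This proxy assumes the random key is carried in the request path, so a client requests `/<key>` from the proxy and receives the asset stored under that key. Because the client’s request headers, including `Range`, are forwarded upstream and the upstream status code and headers are copied back, seeking should also work through the proxy.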

To be continued

This post only shows how to embed yt-dlp as a service and implement a dynamic reverse proxy. The next part will deal with the video streaming format YouTube uses.