note
This article was last updated on October 25, 2023, 1 year ago. The content may be out of date.
Caddy has supported HTTP/3 for a long time. It started as an experimental feature and has been on by default since v2.6.0. There were some bugs, but I believe most of them are gone with the latest commits.
Problems
Socket Reuse
To improve efficiency, Caddy tries to reuse sockets across configuration reloads. On Windows, Caddy has to wrap the underlying socket and set deadlines to terminate old uses of the socket. On Unix systems, Caddy applies the SO_REUSEPORT socket option when creating the socket. Caddy also wraps these sockets to count how many times each one is in use.
For a long time, Caddy had a buggy reuse implementation for packet-type sockets: it never terminated old uses of the socket. The problem was discovered in caddy-l4.
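To make "terminating an old use" concrete, here is a minimal sketch of the deadline technique described for Windows, combined with a use counter. It is an assumption-laden illustration rather than Caddy's implementation: the sharedUse type, newUse and closeOldUse are hypothetical names, and the sketch ignores corner cases such as a deadline that fires while a different use is reading.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"sync/atomic"
	"time"
)

// sharedUse is one "use" of a socket that outlives configuration reloads.
// (Hypothetical type for illustration.)
type sharedUse struct {
	net.PacketConn               // the real, shared socket
	uses           *atomic.Int32 // number of live uses of that socket
	closed         atomic.Bool
}

func newUse(pc net.PacketConn, uses *atomic.Int32) *sharedUse {
	uses.Add(1)
	return &sharedUse{PacketConn: pc, uses: uses}
}

// ReadFrom reports net.ErrClosed once this use is closed, even though the
// real socket may still be open for other uses.
func (u *sharedUse) ReadFrom(p []byte) (int, net.Addr, error) {
	if u.closed.Load() {
		return 0, nil, net.ErrClosed
	}
	n, addr, err := u.PacketConn.ReadFrom(p)
	if err != nil && u.closed.Load() && errors.Is(err, os.ErrDeadlineExceeded) {
		// The deadline was only a wake-up call; clear it for later readers.
		u.PacketConn.SetReadDeadline(time.Time{})
		return 0, nil, net.ErrClosed
	}
	return n, addr, err
}

// Close ends this use. The real socket is closed only with the last use;
// otherwise a deadline in the past unblocks any pending ReadFrom.
func (u *sharedUse) Close() error {
	if !u.closed.CompareAndSwap(false, true) {
		return net.ErrClosed
	}
	if u.uses.Add(-1) == 0 {
		return u.PacketConn.Close()
	}
	return u.PacketConn.SetReadDeadline(time.Now())
}

// closeOldUse blocks a reader on an old use, closes that use, and reports
// whether the reader saw net.ErrClosed and how many uses remain.
func closeOldUse() (bool, int32, error) {
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return false, 0, err
	}
	var uses atomic.Int32
	oldUse := newUse(pc, &uses)
	newerUse := newUse(pc, &uses)
	defer newerUse.Close()

	readErr := make(chan error, 1)
	go func() {
		_, _, err := oldUse.ReadFrom(make([]byte, 1500))
		readErr <- err
	}()
	time.Sleep(50 * time.Millisecond) // give the reader time to block
	if err := oldUse.Close(); err != nil {
		return false, 0, err
	}
	return errors.Is(<-readErr, net.ErrClosed), uses.Load(), nil
}

func main() {
	terminated, remaining, err := closeOldUse()
	fmt.Println(terminated, remaining, err)
}
```

The bug described above corresponds to a wrapper whose Close never wakes the pending reader, so the old use keeps consuming packets after a reload.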
Performance Optimization
quic-go has a number of optimizations for the net.PacketConn it uses to create the http3.QUICEarlyListener. It supports *net.UDPConn and *net.UnixConn. Since Caddy wraps them to reuse the sockets or for statistics purposes, we have to implement some of the methods required to gain these optimizations. This became unwieldy after the introduction of GSO (generic segmentation offload). Not only does the packet connection need to implement the SetReadBuffer, SetWriteBuffer, and SyscallConn methods, it also needs to implement the net.Conn interface, because quic-go lazily type-asserts it to one even though none of the related methods are required. It is very messy.
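The following sketch shows why wrapping gets in the way. A wrapper that merely embeds net.PacketConn loses every method that is not part of that interface, so a type assertion for the extra methods fails unless each one is forwarded by hand. The tunable interface and both wrapper types are invented for this example; quic-go's real checks are more involved and are not reproduced here.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// tunable groups the methods named in the text. The interface itself is
// hypothetical and only serves the type assertion below.
type tunable interface {
	SetReadBuffer(bytes int) error
	SetWriteBuffer(bytes int) error
	SyscallConn() (syscall.RawConn, error)
}

// plainWrapper embeds the interface, so only net.PacketConn methods survive.
type plainWrapper struct{ net.PacketConn }

// forwardingWrapper re-exposes the extra methods one by one.
type forwardingWrapper struct {
	net.PacketConn
	udp *net.UDPConn
}

func (w forwardingWrapper) SetReadBuffer(n int) error  { return w.udp.SetReadBuffer(n) }
func (w forwardingWrapper) SetWriteBuffer(n int) error { return w.udp.SetWriteBuffer(n) }
func (w forwardingWrapper) SyscallConn() (syscall.RawConn, error) {
	return w.udp.SyscallConn()
}

// isTunable mimics a library probing a connection for optional methods.
func isTunable(pc net.PacketConn) bool {
	_, ok := pc.(tunable)
	return ok
}

// checkWrappers reports whether the raw socket, the plain wrapper and the
// forwarding wrapper pass the probe, in that order.
func checkWrappers() (bool, bool, bool, error) {
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return false, false, false, err
	}
	defer pc.Close()

	udp := pc.(*net.UDPConn) // ListenPacket("udp", ...) returns a *net.UDPConn
	raw := isTunable(pc)
	plain := isTunable(plainWrapper{pc})
	forwarding := isTunable(forwardingWrapper{pc, udp})
	return raw, plain, forwarding, nil
}

func main() {
	raw, plain, forwarding, err := checkWrappers()
	fmt.Println(raw, plain, forwarding, err)
}
```

Every new optimization in the library can add another method to forward, which is how the wrapper grows unwieldy.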
ListenQUIC
For some reason, Caddy needs a net.PacketConn when creating the http3.QUICEarlyListener. The problem manifests once the socket reuse bug is fixed: all packet connections now terminate correctly when the configuration is reloaded. Because the http3.QUICEarlyListener still refers to the already “closed” net.PacketConn, it stops working.
Even if we managed to replace the underlying socket used by the http3.QUICEarlyListener, socket optimizations would not be redone on the new socket unless the platform is Windows. I tested this by specifying SO_REUSEPORT and then changing the read and write buffer sizes on a Linux system.
info
When closed, http3.QUICEarlyListener just stops reading from and writing to the socket; the socket itself is left as is.
Solution
Instead of keeping net.PacketConn separate from http3.QUICEarlyListener, the two should be managed together, because http3.QUICEarlyListener depends on net.PacketConn to serve HTTP/3. Since quic-go won’t close the underlying socket, we’ll need to do it ourselves.
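One way to picture "belonging together" is a small type that owns both the listener and its socket and closes them as a unit. The sketch below is only that, a picture: boundListener and fakeListener are hypothetical, and an io.Closer stands in for the real QUIC listener so the example stays within the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
)

// fakeListener stands in for the QUIC listener, which (per the text) stops
// using the socket on Close but does not close it. Hypothetical type.
type fakeListener struct{ closed bool }

func (f *fakeListener) Close() error {
	f.closed = true
	return nil
}

// boundListener keeps a listener and the socket it serves on together.
type boundListener struct {
	listener io.Closer
	conn     net.PacketConn
}

// Close shuts the listener down first, then closes the socket it was using.
func (b *boundListener) Close() error {
	listenerErr := b.listener.Close()
	connErr := b.conn.Close()
	if listenerErr != nil {
		return listenerErr
	}
	return connErr
}

// closeBoth reports whether the listener and the socket were both closed.
func closeBoth() (bool, bool, error) {
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	fl := &fakeListener{}
	b := &boundListener{listener: fl, conn: pc}
	if err := b.Close(); err != nil {
		return false, false, err
	}
	// Reading from a closed socket fails with net.ErrClosed.
	_, _, readErr := pc.ReadFrom(make([]byte, 1))
	return fl.closed, errors.Is(readErr, net.ErrClosed), nil
}

func main() {
	listenerClosed, socketClosed, err := closeBoth()
	fmt.Println(listenerClosed, socketClosed, err)
}
```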
info
quic-go has a very good reason for not closing the net.PacketConn when shutting down the server. The standard library HTTP server uses TCP. A TCP listening socket creates a new socket whenever Accept succeeds. Closing these sockets won’t affect the listening socket and vice versa.
quic-go uses datagram sockets. This type of socket has no notion of Accept, and all message exchanges go through the one socket. quic-go is responsible for maintaining the state machine and distributing messages to the relevant handlers.
There is another problem. Caddy wraps the net.PacketConn, and quic-go can’t optimize the wrapper type directly. Drawing inspiration from the standard library’s handling of http.ResponseWriter interface discovery, we can implement an Unwrap method on the wrapped socket that returns the underlying socket, which can then be optimized.
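A minimal sketch of the Unwrap idea follows. The standard library's http.ResponseController looks for an Unwrap() http.ResponseWriter method on wrapped response writers; the same pattern is applied here to net.PacketConn. The countingConn type, the unwrap helper and the exact method signature are assumptions made for this example, not Caddy's or quic-go's actual API.

```go
package main

import (
	"fmt"
	"net"
)

// countingConn stands in for a wrapper around the real socket.
// (Hypothetical type; the Unwrap signature is an assumption.)
type countingConn struct{ net.PacketConn }

// Unwrap exposes the connection one level down.
func (c countingConn) Unwrap() net.PacketConn { return c.PacketConn }

// unwrap peels off wrappers until a connection no longer offers Unwrap.
func unwrap(pc net.PacketConn) net.PacketConn {
	for {
		u, ok := pc.(interface{ Unwrap() net.PacketConn })
		if !ok {
			return pc
		}
		pc = u.Unwrap()
	}
}

// tuneWrapped wraps a UDP socket twice, unwraps it, and tunes the real
// socket. It reports whether the concrete *net.UDPConn was recovered.
func tuneWrapped() (bool, error) {
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer pc.Close()

	wrapped := countingConn{countingConn{pc}}
	udp, ok := unwrap(wrapped).(*net.UDPConn)
	if !ok {
		return false, nil
	}
	// With the concrete type in hand, socket-level tuning is possible again.
	return true, udp.SetReadBuffer(1 << 20)
}

func main() {
	recovered, err := tuneWrapped()
	fmt.Println(recovered, err)
}
```

Compared with forwarding each method by hand, the wrapper only has to expose one method, no matter how many optimizations the library adds later.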
info
Caddy’s network listening functions allow the easy reuse of the underlying socket. They can also be configured to change the unix socket permissions.
When creating a new http3.QUICEarlyListener, Caddy goes through the following steps:
info
http3.QUICEarlyListener is an interface that can be used to serve HTTP/3; quic-go.EarlyListener is a concrete type provided by quic-go. Caddy wraps this type to reuse it.
We have to introduce a new method, though. Since ListenQUIC is marked as experimental, there is no guarantee that it will keep working across Caddy versions, so deprecating it is acceptable.
note
When Caddy’s HTTP/3 support was fixed in c9b5e7f, the fix didn’t close the net.PacketConn when the QUIC listener was destroyed. It didn’t expose the underlying socket for performance optimization either.