Torrenting has seen better days. It may seem futile to explore this protocol in 2024. It is also highly intriguing, even in 2024. It’s intriguing because of its peer-to-peer nature; it’s intriguing because of its wild development history of competing clients spawning extensions, incompatibilities and innovations; it’s intriguing because of its once widespread adoption among technical and non-technical people alike; and, of course, it’s intriguing because of its decline and the ever-looming “why”.
Today, I want to look at a mysterious entry I found in a torrent file, and why it exists: the cross_seed_entry.
The torrent in question is for Rocky Linux. It is perfectly legal and can be downloaded from linuxtracker (or directly from here) in case you want to take a look yourself.
The story starts with me hacking on an (unpublished) torrent client in Zig. When parsing the torrent it errors out:
$ zig-out/bin/zephyr ~/Downloads/rocky.torrent
thread 42922 panic: attempt to use null value
/home/armin/dev/playground/zephyr/src/torrent.zig:32:52: 0x103746a in parse (zephyr)
const info_length = info.Dict.get("length").?;
This is not very surprising. The client is not even pre-alpha and it cannot handle anything but the simplest of torrents. So I decoded the torrent to see what I’m missing 1.
:announce "http://linuxtracker.org:2710/00000000000000000000000000000000/announce"
:announce-list "udp://tracker.opentrackr.org:1337/announce"
"udp://tracker.openbittorrent.com:80/announce"
"http://linuxtracker.org:2710/00000000000000000000000000000000/announce"
:comment "https://docs.rockylinux.org/release_notes/8_10/"
:created\ by "torrenttools/0.6.2"
:creation\ date 1716999716
:info :cross_seed_entry "1e59d71e1dab6af3934a0f14b69929e5"
:files :length 1502 :path "CHECKSUM"
:length 2694053888 :path "Rocky-8.10-x86_64-minimal.iso"
[...]
The reason my torrent parser failed is, of course, that it can’t even handle multi-file torrents. But the more interesting bit in here is the cross_seed_entry in the info section. This is not a field found in the BitTorrent standard. Searching for it with various search engines does not bring up anything directly related. Asking various LLMs mainly causes them to hallucinate or to just explain the concept of cross seeding in general terms.
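To make the parser failure concrete: in BEP-3, a single-file torrent stores a length key directly in the info dictionary, while a multi-file torrent stores a files list instead. A minimal sketch of handling both, assuming the torrent has already been decoded into Python dictionaries with string keys (the function name total_length and the sample dictionaries are made up for illustration, and only the two files visible in the dump above are included):

```python
def total_length(info: dict) -> int:
    """Return the payload size in bytes for a decoded BEP-3 info dictionary."""
    if "length" in info:
        # Single-file torrent: the size sits directly in the info dictionary.
        return info["length"]
    if "files" in info:
        # Multi-file torrent: every entry in "files" carries its own length.
        return sum(entry["length"] for entry in info["files"])
    raise ValueError("info dictionary has neither 'length' nor 'files'")


# Hypothetical decoded info dictionaries, not the complete Rocky Linux torrent.
single = {"name": "example.iso", "length": 1024}
multi = {
    "name": "Rocky",
    "files": [
        {"length": 1502, "path": ["CHECKSUM"]},
        {"length": 2694053888, "path": ["Rocky-8.10-x86_64-minimal.iso"]},
    ],
}

print(total_length(single))
print(total_length(multi))
```

Looking up length unconditionally, as my parser did, is exactly what blows up on the second form.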
The only exact mention of cross_seed_entry found by search engines is an issue report in transmission. The solution there, however, was to just ignore the entry, without any explanation of what it is there for. At this point it became apparent that this is a custom extension, used and understood only by a few tools, with not much information about it on the web.
Let’s try to fix this!
The next obvious lead is torrenttools, the creator of the Rocky Linux torrent file. Luckily this is an open source project on GitHub, and GitHub’s search isn’t too shabby. After some searching and git blaming one arrives at issue #19. While it already gives some explanation of what the entry is there for, the most interesting part is:
This option is implemented by pyrocore with the x_cross_seed field.
Another search through the pyrocore repo leads to issue #76. This finally provides quite a bit of information about the problem cross_seed_entry tries to solve, and it also points towards the documentation of pyrocore’s mktor:
If you create torrents for different trackers, they’re automatically enabled for cross-seeding, i.e. you can load several torrents for exactly the same data into your client. For the technically inclined, this is done by adding a unique key so that the info hash is always different. Use the --no-cross-seed option to disable this. You can also set the ‘source’ field many trackers use for unique info hashes, use -s info.source=LABEL for that.
Now, finally I think I untangled the knot of this mysterious entry. It is a workaround for enabling cross-seeding with torrent clients that cannot handle multiple torrents with the same info id!
The problem with cross seeding
Cross seeding is a problem you probably wouldn’t even think of when looking at BitTorrent and how it works. It is a combination of how clients, trackers, and torrent ids work together. At the root of it all is the desire to bridge network partitions which are somewhat there by design.
Warning: For simplicity we ignore DHT, PEX, LPD and similar extensions. We also do not go into private trackers. These often have their own rules which add a whole other dimension to the discussion. We also do not dive into BitTorrent v2. Instead we concentrate on the basic BitTorrent specification as defined in BEP-3. Going into all these details and special cases would make the discussion incredibly complicated to follow.
BitTorrent is a peer-to-peer protocol, but it is not fully decentralized. In fact, it is not very decentralized at all. Any BitTorrent swarm gathers around a central tracker. Only a tracker knows which peers are available for a particular torrent, and peer lists can only be retrieved from a tracker. To a first approximation, trackers naturally partition the network.
Now if a client connects to the tracker listed in the rocky.torrent file, it can only find peers that the tracker knows about for this torrent.
You may have noticed that it is possible to define not only one but multiple trackers in a torrent. So in reality a torrent can define a swarm across multiple trackers. The interesting bit here is really which part in all of this is in control of the swarm! Trackers do partition the network, but the creator of a torrent can merge them together. Meanwhile, trackers can again decide not to track a peer and a peer can decide not to announce to any of the trackers.
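As a rough sketch of what "multiple trackers in a torrent" means for a client: the function below collects every tracker URL a decoded torrent mentions. It assumes that announce-list follows the multitracker extension (BEP-12), where it is a list of tiers and each tier is a list of URLs, and that the torrent was decoded into dictionaries with string keys. The name tracker_urls and the sample data are made up for illustration, and real clients apply per-tier ordering and fallback rules that are left out here.

```python
def tracker_urls(meta: dict) -> list:
    """Collect the distinct tracker URLs of a decoded torrent, in first-seen order."""
    urls = []
    # "announce-list" is assumed to be a list of tiers, each a list of URLs.
    for tier in meta.get("announce-list") or []:
        for url in tier:
            if url not in urls:
                urls.append(url)
    # The plain "announce" key holds a single tracker URL.
    announce = meta.get("announce")
    if announce and announce not in urls:
        urls.append(announce)
    return urls


# Hypothetical decoded torrent, modelled on the trackers shown earlier.
meta = {
    "announce": "http://linuxtracker.org:2710/announce",
    "announce-list": [
        ["udp://tracker.opentrackr.org:1337/announce"],
        ["udp://tracker.openbittorrent.com:80/announce"],
        ["http://linuxtracker.org:2710/announce"],
    ],
}

for url in tracker_urls(meta):
    print(url)
```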
This is reminiscent of how many governments are organized. No single part has all the power; all parts must opportunistically cooperate if they want to make the whole system work.
Ok fine, so all this seems to work out somehow. But where does cross seeding come in? It looks like I can easily seed across multiple trackers by just defining all of them in the torrent.
From the perspective of a torrent this is true. There is no need for cross seeding. The torrent defines the swarm and each participant in the swarm can partition the network to some degree if they so choose. But if we look at this from the perspective of the shared content, another dimension comes in: it is easily possible to create and share different torrents for the same content, say two torrents containing the very same Rocky Linux distribution but with different torrent metadata.
For example, take this made-up rocky.torrent:
:announce "http://linuxtracker.org:2710/announce"
:announce-list "udp://tracker.opentrackr.org:1337/announce"
"udp://tracker.openbittorrent.com:80/announce"
"http://linuxtracker.org:2710/announce"
:comment "https://docs.rockylinux.org/release_notes/8_10/"
:created\ by "torrenttools/0.6.2"
:creation\ date 1716999716
:info :files :length 1502 :path "CHECKSUM"
:length 2694053888 :path "Rocky-8.10-x86_64-minimal.iso"
[...]
And a made-up rockyII.torrent:
:announce "http://mytrack.org:2710/announce"
:announce-list "udp://another.org:1337/announce"
"udp://tracker.openbittorrent.com:80/announce"
:comment "https://docs.rockylinux.org/release_notes/8_10/"
:created\ by "zephyr/0.0.1"
:creation\ date 1716999716
:info :files :length 1502 :path "CHECKSUM"
:length 2694053888 :path "Rocky-8.10-x86_64-minimal.iso"
[...]
Their info
sections are exactly the same, just the trackers and creator
differ. This may create swarms akin to this (I know it starts to get
quite confusing):
The point is, the different torrents can define different sets of trackers which in turn know about different peers.
To make things even more confusing, the identity of a torrent is actually just
based on the content, or, in other words, the info
section of the torrent (see:
info hash). So even in the
example above a client could just take the trackers from both torrents and
announce to all of them, effectively merging the swarms of both torrents
client-side.
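A minimal sketch of such a client-side merge, under stated assumptions: the torrents are already decoded into Python dictionaries with string keys, the Session class is purely hypothetical, and the bencode helper is a bare-bones encoder that covers only the types needed here. The pieces value is a placeholder, not real piece hashes.

```python
import hashlib


def bencode(value) -> bytes:
    """Bare-bones bencode encoder for ints, strings, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are assumed to be str and must be emitted in sorted order.
        return b"d" + b"".join(bencode(k) + bencode(value[k]) for k in sorted(value)) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> str:
    """SHA-1 over the bencoded info dictionary, as a hex string."""
    return hashlib.sha1(bencode(info)).hexdigest()


class Session:
    """Hypothetical client session that keys torrents by info hash."""

    def __init__(self):
        self.trackers = {}  # info hash -> set of tracker URLs

    def add(self, meta: dict) -> str:
        key = info_hash(meta["info"])
        # Same info hash: extend the tracker set instead of rejecting the torrent.
        self.trackers.setdefault(key, set()).add(meta["announce"])
        return key


# Placeholder info dictionary shared by both made-up torrents.
info = {"name": "Rocky", "piece length": 262144, "pieces": b"\x00" * 20,
        "files": [{"length": 1502, "path": ["CHECKSUM"]}]}
rocky = {"announce": "http://linuxtracker.org:2710/announce", "info": info}
rocky2 = {"announce": "http://mytrack.org:2710/announce", "info": info}

session = Session()
session.add(rocky)
session.add(rocky2)
print(session.trackers)
```

Both torrents end up under a single key, and the session now knows two trackers to announce to for the same content.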
This would work and—as far as I can tell—this does solve cross seeding.
The problem is, this relies on the client implementation, and clients may or may not have a solution for this. Going back to the issue on the pyrocore bug tracker, we see the problem when loading two torrents with identical info sections in at least some clients:
I am trying to cross-seed torrents downloaded from several private trackers. This works if the tracker modifies the metafile in some way, but when the metafiles are 100% identical I get the message “Info hash already used by another torrent” and can’t seed the second or subsequent torrents.
[emphasis mine]
Torrent clients are probably the most diverse and uncontrollable part of the system. So the question becomes what can torrent creators do to maximize the health of the content they are trying to distribute. Network partitions seem like a waste of peers, bandwidth and resilience.
Persuading clients to cross seed
If we accept that clients can’t handle different torrents with the same info section for cross seeding, then the only possibility seems to be changing the info section in a way that does not actually affect the content being distributed. This is what cross_seed_entry does.
The BitTorrent specification does not forbid this. It only defines what has to be in a torrent file. This leaves open the possibility of adding more keys without producing invalid torrent files. The specification does, however, encourage coordinating extensions with the author of the official specification (namely, Bram Cohen) to avoid incompatible extensions.
Adding a cross_seed_entry to the info section does change the id of the torrent. The id of a torrent is nothing but the SHA-1 hash of the bencoded info section. To a client, two otherwise identical torrents that differ only in their cross_seed_entry appear as two separate torrents. Note that this is not the case when only the tracker list changes, since trackers are not used to calculate the torrent id.
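The effect on the id is easy to check. The sketch below hashes a placeholder info dictionary with and without a cross_seed_entry; the bencode helper is a bare-bones encoder written for this example, the dictionaries assume string keys, and the pieces value is a stand-in rather than real piece hashes. The cross_seed_entry value is the one from the Rocky Linux torrent above.

```python
import hashlib


def bencode(value) -> bytes:
    """Bare-bones bencode encoder for ints, strings, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are assumed to be str and must be emitted in sorted order.
        return b"d" + b"".join(bencode(k) + bencode(value[k]) for k in sorted(value)) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(meta: dict) -> str:
    """Torrent id: SHA-1 over the bencoded info dictionary only."""
    return hashlib.sha1(bencode(meta["info"])).hexdigest()


# Placeholder info dictionary; "pieces" is not real hash data.
info = {"name": "Rocky", "piece length": 262144, "pieces": b"\x00" * 20,
        "files": [{"length": 1502, "path": ["CHECKSUM"]}]}

original = {"announce": "http://linuxtracker.org:2710/announce", "info": info}
other_tracker = {"announce": "http://mytrack.org:2710/announce", "info": info}
cross_seeded = {
    "announce": "http://mytrack.org:2710/announce",
    "info": {**info, "cross_seed_entry": "1e59d71e1dab6af3934a0f14b69929e5"},
}

# A different tracker leaves the id untouched; an extra info key changes it.
print(info_hash(original) == info_hash(other_tracker))
print(info_hash(original) == info_hash(cross_seeded))
```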
This makes it possible to load two different torrent files into a client which point to the same data. The files list and the pieces defining the distributed content stay exactly the same. So a client might just handle both torrents as separate without duplicating or re-downloading the data. This allows for serving two different torrents on potentially different trackers with the same data. In other words, it convinces the client to cross-seed.
Verdict (opinionated)
This might sound like quite a hack to you. And in my opinion it is. It assumes quite a lot about the implementation details of clients, which, while likely often true in the wild, is an assumption nevertheless. It is an assumption that is not under the control of the torrent creator, and not one for them to make.
Worse, it now actually splits the torrent. Just think about it: The BitTorrent specification itself assigns an identity to a torrent based purely on the info section, i.e. the data to be distributed. Metadata like trackers, comments and creators are not part of the unique id. Implicitly this means two torrents are the same if they distribute the same data. Adding a cross_seed_entry subverts this. Now we have created distinct torrents based on unrelated and arbitrary metadata.
With this it is no longer possible to search for different torrents of the same content on multiple sites (not by id anyway). It is not possible to just ask a list of my favorite trackers whether they know anything about a certain torrent id that I happen to have obtained from who-knows-where. And without a special implementation to handle an unofficial extension, trackers will not be able to recognize two torrents distributing the same data as identical and merge their peer lists.
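To illustrate what such a "special implementation" might look like, here is a purely hypothetical sketch: it compares two decoded info dictionaries after dropping keys that exist only to make the info hash unique. The key set (cross_seed_entry from torrenttools, x_cross_seed from pyrocore, and the source field mentioned in the mktor documentation) is my assumption of what would need to be ignored, and I am not aware of any tracker or client that does this.

```python
# Assumed set of info keys that only serve to make the info hash unique.
CROSS_SEED_KEYS = {"cross_seed_entry", "x_cross_seed", "source"}


def payload_fields(info: dict) -> dict:
    """Return the info dictionary without the assumed uniqueness-only keys."""
    return {key: value for key, value in info.items() if key not in CROSS_SEED_KEYS}


def same_payload(a: dict, b: dict) -> bool:
    """True if two info dictionaries describe the same files and pieces."""
    return payload_fields(a) == payload_fields(b)


# Placeholder info dictionaries; "pieces" is not real hash data.
plain = {"name": "Rocky", "piece length": 262144, "pieces": b"\x00" * 20,
         "files": [{"length": 1502, "path": ["CHECKSUM"]}]}
tagged = {**plain, "cross_seed_entry": "1e59d71e1dab6af3934a0f14b69929e5"}
different = {**plain, "pieces": b"\x01" * 20}

print(same_payload(plain, tagged))
print(same_payload(plain, different))
```

Every party would have to agree on that key set, which is precisely the kind of coordination an unofficial extension lacks.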
I think cross_seed_entry tries to solve a problem on the torrent creator side which should be solved on the client side. It is hard to say whether, and how much, this harms the overall health of torrents. Maybe most of the cross-seeded torrents wouldn’t be available at all on other trackers and so the network partition caused by differing ids does not really matter. Maybe the bandwidth cross-seeders provide far exceeds the downsides. After all, somebody caring enough to cross-seed is likely a good citizen in the torrent ecosystem. Maybe it is simpler to get torrent creation software and trackers to agree on, recognize, and handle cross_seed_entry torrents than a bazillion client implementations.
Personally, I believe all this is likely going to be a net negative, and overall torrenting is just worse off for it. It also won’t be the end of BitTorrent. And to be fair, custom extensions, the possibility for any group of participants to agree (or disagree!) on how things should work, are exactly the amazing “democratic” property that makes BitTorrent such an interesting piece of technology. In any case this concludes my search for the mysterious cross_seed_entry.
This bencode parser for Emacs works really well for interactive exploration: https://github.com/skeeto/emacs-bencode ↩︎