# OSS Rabbitholes

PyCon is this weekend and the surrounding weekdays! I've been neglecting a
lot of open source work & maintenance that I used to be oh so familiar with, so
I decided to take this time to cross some easy tickets off my list.

## urxvt(1) man formatting oddity

When I type in `man urxvt`, mentions of `urxvtperl(3)` within it are only
half-bolded in my default pager. This caught my attention as a low-hanging
contribution to maybe tidy up their docs a slight amount.

## Precursor: get the source code

`urxvt` is kind of an obscure terminal emulator. They do things their own way &
I absolutely love them for that.

Their repository is hosted under `cvs`.

Specifically, as outlined on <http://software.schmorp.de/pkg/rxvt-unicode.html>, you can "clone" the repository using:

```bash
cvs -z3 -d :pserver:anonymous@cvs.schmorp.de/schmorpforge co rxvt-unicode
```

----------------

I don't know how to use cvs so I opted to not do this :D

Instead, I tinkered around with git a bit to figure out the `git cvsimport` command:

```bash
git cvsimport -C urxvt -r cvs -k -v -d :pserver:anonymous@cvs.schmorp.de/schmorpforge rxvt-unicode
```

Discovering this took longer than I had hoped. Thanks to this Stack Overflow post
for spelling out the invocation for me: <https://stackoverflow.com/a/11490134>.

Also, running that command took roughly two hours so that was fun.

## Attempt #1, modify the man page!

The offending manual file is located in `doc/rxvt.1.man.in`. The `.in` suffix
hints that there's some preprocessing going on before I see the final output,
but that's extraneous for our current goals.

Man pages are written in the `troff` typesetting language, which looks pretty
esoteric. But finding the change I needed to reach my goal was pretty easy:

```diff
- @@RXVT_NAME@@\fBperl\fR\|(3)
+ \fB@@RXVT_NAME@@perl\fR\|(3)
```

Just have to move the `\fB` font escape so that it surrounds the entire
name, including the prefix.

`man` can be run on the file as-is, with weird pre-preprocessed artifacts:

```bash
man doc/rxvt.1.man.in
```

But to build the files, first configure the entire project with a `./configure`
in the base directory, then run `make all` from within the `doc/` directory.

## Attempt #2, notice that the man page is autogenerated

At some point I realized that there is a sibling `doc/rxvt.1.pod`. A bit more
digging (from within the `Makefile`) led me to find that the `doc/*.man.in`
files are generated from the `*.pod` files:

```makefile
%.tbl: %.pod
	$(srcdir)/podtbl <$< >$@

%.1.man.in: %.1.tbl
	$(POD2MAN) -s1 <$< >$@

%.3.man.in: %.3.tbl
	$(POD2MAN) -s3 <$< >$@

%.7.man.in: %.7.tbl
	$(POD2MAN) -s7 <$< >$@
```
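
To make the chain of pattern rules concrete, here is a minimal Python sketch of how a `*.N.man.in` target resolves back to its `.pod` source. It only mirrors the stems in the rules above; `build_chain` is a name I made up for illustration, and it doesn't run `podtbl` or `pod2man`.

```python
import re


def build_chain(target):
    """Return [pod source, tbl intermediate, target] for a *.N.man.in file."""
    # Mirrors the %.1.man.in / %.3.man.in / %.7.man.in rules: only
    # sections 1, 3 and 7 have a pattern rule in the Makefile excerpt.
    m = re.fullmatch(r"(.+)\.([137])\.man\.in", target)
    if m is None:
        raise ValueError(f"no pattern rule matches {target!r}")
    stem = f"{m.group(1)}.{m.group(2)}"
    # %.tbl: %.pod means rxvt.1.tbl is built from rxvt.1.pod.
    return [f"{stem}.pod", f"{stem}.tbl", target]


print(" -> ".join(build_chain("rxvt.1.man.in")))
# prints: rxvt.1.pod -> rxvt.1.tbl -> rxvt.1.man.in
```

So any hand edit to `rxvt.1.man.in` gets clobbered the next time the docs are regenerated from `rxvt.1.pod`.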

I don't know too much about `pod` or `tbl`, but from an initial search these look to be
Perl-isms, with `pod` standing for "Plain Old Documentation".

The file is also much cleaner, with the generated `@@RXVT_NAME@@\fBperl\fR\|(3)`
stemming from this:

```
@@RXVT_NAME@@perl(3)
```

But the resulting question is: where does the formatting come from?

### Aside: standards & patterns within this file

As part of this contribution, I wanted to make sure that I was following prior
work on highlighting/distinguishing man page references from within this man page.

Using the search `(\d)` from within vim, here's a subset of what I found:

- `I<xterm>(1)`
- `@@RXVT_NAME@@(7)`
- `I<termcap(5)>`
- `@@RXVT_NAME@@perl(3)`
- `write(1)`
- `B<xev>(1)`
- **`L<@@RXVT_NAME@@perl>(3)`**

In conclusion, I found no rhyme or reason and got more excited to maybe
contribute some semblance of order to this obscure file.
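
For a rough tally instead of eyeballing vim matches, something like the Python sketch below could count the reference styles. The regex and the `reference_styles` name are my own guesses at what counts as a "man reference" (a name plus a single-digit section, optionally wrapped in a one-letter POD formatting code), so treat it as an approximation.

```python
import re
from collections import Counter

# Three shapes, tried in order at each position:
REF = re.compile(
    r"(?P<outside>[A-Z]<[^<>]+>\(\d\))"   # I<xterm>(1): section outside the <>
    r"|(?P<inside>[A-Z]<[^<>]+\(\d\)>)"   # I<termcap(5)>: section inside the <>
    r"|(?P<bare>[\w@.:-]+\(\d\))"         # write(1): no formatting code at all
)


def reference_styles(pod_text):
    """Count how many references use each markup style."""
    styles = Counter()
    for m in REF.finditer(pod_text):
        token = m.group(0)
        if m.lastgroup == "outside":
            styles[f"{token[0]}<name>(n)"] += 1
        elif m.lastgroup == "inside":
            styles[f"{token[0]}<name(n)>"] += 1
        else:
            styles["name(n)"] += 1
    return styles


sample = (
    "I<xterm>(1) @@RXVT_NAME@@(7) I<termcap(5)> @@RXVT_NAME@@perl(3) "
    "write(1) B<xev>(1) L<@@RXVT_NAME@@perl>(3)"
)
for style, count in reference_styles(sample).most_common():
    print(f"{count}  {style}")
```

On the seven samples listed above this reports five distinct styles, with the bare `name(n)` form being the most common.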

### Aside: `pod2{man,html,xhtml,tbl(?)}`

rxvt-unicode has a *very* well formatted manpage located here:
<http://pod.tst.eu/http://cvs.schmorp.de/rxvt-unicode/doc/rxvt.1.pod>

Additionally, the `doc/Makefile` has a `%.html: %.tbl` rule, so we can build html files!

This is quite a boon: when investigating the man page inter-reference
formatting issue, seeing how different output formats render the same source
can provide hints!

Unfortunately, urxvt's pod to html conversion uses `pod2xhtml`, which doesn't
exist on my machine and isn't packaged in gentoo's default repo tree.

A quick replacement of `s/pod2xhtml/pod2html` gave me a quick working setup
though, and I continued down my path!

Unfortunately, `pod2html` doesn't seem to auto-format the man links at all!
This signals to me that there's an under-specification of what these tokens
"are", and that the pod machinery could use some more hinting.


### `L<>`

I don't know how I found it, but I stumbled onto this addendum on a
[stackoverflow answer](https://stackoverflow.com/a/74202083):

> Btw, UNIX man pages work right out of docs:
> 
> L<crontab(5)>
> 
> This brings up <http://man.he.net/man5/crontab>


This is!! Exactly what I want! 
A properly documented way to link to man pages without implicit rules trying to
auto-detect things!

I hastily surrounded some of the links I was working with, resulting in:

```diff
- @@RXVT_NAME@@perl(3)
+ L<@@RXVT_NAME@@perl(3)>
```

and after running my makefile amalgamation `make clean alldocclean alldoc
rxvt.1.html all` (with modified `s/pod2xhtml/pod2html`),
I got exactly what I was looking for, a properly formatted manpage reference
with linking included!

**Kind of.**

---------------

### `man_url_prefix`

The resulting autogenerated man page reference URL directs to:
<http://man.he.net/man3/urxvtperl>. 

Which `404`s.

I'm not exactly sure what the bar to get a man page up on <http://man.he.net>
is, but apparently `urxvt` doesn't make it.
There are plenty of other online man page providers that do include it though. A list:

- [linux.die.net](https://linux.die.net/man/1/urxvt)
- [helpmanual.io](https://helpmanual.io/man1/urxvt/)
- [linux.extremeoverclocking.com](https://linux.extremeoverclocking.com/man/1/urxvt)
- [www.unix.com](https://www.unix.com/man-page/suse/1/urxvt/)

There's no shortage of options. An alternative approach could be to figure out
how to get man.he.net to index urxvt's man pages too, but that requires dealing
with people & bureaucracy and I go down these rabbitholes to deal with **software**.

Digging into `pod2html`, it is somewhat configurable and lets you change the
website that man page references link to. This is done through the
`man_url_prefix` "variable". There doesn't seem to be any way to set it from the
`pod2html` command line though, so this distraction has kind of led to a dead end.
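
For reference, both generated URLs seen so far (`man5/crontab` and `man3/urxvtperl`) are consistent with the link being nothing more than prefix + section + `/` + page name. A minimal Python sketch under that assumption; `man_url` is a hypothetical helper of mine, not `pod2html`'s actual code:

```python
def man_url(name, section, prefix="http://man.he.net/man"):
    """Build a man page URL the way the generated links appear to be built.

    Assumption: the URL is just prefix + section + "/" + name, which
    matches the two man.he.net links observed above.
    """
    return f"{prefix}{section}/{name}"


# The link that 404s:
print(man_url("urxvtperl", 3))
# What a different prefix would produce, using linux.die.net's URL layout:
print(man_url("urxvt", 1, prefix="https://linux.die.net/man/"))
```

If that assumption holds, pointing at another provider is a one-string change, as long as the provider's URL scheme happens to be `<prefix><section>/<name>`.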

I've been making heavy use of `grep.app` recently, and plugging
`man_url_prefix` into it yields 9 (nine) total uses
throughout all of GitHub. I'm not sure this setting has actually been
used in any real capacity.

### `pod2xhtml`

I quickly dug myself out of `pod2html` hackery, since none of it actually
moves me towards my goal: urxvt's doc builder doesn't use
`pod2html`, it uses `pod2xhtml`!

[`pod2xhtml`](https://metacpan.org/pod/Pod::Xhtml) is much worse off. 
It doesn't exist in gentoo's repo tree because it hasn't been touched in over a
decade (last update: 2010). 
It uses a [legacy link parser](https://metacpan.org/pod/Pod::ParseUtils) that
unfortunately dashes my hopes of improving the html generated manual along with
the man page -- it doesn't autolink to online man page references.

By default the `L<>` wrapped man links just turn into this html:

```html
<cite>urxvtperl</cite>(3)
```

Which, honestly, isn't the worst. This mimics some manually crafted
`I<xterm>(1)`s found within the page, so I consider it an acceptable
modification.

A consideration to be made for the future though: `pod2html` works and
generates a nearly identical looking html file. Perhaps it's worth porting over
at some point? <http://pod.tst.eu> seems to be running a cgi script providing
realtime `pod2xhtml`. I'm not sure who owns it, but that's how urxvt's
documentation is being rendered currently.

With a bit of work, the project could definitely transition over to a
`pod2html` / statically served html man page setup.

## The actual fix: `podlators`

The current iteration of `pod2man` lives within
[`podlators`](https://www.eyrie.org/~eagle/software/podlators/).

The crux of the issue that started this all comes from this block of regex:

```perl
# Change references to manual pages to put the page name in bold but
# the number in the regular font, with a thin space between the name and
# the number.  Only recognize func(n) where func starts with an alphabetic
# character or underscore and contains only word characters, periods (for
# configuration file man pages), or colons, and n is a single digit,
# optionally followed by some number of lowercase letters.  Note that this
# does not recognize man page references like perl(l) or socket(3SOCKET).
if ($$self{GUESSWORK}{manref}) {
    s{
        \b
        (?<! \\ )                                   # rule out \e0(1)
        ( [A-Za-z_] (?:[.:\w] | \\-)+ )
        ( \( \d [a-z]* \) )
    } {
        '\f(BS' . $1 . '\f(BE\|' . $2
    }egx;
}
```

Specifically, the recognition heuristic only matches a portion of
`@@RXVT_NAME@@perl(3)`, with or without the `L<>` construct.
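
To see the partial match in action, here is a Python port of that regex. The character classes and lookbehind are copied from the Perl above; the `MANREF` and `bold_manrefs` names are mine, and this only reproduces the substitution step, not the rest of `pod2man`.

```python
import re

# Same pattern as the quoted Perl, written without /x whitespace:
#   \b, not preceded by a backslash, a name starting with a letter or
#   underscore, then "(digit + optional lowercase letters)".
MANREF = re.compile(r"\b(?<!\\)([A-Za-z_](?:[.:\w]|\\-)+)(\(\d[a-z]*\))")


def bold_manrefs(text):
    """Wrap each recognized page name in the same markers the Perl code emits."""
    return MANREF.sub(
        lambda m: r"\f(BS" + m.group(1) + r"\f(BE\|" + m.group(2),
        text,
    )


# A plain name is bolded in full:
print(bold_manrefs("urxvtperl(3)"))
# The templated name is not: "@" is outside [.:\w], so the match can only
# start at the word boundary right before "perl".
print(bold_manrefs("@@RXVT_NAME@@perl(3)"))
```

The second call leaves `@@RXVT_NAME@@` outside the bold markers, which is the half-bolded `urxvtperl(3)` from the start of this post once the placeholder gets substituted.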

[The resulting patch to `podlators`](https://github.com/rra/podlators/pull/21)
circumvents this guesswork when within the `L<>` construct and just generally
bolds the contents of the link (when not a URL), special-casing the man
reference type to not bold the suffixed section number.

& [the resulting patch to `rxvt-unicode` is
trivial](http://lists.schmorp.de/pipermail/rxvt-unicode/2023q2/002654.html).
Through a mailing list, too :D
