Mon, 24 Apr 2023 23:43:00 -0600
This weekend and the surrounding weekdays are PyCon! I’ve been neglecting a lot of open source work & maintenance that I used to be oh so familiar with, so I decided to take this time to cross some easy tickets off my list.
When I type in man urxvt, mentions of urxvtperl(3) within it are only half-bolded in my default pager. This caught my attention as a low-hanging contribution to maybe tidy up their docs a slight amount.
urxvt is kind of an obscure terminal emulator; they do things their own way & I absolutely love them for that.
Their repository is hosted under cvs.
Specifically, as outlined on http://software.schmorp.de/pkg/rxvt-unicode.html, you can “clone” the repository using:
cvs -z3 -d :pserver:anonymous@cvs.schmorp.de/schmorpforge co rxvt-unicode
I don’t know how to use cvs so I opted to not do this :D
Instead, I tinkered around with git a bit to figure out the git cvsimport command:
git cvsimport -C urxvt -r cvs -k -v -d :pserver:anonymous@cvs.schmorp.de/schmorpforge rxvt-unicode
Discovering this took longer than I had hoped; thanks to this stackoverflow post for spelling out the invocation for me: https://stackoverflow.com/a/11490134.
Also, running that command took roughly two hours so that was fun.
The offending manual file is located in doc/rxvt.1.man.in. The .in suffix hints that there’s some preprocessing going on before I see the final output, but that’s extraneous for our current goals.
Man pages are written in troff, a typesetting language, which looks pretty esoteric. But finding the change needed to reach my goal was pretty easy:
- @@RXVT_NAME@@\fBperl\fR\|(3)
+ \fB@@RXVT_NAME@@perl\fR\|(3)
Just have to move the special \fB font escape so that the bold covers the entire name, including the prefix.
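For illustration, the same hand edit can be written as a one-line substitution. This is only a sketch: fix_manref is a hypothetical helper name, and it assumes the template contains the literal text @@RXVT_NAME@@\fB exactly as in the diff above. In troff, \fB switches to bold and \fR switches back to roman.

```shell
# Hypothetical helper: move the \fB (bold on) escape in front of the
# @@RXVT_NAME@@ placeholder so the prefix is bolded too.
# Assumes input lines look like the "-" line of the diff above.
fix_manref() {
    sed 's/@@RXVT_NAME@@\\fB/\\fB@@RXVT_NAME@@/g'
}

# Try it on a sample line rather than on the real file.
printf '%s\n' '@@RXVT_NAME@@\fBperl\fR\|(3)' | fix_manref
```

Reading the template on stdin and writing to stdout keeps the sketch from touching doc/rxvt.1.man.in in place.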
man can be run on the file as-is, with weird pre-preprocessed artifacts:
man doc/rxvt.1.man.in
But to build the files, first configure the entire project with a ./configure in the base directory, then run make all from within the doc/ directory.
At some point I realized that there is a sibling doc/rxvt.1.pod. A bit more digging (from within the Makefile) led me to find that the doc/*.man.in files are generated from the *.pod files:
%.tbl: %.pod
	$(srcdir)/podtbl <$< >$@
%.1.man.in: %.1.tbl
	$(POD2MAN) -s1 <$< >$@
%.3.man.in: %.3.tbl
	$(POD2MAN) -s3 <$< >$@
%.7.man.in: %.7.tbl
	$(POD2MAN) -s7 <$< >$@
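Read together, those pattern rules form a two-step pipeline per manual section. A rough dry-run sketch that only prints the commands for one page: show_doc_pipeline is a hypothetical helper, and it assumes $(srcdir) is the current directory and $(POD2MAN) expands to a plain pod2man, both of which are guesses rather than facts from the Makefile.

```shell
# Hypothetical dry run: print (do not execute) the commands the
# pattern rules would run for NAME.SECTION.pod.
# Assumes $(srcdir) is "." and $(POD2MAN) is plain "pod2man".
show_doc_pipeline() {
    name=$1
    section=$2
    # %.tbl: %.pod
    printf '%s\n' "./podtbl <$name.$section.pod >$name.$section.tbl"
    # %.N.man.in: %.N.tbl
    printf '%s\n' "pod2man -s$section <$name.$section.tbl >$name.$section.man.in"
}

show_doc_pipeline rxvt 1
```

The takeaway is that any edit to doc/rxvt.1.man.in gets overwritten the next time the page is regenerated, so the fix belongs in the .pod (or in pod2man itself).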
I don’t know too much about pod/tbl, but upon initial search these look to be perl-isms, pod standing for “Plain Old Documentation”.
The file is also much cleaner, with the generated
@@RXVT_NAME@@\fBperl\fR\|(3)
stemming from this:
@@RXVT_NAME@@-extensions(1)
But the resulting question is: where does the formatting come from?
As part of this contribution, I wanted to make sure that I was following prior art for highlighting/distinguishing man page references within this man page.
Using the search (\d) from within vim, here’s a subset of what I found:
I<xterm>(1)
@@RXVT_NAME@@(7)
I<termcap(5)>
@@RXVT_NAME@@perl(3)
write(1)
B<xev>(1)
L<@@RXVT_NAME@@perl>(3)
In conclusion, I found no rhyme or reason and got more excited to maybe contribute some semblance of order to this obscure file.
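The same survey can be done outside vim. A minimal sketch, where list_manrefs is a hypothetical helper and the pattern (any run of non-space characters ending in a single-digit section in parentheses, plus an optional closing >) is an approximation of the (\d) search, not something taken from the repository:

```shell
# Hypothetical helper: print every token that looks like a man page
# reference, e.g. write(1), I<xterm>(1) or I<termcap(5)>.
# Reads the files given as arguments, or stdin when there are none.
list_manrefs() {
    grep -oE '[^[:space:]]+\([0-9]\)>?' "$@"
}

# Sample input instead of the real file.
printf '%s\n' 'see I<xterm>(1) or write(1)' 'and I<termcap(5)> too' | list_manrefs
```

Against the real file, something like list_manrefs doc/rxvt.1.pod | sort | uniq -c | sort -rn would show how often each spelling is used.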
pod2{man,html,xhtml,tbl(?)}
rxvt-unicode has a very well formatted manpage located here: http://pod.tst.eu/http://cvs.schmorp.de/rxvt-unicode/doc/rxvt.1.pod
Additionally, the doc/Makefile has a %.html: %.tbl rule, so we can build html files!
This is quite a boon because, in investigating the man page inter-reference formatting issue, seeing how the different output formats render the same source can offer hints!
Unfortunately, urxvt’s pod to html converter uses pod2xhtml, which doesn’t exist on my machine, nor is it packaged on default gentoo.
A quick replacement of s/pod2xhtml/pod2html gave me a quick working setup though, and I continued down my path!
Unfortunately, pod2html doesn’t seem to auto-format the man links at all! This signals to me that there’s an under-specification of what these tokens “are”, and that the pod machinery could use some more hinting.
L<>
I don’t know how I found it, but I stumbled onto this addendum on a stackoverflow answer:
Btw, UNIX man pages work right out of docs:
L<crontab(5)>
This brings up http://man.he.net/man5/crontab
This is!! Exactly what I want! A properly documented way to link to man pages without implicit rules trying to auto-detect things!
I hastily surrounded some of the links I was working with, resulting in:
- @@RXVT_NAME@@perl(3)
+ L<@@RXVT_NAME@@perl(3)>
and after running my makefile amalgamation
make clean alldocclean alldoc rxvt.1.html all
(with the s/pod2xhtml/pod2html modification applied), I got exactly what I was looking for: a properly formatted manpage reference with linking included!
Kind of.
man_url_prefix
The resulting autogenerated man page reference URL directs to: http://man.he.net/man3/urxvtperl.
Which 404s.
I’m not exactly sure what the bar to get a man page up on http://man.he.net is, but apparently urxvt doesn’t make it. There are plenty of other online man page providers that do include it though, a list:
There’s no shortage of options. An alternative approach could be to figure out how to get man.he.net to index urxvt’s man pages too, but that requires dealing with people & bureaucracy, and I go down these rabbit holes to deal with software.
Digging into pod2html, it is somewhat configurable and lets you change the website that man page references link to. This is done through the man_url_prefix “variable”. There doesn’t seem to be any way to modify these variables from the command line, so this distraction has kind of led to a dead end.
I’ve been making heavy use of grep.app recently, and plugging man_url_prefix into it turns up 9 (nine) total uses throughout all of github. I’m not sure this variable setting has actually been used in any real capacity.
pod2xhtml
I quickly dug myself out of pod2html hackery, since none of that actually moves me towards my goal: urxvt’s doc builder doesn’t actually use pod2html, it uses pod2xhtml!
pod2xhtml is much worse off. It doesn’t exist in gentoo’s repo tree because it hasn’t been touched in over a decade (last update: 2010). It uses a legacy link parser that unfortunately dashes my hopes of improving the html generated manual along with the man page: it doesn’t autolink to online man page references.
By default the L<> wrapped man links just turn into this html:
<cite>urxvtperl</cite>(3)
Which, honestly, isn’t the worst. This mimics some manually crafted I<xterm>(3)s found within the page, so I consider it an acceptable modification.
A consideration to be made for the future though: pod2html works and generates a pretty identical looking html file. Perhaps it’s worth porting over at some point? http://pod.tst.eu seems to be running a cgi script providing realtime pod2xhtml; I’m not sure who owns this, but that’s how urxvt’s documentation is being rendered currently.
With a bit of work, the docs could definitely transition over to a pod2html / statically served html man page setup.
podlators
The current iteration of pod2man lives within podlators.
The crux of the issue that started this all comes from this block of regex:
# Change references to manual pages to put the page name in bold but
# the number in the regular font, with a thin space between the name and
# the number. Only recognize func(n) where func starts with an alphabetic
# character or underscore and contains only word characters, periods (for
# configuration file man pages), or colons, and n is a single digit,
# optionally followed by some number of lowercase letters. Note that this
# does not recognize man page references like perl(l) or socket(3SOCKET).
if ($$self{GUESSWORK}{manref}) {
    s{
        \b
        (?<! \\ )                       # rule out \e0(1)
        ( [A-Za-z_] (?:[.:\w] | \\-)+ )
        ( \( \d [a-z]* \) )
    } {
        '\f(BS' . $1 . '\f(BE\|' . $2
    }egx;
}
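Specifically, the recognition heuristic only matches a portion of @@RXVT_NAME@@perl(3), with or without the L<> construct: @ is not a word character, so the match can only start at perl, which leaves the @@RXVT_NAME@@ prefix outside the bold.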
The resulting patch to podlators circumvents this guesswork when within the L<> construct and just generally bolds the contents of the link (when not a URL), special-casing the man reference type to not bold the suffixed section number.
& the resulting patch to rxvt-unicode is trivial. Through a mailing list, too :D