Having your LLM agent prepare a document for you is really powerful. How can it be done effectively and without locking your data in a single corporate silo?
It’s often useful to have the LLM compose messages for you. It’s pretty simple to just copy and paste stuff around when using email or IM, but sometimes you need to send a physical document.
You can use the “artifacts” feature available in the interfaces for every frontier LLM provider. In my experience, this feature is really under-cooked[1] and requires keeping the authoritative versions of your files, at least for some time, in The Cloud. I’d rather keep my documents on my own local disk, thank you very much.
For everyday tasks involving artifacts[2], I mostly use the Pi CLI agent with an OpenRouter subscription (because it easily spreads my sensitive data over multiple models and providers) and GitHub Copilot CLI (because the paid subscription still has no-prompt-training guarantees).
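DOCX is common enough, but even though it’s formally standardized (as Office Open XML), in practice it’s a Microsoft-driven format. It’s also a ZIP archive of very structured, verbose XML with a ton of overhead, which is not simple for the CLI agents.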
ODT is open but also very structured and opaque to a text-first CLI agent.
LaTeX is handled pretty well by the current frontier models. It has a lot of local dependencies (Tectonic somewhat helps), which is problematic if you use multiple environments to access and edit your files. There are occasional small syntax issues that need either good LaTeX knowledge or re-prompting (which, in the case of the Copilot billing model, can get expensive). It might also be difficult for me to create a proper LLM workflow for LaTeX (e.g. create some templates, decide which parts the LLM is allowed to modify and which should be fixed) because of my limited experience with this system.
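One way to picture the “which parts the LLM is allowed to modify” idea is a template with a marked editable region, where a small script swaps in only the generated body and leaves the rest alone. This is a minimal sketch, not a workflow I have battle-tested; the marker comments and the function name are made up for illustration.

```python
import re

# Hypothetical marker comments; LaTeX ignores them because they start with '%'.
BEGIN = "% LLM-EDITABLE-BEGIN"
END = "% LLM-EDITABLE-END"


def replace_editable_region(template: str, new_body: str) -> str:
    """Return the template with only the marked region replaced by new_body."""
    pattern = re.compile(
        re.escape(BEGIN) + r"\n.*?" + re.escape(END),
        flags=re.DOTALL,
    )
    if len(pattern.findall(template)) != 1:
        raise ValueError("expected exactly one editable region in the template")
    replacement = f"{BEGIN}\n{new_body.rstrip()}\n{END}"
    # A function replacement keeps re.sub from interpreting LaTeX backslashes.
    return pattern.sub(lambda _match: replacement, template)


template = "\n".join([
    r"\documentclass{article}",
    r"\begin{document}",
    BEGIN,
    "Placeholder text.",
    END,
    r"\end{document}",
])

result = replace_editable_region(template, r"Dear Sir or Madam, \emph{hello}.")
print(result)
```

The preamble and document skeleton stay fixed, and the agent only ever produces the text that goes between the two markers.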
Typst seems to be recommended by the LLMs themselves a lot. It’s much simpler than LaTeX but it’s less powerful, the compiler doesn’t seem super stable, there are limitations and compromises it makes in the name of simplicity, and there’s also a much smaller body of documentation and discourse about it available online.
MD -> PDF powered by Pandoc seemed like a really good idea at first, but quickly evolved into the LaTeX workflow with extra steps, since Pandoc renders PDFs through a LaTeX engine by default. Pandoc’s defaults for converting MD to PDF are not great and a lot of customization was needed for my own documents.
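To give a feel for the kind of customization involved, here is a rough sketch that assembles a Pandoc command line with a few common overrides. The specific values (engine, margins, font size, file names) are illustrative assumptions, not the settings I ended up with.

```python
from pathlib import Path


def pandoc_pdf_command(source: Path, output: Path) -> list[str]:
    """Build a pandoc command line for Markdown -> PDF via a LaTeX engine."""
    return [
        "pandoc",
        str(source),
        "--output", str(output),
        "--pdf-engine=xelatex",         # requires a local LaTeX installation
        "-V", "papersize=a4",           # template variable: paper size
        "-V", "geometry:margin=2.5cm",  # passed to the LaTeX geometry package
        "-V", "fontsize=11pt",          # template variable: base font size
    ]


# Placeholder file names, only for demonstration.
cmd = pandoc_pdf_command(Path("letter.md"), Path("letter.pdf"))
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would run the conversion, provided `pandoc` and `xelatex` are on the PATH. Note that every `-V` option here is really a LaTeX setting in disguise, which is exactly the “extra steps” problem.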
HTML -> PDF generation looked promising as well. The thing is that the tooling for it is all over the place. It’s either years-old projects or headless browsers under the hood. For some reason I cannot quite put into words, every single solution I’ve found so far gives me systems design anxiety.
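For reference, the headless-browser route usually boils down to something like the sketch below, which builds a Chromium command line using its `--headless` and `--print-to-pdf` switches. The binary name `chromium` and the file names are assumptions; the executable is called differently depending on the platform and package.

```python
from pathlib import Path


def chromium_pdf_command(html_file: Path, pdf_file: Path,
                         browser: str = "chromium") -> list[str]:
    """Build a command that asks headless Chromium to print a page to PDF."""
    return [
        browser,                          # assumed binary name, varies by system
        "--headless",
        f"--print-to-pdf={pdf_file}",
        html_file.resolve().as_uri(),     # absolute file:// URL of the page
    ]


# Placeholder file names, only for demonstration.
cmd = chromium_pdf_command(Path("letter.html"), Path("letter.pdf"))
print(" ".join(cmd))
```

It is a one-liner in practice, but it drags a whole browser installation along as a dependency.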
Since there usually is a visual inspection step anyway right before printing, for now I’m settling for the browser generating the PDF via the printing dialog.
Here’s an HTML template that prints nicely in Firefox and conforms to the Polish official letter standard, which for some reason puts the sender on the left and the addressee on the right (both the content and the CSS rules in the document were generated by an LLM).
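As a rough illustration of the same idea, and not the template referred to above, the sketch below fills a small print-oriented HTML skeleton from Python. The class names, margins, font settings and the sample names and addresses are all placeholder assumptions; only the sender-left, addressee-right arrangement comes from the description above.

```python
from html import escape
from string import Template

# Minimal print-oriented skeleton; every CSS value here is an assumption.
LETTER = Template("""<!DOCTYPE html>
<html lang="pl">
<head>
<meta charset="utf-8">
<title>$title</title>
<style>
  @page { size: A4; margin: 25mm; }
  body { font-family: serif; font-size: 12pt; line-height: 1.4; }
  .parties { display: flex; justify-content: space-between; margin-bottom: 15mm; }
  p { text-align: justify; }
</style>
</head>
<body>
<div class="parties">
  <div class="sender">$sender</div>
  <div class="addressee">$addressee</div>
</div>
$body
</body>
</html>
""")


def lines_to_html(lines: list[str]) -> str:
    """Escape each address line and join the lines with <br>."""
    return "<br>".join(escape(line) for line in lines)


def render_letter(title: str, sender: list[str], addressee: list[str],
                  paragraphs: list[str]) -> str:
    """Fill the skeleton; all user-supplied text is HTML-escaped."""
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return LETTER.substitute(
        title=escape(title),
        sender=lines_to_html(sender),
        addressee=lines_to_html(addressee),
        body=body,
    )


# Placeholder data, only for demonstration.
html = render_letter(
    title="Pismo",
    sender=["Jan Kowalski", "ul. Przykładowa 1", "00-001 Warszawa"],
    addressee=["Urząd Miasta", "Wydział <Spraw> & Obsługi"],
    paragraphs=["Szanowni Państwo,", "Treść pisma."],
)
print(html)
```

Saving the output to a file and opening it in the browser leaves the final step to the print dialog. The `@page` rule sets the paper size and margins, and the flex row places the sender block on the left and the addressee block on the right.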
I really hope there is a solution out there that doesn’t involve dead projects, bleeding-edge tech or gigabytes of dependencies.