Sniff out academic papers from the command line β and from your agents. One binary. Eight providers. Markdown out, ready for an LLM context window.
A SKILL.md that teaches every flag, schema, and workflow.
-a claude-code / -a opencode
Every command speaks JSON via --json. The skill ships
a router plus a reference/ directory β your agent loads
only what it needs, when it needs it.
Drops a SKILL.md into your agent's skill directory
(~/.claude/skills/paperhound/ for Claude Code). On first
tool call the skill installs the binary itself. You do nothing else.
The shell is the original agent runtime. Don't rebuild it in JSON. β design principle β 1
Papers live locally β in folders, beside notes, on disk. A binary
slots into that world natively. Pipe paperhound search --json
into jq, fzf, rg, tee,
or your editor. No glue code. No bespoke server. The shell has spent
fifty years optimizing for exactly this.
Each MCP tool invocation costs schema + framing + parsing tokens β
paid on every call, every turn. A shell invocation is one
tool call regardless of payload size, and an agent can chain a dozen
paperhound β¦ | grep β¦ | head operations inside a single
Bash turn. The token budget goes to thinking, not framing.
Your filesystem. Your API keys. Your network. The library lives at
~/.paperhound/library/ β grep it offline, version it,
sync it, back it up. Nothing to keep alive. Nothing to restart when
it falls over.
MCP shines for stateful services that push events back to the agent β long-running sessions, subscriptions, live data. paperhound is stateless lookup and transformation. A process that answers and exits is the correct shape; anything heavier is ceremony.
arXiv, OpenAlex, DBLP, Crossref, HF Papers, Semantic Scholar, CORE. Parallel under a 10s budget, round-robin merged, deduped.
One show command, every identifier kind. Force a provider with -s to dodge poisoned aggregator records.
Figures, LaTeX equations, HTML tables β all opt-in flags. Output good enough to feed straight to an LLM.
add, list, grep over titles, abstracts and bodies. Offline. Idempotent. Yours.
Walk the OpenAlex/Semantic Scholar graph. Dedup by arXiv id, DOI, or title. Cap at limit Γ depth.
paperhound[rerank] reranks by query/abstract similarity. Silent fallback to merge-order if the extra isn't installed.
Straight from show --format. Cite keys derived deterministically; LaTeX specials escaped automatically.
--json on every command: no Rich, no progress bars. Schema = paperhound.models.Paper.
--peer-reviewed, --preprints-only, --min-citations. Pushed down to providers, re-applied client-side.
| Command | What it does |
|---|---|
| search <query> | Unified search across providers. --limit, --source, --year, --min-citations, --venue, --author, --peer-reviewed, --rerank. |
| show <id> | Metadata + abstract. --format markdown|bibtex|ris|csljson. |
| download <id> | Fetch the open-access PDF. |
| convert <pdf> | PDF β Markdown via docling. --with-figures, --equations latex, --tables html. |
| get <id> | Resolve, download, convert β one step. --keep-pdf to retain. |
| refs <id> | Works the paper cites. --depth, --limit, --source. |
| cited-by <id> | Works that cite the paper. |
| add <id> | Add to local library. --convert stores Markdown too. |
| list | List papers in the local library. |
| grep <query> | Full-text search over the library (FTS5). |
| providers | List every backend with availability + setup hint. --json for machines. |
If paperhound helped surface, retrieve, or organize the literature behind your research, a citation is appreciated. The BibTeX entry below tracks the latest released version on PyPI.