Established MMXXVI · Open Source · MIT
The Paperhound Press
A field manual for paper-hunters
arXiv · OpenAlex · DBLP · Crossref · S2

paperhound

Sniff out academic papers from the command line, and from your agents. One binary. Eight providers. Markdown out, ready for an LLM context window.

01

Install the binary.

~30 seconds · Python 3.10+
# recommended: isolated CLI on $PATH
$ uv tool install paperhound
installed paperhound 0.5.4

# or via pip
$ pip install paperhound

# with embedding rerank
$ pip install 'paperhound[rerank]'

# first sniff
$ paperhound search "diffusion transformers" --limit 5
02

Teach the hound to your agent.

one line · any agent
skills.sh

Skill · agent-ready

A SKILL.md that teaches every flag, schema, and workflow.

$ npx skills add alexfdez1010/paperhound
  • auto-installs the CLI on first use
  • progressive-disclosure docs (router + reference/)
  • JSON schema for every command
  • pass -a claude-code / -a opencode

Agent-native, not a CLI wrapper.

Every command speaks JSON via --json. The skill ships a router plus a reference/ directory β€” your agent loads only what it needs, when it needs it.

Drops a SKILL.md into your agent's skill directory (~/.claude/skills/paperhound/ for Claude Code). On first tool call the skill installs the binary itself. You do nothing else.

Supported targets: claude-code · opencode · cursor · windsurf · any skills.sh-compatible host
03

Why a CLI, not an MCP.

design notes
"The shell is the original agent runtime. Don't rebuild it in JSON." (design principle № 1)
i. Composition

Unix is the integration layer.

Papers live locally: in folders, beside notes, on disk. A binary slots into that world natively. Pipe paperhound search --json into jq, fzf, rg, tee, or your editor. No glue code. No bespoke server. The shell has spent fifty years optimizing for exactly this.

ii. Token economics

Every MCP call pays a tax.

Each MCP tool invocation costs schema + framing + parsing tokens, paid on every call, every turn. A shell invocation is one tool call regardless of payload size, and an agent can chain a dozen paperhound … | grep … | head operations inside a single Bash turn. The token budget goes to thinking, not framing.

iii. Local-first by default

No daemon. No port. No service.

Your filesystem. Your API keys. Your network. The library lives at ~/.paperhound/library/: grep it offline, version it, sync it, back it up. Nothing to keep alive. Nothing to restart when it falls over.

iv. Right tool, right shape

Stateless lookup wants a binary.

MCP shines for stateful services that push events back to the agent: long-running sessions, subscriptions, live data. paperhound is stateless lookup and transformation. A process that answers and exits is the correct shape; anything heavier is ceremony.

04

A field kit for researchers.

eight providers · one query
¶
i. Unified search

Eight backends, one query.

arXiv, OpenAlex, DBLP, Crossref, HF Papers, Semantic Scholar, CORE. Queried in parallel under a 10 s budget, then round-robin merged and deduplicated.
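How such a fan-out might look, as a minimal Python sketch: each provider is a plain function, anything that misses the 10-second budget or raises is dropped, and the survivors are interleaved one result at a time. The provider callables, the dict-shaped records, and the DOI-then-arXiv-then-title dedup key are assumptions for illustration, not paperhound's internals.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from itertools import zip_longest
from typing import Callable

Record = dict  # hypothetical shape: {"title": ..., "doi": ..., "arxiv_id": ...}
Provider = Callable[[str], list[Record]]


def dedup_key(rec: Record) -> str:
    # Assumed precedence: DOI, then arXiv id, then a whitespace-normalised title.
    if rec.get("doi"):
        return "doi:" + rec["doi"].lower()
    if rec.get("arxiv_id"):
        return "arxiv:" + rec["arxiv_id"].lower()
    return "title:" + " ".join(rec.get("title", "").lower().split())


def fan_out(query: str, providers: dict[str, Provider], budget_s: float = 10.0) -> list[list[Record]]:
    """Query every provider concurrently; keep only those that finish in time without error."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(providers)))
    futures = [pool.submit(fn, query) for fn in providers.values()]
    done, _ = wait(futures, timeout=budget_s)
    # Stop waiting; late threads are abandoned (Python cannot kill them), not joined here.
    pool.shutdown(wait=False, cancel_futures=True)
    return [f.result() for f in futures if f in done and f.exception() is None]


def round_robin_merge(per_provider: list[list[Record]], limit: int) -> list[Record]:
    """Take result 1 from each provider, then result 2, and so on, skipping duplicates."""
    seen: set[str] = set()
    merged: list[Record] = []
    for tier in zip_longest(*per_provider):
        for rec in tier:
            if rec is None:
                continue  # this provider ran out of results
            key = dedup_key(rec)
            if key in seen:
                continue
            seen.add(key)
            merged.append(rec)
            if len(merged) == limit:
                return merged
    return merged
```

Interleaving instead of concatenating keeps one fast, prolific provider from crowding out the rest at the top of the list.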

§
ii. Inspect first

Abstract + metadata.

One show command, every identifier kind. Force a provider with -s to dodge poisoned aggregator records.

⬇
iii. PDF → Markdown

Docling-powered conversion.

Figures, LaTeX equations, HTML tables: all opt-in flags. Output good enough to feed straight to an LLM.

📚
iv. Local library

SQLite FTS5 at your fingertips.

add, list, grep over titles, abstracts and bodies. Offline. Idempotent. Yours.
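A library with these three verbs fits in a few lines of Python's built-in sqlite3, provided the linked SQLite was compiled with FTS5 (most builds are). The sketch below is an illustration, not paperhound's schema: the table names, the columns, and guarding re-adds with a primary-key table are all assumptions.

```python
import sqlite3


def open_library(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS papers (id TEXT PRIMARY KEY, title TEXT NOT NULL);
        -- FTS5 tables cannot enforce uniqueness, so the plain table above guards re-adds.
        CREATE VIRTUAL TABLE IF NOT EXISTS papers_fts
            USING fts5(id UNINDEXED, title, abstract, body);
        """
    )
    return db


def add(db: sqlite3.Connection, paper_id: str, title: str, abstract: str = "", body: str = "") -> None:
    """Idempotent: a second add of the same id is a no-op."""
    with db:  # single transaction, committed on success
        cur = db.execute("INSERT OR IGNORE INTO papers (id, title) VALUES (?, ?)", (paper_id, title))
        if cur.rowcount == 1:  # row was new, so index it exactly once
            db.execute(
                "INSERT INTO papers_fts (id, title, abstract, body) VALUES (?, ?, ?, ?)",
                (paper_id, title, abstract, body),
            )


def list_papers(db: sqlite3.Connection) -> list[str]:
    return [row[0] for row in db.execute("SELECT id FROM papers ORDER BY id")]


def grep(db: sqlite3.Connection, query: str) -> list[str]:
    # MATCH searches title, abstract and body; rank is FTS5's bm25 score (best first).
    sql = "SELECT id FROM papers_fts WHERE papers_fts MATCH ? ORDER BY rank"
    return [row[0] for row in db.execute(sql, (query,))]
```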

🔗
v. Citation graph

refs + cited-by, depth-controlled.

Walk the OpenAlex/Semantic Scholar graph. Dedup by arXiv id, DOI, or title. Cap at limit × depth.

🧠
vi. Embedding rerank

Smarter top-N, opt-in.

paperhound[rerank] reranks by query/abstract similarity. Silent fallback to merge-order if the extra isn't installed.
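The fallback is the interesting part: reranking is attempted only when an embedder can be built, and otherwise the merge order passes through untouched. This sketch assumes the extra pulls in sentence-transformers with an all-MiniLM-L6-v2 model and scores by cosine similarity; paperhound's actual model and scoring may differ.

```python
import math
from typing import Callable, Optional, Sequence

Embedder = Callable[[str], Sequence[float]]


def load_embedder() -> Optional[Embedder]:
    """Return an embedder, or None when the optional extra is missing."""
    try:
        from sentence_transformers import SentenceTransformer  # assumed dependency of the extra
    except ImportError:
        return None
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    return lambda text: model.encode(text).tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query: str, papers: list[dict], embed: Optional[Embedder]) -> list[dict]:
    if embed is None:
        return list(papers)  # silent fallback: keep merge order
    q = embed(query)
    scored = [
        (cosine(q, embed(p.get("abstract") or p.get("title", ""))), i, p)
        for i, p in enumerate(papers)
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best first; ties keep merge order
    return [p for _, _, p in scored]
```

Typical call site: rerank(query, results, load_embedder()). Without the extra, the call is a no-op copy.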

📤
vii. BibTeX in one shot

BibTeX · RIS · CSL-JSON.

Straight from show --format. Cite keys derived deterministically; LaTeX specials escaped automatically.
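Deterministic keys and escaping are both small, pure functions. The key scheme here (first author's surname, year, first non-stopword of the title, folded to ASCII) is hypothetical and chosen only for illustration; paperhound's own scheme may differ. Escaping maps each character once, so a backslash introduced by one replacement is never escaped again.

```python
import re
import unicodedata

# The ten characters LaTeX treats specially, mapped to safe text-mode forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}
STOPWORDS = {"a", "an", "the", "on", "of", "for", "in"}  # hypothetical list


def escape_latex(text: str) -> str:
    # Character by character, so replacements are never re-escaped.
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def ascii_fold(text: str) -> str:
    # "Fernández" -> "Fernandez": decompose accents, then drop the combining marks.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def cite_key(authors: list[str], year: int, title: str) -> str:
    """Hypothetical scheme: surname + year + first significant title word."""
    surname = authors[0].split(",")[0] if authors else "anon"  # authors given as "Last, First"
    surname = re.sub(r"[^a-z]", "", ascii_fold(surname).lower())
    words = [w for w in re.findall(r"[a-z0-9]+", ascii_fold(title).lower()) if w not in STOPWORDS]
    return f"{surname}{year}{words[0] if words else 'untitled'}"


def to_bibtex(key: str, fields: dict[str, str], kind: str = "article") -> str:
    body = ",\n".join(f"  {name} = {{{escape_latex(value)}}}" for name, value in fields.items())
    return f"@{kind}{{{key},\n{body}\n}}"
```

The same inputs always yield the same key, so re-exporting a library does not churn citations in an existing .tex file.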

🤖
viii. JSON everywhere

Pipe-friendly mode.

--json on every command: no Rich, no progress bars. Schema = paperhound.models.Paper.

⚙
ix. Filter the firehose

Year · venue · author · type.

--peer-reviewed, --preprints-only, --min-citations. Pushed down to providers, re-applied client-side.
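Why filter twice? Providers support these constraints unevenly, so a pushed-down filter narrows what comes over the wire, while the client-side pass guarantees the final list obeys every flag no matter which backend answered. A minimal sketch, assuming dict records with hypothetical year, citations, and is_preprint fields; dropping records whose value is unknown is also a design assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Filters:
    year_from: Optional[int] = None
    year_to: Optional[int] = None
    min_citations: Optional[int] = None
    peer_reviewed: bool = False
    preprints_only: bool = False


def keep(rec: dict, f: Filters) -> bool:
    year, cites = rec.get("year"), rec.get("citations")
    # Unknown values fail an active numeric filter rather than slipping through.
    if f.year_from is not None and (year is None or year < f.year_from):
        return False
    if f.year_to is not None and (year is None or year > f.year_to):
        return False
    if f.min_citations is not None and (cites is None or cites < f.min_citations):
        return False
    if f.peer_reviewed and rec.get("is_preprint") is not False:
        return False  # unknown status is treated as not peer-reviewed
    if f.preprints_only and rec.get("is_preprint") is not True:
        return False
    return True


def apply_filters(records: list[dict], f: Filters) -> list[dict]:
    """Client-side pass, run after whatever the provider already filtered."""
    return [r for r in records if keep(r, f)]
```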

05

The lexicon.

paperhound <cmd> --help
Command         What it does
search <query>  Unified search across providers. --limit, --source, --year, --min-citations, --venue, --author, --peer-reviewed, --rerank.
show <id>       Metadata + abstract. --format markdown|bibtex|ris|csljson.
download <id>   Fetch the open-access PDF.
convert <pdf>   PDF → Markdown via docling. --with-figures, --equations latex, --tables html.
get <id>        Resolve, download, and convert in one step. --keep-pdf to retain.
refs <id>       Works the paper cites. --depth, --limit, --source.
cited-by <id>   Works that cite the paper.
add <id>        Add to local library. --convert stores Markdown too.
list            List papers in the local library.
grep <query>    Full-text search over the library (FTS5).
providers       List every backend with availability + setup hint. --json for machines.
06

Cite paperhound.

if used in academic work

If paperhound helped surface, retrieve, or organize the literature behind your research, a citation is appreciated. The BibTeX entry below tracks the latest released version on PyPI.

Author · Alejandro Fernández Camello
Licence · MIT
Repository · github.com/alexfdez1010/paperhound

paperhound.bib
@software{paperhound,
  author  = {Fern{\'a}ndez Camello, Alejandro},
  title   = {paperhound: a fast, agent-ready CLI for academic paper retrieval},
  year    = {2026},
  url     = {https://github.com/alexfdez1010/paperhound},
  version = {0.5.4},
  license = {MIT}
}