AI Web ScraperURL + a sentence
becomes JSON.
Extract structured data from any website using artificial intelligence. Just provide the URL, describe what you want to extract, and choose your AI model.
With actual SSRF defense, DNS-rebinding guards and prompt-injection isolation.
Input
https://news.ycombinator.com
"top 5 story titles"
Output
[
{ "title": "Show HN: ..." },
{ "title": "Ask HN: ..." },
...
]What separates this from a 30-line script
SSRF v2
IDNA + IPv6 unwrap + IP allowlist computed before the request, re-checked on every subresource.
DNS-rebinding guard
Playwright route hooks intercept each subresource and abort if it leaves the allow-set.
Prompt injection cage
Untrusted scraped content is wrapped in a system-prompt isolation layer before reaching the LLM.
Per-key cache scoping
Cache key includes sha256 of API key — no cross-user leakage between BYOK sessions.
Headless Rendering
Playwright launches a stealth Chromium browser that renders JavaScript-heavy pages, bypasses bot detection, and captures the fully-loaded DOM.
Smart Cleanup
Strips navigation, ads, scripts and boilerplate. Converts raw HTML to clean Markdown, reducing token usage by up to 67%.
LLM Extraction
Sends the cleaned content to your chosen AI model with your prompt. The model extracts exactly the structured data you described.
Structured Output
Returns the extracted data as clean JSON or Markdown, ready for your pipeline. Handles pagination, tables, and nested structures.