Transparency spider


I designed and developed a machine learning system that browses the web and recognizes URLs pointing to institutional data sources. When a page matches one of the known page types, the system selects the most suitable crawler to scrape it. This solution saved the company a significant amount of manual work.
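The classify-then-dispatch flow described above can be sketched roughly as follows. This is a minimal illustration and not the original implementation: the page types, the crawler functions, and the keyword-based `classify_page` (a stand-in for the trained model) are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Optional

# Hypothetical crawlers, one per known page type (illustrative only).
def crawl_tender_list(url: str) -> str:
    return f"tender-list crawler -> {url}"

def crawl_staff_directory(url: str) -> str:
    return f"staff-directory crawler -> {url}"

# Registry mapping each known page type to its crawler.
CRAWLERS: Dict[str, Callable[[str], str]] = {
    "tender_list": crawl_tender_list,
    "staff_directory": crawl_staff_directory,
}

# Assumed keyword lists; a real system would use a trained classifier here.
KEYWORDS = {
    "tender_list": ("tender", "procurement"),
    "staff_directory": ("staff", "personnel"),
}

def classify_page(url: str, text: str) -> Optional[str]:
    """Stand-in for the ML model: score each type by keyword hits."""
    combined = f"{url} {text}".lower()
    scores = {
        label: sum(combined.count(word) for word in words)
        for label, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Return None when no known page type matches at all.
    return best if scores[best] > 0 else None

def dispatch(url: str, text: str) -> Optional[str]:
    """Pick the crawler for the predicted page type, or skip the page."""
    page_type = classify_page(url, text)
    if page_type is None:
        return None
    return CRAWLERS[page_type](url)
```

Keeping the crawlers in a registry keyed by page type means that supporting a new kind of page only requires a new entry, while pages of unknown type are skipped rather than scraped with the wrong crawler.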

Stefano Fiorucci
NLP Engineer, Craftsman and Explorer 🧭 | Contributing to Haystack, the NLP/LLM Framework 🏗️