Drawing on its expertise in web data extraction and indexing, CETIC organized a workshop on search engine technologies on December 15. Here is a summary of this enriching event.

Robert Viseur, a specialist in web strategy and marketing, opened the afternoon with a general talk on the current major players, technologies and trends. This state-of-the-art review showed that, in a market indisputably dominated by three major players (Google, Yahoo, MSN), some (local) outsiders such as Clusty, Blinkx and Ujiko are genuinely competitive and able to challenge the leaders. They often focus their activities on a specific target (books, news, videos, blogs, a geographic area...) or distinguish themselves from their competitors with advanced functionalities, such as semantic search, personalization or flashy user interfaces. In any case, the search engine competition remains open and is still a great business opportunity.

This event was also a great opportunity for CETIC researchers to present some recent and advanced search technologies. Crawler, indexer, searcher... Christophe Noël gave a technical overview of search engine components. He also explained how CETIC has been able to develop customized search solutions by setting up its own hardware and software infrastructure. His talk concluded with a demonstration of two search engines built by CETIC: Illico Presto (a search engine on Walloon innovation, initiated by Agoria) and Eurobot (a search engine dedicated to Belgian companies and news).

Finally, Fabrice Estiévenart described Retrozilla, a Mozilla-based tool for web data interpretation and extraction. In the context of search engines, this multi-purpose integrated suite makes it possible to extract and semantically index web pages, enabling powerful and complex queries.