BRIGHTCODE – Michał Żarnecki Portfolio

Hi, I'm Michał Żarnecki, a programmer, machine learning specialist, and educator. I build systems at the intersection of artificial intelligence, machine learning, and data-driven technologies. With a strong foundation in Python and PHP, I focus on web-based systems for data mining, big data, and natural language processing. On this website you can browse some of my projects and recent activity.

Data Science Summit 2024 at PGE National Stadium in Warsaw

Posted on 21 November 2024 in events

Lecture: Classifying unstructured texts into 1800 categories!
Problem: In this presentation, I will examine the development of a text classifier created by the team at CompanyHouse AG to address the challenge of classifying unstructured texts that describe companies’ activities into the official German industry codes, WZ 2008. Over the years, we have experimented with various techniques to manage classification across a vast number of categories (1,800 in total). I will discuss the strategies we employed to tackle this complexity and demonstrate the evolution of our model from a random forest classifier to an innovative solution based on large language models and retrieval-augmented generation (RAG) techniques.

Methodology: Our approach includes a range of methodologies: multiclass classification, retrieval-augmented generation (RAG), random forest classifiers, similarity algorithms, embedding techniques, and the use of vector databases.

Conclusions: Integrating additional knowledge into models using retrieval-augmented generation combined with similarity algorithms and techniques such as chain-of-thought reasoning can effectively address complex multiclass classification problems. This approach achieves high evaluation scores and outperforms pre-trained classifiers.
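To make the retrieval idea concrete, here is a minimal sketch of how candidate categories could be retrieved by similarity and handed to a language model. It is not the CompanyHouse AG implementation: the category codes and descriptions are made up, the bag-of-words `embed` function is a toy stand-in for a real embedding model and vector database, and `build_prompt` is a hypothetical helper that only assembles the text a model would receive.

```python
import math
import re
from collections import Counter

# Hypothetical categories for illustration only; these are NOT real WZ 2008 codes.
CATEGORIES = {
    "A": "growing of cereals and other crops",
    "B": "retail sale of clothing in specialised stores",
    "C": "computer programming and software development",
}


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: a real system uses a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Return the k category codes whose descriptions are most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)), code) for code, desc in CATEGORIES.items()]
    # Sort by descending similarity, breaking ties by code for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored[:k]]


def build_prompt(query, k=2):
    """Assemble a prompt that restricts the model to the retrieved candidates."""
    candidates = "\n".join(f"- {code}: {CATEGORIES[code]}" for code in retrieve(query, k))
    return (
        "Company description:\n"
        f"{query}\n\n"
        "Candidate categories:\n"
        f"{candidates}\n\n"
        "Think step by step, then answer with exactly one category code."
    )


if __name__ == "__main__":
    print(build_prompt("We develop custom software and offer computer programming services"))
```

The point of the retrieval step is that the model chooses among a handful of plausible candidates rather than all 1,800 categories, and the final instruction in the prompt is where a chain-of-thought style request would go.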
