Project Overview
The project aims to develop a multiclass classification machine learning model for predicting top category ID, bottom category ID, and primary color ID of products on the Etsy e-commerce platform, utilizing text data from product titles, tags, types, and craft types. The model undergoes preprocessing using techniques like CountVectorizer and TfidfTransformer, followed by training with Support Vector Machine (SVM) algorithm with a linear kernel. The project contributes to enhancing product categorization accuracy and efficiency on Etsy, potentially leading to better user experience and increased customer satisfaction in the e-commerce domain.
The dataset provided for this project comprises Parquet and TFRecords files, totaling 14GB when zipped. It contains 245,000 records in the training dataset and 27,000 records in the test dataset. The dataset includes various product attributes such as titles, descriptions, tags, types, and craft types, with the target attributes having no missing values.