Adaptive Data Enrichment Pre-Processing System (ADEPS) for Duplicate Detection, Outlier Handling, Imputation, and Encoding

R. Nisha; G. Dalin

Authors

R. Nisha, Research Scholar, Hindusthan College of Arts and Science, Coimbatore, Tamilnadu, India.
G. Dalin, Professor, Hindusthan College of Arts and Science, Coimbatore, Tamilnadu, India.

Keywords:

Machine learning, duplicate detection, outlier handling, imputation, and categorical encoding

Abstract

The Adaptive Data Enrichment Pre-processing System (ADEPS) is a modular framework designed to improve data quality for analytical and machine learning tasks. ADEPS integrates four preprocessing functions: duplicate detection, outlier handling, imputation, and categorical encoding. Each component addresses a common data quality issue that can degrade model accuracy and reliability. Duplicate detection uses similarity algorithms to identify redundant entries and preserve dataset integrity. Outlier handling uses clustering and normalization techniques to identify and process anomalies. For missing values, an enhanced imputation method based on MICE (Multiple Imputation by Chained Equations) fills gaps using adaptive modeling with error terms. Categorical encoding techniques, such as Target Encoding, transform high-cardinality categorical data into numeric features that models can consume. Together, these steps deliver an enriched, higher-quality dataset for analysis and predictive modeling. The modular design also allows each step to be adjusted to the data type, resource requirements, and analysis needs, making ADEPS suitable for a wide range of applications.
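
The four stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the algorithms, so every function name, threshold, and sample row below is hypothetical. Simple stand-ins are used throughout: string similarity from `difflib` for duplicate detection, a robust z-score (median and MAD) in place of the clustering-based outlier step, mean imputation in place of the MICE-based method, and smoothed target encoding for the categorical column.

```python
from difflib import SequenceMatcher
from statistics import mean, median


def drop_near_duplicates(rows, key, threshold=0.9):
    """Keep the first row among those whose `key` strings are near-identical."""
    kept = []
    for row in rows:
        text = row[key].strip().lower()
        if not any(
            SequenceMatcher(None, text, k[key].strip().lower()).ratio() >= threshold
            for k in kept
        ):
            kept.append(dict(row))
    return kept


def clip_outliers(rows, col, cutoff=3.5):
    """Clip values whose robust z-score (median/MAD) exceeds `cutoff`."""
    seen = [r[col] for r in rows if r[col] is not None]
    if not seen:
        return [dict(r) for r in rows]
    med = median(seen)
    scale = 1.4826 * median(abs(v - med) for v in seen)
    if scale == 0:  # no spread to measure against; leave values untouched
        return [dict(r) for r in rows]
    lo, hi = med - cutoff * scale, med + cutoff * scale
    return [
        {**r, col: None if r[col] is None else min(max(r[col], lo), hi)}
        for r in rows
    ]


def impute_mean(rows, col):
    """Fill missing values with the column mean (a stand-in, not MICE)."""
    fill = mean(r[col] for r in rows if r[col] is not None)
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]


def target_encode(rows, col, target, m=1.0):
    """Add `<col>_te`: category target mean shrunk toward the global mean."""
    prior = mean(r[target] for r in rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[target])
    enc = {c: (sum(v) + m * prior) / (len(v) + m) for c, v in groups.items()}
    return [{**r, f"{col}_te": enc[r[col]]} for r in rows]


# Hypothetical sample data, invented purely for illustration.
rows = [
    {"name": "Acme Corp", "city": "Chennai", "income": 52.0, "label": 1},
    {"name": "Acme Corp.", "city": "Chennai", "income": 52.0, "label": 1},  # near-duplicate
    {"name": "Bharat Tools", "city": "Coimbatore", "income": None, "label": 0},  # missing
    {"name": "Cauvery Mills", "city": "Coimbatore", "income": 48.0, "label": 1},
    {"name": "Deccan Foods", "city": "Madurai", "income": 50.0, "label": 0},
    {"name": "Erode Textiles", "city": "Chennai", "income": 900.0, "label": 0},  # outlier
]

clean = drop_near_duplicates(rows, key="name")
clean = clip_outliers(clean, col="income")
clean = impute_mean(clean, col="income")
clean = target_encode(clean, col="city", target="label")

for row in clean:
    print(row)
```

The order of the stages is itself a design choice in this sketch: outliers are clipped before imputation so that an extreme value does not distort the fill value. In practice, target encoding should be fitted on training data only, since encoding with the labels of the rows being scored leaks the target.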
