The study used multi-layer perceptrons (MLPs) for their versatility across diverse data types and relationships. MLPs with 50- and 100-node hidden layers were chosen to balance complexity and efficiency, given the feature counts of the datasets: the German Credit Dataset (27 features), the Credit Defaulter Dataset (13 features), and the Bail Dataset (18 features). This aimed to capture patterns effectively in datasets of 1,000 to 30,000 entries while preventing overfitting on the smaller ones. More details about the datasets are available in the repository linked below.
MLPs were configured with various hidden-layer sizes and activation functions ('identity', 'logistic', 'tanh', and 'relu'), spanning simple to complex architectures. Hyperparameters such as learning rate and batch size affect training dynamics and generalization; the study kept them at their default settings to minimize variability.
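The configuration sweep described above can be sketched as follows. This is an assumed reconstruction, not the authors' exact script: it uses scikit-learn's `MLPClassifier` on synthetic stand-in data (27 features, mirroring the German Credit Dataset's size), with other hyperparameters left at their defaults as the study describes.

```python
# Hedged sketch of the MLP configuration sweep; data and loop bounds are
# illustrative assumptions, not the study's actual setup.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a tabular dataset (27 features, like German Credit).
X, y = make_classification(n_samples=1000, n_features=27, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Hidden-layer sizes and activation functions mentioned in the text;
# learning rate, batch size, etc. stay at scikit-learn defaults.
for hidden in [(50,), (100,)]:
    for act in ["identity", "logistic", "tanh", "relu"]:
        clf = MLPClassifier(hidden_layer_sizes=hidden, activation=act,
                            max_iter=300, random_state=0)
        clf.fit(X_tr, y_tr)
        print(hidden, act, round(clf.score(X_te, y_te), 3))
```

Each configuration is trained and scored on the same held-out split so that accuracy differences reflect the architecture and activation choice rather than the data split.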
Dimensionality-reduction techniques, PCA and t-SNE, were applied to manage high-dimensional data. PCA transformed the features into principal components that preserve most of the variance, while t-SNE retained local data structure in the reduced space. PCA was applied iteratively with different numbers of components to explore the impact of dimensionality reduction on model performance; t-SNE, limited to three components for efficiency, was used to assess its effect on MLP performance.
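A minimal sketch of this reduction step, under assumed data and component counts (the study's exact sweep is not specified here): PCA is applied over an increasing number of components, and t-SNE is fixed at three components.

```python
# Hedged sketch of the dimensionality-reduction step; the input data and
# the PCA component counts are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 27))  # stand-in for a 27-feature dataset

# Iteratively apply PCA with different numbers of components.
for k in (2, 5, 10, 20):
    X_pca = PCA(n_components=k).fit_transform(X)
    print("PCA", k, X_pca.shape)

# t-SNE limited to three components for efficiency.
X_tsne = TSNE(n_components=3, perplexity=30, random_state=0).fit_transform(X)
print("t-SNE", X_tsne.shape)
```

The reduced representations (`X_pca`, `X_tsne`) would then be fed to the MLPs in place of the raw features to measure the effect on downstream performance.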
The study showed variations in MLP performance across datasets. For instance, an MLP with a single 100-node hidden layer achieved an AUROC of 76.02% and an F1-score of 80.28% on the German Credit Dataset. Fairness metrics such as statistical parity (SP) and equality of opportunity (EO) varied across configurations, indicating potential predictive biases. Training MLPs on data with altered sensitive attributes slightly improved fairness metrics but reduced predictive performance, underscoring the need for techniques that balance fairness and accuracy.
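For concreteness, the two fairness metrics can be computed from model predictions and a binary sensitive attribute as below. This is a standard formulation, assumed to match the study's usage (SP as the gap in positive-prediction rates between groups, EO as the gap in true-positive rates); the function and variable names are illustrative.

```python
# Hedged sketch of the SP and EO fairness metrics; definitions follow the
# common fairness-literature formulation, assumed to match the study's.
import numpy as np

def statistical_parity(y_pred, s):
    """Absolute gap in positive-prediction rates between the two groups."""
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())

def equal_opportunity(y_true, y_pred, s):
    """Absolute gap in true-positive rates between the two groups."""
    tpr = lambda g: y_pred[(s == g) & (y_true == 1)].mean()
    return abs(tpr(0) - tpr(1))

# Toy example with a binary sensitive attribute s.
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print("SP:", statistical_parity(y_pred, s))          # → SP: 0.25
print("EO:", equal_opportunity(y_true, y_pred, s))   # → EO: 0.5
```

Values closer to zero indicate more equal treatment of the two groups; a larger gap signals the kind of predictive bias the study reports.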
Overall, the research highlighted the importance of careful model configuration and appropriate dimensionality reduction in ensuring fairness and stability in representation learning.
GitHub code can be found here: https://github.com/zkhodzhaev/representation_learning/