Textual Pre-Trained Models for Gender Identification Across Community Question-Answering Members

Date
2023
Language
English
Publisher
IEEE
CC License
Attribution 4.0 International
Abstract
Promoting engagement and participation is vital for online social networks such as community Question-Answering (cQA) sites. One way of increasing the contributions of their members is to connect their content with the right target audience. To achieve this goal, demographic analysis is pivotal for deciphering the interests of each community member. Indeed, demographic factors such as gender are fundamental to reducing gender disparity across distinct topics. This work assesses the classification rate of assorted state-of-the-art transformer-based models (e.g., BERT and FNet) on the task of gender identification across cQA members. For this purpose, it draws on a massive text corpus encompassing 548,375 member profiles, including their respective full questions, answers, and self-descriptions. This made it possible to conduct large-scale experiments covering distinct combinations of encoders and sources. Contrary to our initial intuition, self-descriptions were, on average, detrimental due to their sparseness. The best transformer models (DeBERTa and MobileBERT) achieved an AUC of 0.92 by taking full questions and answers into account. Our qualitative results reveal that fine-tuning on user-generated content is adversely affected by pre-training on clean corpora, and that this adverse effect can be mitigated by correcting the case of words.
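The AUC figure reported above can be read as the probability that a classifier scores a randomly chosen member of the positive class higher than a randomly chosen member of the negative class, with ties counting as one half. The following is a minimal illustrative sketch of that computation, assuming binary labels (1 for the positive class, 0 otherwise) and one real-valued score per member profile. The function name `roc_auc` and the toy data are hypothetical and are not taken from the paper; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUC as the fraction of (positive, negative)
    pairs in which the positive example gets the higher score.
    Ties count as 0.5. This pairwise version is O(P * N), which is
    fine for small illustrative inputs but not for a 548k-profile corpus."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up purely for illustration (not data from the paper).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2]
print(roc_auc(labels, scores))
```

In this toy example, 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9; an AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.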
Notes
Indexing: Web of Science.
Keywords
Transformers, Gender issues, Computer architecture, Bit error rate, Semantics, Question answering (information retrieval), Gender identification, community question-answering sites, engagement and participation in online communities
Citation
IEEE Access, Volume 11, 3983-3995, 2023
DOI
10.1109/ACCESS.2023.3235735