Textual Pre-Trained Models for Gender Identification Across Community Question-Answering Members

dc.contributor.author: Schwarzenberg, P.
dc.contributor.author: Figueroa, A.
dc.date.accessioned: 2023-11-20T14:23:51Z
dc.date.available: 2023-11-20T14:23:51Z
dc.date.issued: 2023
dc.description: Indexed in Web of Science.
dc.description.abstract: Promoting engagement and participation is vital for online social networks such as community Question-Answering (cQA) sites. One way of increasing the contribution of their members is by connecting their content with the right target audience. To achieve this goal, demographic analysis is pivotal in deciphering the interests of each community member. Indeed, demographic factors such as gender are fundamental in reducing the gender disparity across distinct topics. This work assesses the classification rate of assorted state-of-the-art transformer-based models (e.g., BERT and FNet) on the task of gender identification across cQA members. For this purpose, it benefited from a massive text-oriented corpus encompassing 548,375 member profiles, including their respective full questions, answers, and self-descriptions. This assisted in conducting large-scale experiments considering distinct combinations of encoders and sources. Contrary to our initial intuition, on average, self-descriptions were detrimental due to their sparseness. In effect, the best transformer models (i.e., DeBERTa and MobileBERT) achieved an AUC of 0.92 by taking full questions and answers into account. Our qualitative results reveal that fine-tuning on user-generated content is affected by pre-training on clean corpora, and that this adverse effect can be mitigated by correcting the case of words.
dc.description.uri: https://www-webofscience-com.recursosbiblioteca.unab.cl/wos/woscc/full-record/WOS:000917235300001
dc.identifier.citation: IEEE Access, Volume 11, 3983-3995, 2023
dc.identifier.doi: 10.1109/ACCESS.2023.3235735
dc.identifier.issn: 2169-3536
dc.identifier.uri: https://repositorio.unab.cl/xmlui/handle/ria/54000
dc.language.iso: en
dc.publisher: IEEE
dc.rights.license: Attribution 4.0 International
dc.subject: Transformers
dc.subject: Gender issues
dc.subject: Computer architecture
dc.subject: Bit error rate
dc.subject: Semantics
dc.subject: Question answering (information retrieval)
dc.subject: Gender identification
dc.subject: Community question-answering sites
dc.subject: Engagement and participation in online communities
dc.title: Textual Pre-Trained Models for Gender Identification Across Community Question-Answering Members
dc.type: Article
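The abstract notes that fine-tuning pre-trained transformers on noisy user-generated content is hampered by their pre-training on clean corpora, and that correcting the case of words mitigates this. The record does not specify how the authors normalize case, but a minimal illustrative sketch of one naive truecasing preprocessing step (lowercasing shouty ALL-CAPS tokens, then recapitalizing sentence starts) could look like this:

```python
import re

def truecase(text):
    """Naive case correction for noisy user-generated text.

    This is a hypothetical preprocessing sketch, not the paper's method:
    it lowercases fully upper-case tokens, then capitalizes the first
    letter after sentence-ending punctuation and at the start of the text.
    """
    tokens = []
    for tok in text.split():
        # Lowercase "shouty" tokens (more than one character, all caps).
        if tok.isupper() and len(tok) > 1:
            tok = tok.lower()
        tokens.append(tok)
    out = " ".join(tokens)
    # Recapitalize the start of the text and of each sentence.
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), out)

print(truecase("WHY is my PHONE so SLOW? it keeps FREEZING."))
# → Why is my phone so slow? It keeps freezing.
```

Text normalized this way is closer in surface form to the clean corpora the encoders were pre-trained on, which is the mitigation the abstract describes; a production pipeline would more likely use a statistical or learned truecaser.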
Files
Original bundle
Name: O SCHWARZENBERG_ textual pre-trained.pdf
Size: 2.11 MB
Format: Adobe Portable Document Format
Description: Full text in English

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission