Assessing Variations in Open Datasets for Training Large Language Models: Biases and Benchmarking


Vincent Koc

Abstract

Open datasets are critical to the development and training of large language models (LLMs). However, variations in dataset composition often introduce biases that can impact model performance and reliability. This article investigates the nature and extent of these variations, categorizes the biases inherent in open datasets, and examines their implications for LLM training. We also evaluate the benchmarking standards currently employed to measure LLM performance and propose enhancements for a fairer and more inclusive evaluation framework. Through extensive experiments and analyses, we reveal the consequences of dataset heterogeneity and demonstrate practical strategies for mitigating biases. Our findings emphasize the importance of transparent dataset curation and robust benchmarking practices for the ethical development of LLMs.


How to Cite
Vincent Koc. (2024). Assessing Variations in Open Datasets for Training Large Language Models: Biases and Benchmarking. Pioneer Research Journal of Computing Science, 1(1), 83–92. Retrieved from https://prjcs.com/index.php/prjcs/article/view/20