How It Works
The OBFCM Data Quality Checker runs a 7-step validation process on your data. Each step targets a specific class of quality issue, from basic completeness to pattern detection that flags values which may have been fabricated. The tool then generates reports with visualizations and statistical summaries so you can locate and address the issues it finds.
The 7-Step Validation Process
- Data Loading & Schema Adaptation: Automatically handles compressed files (ZIP, 7Z), maps column names to standard format, and adapts to schema changes with clear notifications.
- Basic Quality Checks: Identifies missing values (NA), zero values, constant columns, and low cardinality columns that may indicate data collection issues.
- Domain Range Validation: Validates values against physical and domain constraints (e.g., EDS between 0 and 100%, electric range ≤ 140 km, TA_CO₂ ≤ 101 g/km for PHEVs).
- Pattern Detection: Detects suspicious patterns including round numbers, repetitive sequences, Benford's Law violations, and manufacturer-specific anomalies that may indicate data fabrication.
- Paper A Validation Steps: Applies all 8 validation steps from the Paper A methodology: CS Invalid, Missing RW_EC, Missing OEM/Model, RW_EC Zero Reporting, VFN Issues, Physics CO₂/FC, Mileage/FC Inconsistency, and EDS/Energy Violations.
- Statistical Analysis: Performs overrepresentation analysis, compares flagged vs clean vehicles, and generates summary statistics to identify systematic issues.
- Report Generation: Creates comprehensive Markdown and HTML reports with embedded visualizations, tables, and detailed findings for documentation and sharing.
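To make the Domain Range Validation step more concrete, here is a minimal sketch in base R using the three example limits quoted above. The column names (`eds_pct`, `electric_range_km`, `ta_co2_gkm`), the sample values, and the lower bound of 0 for range and CO₂ are assumptions for illustration; the package's actual schema and implementation may differ.

```r
# Illustrative data; column names are hypothetical, not the package's schema.
vehicles <- data.frame(
  eds_pct           = c(45, 102, 80, NA),  # electric driving share, %
  electric_range_km = c(60, 75, 150, 90),  # electric range, km
  ta_co2_gkm        = c(35, 28, 120, 50)   # type-approval CO2, g/km
)

# Upper bounds follow the examples in the text; lower bounds of 0 are assumed.
limits <- list(
  eds_pct           = c(0, 100),
  electric_range_km = c(0, 140),
  ta_co2_gkm        = c(0, 101)
)

# TRUE where a value lies outside its range. Missing values stay NA here,
# so they are left to the completeness checks rather than counted as violations.
out_of_range <- sapply(names(limits), function(col) {
  x <- vehicles[[col]]
  x < limits[[col]][1] | x > limits[[col]][2]
})

# Number of range violations per column
violations <- colSums(out_of_range, na.rm = TRUE)
print(violations)
```

Keeping the limits in one named list means a new constraint is a one-line change rather than a new block of logic.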
Get Started
Choose how you want to use the tool. You can install it as an R package for easy integration into your workflow, or download the standalone script for one-time use or custom modifications.
📦 Option 1: R Package (Recommended)
Install as an R package for easy updates, version control, and full documentation.
# Install from GitHub (requires devtools; run install.packages("devtools") first if needed)
devtools::install_github("philipposk/obfcmQualityChecker")
# Load the package
library(obfcmQualityChecker)
# Run the tool
main()
📄 Option 2: Standalone Script
Download the standalone R script for direct execution without package installation.
# Download the script with the "Download Script" button above,
# then run it directly:
Rscript OBFCM_Data_Quality_Checker_STANDALONE.R
Key Features
🔍 Comprehensive Quality Checks
NA/zero/constant detection, low cardinality identification, and domain range validation
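As a rough illustration of these checks, the sketch below summarises NA counts, zero counts, constant columns, and low-cardinality columns for each column of a data frame in base R. The function name `summarise_quality`, the `low_card_max` threshold, and the sample data are hypothetical and not taken from the package.

```r
# Per-column summary of basic quality indicators (hypothetical helper).
summarise_quality <- function(df, low_card_max = 3L) {
  # Number of distinct non-missing values in each column
  n_distinct <- vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1))
  data.frame(
    column          = names(df),
    n_na            = vapply(df, function(x) sum(is.na(x)), integer(1)),
    n_zero          = vapply(df, function(x) sum(x == 0, na.rm = TRUE), integer(1)),
    constant        = n_distinct <= 1L,
    low_cardinality = n_distinct > 1L & n_distinct <= low_card_max,
    row.names       = NULL
  )
}

sample_data <- data.frame(
  fuel_consumption = c(5.1, 0, NA, 6.3, 4.8),
  country          = c("DE", "DE", "DE", "DE", "DE"),
  fuel_type        = c("petrol", "diesel", "petrol", NA, "diesel"),
  stringsAsFactors = FALSE
)

quality <- summarise_quality(sample_data)
print(quality)
```

In this example `country` is reported as constant and `fuel_type` as low cardinality. Whether low cardinality is a problem depends on the column: it is expected for a categorical field but suspicious for a measured quantity.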
🎯 Pattern Detection
Round numbers, repetitive sequences, Benford's Law, and manufacturer-specific patterns
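One common way to run a Benford's Law check is a chi-squared goodness-of-fit test on leading digits, sketched below in base R. This is an assumption about the general technique, not a description of how the package implements it; the helper names `first_digit` and `benford_test` and both test vectors are hypothetical.

```r
# Leading non-zero digit of each value, read from scientific notation.
first_digit <- function(x) {
  x <- abs(x[is.finite(x) & x != 0])
  as.integer(substr(formatC(x, format = "e", digits = 6), 1, 1))
}

# Chi-squared goodness-of-fit test against the Benford distribution,
# where P(d) = log10(1 + 1/d) for d = 1..9.
benford_test <- function(x) {
  observed <- table(factor(first_digit(x), levels = 1:9))
  expected <- log10(1 + 1 / (1:9))
  chisq.test(observed, p = expected, rescale.p = TRUE)
}

# Log-uniform values over one decade follow Benford's Law closely.
x_benford <- 10^(seq(0, 1, length.out = 5001)[-5001])

# Evenly spread three-digit values give a flat first-digit distribution.
x_uniform <- seq(100, 999)

p_benford <- benford_test(x_benford)$p.value
p_uniform <- benford_test(x_uniform)$p.value
print(c(benford = p_benford, uniform = p_uniform))
```

A small p-value only says the leading digits depart from Benford's distribution. Quantities confined to a narrow range, as many vehicle measurements are, can fail the test for legitimate reasons, so a flag is a prompt for review rather than evidence of fabrication.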
✅ Paper A Validation
All 8 validation steps from the Paper A methodology for systematic data quality assessment
📊 Rich Visualizations
Quality check summaries, validation step charts, pattern detection plots, and comparison visualizations
📄 Multiple Report Formats
Generate Markdown and HTML reports with embedded results, tables, and figures
🚀 High Performance
Efficiently handles large datasets (10M+ rows) using data.table and supports compressed file formats
🔧 Flexible Configuration
Choose columns to check (Basic Package, All, Specific, Custom) and select pattern detection methods
🔄 Schema Adaptation
Automatically adapts to schema changes and provides clear notifications about column mapping
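The idea behind column mapping can be sketched as an alias table that renames known variants to a standard name and reports what it did. The function `standardise_columns` and the alias lists below are hypothetical; only the standard names `OEM`, `VFN`, and `RW_EC` appear elsewhere in this description, and the package's real mapping rules may differ.

```r
# Rename known column-name variants to a standard name, with notifications.
standardise_columns <- function(df, aliases) {
  for (standard in names(aliases)) {
    hit <- intersect(aliases[[standard]], names(df))
    # Skip if the standard name is already present or no alias matches
    if (standard %in% names(df) || length(hit) == 0) next
    message("Mapping column '", hit[1], "' to '", standard, "'")
    names(df)[names(df) == hit[1]] <- standard
  }
  missing <- setdiff(names(aliases), names(df))
  if (length(missing) > 0) {
    message("No match found for: ", paste(missing, collapse = ", "))
  }
  df
}

# Hypothetical aliases: standard name -> variants seen in input files
aliases <- list(
  OEM   = c("Manufacturer", "Mh"),
  VFN   = c("vfn", "Vfn"),
  RW_EC = c("rw_ec", "real_world_ec")
)

raw <- data.frame(Manufacturer = "X", vfn = "IP-123", stringsAsFactors = FALSE)
clean <- standardise_columns(raw, aliases)
print(names(clean))
```

Here the two matching columns are renamed and a message reports that no column could be mapped to `RW_EC`, so a schema change shows up as a notification rather than a silent failure.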