🔍 OBFCM Data Quality Checker

Professional-grade tool for validating OBFCM datasets

Detect anomalies, identify data quality issues, and uncover potential fabrication patterns

How It Works

The OBFCM Data Quality Checker runs a 7-step validation process on OBFCM datasets. Each step targets a specific class of quality issue, from basic data completeness to pattern detection that can point to potential data fabrication. The tool then generates detailed reports with visualizations and statistical summaries so you can locate and address the issues it finds.

The 7-Step Validation Process

  1. Data Loading & Schema Adaptation: Automatically handles compressed files (ZIP, 7Z), maps column names to standard format, and adapts to schema changes with clear notifications.
  2. Basic Quality Checks: Identifies missing values (NA), zero values, constant columns, and low cardinality columns that may indicate data collection issues.
  3. Domain Range Validation: Validates values against physical and domain constraints (e.g., EDS 0-100%, Electric range ≤140 km, TA_CO₂ ≤101 g/km for PHEVs).
  4. Pattern Detection: Detects suspicious patterns including round numbers, repetitive sequences, Benford's Law violations, and manufacturer-specific anomalies that may indicate data fabrication.
  5. Paper A Validation Steps: Applies all 8 validation steps from the Paper A methodology: CS Invalid, Missing RW_EC, Missing OEM/Model, RW_EC Zero Reporting, VFN Issues, Physics CO₂/FC, Mileage/FC Inconsistency, and EDS/Energy Violations.
  6. Statistical Analysis: Performs overrepresentation analysis, compares flagged vs clean vehicles, and generates summary statistics to identify systematic issues.
  7. Report Generation: Creates comprehensive Markdown and HTML reports with embedded visualizations, tables, and detailed findings for documentation and sharing.
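
As an illustration of step 3, here is a minimal base R sketch of a domain range check using the three limits quoted above. The column names (`eds_pct`, `electric_range_km`, `ta_co2_gkm`) and the toy values are assumptions made for this example, not the package's actual schema or implementation.

```r
# Toy PHEV records; column names are hypothetical, chosen for this sketch only.
vehicles <- data.frame(
  eds_pct           = c(35, 120, 80, -5),   # EDS, percent
  electric_range_km = c(60, 90, 150, 45),   # electric range, km
  ta_co2_gkm        = c(30, 45, 101, 130)   # type-approval CO2, g/km
)

# One logical column per rule: TRUE where the row satisfies the constraint.
in_range <- cbind(
  eds_0_100    = vehicles$eds_pct >= 0 & vehicles$eds_pct <= 100,
  range_le_140 = vehicles$electric_range_km <= 140,
  co2_le_101   = vehicles$ta_co2_gkm <= 101
)

# Number of violations per rule, and a per-row flag for any violation.
violations_per_rule <- colSums(!in_range)
vehicles$flagged <- rowSums(!in_range) > 0

print(violations_per_rule)
print(vehicles)
```

Missing values are not handled here: a comparison against `NA` yields `NA`, so in practice such rows would be caught by the basic quality checks first or excluded explicitly.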

Get Started

Choose how you want to use the tool. You can install it as an R package for easy integration into your workflow, or download the standalone script for one-time use or custom modifications.

📦 Option 1: R Package (Recommended)

Install as an R package for easy updates, version control, and full documentation.

# Install from GitHub (requires the devtools package;
# run install.packages("devtools") first if it is not installed)
devtools::install_github("philipposk/obfcmQualityChecker")

# Load the package
library(obfcmQualityChecker)

# Run the tool
main()

📄 Option 2: Standalone Script

Download the standalone R script for direct execution without package installation.

# Download the standalone script, then run it from a terminal:
Rscript OBFCM_Data_Quality_Checker_STANDALONE.R

Key Features

🔍 Comprehensive Quality Checks

NA/zero/constant detection, low cardinality identification, and domain range validation
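
A rough sketch of what these per-column checks might look like in base R follows. The helper `basic_quality()`, its `low_card_max` threshold, and the toy columns are hypothetical and serve only to show the idea; the package may compute these summaries differently.

```r
# Toy data; column names and values are illustrative only.
df <- data.frame(
  oem     = c("A", "A", "B", NA, "B", "A"),
  fuel    = rep("petrol", 6),                 # constant column
  rw_ec   = c(0, 12.5, 0, 14.1, NA, 0),       # contains zeros and one NA
  mileage = c(1200, 5300, 810, 9900, 4100, 7600)
)

# Hypothetical helper: one summary row per column.
# low_card_max is an assumed threshold for "low cardinality".
basic_quality <- function(data, low_card_max = 2) {
  n_unique <- vapply(data, function(x) length(unique(x[!is.na(x)])), integer(1))
  data.frame(
    column     = names(data),
    na_share   = vapply(data, function(x) mean(is.na(x)), numeric(1)),
    # Share of zeros among non-missing values; only defined for numeric columns.
    zero_share = vapply(
      data,
      function(x) if (is.numeric(x)) mean(x == 0, na.rm = TRUE) else NA_real_,
      numeric(1)
    ),
    n_unique        = n_unique,
    is_constant     = n_unique <= 1,
    low_cardinality = n_unique > 1 & n_unique <= low_card_max,
    row.names = NULL
  )
}

report <- basic_quality(df)
print(report)
```

Keeping the result as one row per column makes it easy to sort by `na_share` or filter on `is_constant` before deciding which columns deserve a closer look.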

🎯 Pattern Detection

Round numbers, repetitive sequences, Benford's Law, and manufacturer-specific patterns
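
To make two of these checks concrete, the sketch below computes a round-number share and a Benford first-digit test in base R. The functions `first_digit()`, `benford_test()`, and `round_share()` are hypothetical names introduced for this example, and the chi-squared goodness-of-fit test is one common choice rather than necessarily the statistic the tool uses.

```r
# Leading significant digit of each non-zero, finite value.
# Scientific notation always starts with the leading digit.
first_digit <- function(x) {
  x <- abs(x[is.finite(x) & x != 0])
  as.integer(substr(formatC(x, format = "e", digits = 15), 1, 1))
}

# Chi-squared goodness-of-fit test of observed first digits
# against the Benford probabilities log10(1 + 1/d), d = 1..9.
benford_test <- function(x) {
  observed <- tabulate(first_digit(x), nbins = 9)
  expected <- log10(1 + 1 / (1:9))
  chisq.test(observed, p = expected)
}

# Share of values that are exact multiples of `base` (default 10).
round_share <- function(x, base = 10) {
  mean(x %% base == 0, na.rm = TRUE)
}

# Powers of two are a classic sequence that follows Benford's Law closely.
benford_like <- 2^(1:200)
# A uniform block of values from 500 to 999 has no leading digits 1 to 4.
suspicious <- seq(500, 999)

print(benford_test(benford_like)$p.value)
print(benford_test(suspicious)$p.value)
print(round_share(c(10, 20, 33, 40)))
```

On datasets with millions of rows, a chi-squared test rejects even tiny deviations, so an effect-size measure or a comparison across manufacturers is usually more informative than the p-value alone.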

✅ Paper A Validation

All 8 validation steps from the Paper A methodology for systematic data quality assessment

📊 Rich Visualizations

Quality check summaries, validation step charts, pattern detection plots, and comparison visualizations

📄 Multiple Report Formats

Generate Markdown and HTML reports with embedded results, tables, and figures

🚀 High Performance

Efficiently handles large datasets (10M+ rows) using data.table and supports compressed file formats

🔧 Flexible Configuration

Choose columns to check (Basic Package, All, Specific, Custom) and select pattern detection methods

🔄 Schema Adaptation

Automatically adapts to schema changes and provides clear notifications about column mapping
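
One possible way to implement this kind of column mapping is an alias table that is matched against normalised column names. The sketch below assumes this approach; the alias table, the function `standardise_columns()`, and the example column names are all hypothetical and do not reflect the package's actual mapping rules.

```r
# Hypothetical alias table: standard name -> known variants (already normalised).
column_aliases <- list(
  oem     = c("oem", "manufacturer"),
  rw_ec   = c("rw_ec", "real_world_ec"),
  mileage = c("mileage", "odo_km")
)

standardise_columns <- function(data, aliases) {
  # Normalise names: lower case, runs of non-alphanumerics become "_".
  normalised <- gsub("[^a-z0-9]+", "_", tolower(names(data)))
  new_names <- names(data)
  for (std in names(aliases)) {
    hit <- which(normalised %in% aliases[[std]])
    if (length(hit) == 1) {
      if (new_names[hit] != std) {
        message(sprintf("Mapping column '%s' -> '%s'", names(data)[hit], std))
      }
      new_names[hit] <- std
    } else if (length(hit) == 0) {
      message(sprintf("No column found for '%s'", std))
    } else {
      warning(sprintf("Several candidate columns for '%s'; left unchanged", std))
    }
  }
  names(data) <- new_names
  data
}

# Input with a changed schema; check.names = FALSE keeps the raw header text.
raw <- data.frame(
  Manufacturer    = c("A", "B"),
  `Real-World EC` = c(12.1, 0),
  odo_km          = c(100, 200),
  check.names = FALSE
)
clean <- standardise_columns(raw, column_aliases)
print(names(clean))
```

Emitting a message for every rename keeps the mapping auditable, which matters when a silent mismatch would make later validation steps run on the wrong column.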