Data Validation Rules
When you upload a dataset, the system automatically runs validation checks to ensure data quality and consistency. If validation fails, the dataset is not saved and the system displays error messages describing each problem.
1. Supported File Formats
The following file formats are allowed:
- .csv
- .tsv
- .xls
- .xlsx
- .json
- .xml
- .geojson
If a file with an unsupported extension is uploaded, the system will display an error indicating the allowed formats.
Screenshot: Unsupported File Format Error
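The extension check described above can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the function name check_extension, the wording of the error message, and the case-insensitive comparison are all assumptions. Only the list of extensions comes from this document.

```python
from pathlib import Path
from typing import Optional

# Extensions listed in "Supported File Formats".
ALLOWED_EXTENSIONS = {".csv", ".tsv", ".xls", ".xlsx", ".json", ".xml", ".geojson"}


def check_extension(filename: str) -> Optional[str]:
    """Return an error message if the extension is unsupported, else None.

    Hypothetical helper: the comparison is case-insensitive by assumption.
    """
    ext = Path(filename).suffix.lower()
    if ext in ALLOWED_EXTENSIONS:
        return None
    allowed = ", ".join(sorted(ALLOWED_EXTENSIONS))
    return f"Unsupported file format '{ext}'. Allowed formats: {allowed}"
```

For example, check_extension("report.txt") returns a message that lists the allowed formats, while check_extension("data.csv") returns None.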
2. File Format Detection
The system detects the file format automatically. If the format cannot be identified or the file cannot be read, an error will be shown.
Screenshot: File Read Error
3. Missing Value Validation
The system checks each column for missing or blank values.
A value is considered missing if:
- The cell is empty
- The cell contains only whitespace
If missing values are found:
- The column name will be displayed.
- The numbers of the rows that contain missing values will be shown.
- If many rows contain missing values, only the first few row numbers are displayed to keep the message readable.
- Users must fill in the missing values with appropriate data before re-uploading the file.
Screenshot: Missing Value Validation Error
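The missing-value rule above (empty or whitespace-only cells, reported per column with a truncated list of row numbers) could look roughly like this. The sketch assumes rows are already parsed into dictionaries, that row numbers are 1-based and count data rows only, and that "the first few" means five. The names find_missing, format_missing, and MAX_ROWS_SHOWN are hypothetical.

```python
from typing import Dict, List, Optional

MAX_ROWS_SHOWN = 5  # assumed cutoff for "the first few rows"


def find_missing(rows: List[Dict[str, Optional[str]]]) -> Dict[str, List[int]]:
    """Map each column name to the 1-based row numbers where its value is missing."""
    missing: Dict[str, List[int]] = {}
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            # A value is missing if the cell is empty or contains only whitespace.
            if value is None or str(value).strip() == "":
                missing.setdefault(column, []).append(row_number)
    return missing


def format_missing(missing: Dict[str, List[int]]) -> List[str]:
    """Build one message per column, showing only the first few row numbers."""
    messages = []
    for column, row_numbers in missing.items():
        shown = ", ".join(str(n) for n in row_numbers[:MAX_ROWS_SHOWN])
        suffix = ", ..." if len(row_numbers) > MAX_ROWS_SHOWN else ""
        messages.append(f"Column '{column}' has missing values in rows: {shown}{suffix}")
    return messages
```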
4. Duplicate Row Validation
The system checks for duplicate rows in the dataset.
- If duplicate rows are found, their row positions will be displayed.
- If many duplicates exist, only the first few row numbers are shown along with the total count.
- Users must remove duplicate rows before re-uploading the file.
Note: For complex JSON or XML files containing nested objects, duplicate checking may be skipped if the data structure is not suitable for comparison.
Screenshot: Duplicate Row Validation Error
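One possible way to implement the duplicate check, including the skip for nested data mentioned in the note, is sketched below. It is an illustration under stated assumptions: only the second and later occurrences of a row are reported, positions are 1-based, nested values (lists or dictionaries) cause the check to be skipped by returning None, and five is the assumed cutoff for "the first few". The function names are hypothetical.

```python
from typing import List, Optional, Sequence

MAX_ROWS_SHOWN = 5  # assumed cutoff for "the first few row numbers"


def find_duplicate_rows(rows: Sequence[Sequence[object]]) -> Optional[List[int]]:
    """Return 1-based positions of rows that repeat an earlier row.

    Returns None when a row holds nested, unhashable values, which
    signals that duplicate checking was skipped.
    """
    seen = set()
    duplicates: List[int] = []
    for position, row in enumerate(rows, start=1):
        key = tuple(row)
        try:
            already_seen = key in seen
        except TypeError:
            # Nested objects cannot be hashed, so comparison is skipped.
            return None
        if already_seen:
            duplicates.append(position)
        else:
            seen.add(key)
    return duplicates


def format_duplicates(duplicates: List[int]) -> str:
    """Show the first few positions, plus the total count when truncated."""
    shown = ", ".join(str(n) for n in duplicates[:MAX_ROWS_SHOWN])
    if len(duplicates) > MAX_ROWS_SHOWN:
        return f"Duplicate rows found at rows: {shown}, ... ({len(duplicates)} in total)"
    return f"Duplicate rows found at rows: {shown}"
```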
5. Error Handling
If any validation errors are detected:
- The dataset will not be saved.
- All validation errors will be displayed together.
- Users must correct the errors and re-upload the file.
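The behavior listed above (collect every error, show them together, and save nothing if any check fails) can be sketched like this. The upload and run_validations functions, the Validator type, and the save callback are hypothetical names used only for illustration. The document does not describe how the platform wires its checks together.

```python
from typing import Callable, List, Sequence

# A validator takes the dataset and returns a list of error messages (empty if none).
Validator = Callable[[object], List[str]]


def run_validations(dataset: object, validators: Sequence[Validator]) -> List[str]:
    """Run every validator and collect all errors, rather than stopping at the first."""
    errors: List[str] = []
    for validate in validators:
        errors.extend(validate(dataset))
    return errors


def upload(dataset: object, validators: Sequence[Validator],
           save: Callable[[object], None]) -> List[str]:
    """Save the dataset only if no validator reports an error."""
    errors = run_validations(dataset, validators)
    if errors:
        return errors  # dataset is not saved; the caller displays all errors together
    save(dataset)
    return []
```

Running all validators before reporting lets a user fix every problem in one pass instead of discovering them one upload at a time.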
These validations ensure that only clean, structured, and reliable data is published on the platform.