Preparing a New Dataset

Helpers for preparing a new dataset.

antinex_utils.prepare_dataset_tools.find_all_headers(use_log_id=None, pipeline_files=[], label_rules=None)
Parameters:
  • use_log_id – label for debugging in logs
  • pipeline_files – list of files to prep
  • label_rules – dict of rules to apply
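
As a rough illustration of what scanning a list of files for headers can involve, the sketch below collects the union of column names across several CSV files using only the standard library. It is not the implementation of find_all_headers: the name collect_headers, its return value, and the assumption that the first row of each file is a header row are all hypothetical.

```python
import csv


def collect_headers(pipeline_files):
    """Hypothetical helper: return the sorted union of CSV column names."""
    headers = set()
    for path in pipeline_files:
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            # Assumption: the first row of each file is its header row.
            first_row = next(reader, None)
            if first_row:
                headers.update(first_row)
    return sorted(headers)
```

Returning a sorted list keeps the column order stable between runs, which can make downstream files easier to compare.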
antinex_utils.prepare_dataset_tools.build_csv(pipeline_files=[], fulldata_file=None, clean_file=None, post_proc_rules=None, label_rules=None, use_log_id=None, meta_suffix='metadata.json')
Parameters:
  • pipeline_files – list of files to process
  • fulldata_file – output file for the merged, unedited data
  • clean_file – cleaned CSV file, which should be ready for training
  • post_proc_rules – rules to apply during post-processing (cleaning)
  • label_rules – labeling rules to apply (classification only)
  • use_log_id – label for tracking the job in the logs
  • meta_suffix – file suffix (defaults to 'metadata.json')
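
The split between a merged "full data" file and a cleaned file can be pictured with the standard-library sketch below. It is an illustration under stated assumptions, not the behavior of build_csv: the function name merge_and_clean is hypothetical, and "clean" is assumed here to mean simply dropping rows that have an empty value in any column, whereas the real rules come from post_proc_rules and label_rules.

```python
import csv


def merge_and_clean(pipeline_files, fulldata_file, clean_file):
    """Hypothetical sketch: merge CSVs, then write a copy without incomplete rows."""
    rows = []
    fieldnames = []
    for path in pipeline_files:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            # Keep the first-seen order of column names across all files.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)

    # Full data: every row, with columns absent from a source file left empty.
    with open(fulldata_file, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)

    # Clean data: assumed to mean rows with a non-empty value in every column.
    clean_rows = [
        row for row in rows
        if all(row.get(name) not in (None, "") for name in fieldnames)
    ]
    with open(clean_file, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(clean_rows)

    return len(rows), len(clean_rows)
```

The function returns the merged and cleaned row counts so a caller can see how many rows the cleaning step removed.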
antinex_utils.prepare_dataset_tools.find_all_pipeline_csvs(use_log_id=None, csv_glob_path='/opt/antinex/datasets/**/*.csv')
Parameters:
  • use_log_id – label for logs
  • csv_glob_path – path to files to process
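
The default csv_glob_path uses a "**" segment, which in Python's glob module matches any depth of nested directories when recursive matching is enabled. The sketch below shows how such a pattern expands; it assumes nothing about how find_all_pipeline_csvs is implemented, and the name list_pipeline_csvs is hypothetical.

```python
import glob


def list_pipeline_csvs(csv_glob_path):
    """Hypothetical helper: return sorted paths matching a recursive glob."""
    # recursive=True lets "**" match zero or more nested directories.
    return sorted(glob.glob(csv_glob_path, recursive=True))
```

With a pattern such as '/opt/antinex/datasets/**/*.csv', this would return CSV files directly under the datasets directory as well as those in any subdirectory below it.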