Development of a Water Solubility Dataset to Establish Best Practices for Curating New Datasets for QSAR Modeling

The U.S. Environmental Protection Agency’s CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) hosts a plethora of environmentally relevant chemical information, including physical property data suitable for QSAR/QSPR modeling. The development of these physical property datasets has generally involved the curation of publicly available experimental data. The ease of accessing these data, along with the overall quality of each dataset (e.g., machine-readable formatting, inclusion of experimental conditions), is highly variable. The purpose of this work is to identify the challenges associated with acquiring physical property datasets, with a focus on obtaining water solubility values for organic compounds. Common issues discovered in these data will be presented, along with solutions that can be easily implemented in a high-throughput manner. The end result will be a standard workflow that a researcher can follow when curating physical property datasets. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.