Development of a Water Solubility Dataset to Establish Best Practices for Curating New Datasets for QSAR Modeling
Poster posted on 02.12.2019 by EPA's Center for Computational Toxicology and Exposure
The U.S. Environmental Protection Agency’s CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) hosts a wealth of environmentally relevant chemical information, including physical property data suitable for QSAR/QSPR modeling. The development of these physical property datasets has generally involved the curation of publicly available experimental data. The ease of accessing these data, along with the overall quality of the datasets (e.g., machine-readable formatting, inclusion of experimental conditions), is highly variable. The purpose of this work is to identify the challenges associated with acquiring physical property datasets, with a focus on obtaining water solubility values for organic compounds. Common issues discovered in these data will be presented, along with solutions that can be easily implemented in a high-throughput manner. The end result will be a standard workflow that a researcher can follow when curating physical property datasets. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.