Therapeutics Data Commons (TDC) is an open-science initiative, started at Harvard University, that provides AI/ML-ready datasets and ML tasks for therapeutics. We welcome contributions from the research community.
Since its inception a year ago, over 39,000 scientists worldwide have used Therapeutics Data Commons. In addition, TDC has 3,000 active users every month whose backgrounds span machine learning, chemistry, and biology.
TDC provides an ecosystem of tools, leaderboards, and community resources, including data functions, strategies for model benchmarking and comparison, meaningful data splits, data processors, public leaderboards, and molecule generation oracles. All resources are integrated via an open Python library.
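To illustrate what a "meaningful" data split can look like, here is a minimal sketch of a group-aware split, in which all items sharing a group label (for molecules, this could be a precomputed scaffold) land on the same side of the train/test boundary. It does not use TDC's library or reproduce its split functions; the function `group_split`, its parameters, and the toy data are hypothetical names chosen for this example only.

```python
from collections import defaultdict


def group_split(records, group_of, test_fraction=0.2):
    """Split records so that no group appears in both train and test.

    records: list of items (e.g. molecule records).
    group_of: function mapping an item to its group key (hypothetical;
        for molecules this might be a precomputed scaffold label).
    """
    groups = defaultdict(list)
    for item in records:
        groups[group_of(item)].append(item)

    # Visit the largest groups first; a group goes to train only if it
    # fits entirely within the train budget, otherwise it goes to test.
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_budget = len(records) - round(len(records) * test_fraction)
    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= train_budget:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Toy data: (molecule id, group label). Both are made up for illustration.
toy = [("m1", "A"), ("m2", "A"), ("m3", "A"), ("m4", "A"), ("m5", "B"),
       ("m6", "B"), ("m7", "B"), ("m8", "C"), ("m9", "C"), ("m10", "D")]
train, test = group_split(toy, group_of=lambda r: r[1], test_fraction=0.2)
```

Compared with a purely random split, this kind of split keeps closely related items from appearing on both sides, so test performance is a better proxy for how a model handles unseen groups. The actual split sizes depend on how the groups happen to fit the budget, so the test fraction is only approximate.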
The lack of high-quality benchmarks impedes the advancement of ML tools for drug discovery. To address this, TDC supports the development of novel ML theory and methods, with a strong emphasis on the mathematical foundations that explain which ML algorithms are most suitable for drug discovery applications and why. TDC contains benchmarks for therapeutics ML tasks, including molecular property prediction, molecular interaction prediction, and molecular optimization, all accompanied by extensive programmatic support and leaderboards.
To attend the TDC User Group Meeting, register HERE.