Research Notebook


Problem statement and significance

This research project asks how machine learning models can be used to detect toxicity in gilled mushrooms (Agaricaceae fungi). Because mushroom beta-glucans have anticancer properties, pharmaceutical companies are working to find mushrooms whose beta-glucans are more easily extracted for use in modern medicines. A machine learning model that detects the presence of toxic chemicals in a mushroom can help pharmacologists identify nontoxic mushrooms for developing medicines that are safe for consumption, especially when they encounter one of the estimated 2-3 million unidentified mushroom species.

Uniqueness

The research takes a novel approach in two ways. First, it builds binary classifiers from algorithms that have not previously been tested in verified research for detecting the presence of toxic chemicals in gilled mushrooms. Second, it conducts a clustering analysis to determine which combinations of gilled mushroom attributes are more likely to indicate toxicity.

Procedural summary

Three algorithms will be tested separately to build the binary classifiers: Random Forest (RF), Logistic Regression, and K-Nearest Neighbors (K-NN). Two algorithms will be tested for the clustering analysis: K-means clustering and hierarchical clustering. The choice of classification and clustering algorithm serves as the independent variable. The dependent variables will be the percent accuracy of each classifier, determined from its confusion matrix, and the accuracy rate of each classifier, determined from its cumulative accuracy profile (CAP).
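
The two dependent variables could be computed along the following lines. This is a minimal sketch, not the project's actual code. It assumes labels are encoded as 1 = toxic and 0 = nontoxic, and it assumes the CAP is summarized by the accuracy ratio (area between the model curve and the random baseline, divided by the area between the perfect curve and the random baseline). The notebook does not specify how the CAP accuracy rate is calculated, so that choice and all function names here are illustrative.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels (assumed: 1 = toxic, 0 = nontoxic)."""
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(1, 1)]  # toxic, predicted toxic
    fp = counts[(0, 1)]  # nontoxic, predicted toxic
    fn = counts[(1, 0)]  # toxic, predicted nontoxic
    tn = counts[(0, 0)]  # nontoxic, predicted nontoxic
    return tp, fp, fn, tn


def percent_accuracy(y_true, y_pred):
    """Percent of samples classified correctly, from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return 100.0 * (tp + tn) / (tp + fp + fn + tn)


def cap_accuracy_ratio(y_true, scores):
    """Accuracy ratio of a cumulative accuracy profile (assumed CAP summary).

    Requires at least one toxic and one nontoxic sample.
    Tied scores keep their input order.
    """
    n = len(y_true)
    positives = sum(y_true)
    # Rank samples from highest to lowest predicted toxicity score.
    ranked = [label for _, label in
              sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)]
    # Trapezoidal area under the model CAP:
    # x = fraction of samples examined, y = fraction of toxic samples found.
    area_model = 0.0
    found = 0
    for label in ranked:
        previous = found
        found += label
        area_model += (previous + found) / (2.0 * positives * n)
    area_random = 0.5
    area_perfect = 1.0 - positives / (2.0 * n)
    return (area_model - area_random) / (area_perfect - area_random)


if __name__ == "__main__":
    # Hypothetical labels and scores, for illustration only.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    scores = [0.9, 0.4, 0.2, 0.1, 0.8, 0.6]
    print(percent_accuracy(y_true, y_pred))
    print(cap_accuracy_ratio(y_true, scores))
```

Under this definition, an accuracy ratio of 1 corresponds to a classifier that ranks every toxic sample ahead of every nontoxic one, and 0 corresponds to random ranking, which makes the RF, Logistic Regression, and K-NN classifiers comparable on a single scale.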

Benefits of using machine learning

Identifying the presence of toxic chemicals in gilled mushrooms is difficult for humans because there are no reliable rules of thumb to follow. Applying machine learning to mushroom toxicity detection can therefore also give poison control centers a diagnostic tool that is faster and less expensive than current manual alternatives.