You'll have two files: a training file and a testing file.
Question A: Conduct a classification experiment as follows:
– Using the training file, classify the text samples with a grid search and select the best parameter values. Retrain the classifier on the full training file with these parameter values.
– Test the resulting classifier on the testing file. Report the precision, recall, and F1-score for each label.
– Plot the precision-recall curve and the ROC curve, and calculate the AUC and the average precision.
– Print the best parameter values from the grid search, and print the testing performance.
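The Question A steps can be sketched with scikit-learn as follows. This is a minimal sketch under assumptions the task does not state: the labels are binary (0/1), the texts are vectorized with TF-IDF, the classifier is multinomial Naive Bayes, and the parameter grid and the function name `run_question_a` are hypothetical. Because `GridSearchCV` refits the best parameter combination on the whole training set by default (`refit=True`), `best_estimator_` is already the classifier trained with the selected values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (average_precision_score, classification_report,
                             precision_recall_curve, roc_auc_score, roc_curve)
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def run_question_a(train_texts, train_labels, test_texts, test_labels, cv=3):
    """Grid-search a TF-IDF + multinomial NB pipeline, then evaluate it on the test set.

    Assumes binary 0/1 labels; the classifier and parameter grid are illustrative choices.
    """
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultinomialNB()),
    ])
    # Hypothetical parameter grid; replace it with the parameters you want to tune.
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__alpha": [0.1, 0.5, 1.0],
    }
    # refit=True (the default) retrains the best configuration on the whole training set.
    search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
    search.fit(train_texts, train_labels)
    print("Best parameters:", search.best_params_)

    model = search.best_estimator_
    predictions = model.predict(test_texts)
    # Per-label precision, recall and F1-score on the testing data.
    print(classification_report(test_labels, predictions))
    report = classification_report(test_labels, predictions, output_dict=True)

    # Probability of the positive class (label 1) drives both curves.
    scores = model.predict_proba(test_texts)[:, 1]
    fpr, tpr, _ = roc_curve(test_labels, scores)
    precision, recall, _ = precision_recall_curve(test_labels, scores)
    auc = roc_auc_score(test_labels, scores)
    ap = average_precision_score(test_labels, scores)
    print(f"AUC = {auc:.3f}, Average Precision = {ap:.3f}")

    fig, (ax_pr, ax_roc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_pr.plot(recall, precision)
    ax_pr.set(xlabel="Recall", ylabel="Precision",
              title=f"Precision-recall (AP = {ap:.2f})")
    ax_roc.plot(fpr, tpr)
    ax_roc.plot([0, 1], [0, 1], linestyle="--")  # chance-level reference line
    ax_roc.set(xlabel="False positive rate", ylabel="True positive rate",
               title=f"ROC (AUC = {auc:.2f})")
    fig.tight_layout()
    return search.best_params_, report, auc, ap, fig
```

Load the training and testing files into lists of texts and labels first (their format is not specified here), then call `run_question_a(...)`. Save the returned figure with `fig.savefig(...)`, or switch to an interactive backend and call `plt.show()`.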
Question B: Determine k in k-fold cross-validation
– Using the training file, create a TF-IDF matrix.
– Conduct k-fold cross-validation for each value of k from 2 to 20. For each k, train a multinomial Naive Bayes classifier and a linear support vector machine classifier. For each classifier, record the average AUC across the folds.
– Plot a line chart showing the relationship between sample size (the number of training samples per fold, which grows with k) and the average AUC.
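A possible sketch of Question B, again assuming binary labels: build the TF-IDF matrix once from the training texts, then, for each k from 2 to 20, run stratified k-fold cross-validation for `MultinomialNB` and `LinearSVC` with `scoring="roc_auc"` and average the per-fold AUCs. The function names `average_auc_by_k` and `plot_auc_vs_k` are hypothetical. Stratified folds keep both classes in every held-out fold, which AUC requires; for k = 20 that means at least 20 samples per class.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def average_auc_by_k(texts, labels, k_values=range(2, 21)):
    """Return {classifier name: [mean AUC for each k]} from stratified k-fold CV.

    Assumes binary labels. The TF-IDF matrix is built once on the whole training
    file, as the task describes; fitting it inside each fold (via a Pipeline)
    would avoid leaking vocabulary and IDF statistics from the held-out fold.
    """
    tfidf_matrix = TfidfVectorizer().fit_transform(texts)
    classifiers = {"Multinomial NB": MultinomialNB(), "Linear SVM": LinearSVC()}
    results = {name: [] for name in classifiers}
    for k in k_values:
        folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
        for name, clf in classifiers.items():
            # The roc_auc scorer uses predict_proba (NB) or decision_function (SVM).
            aucs = cross_val_score(clf, tfidf_matrix, labels, cv=folds, scoring="roc_auc")
            results[name].append(aucs.mean())
    return results


def plot_auc_vs_k(k_values, results):
    """Draw one line per classifier: mean AUC across folds against k."""
    fig, ax = plt.subplots()
    for name, mean_aucs in results.items():
        ax.plot(list(k_values), mean_aucs, marker="o", label=name)
    ax.set(xlabel="k (number of folds)", ylabel="Mean AUC across folds",
           title="Average AUC versus k")
    ax.legend()
    return fig
```

Each fold trains on about (k - 1)/k of the samples, so moving right along the x-axis also means a larger training sample per fold. Call `average_auc_by_k(texts, labels)` on the loaded training file, then pass `range(2, 21)` and the result to `plot_auc_vs_k`.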