Thesis

Evaluating the Effectiveness of Single Words, Chargrams, N-Grams, and Sentence Embeddings in Spam and Phishing Classification

Supervisor

Thesis: Bachelor’s/Master’s Thesis

This thesis compares the effectiveness of different text representation techniques, namely single words, chargrams (character n-grams), n-grams, and sentence embeddings, for spam and phishing detection. The goal is to identify the strengths and weaknesses of each method and the differences between them.
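To illustrate how the count-based representations differ, the following is a minimal sketch using only the Python standard library. It assumes that "n-grams" refers to word n-grams and that tokens are obtained by simple lowercasing and whitespace splitting. The function names and the example message are hypothetical and chosen for illustration only. Sentence embeddings are omitted because they require a pretrained model.

```python
from collections import Counter


def single_words(text):
    """Split a message into lowercase whitespace-separated tokens."""
    return text.lower().split()


def word_ngrams(text, n=2):
    """Return all runs of n consecutive tokens, joined by a space."""
    tokens = single_words(text)
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chargrams(text, n=3):
    """Return all substrings of length n, including spaces."""
    lowered = text.lower()
    return [lowered[i:i + n] for i in range(len(lowered) - n + 1)]


# Hypothetical example message, not taken from any dataset.
message = "Verify your account now"

# Each representation turns the same message into a different bag of features.
features = {
    "single words": Counter(single_words(message)),
    "word 2-grams": Counter(word_ngrams(message, n=2)),
    "char 3-grams": Counter(chargrams(message, n=3)),
}

for name, counts in features.items():
    print(name, dict(counts))
```

Chargrams still share features between a word and an obfuscated variant of it (for example, a single swapped character), whereas single words treat the two as unrelated tokens. Trade-offs of this kind are what the comparison is meant to examine.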

Prerequisites

Required

  • Basic understanding of machine learning and artificial intelligence (completed the course Foundations of Artificial Intelligence)
  • Familiarity with Natural Language Processing techniques, e.g., text classification and feature extraction techniques
  • Proficiency in at least one programming language (preferably Python)

Optional

  • You took the following courses:
    • Internettechnologies & Web Engineering
    • Advanced Methods of Machine Learning
    • Security in Communication Networks
  • Familiarity with evaluation metrics for AI models
  • Proficiency in using LaTeX