Netml: A challenge for network traffic analytics

Abstract

Classifying network traffic is the basis for important network applications. Prior research in this area has faced challenges on the availability of representative datasets, and many of the results cannot be readily reproduced. Such a problem is exacerbated by emerging data-driven machine learning based approaches. To address this issue, we provide three open datasets containing almost 1.3M labeled flows in total, with flow features and anonymized raw packets, for the research community. We focus on broad aspects in network traffic analysis, including both malware detection and application classification. We release the datasets in the form of an open challenge called NetML and implement several machine learning methods including random-forest, SVM and MLP. As we continue to grow NetML, we expect the datasets to serve as a common platform for AI driven, reproducible research on network flow analytics.

Publication
In arXiv preprint arXiv:2004.13006