Detecting unknown malicious code by applying classification techniques on OpCode patterns

<a target="_blank" rel="nofollow" href="https://maghalejoo.com/wp-content/uploads/2018/12/24/4997.pdf">Download PDF</a>

Outline

Abstract
1. Introduction
2. Background
3. Methods
4. Evaluation
5. Experiments and Results
6. Discussion and Conclusions
Notes
Supplementary Material
References

Outline (Persian edition)

Abstract
1. Introduction
2. Background
2.1. Detecting unknown malware using byte n-gram patterns
2.2. Representing executable files using OpCodes
2.3. The imbalance problem
3. Methods
3.2. Dataset creation
3.3. Data preparation and feature selection
4. Evaluation
5. Experiments and Results
5.1. Experiment
5.1.1. Feature representation vs. n-grams
5.1.2. Feature selection and top selections
5.1.3. Classifiers
5.1.4. Varying OpCode n-gram sizes
5.2. Experiment
5.3. Experiment
6. Discussion and Conclusions

Abstract

In previous studies, classification algorithms were employed successfully for the detection of unknown malicious code. Most of these studies extracted features based on byte n-gram patterns in order to represent the inspected files. In this study we represent the inspected files using OpCode n-gram patterns, which are extracted from the files after disassembly. The OpCode n-gram patterns are used as features for the classification process. The main goal of the classification process is to detect unknown malware within a set of suspected files, so that the detected malware can later be included in antivirus software as signatures. A rigorous evaluation was performed using a test collection comprising more than 30,000 files, in which OpCode n-gram patterns of various sizes, various representation settings, and eight types of classifiers were evaluated. A typical problem in this domain is the imbalance problem, in which the distribution of the classes in real life varies. We investigated the imbalance problem, referring to several real-life scenarios in which malicious files are expected to be about 10% of the total inspected files. Lastly, we present a chronological evaluation in which the need for frequently updating the training set was assessed. Evaluation results indicate that the evaluated methodology achieves a level of accuracy higher than 96% (with TPR above 0.95 and FPR approximately 0.1), which slightly improves on the results of previous studies that use byte n-gram representation. The chronological evaluation showed a clear trend in which performance improves as the training set is kept more up to date.
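To make the representation concrete, the following is a minimal sketch of how a disassembled file could be turned into OpCode n-gram features. It assumes the disassembler output has already been reduced to a list of opcode mnemonics; the function names `opcode_ngrams` and `relative_frequencies` are hypothetical, and normalizing by the total number of n-grams is an illustrative choice, not necessarily the weighting used in the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: List[str], n: int) -> List[NGram]:
    """Return every run of n consecutive opcodes, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields no n-grams (empty range).
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def relative_frequencies(opcodes: List[str], n: int) -> Dict[NGram, float]:
    """Map each distinct n-gram to its share of all n-grams in the file.

    Assumption: count / total is used as the feature value here purely
    for illustration.
    """
    grams = opcode_ngrams(opcodes, n)
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical opcode sequence standing in for a disassembled file.
    sample = ["push", "mov", "push", "mov", "call"]
    for gram, freq in relative_frequencies(sample, 2).items():
        print(" ".join(gram), freq)
```

Each file then becomes a sparse vector indexed by n-gram, which is the form a classifier would consume after feature selection.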
