ICACT20240309 Slide.23
Thank you for your attention. I'm now open to any questions you may have.

ICACT20240309 Slide.22


ICACT20240309 Slide.21
While AI-based anomaly detection is gaining widespread attention and application, the reliance on open datasets for research and testing has certain limitations. Existing open datasets, collected from specific networks, may not be directly applicable to other network environments due to variations in normal and malicious packet behaviors. As a solution, we have proposed a system that collects packets directly from live networks, producing a more accurate representation of each network's unique characteristics. This approach to data collection not only enhances the performance of AI-based anomaly detection but also contributes to the ongoing development of more adaptable systems.

ICACT20240309 Slide.20
Conclusions

ICACT20240309 Slide.19
Real data collection is illustrated below. The system detects a variety of application protocols, including TLS, OpenDNS, Radius, Azure, Microsoft 365, BitTorrent, HTTP, Google, DNS, etc. IP address and port number information has been masked for security purposes. Actual traffic information is retained for the training of AI-based anomaly detection models.

ICACT20240309 Slide.18
Experiment result

ICACT20240309 Slide.17
Each metadata record comprises 44 features. Features 1-3 identify the primary protocol, sub-protocol, and Layer 4 protocol. Features 4-7 give the flow start and end times in seconds and sub-seconds. Features 8-11 provide the IP addresses and port numbers of the source and destination. Features 12-19 describe the number of packets, packet size, and actual valid data throughput in each direction (source to destination and destination to source). Features 20-41 count the packets within the flow that have specific flags set, such as the TCP Congestion Window Reduced, ECN-Echo, Urgent, Acknowledge, Push, Reset, Synchronize, and Finish flags, in each direction. Additional flow-related information is provided in the final feature.
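
As a rough illustration of this feature layout, the following Python sketch maps a 1-based feature index to its group. The group names and the function `feature_group` are hypothetical labels chosen for this example and are not part of the presented system. Features 42 and 43 are not described in the talk, so the sketch reports them as unspecified.

```python
# Hypothetical grouping of the 44 metadata features, following the index
# ranges given in the talk. The group names are illustrative assumptions.
FEATURE_GROUPS = [
    (range(1, 4), "protocol"),           # primary, sub-, and Layer 4 protocol
    (range(4, 8), "flow_time"),          # start/end time, seconds and sub-seconds
    (range(8, 12), "endpoints"),         # source/destination IP and port
    (range(12, 20), "volume"),           # packets, size, throughput per direction
    (range(20, 42), "flag_counts"),      # per-direction flag packet counts
    (range(44, 45), "additional_info"),  # the final feature
]

TOTAL_FEATURES = 44


def feature_group(index: int) -> str:
    """Return the group name for a 1-based feature index."""
    if not 1 <= index <= TOTAL_FEATURES:
        raise ValueError(f"feature index must be in 1..{TOTAL_FEATURES}")
    for index_range, name in FEATURE_GROUPS:
        if index in index_range:
            return name
    # Indices the talk does not describe (42 and 43) end up here.
    return "unspecified"
```

A lookup table like this keeps the column layout in one place, so downstream training code can select, for example, only the `flag_counts` columns.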

ICACT20240309 Slide.16
The core of our system is the data collecting process that takes place at the collecting server. Packets are transmitted from the Packet Collecting module to the Metadata Extracting module, where the network traffic is analyzed. This process extracts the application protocol and flow information, including IP addresses, port numbers, packet counters, and additional details that depend on the protocol type. The output of the Metadata Extracting module, a string of raw data, is transmitted through a pipeline and stored in a temporary file at minute intervals. During data processing, this temporary file is read and the timestamp is converted to UNIX time format. After any abnormal lines are removed, such as those lacking proper newline delimiters, the information is saved to a final file and the temporary file is deleted. Statistical information about the extracted protocols is updated on a minute-by-minute basis.
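
The post-processing step described here (read the temporary file, convert the timestamp, drop abnormal lines, write the final file, delete the temporary file) could look roughly like the Python sketch below. The talk does not specify the raw record format, so several details are assumptions made only for illustration: comma-separated fields, a timestamp in the first field in `YYYY-MM-DD HH:MM:SS` form interpreted as UTC, and the function name `process_temp_file`.

```python
import os
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout, not from the talk
EXPECTED_FIELDS = 44               # the talk states 44 features per record


def process_temp_file(temp_path, final_path, expected_fields=EXPECTED_FIELDS):
    """Clean one temporary file, append valid records to the final file,
    delete the temporary file, and return the number of records kept."""
    kept = 0
    with open(temp_path, "r", encoding="utf-8", newline="") as src, \
         open(final_path, "a", encoding="utf-8", newline="") as dst:
        for line in src:
            # A line without a trailing newline was cut off mid-write.
            if not line.endswith("\n"):
                continue
            fields = line.rstrip("\r\n").split(",")
            if len(fields) != expected_fields:
                continue
            try:
                stamp = datetime.strptime(fields[0], TIME_FORMAT)
            except ValueError:
                continue  # unparseable timestamp, treat the line as abnormal
            # Convert the (assumed UTC) timestamp to UNIX time in seconds.
            fields[0] = str(int(stamp.replace(tzinfo=timezone.utc).timestamp()))
            dst.write(",".join(fields) + "\n")
            kept += 1
    os.remove(temp_path)
    return kept
```

Running such a function once per temporary file matches the minute-by-minute cadence mentioned in the talk, and the returned count could feed the per-protocol statistics.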

ICACT20240309 Slide.15
The data collecting process is described in this figure. In each target network, packets are transmitted from the switch to the data collecting servers using packet control techniques such as mirroring or inline deployment. The output of the data collecting servers consists of log files or extracted files that carry the network's unique characteristics. These files are subsequently transmitted to a virtual machine via the Internet. Before the files are sent, a firewall is configured to accept only traffic that is directed to a specific port number and comes from the IP address of the collection sensor associated with the relevant agency. Finally, an acknowledgment message confirms that the extracted files have been transmitted successfully.
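
The firewall rule in this step can be pictured as a simple allow-list check. The Python sketch below is only an illustration of that idea, not the firewall configuration used in the system: the agency name, the documentation-range IP address, the port number, and the function `is_allowed` are all hypothetical, and the sketch assumes the rule matches on source IP and destination port.

```python
from ipaddress import ip_address

# Hypothetical allow-list: agency name -> (collection sensor IP, destination port).
# The address comes from the 192.0.2.0/24 documentation range.
ALLOWED_SENSORS = {
    "agency-a": (ip_address("192.0.2.10"), 9000),
}


def is_allowed(src_ip: str, dst_port: int, allowed=ALLOWED_SENSORS) -> bool:
    """Accept traffic only if it comes from a registered collection sensor
    and targets the port registered for that sensor."""
    addr = ip_address(src_ip)
    return any(addr == sensor_ip and dst_port == port
               for sensor_ip, port in allowed.values())
```

For example, `is_allowed("192.0.2.10", 9000)` accepts the registered sensor, while the same sensor on another port or any other address is rejected.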

ICACT20240309 Slide.14
The core of our system revolves around a three-step process described in this figure. First, we collect log information from security devices in public institutions. Then, we analyze the collected IP addresses, comparing them to Threat Intelligence (TI) information to check for potential threats. When a threat is detected, the system automatically applies the relevant policy to agencies with an API connection, while agencies without an API connection receive email notifications about the threat.
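
The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's implementation: the `Agency` class, the function names, and the action labels `apply_policy` and `send_email` are hypothetical, and the TI information is modeled as a plain set of IP address strings.

```python
from dataclasses import dataclass


@dataclass
class Agency:
    name: str
    has_api: bool  # True if the agency is connected through an API


def find_threats(log_ips, threat_intel):
    """Return logged IPs that appear in the TI set, de-duplicated,
    in the order they were first seen."""
    seen = set()
    hits = []
    for ip in log_ips:
        if ip in threat_intel and ip not in seen:
            seen.add(ip)
            hits.append(ip)
    return hits


def plan_response(agencies, threats):
    """Choose an action per agency: policy via API, otherwise email.
    Returns an empty plan when no threat was detected."""
    if not threats:
        return {}
    return {a.name: ("apply_policy" if a.has_api else "send_email")
            for a in agencies}
```

A typical call would first run `find_threats` over the collected log IPs and then pass the result to `plan_response` together with the list of agencies.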

ICACT20240309 Slide.13
Proposed method

ICACT20240309 Slide.12
The SMAP and MSL datasets consist of expert-labeled telemetry anomaly data from NASA's Soil Moisture Active Passive satellite and Mars Science Laboratory rover. The number of SMAP variables is 1375, while that of MSL is 1485, making them significantly more extensive than single-entity datasets. These open datasets only reflect the characteristics of the specific networks to which they are applied. While the SWaT and WADI datasets pertain to networks in a simulated water plant, the SMAP and MSL datasets deal with telemetry data.

ICACT20240309 Slide.11
The WADI dataset was collected from the WADI testbed, which is an extension of the SWaT testbed. This dataset comprises data from 1233 sensors and actuators and was collected over a 16-day period. Of these 16 days, 14 days were dedicated to normal data collection, while the remaining 2 days involved 15 attacks.

ICACT20240309 Slide.10
Existing research in this domain has often relied on open datasets offered by various laboratories and research institutions, such as SWaT, WADI, SMAP, and MSL. SWaT simulates the operations of a real-world industrial water treatment plant. The testbed was run and data were collected over an 11-day period. The first 7 days were dedicated to normal data collection without any attacks or errors, while the remaining 4 days involved 36 attacks created by the research team. This dataset includes physical properties relevant to the plant and the water treatment process, as well as network traffic within the testbed.

ICACT20240309 Slide.09
Wei et al. employ Convolutional Neural Networks to learn spatial features in the data and use Recurrent Neural Networks with Long Short-Term Memory to learn temporal features. Before training, the original datasets DARPA1998 and ISCX2012 undergo preprocessing. The advantage is improved performance. However, this approach has only been validated using a fixed dataset, which can be considered a limitation.

ICACT20240309 Slide.08
Alauthman et al. employ the output of a supervised learning (SL) model as the state in a reinforcement learning (RL) model, and both the SL and RL models improve through this interaction. This method achieves a good accuracy rate even when the input data to the model is reduced, which leads to reduced training time. However, it has been validated only in MATLAB using three datasets and has not been implemented in a real network.

ICACT20240309 Slide.07
Hamamoto et al. utilized Genetic Algorithms to analyze the network and subsequently employed a Fuzzy Logic scheme to determine whether an instance represents an anomaly. This method exhibits high performance in Denial of Service and Distributed Denial of Service attack detection, but it is associated with a high false-negative rate.

ICACT20240309 Slide.06
Chen et al. analyzed network traffic and employed machine learning methods to identify abnormal behavior and detect malicious apps. They used imbalanced classification methods, including the Synthetic Minority Oversampling Technique combined with Support Vector Machine, SVM Cost-Sensitive, and C4.5 Cost-Sensitive methods. While this approach performed well with highly imbalanced training data, its performance became unstable when the dataset's imbalance ratio was under 1000.

ICACT20240309 Slide.05
Related work

ICACT20240309 Slide.04
To address the existing problem, we proposed a novel method and system for generating training data to support AI-based anomaly detection. Our approach is grounded in collecting real-world network traffic data, offering a distinct advantage in accurately reflecting the unique characteristics of the network under consideration. Furthermore, our system is designed to incorporate data related to the latest malicious attacks within the network, ensuring that AI-based anomaly detection methods are well-equipped to handle the dynamic nature of cybersecurity threats.

ICACT20240309 Slide.03
AI-based anomaly detection methods leverage the power of algorithms and statistical models to learn and recognize patterns in data, enabling them to automatically detect anomalies that might be difficult to identify through traditional approaches. But AI-based anomaly detection methods are only as effective as the data they are trained on. This causes a critical limitation when applying AI-based anomaly detection methods in real-world network environments. Moreover, the landscape of malicious attacks is constantly evolving, giving rise to new forms and types of threats. To build effective anomaly detection systems, it is essential to have access to data that accurately reflects the latest network conditions and includes the most recent malicious attacks.

ICACT20240309 Slide.02
What is anomaly detection? Anomaly detection is a technique used in various fields to identify abnormal patterns and potential threats within a given environment. Network intrusions, cybersecurity threats, equipment malfunctions, and fraudulent activities are all considered abnormal behaviors.

ICACT20240309 Slide.01
First of all, I'd like to provide an introduction to anomaly detection and AI-based abnormal behavior detection methods.

ICACT20240309 Slide.00
Hi everyone. I am Thi My Truong. Today, we're going to talk about the development of a new system for generating training data for AI-based anomaly detection.