PROJECTS AT VISL FINISHED IN 2006
Neural microcircuits are basic components of the human brain and of the biological mechanisms responsible for reflexes. Although signal processing in modern electronic devices is millions of times faster than in a biological organism, there is little doubt that the human brain remains the most powerful computer. This stems from the advanced and highly effective parallel processing techniques that biological organisms developed over millions of years of evolution.
In science and engineering, neural network models have proved their recognition and parallel processing power in many fields. They are widely used for recognition and even for prediction, for example predicting the behavior of the stock market. This project investigated another application of the recurrent neural network architecture, real-time String Matching, an approach that has several advantages over existing solutions to this problem.
The problem overview:
String matching is a key technology behind Intrusion Detection Systems (IDS). Such systems are usually deployed at choke points of a network, where traffic is heavy. Many areas can benefit from a faster string matching algorithm: IDS, firewalls, and the detection of long sequences in fast streaming data. A faster string matching algorithm is therefore needed, one that does not become a bottleneck in the network.
When a Liquid State Machine (LSM) is applied to String Matching, the input u(t) is binary data streaming continuously (in real time) from the network, and each readout function is meant to detect the corresponding network intrusion in real time. For example, if the LSM is intended to detect 1000 network intrusions (viruses, worms, etc.), then 1000 readout functions are needed.
It is important that the NN (called the microcircuit in the figure) is not changed once it has been created at the beginning of the process, because training consists of finding the readout functions f(x(t)). However, creating a suitable NN requires knowledge of its purpose and of how the NN parameters influence its properties.
1) The input connections u(t);
2) The NN state x(t);
3) The readout functions f(x(t)).
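The three components listed above can be illustrated with a small sketch. It is not the project's implementation: the original microcircuit is a network simulated in CSIM, while this sketch swaps in a small random tanh network as a stand-in, and the class names, network size, test patterns, and perceptron-style training rule are all hypothetical. It only shows the structure: a network that is created once and never changed, and one trained readout per string to be detected.

```python
import math
import random


class FixedReservoir:
    """Stand-in for the fixed microcircuit (assumption: a random tanh network,
    not the CSIM model used in the project)."""

    def __init__(self, n_units=30, seed=0):
        rng = random.Random(seed)
        self.n = n_units
        # Input and recurrent weights are drawn once and never trained.
        self.w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
        self.w_rec = [[rng.uniform(-1.0, 1.0) / math.sqrt(n_units)
                       for _ in range(n_units)] for _ in range(n_units)]

    def step(self, x, u):
        """One state update x(t) -> x(t+1) driven by one input bit u(t)."""
        return [math.tanh(self.w_in[i] * u +
                          sum(w * xj for w, xj in zip(self.w_rec[i], x)))
                for i in range(self.n)]

    def run(self, bits):
        """Feed a bit stream and return the state after every bit."""
        x = [0.0] * self.n
        states = []
        for u in bits:
            x = self.step(x, u)
            states.append(x)
        return states


class Readout:
    """Linear threshold readout f(x(t)); the only part that is trained."""

    def __init__(self, n_units):
        self.w = [0.0] * n_units
        self.b = 0.0

    def decide(self, x):
        return sum(w * xi for w, xi in zip(self.w, x)) + self.b > 0.0

    def train(self, states, targets, epochs=20, lr=0.1):
        # Perceptron-style updates (hypothetical choice of training rule).
        for _ in range(epochs):
            for x, t in zip(states, targets):
                err = int(t) - int(self.decide(x))
                if err:
                    self.w = [w + lr * err * xi for w, xi in zip(self.w, x)]
                    self.b += lr * err


def targets_for(bits, pattern):
    """True at each time step where the stream has just ended with `pattern`."""
    m = len(pattern)
    return [t + 1 >= m and bits[t + 1 - m:t + 1] == pattern
            for t in range(len(bits))]


rng = random.Random(1)
stream = [rng.randint(0, 1) for _ in range(400)]
patterns = [[1, 0, 1, 1], [0, 0, 1, 0]]  # toy "intrusion strings"

reservoir = FixedReservoir()
weights_before = [row[:] for row in reservoir.w_rec]
states = reservoir.run(stream)

readouts = []
for p in patterns:
    r = Readout(reservoir.n)
    r.train(states, targets_for(stream, p))
    readouts.append(r)
```

All readouts share the same state sequence, so adding a new string to detect means training one more readout without touching the network.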
Strings from the Clam Antivirus database were used, and the detection algorithm was applied to them. The system was trained and tested to detect the 1000 longest (1300-1500 bits) intrusion strings from the database, together with a large set of random strings. The results can be expressed as two types of error:
- Miss-Hit per NN: 0.001 - 0.8
- False-Alarm per NN: negligible
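As a minimal sketch of how these two error types could be measured for one NN, the function below counts misses and false alarms from labeled test strings. It assumes that "Miss-Hit" means an intrusion string that was not detected and "False-Alarm" means a random string that was flagged; the function name and the toy data are hypothetical, not the project's results.

```python
def error_rates(truth, decisions):
    """Return (miss_rate, false_alarm_rate) for one detector.

    truth[i]     -- True if test string i is an intrusion string
    decisions[i] -- True if the detector flagged test string i
    """
    intrusions = sum(1 for t in truth if t)
    benign = len(truth) - intrusions
    misses = sum(1 for t, d in zip(truth, decisions) if t and not d)
    false_alarms = sum(1 for t, d in zip(truth, decisions) if not t and d)
    miss_rate = misses / intrusions if intrusions else 0.0
    false_alarm_rate = false_alarms / benign if benign else 0.0
    return miss_rate, false_alarm_rate


# Toy data: 4 intrusion strings followed by 4 random strings.
truth = [True, True, True, True, False, False, False, False]
decisions = [True, False, True, True, False, False, False, False]
rates = error_rates(truth, decisions)  # one miss out of four, no false alarms
```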
For the combined system, the probability of a full system error is the probability that 8 NNs make a wrong decision at the same time, multiplied by the number of ways those 8 NNs can be chosen, plus the corresponding probability for 9 wrong NNs, and so on.
One can assume that the first term of the sum is the largest and most significant, while the others are negligible. On this basis, calculating the full system error becomes very simple. This approach makes it possible to reduce the total error to as low a value as required. In other words:
- Both types of error of the combined system are negligible.
- Starting the simulation with a random string and inserting random strings between intrusion strings is not necessary, but it imitates a realistic situation in the network.
- The best way to examine the system for False-Alarm error is to test it on a long random string (for instance zeros(1,1000) = 0.4 Mb).
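The full-system error sum described earlier can be sketched as a binomial tail. The sketch rests on assumptions that do not come from the text: the NNs err independently with the same probability p, and the system combines n = 15 NNs by majority decision, so that 8 wrong NNs cause a system error. The text gives only the number 8; n = 15 is hypothetical.

```python
from math import comb


def system_error(p, n, k):
    """Probability that at least k of n independent NNs are wrong,
    each with per-NN error probability p (binomial tail)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


def first_term(p, n, k):
    """Leading term of the sum: exactly k of n NNs are wrong."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Hypothetical configuration: 15 NNs, system fails when 8 or more are wrong.
n_nets, k_wrong = 15, 8
p_low = 0.001  # lower end of the per-NN miss range quoted in the text

total = system_error(p_low, n_nets, k_wrong)
leading = first_term(p_low, n_nets, k_wrong)
share = leading / total  # fraction of the total error carried by the first term
```

The first term dominates only when p is small. Near the upper end of the per-NN miss range the later terms are no longer negligible, so the simplified calculation applies to NNs whose individual error is already low.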
Hardware model concept:
The illustration below shows how to integrate a String Matching algorithm based on Neural Networks into IDS hardware:
The system receives a binary input from the network, 8 bits per clock cycle, and splits it into two branches. The first branch is only delayed by the system's latency. The second passes through the detection system (the Neural Networks plus the Readout). The output of the detection system is a binary block/enable decision. In addition, it is possible to report exactly which intrusion has been detected. When an intrusion is detected, the system blocks the input and reports to the corresponding software, which takes the appropriate action. According to several studies, the time characteristics of a biological neuron and synapse can be sped up by a factor of over 0.5 to 1 million when implemented on a similar hardware model. This means that a hardware implementation of the presented String Matching machine can sustain a throughput of over 16 Gbps if it receives 8 bits of input per clock cycle, and 32 Gbps with 16 bits per clock cycle.
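The throughput figures above can be checked with simple arithmetic. The sketch below uses only the quoted numbers; the function names are hypothetical, and the last step (dividing the clock rate by the speed-up factor to get an equivalent biological-time input rate) is an assumption about how the figures relate, not something stated in the text.

```python
def implied_clock_hz(throughput_bps, bits_per_cycle):
    """Clock rate at which the given input width yields the given throughput."""
    return throughput_bps / bits_per_cycle


def throughput(clock_hz, bits_per_cycle):
    """Throughput in bits per second for a given clock rate and input width."""
    return clock_hz * bits_per_cycle


# Both quoted configurations imply the same clock rate.
clock_8 = implied_clock_hz(16e9, 8)    # 8 bits per cycle at 16 Gbps
clock_16 = implied_clock_hz(32e9, 16)  # 16 bits per cycle at 32 Gbps

# Assumption: the hardware clock is the biological-time input rate multiplied
# by the speed-up factor of 0.5 to 1 million quoted in the text.
bio_rate_at_1m = clock_8 / 1e6      # input steps per second in biological time
bio_rate_at_half_m = clock_8 / 0.5e6
```

Both configurations correspond to a clock of 2 GHz, so doubling the input width is what doubles the throughput.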
MATLAB was used as the environment for running the experiments and analyzing their data.
The CSIM framework (http://www.lsm.tugraz.at/) was used to simulate the neural networks in the experiments.