9h15 : Welcome
9h30-9h45 : Round-table introductions and opening of the seminar
Session 1: Social Network Data and Informal Languages (Chair: K. Smaïli)
9h45-10h15 : Nada Sbihi, Ihsane Gryech and Mounir Ghogho (UIR)
Leveraging user intuition to predict item popularity in online social networks
We investigate the problem of early prediction of item popularity in online social networks. Prior work has shown that the time taken by each item to reach i adopters (i being a small number, around 5) has higher predictive power than non-temporal features, such as those related to the characteristics of the adopters. Here, we challenge this finding by proposing a new feature, based on users’ intuition, which is shown to provide significantly better predictive power for the most popular items than the above-mentioned temporal feature. A GoodReads dataset is used to illustrate the merits of the proposed method.
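The temporal baseline mentioned above can be made concrete. A minimal sketch, assuming adoption timestamps are available per item and that the clock starts at the first adoption unless a reference time is given (the abstract does not say which origin prior work uses); `time_to_i_adopters` is a hypothetical helper, and the proposed intuition-based feature is not shown because the abstract does not define it.

```python
from typing import Optional, Sequence


def time_to_i_adopters(
    adoption_times: Sequence[float],
    i: int = 5,
    start: Optional[float] = None,
) -> Optional[float]:
    """Time an item needs to reach its i-th adopter.

    adoption_times: timestamps (any unit) at which users adopted the item.
    start: hypothetical reference time (e.g. the item's release);
           by default, the first adoption is used.
    Returns None while the item has fewer than i adopters.
    """
    if i < 1:
        raise ValueError("i must be at least 1")
    times = sorted(adoption_times)
    if len(times) < i:
        return None  # feature not yet available for this item
    origin = times[0] if start is None else start
    return times[i - 1] - origin


# Hypothetical adoption timestamps, in days since the item appeared.
print(time_to_i_adopters([3.0, 1.0, 10.0, 4.0, 7.0, 12.0], i=5))             # 9.0
print(time_to_i_adopters([3.0, 1.0, 10.0, 4.0, 7.0, 12.0], i=5, start=0.0))  # 10.0
print(time_to_i_adopters([2.0, 5.0], i=5))                                   # None
```

A smaller value means the item reached i adopters faster, which is the signal prior work found predictive of eventual popularity.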
10h15-10h45 : K. Abidi and K. Smaïli (Loria)
CALYOU: A Comparable Spoken Algerian Corpus Harvested from YouTube
This presentation addresses the comparability of comments extracted from YouTube. The comments are in spoken Algerian, which may be local Arabic, Modern Standard Arabic or French. This diversity of expression raises many data-processing problems. Several alignment methods are proposed and tested. The best alignment is obtained with a Word2Vec-based approach applied iteratively; these repeated calls to Word2Vec significantly improve the comparability results. A dictionary-based approach yields a recall of 4%, whereas our approach reaches a recall of 33% at rank 1. With this approach, we built CALYOU, a comparable corpus of spoken Algerian harvested from YouTube.
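The abstract does not detail how Word2Vec is applied iteratively, so the sketch below shows only a single, non-iterative alignment step under stated assumptions: each comment is represented by the average of its word vectors, the vectors are assumed to live in a shared cross-lingual space, candidates are ranked by cosine similarity, and the ranking is scored with recall at rank k (assuming the reported recalls are percentages). All function names and the toy embeddings are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sentence_vector(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of known tokens (zero vector if none are known)."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank_candidates(source: Sequence[str], candidates: Sequence[Sequence[str]],
                    emb: Dict[str, Vector], dim: int) -> List[int]:
    """Indices of candidate comments, most similar to the source first."""
    s = sentence_vector(source, emb, dim)
    scores = [cosine(s, sentence_vector(c, emb, dim)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)


def recall_at_k(rankings: Sequence[Sequence[int]], gold: Sequence[int], k: int = 1) -> float:
    """Percentage of sources whose gold counterpart appears in the top k."""
    if not gold:
        return 0.0
    hits = sum(1 for r, g in zip(rankings, gold) if g in r[:k])
    return 100.0 * hits / len(gold)


# Toy embeddings in a shared French/English space (made-up values).
emb = {"chat": [1.0, 0.0], "cat": [1.0, 0.1], "chien": [0.0, 1.0], "dog": [0.1, 1.0]}
sources = [["chat"], ["chien"]]
candidates = [["dog"], ["cat"]]
rankings = [rank_candidates(s, candidates, emb, dim=2) for s in sources]
print(rankings)                            # [[1, 0], [0, 1]]
print(recall_at_k(rankings, gold=[1, 0]))  # 100.0
```

An iterative variant would, for instance, feed confidently aligned pairs back into embedding training before re-ranking, but that loop is an assumption and is not shown.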
10h45-11h15 : A. Menacer, K. Smaïli, D. Jouvet, D. Fohr, O. Mella and D. Langlois (Loria)
Towards an Arabic speech recognition system for Algerian dialect
This presentation addresses the problem of Arabic dialects in speech recognition. The Arabic language has multiple variants, including Modern Standard Arabic (MSA), which is not the native form for Arab people; their mother tongue is the local Arabic, referred to below as the dialect.
We investigate the spoken language of Algeria, which differs significantly from MSA since it is influenced by Arabic, French, Berber and Turkish. Therefore, its acoustic and language models differ from those used for MSA. Another issue with the Algerian dialect is the lack of resources, which leads to poorly estimated models in Automatic Speech Recognition (ASR). We start by building a state-of-the-art MSA ASR system. This system is evaluated on MSA data and then applied to spoken Algerian. Performance on MSA is good (a word error rate, WER, of 14.02%), but it collapses on the Algerian dialect. We discuss some ideas for improving the results on the dialect and present some results.
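The WER figure above is the standard word-level edit distance metric. A minimal sketch of how it is computed, assuming whitespace tokenization and the usual convention of dividing the number of substitutions, deletions and insertions by the reference length; the example sentences are made up.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER in percent: (substitutions + deletions + insertions) / len(reference)."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference[i-1]
                curr[j - 1] + 1,     # insertion of hypothesis[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[m] / n


ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat".split()
print(round(word_error_rate(ref, hyp), 2))  # 33.33 (one substitution, one deletion)
```

Because insertions are counted, WER can exceed 100% when the recognizer outputs many spurious words.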
11h15-11h30 : Break
Session 2: Security and Mobile Communication (Chair: M. Ghogho)
11h30 – 12h00 : Abdelkader Lahmadi (Loria)
Security monitoring and analytics for Networked Systems
A large amount of monitoring data is generated by every component of networked systems and also gathered from threat intelligence sources (darknets, network telescopes and honeypots). Extracting useful patterns from these data for security monitoring and prediction is a challenging task. In this talk, we will present some techniques (Hidden Markov Models, Topological Data Analysis and process mining) that we use to aggregate and correlate network monitoring data in different application domains (darknets, advanced persistent threats, industrial control systems), in order to build attack models and extract their respective patterns.
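To make the HMM ingredient concrete: one common use of an HMM on monitoring data is to score how likely an observed event sequence is under a model, for example to flag sequences that a model of normal behaviour explains poorly. The sketch below is the textbook forward algorithm with entirely hypothetical states, observation symbols and probabilities; it is not the speaker's actual model, and it omits the rescaling needed for long sequences, where products of probabilities underflow.

```python
from typing import Dict, Sequence

Prob = Dict[str, float]


def forward_likelihood(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Prob,
    trans_p: Dict[str, Prob],
    emit_p: Dict[str, Prob],
) -> float:
    """P(observations | HMM), computed with the forward algorithm."""
    if not observations:
        raise ValueError("need at least one observation")
    # alpha[s] = P(o_1..o_t, hidden state at time t = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model of a monitored host: "normal" vs "attack",
# observing coarse traffic levels ("low" / "high").
states = ["normal", "attack"]
start_p = {"normal": 0.9, "attack": 0.1}
trans_p = {"normal": {"normal": 0.95, "attack": 0.05},
           "attack": {"normal": 0.2, "attack": 0.8}}
emit_p = {"normal": {"low": 0.8, "high": 0.2},
          "attack": {"low": 0.3, "high": 0.7}}
print(round(forward_likelihood(["low", "high"], states, start_p, trans_p, emit_p), 4))  # 0.18
```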
12h00-12h30 : Ghita Mezzour (UIR)
NATO-funded project ThreatPredict
Predicting attacks can help prevent them or at least reduce their impact. However, the few existing papers on attack prediction either make accurate predictions only hours in advance or cannot predict geopolitically motivated attacks. This project aims to predict different attack types days in advance. We will develop machine-learning algorithms that capture the spatio-temporal dynamics of cyber-attacks and of global social, geopolitical and technical events. We will use various datasets, including honeypot data, Symantec WINE field data, GDELT, Twitter and vulnerability databases. In addition to warning about attacks, this project will improve our understanding of the effect of global events on cyber-security.
12h30 – 14h00 : Lunch
14h00 – 14h30 : J.Y. Marion (Loria)
A morphological approach to detect code similarities and to analyse x86 binaries
Binary code analysis is a complex process that, nowadays, only skilled cybersecurity experts can perform, and their workload keeps increasing. Use cases include vulnerability detection, testing, clustering and classification, malware analysis, etc. We develop a tool named Gorille, which is based on reconstructing high-level semantics for binary code. Control flow graphs provide a suitable level of abstraction for dealing with the binary code they represent. After applying graph rewriting rules to normalize these graphs, our software tackles the subgraph search problem in a way that is both efficient and well suited to this kind of graph. This technique is called morphological analysis because it recognizes the whole shape of the malware.
That being said, some pitfalls still need to be considered. First, the output can only be as good as the input data, and static disassembly is known to be unable, in general, to produce a perfect control flow graph, since this problem is undecidable. In fact, malware heavily uses obfuscation techniques such as opaque predicates to hide its payload and confuse analyses. Dynamic analysis should therefore be used along with static disassembly to combine their strengths. Another dangerous pitfall, feared by every expert, is the false positive rate: false alarms make analysts waste precious time assessing whether a threat is real. Shared binary code is not always relevant, since many programs statically embed standard libraries. Gorille’s solution to this issue lies in graph rewriting: by rewriting classic subgraphs into configuration-based special nodes, we even obtain a higher-level abstraction of the control flow graph.
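The graph normalization step can be illustrated with one textbook rewriting rule. The sketch below merges straight-line chains of basic blocks (a block with a single successor whose successor has a single predecessor), so that equivalent code split differently into blocks yields the same graph shape. This rule is an assumption for illustration, not Gorille's actual rule set, which the abstract does not detail; the function name, block names and instructions are hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # basic block -> successor blocks
Code = Dict[str, List[str]]   # basic block -> instructions


def merge_linear_chains(succ: Graph, code: Code, entry: str) -> Tuple[Graph, Code]:
    """Rewrite rule: fuse u -> v when u has one successor and v has u as its only predecessor.

    Every block must appear as a key of `succ`. The inputs are not modified.
    """
    succ = {n: list(s) for n, s in succ.items()}
    code = {n: list(c) for n, c in code.items()}
    changed = True
    while changed:
        changed = False
        preds: Dict[str, List[str]] = {n: [] for n in succ}
        for n, targets in succ.items():
            for t in targets:
                preds[t].append(n)
        for u in list(succ):
            if len(succ[u]) != 1:
                continue
            v = succ[u][0]
            if v != u and v != entry and preds[v] == [u]:
                code[u].extend(code.pop(v))  # append v's instructions to u
                succ[u] = succ.pop(v)        # u inherits v's outgoing edges
                changed = True
                break  # predecessor lists are now stale; recompute them
    return succ, code


cfg = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["E"], "E": []}
instrs = {"A": ["push ebp"], "B": ["mov ebp, esp", "cmp eax, 0", "jz D"],
          "C": ["inc eax"], "D": ["dec eax"], "E": ["ret"]}
g, c = merge_linear_chains(cfg, instrs, entry="A")
print(g)       # {'A': ['C', 'D'], 'C': ['E'], 'D': ['E'], 'E': []}
print(c["A"])  # ['push ebp', 'mov ebp, esp', 'cmp eax, 0', 'jz D']
```

Blocks C and D are not merged into E because E has two predecessors, so the branch structure, which carries the "shape", is preserved.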
14h30 – 15h00 : B. Honnit, A. Tamtaoui, M.N. Saidi (INPT – INSEA)
Moving object detection and classification in video surveillance
Due to rising crime and terrorism, more attention is being paid to security systems: more surveillance cameras are deployed, and research is funded to develop intelligent systems for moving object detection, tracking and recognition in video surveillance.
The aim of our work is to propose an approach to detect moving objects, classify the detected objects (human or vehicle) and recognize the classified objects.
Moving object detection is a major step in video analysis; however, it is a challenging task because of complex backgrounds, camera motion, object size variation, poorly textured objects, illumination conditions and shadows. During our research, we proposed a hybrid approach for moving object detection. It is based on motion and edge detection techniques and uses the three most recent consecutive frames to detect moving areas. The experimental results show the efficiency of our approach, with an accuracy rate of 92.49%. To classify the detected objects, we used an SVM after computing shape descriptors (Fourier descriptors and Zernike moments), and the average correct classification rate was 98%.
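The motion part of the approach above relies on three consecutive frames. A minimal sketch of classical three-frame differencing on plain grayscale arrays follows; it is an illustration under assumptions, not the authors' hybrid method: the edge-detection component is omitted, and the threshold of 25 and the function name are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale image, values 0..255


def three_frame_difference(prev: Frame, curr: Frame, nxt: Frame,
                           threshold: int = 25) -> List[List[bool]]:
    """Moving-pixel mask for the middle frame: a pixel is marked as moving only
    if it changed both between prev and curr and between curr and nxt."""
    h, w = len(curr), len(curr[0])
    for f in (prev, curr, nxt):
        if len(f) != h or any(len(row) != w for row in f):
            raise ValueError("frames must have the same size")
    return [
        [abs(curr[y][x] - prev[y][x]) > threshold
         and abs(nxt[y][x] - curr[y][x]) > threshold
         for x in range(w)]
        for y in range(h)
    ]


# A bright one-pixel object moving right across a dark 1x4 strip.
f1 = [[200, 0, 0, 0]]
f2 = [[0, 200, 0, 0]]
f3 = [[0, 0, 200, 0]]
print(three_frame_difference(f1, f2, f3))  # [[False, True, False, False]]
```

Combining the two differences with a logical AND suppresses the "ghosts" that a single frame difference leaves at the object's previous and next positions (pixels 0 and 2 here).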
Since classification accuracy depends on the shape descriptor, which in some cases does not give relevant results, we are currently working on performing the classification with information fusion techniques; in our case, we are looking for a method to combine the different descriptors.
Recent studies have shown that classification is more effective with deep learning algorithms based on Convolutional Neural Networks (CNNs), so we are also working on applying a CNN-based algorithm in the classification stage.
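One simple family of fusion methods combines descriptors at the score level. The sketch below is an assumption, not the authors' chosen method: it supposes one classifier per descriptor (for instance an SVM on Fourier descriptors and another on Zernike moments) that outputs class probabilities, averages them with weights, and picks the highest-scoring class. The function name, weights and scores are hypothetical; feature-level fusion (concatenating normalized descriptors before a single SVM) is the usual alternative.

```python
from typing import Dict, Sequence, Tuple


def fuse_scores(per_descriptor: Sequence[Dict[str, float]],
                weights: Sequence[float]) -> Tuple[str, Dict[str, float]]:
    """Weighted average of per-descriptor class scores; returns (best class, fused scores)."""
    if not per_descriptor or len(per_descriptor) != len(weights):
        raise ValueError("need one weight per descriptor")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    classes = per_descriptor[0].keys()
    fused = {c: sum(w * s[c] for w, s in zip(weights, per_descriptor)) / total
             for c in classes}
    return max(fused, key=fused.get), fused


# Hypothetical class probabilities from two classifiers, one per shape descriptor.
fourier = {"human": 0.4, "vehicle": 0.6}
zernike = {"human": 0.8, "vehicle": 0.2}
label, scores = fuse_scores([fourier, zernike], weights=[1.0, 1.0])
print(label)  # human
```

The weights could, for example, reflect each descriptor's validation accuracy, so that a descriptor that is unreliable for a given scene contributes less.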
15h00 – 15h30 : V. Varma, D. Bonilla, S. Lasaulce, J. Daafouz, M. Ghogho (CRAN – UIR)
Trajectory planning for energy-efficient vehicles with communications constraints
We introduce a new problem: optimizing the trajectory of a wireless mobile terminal under a given communication constraint. The mobile (or vehicle) has to move from a given starting point to a target point while uploading or downloading a given amount of data; this contrasts with the classical mobile communications paradigm, in which communication and motion are assumed to be independent. To meet both objectives, the mobile has to move sufficiently close to the wireless base station while accounting for the energy cost of its motion. This setup is formalized here and leads to non-trivial trajectories for the mobile. Remarkably, a counterpart of the Snell-Descartes law of light refraction is exhibited for the optimal trajectory when the mobile crosses zones with different available data rates. Finally, possible extensions to the multi-agent case are discussed.
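The refraction analogy can be seen in a simplified setting. As an illustration only, assume (hypothetically) that crossing zone $k$ costs a constant $c_k$ per unit distance, standing in for the combined motion-energy and communication cost, and that the mobile travels from $A = (0, a)$ in zone 1 to $B = (d, -b)$ in zone 2, crossing the boundary $y = 0$ at $(x, 0)$. With $\theta_k$ the angle between the path in zone $k$ and the normal to the boundary, minimizing the total cost over $x$ gives a Snell-Descartes-type condition:

```latex
\begin{aligned}
J(x) &= c_1\sqrt{x^2 + a^2} + c_2\sqrt{(d - x)^2 + b^2},\\
\frac{dJ}{dx} &= c_1\,\frac{x}{\sqrt{x^2 + a^2}} - c_2\,\frac{d - x}{\sqrt{(d - x)^2 + b^2}} = 0,\\
\sin\theta_1 &= \frac{x}{\sqrt{x^2 + a^2}}, \qquad
\sin\theta_2 = \frac{d - x}{\sqrt{(d - x)^2 + b^2}}
\;\Longrightarrow\; c_1 \sin\theta_1 = c_2 \sin\theta_2 .
\end{aligned}
```

This mirrors $n_1 \sin\theta_1 = n_2 \sin\theta_2$ in optics, with the per-distance cost $c_k$ playing the role of the refractive index; the exact counterpart derived in the talk, where costs depend on the available data rates, may take a different form.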
15h30 – 15h45 : Coffee break
16h00 – 17h00 : Discussions