Workshop on Big Data, IoT & Smart Cities

The LIA DataNet (CNRS) and the TICLab at UIR organized the first "Workshop on Big Data, IoT & Smart Cities" (ITCities’18, www.itcities.org) on 2 July 2018 at UIR.
In the era of the digital revolution, a smart city is one that improves its citizens’ quality of life while meeting sustainable development goals. Whether the issue is urban pollution, energy efficiency or transport management, cities are exposed to many risks and face new challenges.

The event brought together more than 120 researchers, academics, industry professionals and experts to discuss, exchange ideas, and present the latest innovations, perspectives and challenges of the smart city and Big Data. It also gave TICLab students and researchers the opportunity to showcase their research projects and to talk with participants.
Renowned speakers gave talks throughout the day on topics related to Big Data, IoT and smart cities.

A panel on "Which smart city model for Morocco?" sparked much discussion between the speakers and the participants.

Two competitions were also on the programme:

The first, entitled "My thesis in 3 minutes", invited PhD students to present, clearly and concisely, their research related to the theme of the workshop. The second was a data visualization competition, "Data Visualization", which gave participants a dataset to process in order to extract its main characteristics. Two poster sessions were also organized by the TICLab PhD students to present their research projects and prototypes and to highlight their contributions.

The day closed with the award ceremony for the competition winners: for "My thesis in 3 minutes", Ihsane Gryech of UIR (1st prize) and Soukaina Iherri of ENSEM (2nd prize); for "Data Visualization", Ayoud Rmidi of ENSIAS (1st prize) and the team of Hamza Ettaki, Ahmed Taha Moumen and Anas El Baghdadi of UIR (2nd prize).

Programme of the LIA DataNet day, Tuesday 2 May 2017

9h15 : Welcome

9h30-9h45 : Round-table introductions and opening of the seminar

Session 1: Social Network Data and Informal Languages (Chair: K. Smaïli)

9h45-10h15 : Nada Sbihi, Ihsane Gryech and Mounir Ghogho (UIR)

Leveraging user intuition to predict item popularity in online social networks
We investigate the problem of early prediction of item popularity in online social networks. Prior work has shown that the time taken by each item to reach i adopters (i being a small number, around 5) has a higher predictive power than other non-temporal features, such as those related to the characteristics of the adopters. Here, we challenge this finding by proposing a new feature, based on the users’ intuitions, which is shown to provide significantly better predictive power for the most popular items than the above-mentioned temporal feature. A GoodReads dataset is used to illustrate the merits of the proposed method.
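As an illustration only, the temporal baseline feature mentioned in this abstract (the time an item takes to reach i adopters) could be computed as in the minimal sketch below. The function name `time_to_reach`, the choice to measure from the first adoption, and the sample timestamps are assumptions made for this example; the abstract does not specify them.

```python
from datetime import datetime, timedelta


def time_to_reach(adoption_times, i=5):
    """Seconds elapsed between the first adoption and the i-th adoption.

    Assumption: time is measured from the first adoption of the item.
    Returns None when the item has fewer than i adopters.
    """
    if i < 1:
        raise ValueError("i must be at least 1")
    ordered = sorted(adoption_times)
    if len(ordered) < i:
        return None
    return (ordered[i - 1] - ordered[0]).total_seconds()


# Hypothetical adoption timestamps for one item (minutes after the first adoption).
start = datetime(2017, 1, 1)
sample_times = [start + timedelta(minutes=m) for m in (0, 2, 3, 10, 30, 45)]
feature = time_to_reach(sample_times, i=5)  # 30 minutes, expressed in seconds
```

A smaller value means the item gathered its first adopters faster, which is the signal the prior work relies on.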

10h15-10h45 : K. Abidi and K. Smaïli (Loria)

CALYOU: A Comparable Spoken Algerian Corpus Harvested from YouTube
This presentation addresses the comparability of comments extracted from YouTube. The comments are in spoken Algerian, which may be local Arabic, Modern Standard Arabic or French. This diversity of expression raises a large number of data-processing problems. Several alignment methods will be proposed and tested. The method that aligns best is a Word2Vec-based approach applied iteratively. This repeated use of Word2Vec significantly improves the comparability results: a dictionary-based approach leads to a Recall of 4, while our approach achieves a Recall of 33 at rank 1. Thanks to this approach, we built CALYOU, a comparable corpus of spoken Algerian, from YouTube.

10h45-11h15 : A. Menacer, K. Smaïli, D. Jouvet, D. Fohr, O. Mella and D. Langlois (Loria)

Towards an Arabic speech recognition system for Algerian dialect
This presentation addresses the problem of Arabic dialects in speech recognition. The Arabic language is characterized by multiple variants, including Modern Standard Arabic (MSA), which is not the native form for Arab people. Their mother tongue is the local Arabic, referred to as dialect in the following.
We investigate the issue of spoken language in Algeria, which differs significantly from MSA since it is influenced by Arabic, French, Berber and Turkish. Therefore the acoustic and language models will differ from those used for MSA. Another issue with the Algerian dialect is the lack of resources, which leads to poorly estimated models in the Automatic Speech Recognition (ASR) system. We start by building a state-of-the-art MSA ASR system. This system is evaluated on MSA data and then applied to spoken Algerian. The performance on MSA is good (a WER of 14.02), but on the Algerian dialect the performance collapses. We discuss some ideas for improving the results on the dialect and present some results.
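For readers unfamiliar with the metric quoted in this abstract, the word error rate (WER) is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition, assuming the reported figure is a percentage; the function name and the example sentences are illustrative and do not come from the authors’ system.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate in percent, using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution and one deletion over four reference words.
example_wer = word_error_rate("the cat sat down", "the dog sat")
```

Because insertions are counted, the WER can exceed 100 when the hypothesis is much longer than the reference.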

11h15-11h30 : Break

Session 2: Security and Mobile Communication (Chair: M. Ghogho)

11h30 – 12h00 : Abdelkader Lahmadi (Loria)

Security monitoring and analytics for Networked Systems
A large amount of monitoring data is generated by every component of networked systems and also gathered from threat intelligence sources (darknets, network telescopes and honeypots). Extracting useful patterns from these data for security monitoring and prediction is a challenging task. In this talk, we will present some techniques (Hidden Markov Models, Topological Data Analysis and process mining) that we are using to aggregate and correlate network monitoring data for different application domains (darknets, advanced persistent threats, industrial control systems) to build attack models and extract their respective patterns.

12h00-12h30 : Ghita Mezzour (UIR)

NATO-funded project ThreatPredict
Predicting attacks can help prevent these attacks or at least reduce their impact. However, the few papers on attack prediction make accurate predictions only hours in advance or cannot predict geo-politically motivated attacks. This project aims to predict different attack types days in advance. We will develop machine-learning algorithms that capture spatio-temporal dynamics of cyber-attacks and global social, geo-political and technical events. We will use various datasets including honeypot data, Symantec WINE field data, GDELT, Twitter, and vulnerability databases. In addition to warning about attacks, this project will improve our understanding of the effect of global events on cyber-security.

12h30 – 14h00 : Lunch

14h00 – 14h30 : J.Y. Marion (Loria)

A morphological approach to detect code similarities and to analyse x86 binaries
Binary code analysis is a complex process which can be performed nowadays only by skilled cybersecurity experts whose workload keeps increasing. Use cases include vulnerability detection, testing, clustering and classification, malware analysis, etc. We develop a tool named Gorille, which is based on the reconstruction of high-level semantics for the binary code. Control flow graphs provide a fair level of abstraction to deal with the binary code they represent. After applying some graph rewriting rules to normalize these graphs, our software tackles the subgraph search problem in a way which is both efficient and convenient for that kind of graph. This technique is described as morphological analysis as it recognizes the whole shape of the malware.
That being said, some pitfalls still need to be considered. First of all, the output can only be as good as the input data, and it is known that static disassembly cannot produce a perfect control flow graph since this problem is undecidable. As a matter of fact, malware makes heavy use of obfuscation techniques such as opaque predicates to hide its payload and confuse analyses. Dynamic analysis should then be used along with static disassembly to combine their strengths. Another dangerous pitfall feared by every expert is the false positive rate: false alarms make them waste precious time assessing the reality of the threat. Shared binary code is not always relevant, as many programs embed standard static libraries. Gorille’s solution to this issue lies in graph rewriting. By rewriting classic subgraphs into configuration-based special nodes, we obtain an even higher abstraction of the control flow graph.

14h30 – 15h00 : B. Honnit, A. Tamtaoui, M.N. Saidi (INPT – INSEA)

Moving object detection and classification in video surveillance
Due to the rise in crime and terrorism, the world is paying more attention to security systems, deploying surveillance cameras and funding research on intelligent systems for moving object detection, tracking and recognition in video surveillance.
The aim of our work is to propose an approach to detect the moving object, classify the detected object (human or vehicle) and recognize the classified object.

Moving object detection is a major step in video analysis; however, it is a challenging task for researchers for the following reasons: complex backgrounds, camera motion, object size variation, poorly textured objects, illumination conditions and shadows. During our research we proposed a hybrid approach for moving object detection. It is based on motion and edge detection techniques and makes use of the three most recent consecutive frames to detect the moving area. The experimental results show the efficiency of our approach, with an accuracy rate of 92.49%. To classify the detected objects, we used an SVM after computing the shape descriptors (Fourier descriptors and Zernike moments), and the average correct classification rate was 98%.
Since the classification accuracy depends on the shape descriptor, which in some cases does not give relevant results, we are currently working on improving the classification using information fusion techniques. In our case, we try to find a method to combine the different descriptors.
Recent studies have shown that classification results are better when using a deep learning algorithm based on Convolutional Neural Networks (CNN), so we are also working on applying a CNN-based algorithm in the classification stage.
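The abstract above mentions detecting the moving area from three consecutive frames. A common way to do this is classical three-frame differencing, sketched below on small grayscale frames stored as nested lists. This is only an assumed illustration of the motion part: the authors’ hybrid method also uses edge detection, and the function name, the threshold value and the toy frames are hypothetical.

```python
def three_frame_motion_mask(prev_frame, curr_frame, next_frame, threshold=25):
    """Binary motion mask from three consecutive grayscale frames.

    A pixel is marked as moving (1) when it differs from BOTH the previous
    and the next frame by more than `threshold`; otherwise it is 0.
    Frames are 2D lists of integers with identical dimensions.
    """
    mask = []
    for row_prev, row_curr, row_next in zip(prev_frame, curr_frame, next_frame):
        mask.append([
            1 if abs(c - p) > threshold and abs(n - c) > threshold else 0
            for p, c, n in zip(row_prev, row_curr, row_next)
        ])
    return mask


# Toy example: a bright pixel moving one step to the right per frame.
frame_a = [[200, 0, 0, 0, 0]]
frame_b = [[0, 200, 0, 0, 0]]
frame_c = [[0, 0, 200, 0, 0]]
motion = three_frame_motion_mask(frame_a, frame_b, frame_c)
```

Requiring a change against both neighbouring frames keeps only the object’s position in the middle frame and suppresses the trail that simple two-frame differencing leaves behind.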

15h00 – 15h30 : V. Varma, D. Bonilla, S. Lasaulce, J. Daafouz, M. Ghogho (CRAN – UIR)

Trajectory planning for energy-efficient vehicles with communications constraints
A new problem of optimizing the trajectory of a wireless mobile terminal under a given communication constraint is introduced. The mobile or vehicle has to move from a given starting point to a target point while uploading/downloading a given amount of data; this contrasts with the classical mobile communications paradigm, where the communication and motion aspects are assumed to be independent. To reach the two aforementioned objectives, the mobile has to move sufficiently close to the wireless base station, while accounting for the energy cost due to its motion. This setup is formalized here and leads us to determine non-trivial trajectories for the mobile. Remarkably, a counterpart of the Snell-Descartes law for light propagation is exhibited for the optimal trajectory of the mobile when the latter crosses zones in which the available data rates are different. Finally, possible extensions to the multi-agent case are discussed.

15h30 – 15h45 : Coffee break

16h00 – 17h00 : Discussions

Programme of the LIA kick-off meeting

Day of 16 September 2015

8h45 : Welcome at UIR

9h00 – 9h30 : Opening of the seminar by the heads of the various institutions

        UL : Karl Tombre
        UIR : Noureddine Mouaddib/ Abdelaziz Benjouad
        CRAN : Didier Wolf
        LORIA : Jean-Yves Marion

1- Research track

9h30-12h00 : Big data theme: methods, corpora and applications.
9h30-9h50 : Analyzing big data from an international perspective (Ghita Mezzour, UIR)
9h50-10h10 : MDEO System for Environmental Big Data Acquisition and Processing (Chaker El Amrani, UAE)
10h10-10h30 : High performance computing for Big Data in the Cloud (Riduan Abid, AUI)

10h30-11h00 : Break

11h00-11h30 : Signal and Health (Didier Wolf, CRAN)
11h30-12h00 : Preference queries and preference data mining: an overview (Chedy Raissi, Loria)

12h00-14h00 : Lunch

14h00-17h00 : Large-scale networks: methods, corpora and applications
14h00-14h20 : Bio-inspired approaches for engineering adaptive systems (Mohamed Bakhouya, UIR)
14h20-14h40 : Empowering communication networks with big data analytics (Mounir Ghogho, UIR)
14h40-15h20 : Control, networks and energy (Jamal Daafouz, CRAN)

15h20-15h50 : Break

16h00-16h10 : MoreSolar project – Monitoring of solar farms (Mounir Ghogho, UIR)
16h10-16h20 : CASA-NET project – Energy efficiency in buildings (Mohamed Bakhouya, UIR)
16h20-16h30 : GTR project – Traffic prediction (Nada Sbihi, UIR)
16h30-17h00 : Modelling and simulation of smart grids with the MECSYCO tool (Vincent Chevrier, Loria)

Day of 17 September 2015

9h00-10h20 : Processing of web information.

9h00-9h40 : Opinion mining (Houda Benbrahim, ENSIAS)
9h40-10h00 : Processing informal data: application to the processing of some Arabic dialects (Kamel Smaïli, Loria)
10h00-10h20 : Subjectivity and manipulation: some perspectives for recommender systems and learning analytics (Geoffray Bonnin, Loria)

10h20-11h00 : Review of research activities

11h00-11h30 : Break

2- Teaching track

11h30-12h30 : Joint teaching initiatives
12h30 : End of the kick-off