SoBigData Academy

SoBigData promotes an open innovation culture in Big Data and AI, offering educational resources to support responsible data science. You’ll find engaging and flexible MOOCs designed for a relaxed yet captivating learning experience. 

Challenge yourself with interactive “play to learn” games, and earn certifications to build your professional portfolio.


Start Learning

Basic Python (difficulty level 1)

Database (difficulty level 1)

Data Analysis (difficulty level 2)

Information Retrieval (difficulty level 3)

Data Mining & Machine Learning (difficulty level 3)

Data Analysis with Spark (difficulty level 2)

Data Theory and Society (difficulty level 1)

Complex Network Analysis for modeling socioeconomic systems (difficulty level 2)





Big Data Storage (GEANT, difficulty level 1)

Elasticsearch (GEANT, difficulty level 1)

GitHub (GEANT, difficulty level 1)

JSON (GEANT, difficulty level 1)

XML (GEANT, difficulty level 1)

YAML (GEANT, difficulty level 1)





Artificial Intelligence (available early 2025, difficulty level 2)

Data Visualization and Storytelling (available early 2025, difficulty level 2)

Neural Networks & Deep Learning (available early 2025, difficulty level 2)

Reinforcement Learning Theory and Practice (available early 2025, difficulty level 2)

Text Analytics (available early 2025, difficulty level 2)

Info on SoBigData

SoBigData RI is a distributed, Pan-European, multi-disciplinary research infrastructure that uses social mining and big data to understand the complexity of our contemporary, globally interconnected society. The RI is built on a common “digital laboratory” of agreed services and tools. SoBigData brings together researchers and research organizations that are independent of commercial interests and provides tools for data access as well as ethical and legal assessment with the consultancy of experts in a multi-disciplinary framework. 

Visit our website (www.sobigdata.eu) to discover all the services provided by SoBigData as well as our catalogue rich with datasets, methods, and technologies for Data Science. Explore our guidelines on EU Ethical, Legal, Social, Economic, and Cultural (ELSEC) values for data scientists. The FAIR (Findable, Accessible, Interoperable, and Reusable) and FACT (Fair, Accurate, Confidential, and Transparent) principles are key aspects of our research and tools. The infrastructure supports innovation and cutting-edge science in collaboration with a network of experts, research institutions, and industries. 

This section hosts training materials developed within the SoBigData project. To access a training material, click its title.

Courses and Lectures

The following resources are standalone courses organised within the SoBigData consortium.

GATE Course

A course with lectures, hands-on sessions and exercises on GATE, open-source software for solving text processing problems. Topics include the GATE developer GUI; JAPE, GATE’s pattern language for rule-based text processing; GATE’s use in social media; crowdsourcing using GATE; GATE Cloud; GATE’s search engine Mimir; and machine learning in GATE.

SOS Online abuse of politicians

A video presentation describing the GATE team’s work on online abuse of UK politicians. The presentation emphasises ethics and the social implications of online abuse, and provides a link to the dataset on the SoBigData website.

Data Mining and Machine Learning for Social Science

An introductory course on data mining and machine learning for social science. The course presents typical data mining and machine learning techniques through a variety of social science examples.

Visual Analytics for Data Scientists

This module focuses on the principles and rules of visual data representation and human-computer interaction, visual analytics methods and systems, structure and propriety evaluation analysis, and the combination of visualisation, interactive techniques and computational processing.

Business and Data Analytics Course

This course gives users hands-on experience in solving business problems in sales, marketing, and business operations by applying statistical analysis and data mining techniques.

Introduction to Data Curation

This course provides an introduction to data collection, data preparation & transformation and data analysis, and is specifically designed for PhD students.

Master in Big Data Analytics and Social Mining

The SoBigData Master Program training materials were developed within the Post-Graduate Master in Big Data Analytics and Social Mining held at the University of Pisa, Italy.

The following modules make up the Master Program.

Database Module

This module introduces database analysis, focusing on DBMS architecture, relational models, the SQL language and nested SQL queries.

Data Journalism and Storytelling Module

This module focuses on communicating knowledge extracted from Big Data through multimedia storytelling. It also showcases some of the most meaningful experiences of data journalism and storytelling.

Data Management for Business Intelligence Module

This module introduces the storage and management of information in support of organisations' business decisions.

Data Mining and Machine Learning Module

This module introduces the basic concepts of data mining and the knowledge extraction process, covering analytical models and algorithms for clustering, classification and pattern discovery, with reference to Big Data sources.

Data Visualisation and Visual Analytics Module

This module provides insight into designing an effective data visualisation. It also focuses on visual variables and provides an introduction to D3.js, along with case studies and examples.

High Performance and Scalable Analytics Module

A comprehensive module providing an overview of Social Mining and Big Data in different contexts, such as Mobility, Transactional, Network and Sport data. It presents general concepts of the most widely used Big Data technologies, as well as an overview of the basic components of a Big Data Laboratory.

Information Retrieval Module

This module provides insight into the design and analysis of Information Retrieval systems that efficiently and effectively process, mine, search, cluster and classify Big Data document collections drawn from textual and other unstructured domains.

Text Analytics and Opinion Mining Module

This module offers insight into general text mining problems and methods, demonstrating situations in which sentiment analysis can solve information processing needs and teaching the correct application of sentiment analysis methods, tools and resources.

Social Network Analysis Module

This module introduces theories, concepts and measures of Social Network Analysis (SNA), aimed at characterizing the structure of large-scale Online Social Networks (OSNs). The course combines lectures that introduce the theoretical concepts with hands-on sessions.

Hands-on Courses and Tutorials

The following resources consist of step-by-step tutorials and courses.

Archive Crawling

A tutorial on extracting event-centric document collections from large-scale Web archives.

Archive Spark

An Apache Spark framework for easy data processing, extraction and derivation for archival collections. Originally developed for use with Web archives, it has now been extended to support any archival dataset through Data Specifications.

Interactive Training Environments

A variety of data science materials based on R and Python. These training materials include an RStudio Docker image; a VirtualBox appliance that includes all the required R packages; and Swirl courses for supporting teaching at the Department of Digital Humanities at KCL.

KCL Jupyter Notebooks

Complete stories around Jupyter Notebooks that form easy recipes for reproducible methods in social data science. This training material comprises an Apache Spark teaching and experimentation environment divided into five main topics: Historical Cultures; Prediction Modelling; Social and Cultural Communities; Social Sensing; and Visual Arts.

Introduction to Data Science for Social Scientists

This course, initially aimed at social scientists, covers topics such as Python programming, Data Cleaning and Transformation, Classification, Clustering and Frequent Pattern Mining.

Efficiency/Effectiveness Trade-offs in Learning to Rank

This tutorial provides an 'Introduction to Learning to Rank' and focuses on 'Dealing with the Efficiency/Effectiveness trade-off'. It also includes two hands-on sessions.

Social Network Analysis with Python

This tutorial, based around Jupyter notebooks, provides an introduction to NetworkX; a focus on NDlib, the Network Diffusion Library; a focus on NDlib-REST for remote network diffusion experiments; and a final focus on Community Discovery.