Topic outline
- WELCOME MESSAGE
Welcome to the 3rd Series of Computer Science Courses.
My name is Professor Francisca Oladipo.
- STUDY UNIT 1: STATISTICAL THINKING
Statistical thinking aligns one's reasoning with the fundamental principles of statistics to support better decisions under uncertainty. In other words, understanding statistics is important for anyone who wants to make good decisions, since it applies to every field of human activity.
With an understanding of basic statistical methods, you will know when to apply the right tool to a given problem and how to think statistically. This course will help you understand those statistical concepts and apply them to solve real-life problems.
Click on this link to access the ebook or readable module on Study Unit 1: Statistical Thinking
Upon completion of this study unit, you should be able to:
1.1 Employ population data, sample data, parameters, and statistics in the analysis of data
1.2 Use measures of central tendency to compute statistics from an observational study
1.3 Use measures of spread and partition to compute statistics from an observational study
1.4 Categorize data according to scales of measurement and use Pearson correlation to estimate the correlation among variables in the data
In this Lesson, the subtopics covered include terminologies, acronyms and their meanings, and population data versus sample data.
After you have read subsection 1.1 in the Study Unit 1 ebook/readable module, watch this video for a summary of data science.
This lesson covers measures of central tendency used in statistical analysis. The process of determining the mean, median, and mode in Python will also be discussed.
After you have read section 1.2 in your ebook/readable module, watch this summary video on what measures of central tendency are all about.
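The mean, median, and mode described above can be computed with Python's standard `statistics` module. The scores below are made-up sample data for illustration:

```python
import statistics

# Hypothetical sample: exam scores from an observational study
scores = [55, 60, 60, 72, 75, 80, 80, 80, 91]

mean = statistics.mean(scores)      # arithmetic average of all values
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```

Note how the three measures can disagree: an outlier pulls the mean, while the median and mode stay put.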
This lesson covers the definitions of spread and partition. The process of computing the range, variance, standard deviation, quartiles, and interquartile range in Python will also be discussed.
After you have read section 1.3 in your ebook/readable module, watch this summary video on what measures of spread and partition in data science are all about.
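As a sketch of the measures discussed in this lesson, the range, sample variance, standard deviation, quartiles, and interquartile range can all be computed with Python's standard `statistics` module (the data values here are made up for illustration):

```python
import statistics

data = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]

data_range = max(data) - min(data)    # spread between the extremes
variance = statistics.variance(data)  # sample variance
std_dev = statistics.stdev(data)      # sample standard deviation

# Quartiles partition the sorted data into four equal parts
q1, q2, q3 = statistics.quantiles(data, n=4)  # exclusive method by default
iqr = q3 - q1                                 # interquartile range

print(data_range, variance, std_dev, q1, q2, q3, iqr)
```

The interquartile range is often preferred over the plain range because it ignores the extreme values at both ends.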
Click on this link to join the online peer-to-peer discussion
NB: Participation is mandatory for all learners
Kindly attempt the question before the due date. Thank you
Lesson 4 is designed to cover the nominal, ordinal, interval, and ratio scales of measurement and the use of Pearson correlation to estimate the correlation among variables in the data. This lesson also covers the basics of performing correlation analysis using Python.
After you have read section 1.4 in the readable module, watch this video summarizing measurement scales and the computation of Pearson correlation analysis.
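Pearson's correlation coefficient introduced in this lesson can be computed directly from its definition in plain Python. The paired observations below are hypothetical (e.g. hours studied versus exam score):

```python
import math

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Pearson's r = covariance(x, y) / (std(x) * std(y))
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sx = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
sy = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
r = cov / (sx * sy)

print(r)  # r close to +1 indicates a strong positive linear relationship
```

Here y is exactly twice x, so r comes out to 1: a perfect positive linear relationship.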
- STUDY UNIT 2: MACHINE LEARNING
For years, we have wondered whether a computer could be made to learn and improve with experience; the impact would be dramatic. Imagine a world where a computer could learn the most effective treatments for new diseases from medical records, or learn which clients are likely to default on a loan. This study unit focuses on the development of models that can learn from data, so that analysts can derive useful information.
Click on this link to access the ebook or readable module on Study Unit 2: Theoretical Machine Learning
Upon completion of this study unit, you should be able to:
2.1 Explain the concept of machine learning and identify the differences between machine learning, artificial intelligence, and deep learning.
2.2 Identify different types of machine learning methods, the steps involved in using each method, and the classes of problems they solve
2.3 Discuss the concept of the Confusion Matrix and other performance evaluation metrics beyond the Confusion Matrix
In this Lesson, the subtopics covered include terminologies, acronyms, the definition of machine learning, and the differences between machine learning, artificial intelligence, and deep learning.
After you have read lesson 2 in the Study Unit 2 ebook/readable module, watch this video for an overview of machine learning.
In this Lesson, the subtopics covered include supervised and unsupervised learning, and the classes of problems solved by supervised and unsupervised machine learning.
After you have read lesson 2 in the Study Unit 2 ebook/readable module, watch this video for a summary of the different types of machine learning methods, the steps involved in using each method, and the classes of problems solved.
In this Lesson, the subtopics covered include the concept of the Confusion Matrix and other performance evaluation metrics beyond the Confusion Matrix.
After you have read lesson 3 in the Study Unit 2 ebook/readable module, watch this video for a summary of the concept of the Confusion Matrix and other performance evaluation metrics beyond the Confusion Matrix.
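The standard metrics derived from a confusion matrix can be sketched in a few lines of Python. The counts below are hypothetical true/false positives and negatives for a binary classifier:

```python
# Hypothetical binary confusion matrix counts
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)  # overall fraction correct
precision = tp / (tp + fp)                  # of predicted positives, how many are right
recall = tp / (tp + fn)                     # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)
```

Accuracy alone can mislead on imbalanced data, which is why precision, recall, and F1 are reported alongside it.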
- STUDY UNIT 3: TRAINING AND TESTING MACHINE LEARNING MODELS
Scikit-learn is a Python library that provides many supervised and unsupervised learning algorithms. It is built upon packages you are already familiar with, such as NumPy, pandas, and Matplotlib. With the scikit-learn module, you can train different machine learning models, such as regression and classification models, and check their performance using any of the metrics discussed in Unit 2.
Click on the link above to access the ebook/readable module on Study Unit 3
Upon completion of this study unit, you should be able to:
3.1 Train and evaluate classification models for predicting unknown categorical labels, and solve problems using exploratory data analysis techniques.
3.2 Train and evaluate regression models for predicting unknown continuous labels
3.3 Train and evaluate regression models for predicting unknown continuous labels, and select the best model among the trained models
3.4 Compete in machine learning and data science competitions on Kaggle
In this Lesson, the subtopics cover the functionality that scikit-learn provides, which includes regression, classification, clustering, model selection, preprocessing, and installation.
After you have read lesson 1 in the Study Unit 3 ebook/readable module, watch this video for a summary of the machine learning module on classification, and the use of the module for exploratory data analysis and data preprocessing.
In this Lesson, the subtopics cover model training and evaluation with models such as logistic regression, the Random Forest model, the Extreme Gradient Boosting (XGBoost) model, the Support Vector Machine (SVM), and other related models.
After you have read lesson 2 in the Study Unit 3 ebook/readable module, watch this video for a summary of model training and evaluation.
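A minimal sketch of the train-and-evaluate workflow covered in this lesson, assuming scikit-learn is installed. It uses the built-in Iris dataset rather than the course datasets, and logistic regression stands in for any of the classifiers listed above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a small built-in dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Train a logistic regression classifier
model = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```

The same `fit`/`predict` pattern applies unchanged to Random Forest, XGBoost, or SVM models; only the estimator class is swapped.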
In this Lesson, the subtopics cover a regression machine learning model built with a Kenyan restaurant dataset.
After you have read lesson 3 in the Study Unit 3 ebook/readable module, watch this video for a summary of importing Python modules, building a model, training it, and using the model for evaluation.
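A minimal regression sketch in the same spirit, assuming scikit-learn and NumPy are installed. Synthetic data stands in for the restaurant dataset used in the lesson:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic continuous-label data: y is roughly linear in X plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + 5.0 + rng.normal(0, 1, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a linear regression model on the training split
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the test split with regression metrics
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```

Comparing MSE or R² across several trained models on the same test split is the basis for selecting the best model, as objective 3.3 describes.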
In this Lesson, the subtopics cover machine learning competition platforms, focusing on Kaggle, the Titanic competition, and other machine learning competition platforms.
After you have read lesson 4 in the Study Unit 3 ebook/readable module, watch this video for the summary of the machine learning competition platform
- STUDY UNIT 4 - INTRODUCTION TO REGULATORY FRAMEWORK & FAIR DATA
The emergence of the Internet as a global telecommunications network has had a huge impact on how we view and apply data protection and regulation. Before the massive expansion of the Internet, data was a minority interest that did not attract significant global attention. However, over the past decades, the uses of and processes for data have evolved significantly, both in terms of technology and use cases. Data is now considered the raw material for digital transformation. Thus, there is a need for a form of regulation to avoid chaos and misuse.
This Study Unit will provide you with an understanding of what a regulatory framework is and what it is used for. You will learn about general data protection principles, including your country's data regulations. Likewise, you will get to know why we need FAIR data policies and their benefits to your country. Finally, the basics of FAIR policies will be explored.
Upon completion of this study unit, you should be able to:
4.1 Discuss and explain legal and regulatory frameworks and the key concepts of a data regulation framework.
4.2 Summarize data protection law and explain what the GDPR is and its principles.
4.3 Describe the FAIR Data Policies.
4.4 Define data protection laws in specified countries.
4.5 Explain FAIR compliance with data.
4.6 Discuss the privacy and data protection laws (in this project's implementation countries)
- STUDY UNIT 5 - FAIR DATA MANAGEMENT
This unit focuses on FAIR Data Management and its core principles. The requirements for good data management, as well as platforms for creating FAIR data, are covered.
You will learn what kinds of questions you need to consider to make a good Data Management Plan and which tools you might use to create a FAIR Data Management Plan. Along with that, you will practice creating a FAIR Data Management Plan yourself.
Upon completion of this study unit, you should be able to:
5.1 Enumerate the importance of managing research data
5.2 Discuss Data Management and its importance
5.3 Describe the components of the data life cycle and what data management entails.
5.4 Describe the FAIR Data Principles, the purposes of their use, and the elements that help make data FAIR
5.5 Construct a data management plan step by step.
5.6 Explain the principles of a FAIR Data Management Plan, its requirements, and its compatibility with the FAIR Data Principles.
5.7 Demonstrate the ability to write and use online platforms for creating a FAIR Data Management Plan
- STUDY UNIT 6 - SEMANTIC DATA
This Study Unit covers the basic concepts of the semantic web, linked data, the semantic web stack, and technologies such as SKOS, RDF, OWL, and SPARQL. It also explains semantic modelling and compares it with other data models.
Other topics covered include how to use eCRF and CEDAR to create and explore metadata, and how to use them as FAIR tools.
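To make the RDF triple model concrete, here is a toy sketch in plain Python (no RDF library assumed): facts are stored as subject-predicate-object triples, and a small pattern-matching helper plays the role of a simple SPARQL query. The resources and properties are hypothetical examples, written as prefixed names for readability:

```python
# Toy illustration of the RDF data model: facts as (subject, predicate, object)
triples = [
    ("ex:Alice", "rdf:type", "foaf:Person"),
    ("ex:Alice", "foaf:knows", "ex:Bob"),
    ("ex:Bob", "rdf:type", "foaf:Person"),
    ("ex:Bob", "foaf:age", 34),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Analogous to: SELECT ?s WHERE { ?s rdf:type foaf:Person }
people = [s for s, _, _ in match(triples, p="rdf:type", o="foaf:Person")]
print(people)
```

Real semantic web tooling (e.g. triple stores and SPARQL engines) builds on exactly this triple structure, with full URIs in place of the prefixed names.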
Upon completion of this study unit, you should be able to:
6.1 Describe the Semantic Web, its goals, and its benefits
6.2 Explain the Semantic Web's basic building blocks, such as RDF, SKOS, and OWL
6.3 Describe the concept and structure of Linked Data
6.4 Explain the concepts of semantic modelling, ontologies, and data models.
6.5 Use the eCRF Wizard to create and explore metadata
6.6 Use the CEDAR workbench to create and explore metadata.
- STUDY UNIT 7 - FAIR DATA POINT (FDP) INSTALLATION
This Study Unit will show you how to deploy an FDP locally by designing a semantic data model and publishing it to the FDP. The objective of this module is to illustrate how non-FAIR data can be assigned machine-readable metadata, enabling it to be discoverable by individuals and machines.
Upon completion of this study unit, you should be able to:
7.1 Install Docker
7.2 Install FAIR Data Point
7.3 Install Open Refine
In this lesson, you will be provided with synthetic data in order to learn how to run FDPs locally, how to install Docker, and how to create metadata, catalogues, datasets, and distributions.
The objectives of this unit are:
- To illustrate how non-FAIR data can be assigned machine-readable metadata, enabling it to be discoverable by individuals and machines.
- To demonstrate how FDPs are configured and what metadata, catalogues, datasets, and distributions are.
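As a rough sketch of what the installation steps involve: the exact compose file and image names come from the official FAIR Data Point deployment guide, so treat the sequence below as an outline rather than exact instructions; the commands only assume a standard Docker installation.

```shell
# 1. Verify Docker is installed and the daemon is running
docker --version

# 2. From the directory containing the FDP docker-compose file
#    (obtained from the official FAIR Data Point documentation),
#    start the stack in the background
docker compose up -d

# 3. Check that the containers are up and inspect their logs
docker compose ps
docker compose logs
```

Once the containers are running, the FDP's web interface is reached on the port declared in the compose file.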
Watch these videos and read the notes provided
Video 1: FAIR Data Points
Video 2
Video 3 - Publishing FAIR Data via Open Refine
- STUDY UNIT 8 - FAIR DATA FOR HEALTH
Data-driven technologies are changing business, our daily lives, and the way we conduct research more than ever. In recent years, more and more data have been generated in the healthcare ecosystem. These data contain potential knowledge to transform healthcare delivery and the life sciences. Advanced analytics could harness the data collected from numerous sources to improve the prevention, diagnosis, and treatment of diseases, as well as to support individuals and societies in maintaining their health and well-being. The era of exponential data growth has also witnessed an increase in the risks involved in sharing data.
This Study Unit teaches the importance of the FAIR Data Principles in healthcare research: how the FAIR Data Principles can facilitate knowledge discovery from health data, and how linked health data drives research, better use of and learning from data, and further contributions to patient care.
Upon completion of this study unit, you should be able to:
8.1 Describe FAIR Data Points (FDPs), their roles, components, and benefits to research
8.2 Explain the role and tasks of a clinical researcher in relation to FAIR Data
8.3 Describe the Personal Health Train (PHT)
8.4 Explain the components of PHT in relation to FAIR