What is Machine Learning?

Machine learning, or ML as it is popularly known today, is the study of computer algorithms that improve automatically through experience and through the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so. They are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.

What is Federated Learning?

Federated learning is a newer form of machine learning with an emphasis on data privacy and security. This is achieved by training an algorithm across multiple decentralized devices holding local data samples, without the need to exchange them.

It was first introduced by Google AI in 2017.

How does traditional machine learning compare to Federated learning?

A typical machine learning workflow involves the following steps –

  1. Identification of the problem
  2. Data preparation for solving the problem
  3. ‘Training’ an ML algorithm on a centralised server or machine
  4. Sending the trained model to client systems (or providing an ML service that exposes an API)
  5. Prediction of results on unseen data

Once step 5 is done, it’s more about monitoring the predictions and making small course corrections.

All of this means – REAL-TIME COMPUTATION on a CENTRALIZED SERVER.
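To make steps 1–5 concrete, here is a minimal, purely illustrative sketch of the centralized workflow in Python. The data, the toy linear model, and the `predict` helper are all invented for this example; a real system would use a proper training framework and a serving API.

```python
# Minimal sketch of the traditional, centralized workflow (steps 1-5 above).
# The data and model here are illustrative, not from any real application.
import numpy as np

rng = np.random.default_rng(0)

# Step 2: all raw user data is collected on one central server.
X_central = rng.normal(size=(1000, 3))          # features gathered from every user
y_central = X_central @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

# Step 3: 'train' a simple linear model on the centralized data (least squares).
weights, *_ = np.linalg.lstsq(X_central, y_central, rcond=None)

# Step 4: the trained weights would now be shipped to clients, or kept
# behind a prediction API that clients call over the network.
def predict(x_new: np.ndarray) -> np.ndarray:
    return x_new @ weights

# Step 5: prediction on unseen data -- note that every request still depends
# on a model built from raw data that had to leave the users' devices.
print(predict(rng.normal(size=(5, 3))))
```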

Suggestions in Google Maps, Instagram feeds and similar services improve as they consume more data. Much of this is private, sensitive user data, and it is all stored on centralised servers, which poses a grave security risk. Traditional ML applications work on a very simple logic: the more data you feed them, the more accurate they get and the better, more personalised the results they return. If not trained on large amounts of user data, they often return poor, less personalised results, which in turn leads to lower adoption of new applications by the user community.

In comes Federated Learning to the rescue (a minimal code sketch of the full loop follows this list) –

  1. Federated learning solutions start by training a generic machine learning model on a centrally located server; this model is not personalised but acts as a baseline to start with.
  2. The server then sends this model to the user devices, also known as clients.
  3. As the client systems generate data, the local models learn and get better over time.
  4. Periodically, the clients send their learnings back to the server without exposing the critical data. This is achieved via encryption.
  5. The server then factors in the deltas learned by the clients, recomputes the model, and shares the updated model with the clients.
  6. The iterations continue, and with each passing round the predictions only improve.
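The loop above is essentially what a federated-averaging (FedAvg-style) algorithm does. Below is a minimal sketch in plain NumPy with a toy linear model; the client data, data sizes, learning rate and number of rounds are all made up for illustration, and the encryption mentioned in step 4 is omitted.

```python
# FedAvg-style sketch of steps 1-6 above, using a toy linear model.
import numpy as np

rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0, 0.5])

def make_client_data(n):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

# Each client holds its own local data; it never leaves the device.
clients = [make_client_data(n) for n in (50, 120, 80)]

def local_update(w, X, y, lr=0.05, epochs=5):
    """Step 3: a client improves the model on its own data (plain gradient descent)."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Step 1: the server starts from a generic, non-personalised model.
global_w = np.zeros(3)

for round_ in range(20):                            # Step 6: repeat over many rounds
    # Step 2: the server sends the current global model to the clients.
    local_models = [local_update(global_w, X, y) for X, y in clients]
    # Steps 4-5: clients return only weights (never raw data); the server
    # averages them, weighting each client by how much data it has.
    # (Real systems would also encrypt or securely aggregate these updates.)
    sizes = np.array([len(y) for _, y in clients])
    global_w = np.average(local_models, axis=0, weights=sizes)

print("learned:", np.round(global_w, 2), "target:", true_w)
```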

What is the difference between Machine learning and Federated learning?

Machine learning involves a data pipeline that uses a central server hosting the trained model to make predictions. The data collected by local devices is sent back to the server for processing, and the results are subsequently returned to the devices. This round trip limits the model's ability to learn in real time.
In Federated learning, by contrast, each device downloads the current model and computes an updated model on the device itself using its local data. These locally trained models are then sent from the devices back to the central server, where they are aggregated (i.e. by averaging weights), and a single consolidated and improved global model is sent back to the devices.
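The aggregation step ("averaging weights") can be illustrated with a tiny sketch: the server receives one weight dictionary per device and averages each layer, weighting devices by how much local data they hold. The layer names, shapes and client sizes below are invented purely for illustration.

```python
# Illustrative server-side aggregation: per-layer weighted average of client models.
import numpy as np

rng = np.random.default_rng(7)

def average_models(client_models, client_sizes):
    """Weighted, per-layer average of the clients' weight dictionaries."""
    total = sum(client_sizes)
    return {
        name: sum(m[name] * (n / total) for m, n in zip(client_models, client_sizes))
        for name in client_models[0]
    }

# Three devices send back locally updated weights for a toy two-layer model.
client_models = [
    {"hidden": rng.normal(size=(4, 8)), "output": rng.normal(size=(8, 1))}
    for _ in range(3)
]
client_sizes = [200, 50, 150]        # number of local training examples per device

global_model = average_models(client_models, client_sizes)
print({name: w.shape for name, w in global_model.items()})
```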

Core challenges in Federated learning:

  1. Expensive communication: Communication is a critical bottleneck in federated networks, which, coupled with privacy concerns over sending raw data, necessitates that the data generated on each device remain local.
  2. Systems Heterogeneity: The storage, computational, and communication capabilities of each device in a federated network may differ due to variability in hardware (CPU, memory), network connectivity (3G, 4G, 5G, WiFi), and power (battery level).
  3. Statistical Heterogeneity: Devices frequently generate and collect data in a non-identically distributed manner across the network, and the number of data points can vary significantly from device to device (a small simulation of this skew follows the list).
  4. Privacy Concerns: Communicating model updates throughout the training process can nonetheless reveal sensitive information, either to a third party or to the central server.
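To make statistical heterogeneity more tangible, here is a small, assumption-laden simulation: a pooled 10-class dataset is split across five hypothetical devices so that each device mostly sees a couple of labels, producing the kind of non-IID skew described in challenge 3. The label-to-device mapping is invented for illustration only.

```python
# Simulating non-IID (statistically heterogeneous) data across devices.
import numpy as np

rng = np.random.default_rng(3)
labels = rng.integers(0, 10, size=6000)        # a pooled 10-class dataset

# Skewed partition: each device mostly holds examples from two 'home' labels.
devices = {i: [] for i in range(5)}
for y in labels:
    preferred = (y // 2) % 5                   # map each label pair to one device
    device = preferred if rng.random() < 0.8 else int(rng.integers(0, 5))
    devices[device].append(int(y))

# Each device ends up with a very different label histogram and data volume.
for i, ys in devices.items():
    counts = np.bincount(ys, minlength=10)
    print(f"device {i}: n={len(ys):4d}, label histogram={counts}")
```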