Towards a Distributed Federated Learning Aggregation Placement using Particle Swarm Intelligence

Amir Ali-Pour, ÉTS Montréal / Université du Québec, amir.ali-pour@etsmtl.ca
Laya Samizadeh, ÉTS Montréal / Université du Québec, laya.samizadeh.1@ens.etsmtl.ca
Sadra Bekrani, Islamic Azad University of Bojnord, sadra.bekrani@iau.ir
Julien Gascon-Samson, ÉTS Montréal / Université du Québec, julien.gascon-samson@etsmtl.ca

Abstract

Federated learning (FL) has become a promising distributed learning paradigm that offers additional guarantees on data privacy. Many variants of FL have been studied since the term was coined. One important derivative of FL is hierarchical semi-decentralized federated learning (SDFL), which distributes the aggregation task over multiple nodes and parallelizes the aggregation workload across the breadth of each level of the hierarchy. Various methods have also been proposed to perform inter-cluster and intra-cluster aggregation optimally. Most of these solutions, however, require monitoring the nodes' performance and resource consumption at each round, which requires frequent exchanges of system-level data. To perform distributed aggregation in SDFL optimally with minimal reliance on such data, we propose Flag-Swap, a Particle Swarm Optimization (PSO) method that optimizes aggregation placement using only the processing delay. Our simulation results show that PSO-based placement can find the optimal placement relatively fast, even in scenarios with many clients as candidates for aggregation. Our real-world Docker-based implementation of Flag-Swap over the recently introduced SDFLMQ framework outperforms the framework's built-in placement strategies, reducing the total processing time by about $43\%$ compared to random placement and by about $32\%$ compared to uniform placement.

Index Terms:

Distributed Systems, Federated Learning, Aggregation, Task Placement, Swarm Intelligence, Black-box Optimization

I Introduction

Federated Learning (FL) has emerged as a revolutionary approach to distributed machine learning within Internet of Things (IoT) ecosystems [1, 2]. With the rapid expansion of IoT devices, vast amounts of decentralized data are being generated at the network edge, posing significant challenges for traditional centralized learning methods. These conventional approaches require transferring data to central servers, leading to high bandwidth costs, increased latency, and serious privacy concerns. In contrast, FL facilitates collaborative model training directly on edge devices, allowing them to contribute to a shared global model without transmitting raw data [3]. This capability is particularly beneficial for IoT environments, where efficient bandwidth utilization, enhanced privacy, and real-time responsiveness are crucial. By minimizing data transmission, preserving data privacy, and optimizing edge computational resources, FL effectively overcomes key limitations of centralized learning [4, 5].

The key components of FL ecosystems are the aggregators: nodes that collect model parameters (or their gradients) from the individual nodes and combine them using an aggregation method. The aggregation yields a new set of model parameter values that is expected to represent the features learned from all contributing nodes' data. Several FL architectures exist, and they depend heavily on the underlying network topology. There are three main categories: 1) Central FL (CFL) is the conventional FL model. It is based on the client/server communication model and follows a star topology, in which one central unit (i.e., a parameter server or aggregation server) performs the global model update, so all contributing clients send their model parameters to that unit. 2) Fully Decentralized FL (DFL) follows a P2P communication model with no central unit dedicated to aggregation; instead, model parameters are aggregated at the destination client after each hop. 3) Semi-Decentralized FL (SDFL) is a hybrid of CFL and DFL in which the aggregation load is spread across multiple machines, and the aggregators, either synchronously or asynchronously, perform aggregation while mutually agreeing on the global model updates. This model offers parallelism while avoiding a single point of failure: because aggregation is distributed across multiple nodes, the system is more resilient to node failures and connectivity issues. A well-known SDFL model is Hierarchical SDFL, in which aggregation is spread not only spatially, across the breadth of each hierarchy level, but also temporally, between hierarchy levels [24]. In addition to the advantages of SDFL, Hierarchical SDFL offers scalability, reduced computation bottlenecks, and better adaptation to system constraints. Hereafter, we refer to Hierarchical SDFL simply as SDFL.

One of the key challenges in SDFL is finding a set of suitable machines to act as aggregators. The choice of an aggregator can depend on several parameters, including key system parameters such as the machine's availability, its computational resources, and its communication bandwidth. Several methods have been proposed [8] that combine different sets of such parameters into a criterion for finding or optimizing a suitable aggregation site. However, most of these methods require the contributing clients to report their internal performance to the coordinator, which can cause network congestion if such data is requested frequently, or can violate the clients' privacy. In contrast, task placement methods based on black-box system optimization exist but have seldom been applied to SDFL. Such an optimizer can search for an optimal aggregation placement without transmitting or additionally processing the client machines' internal performance data. In this paper, we investigate the efficacy of such optimizers. Specifically, we propose using particle swarm optimization (PSO) to progressively improve aggregation placement. We show that PSO can improve the placement of aggregation using only the global processing delay of each FL round. We also show that PSO adds only marginal computational overhead, provided that an FL framework supporting hierarchical FL is used, which makes the optimizer a suitable candidate for constrained systems at the edge. Our contributions are as follows:

• A black-box PSO-based aggregation placement method for SDFL.
• An evaluation of the optimizer's efficacy in various simulated SDFL scenarios with different numbers of clients and varying hierarchy depth and width.
• An evaluation of the optimizer's efficacy in a real SDFL ecosystem based on MQTT communication and deployed on Docker containers.
• A comparison with random placement and with uniform placement based on a round-robin algorithm.

The rest of the paper is organized as follows: Section II presents the motivation for employing a black-box optimizer in SDFL based on the publish/subscribe communication model. Section III explains the key features and the mechanism of the proposed aggregation placement optimizer. Section IV describes the experimental setup and the results of both simulation and real-world deployment. Section V discusses related work. Section VI concludes the paper.

II Motivation

A common communication model for SDFL is the client/server model, as in CFL [7]. While this model is effective for systems with substantial computational resources and stable network connections, it is poorly suited to environments with resource-constrained devices, such as IoT networks. In such scenarios, dynamic role management, where devices alternate as aggregators to avoid overload and device exhaustion, becomes essential. Implementing this in a client/server architecture would require complex mechanisms for dynamic role assignment. Alternatively, a fully decentralized peer-to-peer (P2P) approach can distribute aggregation roles effectively, although it incurs a training-time overhead due to sequential communication.

Recently, it was proposed to use the publish/subscribe communication model instead of the client/server model [21]. This approach requires only a broker at the edge to disseminate model updates, while the FL-specific roles are delegated to the devices that need the ML services. The edge infrastructure therefore acts only as a general-purpose message disseminator and needs no adaptation to the FL process. For instance, if an MQTT broker runs as a service on an edge server, devices can connect to it, and the FL roles can be established among the devices connected to the broker. This, in turn, makes the framework faster to set up and cheaper to install.

Figure 1: Overview of parameter sharing for aggregation using Pub/Sub communication in a clustered semi-decentralized federated learning topology.

SDFL over MQTT is a promising approach that simplifies orchestration, avoids a single point of failure, and increases redundancy. Role association and role management in SDFL over MQTT, as described in [21], are relatively easy compared to SDFL implementations on other FL frameworks. This is because in SDFLMQ, FL roles are associated with topics. Candidates for a role subscribe to that role's topic, and clients that want to communicate with a node holding a specific role publish to that role's topic. This simple role management saves time and energy when the roles of FL actors change during the FL process, and it leaves more room for developing more sophisticated optimization algorithms.
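To illustrate why associating roles with topics makes role changes cheap, the following minimal in-memory sketch mimics a broker. It is an illustration under stated assumptions only: `ToyBroker`, `assign_role`, and the topic names such as `sdfl/role/aggregator_0` are hypothetical and do not reflect SDFLMQ's actual API or MQTT topic layout.

```python
from collections import defaultdict
from typing import Callable, Optional


class ToyBroker:
    """In-memory stand-in for an MQTT broker (hypothetical; not SDFLMQ's API)."""

    def __init__(self) -> None:
        # topic -> {client_id: message handler}
        self._subs: dict[str, dict[str, Callable[[str], None]]] = defaultdict(dict)

    def subscribe(self, client_id: str, topic: str, handler: Callable[[str], None]) -> None:
        self._subs[topic][client_id] = handler

    def unsubscribe(self, client_id: str, topic: str) -> None:
        self._subs[topic].pop(client_id, None)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver payload to every current subscriber; return the number of deliveries."""
        handlers = list(self._subs[topic].values())
        for handler in handlers:
            handler(payload)
        return len(handlers)


def assign_role(broker: ToyBroker, client_id: str, old_topic: Optional[str],
                new_topic: str, handler: Callable[[str], None]) -> None:
    """A role change is only a change of topic subscription."""
    if old_topic is not None:
        broker.unsubscribe(client_id, old_topic)
    broker.subscribe(client_id, new_topic, handler)


AGG_TOPIC = "sdfl/role/aggregator_0"  # hypothetical topic name
inbox: dict[str, list[str]] = defaultdict(list)
broker = ToyBroker()

# client_3 takes the aggregator role; trainers publish to the role topic, not to a client.
assign_role(broker, "client_3", None, AGG_TOPIC, inbox["client_3"].append)
broker.publish(AGG_TOPIC, "params-from-client_7")

# The coordinator moves the role to client_5; trainers keep publishing to the same topic.
assign_role(broker, "client_3", AGG_TOPIC, "sdfl/role/trainer", lambda msg: None)
assign_role(broker, "client_5", None, AGG_TOPIC, inbox["client_5"].append)
broker.publish(AGG_TOPIC, "params-from-client_8")
```

The publishing trainers never learn which client currently holds the aggregator role; only subscriptions change, which is the property the paragraph above relies on.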

Numerous load-balancing and task-scheduling techniques could be applied to this problem. However, in SDFLMQ, as described in [21], clients contribute to the FL process anonymously: they do not share any information about their internal status when registering their candidacy for aggregation with the coordinator. This anonymity enables further extensibility and upholds clients' data privacy. However, as mentioned earlier, most load-balancing techniques for SDFL need to process clients' system data to choose suitable aggregation sites. To perform aggregation placement without such data, one can use black-box optimization techniques, which optimize using only macro measurements of the entire system, such as the total processing delay or the total energy consumption. Black-box optimization approaches include evolution strategies, Bayesian optimization, ant colony optimization, genetic algorithms (GA), swarm intelligence, and reinforcement learning [22, 23, 25].

While most of these algorithms could potentially solve aggregation placement in SDFL, PSO appears the most promising, mainly because of its convergence speed. Several studies have compared PSO with other algorithms such as GA and concluded that PSO achieves better performance and convergence, whereas GA tends toward premature convergence [23]. Since we aim to optimize aggregation placement with respect to the total processing delay, a better-performing optimizer can yield a better placement, which in SDFL translates into a lower total processing delay. Fast convergence also means that fewer trials are needed before all suggestions (i.e., the PSO particles) agree on a local or global best placement. These properties motivate implementing the SDFL placement optimizer with PSO. In the following, we describe our PSO-based aggregation placement optimizer for SDFL over MQTT.

III Proposed Method

In our black-box PSO approach, clients do not share their internal performance metrics. Instead, the coordinator records the start and end times of each round and computes the round's processing delay by subtracting the start time from the end time. This eliminates the need for each client to report its internal performance or processing delay to the coordinator, which significantly reduces the communication load while preserving each client's privacy. The core objective of our method is to progressively minimize the total processing delay (TPD) of the FL rounds through the PSO optimization loop. Fig. 2 shows a general overview of aggregation placement in SDFL using PSO.
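The coordinator-side measurement can be sketched as follows, assuming the coordinator can wrap one FL round in a callable. The names `measure_round_fitness` and `run_round` are hypothetical, and the injectable `clock` parameter exists only so the sketch can be tested deterministically; the returned fitness uses the sign convention of Eq. (1).

```python
import time
from typing import Callable, Sequence


def measure_round_fitness(
    run_round: Callable[[Sequence[int]], None],
    arrangement: Sequence[int],
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[float, float]:
    """Run one FL round with the given aggregator arrangement and score it as a black box.

    Only the round's start and end times are used; no per-client metrics are collected.
    Returns (fitness, delay), where fitness = -delay as in Eq. (1).
    """
    start = clock()
    run_round(arrangement)  # hypothetical: assign roles, wait until the global model is ready
    delay = clock() - start
    return -delay, delay


# Deterministic usage with a fake clock: the round "starts" at t=100.0 and "ends" at t=112.5.
ticks = iter([100.0, 112.5])
fitness, delay = measure_round_fitness(lambda arr: None, [3, 1, 4], clock=lambda: next(ticks))
```

With the fake clock, the round takes 12.5 time units, so the fitness passed to PSO is -12.5.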

To achieve an optimal placement, we update the clients' roles, arranging them as either trainers or aggregators, before the beginning of each round. By leveraging PSO's global search capabilities, the method explores a vast space of possible client arrangements and identifies configurations with reduced latency, which is critical for scalability and real-time performance. Thus, at each round, after computing the processing delay of the previous round, PSO suggests a new arrangement according to its particles. The particles themselves are updated after each PSO iteration according to their personal-best and the global-best fitness values.

III-A Particle Swarm Optimization for Client Placement

We employ PSO to optimize the assignment of clients to aggregator roles within the hierarchy. In this formulation:

• Particle Representation: Each particle is a vector representing a candidate arrangement; each element of the vector is the ID of the client assigned to the corresponding aggregator slot.
• Swarm: A population of $P$ particles explores the solution space.
• Velocity: Each particle has a velocity vector that determines how its position changes in each iteration.

III-B Fitness Function

The quality of a client arrangement is evaluated using a fitness function based on the Total Processing Delay (TPD). The fitness $f$ of an arrangement is:

$$f=-T \qquad (1)$$

where $T$ is the TPD of the corresponding FL round. Maximizing $f$ therefore minimizes $T$. Because the TPD is driven by the slowest aggregator at each hierarchy level, this formulation captures the bottleneck effect at each level and favors arrangements that balance the computational load across the hierarchy.

III-C Optimization Loop

The PSO optimizer for aggregation placement in SDFL is initialized as follows:

• A swarm of $P$ particles is initialized (e.g., $P=10$).
• The initial position of each particle is a random permutation of client IDs assigned to aggregator roles.
• Initial velocities are set to zero.
• The personal best position of each particle is its initial position, and the global best position is the position yielding the highest initial fitness.
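The initialization can be sketched as below, assuming client IDs are the integers `0..client_count-1` and that the fitness is supplied by the caller. `Particle`, `init_swarm`, and the toy fitness are hypothetical names used only for illustration; the seeded `random.Random` just makes the sketch reproducible.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Particle:
    position: list[int]  # position[k] = client ID in aggregator slot k
    velocity: list[float]
    best_position: list[int] = field(default_factory=list)
    best_fitness: float = float("-inf")


def init_swarm(pop_n: int, dimensions: int, client_count: int,
               fitness: Callable[[list[int]], float], rng: random.Random):
    """Initialize P particles; return (swarm, global_best_position, global_best_fitness)."""
    if dimensions > client_count:
        raise ValueError("need at least as many clients as aggregator slots")
    swarm = []
    for _ in range(pop_n):
        pos = rng.sample(range(client_count), dimensions)  # distinct IDs in random order
        particle = Particle(position=pos, velocity=[0.0] * dimensions)
        particle.best_position = list(pos)  # personal best = initial position
        particle.best_fitness = fitness(pos)
        swarm.append(particle)
    leader = max(swarm, key=lambda p: p.best_fitness)  # highest initial fitness
    return swarm, list(leader.best_position), leader.best_fitness


# Toy fitness (hypothetical): prefer low client IDs. In practice this would be -TPD.
swarm, gbest, gbest_fit = init_swarm(pop_n=10, dimensions=5, client_count=20,
                                     fitness=lambda pos: -float(sum(pos)),
                                     rng=random.Random(0))
```

Sampling without replacement guarantees that no client occupies two aggregator slots in the initial positions.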

Each iteration of the optimization loop then performs the following steps:

1. Velocity Update:

$$v_{i}^{t+1}=w\,v_{i}^{t}+c_{1}\,r_{1}\,(p_{i}-x_{i}^{t})+c_{2}\,r_{2}\,(g-x_{i}^{t}) \qquad (2)$$

where:

• $v_{i}^{t}$: Velocity vector of particle $i$ at iteration $t$.
• $x_{i}^{t}$: Position of particle $i$ at iteration $t$.
• $p_{i}$: Personal best position of particle $i$.
• $g$: Global best position.
• $w$: Inertia weight (e.g., 0.01).
• $c_{1}$: Cognitive coefficient (e.g., 0.01).
• $c_{2}$: Social coefficient (e.g., 1).
• $r_{1},r_{2}$: Random numbers in $[0,1]$.

Velocity components are clamped to the interval $[-V_{\max},V_{\max}]$, where

$$V_{\max}=\max\left(1,\,D\times\textit{velocity\_factor}\right) \qquad (3)$$

and $D$ is the number of dimensions of the search space (i.e., the number of aggregator slots per particle). A typical value is $\textit{velocity\_factor}=0.1$.

2. Position Update: The new position is computed as:

$$x_{i}^{t+1}=\left(x_{i}^{t}+v_{i}^{t+1}\right)\bmod\textit{client\_count} \qquad (4)$$

Duplicates are resolved by incrementing until a unique client ID is found.

3. Hierarchy Rearrangement: After a particle's position is updated:
   • Clients are reassigned aggregator roles based on the updated particle.
   • The remaining clients are assigned trainer roles from a buffer of available labels.
4. Iteration and Convergence: The algorithm iterates for $M$ steps, updating the personal and global bests whenever better fitness values are found, typically until the TPD converges to a minimum value. The final global best position represents the best client placement found. Algorithm 1 summarizes the iterative swarm optimization process.
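The velocity update, position update, and hierarchy rearrangement for a single particle can be sketched as below. This is a sketch under stated assumptions, not the authors' code: client IDs are integers `0..client_count-1`, the real-valued result of Eq. (4) is rounded to the nearest integer before the modulo (the text does not say how non-integer positions are handled), and the duplicate-resolving increment wraps around `client_count`. The function names are hypothetical.

```python
import random


def update_velocity(v, x, pbest, gbest, w, c1, c2, v_max, rng):
    """Eq. (2) per dimension, with each component clamped to [-v_max, v_max]."""
    new_v = []
    for vk, xk, pk, gk in zip(v, x, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        vk = w * vk + c1 * r1 * (pk - xk) + c2 * r2 * (gk - xk)
        new_v.append(max(-v_max, min(v_max, vk)))
    return new_v


def update_position(x, v, client_count):
    """Eq. (4), then resolve duplicate client IDs by incrementing until unique."""
    if len(x) > client_count:
        raise ValueError("more aggregator slots than clients")
    used, new_x = set(), []
    for xk, vk in zip(x, v):
        cid = round(xk + vk) % client_count  # rounding to an integer ID is an assumption
        while cid in used:
            cid = (cid + 1) % client_count  # wrap-around is also an assumption
        used.add(cid)
        new_x.append(cid)
    return new_x


def rearrange(position, client_count):
    """Particle entries become aggregators; every other client becomes a trainer."""
    chosen = set(position)
    return list(position), [c for c in range(client_count) if c not in chosen]


rng = random.Random(1)
D, client_count = 5, 12  # D = number of dimensions (aggregator slots), as in Eq. (3)
v_max = max(1.0, D * 0.1)  # Eq. (3) with velocity_factor = 0.1
x = [0, 1, 2, 3, 4]
v = update_velocity([0.0] * D, x, pbest=x, gbest=[7, 8, 9, 10, 11],
                    w=0.01, c1=0.01, c2=1.0, v_max=v_max, rng=rng)
x_new = update_position(x, v, client_count)
aggregators, trainers = rearrange(x_new, client_count)
```

With $D=5$ and a velocity factor of 0.1, Eq. (3) gives $V_{\max}=\max(1, 0.5)=1$, so each slot moves by at most one client ID per iteration. Because `c2` dominates, the particle is pulled mostly toward the global best.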

Algorithm 1 PSO Algorithm for SDFL

Inputs: DEPTH, WIDTH, pop_n, max_iter, iw, c1, c2, velocity_factor
Initialization:
    Generate hierarchy with aggregators and trainers
    Create pop_n particles with positions (client assignments)
    Compute initial fitness for each particle
Main Loop:
    for iteration ← 1 to max_iter do
        for each particle p do
            Update velocity using iw, c1, c2
            Update position based on velocity
            Rebuild hierarchy with new assignments
            Compute new fitness
            if new fitness better than pbest then
                Update pbest
            if new fitness better than gbest then
                Update gbest
Processing_Fitness Function:
    Traverse hierarchy bottom-up
    Compute memory consumption and delays per level
    Sum maximum delays across levels
    Return fitness, total delay
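A compact, runnable rendering of Algorithm 1 is sketched below. It is not the authors' implementation: the fitness is an arbitrary black-box callable (in simulation, presumably the negated TPD of Eq. (7); in deployment, the negated measured round delay), client IDs are assumed to be integers, positions are rounded before the modulo in Eq. (4), and `pso_placement` and the toy fitness are hypothetical names. Rebuilding the hierarchy is left to the fitness callable.

```python
import random
from typing import Callable


def pso_placement(fitness: Callable[[list[int]], float], dimensions: int, client_count: int,
                  pop_n: int = 10, max_iter: int = 100, iw: float = 0.01, c1: float = 0.01,
                  c2: float = 1.0, velocity_factor: float = 0.1, seed: int = 0):
    """Sketch of Algorithm 1 with a black-box fitness; returns (gbest, gbest_fitness)."""
    if dimensions > client_count:
        raise ValueError("more aggregator slots than clients")
    rng = random.Random(seed)
    v_max = max(1.0, dimensions * velocity_factor)  # Eq. (3)

    def repair(raw: list[float]) -> list[int]:
        # Eq. (4) plus duplicate resolution (rounding and wrap-around are assumptions).
        used, out = set(), []
        for value in raw:
            cid = round(value) % client_count
            while cid in used:
                cid = (cid + 1) % client_count
            used.add(cid)
            out.append(cid)
        return out

    # Initialization: distinct random client IDs per particle, zero velocities.
    pos = [rng.sample(range(client_count), dimensions) for _ in range(pop_n)]
    vel = [[0.0] * dimensions for _ in range(pop_n)]
    pbest = [list(p) for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    lead = max(range(pop_n), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[lead]), pbest_fit[lead]

    for _ in range(max_iter):
        for i in range(pop_n):
            for k in range(dimensions):  # Eq. (2), clamped
                r1, r2 = rng.random(), rng.random()
                vk = (iw * vel[i][k] + c1 * r1 * (pbest[i][k] - pos[i][k])
                      + c2 * r2 * (gbest[k] - pos[i][k]))
                vel[i][k] = max(-v_max, min(v_max, vk))
            pos[i] = repair([xk + vk for xk, vk in zip(pos[i], vel[i])])
            f = fitness(pos[i])  # "rebuild hierarchy + compute fitness" as a black box
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(pos[i]), f
            if f > gbest_fit:
                gbest, gbest_fit = list(pos[i]), f
    return gbest, gbest_fit


# Toy black-box fitness (hypothetical): each client has a hidden speed; slow aggregators cost more.
speed_rng = random.Random(42)
speeds = [speed_rng.uniform(5, 15) for _ in range(20)]


def toy_fitness(position: list[int]) -> float:
    return -sum(5.0 / speeds[c] for c in position)


best, best_fit = pso_placement(toy_fitness, dimensions=5, client_count=20)
```

The optimizer never reads `speeds` directly; it only sees the scalar fitness, which mirrors the black-box setting. If each fitness call corresponds to one real FL round, `pop_n * max_iter` bounds the number of rounds spent exploring placements.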

IV Experimental Setup & Results

IV-A Simulation Model

We model the FL system as a hierarchical tree with a depth $D$ and a width $W$ . The hierarchy comprises clients with two distinct roles:

• Aggregators (Agtrainers): Nodes responsible for aggregating model updates from their child clients. Each aggregator maintains a processing buffer containing its children, which are either trainers (for aggregators at level $D-1$) or other aggregators.
• Trainers: Leaf nodes that perform local model training and send updates to their parent aggregators.

Each client $c_{i}$ is defined by the following attributes:

• Memory capacity $\textbf{memcap}_{i}$: The memory capacity of the client.
• Model data size $\textbf{mdatasize}_{i}$: The size of the model data processed by the client (fixed at 5 units in this study).
• Processing speed $\textbf{pspeed}_{i}$: The computational speed of the client, randomly assigned between 5 and 15 units.
• Client ID $\textbf{client\_id}_{i}$: A unique identifier for the client.

The hierarchy is constructed recursively starting from a root aggregator at level 0. For each level $l$ (where $0\leq l<D-1$ ), an aggregator has $W$ child aggregators at level $l+1$ . At the leaf level ( $l=D-1$ ), each aggregator is assigned several trainers (e.g., 2 in our simulation model). The total number of aggregator positions, or dimensions, is computed as:

$$\textit{dimensions}=\sum_{i=0}^{D-1}W^{i} \qquad (5)$$

This represents the number of slots in the hierarchy where clients can be assigned as aggregators. The fitness function $f$ is implemented as follows. We first use breadth-first traversal (BFT) to organize the hierarchy into levels, starting from the root. Then, we calculate the TPD by processing these levels from the bottom (leaf nodes) to the top (root). For each level, we determine the maximum cluster delay among all aggregators, and the TPD is the sum of these maximum delays across all levels. For an aggregator $a$, the cluster delay $d_{a}$ is defined as:

$$d_{a}=\frac{\textit{mdatasize}_{a}+\sum_{c\in\textit{children}(a)}\textit{mdatasize}_{c}}{\textit{pspeed}_{a}} \qquad (6)$$

where $\text{children}(a)$ denotes the set of clients in $a$ ’s processing buffer. The total processing delay (TPD) $T$ is:

$$T=\sum_{\textit{levels}}\max_{a\in\textit{level}}d_{a} \qquad (7)$$
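The simulation model of Eqs. (5)–(7) can be sketched as below. It follows the text's breadth-first level grouping and per-level maximum cluster delay. The `Client` fields mirror the attributes listed above, but several details are assumptions of this sketch: trainers are attached to leaf aggregators in order rather than randomly, two trainers per leaf aggregator is the default, and all helper names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Client:
    client_id: int
    pspeed: float  # processing speed
    mdatasize: float = 5.0  # model data size (fixed at 5 in the simulation)
    memcap: float = 0.0  # memory capacity (not used by Eq. 6)
    children: list["Client"] = field(default_factory=list)  # processing buffer


def dimensions(depth: int, width: int) -> int:
    """Eq. (5): number of aggregator slots."""
    return sum(width ** i for i in range(depth))


def build_hierarchy(aggregators, trainers, depth, width, trainers_per_leaf=2):
    """Fill aggregator slots breadth-first; leaf-level aggregators receive trainers."""
    if len(aggregators) != dimensions(depth, width):
        raise ValueError("one aggregator per slot is required")
    for t in trainers:
        t.children = []  # trainers keep an empty buffer
    queue, next_agg, next_tr = deque([(aggregators[0], 0)]), 1, 0
    while queue:
        node, level = queue.popleft()
        if level < depth - 1:
            node.children = aggregators[next_agg:next_agg + width]
            next_agg += width
            queue.extend((child, level + 1) for child in node.children)
        else:
            node.children = trainers[next_tr:next_tr + trainers_per_leaf]
            next_tr += trainers_per_leaf
    return aggregators[0]


def cluster_delay(a: Client) -> float:
    """Eq. (6)."""
    return (a.mdatasize + sum(c.mdatasize for c in a.children)) / a.pspeed


def total_processing_delay(root: Client) -> float:
    """Eq. (7): group nodes into levels (BFT), sum the per-level max cluster delay bottom-up."""
    levels, frontier = [], [root]
    while frontier:
        levels.append(frontier)
        frontier = [c for node in frontier for c in node.children]
    tpd = 0.0
    for level in reversed(levels):
        delays = [cluster_delay(n) for n in level if n.children]  # aggregators only
        if delays:
            tpd += max(delays)
    return tpd


# Depth 2, width 2 -> 3 aggregator slots; clients 3..6 are trainers.
clients = [Client(i, s) for i, s in enumerate([15.0, 5.0, 10.0, 8.0, 8.0, 8.0, 8.0])]
trainers = clients[3:]
tpd_a = total_processing_delay(build_hierarchy(clients[0:3], trainers, 2, 2))
tpd_b = total_processing_delay(build_hierarchy([clients[1], clients[0], clients[2]], trainers, 2, 2))
```

In arrangement A, the fast client (speed 15) is the root: the leaf level's bottleneck is client 1 with $(5+10)/5=3.0$, and the root adds $15/15=1.0$, so $T=4.0$. Swapping clients 0 and 1 (arrangement B) gives a leaf bottleneck of $15/10=1.5$ plus a root delay of $15/5=3.0$, so $T=4.5$. The same clients produce different TPDs depending only on placement, which is what PSO exploits.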

IV-B Simulation setup & results

We implemented a simulation of an SDFL system with a hierarchical structure of depth $D\in\{3,4,5\}$ and width $W\in\{4,5\}$, constructed via breadth-first traversal to ensure balanced role distribution. Clients in this hierarchy are categorized as either aggregators or trainers. Each simulated client node has a processing buffer that stores its child nodes in an array; if those child nodes are aggregators, their own processing buffers are non-empty. Trainer nodes also have processing buffers, which remain empty. Trainers retain these buffers because their role might change later, for instance if they move into an aggregator position. Each client is assigned random attributes, including memory capacity $10<m<50$ and processing speed $5<ps<15$ units, and a uniform model data size fixed at $5$. The PSO-based role assignment changes the positions of simulated client nodes in the hierarchy, which in turn affects the TPD, computed as the sum of the maximum cluster delays at each level of the hierarchy. The role adjustments aim to minimize the TPD across the system.

Client role assignments are optimized by PSO using a swarm of $P\in\{5,10\}$ particles, each representing a potential configuration of the hierarchy. Each particle encodes only the positions of the aggregator clients; trainer clients are randomly assigned as leaf nodes under the aggregators. The PSO algorithm is configured with an inertia weight of $0.01$ to favor exploitation, a cognitive coefficient (c1) of $0.01$ for stability given the small swarm size, and a social coefficient (c2) of $1$ to emphasize the influence of the global best solution. It runs for $100$ iterations with a velocity factor of $0.1$.

Results of aggregation placement using PSO in simulated SDFL are shown in Fig. 3. Each plot shows the normalized TPD with respect to PSO iterations. Grey curves show the processing delay of each PSO particle, and the red, green, and orange curves show the worst, best, and average processing delay at each iteration, respectively. The key observation here is the convergence of the TPD. As expected, the PSO particles drive the TPD down to a minimum value, up to the point where all particles suggest the same placement, which yields the lowest TPD found. Convergence of all particles to a single placement is needed because, when a particle's placement is applied in an FL round, there is no guarantee that it will yield a new minimum TPD; the only way to find out is to apply the particle and measure the TPD once that round's global model is produced. Once the particles converge, we can be reasonably confident that the optimizer has searched the potential placements in the search space while heuristically progressing toward a minimum TPD.
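Whether the swarm has reached this state can be checked after each iteration with a small helper like the one below. The function names are hypothetical, and counting distinct placements is only one convenient way to monitor how close the particles are to agreement.

```python
from typing import Sequence


def placement_diversity(positions: Sequence[Sequence[int]]) -> int:
    """Number of distinct placements currently proposed by the swarm."""
    return len({tuple(p) for p in positions})


def swarm_converged(positions: Sequence[Sequence[int]]) -> bool:
    """True when every particle proposes exactly the same aggregator placement."""
    return placement_diversity(positions) == 1


early = [[3, 1, 4], [3, 1, 4], [2, 1, 4]]  # two distinct placements still proposed
late = [[3, 1, 4], [3, 1, 4], [3, 1, 4]]   # all particles agree
```

Comparing placements as tuples treats slot order as significant, matching a representation in which each vector element is tied to a specific aggregator slot.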

Moreover, PSO adapts well to an increasing number of clients, even though the dimensionality of the particles becomes high in cases with many clients. This can be seen by comparing Fig. 3 (a) with Fig. 3 (b) and Fig. 3 (c), and Fig. 3 (d) with Fig. 3 (e) and Fig. 3 (f). The last observation concerns the effect of the number of particles: a larger swarm can find a better placement, leading to an even lower TPD. This can be seen by comparing Fig. 3 (a) with Fig. 3 (d), Fig. 3 (b) with Fig. 3 (e), or Fig. 3 (c) with Fig. 3 (f).

IV-C Docker-based setup & results

To evaluate the applicability of PSO and its potential use in real systems, we integrated our implementation into the code of the SDFLMQ framework, which is publicly available [16], and compared our method with the framework's built-in placement strategies: random placement and uniform round-robin-based placement. We created one scenario with 10 Docker-container clients: one client with $2$ GB of dedicated memory and $3$ dedicated cores; two clients with $1$ GB of dedicated memory, $1$ GB of memory swap, and $1$ core each; and seven clients with $64$ MB of dedicated memory, $2$ GB of memory swap, and $1$ dedicated core each. Each client was given a multi-layer perceptron model with $1.8$ million parameters, about $30$ MB in size in JSON format, which is the format SDFLMQ uses to serialize model parameters and transmit them between nodes. We ran the scenario for $50$ rounds and recorded the processing delay of each round as well as the total processing delay after 50 rounds. Fig. 4 shows the per-round processing delay for the three placement strategies: random, uniform round-robin, and PSO-based placement. PSO-based placement converged after the $10^{th}$ round. After convergence, it was $20$ to $30$ seconds faster per round than random and uniform placement. PSO-based placement also reduced the total processing time by around $30$ minutes compared to random placement and by around $20$ minutes compared to uniform placement.

Overall, these evaluation results suggest that PSO is a viable option for selecting aggregation sites in semi-decentralized federated learning. Nonetheless, further development and studies are needed to ensure that PSO adapts to varying SDFL topologies and changing system characteristics.

V Related Work

Various works have proposed PSO for task scheduling and load balancing, both in the cloud and at the edge. Below, we review the works most closely related to placement optimization at the edge.

One key area is computation offloading in Mobile Edge Computing (MEC). One study explored a PSO-based task offloading strategy for 5G-enabled Industrial Internet of Things (IIoT) environments, optimizing energy efficiency and latency by distributing tasks among heterogeneous edge servers [17]. The PSO approach was compared with a Genetic Algorithm (GA) and Simulated Annealing (SA), demonstrating its advantages in reducing latency and balancing energy consumption [17].

Cloud computing task scheduling is another critical area. Researchers proposed a hybrid PSO-Genetic Algorithm (PSO-PGA) incorporating a phagocytosis mechanism to expand the search space and avoid local optima in cloud task scheduling [18]. The phagocytosis mechanism, inspired by biological immune responses, allows weaker solutions to be engulfed and replaced by stronger ones, thereby maintaining diversity and preventing premature convergence. The study demonstrated improved completion times and convergence accuracy compared to traditional PSO and GA approaches [18].

Another study introduced a novel task scheduling approach in cloud computing using Dynamic Dispatch Queues (TSDQ) combined with hybrid meta-heuristic algorithms [19]. Two variations were tested: one combining Fuzzy Logic with PSO (FLPSO) and another integrating Simulated Annealing with PSO (SAPSO). The results indicated that FLPSO significantly reduced waiting time, queue length, makespan, and execution cost, outperforming other state-of-the-art scheduling strategies [19].

Furthermore, edge aggregation and server placement in SDFL have been explored to address device association and resource allocation challenges. A study formulated an edge aggregation optimization problem and converted it into a dynamic optimization problem based on training loss degradation [9]. It introduced a Trilateral Matching-based Association (TMA) approach for efficient device association and resource allocation, which employs the classic Hungarian algorithm to derive the ideal matching set. Additionally, a Tabu Search-based Placement (TSP) approach was proposed to optimize the placement of edge servers. The combination of TMA and TSP in an iterative manner improved device participation reliability and edge aggregation efficiency [9].

An adaptive PSO-based scheduling approach (AdPSO) was also proposed to optimize task execution in cloud computing [20]. This study introduced a new inertia weight strategy called Linearly Descending and Adaptive Inertia Weight (LDAIW) to improve the balance between local and global search. Experimental results showed that AdPSO achieved up to a 10% improvement in makespan, a 12% improvement in throughput, and a 60% improvement in resource utilization compared to existing PSO-based scheduling strategies [20].

Overall, existing research provides various optimization techniques for task scheduling and offloading in edge and cloud environments. However, open challenges remain in balancing energy consumption, latency, and computational efficiency in SDFL systems, necessitating further exploration of hybrid meta-heuristic algorithms as black-box optimizers.

VI Conclusion

In this paper, we explored the use of PSO as a black-box optimizer for aggregation placement in hierarchical semi-decentralized federated learning. Drawing on prior comparative studies, we argued that PSO converges faster and more accurately than other meta-heuristics such as GA. Our simulations and Docker-based implementation demonstrated that PSO efficiently optimizes client placement, reducing processing delay by balancing the aggregation load across levels. We showed that PSO adapts well to large numbers of clients and outperforms random and uniform placement. Future work will explore adapting PSO to continuous system variations, adaptive particle sizes, and additional parameters in the fitness function. We will keep PSO a black-box solution and compare it with other meta-heuristic and learning-based approaches.

References

[1] Nguyen, Dinh C, Ding, Ming, Pathirana, Pubudu N, Seneviratne, Aruna, Li, Jun, Poor, H Vincent, ”Federated learning for internet of things: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 23, no. 3, pp. 1622–1658, 2021.
[2] Zhang, Tuo, Gao, Lei, He, Chaoyang, Zhang, Mi, Krishnamachari, Bhaskar, Avestimehr, A Salman, ”Federated learning for the internet of things: Applications, challenges, and opportunities,” IEEE Internet of Things Magazine, vol. 5, no. 1, pp. 24–29, 2022.
[3] Lim, Wei Yang Bryan, Luong, Nguyen Cong, Hoang, Dinh Thai, Jiao, Yutao, Liang, Ying-Chang, Yang, Qiang, Niyato, Dusit, Miao, Chunyan, ”Federated learning in mobile edge networks: A comprehensive survey,” IEEE communications surveys & tutorials, vol. 22, no. 3, pp. 2031–2063, 2020.
[4] Ji, Xiuzhao, Tian, Jie, Zhang, Haixia, Wu, Dalei, Li, Tiantian, ”Joint device selection and bandwidth allocation for cost-efficient federated learning in industrial internet of things,” IEEE Internet of Things Journal, vol. 10, no. 10, pp. 9148–9160, 2023.
[5] Guo, Yinghao, Zhao, Zichao, He, Ke, Lai, Shiwei, Xia, Junjuan, Fan, Lisheng, ”Efficient and flexible management for industrial internet of things: A federated learning approach,” Computer Networks, vol. 192, pp. 108122, 2021.
[6] Bonawitz, Keith, ”Towards federated learning at scale: System design,” arXiv preprint arXiv:1902.01046, 2019.
[7] Beltrán, Enrique Tomás Martínez, Pérez, Mario Quiles, Sánchez, Pedro Miguel Sánchez, Bernal, Sergio López, Bovet, Gérôme, Pérez, Manuel Gil, Pérez, Gregorio Martínez, Celdrán, Alberto Huertas, ”Decentralized federated learning: Fundamentals, state of the art, frameworks, trends, and challenges,” IEEE Communications Surveys & Tutorials, 2023.
[8] Luo, Siqi, Chen, Xu, Wu, Qiong, Zhou, Zhi, Yu, Shuai, ”HFEL: Joint edge association and resource allocation for cost-efficient hierarchical federated edge learning,” IEEE Transactions on Wireless Communications, vol. 19, no. 10, pp. 6535–6548, 2020.
[9] Xu, Bo, Zhao, Haitao, Cao, Haotong, Garg, Sahil, Kaddoum, Georges, Hassan, Mohammad Mehedi, ”Edge aggregation placement for semi-decentralized federated learning in Industrial Internet of Things,” Future Generation Computer Systems, vol. 150, pp. 160–170, 2024.
[10] Ziller, Alexander, Trask, Andrew, Lopardo, Antonio, Szymkow, Benjamin, Wagner, Bobby, Bluemke, Emma, Nounahon, Jean-Mickael, Passerat-Palmbach, Jonathan, Prakash, Kritika, Rose, Nick, others, ”Pysyft: A library for easy federated learning,” Federated Learning Systems: Towards Next-Generation AI, pp. 111–139, 2021.
[11] He, Chaoyang, Li, Songze, So, Jinhyun, Zeng, Xiao, Zhang, Mi, Wang, Hongyi, Wang, Xiaoyang, Vepakomma, Praneeth, Singh, Abhishek, Qiu, Hang, others, ”Fedml: A research library and benchmark for federated machine learning,” arXiv preprint arXiv:2007.13518, 2020.
[12] Beutel, Daniel J, Topal, Taner, Mathur, Akhil, Qiu, Xinchi, Fernandez-Marques, Javier, Gao, Yan, Sani, Lorenzo, Li, Kwing Hei, Parcollet, Titouan, de Gusmão, Pedro Porto Buarque, others, ”Flower: A friendly federated learning research framework,” arXiv preprint arXiv:2007.14390, 2020.
[13] Lin, Frank Po-Chen, Hosseinalipour, Seyyedali, Azam, Sheikh Shams, Brinton, Christopher G, Michelusi, Nicolo, ”Semi-decentralized federated learning with cooperative D2D local model aggregations,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 12, pp. 3851–3869, 2021.
[14] Beltrán, Enrique Tomás Martínez, Gómez, Ángel Luis Perales, Feng, Chao, Sánchez, Pedro Miguel Sánchez, Bernal, Sergio López, Bovet, Gérôme, Pérez, Manuel Gil, Pérez, Gregorio Martínez, Celdrán, Alberto Huertas, ”Fedstellar: A platform for decentralized federated learning,” Expert Systems with Applications, vol. 242, pp. 122861, 2024.
[15] Sun, Yuchang, Shao, Jiawei, Mao, Yuyi, Wang, Jessie Hui, Zhang, Jun, ”Semi-decentralized federated edge learning with data and device heterogeneity,” IEEE Transactions on Network and Service Management, vol. 20, no. 2, pp. 1487–1501, 2023.
[16] ”SDFLMQ Python Source code,” 2025.
[17] You, Qian, Tang, Bing, ”Efficient task offloading using particle swarm optimization algorithm in edge computing for industrial internet of things,” Journal of Cloud Computing, vol. 10, pp. 1–11, 2021.
[18] Fu, Xueliang, Sun, Yang, Wang, Haifang, Li, Honghui, ”Task scheduling of cloud computing based on hybrid particle swarm algorithm and genetic algorithm,” Cluster Computing, vol. 26, no. 5, pp. 2479–2488, 2023.
[19] Ben Alla, Hicham, Ben Alla, Said, Touhafi, Abdellah, Ezzati, Abdellah, ”A novel task scheduling approach based on dynamic queues and hybrid meta-heuristic algorithms for cloud computing environment,” Cluster Computing, vol. 21, no. 4, pp. 1797–1820, 2018.
[20] Nabi, Said, Ahmad, Masroor, Ibrahim, Muhammad, Hamam, Habib, ”AdPSO: adaptive PSO-based task scheduling approach for cloud computing,” Sensors, vol. 22, no. 3, pp. 920, 2022.
[21] Ali-Pour, Amir, Gascon-Samson, Julien, ”SDFLMQ: A Semi-Decentralized Federated Learning Framework over MQTT,” arXiv preprint, 2025.
[22] Meunier, Laurent, Rakotoarison, Herilalaina, Wong, Pak Kan, Roziere, Baptiste, Rapin, Jérémy, Teytaud, Olivier, Moreau, Antoine, Doerr, Carola, ”Black-box optimization revisited: Improving algorithm selection wizards through massive benchmarking,” IEEE Transactions on Evolutionary Computation, vol. 26, no. 3, pp. 490–500, 2021.
[23] Boveiri, Hamid Reza, Khayami, Raouf, ”On the performance of metaheuristics: A different perspective,” arXiv preprint arXiv:2001.08928, 2020.
[24] Liu, Lumin, Zhang, Jun, Song, SH, Letaief, Khaled B, ”Client-edge-cloud hierarchical federated learning,” pp. 1–6, 2020.
[25] Auger, Anne, Hansen, Nikolaus, Perez Zerpa, Jorge M, Ros, Raymond, Schoenauer, Marc, ”Experimental Comparisons of Derivative Free Optimization Algorithms: (Invited Talk),” pp. 3–15, 2009.