

2576-3180/20/$25.00 © 2020 IEEE — IEEE Internet of Things Magazine • March 2020

Toward Developing Fog Decision Making on the Transmission Rate of Various IoT Devices Based on Reinforcement Learning

Motahareh Mobasheri, Yangwoo Kim, and Woongsup Kim

Introduction

During the last few years, because of the emergence of the Internet of Things (IoT) and its consequent trends such as smart cities, cloud-based infrastructures have become inefficient solutions due to their centralized computing model. Because of the continuously increasing number of IoT devices, besides managing them, cloud limitations such as latency and network bandwidth require more attention, especially in emergencies such as car crashes in a smart city, in which every millisecond is important to prevent damage. Moreover, because of bandwidth limitations, it is impossible to transfer all of the IoT devices' data to the cloud. Every moment, IoT devices produce a considerable amount of data that is not economically feasible to transfer to the cloud. These problems result in information loss and subsequent incorrect decision making.

These challenges have led to the need for a distributed intelligent platform at the edge of the network. Fog computing extends cloud services to the network edge, bringing computation, communication, and storage closer to end users under certain latency, mobility, bandwidth, security, and privacy constraints. By using analytical algorithms in the fog layer, we can preprocess IoT data and send only higher-level events to the center (the cloud). Besides the above-mentioned problems, decisions about the IoT devices sometimes need to be made in real time. With intelligence at the network edge, edge devices can make decisions more quickly using machine learning algorithms.

With the advent of fog computing, IoT management has become a popular research area.
In [1], a software-defined networking (SDN)-based approach was proposed for managing IoT by changing the existing structure and decoupling the control plane from the data plane. In [2], the authors focused on the relationship between the edge and the cloud, and proposed an approach to manage the requirements of low-latency and bandwidth-intensive applications. In [3], the authors addressed the volume of video traffic generated by IoT-based multimedia devices by transmitting prioritized frames for a video sequence and prioritized packets for an image, thereby ensuring that data transmission took less bandwidth; this approach needs extra processing in the IoT devices for assigning priorities to data frames. To achieve efficiency in bandwidth and power management, in [4] the authors eliminated the redundant data gathered from different sensors. The goal of [5] was to improve the users' quality of experience (QoE) by addressing load balancing in fog computing. In [6], a node called a broker was introduced to perform scheduling among the users and the fog and cloud nodes. The system described in [7] dynamically and automatically determines whether processing tasks are computed on a cloud server or on its defined local gateway device.

Although these papers focused on the communication of end devices and fog nodes, they consider cloud nodes along with fog nodes. Moreover, they do not consider emergency situations, and their approaches are not based on machine learning techniques for the devices' bandwidth allocation. In this study, we focus only on the fog node side and the corresponding IoT devices.

Since one of the IoT challenges is bandwidth consumption, bandwidth management has become crucial in smart environments. As the number of devices increases, this challenge becomes more important [8]. As smart cities become more advanced and are equipped with an exponentially growing variety of IoT devices with different requirements, managing them is not an easy problem.
Moreover, because of the variety of smart environments and their unique situations, it is not suitable to define rule tables manually and separately for each environment's IoT devices and fog node operations. By using an appropriate algorithm, we can reduce human errors, eliminate the time-consuming effort of defining accurate rules, and set everything automatically, making the result usable for all smart environments without any assumptions about the type of network or the devices' features. Reinforcement learning (RL), as a powerful machine learning approach, does not need any trainer or supervisor for solving problems and is suitable for these purposes. RL is a learning process for mapping visited states to available and feasible actions in order to maximize a received reward. The Q-learning algorithm is one of the most common RL algorithms, learning optimal decisions using only the rewards received from the environment. An RL task that satisfies the Markov property is called a Markov decision process (MDP) [9]. Starting from the current state of the RL agent, Q-learning finds an optimal policy for any finite MDP, while maximizing the expected value of the total reward over all successive steps [10].

Abstract

In recent years, the focus on reducing the delay and the cost of transferring data to the cloud has led to data processing near end devices. Therefore, fog computing has emerged as a powerful complement to the cloud to handle the large data volume belonging to the Internet of Things (IoT) and the requirements of communications. Over time, because of the increasing number of IoT devices, managing them by a fog node has become more complicated. The problem addressed in this study is the transmission rate of various IoT devices to a fog node in order to prevent delays in emergency cases. We formulate the decision making problem of a fog node by using a reinforcement learning approach in a smart city as an example of a smart environment, and then develop a Q-learning algorithm to achieve efficient decisions for IoT transmission rates to the fog node. Although, to the best of our knowledge, there has thus far been no research with this objective, in this study two more approaches, random-based and greedy-based, are simulated to show that our method performs considerably better (over 99.8 percent) than these algorithms.

Digital Object Identifier: 10.1109/IOTM.0001.1900070

The problem addressed in this study regards managing the network bandwidth of a smart city in which several IoT devices are connected to a single fog node. Because of the network's bandwidth limitation, if an emergency event happens in one of the IoT devices' areas, the fog node has to select an appropriate device and decrease its bandwidth in order to increase the bandwidth of the needy device that is in an emergency.

The goal of this study is to make the best decisions about device selection for helping emergency devices on the basis of RL with maximum performance. The fog node, as a manager and a learner agent, should be aware of emergency devices and help them by increasing their bandwidth. Therefore, the fog node can receive more data, and then it can prepare better reports for higher levels of the network structure. Since the total bandwidth of the network is fixed, the fog node should learn which IoT device is the best candidate for decreasing its bandwidth in order to increase the bandwidth of the current needy device.

The remainder of the article is organized as follows. The research problem is defined in detail. We formulate the decision making problem and present the proposed method. Then we describe the simulation of the proposed approach, followed by our conclusions and suggestions for future work.

Problem Definition

In this article, we consider the decision making problem of a single fog node in emergency cases about the transmission rates of various IoT devices in a smart city as a smart environment.
This smart city's network has a fixed and limited total bandwidth, supports high video traffic, and includes several IoT devices with predefined priorities connected to a single fog node to which they send their data. The amount of primary bandwidth that they are allowed to use for transferring data to the fog node, the amount of additional bandwidth they require in sudden and emergency situations, and their priorities on the basis of their locations are fixed and predefined by the smart city management system.

We have considered a single type of camera device among the different categories of IoT devices in a smart city, which can monitor a smart city with crowded main highways. Although the camera devices can adjust the quality of the video they capture, it is not necessary to transfer high-quality video to the fog node in normal situations. In contrast, in emergencies, the fog node should provide the required extra bandwidth to the involved cameras so that it receives sufficient data with better quality for further analyses of the events that caused the emergency.

If a large event covers a wide area of the smart city and numerous devices get into emergency situations, the low-priority devices, placed at unimportant locations, are responsible for helping high-priority devices with their own bandwidth in order to prevent the smart city's network failure. We have classified the IoT devices connected to our scenario's fog node into three types:
1. IoT devices with the lowest priority. When these devices have sufficient bandwidth, they are always candidates for helping emergency devices. Furthermore, these devices never meet emergency situations, on the basis of their locations in the smart city.
2. Main devices with the highest priority. These can never help emergency devices and should always be in active mode.
3. IoT devices whose priorities are neither the lowest nor the highest.
When they are not in emergency situations and have sufficient bandwidth, they can help the emergency devices.

We have considered a feasible device set (helping candidate set) that includes type 1 and type 3 devices that are in normal situations and have enough available bandwidth to lend the current needy device the amount that it needs.

In the proposed method, in order to achieve the above-mentioned objective, over time the fog node acts as an RL agent that has to learn the environment states based on the RL approach [9]. Then, on the basis of this learning process, the fog node has to make future-oriented decisions. The fog node has to select a device from the recently updated feasible device set and decrease its bandwidth in order to increase the needy device's bandwidth for further communications. The goal of the fog node is to select the best helper from the feasible device set on the basis of the devices' priorities. Selecting based on priority means that when there is a helper with lower priority, choosing a helper with higher priority is not efficient, since a higher assigned priority indicates the importance of the device's location and a higher probability of meeting an emergency situation.

Our fog node's selection is future oriented; that is, the fog node's selections are such that, in the future, the system will meet the minimum number of emergency situations, or the minimum number of devices with insufficient bandwidth due to helping others. The fog node makes these decisions without any supervisor and without information about the distribution parameters of future emergencies.

For a better sense, imagine a situation in which the analysis of the captured data belonging to a main highway shows an unusual situation such as an accident. In this case, the fog node needs more data with better quality for further decisions, so it sends a request to the corresponding camera to increase its frequency of transferring data.
However, as transferring more data needs more bandwidth, the fog node selects an appropriate helper and reduces its bandwidth by the amount that the emergency device needs. As the number of needy devices increases, the assignment of extra bandwidth to several devices becomes complicated and has a considerable effect on the situation, where every millisecond is important to prevent damage and failure.

Suppose that one or more main highways in a smart city have suffered a series of accidents because of heavy snow. What will happen when several critical sensors get into emergency situations? If we use fixed rules defined by a supervisor, an unpredicted event may take place for which there is no predefined instruction. Therefore, the best way is for the fog node to learn different situations and their best suitable actions. In this approach, the fog node tries all the feasible actions one by one in each visit to every state, learns the best actions during its learning process, and then uses its best experiences in subsequent similar visits in the future without any supervisor.

Note that we have focused only on the bandwidth limitation and not on solving the emergency events themselves. The reason for asking an emergency device for more data is that the fog node is responsible for sending reports with sufficient information to the center for higher-level decisions. The question of what actions should be performed to return the smart city to a normal situation belongs to higher levels of the hierarchical management system, and we do not discuss it here.

Mathematical Modeling

In this study, to model the fog node's decision making problem, the state, the action, and the reward functions are defined as follows:
• State: The state at time step n describes the situations of all the devices connected to the fog node. flag is a Boolean array containing nd elements for nd devices, where each element shows the situation of a device according to its index.
Every IoT device experiences either a normal situation (the 0 position) or an emergency situation (the 1 position). We have assumed that all the devices are adjusted to send reports to the fog node at predefined intervals, but when a device detects an unusual situation while processing its captured data, it has to inform the fog node. In every time step, flag is updated on the basis of the received emergency reports. The elements of flag that correspond, via their indexes, to the emergency devices are changed to 1, and the others remain in the 0 position. As soon as the emergency situation ends, the related device has to inform the fog node to change the related element of flag back to 0.
• Action: If all the devices are in normal situations, there is no need to select any action. When an emergency occurs, it is necessary to perform an appropriate action to prepare enough information for the center in order to prevent further damage. At the beginning of each time step, the fog node checks the updated flag to find which elements have recently changed to 1. If there is a new 1-valued element, the fog node starts making decisions.
For example, assume that an event occurs in the area of device i (di). The fog node then has to select a device (dk) from the feasible device set and assign as much of the bandwidth of dk to di as it needs. Finally, when di returns to its normal situation, the fog node sets the related element of flag from 1 to 0 and then takes back the additional assigned bandwidth from di to dk.
Based on the RL approach, with probability ε the fog node selects an action (a helper) randomly; otherwise, it selects on the basis of past experiences, that is, it selects the helper with the highest value in the Q matrix [11]. At the beginning of the learning process, the probability of random action selection is higher than that of selecting an action on the basis of past experiences.
As the fog node's experience strengthens over time, the probability of random action selection decreases step by step.
• Reward function: When the fog node performs its selected action, it receives a reward value that determines the quality of the recently selected action. Therefore, when the fog node assigns sufficient bandwidth to a needy device, it perceives an increase in the received reward. In this study, the reward function counts the number of needy devices that received the required additional bandwidth plus the number of devices in normal situations, since the fog node's operation is one of the reasons for being in a normal situation.
As mentioned before, one condition for an action selection to be optimal is the selection of the device with the lowest priority among all the feasible helpers. Therefore, this constraint should affect the reward function. For this purpose, the received reward is decreased by a punishment value equal to the number of helpers in the feasible device set whose priorities are lower than the priority of the helper selected for the current needy device. One more parameter, penalty, affects the reward function. Penalty decreases the reward function only when there is no device with the helping conditions, because of the bad operation of the fog node in the past. Moreover, the value of penalty should be sufficiently high to be a good alarm for the fog node. Therefore, we assign to penalty the number of all the devices with priorities lower than that of the needy device.
As soon as the fog node has completed whatever it was supposed to do for each element of flag and received its reward, it updates its Q matrix using the main equation of the Q-learning algorithm [11], where α, γ, and ai are the learning coefficient, the constant discount factor, and the index of the device selected to help device i, respectively. Moreover, Q-values are held in the form of two matrixes (Qnew and Qold) because of the updating procedure.
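As a concrete illustration of the selection rule and update equation just described, the following is a minimal sketch, not the authors' code; the function and variable names are our own assumptions, and the reward is presumed to have been computed already.

```python
import random

def select_helper(Q, i, feasible, epsilon):
    """Pick a helper for needy device i: explore a random feasible helper
    with probability epsilon, otherwise exploit the highest Q-value."""
    if random.random() < epsilon:
        return random.choice(feasible)
    return max(feasible, key=lambda j: Q[i][j])

def q_update(Q_old, i, a_i, reward, feasible_next, i_next, alpha=0.1, gamma=0.9):
    """One Q-learning step:
    Qnew(i, ai) = (1 - alpha) * Qold(i, ai)
                  + alpha * (R + gamma * max over feasible k' of Qold(i', k'))."""
    best_next = max(Q_old[i_next][k] for k in feasible_next)
    return (1 - alpha) * Q_old[i][a_i] + alpha * (reward + gamma * best_next)
```

Keeping the result in a separate Qnew matrix, as the article does, simply means writing the returned value into a copy of the Q matrix rather than updating it in place.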
The main algorithm and the Q-learning algorithm show the fog node's operation as an RL agent.

Simulation Results

Our smart city scenario includes a main highway with 20 IoT devices (nd = 20) that all send video traffic to the fog node. The predefined priority array is one of the inputs of the learning algorithm. For the 20 devices, we have considered 6, 4, and 10 devices of types 1, 2, and 3, respectively. To simulate emergency events during the fog node's learning period, in each time step we use a uniformly distributed pseudorandom integer function to generate nd values from {0, 1} for the nd devices, held in the flag array showing the situations of all the smart city's devices. When a device has the value 1 at its index of flag, it has encountered an emergency and needs help. The Q matrix and the flag array are initialized to 0, and ε is set to n^–0.015. The scheduling update interval in all implementations is based on time steps, and each time step is considered to be 1 s.

We have simulated two more algorithms to compare their results with the result of our Q-learning approach. One of these algorithms is called the greedy algorithm, since it selects the best action in each time step based on supervisory instructions for the current situation, without considering future steps or past experiences. When the fog node is under the orders of a supervisor, it does not need to learn the various states. The supervisory instructions are provided in such a way that the fog node automatically selects a feasible helper with the lowest priority. The other approach, called the random algorithm, selects a device randomly as a helper and then checks whether it is possible for this device to help the current needy device. If this device does not meet the helping conditions, the fog node continues to select another device randomly, regardless of the devices' priorities.
As soon as the fog node reaches a device with the helping conditions, it applies the changes to the bandwidths of the needy device and the helper.

Successful bandwidth management (SBM) is our target function for evaluating and comparing all three approaches, in which the number of helped devices and devices in normal situations is decreased by the value of countpun, where countpun calculates the total number of devices in the feasible device set with priorities lower than the selected helpers' priorities, for all the needy devices in time step n.

Main algorithm:
1. Input: initialize the priorities, primary bandwidths, extra bandwidth needed in emergencies, and current bandwidths of all IoT devices; initialize flag, the time step, and the Q matrix.
2. While not converged do
   2.1 if all of flag's elements were 0 in the last 2 steps do
       countpun = 0
       go to 2.4
   2.2 else
       for each element i of flag do
           if flag(i) has changed from 1 to 0 do
               find the device u that has helped device i
               take back the borrowed bandwidth from i to u
           else if flag(i) has changed from 0 to 1 do
               run the Q-learning / greedy / random procedure
           end if
       end for
   2.3 end if
   2.4 calculate SBMn using countpun
   2.5 calculate the average SBM
   2.6 make the new flag for the next step
3. end while

Q-learning algorithm:
1. with probability ε, randomly choose device k among the feasible helpers
2. otherwise, ai = argmax over feasible j of Q(i, j), where j is the index of the helper selected for device i
3. if no device was able to help do
   calculate penalty
4. else
   calculate punish(i)
   increase the bandwidth of device i and decrease the bandwidth of device j by as much as i needs
5. end if
6. calculate RnQ based on punish(i), penalty, and the helped and normal devices; countpun = countpun + punish(i)
7. update the Q matrix: Qnew(i, ai) = (1 – α) Qold(i, ai) + α (RnQ + γ max over feasible k′ of Qold(i′, k′))

Because of the random procedure of the random algorithm, it has the lowest average SBM.
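The punishment, penalty, and reward bookkeeping used in the Q-learning algorithm above might be sketched as follows. This is a hedged illustration under our own assumptions: the priority mapping, device identifiers, and function names are hypothetical, not taken from the paper's implementation.

```python
def punish(priority, feasible, helper):
    """Number of feasible helpers whose priority is lower than the
    chosen helper's priority (zero when the lowest-priority helper is chosen)."""
    return sum(1 for d in feasible if priority[d] < priority[helper])

def penalty(priority, needy):
    """Alarm value used when no device can help: the count of all devices
    whose priority is lower than the needy device's priority."""
    return sum(1 for p in priority.values() if p < priority[needy])

def reward(n_helped, n_normal, punish_value, penalty_value):
    """Helped devices plus devices in normal situations, decreased by the
    punishment and penalty terms described in the text."""
    return n_helped + n_normal - punish_value - penalty_value
```

With this shape, countpun for a time step is simply the sum of punish(i) over all needy devices i served in that step.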
As the number of helpers with lower priorities increases, the countpun value increases; therefore, the average SBM decreases. Moreover, the selection of helpers on the basis of supervisory instructions is not optimal, since it is impossible to consider all the probable events in the fog node's variable environment, and the action selections are not future oriented. By using the Q-value, the agent can choose the best action on the basis of all the achievable rewards starting from the current state (not just the immediately received reward). This is the main motivation for the fog node to try all the feasible helpers in every state and observe their results in order to gain powerful experiences. Therefore, the final average SBM of the Q-learning algorithm is better than that of the greedy algorithm.

The results of all these algorithms are presented in Fig. 1. The red curve, the blue stars, and the green circles denote the results of the Q-learning, greedy, and random algorithms, respectively. The vertical axis shows the average SBM, and the horizontal axis denotes the time steps. As is obvious from this figure, the Q-learning algorithm's result converges to the highest average SBM (19.98) among all the presented results. The second highest average SBM is achieved by the greedy approach (18.01), and the last belongs to the random approach (17.01). The performance is calculated via a proportion. Considering 20 IoT devices, in the best situation the maximum average SBM is 20, which can be considered 100 percent performance, since an average SBM of 20 means that all of the emergency IoT devices have received help or are in the normal situation.

Moreover, the total number of time steps of Q-learning from the start of the learning process to convergence (with accuracy 10^–5) is 1614 s, while those of the greedy and random algorithms are 1002 s and 1306 s, respectively.
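The performance percentages quoted above follow from that simple proportion against the maximum achievable average SBM, which equals the number of devices (here nd = 20). A quick sketch of the calculation (the function name is our own):

```python
def performance_percent(avg_sbm, n_devices):
    """Performance as the ratio of the converged average SBM to the
    maximum achievable average SBM, which equals the device count."""
    return 100.0 * avg_sbm / n_devices

# Applying it to the averages reported in the text:
for name, sbm in [("Q-learning", 19.98), ("greedy", 18.01), ("random", 17.01)]:
    print(name, performance_percent(sbm, 20))
```

For Q-learning this gives 19.98 / 20, i.e., the 99.9 percent figure consistent with the "over 99.8 percent" claim in the abstract.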
The Q-learning algorithm needs a learning procedure; therefore, it obviously needs a longer time to converge. In most cases, the total number of time steps of the random algorithm is lower than that of Q-learning, since the fog node selects a helper randomly without any process or procedure. This strategy makes the random approach faster than the Q-learning algorithm. Sometimes, however, the random algorithm is very unlucky and has to perform the selection process several times to find a device with all of the required helping conditions. This makes the random algorithm slower than the greedy algorithm, and sometimes even slower than the Q-learning algorithm.

To analyze the results of these algorithms in detail, we focused on the beginning period of their operation. Figure 2 shows the first 100 s of Fig. 1. The highest achievable SBM value is 20 (because nd = 20). The initial average SBM values of all three algorithms are 0. The greedy algorithm immediately reaches the highest average SBM value, while the others approach 20 gradually, their curves rising as nearly straight lines toward 20 because the number of emergency events is low and the number of devices with helping conditions is high. As time goes on and the number of emergency events increases, the number of feasible helpers decreases, and this decreases the average SBM values of all three algorithms. As the learning process is in its beginning stages, the fog node does not yet have sufficient expertise; thus, its operation is not perfect. Therefore, the temporary descent of the Q-learning algorithm's average SBM is larger than that of the others. Over time, the learning progresses, and the fog node strengthens its decisions on the basis of its gained experiences; therefore, its average SBM increases and finally exceeds that of the two other algorithms.

After a while, because of the convergence of the fog node's learning process, the average SBM becomes stable.
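The exploration schedule ε = n^–0.015 given in the simulation setup decays very slowly with the time step n, which helps explain why exploratory random selections persist throughout the run; a quick sketch of the schedule (our own illustration, not the authors' code):

```python
def epsilon(n, exponent=-0.015):
    """Exploration rate at time step n (n >= 1) for the schedule
    epsilon = n ** -0.015: starts at 1.0 and decays very slowly."""
    return n ** exponent

# epsilon is 1.0 at n = 1 and is still close to 0.9 after the roughly
# 1614 steps the Q-learning run needed to converge.
print(epsilon(1), epsilon(100), epsilon(1614))
```

Such a slow decay keeps some random exploration alive even after convergence, matching the behavior discussed in the text.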
Once the learning process is complete, the fog node can select the best device among the helping candidates on the basis of its experiences and the received rewards. The average SBM does not converge exactly to the highest value, but approximates it. There are two reasons for this fact: 1) As the fog node's environment is variable and not static, unpredicted situations may occur; therefore, the fog node continues selecting random actions with a low probability even after convergence. Moreover, as the fog node visits new states, it receives low rewards at the beginning of its learning period. 2) At times, all the devices are in their normal situations; therefore, the fog node does not need to select a helper, but it receives the highest reward value, as this ideal situation is related to the fog node's past optimal operation.

Figure 1. Average SBM of the fog node based on the proposed algorithm (Q-learning), and the supervised and random algorithms.
Figure 2. Average SBM of the fog node in the first 100 s based on the Q-learning, supervised, and random algorithms.
Figure 3. Average SBM of the fog node in the first 300 s based on the Q-learning, greedy, and random algorithms for 80 IoT devices.
Figure 4. Average SBM of the fog node based on the Q-learning, greedy, and random algorithms for 80 IoT devices, with eight more highest-priority devices than in Fig. 3.

In the following, we examine all three algorithms with different initializations. We increased the number of IoT devices by 60, so the smart city considered in Fig. 3 includes 80 cameras. The results show that when the number of IoT devices increases, the fog node's operation becomes worse in the cases of the greedy and random algorithms, as managing a higher number of devices is more difficult. We continued evaluating the proposed algorithm by changing the devices' priorities besides increasing their number. We changed the priorities of eight devices considered in Fig. 3 to the highest and plotted the corresponding results in Fig. 4. Then we changed the priorities of the devices considered in this figure to higher levels to make the deciding conditions more difficult for the fog node. The corresponding results are shown in Fig. 5. Obviously, the application of these changes makes bandwidth management more difficult. The percentages of the average SBM of all the above figures for the Q-learning, greedy, and random methods are shown in Table 1.

Conclusion

With the emergence of big data and IoT, the use of manual and supervisory instructions has become difficult, costly, and time consuming. Meanwhile, the amount of data increases over time, and situations change continuously. These factors increase the error probability. Moreover, the environment changes over time, which results in a permanent need to change predefined instructions. The reinforcement learning approach solves these problems without using any supervisors, while achieving better performance by reducing human errors.

In this study, we solved the bandwidth limitation problem of a smart city as a smart environment in emergency situations using the Q-learning algorithm, a popular RL technique. We then compared the Q-learning algorithm with the greedy and random algorithms and showed the superiority of the proposed approach from the point of view of performance. The proposed algorithm gained over 99.8 percent performance after finishing the learning period with different initializations.
The fog node achieves this high performance without any supervisor, and all the operations are executed according to its decisions based on its experiences.

In the future, this problem will be investigated further by adding several fog nodes to handle all the smart city zones in the same scenario, when all of the fog nodes have full cooperation and the decisions of each one affect those of the others.

Acknowledgments

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2019-2016-0-00465) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

References
[1] K. C. Okafor et al., "Leveraging Fog Computing for Scalable IoT Datacenter Using Spine-Leaf Network Topology," J. Electrical and Computer Engineering, 2017.
[2] Q. Wang et al., "Multimedia IoT Systems and Applications," 2017 Global Internet of Things Summit, June 2017. DOI: 10.1109/GIOTS.2017.8016221.
[3] P. K. Choubey et al., "Power Efficient, Bandwidth Optimized and Fault Tolerant Sensor Management for IoT in Smart Home," 2015 IEEE Int'l. Advanced Computing Conf., June 2015, pp. 366–70.
[4] J. Oueis, E. C. Strinati, and S. Barbarossa, "The Fog Balancing: Load Distribution for Small Cell Cloud Computing," 2015 IEEE VTC-Spring, May 2015, pp. 1–6.
[5] Q. Zhu et al., "Task Offloading Decision in Fog Computing System," China Commun., vol. 14, no. 11, Nov. 2017, pp. 59–68. DOI: 10.1109/CC.2017.8233651.
[6] D. Happ and A. Wolisz, "Towards Gateway to Cloud Offloading in IoT Publish/Subscribe Systems," 2017 2nd Int'l. Conf. Fog and Mobile Edge Computing, May 2017, pp. 101–06.
[7] T. Mitchell, Machine Learning, Science/Engineering/Math, Portland, OR, 1997.
[8] S. S. I. Samuel, "A Review of Connectivity Challenges in IoT Smart Home," 2016 3rd MEC Int'l. Conf. Big Data and Smart City, Mar. 2016, pp. 1–4.
[9] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, vol. 1, no. 1, MIT Press, 1998.
[10] F. S. Melo, "Convergence of Q-learning: A Simple Proof," Inst. Systems and Robotics, Tech. Rep., 2001, pp. 1–4.
[11] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement Learning: A Survey," J. Artif. Intell. Research, vol. 4, 1996, pp. 237–85.

Biographies

Motahareh Mobasheri ([email protected]) received her B.E. degree in information technology engineering from Semnan University, Iran, in 2013 and her M.S. degree in computer networks from Amirkabir University of Technology (Tehran Polytechnic), Iran, in 2015. Currently, she is a Ph.D. candidate in the Information and Communication Engineering Department of Dongguk University, Seoul, Republic of Korea.

Yangwoo Kim ([email protected]) received his degree from Syracuse University, New York, in 1992. He is the corresponding author for this article and a professor at Dongguk University. His research interests include parallel and distributed processing systems, cloud computing, grid computing, and edge computing.

Woongsup Kim ([email protected]) received his degree in computer engineering from Seoul National University in 1998, his M.S. degree in computer and information science from the University of Pennsylvania in 2001, and his Ph.D. degree in computer science from Michigan State University, East Lansing. Since 2007, he has been a faculty member of the Department of Information and Communication Engineering at Dongguk University. His research interests include software engineering, web semantics, service oriented computing, and IoT system design.

Figure 5. Average SBM of the fog node based on the Q-learning, greedy, and random algorithms for 80 IoT devices, with a priority increase for some devices of Fig. 4.
Table 1. Final results of the 1: Q-learning; 2: greedy; and 3: random approaches.
Table 1 columns: Average SBM; Total time steps (s); Performance (%).
