Figures from this paper: Figures 1–4.
Topics
Regret, Social Networks, Learning Algorithm, Adversarial Bandits, Network Structure, Multi-armed Bandit
9 Citations
- Stephen Pasteris, Alberto Rumi, M. Herbster
- 2024
Computer Science, Mathematics
The CBA algorithm is proposed, which exploits the assumption that one action, corresponding to the learner's abstention from play, has no reward or loss on any trial; it is the first to achieve bounds on the expected cumulative reward for general confidence-rated predictors.
- Junghyun Lee, Laura Schmid, SeYoung Yun
- 2023
Computer Science
ArXiv
This work provides a rigorous regret analysis for the standard flooding protocol combined with the UCB policy, and proposes a new protocol called Flooding with Absorption (FWA); it is verified empirically that FWA incurs significantly lower communication costs than flooding, with only minimal loss in regret performance.
- Stephen Pasteris, Chris Hicks, V. Mavroudis
- 2023
Computer Science, Mathematics
NeurIPS
The nearest-neighbour rule is adapted to the contextual bandit problem; the resulting algorithm is extremely efficient, with a per-trial running time polylogarithmic in both the number of trials and the number of actions, and requires only quasi-linear space.
- Juliette Achddou, Nicolò Cesa-Bianchi, Pierre Laforgue
- 2024
Computer Science
AISTATS
The analysis shows that the regret of $\texttt{MT-CO}_2\texttt{OL}$ is never worse than the bound obtained when agents do not share information, and it is proved that the algorithm can be made differentially private with a negligible impact on the regret.
- Stephen Pasteris, Chris Hicks, V. Mavroudis
- 2023
Computer Science, Mathematics
ArXiv
The adversarial contextual bandit problem in metric spaces is considered; the algorithm designed can hold out any set of contexts when computing its regret term, and hence inherits extreme computational efficiency.
- Nicolò Cesa-Bianchi, T. Cesari, R. D. Vecchia
- 2021
Computer Science, Mathematics
ArXiv
This work characterizes regret in terms of the independence number of the strong product between the feedback graph and the communication network, recovering as special cases many previously known bounds for distributed online learning with either expert or bandit feedback.
- Pierre Laforgue, A. Vecchia, Nicolò Cesa-Bianchi, L. Rosasco
- 2022
Computer Science
ArXiv
AdaTask can be seen as a comparator-adaptive version of Follow-the-Regularized-Leader with a Mahalanobis norm potential, and a variational formulation of this potential reveals how AdaTask jointly learns the tasks and their structure.
- Baojian Zhou, Yifan Sun, Reza Babanezhad
- 2023
Computer Science, Mathematics
ICML
This work proves an effective regret of $\mathcal{O}(\sqrt{n^{1+\gamma}})$ when suitable parameterized graph kernels are chosen, and proposes an approximate algorithm, FastONL, that enjoys a regret bound based on this relaxation.
- Erhan Bayraktar, Ibrahim Ekren, Xin Zhang
- 2022
Mathematics, Computer Science
ArXiv
This paper heuristically derives a limiting PDE on Wasserstein space that characterizes the asymptotic behavior of the forecaster's regret, and shows that regret bounds and efficient algorithms can be obtained by finding appropriate smooth sub- and supersolutions of this parabolic PDE.
91 References
- L. E. Celis, Farnood Salehi
- 2017
Computer Science, Economics
ArXiv
This paper provides algorithms for this setting, both for stochastic and adversarial bandits, and shows that their regret smoothly interpolates between the regret in the classical bandit setting and that of the full-information setting as a function of the neighbors' exploration.
- N. Cesa-Bianchi, C. Gentile, Giovanni Zappella
- 2013
Computer Science
NIPS
A global recommendation strategy is presented that allocates a bandit algorithm to each network node (user) and allows it to "share" signals (contexts and payoffs) with neighboring nodes, along with two more scalable variants based on different ways of clustering the graph nodes.
- Swapna Buccapatnam, A. Eryilmaz, N. Shroff
- 2013
Computer Science
52nd IEEE Conference on Decision and Control
This work reveals the significant gains that can be obtained even through static network-aware policies, and proposes a randomized policy that explores actions for each user at a rate that is a function of her network position.
- Zhi Wang, Chicheng Zhang, Manish Singh, L. Riek, Kamalika Chaudhuri
- 2021
Computer Science
AISTATS
An upper-confidence-bound-based algorithm, RobustAgg$(\epsilon)$, is developed that adaptively aggregates rewards collected by different players and achieves instance-dependent regret guarantees depending on the amenability of information sharing across players.
- Meng Fang, D. Tao
- 2014
Computer Science, Mathematics
KDD
This paper formalizes the networked bandit problem and proposes an algorithm that considers not only the selected arm but also the relationships between arms, in that it chooses an arm based on integrated confidence sets constructed from historical data.
- X. Xu, Fang Dong, Yanghua Li, Shaojian He, X. Li
- 2020
Computer Science, Mathematics
AAAI
A contextual bandit problem is studied in a highly non-stationary environment, and an efficient learning algorithm that adapts to abrupt reward changes is proposed; theoretical regret analysis shows that the regret scales sublinearly in the time horizon T.
- Aleksandrs Slivkins
- 2011
Mathematics, Computer Science
COLT
This work considers similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem, and presents algorithms that are based on adaptive partitions, and take advantage of "benign" payoffs and context arrivals without sacrificing the worst-case performance.
- Zhi Wang, Manish Singh, Chicheng Zhang, L. Riek, Kamalika Chaudhuri
- 2020
Computer Science
This paper formulates the multi-player multi-armed bandit problem and develops an upper-confidence-bound-based algorithm that adaptively aggregates rewards collected by different players, the first such scheme in a multi-player bandit learning setting.
- Qingyun Wu, Huazheng Wang, Quanquan Gu, Hongning Wang
- 2016
Computer Science
SIGIR
This paper develops a collaborative contextual bandit algorithm in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updating, and rigorously proves an improved upper regret bound.
- Abishek Sankararaman, A. Ganesh, S. Shakkottai
- 2019
Computer Science
Proc. ACM Meas. Anal. Comput. Syst.
A novel algorithm is developed in which agents, whenever they choose to communicate, share only arm IDs (not samples) with another agent chosen uniformly and independently at random, demonstrating that even a minimal level of collaboration among the agents enables a significant reduction in per-agent regret.
...
...