Figures from this paper: Figures 1–4.
Topics
Regret, Social Networks, Learning Algorithm, Adversarial Bandits, Network Structure, Multi-armed Bandit
9 Citations
- Stephen Pasteris, Alberto Rumi, M. Herbster
- 2024
Computer Science, Mathematics
The CBA algorithm is proposed, which exploits the assumption that one action, corresponding to the learner's abstention from play, has no reward or loss on any trial; it is the first to achieve bounds on the expected cumulative reward for general confidence-rated predictors.
- Junghyun Lee, Laura Schmid, SeYoung Yun
- 2023
Computer Science
ArXiv
This work provides a rigorous regret analysis for the standard flooding protocol combined with the UCB policy, and proposes a new protocol called Flooding with Absorption (FWA); it is verified empirically that FWA incurs significantly lower communication costs than flooding, with only minimal loss in regret performance.
- Stephen Pasteris, Chris Hicks, V. Mavroudis
- 2023
Computer Science, Mathematics
NeurIPS
The nearest-neighbour rule is adapted to the contextual bandit problem; the resulting algorithm is extremely efficient, with a per-trial running time polylogarithmic in both the number of trials and the number of actions, and requires only quasi-linear space.
- Juliette Achddou, Nicolò Cesa-Bianchi, Pierre Laforgue
- 2024
Computer Science
AISTATS
The analysis shows that the regret of $\texttt{MT-CO}_2\texttt{OL}$ is never worse than the bound obtained when agents do not share information, and it is proved that the algorithm can be made differentially private with a negligible impact on the regret.
- Stephen Pasteris, Chris Hicks, V. Mavroudis
- 2023
Computer Science, Mathematics
ArXiv
The adversarial contextual bandit problem in metric spaces is considered; the algorithm designed can hold out any set of contexts when computing its regret term, and hence inherits extreme computational efficiency.
- Nicolò Cesa-Bianchi, T. Cesari, R. D. Vecchia
- 2021
Computer Science, Mathematics
ArXiv
This work characterizes regret in terms of the independence number of the strong product between the feedback graph and the communication network, recovering as special cases many previously known bounds for distributed online learning with either expert or bandit feedback.
- Pierre Laforgue, A. Vecchia, Nicolò Cesa-Bianchi, L. Rosasco
- 2022
Computer Science
ArXiv
AdaTask can be seen as a comparator-adaptive version of Follow-the-Regularized-Leader with a Mahalanobis norm potential, and a variational formulation of this potential reveals how AdaTask jointly learns the tasks and their structure.
- Baojian Zhou, Yifan Sun, Reza Babanezhad
- 2023
Computer Science, Mathematics
ICML
This work proves an effective regret of $\mathcal{O}(\sqrt{n^{1+\gamma}})$ when suitable parameterized graph kernels are chosen, and proposes an approximate algorithm, FastONL, that enjoys a regret bound based on this relaxation.
- Erhan Bayraktar, Ibrahim Ekren, Xin Zhang
- 2022
Mathematics, Computer Science
ArXiv
This paper heuristically derives a limiting PDE on Wasserstein space that characterizes the asymptotic behavior of the forecaster's regret, and shows that regret bounds and efficient algorithms can be obtained by finding appropriate smooth sub- and supersolutions of this parabolic PDE.
91 References
- L. E. Celis, Farnood Salehi
- 2017
Computer Science, Economics
ArXiv
This paper provides algorithms for this setting, both for stochastic and adversarial bandits, and shows that their regret smoothly interpolates between the regret in the classical bandit setting and that of the full-information setting as a function of the neighbors' exploration.
- N. Cesa-Bianchi, C. Gentile, Giovanni Zappella
- 2013
Computer Science
NIPS
A global recommendation strategy is presented that allocates a bandit algorithm to each network node (user) and allows it to "share" signals (contexts and payoffs) with neighboring nodes, along with two more scalable variants based on different ways of clustering the graph nodes.
- Swapna Buccapatnam, A. Eryilmaz, N. Shroff
- 2013
Computer Science
52nd IEEE Conference on Decision and Control
This work reveals the significant gains that can be obtained even through static network-aware policies, and proposes a randomized policy that explores actions for each user at a rate that is a function of her network position.
- Zhi Wang, Chicheng Zhang, Manish Singh, L. Riek, Kamalika Chaudhuri
- 2021
Computer Science
AISTATS
An upper-confidence-bound-based algorithm, RobustAgg$(\epsilon)$, is developed that adaptively aggregates rewards collected by different players and achieves instance-dependent regret guarantees depending on the amenability of information sharing across players.
- Meng Fang, D. Tao
- 2014
Computer Science, Mathematics
KDD
This paper formalizes the networked bandit problem and proposes an algorithm that considers not only the selected arm but also the relationships between arms, in that it chooses an arm based on integrated confidence sets constructed from historical data.
- X. Xu, Fang Dong, Yanghua Li, Shaojian He, X. Li
- 2020
Computer Science, Mathematics
AAAI
A contextual bandit problem is studied in a highly non-stationary environment, and an efficient learning algorithm that adapts to abrupt reward changes is proposed; theoretical regret analysis shows that the regret scales sublinearly in the time horizon T.
- Aleksandrs Slivkins
- 2011
Mathematics, Computer Science
COLT
This work considers similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem, and presents algorithms that are based on adaptive partitions, and take advantage of "benign" payoffs and context arrivals without sacrificing the worst-case performance.
- Zhi Wang, Manish Singh, Chicheng Zhang, L. Riek, Kamalika Chaudhuri
- 2020
Computer Science
This paper formulates the multi-player multi-armed bandit problem and develops an upper-confidence-bound-based algorithm that adaptively aggregates rewards collected by different players, the first such scheme in a multi-player bandit learning setting.
- Qingyun Wu, Huazheng Wang, Quanquan Gu, Hongning Wang
- 2016
Computer Science
SIGIR
This paper develops a collaborative contextual bandit algorithm in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updating, and rigorously proves an improved upper regret bound.
- Abishek Sankararaman, A. Ganesh, S. Shakkottai
- 2019
Computer Science
Proc. ACM Meas. Anal. Comput. Syst.
A novel algorithm is developed in which agents, whenever they choose to communicate, share only arm IDs (not samples) with another agent chosen uniformly and independently at random, demonstrating that even a minimal level of collaboration among the agents enables a significant reduction in per-agent regret.
...
...