We consider online learning in finite stochastic Markovian environments
where in each time step a new reward function is chosen by an oblivious adversary.
The goal of the learning agent is to compete
with the best stationary policy in hindsight in terms of the total reward received.
More precisely, in each time step
the agent observes the current state and the reward associated with the last transition,
however, the agent does not observe the rewards associated with other state-action pairs.
The agent is assumed to know the transition probabilities.
The state-of-the-art result for this setting is
an algorithm with an expected regret of *O(T^{2/3} ln T)*.
In this paper, assuming that stationary policies mix uniformly fast,
we show that after

Partial-monitoring games constitute
a mathematical framework for sequential decision making problems with imperfect feedback:
The learner repeatedly chooses an action,
the opponent responds with an outcome,
and then the learner suffers a loss and receives a feedback signal,
both of which are fixed functions of the action and the outcome.
The goal of the learner is to minimize his total cumulative loss.
We make progress towards the classification of these games based on their minimax expected regret.
Namely, we classify almost all games with two outcomes and a finite number of actions:
We show that their minimax expected regret is either
zero, *\tilde\Theta(\sqrt{T})*, *\Theta(T^{2/3})*, or

*Keywords:*
Online algorithms, Online learning, Imperfect feedback, Regret analysis

It has been shown by earlier results that for fixed-rate multiresolution scalar quantizers and the mean squared error distortion measure, codecell convexity precludes optimality for certain discrete sources. However, it was unknown whether the same phenomenon can occur for any continuous source. In this paper, examples of continuous sources (even with bounded continuous densities) are presented for which optimal fixed-rate multiresolution scalar quantizers cannot have only convex codecells, proving that codecell convexity precludes optimality also for such regular sources.

*Keywords -*
Clustering methods, quantization, source coding, rate distortion theory,
mean square error methods, optimization methods, multi-resolution, codecell
convexity, continuous density function.

We consider the problem of actively learning the mean values of
distributions associated with a finite number of options.
The decision maker can select which option to generate the next observation from,
the goal being to produce estimates with equally good precision for all the options.
If sample means are used to estimate the unknown values
then the optimal solution,
assuming that the distributions are known up to a shift,
is to sample from each distribution at a rate proportional to its variance.
No information other than the distributions' variances is needed to calculate the optimal solution.
In this paper we propose an incremental algorithm that asymptotically
achieves the same loss as an optimal rule.
We prove that the excess loss suffered by this algorithm,
apart from logarithmic factors, scales as *n^{-3/2}*,
which we conjecture to be the optimal rate.
The performance of the algorithm is illustrated on a simple problem.
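The allocation rule described above can be sketched in a few lines. The following is a simplified illustration, not the exact algorithm analysed in the paper: after an initial round of samples per option, it always samples the option whose empirical-variance share most exceeds its share of the samples drawn so far, so that the sample counts track the variance proportions.

```python
import random

def variance_proportional_allocation(samplers, n_total, n_init=2, seed=0):
    """Incrementally allocate n_total samples across options so that each
    option's sample count tracks its empirical-variance share.
    `samplers` is a list of functions taking an rng and returning one draw.
    A simplified sketch of the variance-proportional allocation idea."""
    rng = random.Random(seed)
    k = len(samplers)
    # Initial round: n_init observations per option.
    data = [[s(rng) for _ in range(n_init)] for s in samplers]
    for _ in range(n_total - k * n_init):
        # Empirical variances (floored so no option starves completely).
        var = []
        for xs in data:
            m = sum(xs) / len(xs)
            var.append(max(sum((x - m) ** 2 for x in xs) / (len(xs) - 1), 1e-12))
        total_var = sum(var)
        total_n = sum(len(xs) for xs in data)
        # Sample the option with the largest allocation deficit.
        i = max(range(k), key=lambda j: var[j] / total_var - len(data[j]) / total_n)
        data[i].append(samplers[i](rng))
    return [len(xs) for xs in data]
```

For two Gaussian options with standard deviations 1 and 2 (variances 1 and 4), the resulting counts come out roughly in the 1:4 ratio of the variances.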

**Key words:**
active learning, heteroscedastic noise, regression, sequential analysis,
sequential allocation

We consider the problem of finding a near-optimal policy in continuous-space discounted Markovian Decision Problems given the trajectory of some behaviour policy. We study the policy iteration algorithm where in successive iterations the action-value functions of the intermediate policies are obtained by picking a function from some fixed function set (chosen by the user) that minimizes an unbiased finite-sample approximation to a novel loss function that upper-bounds the unmodified Bellman-residual criterion. The main result is a finite-sample, high-probability bound on the performance of the resulting policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept that we call the VC-crossing dimension, the approximation power of the function set and the discounted-average concentrability of the future-state distribution. To the best of our knowledge this is the first theoretical reinforcement learning result for off-policy control learning over continuous state-spaces using a single trajectory.

**Key words**
Reinforcement learning, policy iteration, Bellman-residual minimization,
off-policy learning, nonparametric regression, least-squares regression,
finite-sample bounds

Earlier results show that
the minimax expected (test) distortion redundancy of empirical vector
quantizers with three or more levels designed from *n* independent
and identically distributed data points is at least
*\Omega(1/\sqrt{n})* for the class of distributions on a bounded set.
In this paper,
a much simpler construction and proof for this are given
with much better constants.
There are similar bounds for the training distortion of
the empirical optimal vector quantizer with three or more levels.
These rates, however, do not hold for a one-level quantizer.
Here the two-level quantizer case is clarified,
showing that it already shares the behavior of the general case.
Given that the minimax bounds are proved using a construction
that involves discrete distributions,
one suspects that
for the class of distributions with uniformly bounded continuous densities,
the expected distortion redundancy might decrease as
*o(1/\sqrt{n})* uniformly.
It is shown as well that this is not so,
proving that the lower bound for the expected test distortion remains true
for these subclasses.

*Keywords -*
Empirical vector quantizer design, distortion redundancy,
minimax lower bounds, training distortion.

We consider the rate of convergence of the expected distortion redundancy of
empirically optimal vector quantizers.
Earlier results show that the mean-squared distortion of an empirically
optimal quantizer designed from *n* independent and identically distributed
source samples converges uniformly to the optimum at a rate of
*O(1/\sqrt{n})*, and that this rate is sharp in the minimax sense.
We prove that for any fixed distribution supported on a given finite set
the convergence rate is *O(1/n)*
(faster than the minimax lower bound),
where the corresponding constant depends on the source distribution.
For more general source distributions
we provide conditions
implying a slightly worse *O(log n / n)* rate of convergence.
Although these conditions, in general, are hard to verify,
we show that sources with continuous densities
satisfying certain regularity properties
(similar to the ones in [Pol82b] that were used to prove
a central limit theorem for the code points of the empirically optimal quantizers)
are included in the scope of this result.
In particular,
scalar distributions with strictly log-concave densities with bounded support
(such as the truncated Gaussian distribution) satisfy these conditions.
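Empirically optimal quantizer design as discussed above can be illustrated with Lloyd's algorithm on samples from a truncated Gaussian. This is only a sketch: Lloyd's algorithm finds a locally optimal quantizer in general, and is a stand-in for the exact empirically optimal design considered in the paper.

```python
import random

def lloyd_two_level(samples, iters=50):
    """Lloyd's algorithm for a 2-level scalar quantizer under squared error:
    alternate between the nearest-neighbor partition (a single threshold)
    and the centroid condition (cell means). Finds a local optimum."""
    c = [min(samples), max(samples)]
    for _ in range(iters):
        t = (c[0] + c[1]) / 2.0          # nearest-neighbor cell boundary
        left = [x for x in samples if x <= t]
        right = [x for x in samples if x > t]
        if left:
            c[0] = sum(left) / len(left)   # centroid of the left cell
        if right:
            c[1] = sum(right) / len(right) # centroid of the right cell
    return c

def distortion(samples, codepoints):
    """Empirical mean-squared distortion of a quantizer given by its codepoints."""
    return sum(min((x - c) ** 2 for c in codepoints) for x in samples) / len(samples)

def truncated_gaussian_samples(n, seed=1):
    """Draw n samples from a standard Gaussian truncated to [-1, 1] by rejection."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.gauss(0.0, 1.0)
        if -1.0 <= x <= 1.0:
            out.append(x)
    return out
```

On such a source the two codepoints settle symmetrically around zero, and the resulting two-level distortion is well below that of the best one-level quantizer (the sample mean).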

We derive new margin-based inequalities for the probability of error of classifiers. The main feature of these bounds is that they can be calculated using the training data and therefore may be effectively used for model selection purposes. In particular, the bounds involve empirical complexities measured on the training data (such as the empirical fat-shattering dimension) as opposed to their worst-case counterparts traditionally used in such analyses. Also, our bounds appear to be sharper and more general than recent results involving empirical complexity measures. In addition, we develop an alternative data-based bound for the generalization error of classes of convex combinations of classifiers involving an empirical complexity measure that is easier to compute than the empirical covering number or fat-shattering dimension. We also show examples of efficient computation of the new bounds.

**Keywords:**
classification, error estimation, fat-shattering dimension, margin-based bounds

Suppose *P* is an arbitrary discrete distribution
on a countable alphabet *{\cal X}*.
Given an i.i.d. sample *(X_{1},...,X_{n})* drawn from

**Keywords:**
functional estimation; entropy estimation; rates of convergence; match lengths

We show that there exist individual lower bounds corresponding to the upper bounds for the rate of convergence of nonparametric pattern recognition which are arbitrarily close to Yang's minimax lower bounds, for certain "cubic" classes of regression functions used by Stone and others. The rates are equal to those of the corresponding regression function estimation problem. Thus for these classes classification is not easier than regression function estimation.

**Keywords:**
individual rates of convergence, nonparametric pattern recognition

We show that there exist individual lower bounds on the rate of convergence of nonparametric regression estimates, which are arbitrarily close to Stone's minimax lower bounds.

*AMS classifications*: Primary: 62G20. Secondary: 62G05.

*Keywords*: Individual rates of convergence, nonparametric regression.

We give a short proof of the following result.
Let *(X,Y)* be any distribution on *N* x {0,1},
and let *(X_{1},Y_{1}),...,(X_{n},Y_{n})* be an i.i.d. sample drawn from this distribution.
In discrimination, the Bayes error

infinitely often.

*1991 Mathematics Subject Classifications*: Primary: 62G07. Secondary: 62G05, 62F12, 60F25.

*Keywords and phrases.*
Discrimination, statistical pattern recognition,
nonparametric estimation, Bayes error, lower bounds, rates of convergence.

Minimax lower bounds for concept learning state, for example,
that for each sample size *n* and learning rule *g_{n}*,
there exists a distribution of the observation

In this paper we investigate minimax lower bounds in a stronger sense.
We show that for several natural *k*-parameter concept classes,
including the class of linear halfspaces, the class of balls,
the class of polyhedra with a certain number of faces, and a class of neural networks,
for any *sequence* of learning rules *{g_{n}}*,
there exists a fixed distribution of


© András Antos

Updated: Feb 7, 2021