Springer-Verlag, Berlin, Heidelberg, LNCS/LNAI, 5254, pp.287-302, October 2008]

In this paper
we consider the problem of actively learning the mean values of distributions
associated with a finite number of options (arms).
The algorithm can select which option to generate the next sample from
in order to produce estimates with equally good precision for all the distributions.
When an algorithm uses sample means to estimate the unknown values,
the optimal solution, assuming full knowledge of the distributions, is
to sample each option in proportion to its variance.
In this paper we propose an incremental algorithm that
asymptotically achieves the same loss as an optimal rule.
We prove that the excess loss suffered by this algorithm,
apart from logarithmic factors, scales as *n^{-3/2}*,
which we conjecture to be the optimal rate.
The performance of the algorithm is illustrated in a simple problem.
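
A minimal sketch, in Python, of the variance-proportional idea: the function name `allocate` and the greedy "largest variance per collected sample" rule below are illustrative assumptions, not the paper's exact incremental algorithm, but they drive the sample counts towards being proportional to the variances.

```python
import numpy as np

def allocate(arms, budget, init_pulls=2):
    """arms: list of callables, each returning one sample from its distribution."""
    K = len(arms)
    samples = [[arms[k]() for _ in range(init_pulls)] for k in range(K)]
    for _ in range(budget - K * init_pulls):
        # Empirical variances (with a small floor so that no arm starves forever).
        var = np.array([max(np.var(s, ddof=1), 1e-8) for s in samples])
        counts = np.array([len(s) for s in samples])
        k = int(np.argmax(var / counts))     # the currently most "under-sampled" arm
        samples[k].append(arms[k]())
    return [np.mean(s) for s in samples], [len(s) for s in samples]

# Example: three Gaussian arms with standard deviations 1, 2 and 4.
rng = np.random.default_rng(0)
arms = [lambda: rng.normal(0, 1), lambda: rng.normal(0, 2), lambda: rng.normal(0, 4)]
means, counts = allocate(arms, budget=3000)
print(means, counts)   # counts come out roughly proportional to 1 : 4 : 16
```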

December 2007, Poster]

We consider continuous-state, continuous-action batch reinforcement learning,
where the goal is to learn a good policy
from a sufficiently rich trajectory generated by some policy.
We study a variant of fitted *Q*-iteration,
where the greedy action selection is replaced by searching for a policy
in a restricted set of candidate policies by maximizing the average action-values.
We provide a rigorous analysis of this algorithm,
proving what we believe is the first finite-time bound for value-function based algorithms
for continuous state and action problems.
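
A rough sketch of this variant (the name `fitted_q_policy_search`, the use of an off-the-shelf tree regressor for the function class, and the assumption of scalar states and actions are all mine; the restricted policy search is implemented naively as a maximum over a small finite candidate set `policy_set`):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_policy_search(batch, policy_set, n_iters=20, gamma=0.95):
    S, A, R, S2 = (np.asarray(x) for x in batch)       # states, actions, rewards, next states
    X = np.column_stack([S, A])                        # regression inputs (s, a)
    q, pi = None, policy_set[0]                        # arbitrary initial policy
    for _ in range(n_iters):
        # Regression targets: r + gamma * Q(s', pi(s')); zero bootstrap on the first pass.
        if q is None:
            y = R
        else:
            A2 = np.array([pi(s) for s in S2])
            y = R + gamma * q.predict(np.column_stack([S2, A2]))
        q = ExtraTreesRegressor(n_estimators=50).fit(X, y)
        # Policy improvement restricted to the candidate set:
        # pick the policy with the largest average action-value on the sample.
        def avg_value(p):
            acts = np.array([p(s) for s in S])
            return q.predict(np.column_stack([S, acts])).mean()
        pi = max(policy_set, key=avg_value)
    return pi, q
```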

IEEE, pp.330-337, April 2007]

We consider batch reinforcement learning problems in continuous-space, expected total discounted-reward Markovian Decision Problems when the training data is composed of the trajectory of some fixed behaviour policy. The algorithm studied is fitted policy iteration, where in successive iterations the action-value functions of the intermediate policies are obtained by means of approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used. One of the main novelties of the paper is that new smoothness constraints are introduced, allowing us to significantly extend the scope of previous results.
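
The outer/inner structure of such a fitted policy iteration might look roughly as follows; this is only a sketch under simplifying assumptions of my own (scalar states and actions, a finite action grid `actions`, and a generic regressor in place of the paper's function set):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_policy_iteration(S, A, R, S2, actions, n_outer=5, n_inner=10, gamma=0.95):
    S, A, R, S2 = map(np.asarray, (S, A, R, S2))
    X = np.column_stack([S, A])
    greedy = lambda q, s: max(actions, key=lambda a: q.predict([[s, a]])[0])
    pi = lambda s: actions[0]                    # arbitrary initial policy
    for _ in range(n_outer):
        q, y = None, R                           # evaluate pi by approximate value iteration
        for _ in range(n_inner):
            q = ExtraTreesRegressor(n_estimators=30).fit(X, y)
            A2 = np.array([pi(s) for s in S2])
            y = R + gamma * q.predict(np.column_stack([S2, A2]))
        pi = lambda s, q=q: greedy(q, s)         # improvement: greedy over the action grid
    return pi
```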

Springer-Verlag, Berlin, Heidelberg, LNCS/LNAI, 4005, pp.574-588, June 2006]

Abstract

We consider batch reinforcement learning problems in continuous-space,
expected total discounted-reward Markovian Decision Problems.
As opposed to previous theoretical work,
we consider the case when the training data consists of a single sample path
(trajectory) of some behaviour policy.
In particular, we do not assume access to a generative model of the environment.
The algorithm studied is policy iteration
where in successive iterations the *Q*-functions of the intermediate policies
are obtained by means of minimizing a novel Bellman-residual type error.
PAC-style polynomial bounds are derived on the number of samples
needed to guarantee near-optimal performance,
where the bound depends on the mixing rate of the trajectory,
the smoothness properties of the underlying Markovian Decision Problem,
and the approximation power and capacity of the function set used.
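
For a fixed policy and user-chosen features *\phi(s,a)*, plain empirical Bellman-residual minimization with a linear value function reduces to a least-squares problem, sketched below; note that the paper's novel error is a *modified* Bellman-residual that corrects the bias of this naive squared loss, and that correction is not included here.

```python
import numpy as np

def bellman_residual_weights(phi, S, A, R, S2, pi, gamma=0.95, ridge=1e-6):
    # Feature matrices for (s, a) and for (s', pi(s')).
    F  = np.array([phi(s, a) for s, a in zip(S, A)])
    F2 = np.array([phi(s2, pi(s2)) for s2 in S2])
    D = F - gamma * F2
    # Solve  min_w  sum_i ( (phi(s_i, a_i) - gamma * phi(s'_i, pi(s'_i)))^T w - r_i )^2
    # via ridge-regularized normal equations.
    w = np.linalg.solve(D.T @ D + ridge * np.eye(F.shape[1]), D.T @ np.asarray(R))
    return w    # estimated Q of the policy: Q_hat(s, a) = phi(s, a) @ w
```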

Springer-Verlag, Berlin, Heidelberg, LNCS/LNAI, 3559, pp.531-544, June 2005]

Earlier results show that
the minimax expected (test) distortion redundancy of
empirical vector quantizers with three or more levels
designed from *n* independent and identically distributed data points
is at least *\Omega(1/\sqrt{n})* for the class of distributions on a bounded set.
In this paper,
a much simpler construction and proof for this are given with much better constants.
There are similar bounds for the training distortion of
the empirically optimal vector quantizer with three or more levels.
These rates, however, do not hold for a one-level quantizer.
Here the two-level quantizer case is clarified,
showing that it already shares the behavior of the general case.
Given that the minimax bounds are proved using a construction
that involves discrete distributions,
one suspects that
for the class of distributions with uniformly bounded continuous densities,
the expected distortion redundancy might decrease as *o(1/\sqrt{n})* uniformly.
It is shown, however, that this is not so,
proving that the lower bound for the expected test distortion remains true
for these subclasses.

IEEE Information Theory Society, p.301, 2004]

We consider the rate of convergence of the expected distortion redundancy of
empirically optimal vector quantizers.
Earlier results show that the mean-squared distortion of an empirically
optimal quantizer designed from *n* independent and identically distributed
source samples converges uniformly to the optimum at a rate
*O(1/\sqrt{n})*, and that this rate is sharp in the minimax sense.
We prove that for any fixed source distribution supported on a given finite
set, the convergence rate is *O(1/n)* (faster than the
minimax lower bound), where the corresponding constant
depends on the distribution.
For more general source distributions,
we provide conditions
implying a slightly worse *O(\log n/n)* rate of convergence.
In particular, scalar
distributions having strictly log-concave densities with bounded support
(such as the truncated Gaussian distribution) satisfy these conditions.
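
As a small numerical illustration of the setting (k-means with several restarts is only a stand-in for the truly empirically optimal quantizer, and the truncated Gaussian source echoes the example above), one can watch the test distortion of three-level quantizers trained on growing samples approach the optimum:

```python
import numpy as np
from scipy.stats import truncnorm
from sklearn.cluster import KMeans

def test_distortion(codebook, X):
    # Mean-squared distortion of nearest-codepoint quantization of X.
    return np.min((X[:, None] - codebook[None, :]) ** 2, axis=1).mean()

test = truncnorm.rvs(-2, 2, size=200_000, random_state=0)        # large held-out "test" set
for n in [100, 1_000, 10_000]:
    train = truncnorm.rvs(-2, 2, size=n, random_state=n)
    km = KMeans(n_clusters=3, n_init=20, random_state=0).fit(train.reshape(-1, 1))
    print(n, test_distortion(km.cluster_centers_.ravel(), test))  # decreases towards the optimum
```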

p.45, 2001]

Given an i.i.d. sample *(X_{1},...,X_{n})*
drawn from an unknown discrete distribution

Let *X* be a real-valued random variable with finite expectation denoted by *m*.
Let *(X_{1},...,X_{n})* be an i.i.d. sample
drawn from this distribution of

is a strongly consistent estimate of

It is also known that restricting ourselves to the class

We prove that this is sharp, even for discrete

Springer, Berlin, LNCS/LNAI, 1572, pp.241-252, 1999]

We show that there exist individual lower bounds corresponding to the upper bounds on the rate of convergence of nonparametric pattern recognition that are arbitrarily close to Yang's minimax lower bounds, provided that the a posteriori probability function is in the classes used by Stone and others. The rates equal those of the corresponding regression estimation problem. Thus, for these classes, classification is not easier than regression estimation in the individual sense either.

Warszawa, Poland, August 1997]

Binary search trees are important in data storing,
and the running time of several algorithms is related to the height of the tree.
The data can be modeled by a random sequence of numbers.
On the other hand, random binary search trees
can be used in certain computer graphics applications.
The case when the random input sequence consists of independent identically distributed (i.i.d.) variables
has been heavily studied.
In this case the height has order log *n*, asymptotically,
where *n* is the number of nodes.
We study the case when the input sequence is an ordinary random walk,
that is, the partial sums of an i.i.d. sequence.
We show that
if the basic distribution satisfies some regularity conditions
then the height has order square root of *n*.
We give an exact limit law,
where the limit distribution is the Erdős-Kac-Rényi distribution *L(z)*.
As a by-product, we prove a limit theorem for random walks.
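
A quick empirical check of this contrast (plain unbalanced binary-search-tree insertion, written iteratively to avoid deep recursion; all names are illustrative) compares the tree height for an i.i.d. input sequence with that for a random-walk input:

```python
import numpy as np

def bst_height(keys):
    # Insert the keys one by one into an unbalanced binary search tree, tracking the height.
    root, nodes, height = None, {}, 0          # nodes: key -> [left child key, right child key]
    for x in keys:
        if root is None:
            root, nodes[x] = x, [None, None]
            continue
        cur, depth = root, 1
        while True:
            side = 0 if x < cur else 1
            if nodes[cur][side] is None:
                nodes[cur][side], nodes[x] = x, [None, None]
                height = max(height, depth)
                break
            cur, depth = nodes[cur][side], depth + 1
    return height

rng = np.random.default_rng(0)
n = 20_000
print("i.i.d. input:     ", bst_height(rng.standard_normal(n)))             # grows like log n
print("random-walk input:", bst_height(np.cumsum(rng.standard_normal(n))))  # grows like sqrt(n)
```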

Association for Computing Machinery, New York, pp.303-309, 1996]

Known minimax lower bounds for learning state that for each sample size *n*,
and learning rule *g_{n}*,
there exists a distribution of the observation

In this paper we investigate minimax lower bounds in such a much stronger sense.
We show that for several natural *d*-parameter concept classes,
for any *sequence* of learning rules *{g_{n}}*,
there exists a fixed distribution of

We also show that it is neither the VC dimension nor the rate of increase of the shatter coefficients of the class that determines the asymptotic behavior of the concept class.


© András Antos

Updated: Feb 7, 2021