Convexity
TOC
Introduction
Convexity is an important concept in machine learning, especially in optimization problems. Many optimization algorithms are designed to find the minimum of a convex function because a convex function has only one minimum point, and the minimum point can be found using efficient algorithms. In contrast, non-convex functions may have multiple local minimum points, and it may be difficult to find the global minimum.
In machine learning, convexity is important for many tasks, including linear regression, logistic regression, support vector machines, and neural networks. The loss function used in these tasks is typically convex, which allows efficient optimization using algorithms such as gradient descent.
Convex set
A set C is convex if you connect any two points in that set
Examples of convex set are hyperplanes and halfspaces. A hyperplane in n-dimensional space is the set of points satisfying the equation:
Another example is a norm ball. An Euclidean ball is the set of points that satisfies:
Ellipsoids are convex sets, too.
The thing is, if we use norm p (
The intersection of halfspaces and hyperplanes are also convex. The name is a polyhedra. For a halfspace
A point is a convex combination of all points
Convex function
A function is convex if its domain is a convex set and when we connect any two points on the graph of that function, we have a line above or on the graph. Mathematically, a function
Image: graph of a convex function
Similarly, a function
We can do the same for concave and strictly concave functions. If a function is strictly convex and it has a minimum then that minimum is a global minimum. Here are some properties:
-
If f(x) is convex then af(x) is convex if a > 0 and concave if a < 0
-
The sum of two convex functions is a convex function with the domain to be the intersection of the two functions (the intersection of two convex sets is a convex set)
-
If
is convex then is convex on the domain set being the intersection of all the domain sets.
Examples
Examples of univariate convex functions:
-
y = ax + b since the line connecting any two points lie on the graph of it
-
-
on or - the negative entropy
Examples of univariate concave functions:
-
y = ax + b since -y is a convex function. Note that y = ax + b is both convex and concave
-
-
Derivatives
For differential function, we can check the derivatives to see whether it is convex or not. The definition of the tangient of function f at a point
For multivariate function, i.e. the variable is a vector of dimension d. Its first derivative is a vector of d dimensions. The second derivative is a square matrix of d x d. The second derivative is denoted
We have some examples:
-
The negative entropy function
is strictly convex since the domain x > 0 is convex and is greater than 0 with all x in the domain. -
is not a convex function since the second derivative can have a negative value. -
The cross entropy function is a strictly convex function. For two probabilities x and 1 - x, 0 < x < 1, a is a constant in [0,1]: f(x) = -(alog(x) + (1-a)log(1-x)) has the second derivative to be
is a positive number. -
If A is a positive definite matrix
is convex since its Hessian is A - a positive definite matrix. -
For negative entropy function with two variables f(x,y) = x log(x) + y log(y) on the positive set of x and y. This function has the first derivative to be
and its Hessian to be:
This is a diagonal matrix with the positive entries so it is a positive definite matrix. Therefore the negative entropy is a strictly convex funciton.
Jensen’s Inequality
Given a convex function f, we have the generalization of the convexity definition:
and
Properties
Local Minima are Global Minima
Consider a convex function f on a convex set X. Suppose that
However, according to the definition of convex function, we have:
This contradicts