
22 posts tagged with "DSP 2017"

Posts related to the Daj Się Poznać 2017 competition, which involved running your own blog while developing a project.


Multi-armed bandit - Upper Confidence Bound

· 6 min read

This post is part of my struggle with the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto. Other posts systematizing my knowledge and presenting the code I wrote can be found under the tag Sutton & Barto and in the repository dloranc/reinforcement-learning-an-introduction.


In the multi-armed bandit problem we need exploration to find the best action, because the value of each action is uncertain. Our estimate of an action's value changes as we perform that action from time to time and observe the reward we receive. The more often a given action is selected, the more confident we are that its estimated value is accurate. So far, however, we have not taken this rather intuitive observation into account in our calculations. The actions were selected at random, without considering whether their estimated values were close to the best one or how certain those estimates were.
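As a rough illustration of the idea the post develops, here is a minimal sketch of UCB action selection in the spirit of Sutton & Barto. It is not the code from the repository, and the function and variable names are my own:

```python
import numpy as np

def ucb_action(q_estimates, action_counts, t, c=2.0):
    """Pick the arm with the highest upper confidence bound.

    q_estimates   - current value estimate for each arm (numpy array)
    action_counts - how many times each arm has been pulled (numpy array)
    t             - current time step (1-based)
    c             - exploration strength
    """
    # An arm that has never been tried has an effectively infinite bound, so try it first.
    untried = np.where(action_counts == 0)[0]
    if len(untried) > 0:
        return int(untried[0])

    # The bonus term grows for arms we have pulled rarely, encouraging exploration
    # of actions whose estimates are still uncertain.
    ucb = q_estimates + c * np.sqrt(np.log(t) / action_counts)
    return int(np.argmax(ucb))
```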

Multi-armed bandit - optimistic initial values

· 3 min read

This post is part of my struggle with the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto. Other posts systematizing my knowledge and presenting the code I wrote can be found under the tag Sutton & Barto and in the repository dloranc/reinforcement-learning-an-introduction.


All the methods I have described so far depend on the initial estimates of the action values $Q_1(a)$. This is especially visible when we run MAB with $\epsilon = 0$, i.e. without exploration, always selecting the best possible action (arm). In statistics, we call such methods biased. The bias disappears for methods with a step size $\alpha$ of $\frac{1}{n}$ once each action has been selected at least once. For a constant $\alpha$, the bias does not disappear; it only decreases with time (over subsequent iterations of the algorithm).
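A minimal sketch of the optimistic-initial-values trick, assuming a 10-armed testbed with true values near zero and a constant step size; `pull` is a stand-in for whatever function returns a reward for the chosen arm and is not from the original post:

```python
import numpy as np

n_arms = 10
q = np.full(n_arms, 5.0)  # optimistic initial estimates Q_1(a) = 5, well above the true values
alpha = 0.1               # constant step size, so the initial bias fades but never fully vanishes

def step(pull):
    action = int(np.argmax(q))                  # greedy selection, epsilon = 0
    reward = pull(action)                       # `pull` is a hypothetical reward function
    q[action] += alpha * (reward - q[action])   # disappointing rewards push the estimate down,
    return action, reward                       # so every arm ends up being tried early on
```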

TorchCraft - analysis and changing the game state

· 6 min read

This post is about the Starcraft bot I am developing using machine learning. The project is being developed as part of the "Daj Się Poznać 2017" competition.


In the last post about the project I described what the maps look like and how to create a basic script that connects to Starcraft and fetches the game state in a loop, or rather in every subsequent logical frame of the game. I don't think I've explained yet what a logical game frame is. The point is that graphics rendering is independent of the calculations that change the game state. The frame rate is not constant and depends on the speed of your computer, while the game state is recalculated at a fixed interval. If you have played Starcraft, you probably know that you can set the game speed in the options. Changing the speed changes the time between logical frame calculations. This is a fundamental difference; if the game ran at a constant 30 or 60 FPS, the issue would probably be solved differently.
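To make the distinction concrete, here is a generic fixed-timestep loop (plain Python, not Starcraft or TorchCraft code; `update_state` and `render` are hypothetical callbacks) in which the simulation, i.e. the logical frames, advances at a fixed interval regardless of how fast rendering happens:

```python
import time

TICK = 1.0 / 24  # time between logical frames; a shorter tick means a "faster" game speed

def run(update_state, render, duration=5.0):
    """update_state and render are stand-in callbacks for this illustration."""
    start = time.monotonic()
    next_tick = start
    while time.monotonic() - start < duration:
        now = time.monotonic()
        # Advance the game state once per elapsed tick, independently of rendering.
        while next_tick <= now:
            update_state()
            next_tick += TICK
        render()  # runs as often as the machine allows, so FPS is not constant
```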

Multi-armed bandit - non-stationary version

· 3 min read

This post is part of my struggle with the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto. Other posts systematizing my knowledge and presenting the code I wrote can be found under the tag Sutton & Barto and in the repository dloranc/reinforcement-learning-an-introduction.


Non-stationary problem

In this post, I will discuss a particular variant of the multi-armed bandit (MAB) problem in which the reward values of each one-armed bandit change over time. This is the so-called non-stationary version of MAB. Until now, the rewards were drawn from a normal distribution with a fixed mean and variance (the mean for each arm was picked randomly at the beginning, in the constructor).
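A sketch of how such a non-stationary testbed can be simulated, assuming the true action values take a small random walk after every pull; the class and parameter names are my own illustration, not the code from the post:

```python
import numpy as np

class NonStationaryBandit:
    """k-armed bandit whose true action values drift over time (random walk)."""

    def __init__(self, k=10, drift_std=0.01):
        self.q_true = np.random.randn(k)   # initial means, as in the stationary version
        self.drift_std = drift_std

    def pull(self, action):
        reward = np.random.randn() + self.q_true[action]  # unit-variance Gaussian reward
        # All arms drift a little after every step, so old observations lose value
        # and the agent has to keep adapting its estimates.
        self.q_true += np.random.randn(len(self.q_true)) * self.drift_std
        return reward
```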

TorchCraft - basic script

· 4 min read

This post is about the Starcraft bot I am developing using machine learning. The project is being developed as part of the "Daj Się Poznać 2017" competition.


This week I was going to write about Torch itself and how to create neural networks in it, but I decided to focus on the very basics of TorchCraft and its interaction with Starcraft. Unfortunately, TorchCraft has poor documentation, and apart from the installation instructions almost everything has to be figured out from the samples in the examples directory.

TorchCraft - Lua basics

· 5 min read

This post is about the Starcraft bot I am developing using machine learning. The project is being developed as part of the "Daj Się Poznać 2017" competition.


Recently I didn't have enough time to finish installing TorchCraft. After writing the previous post, I remembered that NTFS has something called a "junction point", which works a bit like a symlink on Windows. If BWEnv.exe requires Starcraft at the path C:\Starcraft, just download Junction, move it to C:\Windows\System32 and run in the console: junction c:\Starcraft d:\Games\Starcraft. It worked for me. Well, okay, it didn't really work: the game launched in full screen (even though the config says it should run in a window), something was happening, but after a second attempt to run BWEnv.exe it refused to start at all. A better option turned out to be using BWEnv.dll and launching the game through Chaoslauncher.

Multi-armed bandit - simple optimization

· 4 min read

This post is part of my struggle with the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto. Other posts systematizing my knowledge and presenting the code I wrote can be found under the tag Sutton & Barto and in the repository dloranc/reinforcement-learning-an-introduction.


In the last post I discussed the basic version of the multi-armed bandit with the $\epsilon$-greedy strategy. The presented algorithm has a small drawback: it requires recording every reward and recomputing the arithmetic mean of the rewards for a given action each time the best action is selected. Not only does the algorithm need memory for as many rewards as there are time steps, but every time the best action has to be chosen, a lot of unnecessary and quite time-consuming computation takes place. Imagine having to compute the arithmetic mean of a million rewards. How long would that take? This can be done better.
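The fix the post hints at is the standard incremental mean update from Sutton & Barto, which replaces storing every reward with a single constant-time update per step. A minimal sketch (my own names, not the repository code):

```python
def update_estimate(q, counts, action, reward):
    """Incremental mean: Q_{n+1} = Q_n + (R_n - Q_n) / n.

    Gives the same result as averaging all past rewards for the arm,
    but needs no reward history and constant time per update.
    q and counts are per-arm arrays (e.g. numpy arrays or lists).
    """
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]
```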

TorchCraft installation

· 4 min read

This post is about the Starcraft bot I am developing using machine learning. The project is being developed as part of the "Daj Się Poznać 2017" competition.


This whole week I was wondering which way to take my bot project. I wanted to use Deeplearning4j, but this library conflicts with BWMirror, which requires 32-bit Java. One alternative is to rewrite the entire project from BWMirror to JNIBWAPI. Another is to rewrite the project in C++, which doesn't appeal to me much because I'm not comfortable in that language. I'm not entirely at home in Java either, but writing in Java is easier for me. In the end, however, I decided to go with TorchCraft. Of course, I lose the opportunity to participate in SSCAIT, but I think it will be better to finally do something related to reinforcement learning using a ready-made environment. If I had to do the same in Java, it would take me a lot of time.

Attack of multi-armed bandits

· 7 min read

This post is part of my struggle with the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto. Other posts systematizing my knowledge and presenting the code I wrote can be found under the tag Sutton & Barto and in the repository dloranc/reinforcement-learning-an-introduction.


The multi-armed bandit problem (or k-armed bandit problem) is one of the reinforcement learning problems. I don't know if it's the simplest one, but it offers a relatively quick introduction to the subject and to its basic concepts.