
New version of the blog

· One min read

I finally got motivated and decided to refresh this blog. The first problem I ran into was that Sculpin and the entire PHP-based environment had stopped working. In addition, Travis CI hadn't built the project in a long time. I moved to Docusaurus, developed by Meta (Node.js), and switched to GitHub Actions to build the project. It took me half a day to migrate the old posts, but overall I'm happy with this tool. I know the name implies an emphasis on documentation, but you can build a blog with it pretty quickly. All I had to do was add a search engine and English language support.

What are my plans for the future? I think I will continue writing about machine learning and AI, but I intend to expand into other areas of computer science more broadly. I will probably also write in English.

That's it, I hope I will manage to write regularly :)

Docker on Windows - problems with volumes, symlinks and connectivity

· 2 min read

I started learning Docker a few days ago. So far I have used Vagrant for virtualization, but it never sat very well with me. But I don't want to talk about that. I haven't quite grasped Docker yet, but after a few attempts and one project, I've found it to be a pretty cool tool. However, I'm not going to write a tutorial here. If you're interested in one, check out docker-curriculum.com. It's good and does a solid job of explaining the basics.

While playing around with Docker, I ran into a few problems, and that's what this post is about.

Multi-armed bandit - Upper Confidence Bound

· 6 min read

This post is part of my struggle with the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto. Other posts systematizing my knowledge and presenting the code I wrote can be found under the tag Sutton & Barto and in the repository dloranc/reinforcement-learning-an-introduction.


In the multi-armed bandit problem we need exploration to find the best action, because the value of each action is uncertain. Our estimate of an action's value changes every time we select that action and observe the reward we receive. The more often a given action is selected, the more certain we are that its estimated value is accurate. So far, however, we have not taken this rather intuitive observation into account in our calculations. Exploratory actions were selected at random, without considering how close their estimated values were to the best one or how certain those estimates were.
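To make the idea concrete, here is a minimal sketch of UCB1-style action selection in Python (the function and parameter names are my own, not taken from the book's code); the square-root term acts as an uncertainty bonus that shrinks as an arm gets selected more often:

```python
import math


def ucb_select(q_values, counts, t, c=2.0):
    """Pick the arm with the highest upper confidence bound.

    q_values -- current value estimates, one per arm
    counts   -- how many times each arm has been selected so far
    t        -- current time step (1-based)
    c        -- exploration constant controlling the uncertainty bonus
    """
    # An arm that has never been tried has maximal uncertainty, so try it first.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    # Q_t(a) + c * sqrt(ln(t) / N_t(a))
    ucb = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(ucb)), key=ucb.__getitem__)
```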

Multi-armed bandit - optimistic initial values

· 3 min read

This post is part of my struggle with the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto. Other posts systematizing my knowledge and presenting the code I wrote can be found under the tag Sutton & Barto and in the repository dloranc/reinforcement-learning-an-introduction.


All the methods I have described so far depend on the initial estimates of the action values $Q_1(a)$. This is especially visible when we run the MAB algorithm with $\epsilon = 0$, i.e. without exploration, always selecting the action (arm) with the highest estimated value. In statistics, we call such methods biased. The bias disappears for methods with a step size $\alpha$ of $\frac{1}{n}$ once each action has been selected at least once. For a constant $\alpha$, the bias does not disappear; it only decreases over time (with subsequent iterations of the algorithm).
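As a rough illustration (my own sketch, not the book's code), optimistic initial values simply mean starting every estimate well above any plausible reward, so that even a purely greedy agent ($\epsilon = 0$) is forced to try each arm before the estimates settle:

```python
import random


class OptimisticBandit:
    """Greedy agent (epsilon = 0) with optimistic initial estimates."""

    def __init__(self, num_arms, initial_value=5.0, alpha=0.1):
        # Start far above the true action values (assumed to be around 0),
        # which drives early exploration even without epsilon.
        self.q = [initial_value] * num_arms
        self.alpha = alpha  # constant step size

    def select_action(self):
        best = max(self.q)
        return random.choice([a for a, q in enumerate(self.q) if q == best])

    def update(self, action, reward):
        # Constant-alpha incremental update: Q <- Q + alpha * (R - Q)
        self.q[action] += self.alpha * (reward - self.q[action])
```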

TorchCraft - analysis and changing the game state

· 6 min read

This post is about the Starcraft bot I am developing using machine learning. The project is being developed as part of the "Daj Się Poznać 2017" competition.


In the last post about the project I described what the maps look like and showed how to create a basic script that connects to Starcraft and fetches the game state in a loop, or rather in each subsequent logical frame of the game. I don't think I've explained yet what a logical game frame is. The point is that graphics rendering is independent of the calculations that change the game state. The rendering frame rate is not constant and depends on the speed of your computer, while the game state is recalculated at a fixed interval. If you have played Starcraft, you probably know that you can set the game speed in the options; changing the speed changes the time between logical frame calculations. This is a fundamental difference: if we had a constant 30 or 60 FPS, the issue would probably be handled differently.

Multi-armed bandit - non-stationary version

· 3 min read

This post is part of my struggle with the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto. Other posts systematizing my knowledge and presenting the code I wrote can be found under the tag Sutton & Barto and in the repository dloranc/reinforcement-learning-an-introduction.


Non-stationary problem

In this post, I will discuss a particular type of the multi-armed bandit (MAB) problem, in which the reward values for each one-armed bandit change over time. This is the so-called non-stationary version of MAB. Until now, the rewards were drawn from a normal distribution with a certain mean and variance (the mean for each arm was chosen randomly at the start, in the constructor).
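To make this concrete, here is a small sketch of such a testbed in Python (my own code; I assume the true action values drift as a random walk, which is one common way to model non-stationarity):

```python
import random


class NonStationaryBandit:
    """k-armed bandit whose true action values drift over time."""

    def __init__(self, num_arms=10, drift_std=0.01):
        # Initial true values, drawn once in the constructor.
        self.means = [random.gauss(0.0, 1.0) for _ in range(num_arms)]
        self.drift_std = drift_std

    def pull(self, arm):
        reward = random.gauss(self.means[arm], 1.0)
        # After every pull, all true values take a small random-walk step,
        # which makes the problem non-stationary.
        self.means = [m + random.gauss(0.0, self.drift_std) for m in self.means]
        return reward
```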

TorchCraft - basic script

· 4 min read

This post is about the Starcraft bot I am developing using machine learning. The project is being developed as part of the "Daj Się Poznać 2017" competition.


This week I was going to write about Torch itself and how to create neural networks in it, but I decided to focus on the very basics of TorchCraft and its interaction with Starcraft. Unfortunately, TorchCraft has poor documentation, and apart from the installation instructions almost everything has to be figured out from the examples directory.

TorchCraft - Lua basics

· 5 min read

This post is about the Starcraft bot I am developing using machine learning. The project is being developed as part of the "Daj Się Poznać 2017" competition.


Recently I didn't have enough time to finish installing TorchCraft. After writing the previous post, I remembered that NTFS has something called a "junction point", which is something like a symlink on Windows. If BWEnv.exe requires Starcraft to be at the path C:\Starcraft, just download Junction, move it to C:\Windows\System32 and run in the console: junction c:\Starcraft d:\Games\Starcraft. It worked for me. Well, okay, it didn't really work, because the game ran in full screen (the config says it should run in a window); something was happening there, but after the second attempt to run BWEnv.exe it didn't want to start anymore. A better option turned out to be to use BWEnv.dll and launch the game through Chaoslauncher.

Multi-armed bandit - simple optimization

· 4 min read

This post is part of my struggle with the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto. Other posts systematizing my knowledge and presenting the code I wrote can be found under the tag Sutton & Barto and in the repository dloranc/reinforcement-learning-an-introduction.


In the last post I discussed the basic version of the multi-armed bandit with the $\epsilon$-greedy strategy. The presented algorithm has a small drawback: it requires recording every reward and recomputing the arithmetic mean of the rewards for a given action each time the best action is selected. Not only does the algorithm need as much memory for rewards as there are time steps, but every time the best action has to be chosen, a lot of unnecessary and quite time-consuming computation takes place. Imagine having to compute the arithmetic mean of a million rewards. How long would that take? This can be done better.
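The fix this post builds up to is the incremental mean update, which keeps only the current estimate and a counter instead of the full reward history. A minimal sketch with my own variable names:

```python
def update_estimate(q, n, reward):
    """Incremental mean: equivalent to averaging all n rewards,
    but without storing them.

    q      -- current estimate (mean of the first n - 1 rewards)
    n      -- number of rewards seen so far, including this one
    reward -- the newly observed reward
    """
    # Q_{n+1} = Q_n + (R_n - Q_n) / n
    return q + (reward - q) / n


# Usage: the running estimate matches the arithmetic mean.
rewards = [1.0, 0.0, 2.0, 1.0]
q, n = 0.0, 0
for r in rewards:
    n += 1
    q = update_estimate(q, n, r)
print(q)  # 1.0 == sum(rewards) / len(rewards)
```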