3 posts tagged with "scikit-learn"

Posts related to the scikit-learn library.

OneHotEncoder

· 2 min read

This post continues an earlier post about LabelEncoder. This time the topic is a technique called one-hot encoding. Once categories have been converted into numbers, we can go a step further and spread them across several columns, one per category, each holding zeros and ones that indicate whether a row belongs to that category. This is useful when an algorithm could misread integer-coded categories because it assumes the numbers have an order.
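
To make the idea concrete, here is a minimal sketch using scikit-learn's OneHotEncoder. The sample values and variable names are made up for illustration, and the output is converted to a dense array only so that it is easy to read.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample data: a single categorical column with school classes.
classes = np.array([["1A"], ["1B"], ["2A"], ["1A"]])

encoder = OneHotEncoder()

# By default the encoder returns a SciPy sparse matrix;
# .toarray() turns it into a dense NumPy array for display.
one_hot = encoder.fit_transform(classes).toarray()

# One column per category, in sorted order of the category labels.
print(encoder.categories_[0])
print(one_hot)
```

Each row contains exactly one 1, in the column that corresponds to its category, so no ordering between categories is implied.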

LabelEncoder

· One min read

Sometimes, when processing a data set, we come across text variables that assign each observation to a category. For example, the students of a school attend different classes (1A, 1B, 1C, 2A, 2B, 2C, etc.). We want to convert such variables into numbers so that a chosen algorithm, e.g. random forest, can process them. The LabelEncoder class from the scikit-learn library can be used for this.
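
As a rough illustration of the conversion described above, the sketch below runs LabelEncoder on a small made-up list of school classes. The list and variable names are assumptions for the example, not data from the post.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample: the class each student attends.
school_classes = ["1A", "1B", "1C", "2A", "1B", "2A"]

encoder = LabelEncoder()

# fit_transform learns the distinct labels and maps each one to an integer.
codes = encoder.fit_transform(school_classes)

# classes_ holds the learned labels; a label's position is its integer code.
print(list(encoder.classes_))
print(codes.tolist())

# inverse_transform maps the integers back to the original labels.
restored = encoder.inverse_transform(codes)
print(list(restored))
```

The integer codes follow the sorted order of the labels, which is why some algorithms may treat them as ordered values.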

KFold and StratifiedKFold

· 2 min read

As I am still quite a beginner in the world of Python and its libraries, especially those related to machine learning, many things are unknown to me. One example is dividing a data set into parts for training and cross-validation. So far I have done it my own way, but why do that when there are ready-made tools for it?
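
For orientation, here is a small sketch of how the two classes named in the title could be used to split a data set. The toy arrays, the number of splits, and the random seed are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy data: 6 samples, 2 features, balanced binary labels.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 1, 1, 1])

# KFold splits by position only and ignores the labels.
kfold = KFold(n_splits=3, shuffle=True, random_state=0)
kfold_splits = list(kfold.split(X))

# StratifiedKFold also needs y, so it can keep the class
# proportions roughly equal in every fold.
stratified = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
stratified_splits = list(stratified.split(X, y))

for train_idx, val_idx in stratified_splits:
    # Each pair holds the row indices for training and validation.
    print("train:", train_idx, "validation:", val_idx, "labels:", y[val_idx])
```

With this toy data, every stratified validation fold contains one sample of each class, whereas plain KFold gives no such guarantee.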