

This is a great start! But what if we want to assign each worker to a team? If we just use Faker, we would have a huge number of teams with potentially only one worker per team. Let’s say we want to create four teams and evenly divide the workers between them. We can use NumPy’s random sampling for this task. NumPy’s random sampling module contains many methods for generating pseudo-random numbers. Here we’ll explore just a few of the available options.
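As a rough illustration of dividing workers evenly between four teams, here is a minimal sketch using NumPy's `default_rng` and `Generator.permutation`. The helper name `assign_teams`, the worker count of 10, and the seed are assumptions made for this example; the original article may have used a different sampling method.

```python
import numpy as np


def assign_teams(n_workers, n_teams=4, seed=None):
    """Return one team number (1..n_teams) per worker, in random order.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Cycle through the team numbers so team sizes differ by at most one
    teams = np.arange(n_workers) % n_teams + 1
    # Shuffle the labels so that team membership is random
    return rng.permutation(teams)


# Assumed example: 10 workers split across 4 teams
teams = assign_teams(10, n_teams=4, seed=42)
print(teams)
```

Building the balanced list of labels first and then shuffling it keeps the team sizes even, whereas drawing a team independently for each worker would only balance the teams on average.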
Let’s initialize a Faker generator and start making some data:

    from faker import Faker

    # initialize a generator
    fake = Faker()

    # create some fake data
    print(fake.name())
    print(fake.date_between(start_date='-30y', end_date='today'))
    print(fake.color_name())

    > Bruce Clark
    > LimeGreen

Faker also has a method to quickly generate a fake profile!

    print(fake.profile())

The same generator can fill a list of ten fake workers with a list comprehension over range(10), which we then display with print(fake_workers).

Dictionary of Workers with Name and Hire Date
Let’s get started making our fake widget factory dataset!

Faker

We can use the amazing package Faker to get started. Faker is self-described as “a Python package that generates fake data for you.” Faker is available on PyPI and is easily installable with pip install faker.

The first step in data analysis is finding data to analyze. All too often, this crucial first step is next to impossible. Data that meets our needs may be proprietary, expensive, hard to collect, or simply may not exist. Finding a suitable dataset is the most common problem I face when wanting to try out a new library or technique, or when beginning to write a new article. Here we solve this problem, once and for all, by creating our own dataset!

The Widget Factory

Here we will create a dataset for an imaginary widget factory. Our widget factory has employees whose only job is to make widgets. The factory monitors widget-making productivity by counting the number of widgets made and how fast the workers can make them. Making a widget consists of three steps, all of which are timed by the widget monitoring system.
