Generating fake data is a common task for developers and testers who need to populate their databases, applications, or documents with realistic but not real data. Fake data can help to test the functionality, performance, and security of the software without exposing sensitive information or violating privacy regulations.
However, not all fake data are created equal. Some data are more complex than others and require more than just random strings or numbers to be convincing. For example, if you need to generate fake names, addresses, phone numbers, or emails, you need to make sure that they follow the format and conventions of the country or region you are targeting. Otherwise, you might end up with data that looks inaccurate or invalid.
Generating fake data manually can be tedious and error-prone. Fortunately, this is where Faker comes in handy. Faker is a Python package that helps you create realistic and diverse fake data easily and quickly. It supports multiple languages and locales and offers a variety of providers that produce different types of data, such as person, address, datetime, and internet data. More details are available in its Python 3 implementation and on its PyPI package page.
General Overview
Faker is easy to install and use. Install it with pip:
pip install Faker
The next step is to import it and create a Faker object:
from faker import Faker
fake = Faker()
You can then call the Faker object's methods to generate data by type. For example:
fake.name()     # 'Lucy Cechtelar'
fake.address()  # '426 Jordy Lodge\nCartwrightshire, SC 88120-6700'
fake.email()    # a random email address, e.g. 'user@example.org'
Each call to one of these methods yields a different (random) result. It is also possible to specify the locale to use by passing it as an argument to the Faker constructor. For example:
fake = Faker('fr_FR')  # create a French faker
fake.name()     # 'Thibault Martin'
fake.address()  # "25 Rue de l'Abbé Groult\n75015 Paris"
fake.email()    # a random email address, e.g. 'user@example.org'
Faker supports many locales and languages, so you can generate data that suit your needs.
Faker also has some advanced features that help you customize and control the generation of fake data. For example, you can use the Faker.seed() class method to seed the random generator, so that you get consistent results across multiple runs or tests. You can also create your own providers or extend the existing ones to add more functionality or data types.
Faker is a powerful and flexible tool that can save you time and effort when you need to generate fake data for your projects. Whether you need to bootstrap your database, create good-looking XML documents, fill in your persistence layer to stress test it, or anonymize data taken from a production service, Faker is for you.
How to Generate Fake Data with Faker Python
In this section, we use Faker to generate fake data for a simple inventory management system. The system consists of four main entities: suppliers, products, orders, and shipments. Each entity has its own attributes and relationships with other entities. For example, a supplier can supply multiple products, a product can belong to multiple orders, an order can contain multiple shipments, and a shipment can have multiple inventory units.
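To make these relationships concrete, here is a rough sketch of the data model as typed dictionaries. Only a few fields are shown, and the `supplier_id`, `product_id`, and `order_id` reference fields on the shipment are assumptions made for illustration, as are all the sample values.

```python
from typing import List, TypedDict


class Supplier(TypedDict):
    _id: str
    name: str


class Product(TypedDict):
    _id: str
    name: str


class Order(TypedDict):
    _id: str
    name: str
    related_shipments: List[str]  # IDs of the shipments belonging to this order


class Shipment(TypedDict):
    _id: str
    supplier_id: str  # assumed reference field
    product_id: str   # assumed reference field
    order_id: str     # assumed reference field


# Hand-written sample records that show how the IDs tie the entities together.
supplier: Supplier = {"_id": "sup-1", "name": "Acme Parts"}
product: Product = {"_id": "prod-1", "name": "Forklift"}
shipment: Shipment = {
    "_id": "ship-1",
    "supplier_id": supplier["_id"],
    "product_id": product["_id"],
    "order_id": "ord-1",
}
order: Order = {
    "_id": "ord-1",
    "name": "Spring restock",
    "related_shipments": [shipment["_id"]],
}

print(order)
```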
To generate fake data for this system, we will use the following steps:
- Install Faker and its dependencies
- Import Faker and other modules
- Define functions to generate fake data for each entity
- Save the fake data to JSON files
Install Faker and its dependencies
As shown above, you can install Faker with pip:
pip install Faker
Additionally, we will use another package called faker-vehicle which provides vehicle-related data for Faker. To install it, you can use pip as well:
pip install faker-vehicle
Import Faker and other modules
After installing the packages, we need to import them in our Python script. We also need to import some other modules, such as random and json, which we will use later.
# Import the Faker package, the random module, and the json module
from faker import Faker
import random
import json
# Import the VehicleProvider module from faker_vehicle package
from faker_vehicle import VehicleProvider
# Add the VehicleProvider to the Faker object to access vehicle-related data
fake = Faker()
fake.add_provider(VehicleProvider)
We create a Faker object with the default locale (en_US), and then add the VehicleProvider to it. This way, we can access vehicle-related data from the fake object.
Define functions to generate fake data for each entity
Next, we need to define functions to generate fake data for each entity in our system. Each function will return a dictionary that represents an instance of the entity. The dictionary will contain the attributes and values of the entity.
For example, here is the function to generate a fake supplier:
# Define a function to generate a fake supplier dictionary
def generate_supplier():
    supplier = {
        # Generate a random UUID for the supplier ID
        "_id": fake.uuid4(),
        # Generate a random company name for the supplier name
        "name": fake.company(),
        # Generate a random sentence for the supplier description
        "description": fake.sentence(),
        # Generate a random name for the contact person
        "contact_name": fake.name(),
        # Generate a random email for the contact email
        "contact_email": fake.email(),
        # Generate a random phone number for the contact phone
        "contact_phone": fake.phone_number(),
        # Generate a random address for the supplier address
        "address": fake.address(),
        # Generate a random ISO 8601 date and time for the last updated field
        "last_updated": fake.iso8601(),
    }
    # Return the supplier dictionary
    return supplier
We use various methods of the fake object to generate different types of data. For example, we use fake.uuid4() to generate a random UUID4, fake.company() to generate a random company name, fake.sentence() to generate a random sentence, etc.
Similarly, we can define functions to generate fake data for products, orders, shipments, and inventory units. Here are some examples:
# Define a function to generate a fake product dictionary
def generate_product():
    product = {
        # Generate a random UUID for the product ID
        "_id": fake.uuid4(),
        # Generate a random vehicle make and model for the product name
        "name": fake.vehicle_make_model(),
        # Generate a random vehicle year, make, and model for the product description
        "description": fake.vehicle_year_make_model(),
        # Generate a random ISO 8601 date and time for the last updated field
        "last_updated": fake.iso8601(),
        # Generate a random email for the created by field
        "created_by": fake.email(),
        # Initialize an empty list for the related subproducts field
        "related_subproducts": [],
        # Use a placeholder URL for the image field
        "image": "http://myimage.com"
    }
    # Return the product dictionary
    return product

# Define a function to generate a fake order dictionary, given a list of shipment IDs as a parameter
def generate_order(shipments):
    order = {
        # Generate a random UUID for the order ID
        "_id": fake.uuid4(),
        # Generate a random catch phrase for the order name
        "name": fake.catch_phrase(),
        # Generate a random sentence for the order description
        "description": fake.sentence(),
        # Generate a random ISO 8601 date and time for the last updated field
        "last_updated": fake.iso8601(),
        # Choose a random status from a list of options for the status field
        "status": random.choice(["pending", "processing", "completed", "cancelled"]),
        # Use the shipments parameter for the related shipments field
        "related_shipments": shipments,
        # Generate a random integer between 1 and 10 for the priority field
        "priority": random.randint(1, 10),
        # Generate a random email for the created by field
        "created_by": fake.email()
    }
    # Return the order dictionary
    return order

# Define a function to generate a fake shipment dictionary, given a supplier ID, a product ID, and an order ID as parameters
def generate_shipment(supplier_id, product_id, order_id):
    shipment = {
        # Generate a random UUID for the shipment ID
        "_id": fake.uuid4(),
        # Generate a random machine make and model for the shipment name
        "name": fake.machine_make_model(),
        # Generate a random machine year, make, model, and category for the shipment description
        "description": fake.machine_year_make_model_cat(),
        # Generate a random ISO 8601 date and time for the last updated field
        "last_updated": fake.iso8601(),
        # Choose a random status from a list of options for the status field
        "status": random.choice(["pending", "shipped", "delivered", "returned"]),
        ...
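Because the generator functions return plain dictionaries, a generated record can be sanity-checked before it is saved. The following is a minimal sketch for supplier records; the `check_supplier()` helper and the hand-written sample values are hypothetical and only illustrate the idea.

```python
import uuid
from datetime import datetime

# Keys that the supplier dictionary in this article contains.
REQUIRED_SUPPLIER_KEYS = {
    "_id", "name", "description", "contact_name",
    "contact_email", "contact_phone", "address", "last_updated",
}


def check_supplier(record: dict) -> list:
    """Return a list of problems found in a supplier record (empty if none)."""
    problems = []
    missing = REQUIRED_SUPPLIER_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    try:
        uuid.UUID(str(record.get("_id")))
    except ValueError:
        problems.append("_id is not a valid UUID")
    try:
        # fromisoformat() accepts timestamps such as '2021-03-04T05:06:07'.
        datetime.fromisoformat(str(record.get("last_updated")))
    except ValueError:
        problems.append("last_updated is not an ISO 8601 timestamp")
    return problems


# Hand-written sample record (illustrative values, not Faker output).
sample = {
    "_id": "123e4567-e89b-12d3-a456-426614174000",
    "name": "Acme Parts",
    "description": "Sample supplier.",
    "contact_name": "Jane Doe",
    "contact_email": "jane@example.org",
    "contact_phone": "555-0100",
    "address": "1 Example Street",
    "last_updated": "2021-03-04T05:06:07",
}

print(check_supplier(sample))
print(check_supplier({"_id": "not-a-uuid"}))
```

In a real run, the same helper could be applied to each dictionary returned by generate_supplier().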
Generate and Save the fake data to JSON files
After defining the functions to generate fake data, we can use them to create some sample data and save them to JSON files. We will use the json module to dump the data to files with proper indentation.
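One detail worth sketching: the json module only serializes basic types, which fits the generators above because they store timestamps as ISO 8601 strings. If a record did contain a datetime object, passing default=str to json.dump would be one possible workaround. The record and file name below are made up for this example.

```python
import json
import os
import tempfile
from datetime import datetime

# A made-up record holding a datetime object instead of a string.
record = {"_id": "demo-1", "last_updated": datetime(2024, 1, 2, 3, 4, 5)}

# Without a fallback, json cannot serialize the datetime value.
try:
    json.dumps(record)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True

# default=str converts any unsupported value with str() before writing.
path = os.path.join(tempfile.mkdtemp(), "demo_record.json")
with open(path, "w") as jsonfile:
    json.dump([record], jsonfile, indent=4, default=str)

# Reading the file back shows the timestamp stored as a plain string.
with open(path) as jsonfile:
    loaded = json.load(jsonfile)

print(plain_dump_failed)
print(loaded)
```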
For example, here is how we can generate and save some suppliers:
# Define a function to save data to JSON files
def save_to_json(data, filename):
    with open(filename, "w") as jsonfile:
        json.dump(data, jsonfile, indent=4)

# Generate suppliers
num_suppliers = 50
suppliers = [generate_supplier() for _ in range(num_suppliers)]
save_to_json(suppliers, "mock_suppliers.json")
We use a list comprehension to create a list of suppliers by calling the generate_supplier() function 50 times. Then we use the save_to_json() function to save the list to a file called “mock_suppliers.json”.
We can do the same for products, orders, shipments, and inventory units. However, we need to consider the relationships between them. For example, when we generate shipments, we need to assign them to existing suppliers, products, and orders. We can do this by choosing randomly from the lists of suppliers, products, and orders that we have already generated.
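The random assignment described above can be sketched with plain dictionaries. The tiny hand-written lists stand in for the generated records, and the `supplier_id`, `product_id`, and `order_id` keys on each shipment are assumptions for illustration. Once shipments are assigned, each order's related_shipments list can be back-filled so that both sides of the relationship agree.

```python
import random
from collections import defaultdict

random.seed(7)  # fixed seed so the sketch is repeatable

# Stand-ins for the lists produced by the generator functions.
suppliers = [{"_id": f"supplier-{i}"} for i in range(2)]
products = [{"_id": f"product-{i}"} for i in range(4)]
orders = [{"_id": f"order-{i}", "related_shipments": []} for i in range(3)]

# Assign every shipment to a randomly chosen supplier, product, and order.
shipments = []
for i in range(10):
    shipments.append({
        "_id": f"shipment-{i}",
        "supplier_id": random.choice(suppliers)["_id"],  # assumed key
        "product_id": random.choice(products)["_id"],    # assumed key
        "order_id": random.choice(orders)["_id"],        # assumed key
    })

# Group shipment IDs by the order they were assigned to.
shipments_by_order = defaultdict(list)
for shipment in shipments:
    shipments_by_order[shipment["order_id"]].append(shipment["_id"])

# Back-fill each order with the IDs of its shipments.
for order in orders:
    order["related_shipments"] = shipments_by_order.get(order["_id"], [])

for order in orders:
    print(order["_id"], len(order["related_shipments"]))
```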
Here is an example of how we can generate and save some shipments:
# Generate shipments and inventory units
shipments = []
inventory_units = []
num_shipments = 2000
for i in range(num_shipments):
    # Choose a random supplier ID from the suppliers list
    supplier_id = random.choice(suppliers)["_id"]
    # Choose a random product ID from the products list
    product_id = random.choice(products)["_id"]
    # Choose a random order ID from the orders list
    order_id = random.choice(orders)["_id"]
    # Generate a shipment with these IDs as parameters
    product_shipment = generate_shipment(supplier_id, product_id,
Full code example can be found here → https://github.com/vaddisrinivas/blog-repos/tree/main/faker
About Authors
Sai Manasa Ivaturi
I'm a Software Development Engineer based in Atlanta, Georgia with 5+ years of experience in the software industry. My focus area has been Backend development and full-stack development.
Masters Degree in Computer Science Indiana University, Bloomington
Jan 22 - May 23
Bachelors Degree in Computer Science Pragati Engineering College, India
Aug 14 - April 18
Srinivas vaddi
Hi! I’m a recent master’s graduate from Indiana University Bloomington (IUB) 🎓 and a Software Development Engineer with 4+ years of experience. Looking for #jobs!
My areas of expertise are Software Development, DevOps, Testing, Integration, Data Engineering and Data Analytics. Mostly worked on Python, Django/Flask, Apache Airflow, Apache Spark, AWS, and DevOps. I have a versatile background & a ‘can do’ attitude 🤓.
I like blogging and sharing knowledge. I’ve built a server at home from scratch! I used it to learn various technologies and to contribute to the open-source. I love tech, philosophy, literature, and history. My favorite books 📚 of all time are ‘The Alchemist’ and ‘Chanakya Neeti’ 🙌.
Masters Degree in Computer Science Indiana University, Bloomington Aug 23, 2021 → Dec 17, 2022
Bachelors Degree in Computer Science Gitam University (Deemed to be) Jun 1, 2014 → Apr 1, 2018