Generating Fake data with Faker

Tags: Python, Faker, Software Development, Testing, App Dev, Projects, Research
Generating fake data is a common task for developers and testers who need to populate their databases, applications, or documents with realistic but not real data. Fake data can help to test the functionality, performance, and security of the software without exposing sensitive information or violating privacy regulations.
 
However, not all fake data are created equal. Some data are more complex than others and require more than just random strings or numbers to be convincing. For example, if you need to generate fake names, addresses, phone numbers, or emails, you need to make sure that they follow the format and conventions of the country or region you are targeting. Otherwise, you might end up with data that looks inaccurate or invalid.
Generating fake data manually can be tedious and error-prone. That is where Faker comes in handy. Faker is a Python package that creates realistic and diverse fake data quickly and easily. It supports multiple languages and locales and ships with a variety of providers that produce different types of data, such as person, address, datetime, and internet. You can read more in its Python 3 documentation and on its PyPI page.

General Overview

Faker is easy to install and use. Install it with pip:
pip install Faker
The next step would be to import it and create a Faker object:
from faker import Faker
fake = Faker()
You can then call methods on the Faker object to generate data by type. For example:
fake.name()     # 'Lucy Cechtelar'
fake.address()  # '426 Jordy Lodge\nCartwrightshire, SC 88120-6700'
fake.email()    # a randomly generated email address
Each call to a method yields a different (random) result. You can also specify the locale to use by passing it as an argument to the Faker constructor. For example:
fake = Faker('fr_FR') # create a French faker
fake.name() # 'Thibault Martin'
fake.address() # "25 Rue de l'Abbé Groult\n75015 Paris"
fake.email() # a randomly generated email address
Faker supports many locales and languages, so you can generate data that suit your needs.
Faker also has some advanced features that help you customize and control the generation of fake data. For example, you can call Faker.seed() to seed the shared random generator (or seed_instance() on a single Faker object) so that you get consistent results across multiple runs or tests. You can also create your own providers or extend the existing ones to add more functionality or data types.
Faker is a powerful and flexible tool that can save you time and effort when you need to generate fake data for your projects. Whether you need to bootstrap your database, create good-looking XML documents, fill-in your persistence to stress test it, or anonymize data taken from a production service, Faker is for you.

How to Generate Fake Data with Faker Python

In this section, we will use Faker to generate fake data for a simple inventory management system. The system consists of four main entities: suppliers, products, orders, and shipments. Each entity has its own attributes and relationships with other entities. For example, a supplier can supply multiple products, a product can belong to multiple orders, an order can contain multiple shipments, and a shipment can have multiple inventory units.
To generate fake data for this system, we will use the following steps:
  • Install Faker and its dependencies
  • Import Faker and other modules
  • Define functions to generate fake data for each entity
  • Save the fake data to JSON files

Install Faker and its dependencies

As discussed above, you can install Faker with pip:
pip install Faker
Additionally, we will use another package called faker-vehicle which provides vehicle-related data for Faker. To install it, you can use pip as well:
pip install faker-vehicle

Import Faker and other modules

After installing the packages, we need to import them in our Python script. We also need to import some other modules, such as random and json, which we will use later.
# Import the Faker package, the random module, and the json module
from faker import Faker
import random
import json

# Import the VehicleProvider module from faker_vehicle package
from faker_vehicle import VehicleProvider

# Add the VehicleProvider to the Faker object to access vehicle-related data
fake = Faker()
fake.add_provider(VehicleProvider)
We create a Faker object with the default locale (en_US), and then add the VehicleProvider to it. This way, we can access vehicle-related data from the fake object.

Define functions to generate fake data for each entity

Next, we need to define functions to generate fake data for each entity in our system. Each function will return a dictionary that represents an instance of the entity. The dictionary will contain the attributes and values of the entity.
For example, here is the function to generate a fake supplier:
# Define a function to generate a fake supplier dictionary
def generate_supplier():
    supplier = {
        # Generate a random UUID for the supplier ID
        "_id": fake.uuid4(),
        # Generate a random company name for the supplier name
        "name": fake.company(),
        # Generate a random sentence for the supplier description
        "description": fake.sentence(),
        # Generate a random name for the contact person
        "contact_name": fake.name(),
        # Generate a random email for the contact email
        "contact_email": fake.email(),
        # Generate a random phone number for the contact phone
        "contact_phone": fake.phone_number(),
        # Generate a random address for the supplier address
        "address": fake.address(),
        # Generate a random ISO 8601 date and time for the last updated field
        "last_updated": fake.iso8601(),
    }
    # Return the supplier dictionary
    return supplier
We use various methods of the fake object to generate different types of data. For example, we use fake.uuid4() to generate a random UUID4, fake.company() to generate a random company name, fake.sentence() to generate a random sentence, etc.
Similarly, we can define functions to generate fake data for products, orders, shipments, and inventory units. Here are some examples:
# Define a function to generate a fake product dictionary
def generate_product():
    product = {
        # Generate a random UUID for the product ID
        "_id": fake.uuid4(),
        # Generate a random vehicle make and model for the product name
        "name": fake.vehicle_make_model(),
        # Generate a random vehicle year, make, and model for the product description
        "description": fake.vehicle_year_make_model(),
        # Generate a random ISO 8601 date and time for the last updated field
        "last_updated": fake.iso8601(),
        # Generate a random email for the created by field
        "created_by": fake.email(),
        # Initialize an empty list for the related subproducts field
        "related_subproducts": [],
        # Use a placeholder URL for the image field
        "image": "http://myimage.com"
    }
    # Return the product dictionary
    return product

# Define a function to generate a fake order dictionary, given a list of shipment IDs as a parameter
def generate_order(shipments):
    order = {
        # Generate a random UUID for the order ID
        "_id": fake.uuid4(),
        # Generate a random catch phrase for the order name
        "name": fake.catch_phrase(),
        # Generate a random sentence for the order description
        "description": fake.sentence(),
        # Generate a random ISO 8601 date and time for the last updated field
        "last_updated": fake.iso8601(),
        # Choose a random status from a list of options for the status field
        "status": random.choice(["pending", "processing", "completed", "cancelled"]),
        # Use the shipments parameter for the related shipments field
        "related_shipments": shipments,
        # Generate a random integer between 1 and 10 for the priority field
        "priority": random.randint(1, 10),
        # Generate a random email for the created by field
        "created_by": fake.email()
    }
    # Return the order dictionary
    return order

# Define a function to generate a fake shipment dictionary, given a supplier ID, a product ID, and an order ID as parameters
def generate_shipment(supplier_id, product_id, order_id):
    shipment = {
        # Generate a random UUID for the shipment ID
        "_id": fake.uuid4(),
        # Generate a random machine make and model for the shipment name
        "name": fake.machine_make_model(),
        # Generate a random machine year, make, model, and category for the shipment description
        "description": fake.machine_year_make_model_cat(),
        # Generate a random ISO 8601 date and time for the last updated field
        "last_updated": fake.iso8601(),
        # Choose a random status from a list of options for the status field
        "status": random.choice(["pending", "shipped", "delivered", "returned"]),
         ...

Generate and Save the fake data to JSON files

After defining the functions to generate fake data, we can use them to create some sample data and save them to JSON files. We will use the json module to dump the data to files with proper indentation.
For example, here is how we can generate and save some suppliers:
# Define a function to save data to JSON files
# (it must be defined before it is called below)
def save_to_json(data, filename):
    with open(filename, "w") as jsonfile:
        json.dump(data, jsonfile, indent=4)

# Generate suppliers
num_suppliers = 50
suppliers = [generate_supplier() for _ in range(num_suppliers)]
save_to_json(suppliers, "mock_suppliers.json")
We use a list comprehension to create a list of suppliers by calling the generate_supplier() function 50 times. Then we use the save_to_json() function to save the list to a file called "mock_suppliers.json".
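One way to sanity-check a saved file is to load it back and compare it with the data in memory. The sketch below reuses save_to_json() from above. The load_from_json() helper and the two hand-written sample records are assumptions made for this illustration, so that it runs without Faker installed.

```python
import json
import os
import tempfile


def save_to_json(data, filename):
    # Same helper as in the article: dump with 4-space indentation.
    with open(filename, "w") as jsonfile:
        json.dump(data, jsonfile, indent=4)


def load_from_json(filename):
    # Hypothetical counterpart that reads the file back into Python objects.
    with open(filename) as jsonfile:
        return json.load(jsonfile)


# Hand-written stand-ins for generated suppliers (illustrative values only).
sample = [
    {"_id": "s-1", "name": "Acme"},
    {"_id": "s-2", "name": "Globex"},
]

# Write to a temporary directory so the check leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mock_suppliers.json")
    save_to_json(sample, path)
    loaded = load_from_json(path)

# The round trip should preserve the data, and the IDs should be unique.
print(loaded == sample)
print(len({item["_id"] for item in loaded}) == len(loaded))
```

The same check can be pointed at the real generated lists, since they contain only strings, numbers, and lists, all of which JSON can represent.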
We can do the same for products, orders, shipments, and inventory units. However, we need to consider the relationships between them. For example, when we generate shipments, we need to assign them to existing suppliers, products, and orders. We can do this by choosing randomly from the lists of suppliers, products, and orders that we have already generated.
Here is an example of how we can generate and save some shipments:
# Generate shipments and inventory units
shipments = []
inventory_units = []
num_shipments = 2000

for i in range(num_shipments):
    # Choose a random supplier ID from the suppliers list
    supplier_id = random.choice(suppliers)["_id"]
    # Choose a random product ID from the products list
    product_id = random.choice(products)["_id"]
    # Choose a random order ID from the orders list
    order_id = random.choice(orders)["_id"]
    # Generate a shipment with these IDs as parameters
    product_shipment = generate_shipment(supplier_id, product_id,
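Since generate_order() takes a list of shipment IDs while generate_shipment() takes an order ID, one side has to exist before the other. Here is a minimal sketch of one possible ordering: create the orders first with an empty shipment list, then attach each new shipment to its order. The make_order() and make_shipment() functions are simplified stand-ins, not the article's generators, and the supplier_id, product_id, and order_id field names are assumptions.

```python
import random
import uuid


def make_order():
    # Simplified stand-in: only the fields needed to show the linking.
    return {"_id": str(uuid.uuid4()), "related_shipments": []}


def make_shipment(supplier_id, product_id, order_id):
    # Field names below are assumed for illustration.
    return {
        "_id": str(uuid.uuid4()),
        "supplier_id": supplier_id,
        "product_id": product_id,
        "order_id": order_id,
    }


# Parent records are created first so their IDs exist when shipments need them.
suppliers = [{"_id": str(uuid.uuid4())} for _ in range(3)]
products = [{"_id": str(uuid.uuid4())} for _ in range(5)]
orders = [make_order() for _ in range(4)]

shipments = []
for _ in range(10):
    order = random.choice(orders)
    shipment = make_shipment(
        random.choice(suppliers)["_id"],
        random.choice(products)["_id"],
        order["_id"],
    )
    shipments.append(shipment)
    # Record the link on the order side too, so both directions agree.
    order["related_shipments"].append(shipment["_id"])

linked = sum(len(order["related_shipments"]) for order in orders)
print(linked == len(shipments))
```

Keeping both sides of the relationship in sync at creation time avoids a separate pass over the data before saving it to JSON.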

About Authors

Sai Manasa Ivaturi

I'm a Software Development Engineer based in Atlanta, Georgia with 5+ years of experience in the software industry. My focus area has been Backend development and full-stack development.
Master's Degree in Computer Science, Indiana University, Bloomington (Jan 22 - May 23)
Bachelor's Degree in Computer Science, Pragati Engineering College, India (Aug 14 - April 18)

Srinivas vaddi



Hi! I’m a recent master’s graduate from Indiana University Bloomington (IUB) 🎓 and a Software Development Engineer with 4+ years of experience. Looking for #jobs!
My areas of expertise are Software Development, DevOps, Testing, Integration, Data Engineering and Data Analytics. Mostly worked on Python, Django/Flask, Apache Airflow, Apache Spark, AWS, and DevOps. I have a versatile background & a ‘can do’ attitude 🤓.
I like blogging and sharing knowledge. I’ve built a server at home from scratch! I used it to learn various technologies and to contribute to the open-source. I love tech, philosophy, literature, and history. My favorite books 📚 of all time are ‘The Alchemist’ and ‘Chanakya Neeti’ 🙌.
Master's Degree in Computer Science, Indiana University, Bloomington (Aug 23, 2021 to Dec 17, 2022)
Bachelor's Degree in Computer Science, Gitam University (Deemed to be) (Jun 1, 2014 to Apr 1, 2018)