Write/Convert Nested JSON data to CSV for specific/subset keys(headers)

Currently, I am working on a Machine Learning project with my colleagues where we don’t have much data to train the model so we scrapped data from multiple places and kept in JSON format because sometimes we get data for some fields or keys and sometimes we don’t.

But when we use it for training we need to put them in table structure or CSV format. By doing it we also will know about which features we have data and for which we don’t. After that, we can analyze the data and then we can impute where we can and drop features that we can’t impute.

We can easily write JSON data to CSV file if JSON is flat structured and we know all the keys. The code is simple for this. Load each JSON so that it will become a dictionary object then we can put it in the list after that using Dictwriter in CSV module we can write it to CSV file but we have 3 problems here
1. Nested JSON structure
2. All JSONs don’t have the same structure. Some keys commonly available in each JSON and some keys present in some JSONs and absent in some JSONs.
3. We don’t know the unique keys list.

After some googling found answer to the above problems here. Using that wrote code according to my requirement. For example purpose, I am going to create the JSON dataset which is somewhat similar to my problem for that I used the JSON Generator tool.

JSON looks like this

[
    {
      "_id": "5e7f55e2f065ef934a048d32",
      "index": 0,
      "guid": "88647c07-bf2a-42c3-8e23-cec1ed731f2f",
      "isActive": false,
      "balance": "$1,254.74",
      "picture": "http://placehold.it/32x32",
      "age": 40,
      "eyeColor": "green",
      "name": "Laverne Robles",
      "gender": "female",
      "company": "COMVEY",
      "email": "lavernerobles@comvey.com",
      "phone": "+1 (898) 413-3041",
      "address": "783 Livingston Street, Hillsboro, District Of Columbia, 1897",
      "about": "Anim ut sunt consectetur irure irure sit consectetur et laboris minim. Et tempor excepteur aute consequat in sunt aliquip in. Proident non et reprehenderit consequat ea adipisicing. Nisi excepteur Lorem irure pariatur fugiat occaecat labore officia. Quis consectetur commodo velit aliquip cupidatat et esse.\r\n",
      "registered": "2019-04-13T02:41:28 -06:-30",
      "latitude": 85.172936,
      "longitude": 40.103047,
      "tags": [
        "anim",
        "cupidatat",
        "excepteur",
        "ea",
        "do",
        "ipsum",
        "consectetur"
      ],
      "friends": [
        {
          "1": "Dawn Lynch"
        },
        {
          "2": "Rhodes Pacheco"
        }
      ],
      "greeting": "Hello, Laverne Robles! You have 4 unread messages.",
      "favoriteFruit": "banana"
    },
.....
]

Code to convert this JSON Data to CSV

This code is pretty much self-explanatory. Basically what we are doing here is loading the JSON then iterating it one by one then converting each object(dictionary) from nested structure to flat structure then extracting unique keys after that writing it to CSV

Some times we want to write only specific keys data to CSV if that is the case then we just need to create a list with keys which we want to and use that list as CSV DictWriter fieldnames and in addition to it, we need to use extrasaction=”ignore” because if the dictionary has keys more than we mentioned in fieldnames it will give a ValueError.

You will find code for this case below

Here we tried to extract data for keys age, balance, company, eyeColor, favoriteFruit, friends_1, friends_2, friends_3, gender, greeting, isActive, latitude, longitude, registered, tags

Peace. Happy Coding.

Related Post

Leave a Reply

Your email address will not be published. Required fields are marked *