
Converting JSON Data Into Flat Structure Using Alteryx

Blog | May 13, 2018 | By Vijai Narasimha

JSON, or JavaScript Object Notation, is an open-standard, text-based storage format. It is a lightweight format that is easy for machines to parse, built for data interchange. It is derived from (and is a subset of) the JavaScript programming language, and it represents simple data structures as name-value pairs. In many situations, JSON was, and still is, used as an alternative to XML.

Architects and developers prefer JSON for many reasons: some for its simple syntax, some for performance, some for how easily server-side languages consume it, and some for its compact, less verbose statements.

JSON files come in different formats: from very simple layouts, where fields and field members are arranged one per line or all on a single line, to huge files that carry metadata and store parent and child fields (nested JSON objects), or sometimes huge arrays.

A couple of examples are shown here.
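The original example screenshots are not reproduced here. As a hypothetical stand-in, a nested record-per-object array with the six fields used throughout this post (id, first_name, last_name, email, gender, role; sample values invented) might look like this:

```json
[
  {
    "id": 1,
    "first_name": "Ada",
    "last_name": "Lovelace",
    "email": "ada@example.com",
    "gender": "Female",
    "role": "Analyst"
  },
  {
    "id": 2,
    "first_name": "Alan",
    "last_name": "Turing",
    "email": "alan@example.com",
    "gender": "Male",
    "role": "Developer"
  }
]
```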

But sometimes, to analyze the data or to perform some testing or manipulation, we need it in a tidy, two-dimensional flat structure, with column headers on top followed by row-level values, like a standard Excel or CSV file.

Assuming I am reading the data into Tableau for visualization, Tableau will not be able to read every JSON format. There is also no efficient or easy way to import JSON data directly into a text file or Excel.


In this blog, we will take one of the JSON formats and convert it into a flat structure using Alteryx. The blog consists of two parts: in Part A we will use the data in .JSON format, and in Part B a .TXT format.

The file structure that we want to use looks like this:

Part A

Using the file in JSON format. The step-by-step approach is discussed below.

1) We will start with the Input tool. The data file is brought in as a standard JSON file with the default settings. We can see that Alteryx puts the field names one below another in a single column called JSON_Name, followed by a single column of values called JSON_ValueString.

This data set consists of 15 rows of information, each with 6 fields. Alteryx reads from top to bottom, so 90 records are displayed.
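The shape of the Input tool's output can be sketched in Python. This is not Alteryx's actual parser, only an illustration of the name/value rows it produces; the sample records are hypothetical:

```python
import json

# Hypothetical sample matching the structure described in this post:
# a JSON array of records with six fields each.
records = json.loads("""
[
  {"id": 1, "first_name": "Ada", "last_name": "Lovelace",
   "email": "ada@example.com", "gender": "Female", "role": "Analyst"},
  {"id": 2, "first_name": "Alan", "last_name": "Turing",
   "email": "alan@example.com", "gender": "Male", "role": "Developer"}
]
""")

# Emulate the Input tool: one (JSON_Name, JSON_ValueString) pair per
# field, with the record index prefixed to the name, read top to bottom.
pairs = [(f"{i}.{key}", str(value))
         for i, record in enumerate(records)
         for key, value in record.items()]

print(pairs[0])    # ('0.id', '1')
print(len(pairs))  # 2 records x 6 fields = 12 rows
```

With the blog's 15 records, the same logic yields the 90 rows mentioned above.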

2) The next step is to isolate the field names and remove the unwanted 0, 1, 2… on the left-hand side of the JSON_Name field. We will use the Text To Columns tool to split the field using "." as the delimiter.

Now we only need three columns: JSON_Name1, JSON_Name2, and JSON_ValueString. We can use a Select tool to remove the unwanted columns.
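The Text To Columns step amounts to splitting each name on the first ".". A rough Python equivalent, with hypothetical sample rows:

```python
# Mirror the Text To Columns tool with "." as the delimiter:
# "0.id" -> record index "0" (JSON_Name1) and field name "id" (JSON_Name2).
rows = [("0.id", "1"), ("0.first_name", "Ada"), ("1.id", "2")]

split_rows = []
for name, value in rows:
    index, field = name.split(".", 1)
    split_rows.append((index, field, value))  # JSON_Name1, JSON_Name2, JSON_ValueString

print(split_rows[1])  # ('0', 'first_name', 'Ada')
```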

3) The next step is to pivot the values using the Cross Tab tool. Here we identify the grouping field (JSON_Name1), the field headers (JSON_Name2), and the field values (JSON_ValueString).

An aggregation method also has to be selected; Concatenate, First, and Last all work the same in this example.
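The Cross Tab pivot can be sketched in Python as a group-then-spread operation (sample rows are hypothetical; the "First" aggregation is modeled by keeping the first value seen per cell):

```python
from collections import defaultdict

# Cross Tab equivalent: group by record index, spread field names into
# columns. Each tuple is (JSON_Name1, JSON_Name2, JSON_ValueString).
split_rows = [("0", "id", "1"), ("0", "first_name", "Ada"),
              ("1", "id", "2"), ("1", "first_name", "Alan")]

pivoted = defaultdict(dict)
for group, header, value in split_rows:
    # "First" aggregation: keep the first value seen for each cell.
    pivoted[group].setdefault(header, value)

print(dict(pivoted))
# {'0': {'id': '1', 'first_name': 'Ada'}, '1': {'id': '2', 'first_name': 'Alan'}}
```

Since each (group, header) cell holds exactly one value in this data, Concatenate, First, and Last are interchangeable.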

4) The last step is to remove unwanted columns, perform any other clean-up if necessary, and write the data to an output file. For now, we are only inspecting the data in a Browse tool.

The final workflow looks like this:

Part B

Using the file in TXT format. This can be solved in two ways.

Method 1

The shortest way would be to convert the text file into JSON format and read it with the workflow built before, but we will explore other options too. The reasons not to convert to .json can be many, ranging from data integrity to performance to ease of handling.

1) We will start with the Input tool as a standard text file. But we have to pay some attention to the configuration and settings.

We need to bring the data in as a delimited file, but there is no delimiter, which is a bit tricky. Since the first row does not contain headers, we need to uncheck that option.

Now the data in Alteryx still looks like the raw file.

2) We need to parse the data into JSON format using the JSON Parse tool.

This window looks similar to the one in Part A, so we might assume the same steps will work here. Not quite: if we look carefully, the index values in JSON_Name are missing except for the first 6 entries. We need to create the ID to be used in pivoting, and we can see an ID field inside JSON_ValueString. We shall leverage that to achieve what we did in Part A.

3) We will use the Formula tool to convert the JSON_ValueString field to a number. The data is currently in string format; when we convert everything to a number, only the ID values remain and everything else becomes 0.
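The effect of this conversion can be sketched in Python (sample values are hypothetical; per the behavior described above, non-numeric strings are assumed to coerce to 0):

```python
# Convert JSON_ValueString to a number; non-numeric strings become 0,
# so only the ID values survive the conversion.
values = ["1", "Ada", "Lovelace", "ada@example.com", "Female", "Analyst", "2"]

def to_number(s):
    try:
        return int(s)
    except ValueError:
        return 0  # non-numeric values drop to 0

ids = [to_number(v) for v in values]
print(ids)  # [1, 0, 0, 0, 0, 0, 2]
```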

4) The next step is to replicate the values from the previous row until all zeros are replaced. For this step, we will use the Multi-Row Formula tool.

The calculation says: wherever the ID is not 0, leave the value as is; wherever it is 0, pick the previous row's (Row - 1) value from the newly created Pivot ID column.

Now we have everything we need to pivot. But before the pivoting step, we need to remove the "0" and the "." from the first 6 rows. We can do this in different ways: trim with a Formula tool, use the RegEx tool to replace those specific characters, or use the Text To Columns tool as in Part A.

5) Here, we will use the Text To Columns followed by a formula tool to replace the existing zeros.

6) The next step is to pivot the values using the Cross Tab tool. The grouping field in this case is the Pivot ID; the field headers come from JSON_Name1 and the field values from JSON_ValueString.

Again, an aggregation method has to be selected; Concatenate, First, and Last all work the same.

7) The last step would be to remove unwanted columns or perform any other clean-up if necessary and write the data into an output file.

The final workflow looks like this:

Method 2

Part A and Part B, Method 1 both relied on the ID field. For situations where an ID field is absent, this method shows how to create one that can be used for pivoting.

Some of the initial steps are the same.

1) We will start with the Input tool as a standard text file. But we have to pay some attention to the configuration and settings.

We need to bring the data in as a delimited file, but there is no delimiter, which is a bit tricky. Since the first row does not contain headers, we need to uncheck that option.

2) We need to parse the data into JSON format within Alteryx using the JSON Parse tool.

3) The next step would be to create a new ID field that will be used for Pivoting. For this step, we will use the Tile tool which can identify buckets or segments in the data. We will split the data into equal segments.

Here we have to pay attention to the number of tiles required. For this example, we have a total of 90 rows, and a group of 6 rows (the 6 fields id, first_name, last_name, email, gender, role) makes up one entry in the data set. 90 divided by 6 equals 15, so we need 15 buckets.
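The tile assignment is just integer division over the row position, which a short Python sketch makes concrete:

```python
# Tile tool equivalent: split 90 rows into equal buckets of 6,
# one bucket per record.
total_rows, fields_per_record = 90, 6
tiles = total_rows // fields_per_record  # 15 buckets

# Tile_Num for row i (0-based): which bucket of 6 it falls into, 1-based.
tile_num = [i // fields_per_record + 1 for i in range(total_rows)]

print(tiles)         # 15
print(tile_num[:8])  # [1, 1, 1, 1, 1, 1, 2, 2]
```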

For this example, we need only the Tile_Num field. This will be used for pivoting. But, before pivoting, we need to remove the “0” and the “.” from the JSON_Name field.

4) In the previous method, we used the Text To Columns tool. Here we shall try the RegEx tool with the Replace method: the "0." will be replaced by a blank. Not a space; the replacement is simply empty.
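The replacement can be sketched with Python's `re` module (sample names are hypothetical; the pattern assumes the "0." prefix sits at the start of the name, and the "." must be escaped since it is a regex metacharacter):

```python
import re

# RegEx Replace equivalent: strip the leading "0." prefix, replacing it
# with an empty string (blank, not a space).
names = ["0.id", "0.first_name", "0.email"]

cleaned = [re.sub(r"^0\.", "", name) for name in names]
print(cleaned)  # ['id', 'first_name', 'email']
```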

5) Now for the pivoting step: use the Tile_Num field as the grouping anchor, JSON_Name as the headers, and JSON_ValueString as the values.

6) The last step would be to clean up.

The final workflow looks like this:
