Skip to Content

Latest File Refresh From Shared Folder in Tableau

Blog | June 9, 2019 | By Vijai Narasimha

Data reporting traditionally was done using local files Excel because of the small data volumes. As the data volumes grew in size, Excel could not handle the performance. Sometimes the memory on the PC would not support opening such huge reports. This led to the development of different database systems and their management software etc.

But… there are so many companies and teams that still rely on looking at data coming from local files that are extracts generated from other traditional systems like Business Objects, Crystal Reports, Cognos etc. or even Tableau exports for that matter. Since the world is moving into the visual reporting space, many teams which perform daily, weekly, quarterly Budget Analysis; daily reporting of Trades; weekly study of Risk compliance etc. rely on some Self Service BI tools to pick these local files from a certain location and create Dashboards.

Would it not be easier to dump the data into a database and then read from it? Yes that would be the ideal approach, but many teams rely on tactical solutions at it suits that situation!

This blog is one of the use case examples that we solve on a regular basis using Tableau. This will be one of the shorter and simpler blogs that explains one of the most widely requested features. Let us look at the requirements.

1) The Marketing team generates a Monthly CSV file from a CRM like Zoho or Salesforce. The file has details about all the projects, resources allocated, events and other marketing related data.

2) Every month the generated file is automatically placed in a Shared folder (Accessible location within the network).

3) The file will be named Marketing_Report_MM_YYYY. (The name of the file is important for the search. The two digit month MM and the two or four digit year YYYY are needed for Tableau as the report itself does not have a date field. We need some identifier for us to use in Tableau. All these will be more evident when we go into the steps)

4) The file has to be picked up by Tableau from this folder and the dashboard should display only the entries for the latest month i.e every month the Dashboard should get refreshed based on the latest file. (This is where the Date identifier comes into place)

There are some caveats or points of caution here.

a) File format should not change. Eg Excel to CSV, CSV to PDF

b) Fields cannot be renamed.

c) File names should be consistent with a new identifier every time the file is generated.

d) Tableau Server should have access to the shared folder for the refresh to work seamlessly

e) If there are more files of the same name, the Dashboard will display duplicate values.

f) Over a period of time, there might be many files in the folder which might impact Tableau performance. So, a timely clean-up/archival of old files is needed.

g) Extract Refresh on the Server works better than the Live connection.

Before we go into the steps, we need some data. For this example, we have 6 files Jan to June, one for each month. We will start with 4 files and add more files to test the Tableau refreshes.

The first 4 files, Jan to Apr are placed in the shared folder called Marketing Documents. There are some sub folders too. Also, we are very particular about the naming convention for each of the files.

1) Let us start by connecting to one of the files as a Text file.

2) Since we are connecting to the folder, we can see all the CSV files. Tableau by default will connect to one file. We will convert it into a Union with a Wildcard search. This is where the name of the file comes into play. For this we will go with the *Report* or we can make it very specific *Marketing_Report*. The * looks for any character match, either before the word or after the word. The data connection is now renamed to Union.

We can also see that Tableau automatically identified the path of the file. It is the last field in the Preview.

3) We will create a very basic cross tab. We can go for a fancy dashboard, but we are only testing the refresh process.

In this visualization, we have all the 4 months of info. But we need to narrow down to the latest month.

4) We need to create a Calculated Date. This is most laborious process. That is the reason, we need a Date field as an identifier in the data itself. Since the data is an output of another tool, the program has to be tweaked to provide that field. But in reality many programs write files with a date indicator appended to the name either as a suffix or prefix as in this case. It is a trade off, file with a Date field or create a new field using the file name.

We will use the Path field, cut into pieces and use it for the Date Calc. There are multiple blog posts that show how to create a date from a string field. We will quickly go though all the steps. There are different ways to do it. We can use split function or use the left and right function or a combination of all. The disadvantage is that since this is a string Calc at row level, the performance can get impacted.

  • First, we will use Custom split at ‘.’ (dot) to remove the csv file type at the end.
  • Next, we will use the Right function to isolate MM_YYYY from the Path – Split 1 field. We will select the last 7 characters.
  • Next, we can pick the MM_YYYY into two parts using the Split function at ‘_’ (underscore) or use the Left and Right function.
  • Now, we have the Month and Year. We can use these fields to create our Date field. All the fields are string, we need to convert them to number. Since the Day field is missing, we are using 01 as an arbitrary value.

** All these calculations can be achieved in a single step. Splitting into steps will help in debugging.

5) Now that we have the date (filter) field ready, we can use it as the latest Date in our Dashboard or the visualization that we created. We can also make the filter for current month. But in this case, we are looking for the latest file.

Since the latest file in the shared folder is April, we can see the April dates. The filter can be also used as a Data Source filter.

6) Now, we can test with other files. Let us add the May file and refresh the Tableau data source.

Let us add the June file and refresh again.

One more thing to consider is that the Tableau workbook cannot be packaged which will lock the files and not refresh the new files. Same with the Tableau Server, the published workbook should not include External files.

author image
Author
Vijai Narasimha
Back to top