Normalising Multivalued Fields in SQL (Salesforce Pick Lists)

Salesforce’s storage of picklist values is a bit of a compromise of good data modelling and something of a headache if you want to re-normalise the data elsewhere. Where a user can select more than one item in a pick list field on an object the selections are stored as a semi-colon separated list of items on the object record – so…

1value 1;value 4;value 12
2value 2


Normalising this kind of non-atomic field is typically a hassle in SQL. You can write a UDF to extract the field values iteratively or recursively. But UDFs tend to be pretty slow (and recursion is limited to 32 calls). Alternatively you could implement a CLR function which will be much faster but then you’ve got to develop, deploy and support it. The following solution should be reliable, fast, doesn’t require any procedural code and can be applied to any CSV like field.

In this kind of problem often the first thing to do is get the right number of rows out. We need one row for each value in the multivalued picklist field. So first off we need to work out how many values are in each field, if we take out the delimiters and compare with the length of the original field this will give the number of entries (if we add 1).

len(mvpl_field) - len(replace(mvpl_field, ';', '')) + 1


Add this into a query on the object with the multivalued picklist values and the object id.

select	id
,	mvpl_field
,	len(mvpl_field) - len(replace(mvpl_field, ';', '')) + 1 number_entries
from	mvpl_object


We can now cross join to an integers table to get one row per value in the picklist duplicating the picklist data.

select	id
,	mvpl_field
,	number_entries
,	number
from 	(

	select	id
	,	mvpl_field
	,	len(mvpl_field) - len(replace(mvpl_field, ';', '')) + 1 number_entries
	from	mvpl_object

	) sub

	join integers	 i 
	on i.number <= sub.number_entries;


I’ll encapsulate the above query in view called mvpl_object_v. Now we just need to get value 1 in row 1, value 2 in row 2 and so on. A recursive CTE will provide a tidy way of getting each value. The anchor members of the CTE will be the first value in each list – where number = 1. The column val will strip out the first value and remainder will hold what’s left.

select	id
,	number_entries
,	number

,	case 
		--case when only one value in the field
		when charindex(';', mvpl_field) = 0 then mvpl_field 
		--otherwise get the first value in the field
else left(mvpl_field, charindex(';', mvpl_field)-1) 
end val

	--remove the first value and leave the remaining values
,	right(mvpl_field, len(mvpl_field) - charindex(';', mvpl_field)) remainder

from	mvpl_object_v 
where	number = 1


The recursive portion will then strip out the first value in remainder and for the next value in the field and generate a new remainder removing this item.

union all

,	o.number_entries
,	o.number

,	case 
		when charindex(';', c.remainder) = 0 then c.remainder 
		else left(c.remainder, charindex(';', c.remainder)-1) 
	end val

,	right(c.remainder, len(c.remainder) - charindex(';', c.remainder)) remainder

from	mvpl_object_v o 
	join cte c 
		on o.number - 1 = c.number 
		and =
where	o.number != 1


Then select the object ID, the sequence number of the value and picklist values from the CTE.

select	id
,	number seq
,	val picklist_value
from	cte


Again I’ll put this CTE code into a view called mvpl_object_picklist_values_v. Let’s try this out. I’ll create an object table with two records with a multivalued field with a few semicolon separated values.

create table mvpl_object
	id int
,	mvpl_field varchar(max)

insert into mvpl_object values (1, 'value 1;value 4;value 12')
insert into mvpl_object values (2, 'value 2')


Selecting from mvpl_object_v gives:

1value 1;value 4;value 1231
1value 1;value 4;value 1232
1value 1;value 4;value 1233
2value 211


And selecting from mvpl_object_picklist_values_v gives:

11value 1
12value 4
13value 12
21value 2

So it works perfectly! In a follow up I’ll show how you can track changes to the selected values over time.