Logo Home    Page list

Place Names - parse and recombine #places #instr

Table of Contents

Raison d'etre
SQLite now has the INSTR function!
Sample results of parsed Place names
Where are the commas? query
Parse the Standardized Place names query
Update Place Abbreviations with two parts of the Standardized Place name
Update the working Place name with three parts
Further ideas

Raison d'etre

This page responds to a request from vyger seeking a way to parse the Standardized Place name using SQLite akin to what might be done in Visual Basic with the InStr function:
...
With the RM geocoded table the Normalized is generally a 4 component field delimited by commas,
what I would like is set the Name to the substring left of the third comma and the Abbrev to
the substring left of the second comma...

SQLite now has the INSTR function!

As of SQLite 3.7.15 dated 2012-12-12, the INSTR(X,Y) function was finally included. Until that time, I do not think it would have been possible to parse a comma delimited string in SQLite without using a higher level language, either to extend SQLite or to use SQLite as a data source. Now it has become feasible with those SQLite managers that have incorporated SQLite 3.7.15 or later. Unfortunately, as of this writing, SQLiteSpy has not been updated since 2011 and thus does not support the INSTR function; a new version that will has been promised but no timeline given. To carry out the development of a suitable query, I was fortunate to find that SharpPlus SQLite Developer has undergone recent revision and therefore does support it. So does SQLite Expert but only the paid version of SQLite Developer and SQLite Expert support the fake RMNOCASE collation needed to modify the Place Name. I'm hopeful that a future version of RMtrix will incorporate the SQLite INSTR function and these queries.

Sample results of parsed Place names

PlaceParse.PNG
Screenshot of results from PlaceParse.sql as displayed by SQLite Developer


The results above show some of the contents of a temporary table xPlacePartsTable containing the PlaceID and Normalized columns from RM's PlaceTable, the comma placements within the values of Normalized and the parsing of Normalized into four parts. Because it is directly and uniquely related to PlaceTable via PlaceID, it is easy to assemble the Standardized Name parts and update the working Name and Abbrev accordingly.

Where are the commas? query

PlaceCommaParse.sql This query creates an initial temporary table xPlaceCommaTable with the columns from PlaceID to Comma3:
-- PlaceCommaParse.sql
/*
2013-02-17 Tom Holden ve3meo
Creates a temporary table of non-empty Standardized Place names with the
positions of up to three commas in the string. Can be used to parse out
the 4 parts of the name for further use such as the generation of a 2-part
Abbreviation and 3-part Name for reports.
*/
DROP TABLE IF EXISTS xPlaceCommaTable;
 
CREATE TEMP TABLE xPlaceCommaTable AS
SELECT PlaceID
    ,Normalized
    ,Comma1
    ,Comma2
    ,Comma2 + INSTR(SUBSTR(Normalized, Comma2 + 1), ',') AS Comma3
FROM (
    SELECT PlaceID
        ,Normalized
        ,Comma1
        ,Comma1 + INSTR(SUBSTR(Normalized, Comma1 + 1), ',') AS Comma2
    FROM (
        SELECT PlaceID
            ,Normalized
            ,INSTR(Normalized, ',') AS Comma1
        FROM PlaceTable
        WHERE PlaceType = 0
            AND Normalized NOT LIKE ''
        )
    );

Parse the Standardized Place names query

PlaceParse.sql This query uses the initial temporary table to build an extended table with all the columns from the first plus the parsed parts of the Standardized Name as shown in the figure above:
-- PlaceParse.sql
/*
   2013-02-17 Tom Holden ve3meo
 
   Requires existence of table created by PlaceCommaParse.sql.
   Extracts the parts of a 4-part Standardized Place name and saves them
   to a temporary table
   */
DROP TABLE IF EXISTS xPlacePartsTable;
 
CREATE TEMP TABLE xPlacePartsTable AS
SELECT *
    ,CASE
        WHEN Comma1 > 0
            THEN SUBSTR(Normalized, 1, Comma1 - 1)
        ELSE Normalized
        END AS Place1
    ,CASE
        WHEN Comma2 > Comma1
            THEN SUBSTR(Normalized, Comma1 + 1, Comma2 - Comma1 - 1)
        WHEN Comma1 > 0
            THEN SUBSTR(Normalized, Comma1 + 1)
        ELSE ''
        END AS Place2
    ,CASE
        WHEN Comma3 > Comma2
            THEN SUBSTR(Normalized, Comma2 + 1, Comma3 - Comma2 - 1)
        WHEN Comma2 > Comma1
            THEN SUBSTR(Normalized, Comma2 + 1)
        ELSE ''
        END AS Place3
    ,CASE
        WHEN Comma3 > Comma2
            THEN SUBSTR(Normalized, Comma3 + 1)
        ELSE ''
        END AS Place4
FROM xPlaceCommaTable;

Update Place Abbreviations with two parts of the Standardized Place name

PlaceAbbrevUpdate.sql This query writes up to the first two parts of the Standardized Name to the Abbrev column of PlaceTable:
-- PlaceAbbrevUpdate.sql
/*
2013-02-17 Tom Holden ve3meo
Combines up to the first two parts of the Standardized Place name
(the Normalized column) and places the concatenated result in the
Abbrev column of PlaceTable for Places having non-empty Normalized
fields.
 
Requires temp xPlacePartsTable generated by PlaceParse.sql or equiv.
*/
UPDATE PlaceTable
SET Abbrev = (
        SELECT CASE
                WHEN Comma1 > 0
                    THEN Place1 || ', ' || Place2
                ELSE Place1
                END AS Abbrev
        FROM xPlacePartsTable
        WHERE PlaceTable.PlaceID = xPlacePartsTable.PlaceID
        )
WHERE PlaceID IN (
        SELECT PlaceID
        FROM xPlacePartsTable
        );

Update the working Place name with three parts

A similar query can set the working Place Name to up to the first three parts of the Standardized name:
PlaceNameUpdate.sql
-- PlaceNameUpdate.sql
/*
2013-02-17 Tom Holden ve3meo
Combines up to the first three parts of the Standardized Place name
(the Normalized column) and places the concatenated result in the
Name column of PlaceTable for Places having non-empty Normalized
fields.
 
Requires temp xPlacePartsTable generated by PlaceParse.sql or equiv.
AND (fake) RMNOCASE collation.
*/
UPDATE PlaceTable
SET Name = (
        SELECT CASE
                WHEN Comma2 > Comma1
                    THEN Place1 || ', ' || Place2 || ', ' || Place3
                WHEN Comma1 > 0
                    THEN Place1 || ', ' || Place2
                ELSE Place1
                END AS Name
        FROM xPlacePartsTable
        WHERE PlaceTable.PlaceID = xPlacePartsTable.PlaceID
        )
WHERE PlaceID IN (
        SELECT PlaceID
        FROM xPlacePartsTable
        );

Further ideas

With these queries as examples, others can be readily developed.
  1. Some may prefer just the first part of the Standardized Place name for the Place Abbreviation, i.e., typically the municipality's name alone.
  2. Those whose Standardized Names are more typically only 3 levels (e.g., Canadian places when geo-coded have only municipality, province, country) may prefer just two parts for the working Place Name (municipality, province).
  3. Perhaps revise the UPDATE query so that the working Place Name receives 2 parts of a 3-level Place and 3 parts of a 4-level place.
  4. The temporary table, xPlacePartsTable, when viewed is itself useful for inspecting Standardized Place names for incomplete names. Depending on the SQLite manager, it can be sorted and filtered for special views.
  5. xPlacePartsTable itself can be edited. Thus Placen values can be changed in it and UPDATEs run to propagate changes to the RM database. Note that TEMP tables are lost when the SQLite manager that created them closes its connection to the database nor are they available to another SQLite manager open at the same time.

Discussions

alerum68

Place Details

alerum68 08 February 2018 21:43:39

What happens to place details with this script? Are they taken into account, or are they lost in the conversion?

ve3meo
ve3meo 11 February 2018 16:14:12

These scripts have no effect on Place Details records. Neither do they split out a Place Detail embedded in a Place record.

www.000webhost.com