Merge/Intersection between table and dataset

Merge/Intersection between table and dataset - How to achieve?

Consider the following table:

Id          Hash

----------- ----------------------------------

1           0x31F777F0804D301936411E3ECD760859

2           0xD64A593F3E9ACC972158D522A4289EA0



(Id is an identity column)

In to that table I want to merge the following dataset:

Hash

----------------------------------

0x31F777F0804D301936411E3ECD760859

0x31F777F0804D301936411E3ECD760859

0x0C5A65264F92A543E7AAA06375349C06



(Id is NOT present in the dataset)

The rules for the merge are:

If the hash does not exist in the table, insert it to the table;

If the hash does not exist in the dataset, delete it from the table;

If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

The result of the merge should leave the table looking like this:

Id          Hash

----------- ----------------------------------

1           0x31F777F0804D301936411E3ECD760859

3           0x31F777F0804D301936411E3ECD760859

4           0x0C5A65264F92A543E7AAA06375349C06

What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.

edited 1 hour ago

asked 1 hour ago

James

1165

add a comment |

Consider the following table:

Id          Hash

----------- ----------------------------------

1           0x31F777F0804D301936411E3ECD760859

2           0xD64A593F3E9ACC972158D522A4289EA0



(Id is an identity column)

In to that table I want to merge the following dataset:

Hash

----------------------------------

0x31F777F0804D301936411E3ECD760859

0x31F777F0804D301936411E3ECD760859

0x0C5A65264F92A543E7AAA06375349C06



(Id is NOT present in the dataset)

The rules for the merge are:

If the hash does not exist in the table, insert it to the table;

If the hash does not exist in the dataset, delete it from the table;

If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

The result of the merge should leave the table looking like this:

Id          Hash

----------- ----------------------------------

1           0x31F777F0804D301936411E3ECD760859

3           0x31F777F0804D301936411E3ECD760859

4           0x0C5A65264F92A543E7AAA06375349C06

What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.

edited 1 hour ago

asked 1 hour ago

James

1165

add a comment |

Consider the following table:

Id          Hash

----------- ----------------------------------

1           0x31F777F0804D301936411E3ECD760859

2           0xD64A593F3E9ACC972158D522A4289EA0



(Id is an identity column)

In to that table I want to merge the following dataset:

Hash

----------------------------------

0x31F777F0804D301936411E3ECD760859

0x31F777F0804D301936411E3ECD760859

0x0C5A65264F92A543E7AAA06375349C06



(Id is NOT present in the dataset)

The rules for the merge are:

If the hash does not exist in the table, insert it to the table;

If the hash does not exist in the dataset, delete it from the table;

If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

The result of the merge should leave the table looking like this:

Id          Hash

----------- ----------------------------------

1           0x31F777F0804D301936411E3ECD760859

3           0x31F777F0804D301936411E3ECD760859

4           0x0C5A65264F92A543E7AAA06375349C06

What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.

edited 1 hour ago

asked 1 hour ago

James

1165

Consider the following table:

Id          Hash

----------- ----------------------------------

1           0x31F777F0804D301936411E3ECD760859

2           0xD64A593F3E9ACC972158D522A4289EA0



(Id is an identity column)

In to that table I want to merge the following dataset:

Hash

----------------------------------

0x31F777F0804D301936411E3ECD760859

0x31F777F0804D301936411E3ECD760859

0x0C5A65264F92A543E7AAA06375349C06



(Id is NOT present in the dataset)

The rules for the merge are:

If the hash does not exist in the table, insert it to the table;

If the hash does not exist in the dataset, delete it from the table;

If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

The result of the merge should leave the table looking like this:

Id          Hash

----------- ----------------------------------

1           0x31F777F0804D301936411E3ECD760859

3           0x31F777F0804D301936411E3ECD760859

4           0x0C5A65264F92A543E7AAA06375349C06

What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.

sql-server insert merge

edited 1 hour ago

asked 1 hour ago

James

1165

edited 1 hour ago

asked 1 hour ago

James

1165

edited 1 hour ago

asked 1 hour ago

James

1165

asked 1 hour ago

James

1165

asked 1 hour ago

James

1165

add a comment |

1 Answer
1

active

oldest

votes

You could write it as a MERGE:

WITH

    T AS

    (

        SELECT

            T.[Hash], 

            rn = ROW_NUMBER() OVER (

                PARTITION BY T.[Hash] 

                ORDER BY T.[Hash], T.Id)

        FROM @T AS T

    ),

    S AS

    (

        SELECT DISTINCT

            S.[Hash],

            rn = ROW_NUMBER() OVER (

                PARTITION BY S.[Hash] 

                ORDER BY S.[Hash])

        FROM @S AS S

    )

MERGE T

USING S

    ON S.[Hash] = T.[Hash]

    AND S.rn = T.rn

WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])

WHEN NOT MATCHED BY SOURCE THEN DELETE;

But for performance reasons, I would normally write it as two separate statements:

WITH ToDelete AS

(

    SELECT

        T.*

    FROM @T AS T

    WHERE 

        NOT EXISTS 

        (

            SELECT

                S.* 

            FROM @S AS S 

            WHERE 

                S.[Hash] = T.[Hash]

        )

)

DELETE ToDelete;

WITH ToInsert AS

(

    SELECT

        S.[Hash], 

        rn = ROW_NUMBER() OVER (

            PARTITION BY S.[Hash] 

            ORDER BY S.[Hash])

    FROM @S AS S

    EXCEPT

    SELECT

        T.[Hash], 

        rn = ROW_NUMBER() OVER (

            PARTITION BY T.[Hash] 

            ORDER BY T.[Hash], T.Id)

    FROM @T AS T

)

INSERT @T

    ([Hash])

SELECT

    ToInsert.[Hash]

FROM ToInsert;

Where @S is the source table and @T is the target.

You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].

There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.

answered 28 mins ago

Paul White♦

50.3k14269435

There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

– Randi Vertongen
2 mins ago

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228529%2fmerge-intersection-between-table-and-dataset-how-to-achieve%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

You could write it as a MERGE:

WITH

    T AS

    (

        SELECT

            T.[Hash], 

            rn = ROW_NUMBER() OVER (

                PARTITION BY T.[Hash] 

                ORDER BY T.[Hash], T.Id)

        FROM @T AS T

    ),

    S AS

    (

        SELECT DISTINCT

            S.[Hash],

            rn = ROW_NUMBER() OVER (

                PARTITION BY S.[Hash] 

                ORDER BY S.[Hash])

        FROM @S AS S

    )

MERGE T

USING S

    ON S.[Hash] = T.[Hash]

    AND S.rn = T.rn

WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])

WHEN NOT MATCHED BY SOURCE THEN DELETE;

But for performance reasons, I would normally write it as two separate statements:

WITH ToDelete AS

(

    SELECT

        T.*

    FROM @T AS T

    WHERE 

        NOT EXISTS 

        (

            SELECT

                S.* 

            FROM @S AS S 

            WHERE 

                S.[Hash] = T.[Hash]

        )

)

DELETE ToDelete;

WITH ToInsert AS

(

    SELECT

        S.[Hash], 

        rn = ROW_NUMBER() OVER (

            PARTITION BY S.[Hash] 

            ORDER BY S.[Hash])

    FROM @S AS S

    EXCEPT

    SELECT

        T.[Hash], 

        rn = ROW_NUMBER() OVER (

            PARTITION BY T.[Hash] 

            ORDER BY T.[Hash], T.Id)

    FROM @T AS T

)

INSERT @T

    ([Hash])

SELECT

    ToInsert.[Hash]

FROM ToInsert;

Where @S is the source table and @T is the target.

You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].

There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.

answered 28 mins ago

Paul White♦

50.3k14269435

There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

– Randi Vertongen
2 mins ago

add a comment |

You could write it as a MERGE:

WITH

    T AS

    (

        SELECT

            T.[Hash], 

            rn = ROW_NUMBER() OVER (

                PARTITION BY T.[Hash] 

                ORDER BY T.[Hash], T.Id)

        FROM @T AS T

    ),

    S AS

    (

        SELECT DISTINCT

            S.[Hash],

            rn = ROW_NUMBER() OVER (

                PARTITION BY S.[Hash] 

                ORDER BY S.[Hash])

        FROM @S AS S

    )

MERGE T

USING S

    ON S.[Hash] = T.[Hash]

    AND S.rn = T.rn

WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])

WHEN NOT MATCHED BY SOURCE THEN DELETE;

But for performance reasons, I would normally write it as two separate statements:

WITH ToDelete AS

(

    SELECT

        T.*

    FROM @T AS T

    WHERE 

        NOT EXISTS 

        (

            SELECT

                S.* 

            FROM @S AS S 

            WHERE 

                S.[Hash] = T.[Hash]

        )

)

DELETE ToDelete;

WITH ToInsert AS

(

    SELECT

        S.[Hash], 

        rn = ROW_NUMBER() OVER (

            PARTITION BY S.[Hash] 

            ORDER BY S.[Hash])

    FROM @S AS S

    EXCEPT

    SELECT

        T.[Hash], 

        rn = ROW_NUMBER() OVER (

            PARTITION BY T.[Hash] 

            ORDER BY T.[Hash], T.Id)

    FROM @T AS T

)

INSERT @T

    ([Hash])

SELECT

    ToInsert.[Hash]

FROM ToInsert;

Where @S is the source table and @T is the target.

You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].

There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.

answered 28 mins ago

Paul White♦

50.3k14269435

There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

– Randi Vertongen
2 mins ago

add a comment |

You could write it as a MERGE:

WITH

    T AS

    (

        SELECT

            T.[Hash], 

            rn = ROW_NUMBER() OVER (

                PARTITION BY T.[Hash] 

                ORDER BY T.[Hash], T.Id)

        FROM @T AS T

    ),

    S AS

    (

        SELECT DISTINCT

            S.[Hash],

            rn = ROW_NUMBER() OVER (

                PARTITION BY S.[Hash] 

                ORDER BY S.[Hash])

        FROM @S AS S

    )

MERGE T

USING S

    ON S.[Hash] = T.[Hash]

    AND S.rn = T.rn

WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])

WHEN NOT MATCHED BY SOURCE THEN DELETE;

But for performance reasons, I would normally write it as two separate statements:

WITH ToDelete AS

(

    SELECT

        T.*

    FROM @T AS T

    WHERE 

        NOT EXISTS 

        (

            SELECT

                S.* 

            FROM @S AS S 

            WHERE 

                S.[Hash] = T.[Hash]

        )

)

DELETE ToDelete;

WITH ToInsert AS

(

    SELECT

        S.[Hash], 

        rn = ROW_NUMBER() OVER (

            PARTITION BY S.[Hash] 

            ORDER BY S.[Hash])

    FROM @S AS S

    EXCEPT

    SELECT

        T.[Hash], 

        rn = ROW_NUMBER() OVER (

            PARTITION BY T.[Hash] 

            ORDER BY T.[Hash], T.Id)

    FROM @T AS T

)

INSERT @T

    ([Hash])

SELECT

    ToInsert.[Hash]

FROM ToInsert;

Where @S is the source table and @T is the target.

You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].

There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.

answered 28 mins ago

Paul White♦

50.3k14269435

You could write it as a MERGE:

WITH

    T AS

    (

        SELECT

            T.[Hash], 

            rn = ROW_NUMBER() OVER (

                PARTITION BY T.[Hash] 

                ORDER BY T.[Hash], T.Id)

        FROM @T AS T

    ),

    S AS

    (

        SELECT DISTINCT

            S.[Hash],

            rn = ROW_NUMBER() OVER (

                PARTITION BY S.[Hash] 

                ORDER BY S.[Hash])

        FROM @S AS S

    )

MERGE T

USING S

    ON S.[Hash] = T.[Hash]

    AND S.rn = T.rn

WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])

WHEN NOT MATCHED BY SOURCE THEN DELETE;

But for performance reasons, I would normally write it as two separate statements:

WITH ToDelete AS

(

    SELECT

        T.*

    FROM @T AS T

    WHERE 

        NOT EXISTS 

        (

            SELECT

                S.* 

            FROM @S AS S 

            WHERE 

                S.[Hash] = T.[Hash]

        )

)

DELETE ToDelete;

WITH ToInsert AS

(

    SELECT

        S.[Hash], 

        rn = ROW_NUMBER() OVER (

            PARTITION BY S.[Hash] 

            ORDER BY S.[Hash])

    FROM @S AS S

    EXCEPT

    SELECT

        T.[Hash], 

        rn = ROW_NUMBER() OVER (

            PARTITION BY T.[Hash] 

            ORDER BY T.[Hash], T.Id)

    FROM @T AS T

)

INSERT @T

    ([Hash])

SELECT

    ToInsert.[Hash]

FROM ToInsert;

Where @S is the source table and @T is the target.

You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].

There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.

answered 28 mins ago

Paul White♦

50.3k14269435

answered 28 mins ago

Paul White♦

50.3k14269435

answered 28 mins ago

Paul White♦

50.3k14269435

answered 28 mins ago

Paul White♦

50.3k14269435

There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

– Randi Vertongen
2 mins ago

add a comment |

There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

– Randi Vertongen
2 mins ago

There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

– Randi Vertongen
2 mins ago

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Fdzfgy