Merge/Intersection between table and dataset - How to achieve?
Consider the following table:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0
(Id is an identity column)
In to that table I want to merge the following dataset:
Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06
(Id is NOT present in the dataset)
The rules for the merge are:
- If the hash does not exist in the table, insert it to the table;
- If the hash does not exist in the dataset, delete it from the table;
- If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.
The result of the merge should leave the table looking like this:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06
What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.
sql-server insert merge
add a comment |
Consider the following table:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0
(Id is an identity column)
In to that table I want to merge the following dataset:
Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06
(Id is NOT present in the dataset)
The rules for the merge are:
- If the hash does not exist in the table, insert it to the table;
- If the hash does not exist in the dataset, delete it from the table;
- If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.
The result of the merge should leave the table looking like this:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06
What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.
sql-server insert merge
add a comment |
Consider the following table:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0
(Id is an identity column)
In to that table I want to merge the following dataset:
Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06
(Id is NOT present in the dataset)
The rules for the merge are:
- If the hash does not exist in the table, insert it to the table;
- If the hash does not exist in the dataset, delete it from the table;
- If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.
The result of the merge should leave the table looking like this:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06
What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.
sql-server insert merge
Consider the following table:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0
(Id is an identity column)
In to that table I want to merge the following dataset:
Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06
(Id is NOT present in the dataset)
The rules for the merge are:
- If the hash does not exist in the table, insert it to the table;
- If the hash does not exist in the dataset, delete it from the table;
- If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.
The result of the merge should leave the table looking like this:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06
What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.
sql-server insert merge
sql-server insert merge
edited 1 hour ago
James
asked 1 hour ago
JamesJames
1165
1165
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
You could write it as a MERGE
:
WITH
T AS
(
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
),
S AS
(
SELECT DISTINCT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
)
MERGE T
USING S
ON S.[Hash] = T.[Hash]
AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;
But for performance reasons, I would normally write it as two separate statements:
WITH ToDelete AS
(
SELECT
T.*
FROM @T AS T
WHERE
NOT EXISTS
(
SELECT
S.*
FROM @S AS S
WHERE
S.[Hash] = T.[Hash]
)
)
DELETE ToDelete;
WITH ToInsert AS
(
SELECT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
EXCEPT
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
)
INSERT @T
([Hash])
SELECT
ToInsert.[Hash]
FROM ToInsert;
Where @S
is the source table and @T
is the target.
You should have a unique index on the target on ([Hash], [Id])
. Chances are you have this already, or equivalently an index on [Hash]
and a unique (possibly clustered) index on [Id]
.
There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.
There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…
– Randi Vertongen
2 mins ago
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228529%2fmerge-intersection-between-table-and-dataset-how-to-achieve%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
You could write it as a MERGE
:
WITH
T AS
(
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
),
S AS
(
SELECT DISTINCT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
)
MERGE T
USING S
ON S.[Hash] = T.[Hash]
AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;
But for performance reasons, I would normally write it as two separate statements:
WITH ToDelete AS
(
SELECT
T.*
FROM @T AS T
WHERE
NOT EXISTS
(
SELECT
S.*
FROM @S AS S
WHERE
S.[Hash] = T.[Hash]
)
)
DELETE ToDelete;
WITH ToInsert AS
(
SELECT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
EXCEPT
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
)
INSERT @T
([Hash])
SELECT
ToInsert.[Hash]
FROM ToInsert;
Where @S
is the source table and @T
is the target.
You should have a unique index on the target on ([Hash], [Id])
. Chances are you have this already, or equivalently an index on [Hash]
and a unique (possibly clustered) index on [Id]
.
There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.
There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…
– Randi Vertongen
2 mins ago
add a comment |
You could write it as a MERGE
:
WITH
T AS
(
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
),
S AS
(
SELECT DISTINCT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
)
MERGE T
USING S
ON S.[Hash] = T.[Hash]
AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;
But for performance reasons, I would normally write it as two separate statements:
WITH ToDelete AS
(
SELECT
T.*
FROM @T AS T
WHERE
NOT EXISTS
(
SELECT
S.*
FROM @S AS S
WHERE
S.[Hash] = T.[Hash]
)
)
DELETE ToDelete;
WITH ToInsert AS
(
SELECT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
EXCEPT
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
)
INSERT @T
([Hash])
SELECT
ToInsert.[Hash]
FROM ToInsert;
Where @S
is the source table and @T
is the target.
You should have a unique index on the target on ([Hash], [Id])
. Chances are you have this already, or equivalently an index on [Hash]
and a unique (possibly clustered) index on [Id]
.
There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.
There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…
– Randi Vertongen
2 mins ago
add a comment |
You could write it as a MERGE
:
WITH
T AS
(
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
),
S AS
(
SELECT DISTINCT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
)
MERGE T
USING S
ON S.[Hash] = T.[Hash]
AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;
But for performance reasons, I would normally write it as two separate statements:
WITH ToDelete AS
(
SELECT
T.*
FROM @T AS T
WHERE
NOT EXISTS
(
SELECT
S.*
FROM @S AS S
WHERE
S.[Hash] = T.[Hash]
)
)
DELETE ToDelete;
WITH ToInsert AS
(
SELECT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
EXCEPT
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
)
INSERT @T
([Hash])
SELECT
ToInsert.[Hash]
FROM ToInsert;
Where @S
is the source table and @T
is the target.
You should have a unique index on the target on ([Hash], [Id])
. Chances are you have this already, or equivalently an index on [Hash]
and a unique (possibly clustered) index on [Id]
.
There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.
You could write it as a MERGE
:
WITH
T AS
(
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
),
S AS
(
SELECT DISTINCT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
)
MERGE T
USING S
ON S.[Hash] = T.[Hash]
AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;
But for performance reasons, I would normally write it as two separate statements:
WITH ToDelete AS
(
SELECT
T.*
FROM @T AS T
WHERE
NOT EXISTS
(
SELECT
S.*
FROM @S AS S
WHERE
S.[Hash] = T.[Hash]
)
)
DELETE ToDelete;
WITH ToInsert AS
(
SELECT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
EXCEPT
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
)
INSERT @T
([Hash])
SELECT
ToInsert.[Hash]
FROM ToInsert;
Where @S
is the source table and @T
is the target.
You should have a unique index on the target on ([Hash], [Id])
. Chances are you have this already, or equivalently an index on [Hash]
and a unique (possibly clustered) index on [Id]
.
There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.
answered 28 mins ago
Paul White♦Paul White
50.3k14269435
50.3k14269435
There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…
– Randi Vertongen
2 mins ago
add a comment |
There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…
– Randi Vertongen
2 mins ago
There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…
– Randi Vertongen
2 mins ago
There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…
– Randi Vertongen
2 mins ago
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228529%2fmerge-intersection-between-table-and-dataset-how-to-achieve%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown