How can I delete two lines after every three lines in a file that contains a very large number of lines?
For example, if I have:
1st line (keep)
2nd line (keep)
3rd line (keep)
4rth lines (delete)
5th (del)
6th (keep)
7nth (keep)
8th lines (keep)
9th (del)
10th (del)
11th (keep)
12th (keep)
13th (keep)
14th (del)
15th (del)
etc....
bash shell awk sed
increment a line counter (zero-indexed) for each line read, print when (line counter modulo 5 < 3)
– ChuckCottrill
yesterday
can you please clarify more,
– Jaguar Jom
yesterday
Possible duplicate of How to print lines number 15 and 25 out of each 50 lines?
– Sundeep
21 hours ago
the duplicate is worded slightly differently, but it is the same problem looked at in a different way: this question would be "print lines 1,2,3 out of each 5 lines"
for ex: seq 15 | awk 'BEGIN { a[1] a[2] a[3] }; NR % 5 in a'
and seq 15 | sed -n 'p;n;p;n;p;n;n'
– Sundeep
21 hours ago
also, the sed version above might be faster than the awk one for large files
– Sundeep
20 hours ago
edited 9 mins ago
Prvt_Yadv
asked yesterday
Jaguar Jom
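The sed one-liner from the comments above can be read step by step. What follows is only a sketch: the function name keep3of5_sed is made up for illustration, and it assumes a POSIX sed plus the seq utility for test input.

```shell
# keep3of5_sed: hypothetical helper name; reads stdin, writes stdout.
keep3of5_sed() {
    # -n turns off automatic printing, so only explicit p commands print.
    # p prints the current line; n replaces it with the next input line.
    # One pass through the script therefore consumes five lines:
    # three printed (p), two skipped (the last two n commands).
    sed -n 'p;n;p;n;p;n;n'
}

seq 15 | keep3of5_sed
```

Since this form uses only standard sed commands, it should also work where the GNU first~step address extension is not available.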
6 Answers
Try:
awk '(NR-1)%5<3' file
For example:
$ awk '(NR-1)%5<3' file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
How it works
The command (NR-1)%5<3 tells awk to print any line for which (NR-1)%5<3 is true. In awk, NR is the line number, with the first line counting as 1. For every five lines in the file, that statement will be true for the first three.
edited 19 hours ago
Kusalananda♦
answered 23 hours ago
John1024
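To see why the condition in the answer above selects the first three lines of every five, it may help to print the intermediate value for a few lines. This is only an illustrative trace, not part of the original answer; the variable names trace and pos are made up here.

```shell
# For each input line, show: line number, 0-based position within its
# group of five, and the decision that (NR-1)%5<3 would make.
trace=$(seq 7 | awk '{ pos = (NR - 1) % 5; print NR, pos, (pos < 3 ? "keep" : "delete") }')
printf '%s\n' "$trace"
```

Positions 0, 1 and 2 are kept, positions 3 and 4 are dropped, and the position wraps back to 0 on line 6.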
Basically, you want something like 'Fizz-Buzz' in awk ...
awk '{ if (i++%5 < 3) print $0;}'
To show this works...
for x in 1 2 3 4 5 6 7 8 9 10 ; do echo $x; done |
awk '{ if (i++%5 < 3) print $0;}'
If your file is named 'mybigfile.csv':
awk '{ if (i++%5 < 3) print $0;}' < mybigfile.csv > mybigfile-123.csv
answered 23 hours ago
ChuckCottrill
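If the 3 and the 5 might change later, the same idea can be wrapped so that both numbers are passed in. This is a sketch under assumptions: keep_of_period is a hypothetical helper name, and it uses NR instead of the counter i, which should behave the same on a single input stream.

```shell
# keep_of_period KEEP PERIOD: hypothetical helper.
# Reads stdin and prints the first KEEP lines of every PERIOD lines.
keep_of_period() {
    awk -v keep="$1" -v period="$2" '(NR - 1) % period < keep'
}

seq 12 | keep_of_period 3 5
```

For a large file it would be used like the redirect in the answer, reading from mybigfile.csv and writing to a new file rather than overwriting the input while it is still being read.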
You could use NR, or just rely on i defaulting to zero :-) (code golf)
– ChuckCottrill
23 hours ago
A simple command is:
awk '{if((NR-1) % 5<=2){print $0}}' file
It prints only the first 3 lines of every group of 5 lines. (NR-1)%5 cycles through the values 0 1 2 3 4, and only the first three of those are less than or equal to 2, so only those lines are printed.
Given a file with the contents:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
The output is:
1
2
3
6
7
8
11
12
13
Or, as suggested in the comments, you can use:
awk '(NR - 1) % 5 <= 2' file
edited 18 hours ago
answered 23 hours ago
Prvt_Yadv
Or, with idiomatic use of awk syntax: awk '(NR - 1) % 5 <= 2' file
– Kusalananda♦
19 hours ago
Thanks, I didn't know that.
– Prvt_Yadv
19 hours ago
A generic solution for masking out a particular pattern of lines from a file:
#!/bin/sh
# The pattern is given on the command line.
pattern=$1
# The period is simply the length of the pattern.
period=${#pattern}
# Use bc to convert the binary pattern to an integer.
mask=$( printf 'ibase=2; %s\n' "$pattern" | bc )
awk -v mask="$mask" -v period="$period" '
    BEGIN { p = lshift(1, period-1) }
    and(rshift(p, (FNR-1) % period), mask)'
This relies on awk implementing the non-standard functions and() (bitwise AND), rshift() and lshift() (bitwise right and left shift), which both GNU awk and some BSD implementations of awk do, but not mawk.
This takes a pattern, which is a binary number representing both the cyclic period and which lines within each period should be kept or masked out. A 1 means "keep" and a 0 means "delete".
For example: The pattern of lines that should be applied in your question is 11100, which means "for each set of five lines, keep the first three and delete the others".
Using 01001000 would delete all but the 2nd and 5th lines in every 8 lines.
The awk program could also be written without the BEGIN block as
and(lshift(1, (period-1) - (FNR-1) % period), mask)
Left-shifting 1 by (period-1) - (FNR-1) % period positions is the same as calculating 2 to that power, but I'm using lshift() since awk does its arithmetic using floating point operations rather than exact integer arithmetic.
Since the code relies on the binary representation of the pattern, very long patterns may not work well.
Testing:
Removing the lines you want to remove:
$ sh script.sh 11100 <file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
Inverting the pattern:
$ sh script.sh 00011 <file
4rth lines (delete)
5th (del)
9th (del)
10th (del)
14th (del)
15th (del)
edited 15 hours ago
answered 18 hours ago
Kusalananda♦
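Where the bitwise functions are missing (mawk, for instance) or the pattern is too long for an integer mask, one alternative is to index into the pattern string directly. To be clear, this is a different technique from the one in the answer above, offered only as a sketch; mask_lines is a hypothetical helper name, and the pattern is assumed to be a non-empty string of 0s and 1s.

```shell
# mask_lines PATTERN: hypothetical helper; reads stdin, writes stdout.
# PATTERN is a non-empty string of 1 (keep) and 0 (delete) characters.
# An empty pattern would give period 0 and a division by zero.
mask_lines() {
    awk -v pattern="$1" '
        BEGIN { period = length(pattern) }
        # Pick the character for this line position; substr() is 1-based.
        substr(pattern, (FNR - 1) % period + 1, 1) == "1"'
}

seq 10 | mask_lines 11100
```

Because it uses only length() and substr(), this version avoids both bc and the non-standard and(), lshift() and rshift() functions.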
This can be solved using GNU sed:
sed '4~5,5~5d' file
Note that this uses a GNU-specific extension to the sed standard, and thus doesn't work with e.g. BSD sed on macOS. However, GNU sed can be installed on macOS using brew, after which it can be used as gsed. On most Linux distributions, GNU sed is the default.
This prints every line that does not fall in the fourth through fifth line of every five lines; for a clearer example: sed '3~10,6~10d' will select lines 1, 2, 7, 8, 9, 10 of every group of 10 lines by deleting lines 3 through 6.
The top-voted answer suggests using awk '(NR-1)%5<3'. On my machine, on a file containing the numbers 1 to 2 million, this takes about 0.6 seconds, while the sed solution in this answer takes about 0.35 seconds. This is reasonable, since sed is in general a simpler tool, and can thus work faster than the more complicated, but more full-featured, awk.
answered 14 hours ago
tomsmeding
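Before comparing speed, it seems worth checking that the sed and awk commands really produce the same output on the same kind of test file. The script below is a rough sketch of such a check, not something from the answer: the scratch file names are made up, seq and mktemp are assumed to be installed, and the fallback to the portable sed form from the question's comments is my own addition for systems without GNU sed.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names; adjust as needed.
tmp=$(mktemp -d) || exit 1
seq 2000000 > "$tmp/nums"

awk '(NR-1)%5<3' "$tmp/nums" > "$tmp/out.awk"

# The first~step address form is a GNU extension, so probe for GNU sed first.
if sed --version 2>/dev/null | grep -q '(GNU sed)'; then
    sed '4~5,5~5d' "$tmp/nums" > "$tmp/out.sed"
else
    # Portable form taken from the comments on the question.
    sed -n 'p;n;p;n;p;n;n' "$tmp/nums" > "$tmp/out.sed"
fi

# cmp -s is silent and only reports through its exit status.
if cmp -s "$tmp/out.awk" "$tmp/out.sed"; then
    result=identical
else
    result=different
fi
echo "$result"
rm -rf "$tmp"
```

To reproduce a timing comparison like the one described above, each of the two commands could be prefixed with time in a shell that provides it; the absolute numbers will differ between machines and tool versions.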
+1 ... or 4~5{N;d;}
– steeldriver
13 hours ago
I tried the command below and it worked:
for((i=1;i<=20;i++)); do j=$((i+2)); sed -n "${i},${j}p" filename; i=$((j+2)); done
output
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
answered 20 hours ago
Praveen Kumar BS
That is nice, but you have to know how many lines you have in advance, and you're looping back from the beginning each round. It cannot be used on a stream, and it gets more inefficient the bigger the data gets, so since OP says the number of lines is very large, this is not the best solution.
– Law29
16 hours ago