What method does unzip use to find a single file in an archive?
Let's say I create 100 files of random text data, 30MB each, and then create a zip archive with no compression, i.e. zip dataset.zip -r -0 *.txt. Now I want to extract just one file from this archive.
As described here, there are two ways of extracting files from an archive:
1. Seek to the end of the file and look up the central directory, then use it for fast random access to the file to be extracted. (Amortized O(1) complexity)
2. Look through each local header and extract the one where there's a match. (O(n) complexity)
Which method does unzip use? From my experiments it seems like it uses method 2.
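To make method 2 concrete, here is a minimal Python sketch of a sequential scan over local file headers. It is only an illustration of the idea, not how any particular tool implements it. The function name find_by_local_headers is made up, and the sketch assumes the sizes are stored in each local header (no data descriptors, no zip64) and that member names decode as UTF-8.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"
# Fixed 30-byte part of a local file header (little-endian).
LOCAL_HDR = struct.Struct("<4sHHHHHIIIHH")


def find_by_local_headers(f, wanted):
    """Walk local headers from offset 0; return (data_offset, size) or None.

    Simplification: relies on the compressed size in the local header,
    so it does not handle data descriptors or zip64 archives.
    """
    f.seek(0)
    while True:
        raw = f.read(LOCAL_HDR.size)
        if len(raw) < LOCAL_HDR.size or raw[:4] != LOCAL_SIG:
            return None  # hit the central directory or the end of the file
        fields = LOCAL_HDR.unpack(raw)
        comp_size, name_len, extra_len = fields[7], fields[9], fields[10]
        name = f.read(name_len).decode("utf-8", "replace")
        f.seek(extra_len, io.SEEK_CUR)  # skip the extra field
        if name == wanted:
            return f.tell(), comp_size
        f.seek(comp_size, io.SEEK_CUR)  # skip this member's data


# Demo on a small in-memory archive; the member names are made up.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_STORED) as z:
    z.writestr("a.txt", b"alpha")
    z.writestr("b.txt", b"bravo!")
offset, size = find_by_local_headers(demo, "b.txt")
demo.seek(offset)
print(demo.read(size))
```

Even though the member data is skipped with a seek rather than read, the scan still visits one header per preceding member, which is where the O(n) behaviour comes from.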
zip archive
edited 18 mins ago by IvyMike
asked 1 hour ago by tangy
1 Answer
It uses method 1, which you can see using strace:
open("dataset.zip", O_RDONLY) = 3
ioctl(1, TIOCGWINSZ, 0x7fff9a895920) = -1 ENOTTY (Inappropriate ioctl for device)
write(1, "Archive:  dataset.zip\n", 22Archive:  dataset.zip
) = 22
lseek(3, 943718400, SEEK_SET) = 943718400
read(3, "340P356(s34230620520127360U[250/2207346<252+u2342251[<2310E342274"..., 4522) = 4522
lseek(3, 943722880, SEEK_SET) = 943722880
read(3, "3f225P\uxv14350343503", 20) = 20
lseek(3, 943718400, SEEK_SET) = 943718400
read(3, "340P356(s34230620520127360U[250/2207346<252+u2342251[<2310E342274"..., 8192) = 4522
lseek(3, 849346560, SEEK_SET) = 849346560
read(3, "D262nv210343240C24227344367q300223231306330275266213276M7I'&352234J"..., 8192) = 8192
stat("rand-28.txt", 0x559f43e0a550) = -1 ENOENT (No such file or directory)
lstat("rand-28.txt", 0x559f43e0a550) = -1 ENOENT (No such file or directory)
stat("rand-28.txt", 0x559f43e0a550) = -1 ENOENT (No such file or directory)
lstat("rand-28.txt", 0x559f43e0a550) = -1 ENOENT (No such file or directory)
open("rand-28.txt", O_RDWR|O_CREAT|O_TRUNC, 0666) = 4
ioctl(1, TIOCGWINSZ, 0x7fff9a895790) = -1 ENOTTY (Inappropriate ioctl for device)
write(1, " extracting: rand-28.txt "..., 37 extracting: rand-28.txt ) = 37
read(3, "2753279Y206223217}355W%:220YNT257260z^361T242237021336372+306310"..., 8192) = 8192
unzip opens dataset.zip, seeks to the end, then seeks to the start of the requested file in the archive (rand-28.txt, at offset 849346560) and reads from there.
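As a rough illustration of method 1, the ZIP format places an end-of-central-directory record (signature PK\x05\x06) at the very end of the archive, and that record stores the offset and size of the central directory. The Python sketch below follows that layout. It is a sketch of the general technique, not unzip's actual code. The function name find_by_central_directory is made up, and zip64 and multi-disk archives are deliberately ignored.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
# 22-byte end-of-central-directory record (little-endian).
EOCD = struct.Struct("<4sHHHHIIH")
# Fixed 46-byte part of a central directory entry.
CDIR = struct.Struct("<4sHHHHHHIIIHHHHHII")


def find_by_central_directory(f, wanted):
    """Return the local-header offset of `wanted`, or None (no zip64 support)."""
    size = f.seek(0, io.SEEK_END)
    # The record may be followed by an archive comment of up to 65535 bytes,
    # so search the tail of the file backwards for its signature.
    tail_len = min(size, EOCD.size + 0xFFFF)
    f.seek(size - tail_len)
    tail = f.read(tail_len)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        return None
    fields = EOCD.unpack_from(tail, pos)
    total, cd_size, cd_offset = fields[4], fields[5], fields[6]

    # Read the whole central directory in one go and walk its entries.
    f.seek(cd_offset)
    cdir = f.read(cd_size)
    p = 0
    for _ in range(total):
        entry = CDIR.unpack_from(cdir, p)
        name_len, extra_len, comment_len = entry[10], entry[11], entry[12]
        header_offset = entry[16]
        start = p + CDIR.size
        name = cdir[start:start + name_len].decode("utf-8", "replace")
        if name == wanted:
            return header_offset
        p = start + name_len + extra_len + comment_len
    return None


# Demo on a small in-memory archive; the member names are made up.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as z:
    z.writestr("rand-1.txt", b"one")
    z.writestr("rand-2.txt", b"two")
print(find_by_central_directory(buf, "rand-2.txt"))
```

The returned offset points at the member's local header, so an extractor would seek there, skip the header, and read the data, which matches the seek-to-end followed by seek-to-member pattern in the trace above.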
Could you add more information, if possible, about how it actually finds the central directory record (which here seems to be at 943718400)?
– tangy, 12 mins ago
answered 1 hour ago by Stephen Kitt