What is an efficient data structure for prefix matching?Efficient map data structure supporting approximate...

How can I give a Ranger advantage on a check due to Favored Enemy without spoiling the story for the player?

Are all power cords made equal?

Is it possible to methodically find the total of ways to read a given phrase making a stack?

Why would you use 2 alternate layout buttons instead of 1, when only one can be selected at once

Crack the bank account's password!

Is there any danger of my neighbor having my wife's signature?

How would an EMP effect spacesuits (and small-arms weapons)?

How to write Muḥammad ibn Mūsā al-Khwārizmī?

Why write a book when there's a movie in my head?

Is Screenshot Time-tracking Common?

Process substitution inside a subshell to set a variable

What is an efficient data structure for prefix matching?

Was there a pre-determined arrangment for division of Germany in case it surrendered before any Soviet forces entered its territory?

Performance and power usage for Raspberry Pi in the Stratosphere

What happened to Hermione’s clothing and other possessions after she wiped her parents’ memories of her?

If a 12 by 16 sheet of paper is folded on its diagonal, what is the area of the region of the overlap?

What is formjacking?

Would water spill from a bowl in a Bag of Holding?

How do I narratively explain how in-game circumstances do not mechanically allow a PC to instantly kill an NPC?

What is wrong with my use of "find -print0"?

Players preemptively rolling, even though their rolls are useless or are checking the wrong skills

How do I avoid the "chosen hero" feeling?

How can I keep my gold safe from other PCs?

How do I fight with Heavy Armor as a Wizard with Tenser's Transformation?



What is an efficient data structure for prefix matching?


Efficient map data structure supporting approximate lookupIs there a data structure for efficiently searching a string that contains a given substring?Is there a 'string stack' data structure that supports these string operations?Efficient data structures for building a fast spell checkerEfficient map data structure supporting approximate lookupData Structure For Closest Pair ProblemUnderstanding the Baeza-Yates Régnier algorithm (multiple string matching, extended from Boyer-Moore)Efficient data structure for set to union of sets mappingMatch dictionary to misspelled word, corner casesdata structure admitting top items with prefixInfix search in millions of stringsFind the Most Frequent Ordered Word Pair In a Document













2












$begingroup$


I'm looking for an data structure that supports efficient random prefix matching queries (pattern) over a previously known set of words (dictionary). The dictionary is expected to contain about 10,000 words of varying length (I haven't calculated average word length, but I don't expect any word to be more than 80 characters long). Prefix matching in this case would be equivalent to words[i].toLowerCase().startsWith(pattern.toLowerCase()).



A Trie is an obvious choice, and provides linear time search corresponding to the length of the pattern. However, I'm confused whether a Suffix Tree, or a Suffix Array, might provide any improvements over a Trie. It seems a Suffix whatever is commonly used for one text, not multiple. I also have no requirement for supporting the various cool use cases (longest common prefix, or number of times a prefix occurs etc) that Suffix whatever can efficiently support.



I'm looking for a recommendation on which data structure to use, and why. Tries use a lot of space, and if I end up using one, I'd consider compress the branches with nodes with outdegree 1.



For the duplicate button happy readers, I've read this and this question, none seemed directly relevant.



Example:



Dictionary: [banana, apple, banter, orange]

Pattern: ban
Return: banana (any match)

Pattern: grapes
Return: null









share|cite|improve this question











$endgroup$








  • 2




    $begingroup$
    What does "prefix matching queries (pattern)" mean? Can you specify the operation that you want to perform on the data structure more carefully?
    $endgroup$
    – D.W.
    5 hours ago










  • $begingroup$
    Why "ban" pattern does not yield "banana" and "banter"? What pattern **"e" **returns in your case?
    $endgroup$
    – Evil
    5 hours ago








  • 1




    $begingroup$
    @Evil It's specified already in my example, any match, assuming "banana" is the path the code chose. "banter" would be a perfectly acceptable answer too.
    $endgroup$
    – Abhijit Sarkar
    5 hours ago










  • $begingroup$
    @D.W. I thought it was plenty obvious what prefix match means, but I've updated my question with a code snippet to demonstrate it explicitly.
    $endgroup$
    – Abhijit Sarkar
    5 hours ago
















2












$begingroup$


I'm looking for an data structure that supports efficient random prefix matching queries (pattern) over a previously known set of words (dictionary). The dictionary is expected to contain about 10,000 words of varying length (I haven't calculated average word length, but I don't expect any word to be more than 80 characters long). Prefix matching in this case would be equivalent to words[i].toLowerCase().startsWith(pattern.toLowerCase()).



A Trie is an obvious choice, and provides linear time search corresponding to the length of the pattern. However, I'm confused whether a Suffix Tree, or a Suffix Array, might provide any improvements over a Trie. It seems a Suffix whatever is commonly used for one text, not multiple. I also have no requirement for supporting the various cool use cases (longest common prefix, or number of times a prefix occurs etc) that Suffix whatever can efficiently support.



I'm looking for a recommendation on which data structure to use, and why. Tries use a lot of space, and if I end up using one, I'd consider compress the branches with nodes with outdegree 1.



For the duplicate button happy readers, I've read this and this question, none seemed directly relevant.



Example:



Dictionary: [banana, apple, banter, orange]

Pattern: ban
Return: banana (any match)

Pattern: grapes
Return: null









share|cite|improve this question











$endgroup$








  • 2




    $begingroup$
    What does "prefix matching queries (pattern)" mean? Can you specify the operation that you want to perform on the data structure more carefully?
    $endgroup$
    – D.W.
    5 hours ago










  • $begingroup$
    Why "ban" pattern does not yield "banana" and "banter"? What pattern **"e" **returns in your case?
    $endgroup$
    – Evil
    5 hours ago








  • 1




    $begingroup$
    @Evil It's specified already in my example, any match, assuming "banana" is the path the code chose. "banter" would be a perfectly acceptable answer too.
    $endgroup$
    – Abhijit Sarkar
    5 hours ago










  • $begingroup$
    @D.W. I thought it was plenty obvious what prefix match means, but I've updated my question with a code snippet to demonstrate it explicitly.
    $endgroup$
    – Abhijit Sarkar
    5 hours ago














2












2








2





$begingroup$


I'm looking for an data structure that supports efficient random prefix matching queries (pattern) over a previously known set of words (dictionary). The dictionary is expected to contain about 10,000 words of varying length (I haven't calculated average word length, but I don't expect any word to be more than 80 characters long). Prefix matching in this case would be equivalent to words[i].toLowerCase().startsWith(pattern.toLowerCase()).



A Trie is an obvious choice, and provides linear time search corresponding to the length of the pattern. However, I'm confused whether a Suffix Tree, or a Suffix Array, might provide any improvements over a Trie. It seems a Suffix whatever is commonly used for one text, not multiple. I also have no requirement for supporting the various cool use cases (longest common prefix, or number of times a prefix occurs etc) that Suffix whatever can efficiently support.



I'm looking for a recommendation on which data structure to use, and why. Tries use a lot of space, and if I end up using one, I'd consider compress the branches with nodes with outdegree 1.



For the duplicate button happy readers, I've read this and this question, none seemed directly relevant.



Example:



Dictionary: [banana, apple, banter, orange]

Pattern: ban
Return: banana (any match)

Pattern: grapes
Return: null









share|cite|improve this question











$endgroup$




I'm looking for an data structure that supports efficient random prefix matching queries (pattern) over a previously known set of words (dictionary). The dictionary is expected to contain about 10,000 words of varying length (I haven't calculated average word length, but I don't expect any word to be more than 80 characters long). Prefix matching in this case would be equivalent to words[i].toLowerCase().startsWith(pattern.toLowerCase()).



A Trie is an obvious choice, and provides linear time search corresponding to the length of the pattern. However, I'm confused whether a Suffix Tree, or a Suffix Array, might provide any improvements over a Trie. It seems a Suffix whatever is commonly used for one text, not multiple. I also have no requirement for supporting the various cool use cases (longest common prefix, or number of times a prefix occurs etc) that Suffix whatever can efficiently support.



I'm looking for a recommendation on which data structure to use, and why. Tries use a lot of space, and if I end up using one, I'd consider compress the branches with nodes with outdegree 1.



For the duplicate button happy readers, I've read this and this question, none seemed directly relevant.



Example:



Dictionary: [banana, apple, banter, orange]

Pattern: ban
Return: banana (any match)

Pattern: grapes
Return: null






data-structures search-algorithms strings substrings






share|cite|improve this question















share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited 5 hours ago







Abhijit Sarkar

















asked 6 hours ago









Abhijit SarkarAbhijit Sarkar

17913




17913








  • 2




    $begingroup$
    What does "prefix matching queries (pattern)" mean? Can you specify the operation that you want to perform on the data structure more carefully?
    $endgroup$
    – D.W.
    5 hours ago










  • $begingroup$
    Why "ban" pattern does not yield "banana" and "banter"? What pattern **"e" **returns in your case?
    $endgroup$
    – Evil
    5 hours ago








  • 1




    $begingroup$
    @Evil It's specified already in my example, any match, assuming "banana" is the path the code chose. "banter" would be a perfectly acceptable answer too.
    $endgroup$
    – Abhijit Sarkar
    5 hours ago










  • $begingroup$
    @D.W. I thought it was plenty obvious what prefix match means, but I've updated my question with a code snippet to demonstrate it explicitly.
    $endgroup$
    – Abhijit Sarkar
    5 hours ago














  • 2




    $begingroup$
    What does "prefix matching queries (pattern)" mean? Can you specify the operation that you want to perform on the data structure more carefully?
    $endgroup$
    – D.W.
    5 hours ago










  • $begingroup$
    Why "ban" pattern does not yield "banana" and "banter"? What pattern **"e" **returns in your case?
    $endgroup$
    – Evil
    5 hours ago








  • 1




    $begingroup$
    @Evil It's specified already in my example, any match, assuming "banana" is the path the code chose. "banter" would be a perfectly acceptable answer too.
    $endgroup$
    – Abhijit Sarkar
    5 hours ago










  • $begingroup$
    @D.W. I thought it was plenty obvious what prefix match means, but I've updated my question with a code snippet to demonstrate it explicitly.
    $endgroup$
    – Abhijit Sarkar
    5 hours ago








2




2




$begingroup$
What does "prefix matching queries (pattern)" mean? Can you specify the operation that you want to perform on the data structure more carefully?
$endgroup$
– D.W.
5 hours ago




$begingroup$
What does "prefix matching queries (pattern)" mean? Can you specify the operation that you want to perform on the data structure more carefully?
$endgroup$
– D.W.
5 hours ago












$begingroup$
Why "ban" pattern does not yield "banana" and "banter"? What pattern **"e" **returns in your case?
$endgroup$
– Evil
5 hours ago






$begingroup$
Why "ban" pattern does not yield "banana" and "banter"? What pattern **"e" **returns in your case?
$endgroup$
– Evil
5 hours ago






1




1




$begingroup$
@Evil It's specified already in my example, any match, assuming "banana" is the path the code chose. "banter" would be a perfectly acceptable answer too.
$endgroup$
– Abhijit Sarkar
5 hours ago




$begingroup$
@Evil It's specified already in my example, any match, assuming "banana" is the path the code chose. "banter" would be a perfectly acceptable answer too.
$endgroup$
– Abhijit Sarkar
5 hours ago












$begingroup$
@D.W. I thought it was plenty obvious what prefix match means, but I've updated my question with a code snippet to demonstrate it explicitly.
$endgroup$
– Abhijit Sarkar
5 hours ago




$begingroup$
@D.W. I thought it was plenty obvious what prefix match means, but I've updated my question with a code snippet to demonstrate it explicitly.
$endgroup$
– Abhijit Sarkar
5 hours ago










1 Answer
1






active

oldest

votes


















3












$begingroup$

A trie is asymptotically optimal for this. No data structure can achieve better asymptotic running time.



If you care about constant factors, the only way to know what will be optimal is to try multiple approaches and benchmark them. Standard theoretical running time analysis is not reliable at predicting constant factors.



Another data structure to consider is to store every prefix of each word in a hashtable. This will increase the space usage by about 10x (if the average word length is 10) but might speed up lookups in practice. The asymptotic running time will remain the same. You'll have to decide whether you're willing to trade off space for time.



Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching or matching in the middle of the string.






share|cite|improve this answer











$endgroup$













  • $begingroup$
    While I agree with you regarding using Trie, I don't think the claim "Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching" holds. One of the most common uses of a suffix tree/array is pattern matching or substring search. See this or this paper.
    $endgroup$
    – Abhijit Sarkar
    4 hours ago












  • $begingroup$
    @AbhijitSarkar, yes, I know how they are used for that; that's for finding an instance of the pattern in the middle of the string. Here you want to anchor the pattern so it is only allowed to occur at the beginning of the string. Suffix trees/arrays don't seem relevant for the latter. (If you think they're useful for that, I suggest trying to work out a specific algorithm for how you would use a suffix tree to help you answer that query, and then maybe you'll see what I mean; or else you'll have some more details you can add to your question to highlight whatever I've overlooked...)
    $endgroup$
    – D.W.
    3 hours ago











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "419"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f104771%2fwhat-is-an-efficient-data-structure-for-prefix-matching%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









3












$begingroup$

A trie is asymptotically optimal for this. No data structure can achieve better asymptotic running time.



If you care about constant factors, the only way to know what will be optimal is to try multiple approaches and benchmark them. Standard theoretical running time analysis is not reliable at predicting constant factors.



Another data structure to consider is to store every prefix of each word in a hashtable. This will increase the space usage by about 10x (if the average word length is 10) but might speed up lookups in practice. The asymptotic running time will remain the same. You'll have to decide whether you're willing to trade off space for time.



Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching or matching in the middle of the string.






share|cite|improve this answer











$endgroup$













  • $begingroup$
    While I agree with you regarding using Trie, I don't think the claim "Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching" holds. One of the most common uses of a suffix tree/array is pattern matching or substring search. See this or this paper.
    $endgroup$
    – Abhijit Sarkar
    4 hours ago












  • $begingroup$
    @AbhijitSarkar, yes, I know how they are used for that; that's for finding an instance of the pattern in the middle of the string. Here you want to anchor the pattern so it is only allowed to occur at the beginning of the string. Suffix trees/arrays don't seem relevant for the latter. (If you think they're useful for that, I suggest trying to work out a specific algorithm for how you would use a suffix tree to help you answer that query, and then maybe you'll see what I mean; or else you'll have some more details you can add to your question to highlight whatever I've overlooked...)
    $endgroup$
    – D.W.
    3 hours ago
















3












$begingroup$

A trie is asymptotically optimal for this. No data structure can achieve better asymptotic running time.



If you care about constant factors, the only way to know what will be optimal is to try multiple approaches and benchmark them. Standard theoretical running time analysis is not reliable at predicting constant factors.



Another data structure to consider is to store every prefix of each word in a hashtable. This will increase the space usage by about 10x (if the average word length is 10) but might speed up lookups in practice. The asymptotic running time will remain the same. You'll have to decide whether you're willing to trade off space for time.



Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching or matching in the middle of the string.






share|cite|improve this answer











$endgroup$













  • $begingroup$
    While I agree with you regarding using Trie, I don't think the claim "Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching" holds. One of the most common uses of a suffix tree/array is pattern matching or substring search. See this or this paper.
    $endgroup$
    – Abhijit Sarkar
    4 hours ago












  • $begingroup$
    @AbhijitSarkar, yes, I know how they are used for that; that's for finding an instance of the pattern in the middle of the string. Here you want to anchor the pattern so it is only allowed to occur at the beginning of the string. Suffix trees/arrays don't seem relevant for the latter. (If you think they're useful for that, I suggest trying to work out a specific algorithm for how you would use a suffix tree to help you answer that query, and then maybe you'll see what I mean; or else you'll have some more details you can add to your question to highlight whatever I've overlooked...)
    $endgroup$
    – D.W.
    3 hours ago














3












3








3





$begingroup$

A trie is asymptotically optimal for this. No data structure can achieve better asymptotic running time.



If you care about constant factors, the only way to know what will be optimal is to try multiple approaches and benchmark them. Standard theoretical running time analysis is not reliable at predicting constant factors.



Another data structure to consider is to store every prefix of each word in a hashtable. This will increase the space usage by about 10x (if the average word length is 10) but might speed up lookups in practice. The asymptotic running time will remain the same. You'll have to decide whether you're willing to trade off space for time.



Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching or matching in the middle of the string.






share|cite|improve this answer











$endgroup$



A trie is asymptotically optimal for this. No data structure can achieve better asymptotic running time.



If you care about constant factors, the only way to know what will be optimal is to try multiple approaches and benchmark them. Standard theoretical running time analysis is not reliable at predicting constant factors.



Another data structure to consider is to store every prefix of each word in a hashtable. This will increase the space usage by about 10x (if the average word length is 10) but might speed up lookups in practice. The asymptotic running time will remain the same. You'll have to decide whether you're willing to trade off space for time.



Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching or matching in the middle of the string.







share|cite|improve this answer














share|cite|improve this answer



share|cite|improve this answer








edited 3 hours ago

























answered 5 hours ago









D.W.D.W.

99.5k12121286




99.5k12121286












  • $begingroup$
    While I agree with you regarding using Trie, I don't think the claim "Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching" holds. One of the most common uses of a suffix tree/array is pattern matching or substring search. See this or this paper.
    $endgroup$
    – Abhijit Sarkar
    4 hours ago












  • $begingroup$
    @AbhijitSarkar, yes, I know how they are used for that; that's for finding an instance of the pattern in the middle of the string. Here you want to anchor the pattern so it is only allowed to occur at the beginning of the string. Suffix trees/arrays don't seem relevant for the latter. (If you think they're useful for that, I suggest trying to work out a specific algorithm for how you would use a suffix tree to help you answer that query, and then maybe you'll see what I mean; or else you'll have some more details you can add to your question to highlight whatever I've overlooked...)
    $endgroup$
    – D.W.
    3 hours ago


















  • $begingroup$
    While I agree with you regarding using Trie, I don't think the claim "Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching" holds. One of the most common uses of a suffix tree/array is pattern matching or substring search. See this or this paper.
    $endgroup$
    – Abhijit Sarkar
    4 hours ago












  • $begingroup$
    @AbhijitSarkar, yes, I know how they are used for that; that's for finding an instance of the pattern in the middle of the string. Here you want to anchor the pattern so it is only allowed to occur at the beginning of the string. Suffix trees/arrays don't seem relevant for the latter. (If you think they're useful for that, I suggest trying to work out a specific algorithm for how you would use a suffix tree to help you answer that query, and then maybe you'll see what I mean; or else you'll have some more details you can add to your question to highlight whatever I've overlooked...)
    $endgroup$
    – D.W.
    3 hours ago
















$begingroup$
While I agree with you regarding using Trie, I don't think the claim "Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching" holds. One of the most common uses of a suffix tree/array is pattern matching or substring search. See this or this paper.
$endgroup$
– Abhijit Sarkar
4 hours ago






$begingroup$
While I agree with you regarding using Trie, I don't think the claim "Suffix trees and suffix arrays don't seem relevant. You're asking about prefix matching, not suffix matching" holds. One of the most common uses of a suffix tree/array is pattern matching or substring search. See this or this paper.
$endgroup$
– Abhijit Sarkar
4 hours ago














$begingroup$
@AbhijitSarkar, yes, I know how they are used for that; that's for finding an instance of the pattern in the middle of the string. Here you want to anchor the pattern so it is only allowed to occur at the beginning of the string. Suffix trees/arrays don't seem relevant for the latter. (If you think they're useful for that, I suggest trying to work out a specific algorithm for how you would use a suffix tree to help you answer that query, and then maybe you'll see what I mean; or else you'll have some more details you can add to your question to highlight whatever I've overlooked...)
$endgroup$
– D.W.
3 hours ago




$begingroup$
@AbhijitSarkar, yes, I know how they are used for that; that's for finding an instance of the pattern in the middle of the string. Here you want to anchor the pattern so it is only allowed to occur at the beginning of the string. Suffix trees/arrays don't seem relevant for the latter. (If you think they're useful for that, I suggest trying to work out a specific algorithm for how you would use a suffix tree to help you answer that query, and then maybe you'll see what I mean; or else you'll have some more details you can add to your question to highlight whatever I've overlooked...)
$endgroup$
– D.W.
3 hours ago


















draft saved

draft discarded




















































Thanks for contributing an answer to Computer Science Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f104771%2fwhat-is-an-efficient-data-structure-for-prefix-matching%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Szabolcs (Ungheria) Altri progetti | Menu di navigazione48°10′14.56″N 21°29′33.14″E /...

Discografia di Klaus Schulze Indice Album in studio | Album dal vivo | Singoli | Antologie | Colonne...

How to make inet_server_addr() return localhost in spite of ::1/128RETURN NEXT in Postgres FunctionConnect to...