Common String Manipulation Techniques in DSA
Most string problems are built from a handful of manipulation primitives: splitting text into pieces, joining pieces back together, slicing out a substring, and counting how often each character appears. This page shows each one idiomatically in all four languages and notes its cost so you can compose them without surprises. If you haven’t read about string mutability yet, start there — it explains why we join instead of += in a loop.
Split: text into parts
Splitting breaks a string into a list of substrings on a delimiter. It costs O(n) in the length of the string.
#include <string>
#include <vector>
#include <sstream>
std::vector<std::string> split(const std::string& s, char delim) {
std::vector<std::string> out;
std::stringstream ss(s);
std::string part;
while (std::getline(ss, part, delim)) out.push_back(part);
return out;
}
String[] split(String s) {
return s.split(","); // splits on a regex; "," here
}
function split(s) {
return s.split(","); // returns an array of substrings
}
def split(s):
return s.split(",") # returns a list of substrings
Join: parts into text
Joining is the inverse: glue a list of strings together with a separator. Build with join (not repeated +=) to stay O(n).
#include <string>
#include <vector>
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
std::string out;
for (size_t i = 0; i < parts.size(); i++) {
if (i) out += sep;
out += parts[i];
}
return out;
}
String join(String[] parts) {
return String.join(",", parts);
}
function join(parts) {
return parts.join(",");
}
def join(parts):
return ",".join(parts)
Substring: slice a range
A substring copies a contiguous range of characters, costing O(m) for length m. Conventionally the start is inclusive and the end is exclusive.
#include <string>
std::string s = "abcdef";
std::string sub = s.substr(1, 3); // start=1, length=3 -> "bcd"
String s = "abcdef";
String sub = s.substring(1, 4); // [1, 4) -> "bcd"
const s = "abcdef";
const sub = s.slice(1, 4); // [1, 4) -> "bcd"
s = "abcdef"
sub = s[1:4] # [1, 4) -> "bcd"
Pitfall: C++
substr(pos, len)takes a start and a length, while Javasubstring, JSslice, and Python slicing take a start and an end index. Mixing them up is a common off-by-one bug.
Character frequency
Counting character occurrences underpins anagram checks, “first unique character,” and many hashing problems. Use a hash map (or a fixed-size array of 26/128/256 counts for known alphabets). It’s O(n) time.
#include <string>
#include <unordered_map>
std::unordered_map<char, int> charFreq(const std::string& s) {
std::unordered_map<char, int> freq;
for (char c : s) freq[c]++;
return freq;
}
import java.util.HashMap;
import java.util.Map;
Map<Character, Integer> charFreq(String s) {
Map<Character, Integer> freq = new HashMap<>();
for (char c : s.toCharArray())
freq.merge(c, 1, Integer::sum);
return freq;
}
function charFreq(s) {
const freq = new Map();
for (const c of s) freq.set(c, (freq.get(c) || 0) + 1);
return freq;
}
from collections import Counter
def char_freq(s):
return Counter(s)
Tip: When the input is restricted to lowercase English letters, an
int[26]array indexed byc - 'a'is faster and lighter than a hash map — same O(n) time, smaller constants and no hashing.
Other handy operations
- Case conversion —
toupper/tolower,toUpperCase(),.upper()/.lower(): O(n). - Trim whitespace —
strip()/trim(): O(n). - Replace / contains / startsWith — convenience methods, typically O(n).
- Reverse — see the two-pointer technique in array rotation & reversal; on immutable strings, reverse a char array/list then join.
Complexity recap
| Operation | Complexity |
|---|---|
| Split / join | O(n) |
| Substring (length m) | O(m) |
| Character frequency | O(n) time, O(k) space (k = alphabet size) |
Build via join vs += loop | O(n) vs O(n²) |
With these primitives you can tackle most string challenges. Put them to work on the string exercises.