Apache Commons Text

Wu Jun 2019-12-25 15:59:03

Apache Commons Text (official site): text processing

Package structure


Package Description
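org.apache.commons.text Core text-processing classes
org.apache.commons.text.diff Algorithms for computing the differences between strings
org.apache.commons.text.lookup String lookup algorithms for use with StringSubstitutor
org.apache.commons.text.matcher String matching algorithms for use with StringSubstitutor
org.apache.commons.text.similarity String similarity algorithms
org.apache.commons.text.translate An API for building text translation rules from a set of small building blocks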

Commonly used


Core classes

AlphabetConverter: alphabet conversion

 Character[] originals   = {'a', 'b', 'c', 'd'};
 Character[] encoding    = {'0', '1', 'd'};
 Character[] doNotEncode = {'d'};

 AlphabetConverter ac = AlphabetConverter.createConverterFromChars(originals, encoding, doNotEncode);

 ac.encode("a");    // 00
 ac.encode("b");    // 01
 ac.encode("c");    // 0d
 ac.encode("d");    // d
 ac.encode("abcd"); // 00010dd

CaseUtils: camel-case conversion

 toCamelCase(String str, boolean capitalizeFirstLetter, char... delimiters)

 CaseUtils.toCamelCase(null, false)                                 = null
 CaseUtils.toCamelCase("", false, *)                                = ""
 CaseUtils.toCamelCase("To.Camel.Case", false, new char[]{'.'})     = "toCamelCase"
 CaseUtils.toCamelCase(" to @ Camel case", true, new char[]{'@'})   = "ToCamelCase"
 CaseUtils.toCamelCase(" @to @ Camel case", false, new char[]{'@'}) = "toCamelCase"
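
The rule behind the examples above can be sketched with the JDK alone. This is a simplified illustration, not the library's implementation: the class name CamelCaseSketch is hypothetical, and the sketch works on chars, so it ignores supplementary code points. It assumes the input is lower-cased first, that a space always counts as a delimiter, and that the first character after a delimiter is capitalized once some output already exists.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical JDK-only sketch of the camel-case rule; not Commons Text code.
final class CamelCaseSketch {

    static String toCamelCase(String str, boolean capitalizeFirstLetter, char... delimiters) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        // A space is always a delimiter, in addition to the ones passed in.
        Set<Character> delims = new HashSet<>();
        delims.add(' ');
        for (char d : delimiters) {
            delims.add(d);
        }
        StringBuilder out = new StringBuilder(str.length());
        boolean capitalizeNext = capitalizeFirstLetter;
        for (char c : str.toLowerCase(Locale.ROOT).toCharArray()) {
            if (delims.contains(c)) {
                // Leading delimiters must not force an upper-case first letter.
                capitalizeNext = out.length() > 0;
            } else if (capitalizeNext || (out.length() == 0 && capitalizeFirstLetter)) {
                out.append(Character.toTitleCase(c));
                capitalizeNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("To.Camel.Case", false, '.'));     // toCamelCase
        System.out.println(toCamelCase(" to @ Camel case", true, '@'));   // ToCamelCase
        System.out.println(toCamelCase(" @to @ Camel case", false, '@')); // toCamelCase
    }
}
```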

RandomStringGenerator: random string generation
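
In Commons Text the generator is configured through RandomStringGenerator.Builder (for example withinRange, then generate(length) on the built instance). The underlying idea can be sketched with the JDK alone. The class RandomStringSketch and its generate method are hypothetical names for illustration, and the sketch handles only a single char range.

```java
import java.security.SecureRandom;
import java.util.Random;

// Hypothetical JDK-only sketch of range-restricted random strings; not Commons Text code.
final class RandomStringSketch {

    /** Returns {@code length} characters drawn uniformly from the inclusive range [min, max]. */
    static String generate(Random rng, int length, char min, char max) {
        if (length < 0 || min > max) {
            throw new IllegalArgumentException("length must be >= 0 and min <= max");
        }
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            // nextInt(bound) is exclusive of bound, hence the +1 to include max.
            sb.append((char) (min + rng.nextInt(max - min + 1)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // SecureRandom is a reasonable default when the string is used as a token or password.
        System.out.println(generate(new SecureRandom(), 12, 'a', 'z'));
    }
}
```

Passing the Random in as a parameter keeps the sketch testable with a seeded generator.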

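StringEscapeUtils: string escaping and unescaping for Java, JavaScript, HTML, and XML.

// escapeSql existed only in Commons Lang 2.x; Commons Text has no equivalent
StringEscapeUtils.unescapeHtml4("&lt;a&gt;dddd&lt;/a&gt;") // "<a>dddd</a>"
StringEscapeUtils.escapeEcmaScript("<script>alert('1111')</script>")
StringEscapeUtils.escapeJava("测试") // non-ASCII characters become \uXXXX escapes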
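
To show what HTML escaping amounts to, here is a minimal JDK-only sketch that handles just the four basic characters (&, <, >, "). The class HtmlEscapeSketch and its methods are hypothetical. The library's HTML escaping covers many more named entities, so treat this as an illustration of the principle only.

```java
// Hypothetical JDK-only sketch of basic HTML escaping; not Commons Text code.
final class HtmlEscapeSketch {

    static String escapeBasic(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    static String unescapeBasic(String input) {
        if (input == null) {
            return null;
        }
        // &amp; is decoded last so that "&amp;lt;" becomes "&lt;" rather than "<".
        return input.replace("&lt;", "<")
                    .replace("&gt;", ">")
                    .replace("&quot;", "\"")
                    .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String escaped = escapeBasic("<a>dddd</a>");
        System.out.println(escaped);                // &lt;a&gt;dddd&lt;/a&gt;
        System.out.println(unescapeBasic(escaped)); // <a>dddd</a>
    }
}
```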

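StringSubstitutor: string substitution. Replaces ${variableName} placeholders in a template, either through static helper methods or through an instance created with a constructor; the variables are usually held in a map.

 Map<String, String> valuesMap = new HashMap<>();
 valuesMap.put("animal", "quick brown fox");
 valuesMap.put("target", "lazy dog");
 String templateString = "The ${animal} jumped over the ${target}.";
 StringSubstitutor sub = new StringSubstitutor(valuesMap);
 String resolvedString = sub.replace(templateString); // "The quick brown fox jumped over the lazy dog."
 StringSubstitutor.replaceSystemProperties("You are running with java.version = ${java.version} and os.name = ${os.name}.");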
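
The basic mechanics of ${name} replacement can be sketched with java.util.regex. This is an illustration under simplifying assumptions, not how the library is implemented: the class SubstitutionSketch is hypothetical, unknown variables are left untouched, and escaping, default values, and nested variables are not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical JDK-only sketch of ${name} substitution; not Commons Text code.
final class SubstitutionSketch {

    // Matches ${name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String replace(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Assumption: a variable with no value is left in the output as-is.
            String replacement = value != null ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("animal", "quick brown fox");
        values.put("target", "lazy dog");
        // Prints: The quick brown fox jumped over the lazy dog.
        System.out.println(replace("The ${animal} jumped over the ${target}.", values));
    }
}
```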

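StringTokenizer: string tokenizing
Similar to java.util.StringTokenizer, but more flexible and controllable.
Unlike String.split(), which splits on a regular expression, StringTokenizer scans the input character by character.

StringTokenizer(String input)             // tokenizer with the default delimiters (space, tab, newline, and form feed)
StringTokenizer(String input, char delim) // tokenizer with the specified delimiter character
String str = "100|66,55:200|567,90:102|43,54";
// A String delimiter is matched as a whole string, so use a char-set matcher to split on any of ':', ',' or '|'
StringTokenizer st = new StringTokenizer(str, StringMatcherFactory.INSTANCE.charSetMatcher(":,|"));
while (st.hasNext()) {
    System.out.println(st.nextToken());
}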
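
The difference between regex-based splitting and character-based tokenizing can be seen with the JDK alone. The sketch below uses java.util.StringTokenizer (the class the Commons Text tokenizer is compared to above), not the Commons Text class itself. SplitVersusTokenizer and its helper methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// JDK-only comparison of String.split (regex) and java.util.StringTokenizer (delimiter chars).
final class SplitVersusTokenizer {

    /** Every character of delimiterChars is treated as a literal delimiter. */
    static List<String> tokenize(String input, String delimiterChars) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiterChars);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    /** Splits on a literal delimiter by quoting it, so regex metacharacters lose their meaning. */
    static List<String> splitLiteral(String input, String delimiter) {
        return Arrays.asList(input.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        String str = "100|66,55:200|567,90:102|43,54";
        System.out.println(tokenize(str, ":,|"));              // [100, 66, 55, 200, 567, 90, 102, 43, 54]
        System.out.println(Arrays.asList(str.split("[:,|]"))); // same tokens, via a regex character class
        // "." is a regex that matches every character, so nothing is left after splitting.
        System.out.println("a.b".split(".").length);           // 0
        System.out.println(splitLiteral("a.b", "."));          // [a, b]
    }
}
```

The "a.b" case is the usual pitfall: split interprets its argument as a pattern, while a tokenizer never does.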

WordUtils: word operations

String str = "Here is one line of text that is going to be wrapped after 20 columns.";
WordUtils.wrap(str, 20)                  // "Here is one line of\ntext that is going\nto be wrapped after\n20 columns."
WordUtils.wrap(str, 20, "<br />", false) // "Here is one line of<br />text that is going<br />to be wrapped after<br />20 columns."
 WordUtils.capitalize(null)        = null
 WordUtils.capitalize("")          = ""
 WordUtils.capitalize("i am FINE") = "I Am FINE"
 WordUtils.capitalizeFully(null)        = null
 WordUtils.capitalizeFully("")          = ""
 WordUtils.capitalizeFully("i am FINE") = "I Am Fine"
 WordUtils.uncapitalize(null)        = null
 WordUtils.uncapitalize("")          = ""
 WordUtils.uncapitalize("I Am FINE") = "i am fINE"
 WordUtils.swapCase(null)                 = null
 WordUtils.swapCase("")                   = ""
 WordUtils.swapCase("The dog has a BONE") = "tHE DOG HAS A bone"
 WordUtils.initials(null)             = null
 WordUtils.initials("")               = ""
 WordUtils.initials("Ben John Lee")   = "BJL"
 WordUtils.initials("Ben J.Lee")      = "BJ"
containsAllWords(CharSequence word, CharSequence... words)
 WordUtils.containsAllWords("abcd", "ab", "cd") = false
 WordUtils.containsAllWords("abc def", "def", "abc") = true
abbreviate(String str, int lower, int upper, String appendToEnd)
 WordUtils.abbreviate("Now is the time for all good men", 0, 40, null)    = "Now"
 WordUtils.abbreviate("Now is the time for all good men", 10, 40, null)   = "Now is the"
 WordUtils.abbreviate("Now is the time for all good men", 20, 40, " ...") = "Now is the time for all ..."
 WordUtils.abbreviate("Now is the time for all good men", 1000, -1, "")   = "Now is the time for all good men"
 WordUtils.abbreviate("Now is the time for all good men", 9, -10, null)   throws IllegalArgumentException
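
The difference between capitalize, capitalizeFully, and initials in the listings above comes down to what happens at word boundaries. Here is a JDK-only sketch, assuming whitespace is the only word separator. The class WordSketch is hypothetical and is not the library's implementation.

```java
import java.util.Locale;

// Hypothetical JDK-only sketch of whitespace-delimited word operations; not Commons Text code.
final class WordSketch {

    /** Upper-cases the first letter of each word and leaves the other letters unchanged. */
    static String capitalize(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        StringBuilder sb = new StringBuilder(str.length());
        boolean startOfWord = true;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isWhitespace(c)) {
                sb.append(c);
                startOfWord = true;
            } else if (startOfWord) {
                sb.append(Character.toTitleCase(c));
                startOfWord = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Lower-cases everything first, so only the first letter of each word stays upper-case. */
    static String capitalizeFully(String str) {
        return str == null ? null : capitalize(str.toLowerCase(Locale.ROOT));
    }

    /** Keeps only the first character of each whitespace-delimited word. */
    static String initials(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        StringBuilder sb = new StringBuilder();
        boolean startOfWord = true;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isWhitespace(c)) {
                startOfWord = true;
            } else if (startOfWord) {
                sb.append(c);
                startOfWord = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("i am FINE"));      // I Am FINE
        System.out.println(capitalizeFully("i am FINE")); // I Am Fine
        System.out.println(initials("Ben John Lee"));     // BJL
        System.out.println(initials("Ben J.Lee"));        // BJ ("J.Lee" counts as one word)
    }
}
```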