Request: parsing data in text files

码拜

10 years ago

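I'm not very familiar with file parsing and hope an expert can help answer or provide a code demo. Many thanks! For example, a directory contains file A, file B, and file C: A is named fs20160629-00000.txt, B is named ts20160629-00001.txt, and C is named Js20160629-00002.txt.
1) File A contains the data: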
http://www.oin.com:8081/?id=16696072KL|18129-02=rzrfs-3-201606290653047
http://www.oin.com:8081/?id=16696072KW|18129-02=rzrfs-3-201606290653047
http://www.oin.com:8081/?id=16696072L3|18129-02=rzrfs-3-201606290653047
http://www.oin.com:8081/?id=16696072L9|18129-02=rzrfs-3-201606290653047
2) File B contains the data:
http://www.oin.com:8086/?id=16696073PP|18129-02=rzrfs-3-201606290653046
http://www.oin.com:8086/?id=16696073R0|18129-02=rzrfs-3-201606290653046
http://www.oin.com:8086/?id=16696073R8|18129-02=rzrfs-3-201606290653046
http://www.toin.com:8086/?id=16696073RJ|18129-02=rzrfs-3-201606290653046
http://www.oin.com:8086/?id=16696073RU|18129-02=rzrfs-3-201606290653046
http://www.oin.com:8086/?id=16696073S1|18129-02=rzrfs-3-201606290653047
http://www.oin.com:8086/?id=16696073S9|18129-02=rzrfs-3-201606290653047
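3) File C has a similar data structure; only the id in the middle differs.
Requirement: read files A, B, and C using Java or a scripting language:
1) get the name of each file;
2) read and parse each file's contents to get the HTTP addresses (e.g. http://www.oin.com:8086/?id=16696073S9|18129-02=rzrfs-3-201606290653047);
3) read and parse each file's contents to extract the relevant value (the date at the very end of each HTTP path, e.g. 201606290653047);
4) insert the file name, the HTTP addresses found in that file, and the time extracted from the end of each address into a database as fields.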

Solution

A beginner's answer:

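import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class Test {
	public static void main(String[] args) {
		File dir = new File("src/txt"); // relative path to the data directory
		File[] children = dir.listFiles(); // all entries in the directory; null if it cannot be listed
		if (children == null) {
			System.err.println("Cannot list directory: " + dir.getPath());
			return;
		}
		for (File child : children) {
			if (!child.isFile()) { // skip anything that is not a regular file (a regex on the name could be used instead)
				continue;
			}
			String fileName = child.getName(); // the file name
			List<Info> list = new ArrayList<>();
			try {
				list = getInfoList(child); // the records parsed from this file
			} catch (IOException e) {
				e.printStackTrace();
			}
			System.out.println(fileName);
			for (Info info : list) {
				System.out.println("path=" + info.getPath() + "--date=" + info.getDate());
			}
			System.out.println("--next file--");
		}
	}

	/**
	 * Reads a file and returns the list of records parsed from it.
	 * @param file the file to read
	 * @return one Info per line that contains a '-'
	 * @throws IOException if the file cannot be read
	 */
	public static List<Info> getInfoList(File file) throws IOException {
		List<Info> list = new ArrayList<>();
		// try-with-resources closes the reader; UTF-8 is assumed for the data files
		try (BufferedReader br = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
			String line;
			while ((line = br.readLine()) != null) {
				line = line.trim();
				int end = line.lastIndexOf('-');
				if (end < 0) { // blank or malformed line: there is no date to extract
					continue;
				}
				Info info = new Info();
				info.setPath(line); // the question treats the whole line as the HTTP address
				info.setDate(line.substring(end + 1)); // everything after the last '-'
				list.add(info);
			}
		}
		return list;
	}
}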
class Info {

	// HTTP address
	private String path;
	// timestamp taken from the end of the address
	private String date;
	public String getPath() {
		return path;
	}
	public void setPath(String path) {
		this.path = path;
	}
	public String getDate() {
		return date;
	}
	public void setDate(String date) {
		this.date = date;
	}
}
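
The answer above cuts each line by character position, which accepts any line that happens to contain the expected character. If stricter validation is wanted, a regular expression is one option. The following is only a sketch: the class name LineParser is made up for illustration, and the pattern is an assumption inferred from the sample data in the question (an http URL, a "|", then text ending in "-" followed by digits).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the answer above. */
final class LineParser {
    // Assumed line shape, inferred from the sample data:
    // "http://" + non-'|' characters + '|' + text ending in '-' + digits.
    private static final Pattern LINE =
            Pattern.compile("(http://[^|\\s]+\\|\\S*-(\\d+))");

    /** Returns {address, date} for a well-formed line, or empty otherwise. */
    public static Optional<String[]> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty(); // blank or malformed line
        }
        // group(1) is the whole address, group(2) the trailing digits
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }

    public static void main(String[] args) {
        String sample =
                "http://www.oin.com:8086/?id=16696073S9|18129-02=rzrfs-3-201606290653047";
        parse(sample).ifPresent(r ->
                System.out.println("path=" + r[0] + "--date=" + r[1]));
        // A line that does not fit the assumed shape is rejected
        System.out.println(parse("not a record").isPresent());
    }
}
```

Lines that do not match are skipped rather than raising an exception, so a stray blank line at the end of a file does no harm. If the real files contain other URL schemes or trailing text, the pattern would need adjusting.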

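Suggested approach
1. Getting the file names is not hard.
2. Read each file line by line and store the records in a collection.
3. To extract the date, take everything from the last "-" to the end of the line; that is your date.
4. Insert the extracted data into the database.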
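
The answer's code stops at printing, so the database step is left open. One way to do it is a JDBC batch insert, sketched below. Everything specific here is an assumption for illustration: the class name RecordWriter, the table url_record, and its columns file_name, url, and record_time are all hypothetical, and the caller is expected to supply an open Connection from whatever driver is in use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical helper: the table and column names below are assumptions. */
final class RecordWriter {
    // Assumed schema: url_record(file_name, url, record_time), all text columns.
    static final String INSERT_SQL =
            "INSERT INTO url_record (file_name, url, record_time) VALUES (?, ?, ?)";

    /**
     * Inserts one row per {address, date} pair in a single batch.
     * The caller opens and closes the Connection.
     * Returns the number of statements sent in the batch.
     */
    public static int insertAll(Connection conn, String fileName, List<String[]> rows)
            throws SQLException {
        if (rows.isEmpty()) {
            return 0; // nothing to send
        }
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                ps.setString(1, fileName);
                ps.setString(2, row[0]); // HTTP address
                ps.setString(3, row[1]); // trailing timestamp, kept as text
                ps.addBatch();
            }
            return ps.executeBatch().length;
        }
    }
}
```

Using a PreparedStatement with placeholders avoids building SQL by string concatenation, and batching sends all rows of one file together. The timestamp is stored as text here because its exact format is not stated in the question.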