Hadoop MapReduce Task Log: syslog Cannot Be Viewed

Neilyo, posted 2019-04-25 17:14 / 3198 views

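Summary: Because several map tasks share one JVM, only one set of log files is written, and fetching a task's log through the tasklog URL fails to return syslog. The cause is in TaskLogServlet.doGet(): adding the filter=SYSLOG parameter makes syslog accessible, but removing it does not. The fix is to do what getAllLogsFileDetails does: first read the log directory (logdir) from log.index, then check whether logdir contains syslog. As a workaround, add filter=SYSLOG to the URL when querying; no code change is needed.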

Symptom:

Because multiple map tasks share one JVM, only one set of log files is written:

datanode01:/data/Hadoop-x.x.x/logs/userlogs$ ls -R
.:
attempt_201211220735_0001_m_000000_0 attempt_201211220735_0001_m_000002_0 attempt_201211220735_0001_m_000005_0
attempt_201211220735_0001_m_000001_0 attempt_201211220735_0001_m_000003_0
./attempt_201211220735_0001_m_000000_0:
log.index
./attempt_201211220735_0001_m_000001_0:
log.index
./attempt_201211220735_0001_m_000002_0:
log.index stderr stdout syslog

Fetching the task's log through http://xxxxxxxx:50060/tasklog?attemptid=attempt_201211220735_0001_m_000000_0 does not return syslog.

Cause:

1. The TaskLogServlet.doGet() method

if (filter == null) {
    printTaskLog(response, out, attemptId, start, end, plainText,
                 TaskLog.LogName.STDOUT, isCleanup);
    printTaskLog(response, out, attemptId, start, end, plainText,
                 TaskLog.LogName.STDERR, isCleanup);
    if (haveTaskLog(attemptId, isCleanup, TaskLog.LogName.SYSLOG)) {
        printTaskLog(response, out, attemptId, start, end, plainText,
                     TaskLog.LogName.SYSLOG, isCleanup);
    }
    if (haveTaskLog(attemptId, isCleanup, TaskLog.LogName.DEBUGOUT)) {
        printTaskLog(response, out, attemptId, start, end, plainText,
                     TaskLog.LogName.DEBUGOUT, isCleanup);
    }
    if (haveTaskLog(attemptId, isCleanup, TaskLog.LogName.PROFILE)) {
        printTaskLog(response, out, attemptId, start, end, plainText,
                     TaskLog.LogName.PROFILE, isCleanup);
    }
} else {
    printTaskLog(response, out, attemptId, start, end, plainText, filter,
                 isCleanup);
}

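Adding the filter=SYSLOG parameter makes syslog accessible, but removing it does not.

Compared with the STDOUT and STDERR calls, the unfiltered path has one extra check before printing syslog:

    haveTaskLog(attemptId, isCleanup, TaskLog.LogName.SYSLOG)

Following the code shows that this check looks for a syslog file in the attempt's own directory, attempt_201211220735_0001_m_000000_0, instead of reading the location from log.index and looking for syslog there. That is the bug.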
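To make the mismatch concrete, here is a minimal, self-contained sketch that rebuilds the directory layout from the listing above in a temporary folder and applies a directory-only check. It is not Hadoop's code: the class NaiveSyslogCheck and its methods are hypothetical, the choice of attempt _000002_0 as the directory that really holds the logs is read off the listing, and the "LOG_DIR:" first line of log.index is an assumption about the index format.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class NaiveSyslogCheck {

    // Same idea as the flawed check: look only inside the attempt's own directory.
    static boolean haveLogInAttemptDir(File attemptDir, String logName) {
        return new File(attemptDir, logName).canRead();
    }

    // Builds a throwaway userlogs layout and returns
    // { result for the attempt that reused the JVM, result for the attempt that owns the logs }.
    static boolean[] demo() {
        try {
            Path userlogs = Files.createTempDirectory("userlogs");
            Path owner = Files.createDirectory(
                userlogs.resolve("attempt_201211220735_0001_m_000002_0"));
            Path reused = Files.createDirectory(
                userlogs.resolve("attempt_201211220735_0001_m_000000_0"));

            // Only the owning attempt directory contains a real syslog file.
            Files.write(owner.resolve("syslog"),
                "a log line\n".getBytes(StandardCharsets.UTF_8));

            // Assumed index layout: the first line names the directory holding the logs.
            String index = "LOG_DIR:" + owner + "\n";
            Files.write(reused.resolve("log.index"),
                index.getBytes(StandardCharsets.UTF_8));

            return new boolean[] {
                haveLogInAttemptDir(reused.toFile(), "syslog"),
                haveLogInAttemptDir(owner.toFile(), "syslog")
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("reused attempt dir has syslog: " + result[0]); // false
        System.out.println("owning attempt dir has syslog: " + result[1]); // true
    }
}
```

The first result is false even though the syslog content for that attempt exists on disk, which matches the symptom: the unfiltered page silently skips syslog.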

2. The TaskLogServlet.printTaskLog method

When it fetches a log file, it reads the location from log.index.

InputStream taskLogReader =
    new TaskLog.Reader(taskId, filter, start, end, isCleanup);

TaskLog.Reader:

public Reader(TaskAttemptID taskid, LogName kind,
              long start, long end, boolean isCleanup) throws IOException {
    // find the right log file
    Map<LogName, LogFileDetail> allFilesDetails =
        getAllLogsFileDetails(taskid, isCleanup);

static Map<LogName, LogFileDetail> getAllLogsFileDetails(
    TaskAttemptID taskid, boolean isCleanup) throws IOException {

    Map<LogName, LogFileDetail> allLogsFileDetails =
        new HashMap<LogName, LogFileDetail>();

    File indexFile = getIndexFile(taskid, isCleanup);
    BufferedReader fis;
    try {
        fis = new BufferedReader(new InputStreamReader(
            SecureIOUtils.openForRead(indexFile, obtainLogDirOwner(taskid))));
    } catch (FileNotFoundException ex) {
        LOG.warn("Index file for the log of " + taskid + " does not exist.");

        // Assume no task reuse is used and files exist on attemptdir
        StringBuffer input = new StringBuffer();
        input.append(LogFileDetail.LOCATION
                     + getAttemptDir(taskid, isCleanup) + "\n");
        for (LogName logName : LOGS_TRACKED_BY_INDEX_FILES) {
            input.append(logName + ":0 -1\n");
        }
        fis = new BufferedReader(new StringReader(input.toString()));
    }
    ...

Fix:

As in getAllLogsFileDetails, first obtain the log directory (logdir) from log.index:

File indexFile = getIndexFile(taskid, isCleanup);
BufferedReader fis;
try {
    fis = new BufferedReader(new InputStreamReader(
        SecureIOUtils.openForRead(indexFile, obtainLogDirOwner(taskid))));
} catch (FileNotFoundException ex) {
    LOG.warn("Index file for the log of " + taskid + " does not exist.");

    // Assume no task reuse is used and files exist on attemptdir
    StringBuffer input = new StringBuffer();
    input.append(LogFileDetail.LOCATION
                 + getAttemptDir(taskid, isCleanup) + "\n");
    for (LogName logName : LOGS_TRACKED_BY_INDEX_FILES) {
        input.append(logName + ":0 -1\n");
    }
    fis = new BufferedReader(new StringReader(input.toString()));
}
String str = fis.readLine();
if (str == null) { // the file doesn't have anything
    throw new IOException("Index file for the log of " + taskid + " is empty.");
}
// Everything after the LOCATION marker on the first line is the real log directory.
String loc = str.substring(str.indexOf(LogFileDetail.LOCATION) +
                           LogFileDetail.LOCATION.length());

Then check whether logdir contains syslog.
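The two steps above can be sketched outside Hadoop as a small standalone class. This is an illustration under stated assumptions, not the actual patch: IndexAwareLogCheck, resolveLogDir and the file-based haveTaskLog signature are hypothetical, the "LOG_DIR:" marker stands in for LogFileDetail.LOCATION, and the secure open via SecureIOUtils is replaced by a plain file read.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class IndexAwareLogCheck {

    // Assumed stand-in for LogFileDetail.LOCATION.
    static final String LOCATION = "LOG_DIR:";

    // Extracts the directory that follows the LOCATION marker on the index's first line.
    static String logDirFromIndexLine(String firstLine) {
        int markerAt = firstLine.indexOf(LOCATION);
        if (markerAt < 0) {
            throw new IllegalArgumentException("No " + LOCATION + " marker in: " + firstLine);
        }
        return firstLine.substring(markerAt + LOCATION.length()).trim();
    }

    // Step 1: read logdir from log.index; fall back to the attempt dir if there is no index.
    static File resolveLogDir(File attemptDir) {
        File indexFile = new File(attemptDir, "log.index");
        if (!indexFile.isFile()) {
            return attemptDir; // assume no task reuse, logs live in the attempt dir
        }
        try (BufferedReader in =
                 Files.newBufferedReader(indexFile.toPath(), StandardCharsets.UTF_8)) {
            String first = in.readLine();
            if (first == null) {
                throw new IllegalStateException("Index file " + indexFile + " is empty.");
            }
            return new File(logDirFromIndexLine(first));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 2: check for the log file inside logdir rather than inside the attempt dir.
    static boolean haveTaskLog(File attemptDir, String logName) {
        return new File(resolveLogDir(attemptDir), logName).canRead();
    }

    // Builds a temporary layout where attempt _000000_0 only has a log.index
    // pointing at attempt _000002_0, then runs the index-aware check on it.
    static boolean demo() {
        try {
            Path userlogs = Files.createTempDirectory("userlogs");
            Path owner = Files.createDirectory(
                userlogs.resolve("attempt_201211220735_0001_m_000002_0"));
            Path reused = Files.createDirectory(
                userlogs.resolve("attempt_201211220735_0001_m_000000_0"));
            Files.write(owner.resolve("syslog"),
                "a log line\n".getBytes(StandardCharsets.UTF_8));
            Files.write(reused.resolve("log.index"),
                (LOCATION + owner + "\n").getBytes(StandardCharsets.UTF_8));
            return haveTaskLog(reused.toFile(), "syslog");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("syslog found via log.index: " + demo()); // true
    }
}
```

Keeping the fallback to the attempt directory mirrors the FileNotFoundException branch in the excerpt, so tasks that ran without JVM reuse behave as before.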

Workaround:

When querying, add filter=SYSLOG to the URL and syslog becomes visible; no code change is needed.
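As a small illustration of the workaround, the helper below assembles the tasklog URL with and without the filter parameter. The class TaskLogUrl and the host name tasktracker-host are hypothetical placeholders; the port, path and parameter names are the ones quoted in this post. No URL encoding is applied, on the assumption that attempt IDs contain only letters, digits and underscores.

```java
public class TaskLogUrl {

    // Builds the TaskTracker tasklog URL; pass null as filter to request all logs.
    static String build(String host, int port, String attemptId, String filter) {
        StringBuilder url = new StringBuilder("http://")
            .append(host).append(':').append(port)
            .append("/tasklog?attemptid=").append(attemptId);
        if (filter != null) {
            url.append("&filter=").append(filter);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        String attempt = "attempt_201211220735_0001_m_000000_0";
        // Unfiltered request: goes through the haveTaskLog check and may omit syslog.
        System.out.println(build("tasktracker-host", 50060, attempt, null));
        // Workaround: ask for syslog explicitly.
        System.out.println(build("tasktracker-host", 50060, attempt, "SYSLOG"));
    }
}
```

The filtered request takes the else branch of doGet() shown earlier, so it does not pass through haveTaskLog and the log is located through log.index.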


When reposting, please cite this article's URL: https://www.ucloud.cn/yun/3849.html
